Arduino Amiga Floppy Disk Reader/Writer

Open Source and Free! - by Robert Smith

The Story So Far

Below is a description and log of how I created this solution:


My Amiga Background

I owe my career to the Amiga, specifically the A500+ that my parents purchased for me for Christmas at the age of 10. At first I played the games, but after a while I started getting curious about what else it could do. I played with the Deluxe Paint III and learnt about Workbench.

The Amiga 500 Plus:

Every month I purchased the popular Amiga Format magazine. One month had a free copy of AMOS. I entered the Amiga Format Write A Game In AMOS competition when AMOS Professional was put on a coverdisk later, and was one of the 12 (I think) winners with In The Pipe Line. You really had to chase them for prizes though!

AMOS - The Creator!:

Moving on, I used it as part of my GCSEs and A-Level projects (thanks to Highspeed Pascal, which was compatible with Turbo Pascal on the PC)

Anyway, that was a long time ago, and I have boxes of disks, and an A500+ that doesn’t work anymore, so I thought about backing those disks up onto my computer, for both preservation and nostalgia.

The Amiga Forever website has an excellent list of options that include hardware, and abusing two floppy drives in a PC - Sadly none of these were an option with modern hardware, and the KryoFlux/Catweasel controllers are too expensive. I was really surprised that most of it was closed source.

Massively into electronics and having played with Atmel devices (AT89C4051) whilst at University I decided to take a look at the Arduino (credit to GreatScott for the inspiration showing just how easy it is to get started) I wondered if this was possible.

So I Googled for Arduino floppy drive reading code, and after skipping all of the projects that abused the drive to play music, I didn’t really find any solutions. I found a few discussions in a few groups suggesting it wouldn’t be possible. I did find a project based around an FPGA which was very interesting reading, but not the direction I wanted to go, so the only option was to build a solution myself.



When I started this project I hadn’t got a clue how the floppy drive worked, and even less how the data was encoded onto them. The following websites were invaluable in my understanding on what happens and how they work:



Based on the research I now knew theoretically how the data was written to the disk, and how the disk spun.

I began to work out some numbers. Based on the speed the double density disk rotated at (300rpm) and the way the data is stored (80 tracks, 11 sectors per track and 512 bytes per sector, encoded using MFM), to read the data accurately I needed to be able to sample the data at 500Khz; that’s quite fast when you consider the Arduino is only running at 16Mhz.

In the attempts that follow I’m only talking about the Arduino side. Jump to decoding.


Attempt 1:

First I needed to gather the hardware and interface to the floppy drive. The floppy drive I took from an old PC at work, and grabbed its IDE cable at the same time.

Below is a photo of liberated floppy drive from an old PC:

Studying the pinout of the drive I realised I only needed a few of the wires from it, and after looking at the drive I realised it didn't use the 12v input either.

Getting the drive spinning was achieved by selecting the drive and enabling the motor. Moving the head was simple. You set the /DIR pin high or low, and then pulsed the /STEP pin. You could tell if the head had reached track 0 (the first track) by monitoring the /TRK00 pin.

I was curious about the /INDEX pin. This pulses once each rotation. As the Amiga doesn't use this to find the start of the track I didn't need it and could ignore it. After this its just a matter of choosing which side of the disk to read (/SIDE1) and connecting up /RDATA.

With the high data rate requirement my first thought was to find a way to make this less of an issue by trying to reduce the requirements on this rate.

The plan was to use two 8-bit shift registers (SN74HC594N) to reduce the required sampling frequency by a factor of 8. I used was what Ebay called Pro Mini atmega328 Board 5V 16M Arduino Compatible Nano (so I don't know what that is officially, but this does work on the Uno!) to buffer this parallel data and send it to the PC using it’s serial/USART interface. I knew this needed to be running faster than 500Kbaud (with all of the serial overhead involved too).

After ditching the standard Arduino serial library, I was really pleased to find I could configure the USART on the Arduino at uptp 2M baud, and with one of those F2DI break-out boards (ebay called it Basic Breakout Board For FTDI FT232RL USB to Serial IC For Arduino - see below) I could happily and send and receive data at this rate (62.5Khz) but I needed to do this accurately.

The FTDI breakout board that perfectly fits the interface on the Arduino board:

First I used the arduino to setup on of the 8-bit shift registers only one of the 8 bits clocked high. The other received a feed directly from the floppy drive (thus providing serial to parallel conversion).

The folloing is a crazy picture of the breadboard I built this on at the time:

I used one of the Arduinos timers to generate a 500Khz signal on one of its output pins and as the hardware manages this it is very accurate! - Well, my multimeter measured it as exactly 500khz anyway.

The code worked, I clocked in a full 8-bits of data at 62.5khz, leaving the Arduino CPU hardly utilized. However I wasn’t receiving anything meaningful. At this point I realised I needed to take a closer look at the actual data coming out of the floppy drive. So I purchased a cheap old Oscilloscope from Ebay (Gould OS300 20Mhz Oscilloscope) to check out what was going on.

Whilst waiting for the oscilloscope to arrive I decided to try something else.

A fragment of code used to read the data from the shift registers:

void readTrackData() {
    byte op;
    for (int a=0; a<5632; a++) {
        // We'll wait for the "byte" start marker
        while (digitalRead(PIN_BYTE_READ_SIGNAL)==LOW) {};
        // Read the byte
        if (digitalRead(DATA_LOWER_NIBBLE_PB0)==HIGH) op|=1;
        if (digitalRead(DATA_LOWER_NIBBLE_PB1)==HIGH) op|=2;
        if (digitalRead(DATA_LOWER_NIBBLE_PB2)==HIGH) op|=4;
        if (digitalRead(DATA_LOWER_NIBBLE_PB3)==HIGH) op|=8;
        if (digitalRead(DATA_UPPER_NIBBLE_A0)==HIGH) op|=16;
        if (digitalRead(DATA_UPPER_NIBBLE_A1)==HIGH) op|=32;
        if (digitalRead(DATA_UPPER_NIBBLE_A2)==HIGH) op|=64;
        if (digitalRead(DATA_UPPER_NIBBLE_A3)==HIGH) op|=128;

        // Wait for high to drop again 
        while (digitalRead(PIN_BYTE_READ_SIGNAL)==HIGH) {};


Attempt 2:

I decided that the shift registers, whilst a nice idea probably weren’t helping. I was able to easily read 8 bits in one go, but it occurred to me that I couldn’t be sure that all of the bits were clocked in correctly in the first place. Reading the documentation it suggested that the data were more of short pulses rather than highs and lows.

I removed the shift registers and wondered what would happen if I tried to check for a pulse from the drive in an Interrupt (ISR) using the previously setup 500Khz signal. I reconfigured the Arduino to generate the ISR, and after I got passed the issues of the Arduino libraries getting in the way (using the ISR I wanted) I moved to Timer 2.

I wrote a short ISR that would shift left a global single byte by one bit and then if the pin connected to the floppy drive data line was LOW (the pulses are low-going) I would OR a 1 onto it. Every 8 times I did this I wrote the completed byte to the USART.

This didn't go as expected! The Arduino started to behave very erratically and strange. I soon realised the ISR was taking more time to execute than than the time between calls to it. (I could receive a pulse every 2µSec and based on the speed of the Arduino and making a wild assumption that every C instruction translated to 1 clock machine code cycle I realised I could at most have 32 instructions. Sadly most would be more than one instruction, and after Googling I realised the overhead on starting an ISR was massive anyway. Not to mention the digitalRead functions being very slow.

I ditched the digitalRead function in favour of accessing the port pins directly! This still didn’t help and wasn’t fast enough. Not prepared to give up, I shelved this approach and decided to move on and try something else.

At this point the oscilloscope that I purchased arrived, and it worked! A crusty old Oscilloscope that was probably older than me! But still did the job perfectly. (If you don’t know what n Oscilloscope is check out EEVblog #926 - Introduction To The Oscilloscope, and if you're into electronics then I suggest watching a few more and having a browse around the EEVBlog website.

My newly purchased crusty old Oscilloscope (Gould OS300 20Mhz):

After connecting the 500Khz signal to one channel and the output from the floppy drive to another it was obvious something wasn’t right. The 500Khz signal was a perfect square wave using it as a trigger, the floppy data was all over the place. I could see the pulses, but it was more of a blur. Likewise if I triggered from the floppy drive signal, the 500Khz signal square wave signal was all over the place and not in sync with it.

Photos of the traces on the oscilloscope triggering off of the two channels. You cant quite see it, but on the channel not being triggered is thousands of faint ghostly lines:

Individually I could measure pulses from both signals at 500Khz, which didn’t make sense, as if they ere both running at the same speed but won’t trigger so you can see both signals properly then something must be wrong.

After a lot of playing with the trigger levels I managed to work out what was going on. My signal was a perfect 500Khz, but looking at the floppy drive dignal, well they were spaced correctly, but not all the time. Between groups of pulses there was an error drift, ans also gaps in the data that put the signal totally out of sync.

Remembering the previous research, the drive was supposed to rotate at 300rpm, but it might not actually be exactly 300rpm, plus the drive that wrote the data might also not be at exactly 300rpm. Then there is the spacing between sectors and sector gaps. Clearly there was a synchronisation issue, and synchronising the 500Khz signal to the floppy drive at the start of a read wasn’t going to work.

I also discovered that the pulse from the floppy drive was extremely short, although you could modify this by changing the pullup resistor, and if the timing was not exactly right then the Arduino might miss a pulse all together.

When I was at university (University of Leicester) I took a module called embedded system. We studied the Atmel 8051 microcontrollers. One of the projects involved counting pulses from a simulated weather station (rotary encoder). Back then I sampled the pin at regular intervals, but this wasn't very accurate.

The module lecturer, Prof Pont suggested that I should have used the hardware counter features of the device (I didn’t even know it had one the time.)

I checked the datasheet for the ATMega328 and sure enough each of the three timers could be configured to count pulses triggered from an external input. This meant speed was no longer an issue. All I actually needed to know was if a pulse had occurred within a 2µSec time window.


Attempt 3:

I adjusted the Arduino sketch to reset the 500khz timer when the first pulse was detected and each time the 500khz timer overflowed I checked the counter value to see if a pulse had been detected. I then performed the same bit-shifting sequence and every 8 bits wrote out a byte to the USART.

Data was coming in and I started to analyse it on the PC. In the data I started to see what looked like valid data. The odd sync word would appear, or groups of 0xAAAA sequences, but nothing reliable. I knew I was on to something, but was still missing something.


Attempt 4:

I realised that as the data was being read, the data from the drive was probably going out of sync/phase with my 500khz signal. I confirmed this by just reading 20 bytes each time I started reading.

Reading up about how to handle this sync issue I stumbled across the phrase Phase Locked Loop or PLL. In very simple terms, for what we are doing, the phase locked loop would dynamically adjusts the clock frequency (the 500khz) to compensate for frequency drift and variance in the signal.

The resolution on the timer wasn't high enough to vary it by smal enough amounts (eg: 444khz, 470khz, 500khz, 533khz, 571khz etc) and to perform this properly I would probably need the code to run a whole lot faster.

The Arduino timers work by counting up to a predefined number (in this case 16 for 500khz) then they set an overflow register and start again from 0. The actual counter value can be read and written to at any point.

I adjusted the sketch to wait in a loop until the timer overflowed, and when it overflowed I checked for a pulse as before. The difference time time was that when a pulse was detected inside the loop I reset the timer counter value to a pre-defined phase position, effectivly resynchronising the timer with each pulse.

I chose the value I wrote to the timer counter such that it would overflow at 1µSec from the detection pulse (half way) so that next time the timer overflowed the pulse would have been 2µSec appart.

This worked! I was now reading almost perfect data from the disk. I was still getting a lot of checksum errors which was annoying. I resolved most of these by continuously re-reading the same track on the drive until I had all 11 sectors with valid header and data checksums.

I was curious at this point, so I hooked it all back up to the oscilloscope again to see what was going on now, and as I guessed, I could now see both traces as they were both staing in sync with each other:

I would love to see this a little clearer, if anyone wants to donate me a lovely top of the range digital oscilloscope (eg one of them Keysight ones!) I would really appreciate it!


Attempt 5:

I wondered if I could improve on this. Looking at the code, specifically the inner reading loop (see below) I had a while loop waiting for the overflow and then an inner if looking for a pulse to sync to.

A fragment of code used to read the data and sync to it:

register bool done = false;

// Wait for 500khz overflow 
while (!(TIFR2&_BV(TOV2))) {   
    // falling edge detected while waiting for the 500khz pulse.  
    if ((TCNT0) && (!done)) {
        // pulse detected, reset the timer counter to sync with the pulse

        // Wait for the pulse to go high again
        done = true;

// Reset the overflow flag
// Did we detect a pulse from the drive?
if (TCNT0) {                            

I realised that depending on which instruction was being executed in the above loops, the time between pulse detection and writing TCNT2=phase; could change by the time taken to execute a few instructions.

Realising that this may be causing some errors/jitter in the data and also with the above loop it is possible I might actually miss the pulse from the drive (thus missing a re-sync bit) I decided to take trick from one of my earlier attempts, the ISR (interrupt).

I wired the data pulse to a second pin on the Arduino. The data was now connected to the COUNTER0 trigger and now also the INT0 pin. INT0 is one of the highest interrupt priorities so should minimise the delays between trigger and the ISR being called, and as this is the only interrupt I am actually interested in al of the others are disabled.

All the interrupt needed to do was perform the re-sync code above, this changed the code to look like this:

// Wait for 500khz overflow 
while (!(TIFR2&_BV(TOV2))) {}   

// Reset the overflow flag

// Did we detect a pulse from the drive?
if (TCNT0) {                            

The ISR looked like this: (note I didn't use attachInterrupt as this also adds overhead to the call)

volatile byte targetPhase;

ISR (INT0_vect) {

But compiling this produced far too much code to execute fast enough. In fact disassembling the above produced:

push    r1
push    r0
in      r0,     0x3f	; 63
push    r0
eor     r1,     r1
push    r24 
lds     r24,    0x0102	; 0x800102 
sts     0x00B2, r24	; 0x8000b2 <__TEXT_REGION_LENGTH__+0x7e00b2>
pop     r24
pop     r0
out     0x3f,   r0	; 63
pop     r0
pop     r1

By analysing the code I realised there were only a few instructions I actually needed. Noting that the compiler would keep track of any registers I bashed I changed the ISR as follows:

volatile byte targetPhase asm ("targetPhase");

ISR (INT0_vect) {
  asm volatile("lds __tmp_reg__, targetPhase");      
  asm volatile("sts %0, __tmp_reg__" : : "M" (_SFR_MEM_ADDR(TCNT2)));

Which disassembled, produced the following instructions:

push	r1
push	r0
in	r0, 0x3f	; 63
push	r0
eor	r1, r1
lds	r0, 0x0102	; 0x800102 
sts	0x00B2, r0	; 0x8000b2 <__TEXT_REGION_LENGTH__+0x7e00b2>
pop	r0
out	0x3f, r0	; 63
pop	r0
pop	r1

Still too many instructions. I noticed that the compiler was adding a lot of extra instructions, that for my application really didn't need to be there. So I looked up the ISR() and stumbled upon a second parameter ISR_NAKED. Adding this would prevent the compiler from adding any special code, but then I would be responsible for maintaining registers, the stack and returning from the interrupt correctly. I also would need to maintain the SREG register, but as none of the commands I needed to call modified it I didn't need to worry about it.

This changed the ISR code to become:

ISR (INT0_vect, ISR_NAKED) {
  asm volatile("push __tmp_reg__");                                         // Preserve the tmp_register
  asm volatile("lds __tmp_reg__, targetPhase");                             // Copy the phase value into the tmp_register
  asm volatile("sts %0, __tmp_reg__" : : "M" (_SFR_MEM_ADDR(TCNT2)));       // Copy the tmp_register into the memory location where TCNT2 is
  asm volatile("pop __tmp_reg__");                                          // Restore the tmp_register
  asm volatile("reti");                                                     // And exit the ISR

Which the compiler converted to:

push	r0
lds	r0, 0x0102	; 0x800102 
sts	0x00B2, r0	; 0x8000b2 <__TEXT_REGION_LENGTH__+0x7e00b2>
pop	r0

Five instructions! Perfect, or at least as fast as it was going to be, theoretically taking 0.3125µSec to execute! This should now mean the re-sync should happen at time-consistant periods after the pulse. Below is a timing diagram of what is going on. This is how you recover data from a serial data feed that doesn't have a clock signal:

This improved the results a little. It’s still not perfect. Some disks read perfectly every time, some disks it takes ages and has to keep retrying. I am unsure if this is because some of the disks have been sitting there for so long that the magnetism has degraded to such a low level that the drives amplifiers can't cope with it. I wondered if this was something to do with the PC floppy disk drive, so I connected this up to an external Amiga floppy disk drive I had, but the results were identical.


Attempt 6:

I wondered if there was anything else that could be done. Perhapses the signal from the drive was more noisy than I thought it had been. After reading further information I discovered that a 1KOhm pullup resistor was the norm, fed into a Schmitt trigger.

After installing an SN74HCT14N Hex Schmitt Trigger and reconfiguring the sketch to trigger on rising edges instead of falling edges I gave it a try, but it didn't really make any notable difference. I guess as I was looking for one or more pulses each time this probably absorbed any noise anyway. So we'll stick method Attempt 5!

My final breadboard solution looked line this:

Note the wiring on the above is slightly different to the live sketch. I re-ordered some of the Arduino pins to make the circuit diagram easier.


Attempt 7:

I was a little dissatisfied with some of the disks I had not reading. Some times the disks just didn't sit correctly in the floppy drive. I guess the spring on the shutter wasn't helping.

I started looking at detecting if there were any errors in the actual received MFM data from the disk.

From the rules of how MFM encoding works, I realised that a few simple rules could be applied as follows:

Firstly when decoding MFM data I looked to see if there were two '1's in a row. If they were I assumed that the data had got a little blurred over time and ignored the second '1'.

With this rule applied, there are literally three situations of 5 bits where errors are left to occur. This would be a new area where I could look to improve the data.

Mostly though I was surprised, there really weren't that many MFM errors detected. I am a little confused why some of the disks won't read when no errors are found.

This is an area for further investigation.



After reading how MFM worked, I wasn’t entirely sure how it aligned correctly.

At first I thought that the drive output 1s and 0s for the on and off bits. This wasn’t the case. The drive outputs a pulse for every phase transition, ie: every time the data went from 0 to 1, or 1 to 0.

After reading this I wondered if I needed to convert this back into 1’s and 0s by feeding it into a flip-flop toggle, or read the data, search for sectors, and if none were found then invert the data and try again!

It turns out this isn’t the case and it’s much simpler. The pulses are actually the RAW MFM data and can be fed straight into the decoding algorithms. Now I understood this I started writing code to scan a buffer from the drive and search for the sync word 0x4489. Surprisingly I found it!

From the research I had conducted, I realised I needed to actually search for 0xAAAAAAAA44894489 (a note from the research also suggested that there were some bugs in early Amiga code that meant that the above sequence wasn’t found. So instead I searched for 0x2AAAAAAA44894489 after ANDing the data with 0x7FFFFFFFFFFFFFFF).

As expected I found up to 11 of these on each track corresponding to the actual start of the 11 Amiga sectors. I then started to read the bytes that followed to see if I could decode the sector information.

I took a snippet of code from one of the above references to help with MFM decoding. No point in re-inventing the wheel eh?

After reading the header and data, I tried writing it to disk as an ADF file. The standard ADF file format is very simple. It is literally just the 512 bytes from each sector (from both sides of the disk) written in order. After writing it and trying to open it with ADFOpus and got mixed results, sometimes it opened the file, sometimes it failed. There were obviously errors in the data. I started to look at the checksum fields in the header, rejecting sectors with invalid checksums and repeating reading until I had 11 valid ones.

For some disks this was all 11 on the first read, some took several attempts and different phase values too.

Finally I managed to write valid ADF files. Some disks would take ages, some literally the speed the Amiga would have read them. Not having a working Amiga anymore I couldn’t actually check if these disks read properly normally, they’ve been stored in a box in the attic for years so may well have degraded.


Writing Data - Attempt 1

So after successfully being able to read disks I figured if you want to keep the original physical medium you might want to write disks back again. I figured I'd work this out in reverse, starting with the software (ie: converting the ADF disk files into MFM data for the interface to write somehow)

So I started by adding classes to read an ADF disk, and encode all the sectors as one track. Knowing I could potentially test the data I created by feeding it back into the decoding part, I started work on this. While working on this I decided to try to find out what was wrong with my Amiga. After-all, I can't test any disks I create if I don't have anything real to test them on.

Taking my A500+ apart I noticed it had suffered one of the most common problems, the clock battery had leaked everywhere. So I de-soldered this from the board and set about cleaning the board up. Whilst at it I pulled the entire machine out and set about cleaning up 20 years of dust and grime. I even took the floppy drive apart to clean it.

Whilst cleaning it I decided it was time to get rid of the yellowing, so followed the information about Retr0brite and tried it.

I then checked all of the joints on the main motherboard and found a lose connection by the power connector, a few touchups with the soldering iron and as good as new. I waited until I was happy with the Retr0brite process before reassembling the computer

Meanwhile I continued working on the code for writing disks. I wanted to read the status of the write protect line, but no matter what I set it to it didn't seem to change voltage. So I pulled the drive apart and followed the traces from the little switches that detect the write protect status to a little IC. At this point I guessed that the output is probably only available when you actually want to write data.

After a lot of experimentation I found that you needed to pull the /WRITE_GATE pin LOW before spinning up the drive to enable writing. At this point you could obtain the write protect status. I also noticed that while the /WRITE_GATE was low the drive didn't switch back off like it used to until that pin had returned to its default HIGH state.

The Amiga would write an entire track in one go. A track in memory is 11*512 bytes (5638 bytes), however, after MFM encoding and putting in correct AmigaDOS format, the track works out as 14848 bytes. Well, there's no way that can fit in the Arduino's 2k of memory, nor its 1k of EEPROM. I needed an alternative method.

I decided I would try to send the data 1 byte at a time in a high priority thread and wait for a response byte from the Arduino before sending the next. I changed the baud rate to 2M to reduce the lag between characters. This meant that it took roughly 5.5uS to send each character, and 5.5uS to receive one back. The Arduino would need to write out 8 bits, at 500khz, so would need a new byte every 16uS. So there should be time, assuming the code loop is tight enough and the operating system doesn't delay the sending and receiving too much.

This was a complete utter failure. The entire read/write cycle took far too long, well beyond one revolution of the disk. The Arduino side was probably fast enough, but the OS wasn't responsive enough. Reading disks works because the OS (Windows in my case) would buffer the data coming in, but writing, Windows would just send it all in one go, but because of the rate I'm sending at is far faster than the Arduino needs it data would be lost. This was why I decided on this two-way acknowledgement process


Writing Data - Attempt 2

Software flow control for this application was just not fast enough. I decided to investigate hardware flow control. I noticed on the FTDI breakout board there are CTS and DTR pin. These stand for Clear To Send and Data Terminal Ready. I noticed that while the breakout board was connected, the Arduino board connected the CTS to GND.

FTDI breakout board with CTS pin bent out and a wire attached

I also didn't know which direction these pins were actually in, but after some experimentation, I found the CTS pin could be signalled from the Arduino and used by the PC to control the flow. Normally this is done using a circular buffer, but in my case I couldn't allow this, so I simply set it to '1' when I don't want data, and '0' while I do.

FTDI breakout board connected to the Arduino on the breadboard

This now meant I could just ask the OS to bulk send the bytes as one chunk, and hope that it was all handled at the kernel level it wouldn't get interrupted.

I had an inner loop that output each bit from the 8 bits but decided it was probably better timing wise to unravel it into 8 sets of commands instead.

This didn't work. If I allowed the code to run without actually running the disk writing part then all bytes were received correctly, with running the code it didn't and bytes being received were being lost.

I suspected that changing the status of the CTX line didn't instantly stop the flow of data and the computer may still send a character or two. Possibly by the time I had signalled the CTX line it was already in the process of sending the next character.


Writing Data - Attempt 3

I didn't want to have a serial interrupt as I didn't want any of the writing timings to be distorted. I realised that in-between writing each bit to the floppy drive there would be a number of CPU cycles sitting in the next while loop. I decided to check between each bit write if another byte had been received since CTX went high and store it.

My theory was that when you raised CTX the computer was probably already in the middle of transmitting the next byte and as you can't stop it mid-stream then it would half after this one. This means I only need to check for one extra byte during the loop and use it if found instead of looking at the serial port again.

So this seemed to work, and the Arduino completed the write without losing any data from the computer. The only questions now was, has it actually written any data, and if so, is any of it valid!

At this point I had only encoded one track, so I decided to run the entire algorithm to encode all 80 tracks. Something strange was happening. The drive head wasn't moving at all. It still did when reading, but not when writing.

I found that in order to move the drive head back and forth you first had to raise the /WRITE GATE pin, I suspected this was required for changing the surface also. Once I added code to do this the drive head moved as expected. This did make sense and would prevent accidental writing of tracks while moving the head around.

So at this point I wrote a disk image out I had created previously, and then tried to read it back. Nothing could be detected!. Either the data I had written was invalid, or the way I was writing it was wrong.

I decided to feed the encoded MFM sector data that I was creating into my sector decoding algorithm used by the reader to validate that what I was generating was correct ans valid, and it was. Something was obviously wrong with how I was writing the data to the disk.


Writing Data - Attempt 4

As no data was being read correctly I decided try a few different approaches. I wasn't sure if the /WRITE DATA pin should be pulsed (and if so, by how long), toggled or just set to the raw data value. My current implementation pulsed the pin. I hadn't been able to find any information online about how the write pin was physically suppose to be manipulated when writing.

The read head would send us a pulse each time there is a flux reversal. I decided to change the implementation so that WRITE DATA was just set to the value of the bit. That didn't work either. So I changed the code to toggle the current state of the pin. Still no luck.

Clearly one of these approaches must have been the correct one. So I decided to get out the trusty Oscilloscope again to have a look at what was going on. I decided to write the MFM pattern 0xAA to to every byte on a track continuously. 0xAA in binary is B10101010, so this would give me a perfect square wave that I could monitor for the required frequency.

If it didn't see a perfect square wave at the desired frequency then I knew there must be some kind of timing issue.

I hooked up the scope, but was surprised to see the timings were perfect. However, being an old scope I couldn't see more than a few pulses. The scope had this wonderful x10 "mag" mode. When pressed it increased the time-base by 10, but more importantly allowed you to scroll through all of the data much like on a modern digital scope.

Something wasn't correct here. It looked like every 12 bits or so I ended up with a period of just "high".

Either the data I was sending was in some way invalid, or there was something causing a pause in the writing process every 12 bits or so. 12 being a strange number considering there are only 8 bits in a byte.

After thinking about this I wondered if I was back with a flow control issue. The way I had designed the loop was to scoop up any stray extra bytes that were received after we had waited for one. But it wasn't intelligent enough to prevent the wait every other byte. I had two choices, move something into an interrupt, or patch the loop.

I decided to have a go at correcting the way the loop worked first. The issue was as a result of a delay caused by waiting for the next byte from the computer. If we lowered CTX and waited for a byte, by the time we raised CTX again another byte was already on the way.

I change the loop so that when the second byte received had been used the Arduino momentarily pulled CTS low and then high again to allow another character to be sent. This meant on the next loop we would have already received the next byte so no waiting was required.

Testing this produced a perfect square wave:

This meant all of the timing for writing a track was perfect, it was just down to the actual data that was being written. I decided to let this run for a few tracks and sides, and then read it back to see if it had written correctly. I was setting the /WRITE_DATA pin to the corresponding bit value from the data received.

When reading the data back it looked like nothing had been encoded, but then I skipped to the other side of the disk. Sure enough there was my pattern. I didn't know why it had only written to one side of the disk.

After some thinking I started to wonder if the /WRITE GATE pin didn't actually work the way I thought it did. It occurred to be that by pulling the pin low it may be enabling the erase head on the drive. If this was the case then I should only do this when I was actually writing or I might end up with noise on the disk as it spins and erases.

I changed all of the code so that the /WRITE GATE was only used when first starting the drive, and later only literally during the write loop. That worked! I was now writing data to both sides of the disk!

So I tried again with a real ADF disk image and let it complete. I then used the reader portion to see if I could read it back. It worked! but for some reason it took quite some time to read this disk back. I wasn't getting any MFM errors but it was struggling to find all of the sectors.

There's were two possibilities for me to look at now, firstly had the data actually written timely enough, and secondly would the disk actually work in a real Amiga!

Too excited with the idea that I might have actually written a disk I booted up the now working A500+ and put the disk in. Moments later the disk started booted, and then displayed the famous checksum error message. So I was writing something valid, but it wasn't consistent.

I decided that unless I could read the data back at a much more accurate rate then writing a disk was pointless.


Reading Data (again)

I wanted to improve the reading quality as I wasn't happy with the current implementation. The current implementation didn't allow enough flexibility for the pulses to arrive at slightly odd times. I needed a new approach.

Firstly, I decided I was going to sync the reading to the /INDEX pulse. Not required by the Amiga, but may come in handy later on for me testing writing and reading.

Several people in the comments to the first half of this project suggested that I should be recording the timing between pulses rather than the method that I had implemented. The only issue with this was getting this data to the PC fast enough. If I was to send a byte for each bit then I could easily exceed the maximum 2M baud.

I decided that the best thing to do would be to try to make sense of the data a little. So I decided to let the counter I was using originally to free-run, right up to 255. I then put the code in a loop waiting for a pulse and that this point saw how much time had passed.

In an ideal situation the minimum value this would ever be would be 32 (corresponding to 2uS). With MFM you could only ever have a maximum of three 0's in a row, so the maximum this value should reach was 128. This meant there were a maximum of 4 possible combinations in a row.

I sampled several disks to see where the majority of these frequencies lay, and the results can be seen below:

Looking at this, I find the majority of the points around a counter of 52, 89 and 120. However these were somewhat specific to my drive and therefor not a good guideline. After some experimentation I used the following formula: value = (COUNTER - 16) / 32. When clipped between 0 and 3 this gave me the output I required. Every 4 of these and I could write a byte out.

It occurred to me that because you couldn't have two '1's together in an MFM encoded bit stream I could safely assume anything for the first value was invalid and could be treated as another '01' sequence. The next part was to unpack this data once received by the PC and turn it back into MFM data. This was simple, since 00 couldn't happen, a 01 meant write '01', an 10 meant write '001' and a 11 meant write '0001'. I gave this a try and to my suprise my results were 100% successful. I tried with a few more disks too, 100%! I now had a very reliable disk reader.

With this new approach being a lot more tolerant on the data from the disk I no longer needed any phase analysis or as many retries. Most of my disks now read perfectly. Some required a few retries, but got there in the end. The last part was to statistically analyse the data and see if it could be repaired, however, 99% of the time bad data coming in was completly unrecognizable and so was little help.


Writing Data - Attempt 5

Now that I could verify what I had written with high accuracy it meant testing the writer would be much easier.

I set about analyzing the code to see what was going wrong. I wrote a 0x55 sequence to an entire track and then read it back in. From time to time a bit had shifted in the data coming back meaning there was some kind of timing issue in writing.

It turned out that this was partly due to the way I was handling the serial port, and partly due to the use of the timer. I was waiting for the timer to reach the value 32, writing the bit, and then resetting it. I changed it so I didn't have to modify the timer counter value.

I would write the first bit when the counter reached 16, then the next when it reached 48 (16+32), and the next when it reached 80 (16+32+32) and so on. Timer2 being only 8-bit rolls over back to zero after the 8th bit, exactly when we needed it to. This meant that as long as we wrote the bit at the required timer value we would be at exactly 500kbps.

I also looked at how I was reading the data from the serial port. This was being read in-between each bit, but this needed to be as short as possible too. After a little experimentation I achieved the shortest working block.

After modifying the Windows code to support verify I was now ready to try again. This time I knew that if the disk verified properly then it should work properly in the Amiga.

So I tried writing another disk out. With verify it took longer. With the new algorithm about 95% of the tracks passed verification on the first go, with only the reaining 5% having to be re-written once more. I was happy with this and popped the disk into the Amiga. It worked perfectly!


Writing Data - Attempt 6

After some feedback from some people who have been using this it was clear that even with verify on the drive wasn't always producing fully readable disks. The software could read them back perfectly, but the Amiga computers would report of a few checksum errors here and there.

I had another look at the code, wondered if it was a timing issue and looked to see if it could be made to be interrupt driven, but sadly with the small amount of time between each bit there simply isn't enough time with interrupts to achieve this with preserving the registers you modify etc.

I then looked back at the writing code. There is a small chance that after a full byte has written, the code could have looped back to start writing the next byte before the timer had overflowed back to 0, allowing the first bit to be written early.

I added a small loop to ensure this couldn't happen which hopefully will fix this for anyone having this issue.



I have had lots of feedback from people who have successfully made this project, both fully working and not working. I decided I would build a diagnostics module into the code to help anyone who can't get theirs to work.

The diagnostics option consists of a few extra commands for the Arduino to process as well as a whole series of events that get ran through to ensure everything is wired correctly.


Writing Data - Attempt 7

After getting a lot of reports of of checksum errors for written disks I started to investigate. I thought at first I was going to have to get down to looking at the MFM data from the disk but the problem was actually much simpler

Looking at XCopy Pro to see the checksum errors, it reported codes 4 and 6 meaning checksum errors in the sector headers and data areas. If it had just been the data area then I would have assumed that it was purely something to do with writing the last few bits of the track, but it wasn't.

I started looking at the writing code and the padding I had around each track, wondering if I was overwriting the start of a track now and then, so I massivly reduced the post-track padding from 256 bytes to 8. To my surprise my verify then kicked out a tonne of errors.

This made me wonder if the actual issue is I'm not writing enough data. I set about adding a Track Erase command to the Arduino which would write the 0xAA pattern to the entire track and then write my track afterwards. To my surprise XCopy gave it a 100% thumbs up. So hopefully that's cleared that problem up.


So what's been happening?

I've been out of this project for a while. But recently jumped back in to address some issues and merge some of the excellent things that others have done into this.

For example, there is a version that will read Commodore 1581 disks, as well as reading Atari HD disks


Reading Data (Again)

I'd been playing around with the idea that I didn't really understand why the timings i'd used for reading were what they were. Whilst they worked, the numbers didn't really follow an exact pattern, and didn't follow the ranges that they should.

I knew that there were three MFM bit patterns that should occur, 01, 001 and 0001. I had also seen 00001 which is technically not allowed. Ignoring the last one, these should take 64, 96 and 128uS. This would theoretically make the centre points (64-96) 80 and (96-128) 112. This is what I had. But it suddenly occurred to me that I was working with timings at the Arduino clock speed. This meant every instruction counts towards this.

I had one disk, with one track that always read with errors, but on an Amiga it read ok. It must have something to do with these timings as it seemed that the Amiga must be very tolerant of bit drift.

I started looking at the disassembled code, and the instructions involved. I was surprised to find that reading the value of the counter, and resetting it together took 4 clock ticks. So, I did some experiments with applying this offset to the timings used and to my surprise, the disk could now be read!

To tidy this up for future use, these are all calculated at the top of the code now.


Writing Data, Again (Precomp) - Failure 1

One of the comments I heard while working on this project, was that people were surprised that writing worked as well as it did without what is known as write pre-compensation. After some background reading I was struggling to find out if it was even used on the Amiga, although you could see it was supported in the hardware.

The theory of write pre-compensation goes a little like this: Writing to the outer tracks, each time you send a '1' you cause a flux transition on the surface of the disk. This is like making tiny magnets on the surface of the disk. On the outer tracks this is fine, but as you get closer to the inner tracks these magnets get closer and closer together. Ignoring the details, you're suppose to adjust the timings slightly between bits to compensate for al of this, and also the time taken to build up the magnetic field.

Surprisingly there’s a fair amount of conflicting information about this, but the basis for this is to try to adjust the timings sightly to keep the '1' bits as far apart as possible without causing them to fall into another position.

The tricky part to all this, is the amount of pre-compensation, and when to use it. Some documents talk about applying it from track 40. Some say after track 79. Some say both and by different amounts.

The Amiga supports precomp of 0, 140, 280 and 560ns. Yes, nanoseconds. That's an incredibly small amount. The 16Mhz Arduino base clock is 62.5ns, so a mere 2 basic (like add) instructions would add up to 125ns. I wondered if we could even be that accurate with the resolution available.

I realised that there was no way this could be calculated on the Arduino, so it would need to be pre-calculated on the PC prior to being sent. I decided on some code, that would read the raw MFM bit-stream and convert it as follows:

MFM PatternBinary Encoded As

So, with this pattern I could send 4 of these sequences for each byte. But what about precomp. I needed to say for each sequence if the sequence should be EARLY (ie: -125ns), LATE (+125ns) or normal. Well there goes another two bits.

So, each nibble (nybble!?) (half a byte) now contained information regarding each sequence. But could I send this quick enough? Well. The baud rate is 2M, and roughly 10bits accounting for start and stop bits etc) which means each byte would arrive approx 5uS apart. According to what we learnt, that's roughly 88-90 Arduino clock ticks. On a good day, lets say 80 instructions. Sounds like plenty.

I needed to know this was going to work and be accurate. My trusty oscilloscope wouldn't be able to keep up with this, and even if it could, for the short amount of data it didn't have any storage options.

I decided upon a logic analyser but they're so expensive. Well, until I found the cheap clone knock off 24MHZ ones for about £10. And it worked very well. I was able to capture the data I was writing and accurately check the timings. Ok, with Nyquist sampling theory I really needed something like 32Mhz, but this was probably good enough.

I hooked it up, wrote some code to read this, set the timer to zero, and waited in a loop until it reached the required value based on 64 + (32 * pattern) +/- precomp. At first inspection, something very weird was going on. The sequence was there, but the data was shifting around all over the place. The pulses seemed to be shifting by approx +/- 180ns all over the place seemly random. Odd.

Then I remembered the issue I'd had with reading, and disassembled the code. The loop looked something like:

while (TCNT2 < value) {}; 
Looks harmless enough. But then it would be pulling the value from TCNT (2 clock ticks), then comparing it with another value. Another clock tick. Then jumping back if it wasn't the value. Another clock tick. This loop cost 4 ticks. That's quartering our resolution. How was I suppose to write something to a quarter of this with this method? Answer is you can't.


Writing Data, Again (Precomp) - Failure 2

So I needed to find a faster way of monitoring the hardware counter, that was ticking at the speed of some of the instructions I was using. I knew that TCNT2 (Counter2/Timer2) could do a lot more than just count.

I started looking at the data sheet again for the ATMega328 chip. Its not for the feint of heart, but it does explain about every aspect of the micro-controller, from controlling pins, to the physical dimensions of the chips.

If you go to the section for the counters, there's a lot of other registers associated with them. Up-to now I hadn't really played with them, well, not since my very early experiments back when I started this journey.

There are registers that allow you to control the range it counts between (OCR2A & OCR2B), as well as bits in registers controlling how the timer operates. For example TCCR2A is used for controlling PWM output, or as I have it setup, no output.

Then there's its pair, TCCR2B, which up to now I have been using to set the counter speed, in this case to 'no pre-scaling' meaning run as fast as the oscillator is running (16Mhz).

You've also got TIMSK2 which is involved in enabling interrupts, and then finally TIFR2. Now that's an interesting register. Taking a closer look, there are three bits in it that you can read. OCF2B, OCF2A and TOV2. These 'bits' get set based on the rules setup in TCCR2A. For example, you can have these bits set when TCNT2 reaches the value of OCR2A, OCR2B or overflows back to zero.

I knew that merely reading from TCNT2 alone used 2 instruction cycles, but what about checking these bits. After disassembling the code I discovered they can be tested in a single instruction cycle. This meant the loop I used before could be replaced with something more like:

while (!(TIFR2 & bit(OCF2B)) {};
This loop would exit if I setup the above registers correctly when TCNT2 reached OCR2B. Fantastic. An interesting note. You have to write a '1' bit to these registers to clear them to zero. Don't ask. That's what the documentation says, and it's correct.

So I re-implemented the solution with this in mind. It was much better, but there was still lots of jitter. Part of the issue now, was to maintain writing to the disk I also needed to be reading the data from the serial port.

The code I was currently using to read the serial port and keep some buffered data looked like this:

if (UCSR0A & bit(RXC0)) {
	SERIAL_BUFFER[serialWritePos++] = UDR0;

The above piece of code, yet again, looks simple, but compiles down to around 14 instruction cycles! So when do I run this code? Once per 'bit'? Remember, we don't write any '0', we only write the '1's. So the minimum interval would be (theoretically) 64uS, max being 128uS. The data comes in around 1 byte every 80 uS. But we have to control this (this is what the CTS controls were for).

The micro-controller will buffer up-to two characters for you before data is lost. So it seemed that as long as I check once per cycle I should be ok. So this code was placed before the while loop.

Something weird happened. The PC finished sending all of the data and waited for the response. But the Arduino never sent one. The serial port, just has some interesting registers too. One of them will tell you if you haven't read the data quick enough. If you do

if (UCSR0A & (bit(FE0)|bit(DOR0))) break;
Then your loop will exit if the serial port has a Data Over Run, or if the has been a Frame Error.

So, turns out, I was getting overrun errors. I wasn't reading the data quick enough, so the PC had sent all the data, but the Arduino had lost/overwritten some of the bytes before I had a chance to read them.

I couldn't put the above code in the while loop because it could potentially add a further jitter as the loop could potentially take over 14 clock cycles before testing again. So I decided to try something like this:

while (TCNT2<OCR2B-16) {
	if (UCSR0A & bit(RXC0)) {
		SERIAL_BUFFER[serialWritePos++] = UDR0;
while (!(TIFR2 & bit(OCF2B)) {};
// Set the output to '0' at this point.  Wait a little, and then put it back to '1'.

I was switching the CTS line on and off only when there was or was not space in the buffer, and the theory with the above code, is allow up-to 16 clock cycles before the pulse to check for serial data, then stop and wait in the tighter loop and set the output. The output pulse, according to some of the drives needs to be something like 0.15 ~ 1.1uS. The, previous write method I was using they were 4us and it was fine, besides, its the falling edge (transition from 1 to 0) on the pulse that triggers the write, not the pulse being low its self!

Seemed like a good plan. But I was still getting issues, this time, namely my code was catching that I wasn't receiving enough data from the serial port quickly enough to keep my output buffer full. There's are 16 clock cycles sitting in the second loop not reading data from the serial port. If I reduce that number, then this issue went away, but the jitter came back.

At this point I almost gave up. I was so close, but just not good enough. I couldn't have any jitter, as if there was any then trying to use pre-compensation would be pointless anyway.


Writing Data, Again (Precomp) - Success

I needed a new approach. It was becoming obvious that me bit-banging out the data wasn't going to produce the level of accuracy I needed, well, not with this microcontroller. A faster one maybe.

I then thought back to my early experiments, and about the timers again. The could output waveforms, controlled at the hardware level usually for PWM. These would be a repeating pattern however. I wondered if there was a way I could use this information, a way I could vary these waveforms to produce what I needed getting it to do the accurate switching and me just programming it to do them.

What I really needed was a one-shot pulse. I.e: Start it running, after a specific time it would go low, stay there, and go back high again. But the Atmel microcontroller isn't designed to do this, or at least isn't suppose to be able to.

So I started researching it, I came across a trick which abused the AVR timer to create a one-shot pulse. Interesting. So I set about adapting this to what I needed. It essentially would set the timer up backwards so that it gets stuck after the pulse finishes. I doubt if this was ever intended, but the logic of the timers allows this to work!

So I set about changing the code to create the 01, 001, 0001 etc sequences using this one shot. I'd start it going at the first '0', and let it do the '1' pulse too. As the timer also works at 16Mhz, all I would need to do is offset where the pulse started to achieve precomp.

After some more analysis of the hardware registers for Timer2, I ended up setting them as follows:

TCCR2A = bit(COM2B1) | bit(WGM20) | bit(WGM21)| bit(WGM22);  
TCCR2B = bit(WGM22) | bit(CS20);

TCCR2A: COM2B0 and COM2B1 combined according to the data sheet mean CLEAR OC2B when TCNT2 = OCR2B, and SET OC2B when TCNT2 reaches the bottom (being TCNT2=0). WGM20, WGM21, WGM22 state that we want to use Fast PWM mode.

TCCR2B: WGM22 enables waveform generation on the output pin associated, which for us, luckily, is the same pin we have been writing with, and CS20 is the precale=1 value we have used before.

The way the timer works, is when it hits BOTTOM (ie: 0) its suppose to reset TCNT2 back to the value of OCR2A. But if we set this to zero as well, the timer gets stick at 0 unless we manually change it.

All I now needed to do was work out the value of OCR2B, which is where we want the pulse to trigger at, and the initial value of TCNT2 to give the right amount of delay for that pulse.

Now, because the hardware was generating the pulse, it gave me more time to calculate the values and monitor the serial port, leaving it just until the last minute to accurately time the reset for the next sequence.

I ended up with some code like the following:

while (!(TIFR2 & bit(OCF2B))) {
	if (UCSR0A & bit(RXC0)) {
		SERIAL_BUFFER[serialWritePos++] = UDR0;

while (!(TIFR2 & bit(TOV2)));

TCNT2 = counter;
OCR2B = pulseStart;

So, the first while loop would run until the pulse started, but we didn't actually need to generate the pulse, so we didn't care how accurate it was, and the second one accurately waits for the overflow condition (TCNT hitting BOTTOM).

After this, we immediately set TCNT2 (counter) to the delay for the new pulse, and write where the pulse should start from.

But there's a problem here. The second while loop also caused a little delay, plus writing to TCNT2 takes time. But this time, its all fixed, and as such we can compensate for it. I measured 6 clock cycles which accurately gave the correct timings.

Success! I confirmed in the logic analyser that the output was indeed what I needed, was timed very accuratly, and that the pulse start was shifting back and forth by the 2 ticks that I wanted.

If you want to see exactly how this is done, then the code is up on GitHub, like the rest of the project. I was very pleased. At the start of this I wasn't sure this was even possible. I suspect that I would have been told its not possible, but using the hardware inbuilt components rather than doing the work myself made it possible.



This is a shortened summary. The actual process was a lot more infuriating debugging data at the bit-level seeing what I had done wrong and understanding how WinUAE implemented disk access.

I had the idea in the back of my mind of supporting the drive natively in Windows. This would involve device drivers for floppy disk access, and then file system drives to support the Amiga file system. This would be fantastic, and a lot of work.

I then got side-tracked with a thought that had come to mind a few times, and after receving an email about it I decided to investigate further. Would it be possible to actually support this drive in WinUAE meaning you could use native Amiga disks in the emulator?

I had absolutely no idea, but couldn't see why not. I assumed, somewhere within the all that code it must be simulating a disk drive... somehow.... I assumed my approach would have to change some what, but as I'm sure you've seen, I'm not afraid of changing the code!

First step. Could I even compile the WinUAE source? Well Toni, who maintains it so excellently includes instructions on such things, and within minutes of downloading all the files it compiled and ran. I now just had to understand the code, and not surprisingly, there's a lot, and in a variety of styles, probably due to the nature of the project.

There's no reason that anyone probably looked into doing this before, and you could argue that there isn't any need. But I like a challenge. Toni had commented that he would have thought it would be difficult to synchronise the emulated Amiga timings with that of the a real spinning disk. I assumed he meant physically getting the disk MFM spin position to match where the emulator thought it was, but I had a plan.

Firstly, to support this the best I decided I needed to have more access to the floppy drive. Whilst it could work with the existing setup, it would be nice to be able to turn on and off the motor separate from other functions, and detect a disk being present in the drive. These are all available from the disk, I just hadn't needed them so they weren't connected up or right. There's a mod detailed for this in the hardware section. Its optional, each and highly recommended!

My plan was that if I could some how stream the data to the PC intime for WinUAE to read it, then all should be well. I'm not going to go into details here, but while that sounds simple enough the reality is far from simple.

Part of the problem is the data needs to be available to WinUAE when it wants it, and not when I have it ready. If I don't have the data and provide blank data you basically get read-errors. If however I force WinUAE to wait, the program will stutter and freeze (including the audio and mouse pointer) which is highly undesirable and unpleasant. This is not anyone's fault, this is just the pitfalls of what I'm trying to do.

Now, I can cache a full track read in memory, but I need to be sure it is an exact revolution without overlap or missing data, literally to the bit, which is no easy feat. But what do I get WinUAE to do while its caching it?

So first challenge. Read a complete track. Ah, that index pulse the drive generates, that'd gotta help. It does, but not to the level of accuracy I'd hoped. It oculd be my implementation, but I found its pulse jittered around a little. Not much, but just enough it wasn't reliable enough on its own.

So I settled for read the data until the pulse, carry on reading until the next pulse, and then try to match up the overlap. This worked surprisingly well, although wasn't anywhere near as trivial as I make it sound. This took several weeks to get working reliability making sure not to miss a single bit.

The downside is, the disk spins one revolution approx every 200ms. But I need to wait for the index marker first, which could be up to 200ms away if we're really unlucky. This means that at worse case, reading an entire track could take up to 400ms. We can't stop WinUAE for that long.

I wondered if it was possible, rather than to wait for the entire track, but to supply what we had so far, and then stall the process if necessary. Not good. It still stuttered too much to be acceptable.

I then had a thought. We're talking about physical hardware here, which doesn't work like its emulated counterpart. For example, when you switch the motor on to start the disk spinning, you can't read the data right away, you usually have to wait up to 750ms first. I know this, WinUAE knows this, and Kickstart knows this.

What about switching/stepping between tracks. Is there some dead time when the data from the drive isn't available or isn't considered stable? It turns out there is. Its usually up to about 18ms. Well, I still need to find up to another 382ms.

...or did I. I had a thought. What if, until the data was ready, I just kept feeding a '0' or a '1' back to WinUAE. There would be no pattern to lock onto, and no data, but because of the way WinUAE worked, it would still think a disk is present and spinning (it simulates the index pulse). Would it allow for a longer amount of dead time?

Turns out it does. I whacked this way way up to a maximum of 600ms before we potentially could stall WinUAE and everything was smooth. The data read, the mouse and music worked smoothly and a disk booted.

The data was being streamed from the Arduino in the background using a separate thread. I had setup a queue system whereby commands requesting changing the track number etc got queued, but we simulated a valid response back knowing the only important thing would be actually reading and writing to and from the disk.

Any command that was issued would immediately stop this streaming, but if left, would continuously read exact rotations from the disk, which WinUAE would switch in and out when it requested. My theory being that this may help support the 'weak bit' protection used in some games, but would also fix a potential read error that may have occurred.

So I had a disk reading successfully and, whilst not quite as fast as a real Amiga, it didn't seem to care. Even X-Copy didn't seem to care about this.

So now the challenge of writing occurred. Well, I now had the new Precomp writing code ready to go. So hopefully I could get this reliable too.

To cut this much shorter than it actually took, this was a absolute monster to get working. The main reason was for the way WinUAE handled the data it was trying to write. Data could get lost and I didn't know where the start of it was. So, instead I decided I would capture the data as its received from the amiga DMA write. It's what WinUAE was doing anyway, but I coudl receive it directly, and then, when finally the DMA request had completed I could just queue up a request to write it to disk.

This worked much better. The best test of this was to try and format a disk from within Workbench, after all it would also want to verify the results. Could any of my timing tricks work here too? Seemed so. I was now able to format a disk from Workbench, have it verified as it formatted it, and still play a MOD file solidly without any gaps in audio. Mission successful.

I wanted better support though, I have a few copy protected disks that still wouldn't boot. I suspected it was something to do with tricks of varying the bit-cell timings slightly to make some areas of the disk read quicker than others. A fairly common technique. I added code to the read streaming to tell if a bit was read normally, fast or slow, which I then simulated in WinUAE. Sadly this wasn't enough, so I guess that's as far as the Arduino board is going to go. I guess to those games, they still looked like pirate copies.

I had designed this interface with supporting other hardware too, after all, adding support for this hadn't been trivial and quite complex, not only the physical considerations, but actually figuring out what was going on in the emulator!

So the code was written with a set of common functions (abstract class) that needed to be implemented in order to support different hardware, which would make this far simpler in the future, and I still wanted to reach my goal of booting a copy protected game properly. I had two to try, the cheesy Captain Planet, and Lemmings, both of which came with the Cartoon Classics pack I got with the Amiga 500+, both that didn't work when you copied them, and both only booted so far and stopped on the copy in the same place they stopped when using them in WinUAE with my drive.


WinUAE and Copy Protected Disks

I don't like to give up. Happy that standard AmigaDOS disks would boot I turned my eyes to copy protection. I already knew a lot about what was going on, from short/long tracks, to weak/unformatted areas etc. My thought was that if I could read the data accurately enough then I didn't see why the emulator wouldn't play these games.

The problem was data. I had very limited bandwidth on the serial port from the Arduino. We're running at 2M baud, which (at 8 bits, 1 stop bit, and 1 start bit) means that we should be able to transmit one byte every 5uS. From the disk however, the shortest sequence being '01', could arrive at 4uS intervals. Sometimes even quicker depending on the speed of the drive we're using vs the one being written with.

So I first decided to look at the Lemmings disk I had. I knew it wouldn't boot if you copied it with XCopy, so knew something fishy was going on.

I added an extra command to the firmware for the Arduino which would give me (for a single revolution) a breakdown of timings for bit-cells found on a cylinder. There were obviously hot-spots around the 4uS, 6uS and 8uS intervals, as expected. But there were some tracks that had a lot more variation in them.

Now I knew that there were some tricks done with the data being recorded at slightly different bit-rates, the theory being that if you read the track and timed it, and compared it to another track you could measure the difference and detect the original vs the copy (as the copy would correct this slight timing inconsistency). We're talking very small difference here, if they were large the drive wouldn't read the data properly.

In the previous version I had tried to add a 'fast' and 'slow' option, but it didn't work. I needed something more sophisticated. Previously I had tried to state this speed per bit-cell, but did it really need to be that accurate?

I wondered what would happen if I averaged the timings over several patterns, and sent that to the pc, and used that to tell WinUAE to play the data slightly faster or slower. I knew it could do this, I had found some SCP recordings of these disks and they did boot, so knew it was possible.

The final format of this I ended up with was to send a byte back to the computer with the following information:

Bits of one Byte
The speed was simply calculated by adding together the timer value - the start of the bit cell timing. Eg: If the timing was at 4.2uS, we'd remove the 4.0. Obviously these are in ticks, not uS, but you get the idea. This would give (for each cell) a number between 0 and 31, 15/16 being the centre or "100% correct speed"

Once added together, this number would go between 0 and 63, so we shift it right three (divide by 8) which gives us a number between 0 and 7, which nicely fits there. But would this be of high enough resolution to solve the problem.

Well, once presented back to WinUAE in the correct way Lemmings booted! So whatever method that was using for bit-cell timings, we just bypassed it. Now that might not work for all games that use this technique, and if a higher resolution is needed later on we would have to average the timings over 4 cells instead.

Timings from Track 0, Upper Side, Lemmings Disk
Timings from Track 0, Upper Side, Lemmings Disk.
There's some suspicious bit-cell times there, but well within tolerance!

So, what about Captain Planet? No joy. Whatever method that was using, it wasn't that. So what was it doing? Again, I tried the analysis of the timings on the disk to see what was going on. Most of the tracks were pretty typical, but there were two, specifically track 0 lower side that was really strange.

The best way, is to show you what an SCP dump of this track looks like in HxCFloppyEmulator's Track Analyser. I only discovered this recently:

Visual Floppy Disk for Captain Planet Track 0, Lower Side

So, the green sections represent 11 sectors of data. It even detects them as AmigaDOS. Looks normal, data sits in the 4, 6 and 8uS as you'd expect. The white area is the 'gap' at the end that usually just fills the space to complete one revolution. Mostly this is ignored, and regardless, as I need to provide an accurate revolution of data to WinUAE this is fed in anyway. Copiers would usually just ignore this, but it looks strange, like there's some data there too.

Taking a closer look you can see this:

Visual Floppy Disk for Captain Planet Track 0, Lower Side - Track Gap

Well that's a little strange. So, ignoring the green areas which are data sectors, the right half of the image is just the 0xAA (01010101) sequence you often see, this is normal, but what's going on on the left hand side?

This must have something to do with it. My code already kept providing revolutions of data into WinUAE, this would allow it to bypass the 'weak bit' protection scheme as data mis-read would get re-presented differently each time. But what was going on here?

I decided to take a single revolution of data from the SCP file, and feed it into WinUAE, for just this track, and have it read the rest from the drive as before. The game booted! So clearly it didn't need multiple revolutions of data. So what was going on? If you look at the recording, this scatter of dots is different, so there is something going on.

There is clearly no way I can send that amount of craziness in 2M baud. I just don't have the bandwidth. But I do still had a few tricks.

This 'something else' case (the more than 3 0's in a row) that I was able to receive, which I was treating as an error maybe starting to mean something more useful.

I started reading up again about disks, and what they do if there's no flux data for a while. If you notice the timings, some of them are way up in 11-12uS. That's way out of range for MFM data.

So, drives have a circuit in them that automatically adjusts the amplification of head as its being read to compensate for disks recorded with different strengths of magnetic fields. After all, older disks still work, and their magnetic data will have started to decay.

Apparently, after about 3 zeros in a row it keeps trying to tune in, and eventually this amplification becomes so high that a bit might be detected that isn't even there. This is how the 'weak' bit stuff is suppose to work. When it actually detects a real bit, this gets massively overloaded and starts to revert back to its previous levels, and then the whole process starts again.

Looking at the data my code was reading, this special case was being tripped quite a lot, but how was I handling it? Incorrectly obviously. But I couldn't report sequences of 00001 or longer, I just don't have the bits. But what if I didn't have to. What if I only needed to know that we'd gone past the possibility of the 0001 sequence, but without detecting a '1', ie, confirming an '0000' sequence.

So I made some changes so when this was triggered it generated four zero bits, but the game still didn't boot. So I looked at the condition that triggered this. We waited in a loop, until we saw a bit, or the timer overflowed. The overflow being what may have triggered, or a bit way beyond the standards. This means that 00001 and 00000 would be detected as the same thing. But they're not.

I made this more strict. I changed the loop so it exited if the timer reached the 8uS boundary or hit a bit. Unless the bit was exactly on that boundary we would now know accurately we found a 0000, and any future '1' bits would be detected on the next loop. ie: we could detect 0000 and then 01, meaning we successfully detected 000001.

Did it work? no. I slept on this and came back to it in the morning.

A thought occurred to me. In between the checks for bits I perform other tasks, such as sending data to the PC or checking if the INDEX pin had signalled. What if a bit pulse arrived during this time, and the pulse was short enough that by the time we got back to checking for it, it had gone, and we missed it.

Is there a better way of watching for data? Well learning from how I now detect the index pulse, I started looking at the data sheet. The method I used there wouldn't work as the data wasn't coming in on a pin that could detect rising or falling edges. The ATMega328 does however allow you to setup an interrupt for pin state change for every pin.

I don't want an interrupt, but it must signal that one should be generated. Sure enough, if I do this:

PCMSK2 = bit(PCINT20);
PCICR = bit(PCIE2); 
This would normally setup an interrupt when the state of the pin (4) changes. You can only detect changes though, not edges. Not massively a problem. If we know we're not in a pulse, then the change would be the pulse, and as long as we wait until the pulse is finished before resetting then all should be ok.

Our main 'detect pulses' loop now looks like this:

while ((!(PCIFR & bit(PCIF2)))&&(!(TIFR2 & bit(OCF2B)))) {}
Basically, wait in the loop until the pin changes or the timer reaches the upper 8uS threshold.

Then at the end I add the following:

PCIFR |= bit(PCIF2);

Bingo! The game booted! You cannot believe how happy I was! Whilst the above seemed like only a few days of work, this was evening after evening, after work for about a month to get this to happen.

This copy protection used two methods, firstly, valid sectors which everything woudl read correctly, then hiding some data in the sector gap but hidden in-between unformatted areas. This data would not normally be copied.

Now, I'm not saying this will bypass every copy protection, but these are the only ones I have to try, and they both work. If anyone has any others that don't work with this, well, then I guess we'll have to see if I can get a copy and try it.

For now though, I'm done. It's been a lot of hair-pulling fun trying to achieve this impossible goal, and I'm so pleased that I didn't give up. These little Atmel devices can do so much more than you expect if you take time to study the data sheet. They kind of remind me of the Amiga in their own way. They have lots of special options like timers, counters watchdogs etc available at the hardware level that you just need to enable, leaving the CPU to do other things. So much of the code I see around the Internet makes me sad. I see PWM being done manually when the hardware can do it. I see people wanting to count pulses externally, and end up in loops, and sometimes interrupts, when neither of these are required. Anyone that's interested in embedded programming, I strongly urge you to read the data sheet and programming guide! There's so much more to these little devices.


Say NO to CH340

I decided to do some checking on the differences between the FTDI and CH340 USB to Serial converters. Short answer, they both werk perfectly for small amounts of data (like a few bytes), but for streams/large amounts of data, the CH340 can't cope and the data starts to get corrupted. So, don't use any CH340 converters for this project.

Also, at the same time I tested the USB to Serial device on the Arduino UNO. Same problem. So, for this project, you MUST use the FTDI converter.


Kickstart 1.3

So, having never owned an Amiga with anything before Kickstart 2.04 I never really had the opportunity to test this. This was until I tried writing a few ADFs from the Internet to disk so I could try them. The disks were some old Demoscene sequences, and very good too, but they weren't compatible with Kickstart 2+. I found one of my old disks that had a copy of Relokick and booted back to Kickstart 1.3

Weirdly, the disks I had written wouldn't load at all. It was like the OS couldn't detect anything. After going back to the code and comparing the data of a disk that did work, and one that I had written I discovered something very interesting.

Back when I was originally working on the reader, I read about it being a bug in Kickstart 1.3 that some disks were encoded with 2AAAAAAA44894489 (starting with a 2, not an A). After a more detailed look this isn't a bug after all. The '2' had to be placed there or the MFM sequence would end up with two '1's in a row which isn't allowed.

I know that the trackdisk device in Kickstart 2 and above had been re-written so I assume they just ignore the 2AAAAAAA part all together and just looked for the sync word (4489). Upon fixing this and re-writing the disk it all worked perfectly.


WinUAE - but faster?

I'd been very pleased with the disk access from WinUAE, and at this point it wasn't publically available. I had however been releasing copies out to a few people for further testing.

One such tester was really into the Demoscene and had commented that some of these demos did some crazy things with disk access.

One specific demo he told me about was called Spaceballs - State of the Art

Spaceballs - State of the Art

So I decided to download it, wrote it to a floppy disk, and tried it on my Amiga. Now this really is an interesting demo, from an engineering point of view. The moment the disk was inserted the demo started, like instantly, and while it was playing it was loading the next pieces in the background. The entire demo was timed so that the animations and effects lasted for long enough for the data to be read from the disk.

For fun, I tried it on WinUAE. Not so good. The disk just couldn't keep up. I wasn't suprised, lets face it, there is a lot going on to even make this work.

So inspired by this, I wondered if I could speed up the disk access without affecting the performance in WinUAE. I needed a baseline to measure.

My baseline, using the Captain Planet disk. I timed how long it took from the moment the WinUAE window appeared, to when the game music started playing, a mere 21 seconds:

Captain Planet on WinUAE, from Start to Music, 21.23 Seconds
Excuse the poor quality and wavy camera

Theres several factors affecting the slowness. The main ones are when the emulator switches to a track/disk side I start reading the data instantly. But it won't be available to the emulator for around 200ms absolute best case. The emulator then re-plays this data, taking a further 200ms. The problem is, the code needs to extract a full revolution of the disk so that WinUAE sees a complete track that perfectly loops.

I'd achieved this by noting where the INDEX pulse was on the drive, and then capturing all the data between that and when it occurs again. The index pulse isn't perfectly aligned, but it's close enough that we can do a rough compare to perfectly align the data.

So before we even start reading data to send to the emulator we have to wait for the first INDEX pulse. Worse case, thats 200ms. So for a single track, it could take upto 600ms before it's actually been processed by the Amiga OS, best case, around 400ms.

I wondered if there was a way I could do-away with this and capture a revolution without the index pulses. So I set about re-writing the code, as it was a little bit of a mess, and now having a grasp how it all works I could probably write it a little better.

My thought was, I knew the time it took for each bit-cell, so could I instead invent a virtual INDEX marker by timing approx 200ms of data and then patching up the seem.

Well, that didn't work very well. Turns out, and I should have realised that, not all drives spin at exactly 300rpm, so wouldn't be exactly 200ms.

So I decided to use the INDEX pulses to first time how long a rotation took, and then use that information to help better align a track. This worked, and after a little bit of tweaking I now had a slightly faster and improved loading. We'd gone from 21.23 seconds, down to 19.5 seconds, shaving off nearly 2 seconds!:

Captain Planet on WinUAE, from Start to Music, 19.5 Seconds
Excuse the poor quality and wavy camera

Not bad. After watching a few disks boot I had a fw other ideas. Firstly, whenever the emulator requests something of the drive, the code aborts reading whatever it is reading. Incompleted tracks are discarded. So, if at the time of the abort request the track was at 90% or more complete and the code had no record of this before, then the abort was ignored so the track read could complete. This technically might slow it down by roughly 20ms, but its such a small edge case that is bound to speed up reading this track later on.

After measuring this, I couldn't really see any difference. So what else could I do? I noticed that the games I had seemed to be well optomised. Typically these disks would see to a track, read it, then switch to the other head (upper or lower) and then read that.

I decided that after extracting one revolution and presenting it to the emulator, whilst the OS was reading it I'd switch sides and read that too. This is dead time so might as well use it for something. I think this is called read-ahead caching, or at least that's what i'm calling this. For disks that aren't optimised this wouldn't make any difference, but lets try our benchmark.

Booted up, and oh, 16.14 seconds! So thats nearly 5 seconds faster than the first attempt, and 3.5 seconds quicker than the previous attempt:

Captain Planet on WinUAE, from Start to Music, 16.14 Seconds
Excuse the poor quality and wavy camera

So with these changes, for this game, I sped up access by around 24%. Thats great!. Don't mis-understand, we're not reading the disk any faster, just reading it more efficiently. Now what about that original Demoscene demo I was trying. Well, it runs a lot better, still can't keep up, but much better. For reference, on a real machine, Captain Planet takes just over 12 seconds, so theres still a lot of improvement that could be made.... if I could just figure out how...

One last thing, I decided to call all of this technology (for fun) RIT™ (its like JIT - Just in Time) whch stands for Read in Time(ish)


So What's Next?

The entire project free and open source under GNU General Public Licence V3. If we want to have any hope of preserving the Amiga then we shouldn't be ripping each other off for the privilege, and besides, I want to give back to the best platform I ever worked on. I’m also hoping people will develop this and take it further and keep sharing.

I realised that this interface is also good for other formats. And whilst I don't directly support them with my Windows applications (you can extract a DD disk via the SCP option), the drive should be able to read any MFM disk format. It just needs the code writing to decode the track data.

I'll be keeping an eye on this, and I'm not sure if it's because of this project or just coincidence, but there's a fair few other projects doing the same, not specifically for the Amiga, and using different hardware, but either way it's good to see. I'm going to continue to investigate and improve the WinUAE support as time allows. If anyone has any insite on anything that won't load it'd be really appreciated!

If you enjoyed reading this and want to thank me, consider making a donation. It's all free so it's entirely optional