Snippets of original source code and what they can tell us
Assembling large machine code programs on memory-starved 8-bit home computers can be a tricky process. Assembly language source code is always considerably larger than the machine code that it produces, so if you're trying to build a machine code binary that fills your computer to the brim, assembling the whole thing in-place is not an option (at least, not on the original hardware).
The most popular approach is to split the source code up into smaller batches, and then assemble each batch to produce a set of smaller binaries that you can concatenate into the final game binary. On the BBC Micro, this is fairly easy to do with the assembler that comes built into BBC BASIC, with each part assembling its code, saving it to a file, and then loading the next BASIC program to assemble the next part.
A side-effect of this approach is that unless you clear down the computer's memory between program loads - something you are extremely unlikely to do, as this process relies on variables retaining their values between parts - then you will be left with fragments of the previous part's program and its assembled code in memory. If the source code defines a variable's block of memory by simply incrementing the program counter in P%, rather than using an explicit sequence of EQU commands to zero the block, then the block will contain whatever was already in memory, and whatever was already there will then be saved into the finished game binary.
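To make the mechanism concrete, here is a minimal Python sketch of the idea. It is a toy model rather than a real assembler: the `assemble_part` function, the `("bytes", ...)` and `("skip", ...)` item kinds, and the memory layout are all invented for illustration. It simply shows that a block reserved by advancing the program counter keeps whatever was already in memory, and that this residue ends up in the saved binary.

```python
def assemble_part(memory, start, items):
    """Toy model of one assembly part writing into shared memory.

    items is a list of ("bytes", data) entries, which store data in the
    way an EQUB block would, and ("skip", n) entries, which only advance
    the program counter (the equivalent of P% = P% + n).
    """
    pc = start  # plays the role of P%
    for kind, value in items:
        if kind == "bytes":
            memory[pc:pc + len(value)] = value
            pc += len(value)
        elif kind == "skip":
            pc += value  # nothing is written, so the old contents survive
        else:
            raise ValueError(f"unknown item kind: {kind}")
    return pc


memory = bytearray(32)

# Pretend an earlier part left some of its source text behind in memory
leftover = b":JMP sut5"
memory[4:4 + len(leftover)] = leftover

# Three bytes of code, a 12-byte variable block reserved by skipping,
# and then one more byte of code
end = assemble_part(memory, 0, [
    ("bytes", bytes([0xA9, 0x00, 0x60])),
    ("skip", 12),
    ("bytes", bytes([0x60])),
])

saved = bytes(memory[:end])  # what would be saved as the finished binary
print(saved)
```

If the skipped block were instead filled with twelve zero bytes via a `("bytes", ...)` item, the leftover text would be overwritten and nothing would leak into the saved file.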
As a result, it is pretty common to find bits of original source code buried in game binaries, particularly with large games. Aviator is no exception, so let's take a look at the secrets that are buried in the released game, and what they can tell us about the original source code.
Analysing the source for clues
------------------------------
Here is a list of all the parts of the game binary that don't contain game code or data, but instead contain "background noise" from the assembly process. Sometimes the data is just a jumbled mess that doesn't contain any clues, and sometimes it's a snippet of source code that is identifiable as assembly language, but which doesn't tell us anything. And then sometimes you hit the jackpot, and you find a chunk of original source code that's really interesting.
This table is therefore a list of all the potential sites in our hunt for clues to the original source code. You can click on the links below to see the clues in situ; they will either be buried in EQUB blocks, or marked with "these bytes appear to be unused", but the table shows what they contain in text form, which is what we analyse next.
Location in source code | Location in game binary | Contents |
---|---|---|
End of previousTime | 2 bytes at offset &1D9E to &1D9F | (spaces) |
distanceFromHit to fuelLevel | 7 bytes at offset &258C | HI=&44A (must be a fragment of "YAHI=&44A0", which sets the address of the yObjectHi variable) |
scoreLo to highScoreHi | 4 bytes at offset &259C | I=&4 (must be a fragment of "ZAHI=&44C8", which sets the address of the zObjectHi variable) |
End of maxLineDistance to scoreText | 570 bytes at offset &2AC6 to &2CFF | Large block of source code - see clue 1 below |
End of scoreText | 12 bytes at offset &2CF4 to &2CFF | :JMP sut5 |
End of lineEndPointId | 7 bytes at offset &2DC1 to &2DC7 | tru2:C |
End of CheckTimePassed | 10 bytes at offset &2E26 to &2E2F | =ABCwxDEFP |
End of lineStartPointId | 9 bytes at offset &2EF1 to &2EF9 | (jumbled mess) |
alienObjectId to matrix4Lo | 60 bytes at offset &3108 to &3143 | edba`_^\[ZYX}|{zxwvutrqp19 LDY#31 .??? LDX P... |
End of CheckFlyingSkills to matrix4Hi | 38 bytes at offset &315E to &3183 | (jumbled mess) |
End of scaleFactor | 3 bytes at offset &31FD to &31FF | #18 |
xObjectLo for unused objects 35-39 | 6 bytes at offset &3323 to &3327 | #31:BC |
yObjectLo for unused objects 35-39 | 6 bytes at offset &334B to &3351 | #4:JMP |
zObjectLo for unused objects 35-39 | 6 bytes at offset &3373 to &3379 | (spaces) |
xObjectHi for unused objects 35-39 | 6 bytes at offset &339B to &33A1 | A SIZE |
yObjectHi for unused objects 35-39 | 6 bytes at offset &33C3 to &33C9 | Tokenised BASIC line number |
zObjectHi for unused objects 35-39 | 5 bytes at offset &33EB to &33F0 | A:LSR |
End of RemoveScore to xTempPoint2Hi | 623 bytes at offset &3791 to &39FF | Large block of source code with embedded machine code - see clues 2 and 3 below |
End of forceFactor | 5 bytes at offset &3E8D to &3E91 | (jumbled mess) |
End of xJoyCoord to altitudeMinutes | 18 bytes at offset &3EEE to &3EFF | LDY#1:L |
End of ClearRows | 1 byte at offset &46F8 | @ |
End of ClearRows | 92 bytes at offset &4C9A to &4CF7 | &00s and &FFs |
Most of the above clues aren't that eye-opening: finding a snippet of code like ":JMP sut5" simply tells us that there must have been a label in the original source code called "sut5", but that's about it. Other clues, like "HI=&44A" and "I=&4", don't make much sense on their own, but they reveal their secrets when combined with other clues (in this case, the variable name XAHI from clue 3 below).
But some of them are real clues, so let's take a quick look at how we find them in the first place, before revealing the juiciest secrets below.
What clues look like in the game binary
---------------------------------------
One of the easiest ways of tracking down clues in a game binary is to load the binary into a hex editor. Hex editors show the contents of the file both as hexadecimal bytes and as ASCII characters, so if there's a block of original source code hidden in there, it should be fairly obvious. In the table above, the offsets are given from the start of the game binary, so if you load the AVIA.bin file into a hex editor and jump to the relevant offset, you should be able to see the snippets for yourself (you can grab the file from the accompanying repository if you want to try this).
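If you would rather automate the search than scroll through a hex editor, the following Python sketch does roughly the same job by listing runs of printable ASCII along with their offsets. The function name `find_text_runs` and the minimum run length of four are my own choices, and the sample bytes are made up for the demonstration. Note that machine code can contain printable bytes by coincidence, so expect some false positives.

```python
def find_text_runs(data, min_len=4):
    """Return (offset, text) pairs for each run of printable ASCII bytes."""
    runs = []
    start = None
    for i, b in enumerate(data):
        if 0x20 <= b <= 0x7E:          # printable ASCII, including space
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, data[start:i].decode("ascii")))
            start = None
    # Catch a run that extends to the very end of the data
    if start is not None and len(data) - start >= min_len:
        runs.append((start, data[start:].decode("ascii")))
    return runs


# A made-up sample: three non-text bytes, a source snippet, then two more bytes
sample = bytes([0xA9, 0x00, 0x8D]) + b":JMP sut5" + bytes([0x0D, 0x00])

for offset, text in find_text_runs(sample):
    print(f"&{offset:04X}  {text}")

# To scan the real file instead, assuming AVIA.bin is in the current folder:
#   with open("AVIA.bin", "rb") as f:
#       for offset, text in find_text_runs(f.read()):
#           print(f"&{offset:04X}  {text}")
```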
In the case of hidden code from the BBC BASIC assembler sources, the embedded assembly language is generally quite readable, though the surrounding BASIC is tokenised and line numbers are stored as integers rather than ASCII text, so the source code appears as assembly language, embedded in random noise. Luckily it's easy enough to copy the source snippets into a modern text editor and strip out the line numbers, and it wouldn't be that hard to convert them to the original line numbers, given a bit of patience (I've left the line numbers out of the examples below, for clarity).
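As a rough illustration of stripping out the line numbers, here is a Python sketch that walks a fragment of a stored BBC BASIC program. It assumes the usual Acorn layout for each stored line, namely a &0D byte, the line number as a high byte and a low byte, and a length byte that counts the whole line including those four header bytes; treat that layout as an assumption to check against your own hex dump. The function name `split_basic_lines`, the helper `make_line` and the line numbers in the sample are invented for the example.

```python
def split_basic_lines(fragment):
    """Split a stored BBC BASIC fragment into (line_number, text) pairs.

    Assumes each line is stored as: &0D, number high byte, number low
    byte, length byte (covering the whole line), then the line's text.
    """
    lines = []
    i = fragment.find(b"\r")            # first line header in the fragment
    while i != -1 and i + 3 < len(fragment):
        hi, lo, length = fragment[i + 1], fragment[i + 2], fragment[i + 3]
        if hi == 0xFF or length < 4:    # end-of-program marker or corrupt header
            break
        lines.append((hi * 256 + lo, fragment[i + 4:i + length]))
        i += length
        if i >= len(fragment) or fragment[i] != 0x0D:
            break                       # ran off the end, or hit corrupted data
    return lines


def make_line(number, text):
    """Build one stored line in the assumed layout (for the demo only)."""
    return bytes([0x0D, number >> 8, number & 0xFF, len(text) + 4]) + text


# Hypothetical line numbers; the text is taken from the snippets in this article
sample = make_line(1230, b".UBUL LDY#15:STY OB") + make_line(1240, b" JSR MOBJ")

for number, text in split_basic_lines(sample):
    print(number, text.decode("ascii"))
```

Because the fragments in the game binary are often truncated or partly overwritten, the loop stops as soon as the headers stop making sense rather than trying to resynchronise.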
Finding machine code that's hidden in the binary is rather more difficult, as it won't be obvious from the hex editor. The only way to track this type down is to disassemble the whole binary, and work out which parts are used and which are noise.
The Aviator binary contains two big chunks of original BBC BASIC assembler source code, plus an unused chunk of machine code. Interestingly, one batch of assembler source code contains test code and differs from the final version, so our glimpse of the original source is even more intriguing than you'd think.
Let's take a deeper look at the more interesting secrets buried in the Aviator game binary.
Clue 1
------
This big block of BBC BASIC assembler code is hidden in the workspace between the maxLineDistance and scoreText variables. It contains 570 bytes of source code that looks like this once the BASIC line number bytes are stripped out (the ellipses indicate either corrupted source or places where variables such as maxObjDistance have overwritten the source with their own content):

    .dlp2 STA&76
    .dl ... :B ... :BNE dlp2
     DEC&74:BNE dlp1
     rts
    .UBUL LDY#15:STY OB:LDA#98:STA PP
    .ubu2 TYA:CLC:ADC#&D8:TAX
     JSR MOBJ
     JSR UOBJ
     LDY OB:LDA OSTAT,Y:BMI ubu1
     LDA#0:STA FRFLAG
    .ubu1 DEC PP:DEC OB
     LDY OB:CPY#12:BCS ubu2
     rts
    .SUTR JMP TEST:BMI sut1:BEQ sut1
     LDA&0CC5:BNE sut1
     LDA&FE64: ... ... #15:CMP#14:BCS sut1
     ... A#16
    .sut3 DEC THEME:LDX#7
    .sut5 CPX THEME:BEQ sut4
    .sut2 CMP FLDPTR,X:B

Think about it: this is part of Geoff Crammond's original source code! He literally wrote this - it's in his own, personal style, with his own indented layout, spaces between the mnemonics and variable names (but no spaces between mnemonics and numbers), and his own label names, with routine names in four-letter capitals, and in-routine labels in lower case with three letters and a number.
This is great! This is what software archaeology is all about... and it's really interesting to compare this snippet of Aviator source code with the comparatively unreadable Elite source code, which doesn't bother with things like spaces or indents or consistent labelling (see my Elite source code project to see for yourself). The difference is really illuminating; the Aviator source code is a lot neater and easier to follow, no doubt about it.
By looking at the code and comparing it to the disassembly on this site, we can see that this snippet of source code contains the end of the Delay routine, then the whole UpdateBullets routine and the start of the SpawnAlien routine. When I disassembled Aviator from the game binary, I had to invent my own labels as they don't get saved as part of the machine code binary, but you can see from the above that Geoff Crammond called the update bullets routine UBUL, and the alien spawning routine SUTR.
There are a few more interesting points. For a start, the version of UpdateBullets/UBUL in the final game is a bit longer than this version, and the instructions do not match up. Then there's a pretty explicit jump to a test routine ("JMP TEST"), which replaces the LDX themeStatus instruction at the start of the SpawnAlien/SUTR routine. But we can at least work out that the original source's name for the alienObjectId variable was FLDPTR, while AddPointToObject was known as either MOBJ or UOBJ.
It's a pretty good snippet for clues.
Clue 2
------
Interestingly, in the original game binary, the lineBufferV variable contains what looks like random noise, but it actually disassembles into code that is never called and is totally ignored, and which doesn't appear anywhere else in the main game code. Specifically, it contains slightly different versions of the DrawCanopyCorners and RemoveScore routines, so perhaps this is a glimpse into early code that didn't make it into the final game?
Here's the disassembly (the labels are mine and match the final code for these routines):

    .DrawCanopyCorners
     LDX #7
     LDA #%01110111
     STA P
     LDA #%10001000
     STA Q
     LDA #%11101110
     STA R
     LDA #%00010001
     STA S
    .corn1
     LDY #1
    .corn2
     LDA row1_block0_0,X
     AND P
     ORA Q
     STA row1_block0_0,X
     LDA row1_block38_0,X
     AND R
     ORA S
     STA row1_block38_0,X
     DEX
     DEY
     BPL corn2
     LSR R
     LSR S
     LSR P
     LSR Q
     CPX #&FF
     BNE corn1
     RTS
    .RemoveScore
     LDY #HI(row3_block0_0)
     LDX #LO(row3_block0_0)
     LDA #8
     STA R
     LDA #0
     JSR FillCanopyRows
     RTS

This version of DrawCanopyCorners draws the canopy corners in a different place on-screen - right at the very edges of the screen, rather than indented by one block - and RemoveScore clears from row 3, block 0 rather than row 3, block 1. These routines therefore don't take the block of rivets round the edges of the canopy into consideration, so one wonders whether they pre-date the addition of the rivets, but somehow didn't get removed from the source code? We can but speculate...
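To see what the loop in this unused DrawCanopyCorners does with its four masks, here is a small Python model of it. This is just my transliteration of the disassembly above, with a hypothetical function name `corner_masks`. It records the values of P, Q, R and S that are applied to each of the eight pixel rows, where each screen byte is replaced by (byte AND P) OR Q on the left and (byte AND R) OR S on the right. It makes no claims about how the resulting bytes look on-screen.

```python
def corner_masks():
    """Return (row, P, Q, R, S) for each pixel row, in the order processed."""
    p, q, r, s = 0b01110111, 0b10001000, 0b11101110, 0b00010001
    x = 7                       # LDX #7
    rows = []
    while True:
        for _ in range(2):      # .corn2 runs twice (Y = 1, then Y = 0)
            rows.append((x, p, q, r, s))
            x -= 1              # DEX
        r >>= 1                 # LSR R
        s >>= 1                 # LSR S
        p >>= 1                 # LSR P
        q >>= 1                 # LSR Q
        if x < 0:               # X has wrapped to &FF, so BNE corn1 falls through
            break
    return rows


for row, p, q, r, s in corner_masks():
    print(f"row {row}: P={p:08b} Q={q:08b} R={r:08b} S={s:08b}")
```

The masks are shared by each pair of rows and shift right by one bit between pairs, so the pattern changes every two pixel rows as the loop works from row 7 down to row 0.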
Clue 3
------
The machine code in clue 2 is actually embedded in a larger 623-byte block of BBC BASIC assembler source - it's as if the clues themselves contain clues! This bigger block is between RemoveScore and xTempPoint2Hi, and it looks like this:

     LDA&86:CLC:A
     ... machine code (see clue 2) ...
    .STIP LDX#2
    .sti1 LDA XALO,Y:CLC:ADC DTIP:STA&77,X
     LDA XAHI,Y:ADC#5:STA&7A,X
    .sti4 TYA:CLC:ADC#40:TAY
    .sti2 DEX:BPL sti3:rts
    .sti3 BEQ sti1
     LDA XALO,Y:STA&77,X
     LDA XAHI,Y:STA&7A,X:JMP sti4
    .HITS LDX#2
    .hit2 TYA:CLC:ADC#40:TAY
     LDA XALO,Y:SEC:SBC&77,X:STA&74
     LDA XAHI,Y:SBC&7A,X:BNE hit1
     LDA&74:CMP&80,X:BCS hit1
     DEX:BPL hit2:LDA OB:STA EPTR
     TSX:INX:INX:TXS
     LDA#27:STA EPLO:rts
    .hit1 rts
    .ADIF LDA#0:STA&70:STA&72

Again, this contains part of the original source code, specifically the routines that I call GetAlienWeakSpot (called STIP in the original) and CheckAlienWeakSpot (called HITS in the original), plus the start of GetTrailVectorStep (called ADIF in the original).
We can also see that the original names for the variables that I call xObjectHi and xObjectLo were XAHI and XALO, so it isn't much of a leap to match yObjectHi to YAHI, zObjectLo to ZALO, and so on. Meanwhile, hitTimer is EPLO in the original, hitObjectId is EPTR, and the temporary variable that I call Q is named DTIP in the original.
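The name matching can be sanity-checked with a little arithmetic. The sketch below takes the two addresses reconstructed in the table ("YAHI=&44A0" and "ZAHI=&44C8") and compares their spacing with the ADC#40 step that STIP and HITS use to move from one coordinate table to the next. The XAHI value it prints is my own extrapolation from that spacing, not something that appears in the binary, and the helper name `beeb_hex` is invented for the example.

```python
def beeb_hex(value):
    """Format a number in BBC-style hexadecimal, e.g. &44A0."""
    return f"&{value:04X}"


YAHI = 0x44A0   # reconstructed from the fragment "HI=&44A"
ZAHI = 0x44C8   # reconstructed from the fragment "I=&4"

stride = ZAHI - YAHI            # spacing between the y and z tables
print("stride:", stride)        # compare with the ADC#40 in STIP and HITS

# Assumption: the x table sits one stride below the y table
XAHI_guess = YAHI - stride
print("XAHI would be", beeb_hex(XAHI_guess))
```

The spacing matches the 40-byte step in the code, which supports the idea that XAHI, YAHI and ZAHI are consecutive 40-entry tables addressed through the same index in Y.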
This time, the code is 100% identical to the final code in the released game, so we can say with certainty that this is a genuine fragment of the original source code for the release version of Aviator, as hand-written by Geoff Crammond on his very own BBC Micro.
This is what makes all the effort worthwhile - digging into the game binary to unearth artifacts like this. It's a rare glimpse into the normally hidden world of this legendary BBC Micro programmer, a glimpse of the very keystrokes that formed part of his first real masterpiece. What a privilege it is to see... and all hidden in plain sight.