I used to play a ton of Roller Coaster Tycoon when I was a kid. I loved the
game but I was never very good at making the roller coasters. They always felt
too spread out, or too unnatural looking. As a ten year old I idly wondered
about writing a computer program that was really good at playing the game. What
sort of parks would it make? How would a computer approach the freedoms
inherent in an empty park? What would we learn about the game engine from doing
Sadly, this one wasn't generated by
In case you're not familiar, Roller Coaster Tycoon is a amusement park
simulation game most notable because the entire game was written in x86
assembler by Chris Sawyer.
Finally a few months ago, I had the tools and the free time available to work
on this, and I made some progress toward writing a program that would generate
cool looking roller coasters. Let's examine the parts of this program in turn.
Interacting with the Game
So let's say you have a program that can generate roller coasters. How do you
actually put them in the game, or integrate them into your parks?
Fortunately, Roller Coaster Tycoon has a format for saving track layouts
to disk. Even more amazingly, this format has been documented.
4B: departure control flags
4C number of trains
4D number of cars per train
4E: minimum wait time in seconds
4F: maximum wait time in seconds
Once you decode the ride data, it follows a format. Byte 0 stores the ride
type - 00000010 is a suspended steel roller coaster, for example. Some bytes
indicate the presence of flags - the 17th bit tells you whether the coaster can
have a vertical loop. And so on, and so on.
To compress space, RCT used a cheap run-length encoding algorithm that
would compress duplicates of the same byte. But once you encoded/decoded the
file, it was super easy to get it running in the game.
So, great! I could write my coaster generator in any language I wanted, write
out the file to disk, then load it from any of the parks.
Getting Track Data
There are a lot of track pieces in the game, and I needed to get a lot of
data about each of them to be able to make assertions about generated roller
coasters. As an example, if the track is currently banked left, which pieces
are even possible to construct next?
A steep upward slope track piece increases the car height 4 units. A sharp
left turn would advance the car 3 squares forward and 3 squares left, and also
rotate the car's direction by 90 degrees. I had less than zero interest in
doing this all by hand, and would probably make a mistake doing it. So I went
looking for the source of truth in the game..
Literally the same week that I started looking at this, Ted John
started decompiling the Roller Coaster Tycoon 2 source from x86 into
C, and posting the results on Github. More importantly (for me), Ted
and the source code actually showed how to read and decompile the source of the
The repository shipped with an EXE that would load the C sources before the
x86 game code. From there, the C code could (and did, often) use
assembler calls to jump back into the game source, for parts that hadn't been
This also introduced me to the tools you use to decompile x86 into C. We used
the reverse engineering tool IDA Pro to read the raw assembly, with a
shared database that had information on subroutines that had been decompiled.
Using IDA is probably as close as I will come to a profession in code-breaking
and/or reverse engineering.
Most of the time with IDA involved reading, annotating the code, and then
double checking your results against other parts of the code, the same way
you might annotate a crossword puzzle. Other times I used guess and check -
change a value in the code, then re-run the game and see what specifically had
changed, or use debugging statements to see what went on.
So I started looking for the track data in the game. This turned out to
be really, really difficult. You would have a hunch, or use the limited
search capability in the game to search for something you thought should be
there. Ultimately, I figured out where the strings "Too high!" and "Too
low!" were being called in the engine, figuring that track height
data would have been computed or used near those points.
This was only part of the solution - it turns out that track data is not
stored in one big map but in several maps all around the code base. Some
places store information about track bank, some store information about
heights and it's tricky to compile it all together. Ultimately, I was
able to figure it out by spending enough time with the code and testing
different addresses to see if the values there lined up with the
pre-determined track order.
With a genetic algorithm you are going to be generating a lot of roller
coasters. I wanted a quick way to see whether those roller coasters were
getting better or not by plotting them. So I used Go's image package
to draw roller coasters. To start I didn't try for an isometric view, although
that would be fun to draw. Instead I just plotted height change in one image
and x/y changes in another image. Running this against existing roller
coasters also revealed some flaws in my track data.
A fitness function
A good fitness function will have penalties/rewards for various pieces of
Is the ride complete?
Does the ride intersect itself at any points?
Does the ride respect gravity, e.g. will a car make it all the way around the
How exciting is the ride, per the in-game excitement meter?
How nauseating is the ride, per the in-game excitement meter?
The first two points on that list are easy; the last three are much more
difficult. Finding the excitement data was very tricky. I eventually found it
by getting the excitement for a "static" ride with no moving parts (the Crooked
House) and searching for the actual numbers used in the game. Here's the
function that computes excitement, nausea and intensity for a Crooked House
sub_65C4D4 proc near
or dword ptr [edi+1D0h], 2
or dword ptr [edi+1D0h], 8
mov byte ptr [edi+198h], 5
mov ebx, 0D7h ; ''
mov ecx, 3Eh ; '>'
mov ebp, 22h ; '"'
mov [edi+140h], bx
mov [edi+142h], cx
mov [edi+144h], bp
xor ecx, ecx
mov dl, 7
shl dl, 5
and byte ptr [edi+114h], 1Fh
or [edi+114h], dl
Got that? In this case
0xD7 in hex is 215 in decimal, which is the ride's
excitement rating. I got lucky that this value is static and not changed by
anything, which meant I could search for it from outside the binary. This is
then stored in the ride's location in memory (register
edi), at the offset
0x140. In between there are a few subroutine calls, which shows that nothing
is ever really easy when you are reading x86, as well as calls to functions
that I have nothing besides hunches about.
Anyway, when you turn this into C, you get something like this:
void crooked_house_excitement(rct_ride *ride)
// Set lifecycle bits
ride->lifecycle_flags |= RIDE_LIFECYCLE_TESTED;
ride->lifecycle_flags |= RIDE_LIFECYCLE_NO_RAW_STATS;
ride->var_198 = 5;
ride_rating excitement = RIDE_RATING(2,15);
ride_rating intensity = RIDE_RATING(0,62);
ride_rating nausea = RIDE_RATING(0,34);
excitement = apply_intensity_penalty(excitement, intensity);
rating_tuple tup = per_ride_rating_adjustments(ride, excitement, intensity, nausea);
ride->excitement = tup.excitement;
ride->intensity = tup.intensity;
ride->nausea = tup.nausea;
ride->upkeep_cost = compute_upkeep(ride);
// Upkeep flag? or a dirtiness flag
ride->var_14D |= 2;
// clear all bits except lowest 5
ride->var_114 &= 0x1f;
// set 6th,7th,8th bits
ride->var_114 |= 0xE0;
And we're lucky in this case that the function is relatively contained; many
places in the code feature jumps and constructs that make following the code
So this one wasn't too bad, but I got bogged down trying to compute excitement
for a ride that had a track. The function gets orders of magnitude more complex
than this. One positive is, as far as I can tell, excitement and nausea ratings
are wholly functions of overall ride statistics like the vertical and lateral
G-forces, and there's no accumulator per track segment.
Most of the computation involves multiplying a ride statistic by a
constant, then bit shifting the value so it can't be too
high/influence the final number by too much.
And sadly, this is where the project stalled. It was impossible to test the
C code, because the track computation functions were buried four subroutines
deep, and each of those subroutines had at least 500 lines of code. Decompiling
each of these correctly, just to get to the code I wanted, was going to be
a massive pain. There are ways around this, but ultimately I got back from
vacation and had to focus on more pressing issues.
You can hack Roller Coaster Tycoon! There are a bunch of people doing
interesting stuff with the game, including improving the peep UI, working
on cross compilation (you can play it on Macs!), adding intrigues like the
possibility of a worker strike, removing limitations based on the number of
bytes (you can only have 255 rides, for example), and more.
It's been really fun having an utterly useless side project. I learned a lot
about registers, calling conventions, bit shifting tricks, and other things
that probably won't be useful at all, for anything.
I will definitely revisit this project at some point, hopefully when more of
the game has been decompiled, or I might try to dig back into the x86/C more on