Firstly, the toolset.
The first thing we'll need is a set of development tools. Yeah, there are "off the peg" toolchains available, but I wanted to be at the bleeding edge. So, off to GNU's site, and let's get cracking.
I built and installed the latest versions of binutils (which includes the assembler and linker), gcc (with g++), newlib and gdb, all for target arm-none-eabi. If you want to know how to do this, googling "arm bare metal" should elucidate. Otherwise, there's always CodeSourcery.
Now, booting. Obviously, the first thing we need to do is boot the board. In my case, it's very uncomplicated: no first-stage bootloaders, no relocating stuff from flash, just a bunch of RAM that your binary gets loaded into, starting at address 0x00000000. Easy peasy.
So. How does ARM (specifically, the ARM1176JZF-S processor on the Raspberry Pi) boot? Well, there's chapter and verse on the ARM site, but here's the TL;DR version.
When the ARM powers on, it executes ARM (32-bit) instructions starting from address 0x00000000.
Simples, right? Well, not quite. Address 0x00000000 is the start of what's known as the exception vector table, which contains one 4-byte word for each of 8 vectors (7 exceptions plus one unused slot). One word is only enough for a single instruction: either a branch (which is PC-relative, not absolute) or an instruction that loads an address from memory into the program counter. So the simplest vector table would look like this:
b _reset @ Power on reset
b _undef @ Undefined instruction
b _swi @ Software interrupt
b _prefetch_abort @ Prefetch Abort
b _data_abort @ Data abort
b . @ Unused
b _irq @ IRQ
b _fiq @ "fast interrupt"
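To make that layout concrete, here's a small host-side C sketch (the enum and helper names are my own, not from any library). Vector n sits at the table base plus 4n, and each `b` stores a signed 24-bit word offset relative to its own address plus 8, because in ARM state the PC reads two instructions ahead. That's also why re-pointing a `b` vector means re-encoding the whole instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* The eight exception vectors, in table order (names are made up). */
enum vector {
    VEC_RESET, VEC_UNDEF, VEC_SWI, VEC_PREFETCH_ABORT,
    VEC_DATA_ABORT, VEC_UNUSED, VEC_IRQ, VEC_FIQ
};

/* One 32-bit instruction per entry, so vector v sits at base + 4*v. */
static uint32_t vector_address(uint32_t base, enum vector v) {
    return base + 4u * (uint32_t)v;
}

/* Encode an ARM-state "b target" placed at insn_addr.
   The offset is relative to insn_addr + 8 (the PC reads 8 bytes ahead)
   and is stored as a signed 24-bit word count, so the reach is about +/-32MB.
   Returns false if the target is misaligned or out of range. */
static bool encode_b(uint32_t insn_addr, uint32_t target, uint32_t *out) {
    int64_t offset = (int64_t)target - ((int64_t)insn_addr + 8);
    if (offset % 4 != 0)
        return false;
    if (offset < -(INT64_C(1) << 25) || offset > (INT64_C(1) << 25) - 4)
        return false;
    int32_t words = (int32_t)(offset / 4);
    *out = 0xEA000000u | ((uint32_t)words & 0x00FFFFFFu); /* cond=AL, opcode B */
    return true;
}
```

For example, a `b` in the IRQ slot (0x18) jumping to a handler at 0x8000 encodes as 0xEA001FF8, while `b .` encodes as 0xEAFFFFFE wherever it sits.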
And that simple table would work fine. However, it's not how it's normally done, mainly because changing a vector on the fly means re-encoding a branch instruction, and a branch can only reach targets within about ±32MB. So what we do is this:
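.section .reset, "ax"
ldr pc, _reset_address @ 0x00 Reset
ldr pc, _undef_address @ 0x04 Undefined instruction
ldr pc, _swi_address @ 0x08 Software interrupt
ldr pc, _prefetch_abort_address @ 0x0C Prefetch abort
ldr pc, _data_abort_address @ 0x10 Data abort
b . @ 0x14 Unused (reserved), keeps IRQ at 0x18
ldr pc, _irq_address @ 0x18 IRQ
@ 0x1C FIQ: the fast interrupt handler can start right here
b . @ Placeholder until there's a real FIQ handler
_reset_address: .word _reset
_undef_address: .word _undef
_swi_address: .word _swi
_prefetch_abort_address: .word _prefetch_abort
_data_abort_address: .word _data_abort
_irq_address: .word _irq
@ Weak defaults: these resolve to _no_handler unless defined elsewhere
.weak _undef, _prefetch_abort, _data_abort
.set _undef, _no_handler
.set _prefetch_abort, _no_handler
.set _data_abort, _no_handler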
That's loads bigger, but what does it change, exactly?
The "fast interrupt" (FIQ) code gets to skip an indirection, so it's faster. FIQ is the last entry in the vector table (0x1C), so its handler can simply start right there, with no branch or load needed. I'm not actually doing this at the moment, but it's possible.
The other exceptions load their handler addresses from an indirection table, so we can repatch them on the fly by overwriting a single word.
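Here's a rough host-side model in C of why the indirection makes repatching easy. It's a sketch only: `ram`, `install_vector`, `repatch_vector` and `resolve_vector` are hypothetical names, not anything on the Pi. A line like `ldr pc, _reset_address` is a PC-relative load, which the assembler encodes as `ldr pc, [pc, #offset]`, where the offset is the label's address minus (the instruction's address + 8).

```c
#include <stdint.h>

/* Hypothetical model of the first 16 words of RAM, starting at 0x00000000. */
static uint32_t ram[16];

/* Encode "ldr pc, [pc, #imm]" for an instruction at insn_addr that loads
   its target from literal_addr. In ARM state the PC reads 8 bytes ahead.
   Assumes the literal lies 0..4095 bytes beyond insn_addr + 8. */
static uint32_t encode_ldr_pc(uint32_t insn_addr, uint32_t literal_addr) {
    uint32_t imm = literal_addr - (insn_addr + 8u);
    return 0xE59FF000u | (imm & 0xFFFu); /* cond=AL, LDR, U=1, Rn=Rd=PC */
}

/* Write one vector: the fixed ldr instruction plus its address word. */
static void install_vector(uint32_t vector_addr, uint32_t literal_addr,
                           uint32_t handler) {
    ram[vector_addr / 4] = encode_ldr_pc(vector_addr, literal_addr);
    ram[literal_addr / 4] = handler;
}

/* Repatching on the fly: only the address word changes. */
static void repatch_vector(uint32_t literal_addr, uint32_t handler) {
    ram[literal_addr / 4] = handler;
}

/* Where the CPU would jump when taking the exception at vector_addr.
   Assumes the slot holds an "ldr pc, [pc, #+imm]" as encoded above. */
static uint32_t resolve_vector(uint32_t vector_addr) {
    uint32_t insn = ram[vector_addr / 4];
    uint32_t literal_addr = vector_addr + 8u + (insn & 0xFFFu);
    return ram[literal_addr / 4];
}
```

For instance, installing the reset vector at 0x00 with its address word at 0x20 produces 0xE59FF018 (`ldr pc, [pc, #24]`). Repatching afterwards only rewrites the word at 0x20; the instruction in the vector slot never changes.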
We have a "generic" handler for unhandled exceptions. The way that gets patched in is down to the linker. A .weak directive on a symbol lets us simply not define that symbol in our code, and the linker will resolve it to zero instead of barfing. Adding a .set directive gives the weak symbol a default other than zero. Thus, any of the _undef, _prefetch_abort or _data_abort entry points (in the code above) will redirect to _no_handler unless we define those entry points elsewhere. This is a trick we'll use again later. Note that _reset, _swi and _irq have no defaults, and thus must be defined elsewhere (for the moment, I've defined them to simply jump to _no_handler).
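If you later want the same default-handler trick from C, GCC offers `weak` and `alias` function attributes. A minimal sketch, assuming GCC or Clang on an ELF target (aliases aren't supported on macOS); the handler names are hypothetical C counterparts of _undef and friends, and `optional_hook` is an invented example of a weak symbol that nobody defines.

```c
/* Counts how many times the default handler ran (for demonstration). */
static int no_handler_calls;

/* The generic fallback, playing the role of _no_handler. */
void no_handler(void) {
    no_handler_calls++;
}

/* Weak aliases: these names resolve to no_handler unless another
   object file provides a strong definition with the same name. */
void undef_handler(void)          __attribute__((weak, alias("no_handler")));
void prefetch_abort_handler(void) __attribute__((weak, alias("no_handler")));
void data_abort_handler(void)     __attribute__((weak, alias("no_handler")));

/* A weak symbol with no definition anywhere: the linker resolves its
   address to zero rather than failing with an undefined reference. */
extern void optional_hook(void) __attribute__((weak));
```

Linking in another object file with a strong definition of, say, `data_abort_handler` replaces the alias, just as defining _data_abort elsewhere overrides the .set default.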
All we need to do is assemble that and link it to load at 0x00000000, and we have a booter. It will do bugger all, but it will work.