Previously on NASM...
Some time ago I posted a blog entry called "A Gentle Introduction to NASM: Why and How to Get Started", where I explained why I think NASM is so cool and covered some of the best parts of learning Assembly language. Since my laptop runs on an Intel i5 Coffee Lake, the ISA I use is x86. I explained this in more detail in that post. All this basically means that NASM is a really nice way to get started, and that it's the tool I'll use to showcase the basics of x86 Assembly.
I'm well aware that this architecture has been losing ground lately due to events such as the Rebirth of ARM (which might be a later post) and the even more recent arrival of RISC-V. In case you are interested in reading about this, I'll link this RISC-V post, where you can dive a bit deeper into this new ISA.
So if it's becoming outdated, why did I choose it to teach you? Mainly because it's powerful. It's relatively easy to grasp, and a high percentage of the processors currently on the market are still Intel, as you can see in this article as of 2024. It's also the one I'm running. It may seem to age by the day, but it's not outdated quite yet.
In this post I intend to code your very first Assembly program, explaining bit by bit what each and every part means, as well as assembling it both with the tool I provided in my repo and by hand, in case you distrust me so badly you'd rather do it yourself. (If that's the case, are you perchance working in Cybersecurity? If not, consider it as a possible career path.)
The anatomy of (mostly) any NASM program
Well, in short, any NASM program is composed of three and a half main parts. Why three and a half, you ask? Keep reading and find out, I'm not going to spoil the fun for you. Basically, three of those parts are what you need for a program to run. The other one is not mandatory, but given how fast Assembly can escalate in complexity, it's highly recommended. That's the only hint I'm giving you.
Now, calling the pieces of a NASM program "parts" is neither accurate nor professional. The technical name is sections, and that is how I'm going to refer to them.
1. Section .data: The Shelf
First up, we have the .data section. You can think of it as your program's personal vault. This is where all your valuables (aka data) are stored. You'll find your string messages, numeric constants, and all the other little tidbits that need to hang out in memory during your program's run. Here you can declare any initialized variable you want, such as strings (character arrays), integers, floating-point numbers, etc. Here's a small example of what I mean:
section .data
message db 'Hello, world!', 0
age db 25
my_word dw 49865 ; Define a word (word, dword and qword are reserved in NASM, so they can't be labels)
my_dword dd 159385 ; Define a double word
my_qword dq 29347589798473 ; Define a quad word
So here's the deal: you first declare the name of the variable, followed by db, which means define byte (or one of its bigger siblings dw, dd and dq), and then the content of said variable. Keep in mind that NASM has bytes, words, double words and quad words. Bytes are, well, bytes: 8 bits. Words are 16 bits or 2 bytes, double words contain 32 bits or 4 bytes, and quad words contain 64 bits or 8 bytes, which matches the register size of x86_64. Still, for now we will mostly use bytes and words, so don't sweat it too much.
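Also, remember that NASM does not terminate strings for you. Whenever you declare a string, the bytes you write are the bytes you get, so if you want a NULL terminator or a new line (EOL, End of Line) character at the end, you have to add it yourself. The null character has a value of 0, or 0x0 in hexadecimal, and the new line char has a value of 10, or 0x0A in hex. The NULL terminator marks where the string ends for code that expects one (C functions, for example), while the write syscall we'll use later doesn't look for it at all: it prints exactly as many bytes as you tell it to.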
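Since the length is what really tells write where the string ends, it can be handy to let the assembler count the bytes instead of doing it by hand. Below is a minimal sketch of that idea, assuming a 32-bit Linux target. The file name hello_len.asm and the msg_len label are made up for the example, and the build commands in the comments are the usual way of assembling and linking by hand rather than anything specific to my repo.

```nasm
; hello_len.asm (hypothetical file name)
; Assemble and link by hand, assuming 32-bit Linux output:
;   nasm -f elf32 hello_len.asm -o hello_len.o
;   ld -m elf_i386 hello_len.o -o hello_len

section .data
    msg     db 'Hello, world!', 0xA   ; the text plus a new line character
    msg_len equ $ - msg               ; $ is the current address, so this is
                                      ; the number of bytes defined since msg

section .text
    global _start

_start:
    mov eax, 4          ; syscall number: write
    mov ebx, 1          ; file descriptor: stdout
    mov ecx, msg        ; address of the first byte to write
    mov edx, msg_len    ; length computed by the assembler
    int 0x80            ; ask the kernel to run the syscall

    mov eax, 1          ; syscall number: exit
    xor ebx, ebx        ; exit code 0
    int 0x80
```

If you edit the message later, msg_len follows along automatically, so there is no magic number to keep in sync.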
2. Section .bss: The Dynamo
Next, we have the .bss section, which is short for "Block Started by Symbol," but let's call it the "Better Start Stocking" section. This is where we declare variables that don't have any initial value. Think of it as the uninitialized counterpart of .data: the amount of space is still fixed when the program is assembled, you just don't say what goes in it. Here we only indicate how many bytes we want to reserve for a particular variable so we can dump a value into it later on. For now this will not be used, but it's always nice to know you can do this. Below is an example of how you would use this section.
section .bss
counter1 resb 1 ; Reserve 1 byte
counter2 resw 1 ; Reserve 1 word
counter3 resd 1 ; Reserve 1 double word
counter4 resq 1 ; Reserve 1 quad word
As you can see, the counter variables have no value assigned by us (on Linux this memory is handed to the program zero-filled), but they do have space set aside for future use. Also, take note of how NASM keeps the size suffixes consistent.
These size suffixes are those final letters (b, w, d, q) that indicate the size of whatever you're dealing with. In this case, res stands for reserve, the last letter tells the assembler how big each unit is, and the number after it says how many of those units should be set aside.
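To make "dump a value on it later" a bit more concrete, here is a small sketch of a program that reserves a buffer in .bss, reads a line from the keyboard into it, and prints it back. It assumes 32-bit Linux, where read is syscall number 3 and standard input is file descriptor 0. The buffer name and its 64-byte size are arbitrary choices for the example, and it uses two instructions (test and jle) that this post hasn't covered, so treat those lines as a preview.

```nasm
section .bss
    buffer resb 64          ; reserve 64 bytes, no initial value given

section .text
    global _start

_start:
    ; Read up to 64 bytes from the keyboard into buffer
    mov eax, 3              ; syscall number: read
    mov ebx, 0              ; file descriptor: stdin
    mov ecx, buffer         ; address where the bytes will be stored
    mov edx, 64             ; maximum number of bytes to read
    int 0x80                ; on return, eax = bytes read (or a negative error)

    ; If nothing was read (or something failed), skip the write
    test eax, eax           ; set the flags according to eax
    jle .done               ; jump if eax <= 0

    ; Write back exactly the bytes we received
    mov edx, eax            ; length = number of bytes read
    mov eax, 4              ; syscall number: write
    mov ebx, 1              ; file descriptor: stdout
    mov ecx, buffer         ; address of the bytes to write
    int 0x80

.done:
    mov eax, 1              ; syscall number: exit
    xor ebx, ebx            ; exit code 0
    int 0x80
```

The buffer takes no space in the executable file itself. The file only records that 64 bytes are needed, and the memory shows up when the program is loaded.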
3. Section .text: The Spellbook
Finally, we arrive at the .text section. This is where the magic happens, hence the name "Spellbook". This is where the actual instructions of the program are written. Here, you tell the CPU exactly what you want it to do. Painstakingly, step by step. Byte by byte.
Granted, there’s a huge variety of instructions. As we mentioned in the last NASM post, x86 is a CISC ISA, so there’s a lot to keep track of. I’m providing a handy x86 cheat sheet so you can quickly glance at some of the instructions you might encounter. Still, I’ll do my best to explain each one I use and demonstrate here.
section .text
global _start
_start:
mov eax, 0x4 ; syscall number: write
mov ebx, 1 ; file descriptor: stdout
mov ecx, message ; address of the message to write
mov edx, 14 ; message length: 13 characters plus the terminator byte
int 0x80 ; raise interrupt 0x80 to perform the system call
mov eax, 0x1 ; syscall number: exit
xor ebx, ebx ; exit code 0 (xor of a register with itself gives 0)
int 0x80 ; raise interrupt 0x80 to perform the system call
Even if this is a short program, there's a lot to explain, so pay attention. Once you understand the underlying mechanics, you'll start seeing these programs as successive blocks of code with a particular goal instead of endless lines of haphazardly laid out code.
So, the global _start is there to tell the assembler to make the _start label accessible from outside this file. This is important because it is the entry point of your program, the first instruction that will run when your program is executed. Without this, the linker wouldn't be able to find the label, and nobody would know where your program begins.
Then you can see two separate blocks. The first one prints the message on the standard output and the second exits the program. How? Well, just as the comments say. On 32-bit Linux, you ask the operating system for a service by raising a software interrupt. In this case, whenever you trigger interrupt 0x80 while the value of the register eax is 4 (0x4), the function executed is a syscall write. From there, it's the OS that does the rest of the hard work, looking for values in preset registers. Values that, of course, have to be set by you.
In order for you to understand this better, I'm breaking it down briefly:
- eax: Stores the code of the syscall. Different codes will execute different syscalls.
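- ebx: In this case, it tells the write syscall where to put the text (file descriptor 1, STDOUT).
- ecx: Contains the memory address of the message, not the message itself.
- edx: Indicates the length of the char array in bytes.
- int 0x80: Raises software interrupt 0x80, which hands control to the kernel so it can run the syscall selected in eax. On 64-bit Linux the syscall instruction plays the same role, but it uses different syscall numbers and different registers, so it is not a drop-in replacement.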
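For comparison, this is roughly what the same program looks like when written for 64-bit Linux with the syscall instruction. Take it as a sketch under the assumption of an x86-64 Linux target: there, write is syscall number 1 and exit is number 60, and the arguments travel in rdi, rsi and rdx instead of ebx, ecx and edx. The file name hello64.asm is just a placeholder.

```nasm
; hello64.asm (hypothetical file name)
; Assemble and link by hand, assuming 64-bit Linux output:
;   nasm -f elf64 hello64.asm -o hello64.o
;   ld hello64.o -o hello64

section .data
    msg db 'Hello, world!', 0xA   ; 13 characters plus a new line = 14 bytes

section .text
    global _start

_start:
    mov rax, 1          ; syscall number: write (x86-64 numbering)
    mov rdi, 1          ; first argument: file descriptor, stdout
    mov rsi, msg        ; second argument: address of the message
    mov rdx, 14         ; third argument: length in bytes
    syscall             ; hand control to the kernel

    mov rax, 60         ; syscall number: exit (x86-64 numbering)
    xor rdi, rdi        ; first argument: exit code 0
    syscall
```

The structure is identical, two blocks with one syscall each. Only the numbers and the register names change, which is exactly why mixing int 0x80 values with syscall (or the other way around) gives confusing results.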
If you understood the previous block, the second one should be easy to grasp. The gimmick is exactly the same, but instead of executing a syscall write, we execute the exit of the program.
Just in case, I'm going to break it down too:
- eax: Contains, again, the code for the syscall to be executed. In this case, 0x1 is for the exit syscall.
- ebx: Stores the exit code. It's 0 because, by convention, 0 means the program terminated successfully.
- int 0x80: Same as before, it executes the syscall whose number is stored in the eax register.
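If you want to see the exit code with your own eyes, here is a tiny sketch that returns the age variable from the .data example instead of 0. It assumes 32-bit Linux and a shell such as bash, where echo $? prints the exit code of the last program. The movzx instruction is new here: it copies a single byte into a 32-bit register and fills the remaining bits with zeros.

```nasm
section .data
    age db 25                   ; one byte holding the value 25

section .text
    global _start

_start:
    mov eax, 1                  ; syscall number: exit
    movzx ebx, byte [age]       ; load the byte stored at age, zero-extended
    int 0x80                    ; exit with code 25
```

After running it, echo $? should print 25. Note the square brackets: age alone is the address of the variable, while [age] is the value stored there.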
As you can see it's relatively simple. A bit weirder than regular programming (okay I admit, a lot weirder) but once you grasp the basics, decoding what the hell all this means is not that hard, at least for now. Once your programs start to have 500+ lines of code, it can become a tiny bit of a nightmare but for now it's all rainbows and sunshine.
3.5 Section ; Comments: The Footnotes
Writing code is just part of actually developing programs or coding in general. As you might have foreseen, this kind of programming escalates really quickly in complexity. So for the love of God, your sanity, and the sake of clean code, document your code. I really can't stress this enough. Just imagine what you will have to go through if you code a relatively big file in Assembly and then park it for a couple of weeks. If you come back after that, may the Lord have mercy on your soul because you will end up spending more time trying to figure out what all that messy spaghetti does rather than actually doing some productive work.
At least if you have it well documented, you can spend the first 20-30 minutes reading the comments and getting back the idea of how everything works together. Also, please separate your code into blocks. If you are capable of making each block have a relatively distinct functionality instead of meshing everything together, then when you try to debug, you can pinpoint the problem more easily, and it also helps you understand the code better.
It's true that there's absolutely nothing stopping you from ignoring any good advice and coding like a madman for lines on end. Just know that the next time you open that file, you'll be playing a game where you are the killer, the victim, and the detective. Just saying.
The complete code
Alright, that's enough yapping. Here's the complete code so you can copy, paste and run it. It should work. If it doesn't, great! Time to practice whatever you learned here. Anyway:
section .data
msg db 'Hello, world!', 0xA ; Message to write to console
section .text
global _start
_start:
; Prepare parameters for write system call
mov eax, 0x4 ; System call number for write
mov ebx, 1 ; File descriptor for console output
mov ecx, msg ; Pointer to message to write
mov edx, 14 ; Length of message
; Invoke write system call
int 0x80 ; Interrupt to trigger system call
; Terminate program
mov eax, 0x1 ; System call number for exit
xor ebx, ebx ; Return value
int 0x80 ; Interrupt to trigger system call
That's enough
It's been a short post, hasn't it? Well, I thought it'd be better to make a short one that is more hands-on. Still, it's got a lot of information, and if it's your first time battling this prehistoric language, I'm guessing you've got enough.
I know, I know, if you read my last NASM post I said I'd teach you something more complex than just a "Hello World!" but what do you expect? Most people here don't even know the basics, I thought it'd be more suitable starting step by step. Regardless, I'm going to showcase more complex programs. I really want to make a good guide on how Assembly works and how you can use it well. Your time will come my nerdy Low-Level Padawan.
That's it for now. Isn't it enough? Perfect, stay tuned for more. See you next time. Keep coding!