Introduction
Merry Christmas, my fellow nerds! Here's my gift to you. I hope you enjoy it.
If you recall, in the last post, "Your first NASM program", we learned a few basic concepts about Assembly, like interrupts, their codes, and the role some registers play when executing an interrupt. Well, interrupts are very common in this kind of programming. They basically tell the CPU to stop everything, shout to the operating system for help, and wait for a task to be done. Want to print text to the screen? You’re going to need an interrupt. Need to read from a file? Time for another system interrupt.
The thing is, these interrupts aren’t the only way to modify the execution flow. You have two more tools in your toolbelt: functions and procedures. These are essentially two fancy names for the same concept. Both are ways of altering the regular flow of execution, and both can take in parameters and perform operations that you code. The main difference is that functions return a value, typically stored in the eax register, whereas procedures do not. So, yeah, these are like interrupts you code manually. More or less. Okay, not exactly, but the end result is somewhat similar. Kind of.
My intention for this post is to teach you the basics of functions and procedures. If you understand these, you'll have an easier time with interrupts. As I said, they are not technically the same: an interrupt halts your code and hands control to the OS, whereas functions are still your code, so no privileged execution there. But the big picture is basically the same: store parameters, execute the call, and return.
Also, I will be assuming you know what registers are and what they are used for. In case you don't, not a problem, just check out this guide about registers. It's not the greatest thing in the world, but at least it'll get you started. Without any further ado, let's do further.
The anatomy of a function
In assembly, functions and procedures help keep your code organized, reusable, and more importantly, debuggable. Because as much fun as it is to single-step through 700 lines of spaghetti code, having your operations neatly tucked into functions is like programming bliss. And as someone who's been at war with his own code, trust me.
Both functions and procedures share a very similar anatomy, so I'll be covering only functions, since those are the ones that return a value. The rest is exactly the same.
The entry point
Every function or procedure needs a starting line. In Assembly, this is marked by a label, a friendly little marker that the CPU can jump to when it’s time to execute the function. It’s like saying, "Hey, start here when you’re ready!"
my_neat_little_function:
And voilà. Just like that you have declared the entry point for a function. Remember that labels in assembly end with a colon (:). Now, once the CPU is sent to this label (by a good old-fashioned call), it knows exactly where to go.
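To see a label and a call working together, here is a minimal sketch. It assumes 32-bit Linux and a build along the lines of `nasm -f elf32 demo.asm && ld -m elf_i386 -o demo demo.o`. The name set_exit_code is made up for the example, and so is the idea of using the exit status as the visible result.

```nasm
section .text
global _start

_start:
    call set_exit_code      ; push the return address, then jump to the label
    ; execution resumes here once the procedure hits ret
    mov eax, 1              ; sys_exit
    int 0x80                ; exit status is whatever is in ebx

set_exit_code:              ; the label is the entry point (hypothetical name)
    mov ebx, 7              ; the "work": choose an exit status
    ret                     ; pop the return address and jump back to it
```

After running it, `echo $?` should print 7, which shows that the CPU went to the label, did the work, and came back to the instruction right after the call.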
(Golden) Retrieving the parameters
Most of the time, when you define a function, you expect to pass in some data and get something back. That’s the deal. But this is assembly so there’s no fancy syntax where you just say, “Hey, pass this integer called number to the function and let the magic happen.” Nope. You have to use the stack. And let’s be honest, this is exactly what scares people away from assembly. You’ve got to keep track of how many bytes you’ve shifted the stack by, retrieve each parameter manually, do your thing, and then clean up the stack afterward.
It’s tricky and requires a decent understanding of what memory addresses are and how they work. For now, just remember this: push the values you want onto the stack BEFORE calling the function. Don’t worry if this sounds confusing, we’ll dig deeper into it later in the post.
Once your values are safely pushed onto the stack, you can call the function. The first thing you need to do after entering the function is to retrieve those values in the correct order. That means you’ll pull them off the stack in reverse order to how you pushed them. So, the last parameter you push is the first one you retrieve after the function is called.
Here's an example of what I mean.
_start:
mov eax, 4321 ; Define a number
push eax ; Save the number to the stack
call int_to_str ; Call the function
.
.
.
int_to_str:
; Now the state of the stack is as follows:
; ,_________________,
; |_________________|
; |_______ret_______| <-- esp points here
; |_______eax_______|
; |_________________|
mov eax, [esp + 4]
This code might surprise you for a couple of reasons. Firstly, what the hell is ret? Simply put, it's the return address: the memory address of the instruction to execute once the function returns. The call instruction pushes it onto the stack automatically, and that's how the CPU knows what the next step is after the function/procedure is done. And secondly, why is it esp + 4 if we are moving down the stack? Well, because the memory addresses of the stack grow "negatively". This means that the "higher" on the stack, the lower the memory address becomes, so anything pushed before the return address lives at a higher address than esp.
FYI, esp literally means Extended Stack Pointer, and it should always point to the top of the stack. It can be modified, but be careful.
Another thing that might catch your attention is why 4 and not any other value? Well, as everything in here, it has a reason. A good one by the way, but it will take a while to explain, so we'll focus on that later.
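In the meantime, here is a small sketch of the "last pushed, first retrieved" rule with two parameters. The function name subtract_demo and the values are invented for the illustration, and it assumes a 32-bit Linux build (`nasm -f elf32`, `ld -m elf_i386`), with the result shown through the exit status.

```nasm
section .text
global _start

_start:
    push 30                 ; pushed first: ends up deeper in the stack
    push 12                 ; pushed last: sits right next to the return address
    call subtract_demo
    add esp, 8              ; caller discards both 4-byte parameters
    mov ebx, eax            ; use the result as the exit status
    mov eax, 1              ; sys_exit
    int 0x80

subtract_demo:              ; hypothetical example function
    ; [esp]     = return address (pushed by call)
    ; [esp + 4] = 12 (last value pushed)
    ; [esp + 8] = 30 (first value pushed)
    mov eax, [esp + 8]      ; eax = 30
    sub eax, [esp + 4]      ; eax = 30 - 12 = 18
    ret
```

Running it and checking `echo $?` should give 18. Swap the two offsets and you would get 12 - 30 instead, which is why keeping track of the push order matters.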
Doing the Work: The Function’s Job
After retrieving the parameters, you need to do something with them, right? Well, that's precisely the next step: perform whatever work the function exists for. What follows is just an example; the actual body depends on what you want your function to do.
; Initialize buffer pointers
mov edi, buffer_input + 11 ; Point to the end of the buffer
mov byte [edi], 0 ; Null terminator
; Handle zero explicitly
test eax, eax
jnz itoa_loop
mov byte [edi - 1], '0' ; Store '0' just before the null terminator instead of overwriting it
jmp itoa_loop_done
.
.
.
Don't worry, I'll give you the full code of the function, how it's used and all that so you can copy it, tinker and break it down all on your own.
The exit plan
If your function is the kind that gives back (i.e. a function rather than a procedure), it’s going to return a value. In assembly, this value is most often passed back through the eax register (rax in 64-bit code), although nothing is set in stone. However, if you are going to distribute your code, make sure to follow a convention, or document it well if you don't.
Keep in mind that pushing and popping data to and from the stack can be done inside or outside a function. You could do it in the main code before calling the function, like we saw earlier, or right after the function's label. But honestly, it’s much better to handle it inside the function. Why? Because it saves users from having to gather all the "ingredients" beforehand. And let’s be real here, users will mess up anything they can, and even things they can’t! So, make life easier for everyone and wrap it all up in the function. Also, remember not to do it twice. I don't know why, but I felt compelled to explain this.
Here's the exit part of my "itoa" function:
itoa_loop_done:
; You can clear the stack values here.
; Before the ret you should make sure the values
; you want to return are accessible and stored
; in the registers you want or have documented.
ret
And here is an example of a function that adds two numbers and exits the way it should be done:
add_numbers:
; Function to add two numbers
; Arguments are passed on the stack
; Return value is in eax
push ebp ; Save the base pointer
mov ebp, esp ; Set the base pointer to the current stack pointer
mov eax, [ebp+8] ; Get the first argument
mov edx, [ebp+12] ; Get the second argument (edx is a scratch register; ebx is usually preserved for the caller)
add eax, edx ; Add the two numbers
pop ebp ; Restore the base pointer
ret
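If you're wondering where the 8 and the 12 come from and how you would call this, here is a self-contained sketch. It repeats a version of add_numbers so it can be assembled on its own. The argument values are arbitrary, and it assumes a 32-bit Linux build (`nasm -f elf32`, `ld -m elf_i386`), with the sum reported through the exit status.

```nasm
section .text
global _start

_start:
    push 5                  ; second argument (pushed first)
    push 3                  ; first argument (pushed last)
    call add_numbers
    add esp, 8              ; caller removes the two 4-byte arguments
    mov ebx, eax            ; exit status = returned sum
    mov eax, 1              ; sys_exit
    int 0x80

add_numbers:
    push ebp                ; save the caller's base pointer
    mov ebp, esp            ; ebp now marks a fixed reference point
    ; [ebp]      = saved ebp
    ; [ebp + 4]  = return address
    ; [ebp + 8]  = first argument (3)
    ; [ebp + 12] = second argument (5)
    mov eax, [ebp + 8]
    add eax, [ebp + 12]     ; eax = 3 + 5
    pop ebp                 ; restore the caller's base pointer
    ret
```

`echo $?` should report 8. The offsets start at 8 rather than 4 because the push ebp at the top of the function adds one more 4-byte entry between esp and the arguments.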
POP vs MOV <reg>, [esp + n]
In short, pop automagically modifies the esp (stack pointer) register, meaning the top of the stack changes every time you execute it. mov, on the other hand, doesn't, which means you can retrieve values from the stack while leaving the pointer to the top untouched.
In other words, when you use pop, you're basically saying, "Hey, give me whatever’s on the top of the stack and put it in this register." It also has a side job: after grabbing the value, it automatically moves the stack pointer (esp) to the next spot, like cleaning up after itself. So it's super nice and simple!
Now, with mov eax, [esp+n], things get a little more manual. Here, you're telling the processor, "I know exactly where I want to get my value from, specifically esp + n." You're not messing with the stack pointer at all. It’s like reaching over a couple of pizza slices to grab the third one instead of just taking the one on top. And the pizza box? It stays right there. The stack pointer doesn’t budge, so no automatic cleanup happens.
The big difference? pop is hands-off and moves the stack pointer for you, making it a quick and tidy operation. Meanwhile, mov eax, [esp+n] is more DIY. You get exactly what you want from the stack, but the stack pointer doesn't move, so you’re responsible for keeping things organized.
When to use each? Obviously, it depends on whether you want to alter the top of the stack or not. Have you not been paying attention?
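If you'd rather see the difference than take my word for it, this little sketch measures how far esp moves. The values pushed are arbitrary, and it assumes a 32-bit Linux build (`nasm -f elf32`, `ld -m elf_i386`); the distance is reported through the exit status.

```nasm
section .text
global _start

_start:
    push 5
    push 9                  ; top of the stack is now 9
    mov eax, [esp]          ; eax = 9, esp does not move
    mov ecx, esp            ; remember where the top is right now
    pop edx                 ; edx = 9, and esp moves up by 4
    mov ebx, esp
    sub ebx, ecx            ; ebx = how far pop moved esp (4)
    add esp, 4              ; discard the remaining 5 by hand
    mov eax, 1              ; sys_exit, status in ebx
    int 0x80
```

`echo $?` should print 4: both instructions read the same value, but only pop changed the stack pointer.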
Full code
I'm giving you the complete code of the itoa function. What it does is it takes an Integer and transforms it into a String (Int to ASCII = itoa). If you want to leave it here, feel free to do so. The next section is me explaining briefly the stack and why the values go in increments of 4.
section .bss
buffer_input resb 12 ; Reserve 12 bytes for the string buffer (sufficient for a 32-bit integer + null terminator)
section .data
ten dd 10
newline db 10
section .text
global _start
_start:
mov eax, 4321 ; Define a number
push eax ; Save the number to the stack
call int_to_str ; Call the function
add esp, 4 ; Remove eax from stack (no pop)
; Print the number 4321
mov eax, 4 ; Syscall write
mov ebx, 1 ; Stdout
mov ecx, buffer_input ; Msg to print in ecx
mov edx, 12 ; Len of buffer
int 0x80 ; Execute interruption
; Print a newline char
mov eax, 4
mov ebx, 1
mov ecx, newline
mov edx, 1
int 0x80
; Exit program
mov eax, 1
xor ebx, ebx
int 0x80
; Function definition. Receives a number as a parameter and
; transforms it into text suitable for printing.
int_to_str:
; Now the state of the stack is as follows:
; ,_________________,
; |_________________|
; |_______ret_______| <-- esp points here
; |_______eax_______|
; |_________________|
mov eax, [esp + 4]
; Initialize buffer pointers
mov edi, buffer_input + 11 ; Point to the end of the buffer
mov byte [edi], 0 ; Null terminator
; Handle zero explicitly
test eax, eax
jnz itoa_loop
mov byte [edi - 1], '0' ; Store '0' just before the null terminator instead of overwriting it
jmp itoa_loop_done
itoa_loop:
; Convert each digit to a character
xor edx, edx ; Clear edx for division
mov esi, 10
div esi ; Divide eax by 10
add dl, '0' ; Convert remainder to ASCII
dec edi ; Step back one byte first, so the null terminator stays intact
mov [edi], dl ; Store ASCII character
test eax, eax ; Check if eax is zero
jnz itoa_loop ; Continue if not zero
itoa_loop_done:
; You can clear the stack values here, although
; in my opinion that should be done outside the
; function.
; Before the ret you should make sure the values
; you want to return are accessible and stored
; in the registers you want or have documented.
ret
ret
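The closing comment about making return values accessible can be taken one step further. Below is a hedged variant, not the function above: the name int_to_str_ptr, the buffer name buf, and the decision to return a pointer to the first digit in eax are all my assumptions for the sake of illustration. It assumes the same 32-bit Linux setup (`nasm -f elf32`, `ld -m elf_i386`).

```nasm
section .bss
    buf resb 12             ; up to 10 digits + null terminator, with a spare byte

section .text
global _start

_start:
    push 4321
    call int_to_str_ptr     ; returns a pointer to the first digit in eax
    add esp, 4              ; caller removes the parameter
    mov ecx, eax            ; sys_write: start of the text
    mov edx, buf + 11
    sub edx, ecx            ; sys_write: length = terminator address - start
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    int 0x80
    mov eax, 1              ; sys_exit
    xor ebx, ebx
    int 0x80

int_to_str_ptr:             ; hypothetical variant; clobbers ecx, edx and edi
    mov eax, [esp + 4]      ; the number to convert
    mov edi, buf + 11
    mov byte [edi], 0       ; null terminator at the very end
    mov ecx, 10
.next_digit:
    xor edx, edx            ; clear edx before dividing edx:eax
    div ecx                 ; eax = quotient, edx = remainder
    add dl, '0'             ; remainder to ASCII
    dec edi                 ; step back, then store the digit
    mov [edi], dl
    test eax, eax
    jnz .next_digit         ; loop runs at least once, so 0 becomes "0"
    mov eax, edi            ; return value: address of the first digit
    ret
```

This prints 4321 (without a trailing newline). Because the caller gets the start address back, it can write only the digits instead of the whole 12-byte buffer.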
The triple S (Spooky Scary Stack)
Alright, let’s dive into the world of memory addresses! Picture the stack as a gigantic filing cabinet. Each drawer in this cabinet has a number (that’s the memory address), and inside each drawer you store some data, specifically a byte (8 bits). Now, in computers, these "drawers" are in super tight formation: each one is just a tiny chunk of space. But here’s the thing: depending on the architecture (like 32-bit or 64-bit), the stack handles drawers in groups, so each stack entry holds either 4 bytes or 8 bytes of data. Wow, these numbers have already appeared. Perhaps there's a method to the madness.
But why does the stack grow in increments of 4 or 8? Well, it’s all about alignment. Computers love to access memory in fixed chunks. In a 32-bit system (where registers are 4 bytes), the stack moves in 4-byte jumps, so every time you "pop" something from the stack, it fits entirely in a single register. In a 64-bit system, where registers are 8 bytes, the stack jumps by 8. It’s like stacking books that are all the same size: it keeps everything nice and neat. If you tried to stack a bunch of different-sized books, it would get messy real quick, and your CPU would waste time organizing things. That’s why we stick to increments of 4 or 8.
Now you might be thinking, "Well, a memory address holds just one byte, yet the stack moves in increments of 4 or 8, what's stopping me from making increments of 3 or 18 if I want?" The answer is nothing, really. This is Assembly, boy! You can do what you want, but the CPU's own stack instructions (push, pop, call and ret) will always store information according to this logic, so if you make your own increments you'll get corrupt or garbage data.
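Here's a small sketch of what that garbage looks like when you read at an offset of 3 instead of 4. The pushed values are arbitrary, and it assumes 32-bit Linux on a little-endian x86 CPU (`nasm -f elf32`, `ld -m elf_i386`); the exit status tells you whether the misaligned read produced the value I predict in the comments.

```nasm
section .text
global _start

_start:
    push 1                  ; stored as bytes 01 00 00 00
    push 2                  ; stored as bytes 02 00 00 00
    ; memory from esp upward: 02 00 00 00 | 01 00 00 00
    mov eax, [esp + 4]      ; lands exactly on the second entry: eax = 1
    mov ecx, [esp + 3]      ; straddles both entries: bytes 00 01 00 00 = 256
    add esp, 8              ; discard both entries
    xor ebx, ebx
    cmp ecx, 256
    sete bl                 ; ebx = 1 if the odd offset gave 256 instead of 1
    mov eax, 1              ; sys_exit, status in ebx
    int 0x80
```

`echo $?` should print 1. Nothing crashed, and nobody stopped you, but the value you read is a mix of two different stack entries.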
Below you have a diagram explaining the difference graphically. Basically each Memory Address holds a single byte, and since a register can hold many bytes, the step is as big as the register itself, containing several Memory Addresses.
So, to recap: memory addresses are like numbered drawers in a filing cabinet, and the stack is like a stack of books where you can only work with the topmost one. The increments of 4 or 8 bytes come down to the architecture, ensuring your computer can access memory efficiently. Makes sense? Neat.
The renowned artist
That diagram above was done by hand using excalidraw.com but, before I painstakingly drew all this, I asked ChatGPT for a picture of a filing cabinet representing the stack of a computer and this is what it came up with:
You can barely make out what is going on in there, but I thought it was funny and didn't know where to add it, so here it is. A little mishap with GPT.
The end
Well, this is it for now. You have learned what a function is, how it differs from a procedure, how to design one and how the stack works. Just remember: always let the function do the heavy lifting, don’t get fancy with your memory addresses (stick to steps of 4 or 8), and most importantly, treat the stack with respect because you never know when your program might throw a tantrum if you mess it up.
Also, I'd like to add that if the previous complete program is overwhelming, do not fret. I'm set to explain it in detail in a later post. There are a few things to unpack there, and I want to make sure you understand everything. However, I think it's interesting enough to be used here as an example of what a function is.
Regardless, that is all for now. Happy coding, and may your stack never overflow (unless you're into that kind of thing).