Which Assembly Registers To Store Variables
This is the fifth chapter in a series about virtual memory. The goal is to learn some CS basics in a different and more practical way.
If you missed the previous chapters, you should probably start there:
- Chapter 0: Hack The Virtual Memory: C strings & /proc
- Chapter 1: Hack The Virtual Memory: Python bytes
- Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
- Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break
The Stack
As we have seen in chapter 2, the stack resides at the high end of memory and grows downward. But how does it work exactly? How does it translate into assembly code? What are the registers used? In this chapter we will take a closer look at how the stack works, and how the program automatically allocates and de-allocates local variables.
Once we understand this, we will be able to play a bit with it, and hijack the flow of our program. Ready? Let's start!
Note: We will talk only about the user stack, as opposed to the kernel stack.
Prerequisites
In order to fully understand this article, you will need to know:
- The basics of the C programming language (especially pointers)
Environment
All scripts and programs have been tested on the following system:
- Ubuntu
- Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
- Tools used:
- gcc
- gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
- objdump
- GNU objdump (GNU Binutils for Ubuntu) 2.2
Everything we cover will be true for this system/environment, but may be different on another system.
Automatic allocation
Let's first look at a very simple program that has one function that uses one variable (0-main.c):
#include <stdio.h>

int main(void)
{
    int a;

    a = 972;
    printf("a = %d\n", a);
    return (0);
}
Let's compile this program and disassemble it using objdump:
holberton$ gcc 0-main.c
holberton$ objdump -d -j .text -M intel
The assembly code produced for our main function is the following:
000000000040052d <main>:
  40052d:  55                     push rbp
  40052e:  48 89 e5               mov rbp,rsp
  400531:  48 83 ec 10            sub rsp,0x10
  400535:  c7 45 fc cc 03 00 00   mov DWORD PTR [rbp-0x4],0x3cc
  40053c:  8b 45 fc               mov eax,DWORD PTR [rbp-0x4]
  40053f:  89 c6                  mov esi,eax
  400541:  bf e4 05 40 00         mov edi,0x4005e4
  400546:  b8 00 00 00 00         mov eax,0x0
  40054b:  e8 c0 fe ff ff         call 400410 <printf@plt>
  400550:  b8 00 00 00 00         mov eax,0x0
  400555:  c9                     leave
  400556:  c3                     ret
  400557:  66 0f 1f 84 00 00 00   nop WORD PTR [rax+rax*1+0x0]
  40055e:  00 00
Let's focus on the first three lines for now:
000000000040052d <main>:
  40052d:  55                     push rbp
  40052e:  48 89 e5               mov rbp,rsp
  400531:  48 83 ec 10            sub rsp,0x10
The first lines of the function main refer to rbp and rsp; these are special purpose registers. rbp is the base pointer, which points to the base of the current stack frame, and rsp is the stack pointer, which points to the top of the current stack frame.
Let's decompose step by step what is happening here. This is the state of the stack when we enter the function main before the first instruction is run:
- The push rbp instruction pushes the value of the register rbp onto the stack. Because it "pushes" onto the stack, the value of rsp is now the memory address of the new top of the stack. The stack and the registers now look like this:
- mov rbp, rsp copies the value of the stack pointer rsp to the base pointer rbp -> rbp and rsp now both point to the top of the stack
- sub rsp, 0x10 creates a space to store values of local variables. The space between rbp and rsp is this space. Note that this space is large enough to store our variable of type integer
We have just created a space in memory, on the stack, for our local variables. This space is called a stack frame. Every function that has local variables will use a stack frame to store those variables.
Using local variables
The fourth line of assembly code of our main function is the following:
400535: c7 45 fc cc 03 00 00 mov DWORD PTR [rbp-0x4],0x3cc
0x3cc is actually the value 972 in hexadecimal. This line corresponds to our C-code line:
a = 972;
mov DWORD PTR [rbp-0x4],0x3cc is setting the memory at address rbp - 4 to 972. [rbp - 4] IS our local variable a. The computer doesn't actually know the name of the variable we use in our code, it simply refers to memory addresses on the stack.
This is the state of the stack and the registers after this operation:
leave, Automatic de-allocation
If we look now at the end of the function, we will find this:
  400555:  c9                     leave
The instruction leave sets rsp to rbp, and then pops the top of the stack into rbp.
Because we pushed the previous value of rbp onto the stack when we entered the function, rbp is now set to the previous value of rbp. This is how:
- The local variables are "de-allocated", and
- the stack frame of the previous function is restored before we leave the current function.
The state of the stack and the registers rbp and rsp are restored to the same state as when we entered our main function.
Playing with the stack
When the variables are automatically de-allocated from the stack, they are not completely "destroyed". Their values are still in memory, and this space will potentially be used by other functions.
This is why it is important to initialize your variables when you write your code, because otherwise, they will take whatever value there is on the stack at the moment when the program is running.
Let's consider the following C code (1-main.c):
#include <stdio.h>

void func1(void)
{
    int a;
    int b;
    int c;

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
}

void func2(void)
{
    int a;
    int b;
    int c;

    printf("a = %d, b = %d, c = %d\n", a, b, c);
}

int main(void)
{
    func1();
    func2();
    return (0);
}
As you can see, func2 does not set the values of its local variables a, b and c, yet if we compile and run this program it will print…
holberton$ gcc 1-main.c && ./a.out
a = 98, b = 972, c = 1070
a = 98, b = 972, c = 1070
holberton$
… the same variable values as func1! This is because of how the stack works. The two functions declared the same number of variables, with the same type, in the same order. Their stack frames are exactly the same. When func1 ends, the memory where the values of its local variables reside is not cleared; only rsp is incremented.
As a result, when we call func2 its stack frame sits at exactly the same place as the previous func1 stack frame, and the local variables of func2 have the same values as the local variables of func1 when we left func1.
Let's examine the assembly code to prove it:
holberton$ objdump -d -j .text -M intel
000000000040052d <func1>:
  40052d:  55                     push rbp
  40052e:  48 89 e5               mov rbp,rsp
  400531:  48 83 ec 10            sub rsp,0x10
  400535:  c7 45 f4 62 00 00 00   mov DWORD PTR [rbp-0xc],0x62
  40053c:  c7 45 f8 cc 03 00 00   mov DWORD PTR [rbp-0x8],0x3cc
  400543:  8b 45 f8               mov eax,DWORD PTR [rbp-0x8]
  400546:  8b 55 f4               mov edx,DWORD PTR [rbp-0xc]
  400549:  01 d0                  add eax,edx
  40054b:  89 45 fc               mov DWORD PTR [rbp-0x4],eax
  40054e:  8b 4d fc               mov ecx,DWORD PTR [rbp-0x4]
  400551:  8b 55 f8               mov edx,DWORD PTR [rbp-0x8]
  400554:  8b 45 f4               mov eax,DWORD PTR [rbp-0xc]
  400557:  89 c6                  mov esi,eax
  400559:  bf 34 06 40 00         mov edi,0x400634
  40055e:  b8 00 00 00 00         mov eax,0x0
  400563:  e8 a8 fe ff ff         call 400410 <printf@plt>
  400568:  c9                     leave
  400569:  c3                     ret

000000000040056a <func2>:
  40056a:  55                     push rbp
  40056b:  48 89 e5               mov rbp,rsp
  40056e:  48 83 ec 10            sub rsp,0x10
  400572:  8b 4d fc               mov ecx,DWORD PTR [rbp-0x4]
  400575:  8b 55 f8               mov edx,DWORD PTR [rbp-0x8]
  400578:  8b 45 f4               mov eax,DWORD PTR [rbp-0xc]
  40057b:  89 c6                  mov esi,eax
  40057d:  bf 34 06 40 00         mov edi,0x400634
  400582:  b8 00 00 00 00         mov eax,0x0
  400587:  e8 84 fe ff ff         call 400410 <printf@plt>
  40058c:  c9                     leave
  40058d:  c3                     ret

000000000040058e <main>:
  40058e:  55                     push rbp
  40058f:  48 89 e5               mov rbp,rsp
  400592:  e8 96 ff ff ff         call 40052d <func1>
  400597:  e8 ce ff ff ff         call 40056a <func2>
  40059c:  b8 00 00 00 00         mov eax,0x0
  4005a1:  5d                     pop rbp
  4005a2:  c3                     ret
  4005a3:  66 2e 0f 1f 84 00 00   nop WORD PTR cs:[rax+rax*1+0x0]
  4005aa:  00 00 00
  4005ad:  0f 1f 00               nop DWORD PTR [rax]
As you can see, the way the stack frame is formed is always consistent. In our two functions, the size of the stack frame is the same since the local variables are the same.
push rbp
mov rbp,rsp
sub rsp,0x10
And both functions end with the leave instruction.
The variables a, b and c are referenced the same way in the two functions:
- a lies at memory address rbp - 0xc
- b lies at memory address rbp - 0x8
- c lies at memory address rbp - 0x4
Note that the order of those variables on the stack is not the same as the order of those variables in our code. The compiler orders them as it wants, so you should never assume the order of your local variables in the stack.
So, this is the state of the stack and the registers rbp and rsp before we leave func1:
When we leave the function func1, we hit the instruction leave; as previously explained, this is the state of the stack, rbp and rsp right before returning to the function main:
So when we enter func2, the local variables are set to whatever sits in memory on the stack, and that is why their values are the same as the local variables of the function func1.
ret
You might have noticed that all our example functions end with the instruction ret. ret pops the return address from the stack and jumps there. When functions are called the program uses the instruction call to push the return address before it jumps to the first instruction of the function called.
This is how the program is able to call a function and then return from said function to the calling function to execute its next instruction.
So this means that there are more than just variables on the stack, there are also memory addresses of instructions. Let's revisit our 1-main.c code.
When the main function calls func1,
  400592:  e8 96 ff ff ff         call 40052d <func1>
it pushes the memory address of the next instruction onto the stack, and then jumps to func1.
As a consequence, before executing any instructions in func1, the top of the stack contains this address, so rsp points to this value.
After the stack frame of func1 is formed, the stack looks like this:
Wrapping everything up
Given what we just learned, we can use rbp to directly access all our local variables (without using the C variables!), as well as the saved rbp value on the stack and the return address values of our functions.
To do so in C, we can use:
register long rsp asm ("rsp");
register long rbp asm ("rbp");
Here is the listing of the program 2-main.c:
#include <stdio.h>

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}
Getting the values of the variables
From our previous discoveries, we know that our variables are referenced via rbp - 0xX:
- a is at rbp - 0xc
- b is at rbp - 0x8
- c is at rbp - 0x4
So in order to get the values of those variables, we need to dereference rbp. For the variable a:
- cast our variable rbp to a char *: (char *)rbp
- subtract the right amount of bytes to get the address of where the variable is in memory: ((char *)rbp) - 0xc
- cast it again to a pointer pointing to an int since a is of type int: (int *)(((char *)rbp) - 0xc)
- and dereference it to get the value sitting at this address: *(int *)(((char *)rbp) - 0xc)
The saved rbp value
Looking at the above diagram, the current rbp directly points to the saved rbp, so we simply have to cast our variable rbp to a pointer to an unsigned long int and dereference it: *(unsigned long int *)rbp.
The return address value
The return address value is right before the saved previous rbp on the stack. rbp is 8 bytes long, so we simply need to add 8 to the current value of rbp to get the address where this return value is on the stack. This is how we do it:
- cast our variable rbp to a char *: (char *)rbp
- add 8 to this value: ((char *)rbp + 8)
- cast it to point to an unsigned long int: (unsigned long int *)((char *)rbp + 8)
- dereference it to get the value at this address: *(unsigned long int *)((char *)rbp + 8)
The output of our program
holberton$ gcc 2-main.c && ./a.out
main, rpb = 7ffc78e71b70
main, rsp = 7ffc78e71b70
a = 98, b = 972, c = 1070
func1, rpb = 7ffc78e71b60
func1, rsp = 7ffc78e71b50
func1, a = 98
func1, b = 972
func1, c = 1070
func1, previous rbp value = 7ffc78e71b70
func1, return address value = 400697
func2, a = 98, b = 972, c = 1070
func2, rpb = 7ffc78e71b60
func2, rsp = 7ffc78e71b50
holberton$
We can see that:
- from func1 we can access all our variables correctly via rbp
- from func1 we can get the rbp of the function main
- we confirm that func1 and func2 do have the same rbp and rsp values
- the difference between rsp and rbp is 0x10, as seen in the assembly code (sub rsp,0x10)
- in the main function, rsp == rbp because there are no local variables
The return address from func1 is 0x400697. Let's double check this assumption by disassembling the program. If we are correct, this should be the address of the instruction right after the call of func1 in the main function.
holberton$ objdump -d -j .text -M intel | less
0000000000400664 <main>:
  400664:  55                     push rbp
  400665:  48 89 e5               mov rbp,rsp
  400668:  48 89 e8               mov rax,rbp
  40066b:  48 89 c6               mov rsi,rax
  40066e:  bf 3b 08 40 00         mov edi,0x40083b
  400673:  b8 00 00 00 00         mov eax,0x0
  400678:  e8 93 fd ff ff         call 400410 <printf@plt>
  40067d:  48 89 e0               mov rax,rsp
  400680:  48 89 c6               mov rsi,rax
  400683:  bf 4c 08 40 00         mov edi,0x40084c
  400688:  b8 00 00 00 00         mov eax,0x0
  40068d:  e8 7e fd ff ff         call 400410 <printf@plt>
  400692:  e8 96 fe ff ff         call 40052d <func1>
  400697:  e8 7a ff ff ff         call 400616 <func2>
  40069c:  b8 00 00 00 00         mov eax,0x0
  4006a1:  5d                     pop rbp
  4006a2:  c3                     ret
  4006a3:  66 2e 0f 1f 84 00 00   nop WORD PTR cs:[rax+rax*1+0x0]
  4006aa:  00 00 00
  4006ad:  0f 1f 00               nop DWORD PTR [rax]
And yes! \o/
Hack the stack!
Now that we know where to find the return address on the stack, what if we were to modify this value? Could we alter the flow of a program and make func1 return to somewhere else? Let's add a new function, called bye, to our program (3-main.c):
#include <stdio.h>
#include <stdlib.h>

void bye(void)
{
    printf("[x] I am in the function bye!\n");
    exit(98);
}

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}
Let's see at which address the code of this function starts:
holberton$ gcc 3-main.c && objdump -d -j .text -M intel | less
00000000004005bd <bye>:
  4005bd:  55                     push rbp
  4005be:  48 89 e5               mov rbp,rsp
  4005c1:  bf d8 07 40 00         mov edi,0x4007d8
  4005c6:  e8 b5 fe ff ff         call 400480 <puts@plt>
  4005cb:  bf 62 00 00 00         mov edi,0x62
  4005d0:  e8 eb fe ff ff         call 4004c0 <exit@plt>
Now let's replace the return address on the stack from the func1 function with the address of the beginning of the function bye, 4005bd (4-main.c):
#include <stdio.h>
#include <stdlib.h>

void bye(void)
{
    printf("[x] I am in the function bye!\n");
    exit(98);
}

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
    /* hack the stack! */
    *(unsigned long int *)((char *)rbp + 8) = 0x4005bd;
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}
holberton$ gcc 4-main.c && ./a.out
main, rpb = 7fff62ef1b60
main, rsp = 7fff62ef1b60
a = 98, b = 972, c = 1070
func1, rpb = 7fff62ef1b50
func1, rsp = 7fff62ef1b40
func1, a = 98
func1, b = 972
func1, c = 1070
func1, previous rbp value = 7fff62ef1b60
func1, return address value = 40074d
[x] I am in the function bye!
holberton$ echo $?
98
holberton$
We have called the function bye, without calling it!
Outro
I hope that you enjoyed this and learned a couple of things about the stack. As usual, this will be continued! Let me know if you have anything you would like me to cover in the next chapter.
Questions? Feedback?
If you have questions or feedback don't hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.
Happy Hacking!
Cheers for reading!
As always, no one is perfect (except Chuck of course), so don't hesitate to contribute or send me your comments if you find anything I missed.
Files
This repo contains the source code (X-main.c files) for programs created in this tutorial.
Read more about the virtual memory
Follow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! This was the fifth chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them:
- Chapter 0: Hack The Virtual Memory: C strings & /proc
- Chapter 1: Hack The Virtual Memory: Python bytes
- Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
- Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break
Many thanks to Naomi for proof-reading!
Source: https://blog.holbertonschool.com/hack-virtual-memory-stack-registers-assembly-code/
Posted by: hendersonreand2000.blogspot.com