


Which Assembly Registers To Store Variables

This is the fifth chapter in a series about virtual memory. The goal is to learn some CS basics in a different and more practical way.

If you missed the previous chapters, you should probably start there:

  • Chapter 0: Hack The Virtual Memory: C strings & /proc
  • Chapter 1: Hack The Virtual Memory: Python bytes
  • Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
  • Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break

The Stack

As we have seen in chapter 2, the stack resides at the high end of memory and grows downward. But how does it work exactly? How does it translate into assembly code? Which registers are used? In this chapter we will take a closer look at how the stack works, and how the program automatically allocates and de-allocates local variables.

Once we understand this, we will be able to play a bit with it, and hijack the flow of our program. Ready? Let's start!

Note: We will talk only about the user stack, as opposed to the kernel stack.

Prerequisites

In order to fully understand this article, you will need to know:

  • The basics of the C programming language (especially pointers)

Environment

All scripts and programs have been tested on the following system:

  • Ubuntu
    • Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Tools used:
    • gcc
    • gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
    • objdump
    • GNU objdump (GNU Binutils for Ubuntu) 2.2

Everything we cover will be true for this system/environment, but may be different on another system.

Automatic allocation

Let's start by looking at a very simple program that has one function that uses one variable (0-main.c):

    #include <stdio.h>

    int main(void)
    {
        int a;

        a = 972;
        printf("a = %d\n", a);
        return (0);
    }

Let's compile this program and disassemble it using objdump:

    holberton$ gcc 0-main.c
    holberton$ objdump -d -j .text -M intel

The assembly code produced for our main function is the following:

    000000000040052d <main>:
      40052d:       55                      push   rbp
      40052e:       48 89 e5                mov    rbp,rsp
      400531:       48 83 ec 10             sub    rsp,0x10
      400535:       c7 45 fc cc 03 00 00    mov    DWORD PTR [rbp-0x4],0x3cc
      40053c:       8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
      40053f:       89 c6                   mov    esi,eax
      400541:       bf e4 05 40 00          mov    edi,0x4005e4
      400546:       b8 00 00 00 00          mov    eax,0x0
      40054b:       e8 c0 fe ff ff          call   400410 <printf@plt>
      400550:       b8 00 00 00 00          mov    eax,0x0
      400555:       c9                      leave
      400556:       c3                      ret
      400557:       66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
      40055e:       00 00

Let's focus on the first 3 lines for now:

    000000000040052d <main>:
      40052d:       55                      push   rbp
      40052e:       48 89 e5                mov    rbp,rsp
      400531:       48 83 ec 10             sub    rsp,0x10

The first lines of the function main refer to rbp and rsp; these are special-purpose registers. rbp is the base pointer, which points to the base of the current stack frame, and rsp is the stack pointer, which points to the top of the current stack frame.

Let's break down, step by step, what is happening here. This is the state of the stack when we enter the function main, before the first instruction is run:

[Figure: the stack]

  • The push rbp instruction pushes the value of the register rbp onto the stack: it moves rsp down by 8 bytes and stores rbp at that new address. Because it "pushes" onto the stack, the value of rsp is now the memory address of the new top of the stack. The stack and the registers now look like this:

[Figure: the stack]

  • mov rbp, rsp copies the value of the stack pointer rsp to the base pointer rbp; rbp and rsp now both point to the top of the stack

[Figure: the stack]

  • sub rsp, 0x10 creates space to store the values of local variables: the 0x10 (16) bytes between rbp and rsp. Note that this space is large enough to store our variable of type int

[Figure: the stack]

We have just created a space in memory (on the stack) for our local variables. This space is called a stack frame. Every function that has local variables will use a stack frame to store those variables.

Using local variables

The fourth line of assembly code of our main function is the following:

                      400535:       c7 45 fc cc 03 00 00    mov    DWORD PTR [rbp-0x4],0x3cc                  

0x3cc is the value 972 in hexadecimal. This line corresponds to our C code line:

          a = 972;                  

mov DWORD PTR [rbp-0x4],0x3cc sets the 4 bytes (a DWORD) of memory at address rbp - 4 to 972. [rbp - 4] IS our local variable a. The computer doesn't actually know the name of the variable we use in our code; it simply refers to memory addresses on the stack.

This is the state of the stack and the registers after this operation:

[Figure: the stack]

leave, Automatic de-allocation

If we now look at the end of the function, we will find this:

                      400555:       c9                      leave

The instruction leave sets rsp to rbp, and then pops the top of the stack into rbp.

[Figure: the stack]

[Figure: the stack]

Because we pushed the previous value of rbp onto the stack when we entered the function, rbp is now set back to that previous value. This is how:

  • The local variables are "de-allocated", and
  • the stack frame of the previous function is restored before we leave the current function.

The stack and the registers rbp and rsp are restored to the same state as when we entered our main function.

Playing with the stack

When the variables are automatically de-allocated from the stack, they are not completely "destroyed". Their values are still in memory, and this space will potentially be used by other functions.

This is why it is important to initialize your variables when you write your code: otherwise, they will hold whatever value happens to be on the stack at that moment (and in C, reading an uninitialized local variable is undefined behavior).

Let's consider the following C code (1-main.c):

    #include <stdio.h>

    void func1(void)
    {
         int a;
         int b;
         int c;

         a = 98;
         b = 972;
         c = a + b;
         printf("a = %d, b = %d, c = %d\n", a, b, c);
    }

    void func2(void)
    {
         int a;
         int b;
         int c;

         printf("a = %d, b = %d, c = %d\n", a, b, c);
    }

    int main(void)
    {
        func1();
        func2();
        return (0);
    }

As you can see, func2 does not set the values of its local variables a, b and c, yet if we compile and run this program it will print…

    holberton$ gcc 1-main.c && ./a.out
    a = 98, b = 972, c = 1070
    a = 98, b = 972, c = 1070
    holberton$

… the same variable values as func1! This is because of how the stack works. The two functions declared the same number of variables, with the same type, in the same order. Their stack frames are exactly the same. When func1 ends, the memory where the values of its local variables reside is not cleared; only rsp is incremented.
As a result, when we call func2 its stack frame sits at exactly the same place as the previous func1 stack frame, and the local variables of func2 have the same values as the local variables of func1 when we left func1.

Let's examine the assembly code to prove it:

          holberton$ objdump -d -j .text -M intel                  
    000000000040052d <func1>:
      40052d:       55                      push   rbp
      40052e:       48 89 e5                mov    rbp,rsp
      400531:       48 83 ec 10             sub    rsp,0x10
      400535:       c7 45 f4 62 00 00 00    mov    DWORD PTR [rbp-0xc],0x62
      40053c:       c7 45 f8 cc 03 00 00    mov    DWORD PTR [rbp-0x8],0x3cc
      400543:       8b 45 f8                mov    eax,DWORD PTR [rbp-0x8]
      400546:       8b 55 f4                mov    edx,DWORD PTR [rbp-0xc]
      400549:       01 d0                   add    eax,edx
      40054b:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax
      40054e:       8b 4d fc                mov    ecx,DWORD PTR [rbp-0x4]
      400551:       8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
      400554:       8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
      400557:       89 c6                   mov    esi,eax
      400559:       bf 34 06 40 00          mov    edi,0x400634
      40055e:       b8 00 00 00 00          mov    eax,0x0
      400563:       e8 a8 fe ff ff          call   400410 <printf@plt>
      400568:       c9                      leave
      400569:       c3                      ret

    000000000040056a <func2>:
      40056a:       55                      push   rbp
      40056b:       48 89 e5                mov    rbp,rsp
      40056e:       48 83 ec 10             sub    rsp,0x10
      400572:       8b 4d fc                mov    ecx,DWORD PTR [rbp-0x4]
      400575:       8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
      400578:       8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
      40057b:       89 c6                   mov    esi,eax
      40057d:       bf 34 06 40 00          mov    edi,0x400634
      400582:       b8 00 00 00 00          mov    eax,0x0
      400587:       e8 84 fe ff ff          call   400410 <printf@plt>
      40058c:       c9                      leave
      40058d:       c3                      ret

    000000000040058e <main>:
      40058e:       55                      push   rbp
      40058f:       48 89 e5                mov    rbp,rsp
      400592:       e8 96 ff ff ff          call   40052d <func1>
      400597:       e8 ce ff ff ff          call   40056a <func2>
      40059c:       b8 00 00 00 00          mov    eax,0x0
      4005a1:       5d                      pop    rbp
      4005a2:       c3                      ret
      4005a3:       66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
      4005aa:       00 00 00
      4005ad:       0f 1f 00                nop    DWORD PTR [rax]

As you can see, the way the stack frame is formed is always consistent. In our two functions, the size of the stack frame is the same since the local variables are the same.

    push   rbp
    mov    rbp,rsp
    sub    rsp,0x10

And both functions end with the leave instruction.

The variables a, b and c are referenced the same way in the two functions:

  • a lies at memory address rbp - 0xc
  • b lies at memory address rbp - 0x8
  • c lies at memory address rbp - 0x4

Note that the order of those variables on the stack is not the same as the order of those variables in our code. The compiler orders them as it wants, so you should never assume the order of your local variables on the stack.

So, this is the state of the stack and the registers rbp and rsp before we leave func1:

[Figure: the stack]

When we leave the function func1, we hit the instruction leave; as previously explained, this is the state of the stack, rbp and rsp right before returning to the function main:

[Figure: the stack]

So when we enter func2, the local variables are set to whatever sits in memory on the stack, and that is why their values are the same as those of the local variables of the function func1.

[Figure: the stack]

ret

You might have noticed that all our example functions end with the instruction ret. ret pops the return address from the stack and jumps there. When a function is called, the program uses the instruction call to push the return address before it jumps to the first instruction of the called function.
This is how the program is able to call a function and then return from that function to the calling function, which then executes its next instruction.

So this means that there is more than just variables on the stack; there are also memory addresses of instructions. Let's revisit our 1-main.c code.

When the main function calls func1,

                      400592:       e8 96 ff ff ff          call   40052d <func1>                  

it pushes the memory address of the next instruction onto the stack, and then jumps to func1.
As a result, before executing any instructions in func1, the top of the stack contains this address, so rsp points to this value.

[Figure: the stack]

After the stack frame of func1 is formed, the stack looks like this:

[Figure: the stack]

Wrapping everything up

Given what we just learned, we can use rbp to directly access all our local variables (without using the C variables!), as well as the saved rbp value on the stack and the return address values of our functions.

To do so in C, we can use GCC's explicit register variables (a GCC extension):

    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

Here is the listing of the program 2-main.c:

    #include <stdio.h>

    void func1(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        a = 98;
        b = 972;
        c = a + b;
        printf("a = %d, b = %d, c = %d\n", a, b, c);
        printf("func1, rpb = %lx\n", rbp);
        printf("func1, rsp = %lx\n", rsp);
        printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
        printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
        printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
        printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
        printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
    }

    void func2(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
        printf("func2, rpb = %lx\n", rbp);
        printf("func2, rsp = %lx\n", rsp);
    }

    int main(void)
    {
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("main, rpb = %lx\n", rbp);
        printf("main, rsp = %lx\n", rsp);
        func1();
        func2();
        return (0);
    }

Getting the values of the variables

[Figure: the stack]

From our previous discoveries, we know that our variables are referenced via rbp - 0xX:

  • a is at rbp - 0xc
  • b is at rbp - 0x8
  • c is at rbp - 0x4

So in order to get the values of those variables, we need to dereference an address computed from rbp. For the variable a:

  • cast our variable rbp to a char *: (char *)rbp
  • subtract the right number of bytes to get the address where the variable is in memory: ((char *)rbp) - 0xc
  • cast it again to a pointer pointing to an int since a is of type int: (int *)(((char *)rbp) - 0xc)
  • and dereference it to get the value sitting at this address: *(int *)(((char *)rbp) - 0xc)

The saved rbp value

[Figure: the stack]

Looking at the above diagram, the current rbp points directly to the saved rbp, so we simply have to cast our variable rbp to a pointer to an unsigned long int and dereference it: *(unsigned long int *)rbp.

The return address value

[Figure: the stack]

The return address was pushed right before the saved rbp, so it sits just above it on the stack (at the next higher address). The saved rbp is 8 bytes long, so we simply need to add 8 to the current value of rbp to get the address where the return address is stored on the stack. This is how we do it:

  • cast our variable rbp to a char *: (char *)rbp
  • add 8 to this value: ((char *)rbp + 8)
  • cast it to point to an unsigned long int: (unsigned long int *)((char *)rbp + 8)
  • dereference it to get the value at this address: *(unsigned long int *)((char *)rbp + 8)

The output of our program

    holberton$ gcc 2-main.c && ./a.out
    main, rpb = 7ffc78e71b70
    main, rsp = 7ffc78e71b70
    a = 98, b = 972, c = 1070
    func1, rpb = 7ffc78e71b60
    func1, rsp = 7ffc78e71b50
    func1, a = 98
    func1, b = 972
    func1, c = 1070
    func1, previous rbp value = 7ffc78e71b70
    func1, return address value = 400697
    func2, a = 98, b = 972, c = 1070
    func2, rpb = 7ffc78e71b60
    func2, rsp = 7ffc78e71b50
    holberton$

We can see that:

  • from func1 we can access all our variables correctly via rbp
  • from func1 we can get the rbp of the function main
  • we confirm that func1 and func2 do have the same rbp and rsp values
  • the difference between rsp and rbp is 0x10, as seen in the assembly code (sub rsp,0x10)
  • in the main function, rsp == rbp because there are no local variables

The return address from func1 is 0x400697. Let's double-check this assumption by disassembling the program. If we are correct, this should be the address of the instruction right after the call to func1 in the main function.

          holberton$ objdump -d -j .text -M intel | less                  
    0000000000400664 <main>:
      400664:       55                      push   rbp
      400665:       48 89 e5                mov    rbp,rsp
      400668:       48 89 e8                mov    rax,rbp
      40066b:       48 89 c6                mov    rsi,rax
      40066e:       bf 3b 08 40 00          mov    edi,0x40083b
      400673:       b8 00 00 00 00          mov    eax,0x0
      400678:       e8 93 fd ff ff          call   400410 <printf@plt>
      40067d:       48 89 e0                mov    rax,rsp
      400680:       48 89 c6                mov    rsi,rax
      400683:       bf 4c 08 40 00          mov    edi,0x40084c
      400688:       b8 00 00 00 00          mov    eax,0x0
      40068d:       e8 7e fd ff ff          call   400410 <printf@plt>
      400692:       e8 96 fe ff ff          call   40052d <func1>
      400697:       e8 7a ff ff ff          call   400616 <func2>
      40069c:       b8 00 00 00 00          mov    eax,0x0
      4006a1:       5d                      pop    rbp
      4006a2:       c3                      ret
      4006a3:       66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
      4006aa:       00 00 00
      4006ad:       0f 1f 00                nop    DWORD PTR [rax]

And yes! \o/

Hack the stack!

Now that we know where to find the return address on the stack, what if we were to modify this value? Could we alter the flow of a program and make func1 return somewhere else? Let's add a new function, called bye, to our program (3-main.c):

    #include <stdio.h>
    #include <stdlib.h>

    void bye(void)
    {
        printf("[x] I am in the function bye!\n");
        exit(98);
    }

    void func1(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        a = 98;
        b = 972;
        c = a + b;
        printf("a = %d, b = %d, c = %d\n", a, b, c);
        printf("func1, rpb = %lx\n", rbp);
        printf("func1, rsp = %lx\n", rsp);
        printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
        printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
        printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
        printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
        printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
    }

    void func2(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
        printf("func2, rpb = %lx\n", rbp);
        printf("func2, rsp = %lx\n", rsp);
    }

    int main(void)
    {
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("main, rpb = %lx\n", rbp);
        printf("main, rsp = %lx\n", rsp);
        func1();
        func2();
        return (0);
    }

Let's see at which address the code of this function starts:

          holberton$ gcc 3-main.c && objdump -d -j .text -M intel | less                  
    00000000004005bd <bye>:
      4005bd:       55                      push   rbp
      4005be:       48 89 e5                mov    rbp,rsp
      4005c1:       bf d8 07 40 00          mov    edi,0x4007d8
      4005c6:       e8 b5 fe ff ff          call   400480 <puts@plt>
      4005cb:       bf 62 00 00 00          mov    edi,0x62
      4005d0:       e8 eb fe ff ff          call   4004c0 <exit@plt>

Now let's replace the return address on the stack from the func1 function with the address of the start of the function bye, 4005bd (4-main.c). This address comes from the build above and will likely differ on another system:

    #include <stdio.h>
    #include <stdlib.h>

    void bye(void)
    {
        printf("[x] I am in the function bye!\n");
        exit(98);
    }

    void func1(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        a = 98;
        b = 972;
        c = a + b;
        printf("a = %d, b = %d, c = %d\n", a, b, c);
        printf("func1, rpb = %lx\n", rbp);
        printf("func1, rsp = %lx\n", rsp);
        printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
        printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
        printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
        printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
        printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
        /* hack the stack! */
        *(unsigned long int *)((char *)rbp + 8) = 0x4005bd;
    }

    void func2(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
        printf("func2, rpb = %lx\n", rbp);
        printf("func2, rsp = %lx\n", rsp);
    }

    int main(void)
    {
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("main, rpb = %lx\n", rbp);
        printf("main, rsp = %lx\n", rsp);
        func1();
        func2();
        return (0);
    }
    holberton$ gcc 4-main.c && ./a.out
    main, rpb = 7fff62ef1b60
    main, rsp = 7fff62ef1b60
    a = 98, b = 972, c = 1070
    func1, rpb = 7fff62ef1b50
    func1, rsp = 7fff62ef1b40
    func1, a = 98
    func1, b = 972
    func1, c = 1070
    func1, previous rbp value = 7fff62ef1b60
    func1, return address value = 40074d
    [x] I am in the function bye!
    holberton$ echo $?
    98
    holberton$

We have called the function bye, without calling it!

Outro

I hope that you enjoyed this and learned a couple of things about the stack. As usual, this will be continued! Let me know if there is anything you would like me to cover in the next chapter.

Questions? Feedback?

If you have questions or feedback don't hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thanks for reading!

As always, no one is perfect (except Chuck of course), so don't hesitate to contribute or send me your comments if you find anything I missed.

Files

This repo contains the source code (X-main.c files) for programs created in this tutorial.

Read more about the virtual memory

Follow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! This was the fifth chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them:

  • Chapter 0: Hack The Virtual Memory: C strings & /proc
  • Chapter 1: Hack The Virtual Memory: Python bytes
  • Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
  • Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break

Many thanks to Naomi for proof-reading!


Source: https://blog.holbertonschool.com/hack-virtual-memory-stack-registers-assembly-code/

Posted by: hendersonreand2000.blogspot.com
