Express yourself, even if you need to hack virtual memory to do it!

Lucia Rodriguez
6 min read · Aug 27, 2020
Structure of a program in main memory (Taken from https://gabrieletolomei.wordpress.com/miscellanea/operating-systems/in-memory-layout/)

Even if today's machines are meant to be used rather than understood (at least by most users, like me), it is still interesting to get a general idea of how they work.

Coordinating hardware and software on a Unix machine requires a kernel: a program that handles the creation and management of processes, hardware management, file system space management and so on. But how can we see what the kernel is doing with all these tasks? procfs is one answer. It's a virtual file system that exposes how the kernel is dealing with processes and their resources, mainly memory use. It's virtual because its contents are generated by the kernel on the fly rather than stored on disk: it is mounted when the operating system starts and it has no persistence (nothing in it is saved permanently).

The /proc file system is how procfs presents all that process data as files and directories. It works as an interface to the kernel's internal data structures: it provides information about the system, and some kernel settings can be changed by writing to its files.

How is /proc structured?

How your PC's processes are kept organized

If you run ls in /proc, you will find a set of directories named after numbers and a set of files with names like meminfo, stat, swaps, uptime or vmstat. These files usually describe one aspect of the machine as a whole, while the directories are named after process IDs (PIDs) and contain the relevant information about each process. Inside each directory we can find files like stat (the process status), cmdline (the command line arguments of the process) or fd, a directory which contains all the file descriptors opened by this PID.
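To make that layout concrete, here is a minimal Python sketch that lists the PID directories and reads one process's cmdline file. The function names and the proc_root parameter are my own choices for illustration (the parameter only exists so the code can be pointed at a test directory); the only thing taken from /proc itself is that the arguments in cmdline are separated by NUL bytes.

```python
import os

def list_pids(proc_root="/proc"):
    """Return the PIDs found under proc_root, sorted numerically."""
    pids = []
    for name in os.listdir(proc_root):
        # Process directories are the ones whose name is all digits.
        if name.isdigit() and os.path.isdir(os.path.join(proc_root, name)):
            pids.append(int(name))
    return sorted(pids)

def read_cmdline(pid, proc_root="/proc"):
    """Return the argument list of a process as a list of strings."""
    path = os.path.join(proc_root, str(pid), "cmdline")
    with open(path, "rb") as f:
        raw = f.read()
    if not raw:
        # Some processes (kernel threads, for example) have an empty cmdline.
        return []
    # Arguments are NUL-separated, usually with a trailing NUL.
    return [arg.decode(errors="replace") for arg in raw.rstrip(b"\0").split(b"\0")]
```

On a Linux machine, something like read_cmdline(os.getpid()) should give back the command used to start the current Python interpreter.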

The status files are really interesting because almost everything about a process can be found there: its current state, the size of the program, its threads, how it handles signals and its capabilities are some of the things you can find in these files. The ps program (used for getting a snapshot of the current processes) gets the data it prints from here.
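As a small illustration, a status file is a list of "Key: value" lines, so it can be turned into a dictionary with a few lines of Python. This is only a sketch: parse_status is a name I made up, and the sample text in the usage note is shortened and invented, not copied from a real process.

```python
def parse_status(text):
    """Turn the text of a /proc/<PID>/status file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Each line looks like "Key:<tab>value"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

For example, parse_status("Name:\tcat\nState:\tR (running)\n")["State"] gives "R (running)". For a real process you would pass in the contents of /proc/<PID>/status.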

More interesting still, the files under /proc collect data about how memory is handled for each process. We'll find a stack file, which on Linux shows a trace of the process's kernel stack. Even more interesting, we'll find the maps and the mem files. The maps file lists the memory regions the process is using and what each one is for. It's a map. The mem file, on the other hand, gives access to the memory held by the process itself.

And why is this important? Well, thanks to the /proc system you have the memory of your processes in your hands! The stack is a section of a process's memory that temporarily stores the local variables and data of each function while it is executing; when the function returns, they are freed automatically. But what happens when you need some kind of persistence? Then you allocate memory in the heap (via malloc() in C, for example). While the stack lives in the higher addresses of the process's memory, the heap sits much lower, just above the program's code and static data. While stack data is short-lived, data stored in the heap stays there until you explicitly free it. If you know where a piece of a process's data is in the heap, you can play with it (reading it, even overwriting it) if the permissions allow it.

As I said before, /proc/<PID>/maps is a map, but reading it is not that easy. Most of the files in the /proc subdirectories are raw data, and only a few of them are human-readable at first sight. The maps file is one of the readable kind. Even if its format can seem incomprehensible, the fields are clearly defined: first we find the addresses (the start and the end of each section), then the permissions on that section of memory, the offset into the mapped file, the device that holds the mapped file, the inode (a data structure common in Unix which usually contains file metadata) and finally the path of the file or component that backs that memory section.
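Here is a rough sketch of how one line of the maps file could be split into those fields in Python. The MapEntry name and the example line in the test note are mine, made up for illustration; the field order is the one described above.

```python
from collections import namedtuple

# Hypothetical container for the fields of one line of /proc/<PID>/maps.
MapEntry = namedtuple("MapEntry", "start end perms offset dev inode path")

def parse_maps_line(line):
    """Split one maps line into its fields, with addresses as integers."""
    # Split at most 5 times so that a path containing spaces stays whole.
    parts = line.split(None, 5)
    addresses, perms, offset, dev, inode = parts[:5]
    # Anonymous mappings have no path field at all.
    path = parts[5].strip() if len(parts) > 5 else ""
    start_hex, end_hex = addresses.split("-")
    return MapEntry(int(start_hex, 16), int(end_hex, 16), perms,
                    int(offset, 16), dev, int(inode), path)
```

With an invented line such as "00e24000-011f7000 rw-p 00000000 00:00 0 [heap]", the result has perms equal to "rw-p", path equal to "[heap]", and start and end as plain integers that can be subtracted to get the size of the section.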

We did an exercise to better understand how memory works through the /proc file system. We were tasked with overwriting a string in the heap of a process. We have the map and the memory to do that. Easy-peasy, right?

It’s literally a map

Nope! Well, you could take the heap addresses of a process by hand, open the mem file and write the new string into it. But you would have to type that long address in its funny hexadecimal format and repeat every step each time. Here at Holberton, we're learning to do these things in efficient and, most importantly, reusable ways. So, we need a script.

How can we do that?

First, we need to read the /proc/<PID>/maps file line by line. You can do it in the language you prefer but, thank the gods, we have Python. You need to open() the file with the proper permissions: since we only want to consult this file, read permission will do fine.

After that, you need to loop over each line and tokenize it. Do you remember that the last field in maps is the path? Well, we need to find the “heap” line. You only need to check whether the string “[heap]” is in the last token of the line. When you find it, save the addresses and permissions fields from that line.

First, you should check for an “r” and a “w” in the permissions field, meaning that this section of memory can be read and written. After that, you need to split the addresses field to get where the heap's memory section starts and where it ends. Both values are written in hexadecimal (the format used for memory addresses), so you need to convert them to integers. You mostly need the first one, the start address.
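The steps so far (read maps line by line, find the “[heap]” line, check the permissions, convert the addresses) could be sketched like this. The function name find_heap and the choice to raise PermissionError are my own assumptions, not part of the original exercise.

```python
def find_heap(maps_lines):
    """Return (start, end) of the [heap] section as integers, or None."""
    for line in maps_lines:
        tokens = line.split()
        # The path is the last token; skip every line that is not the heap.
        if not tokens or tokens[-1] != "[heap]":
            continue
        addresses, perms = tokens[0], tokens[1]
        # We need to both read and write this section.
        if "r" not in perms or "w" not in perms:
            raise PermissionError("heap is not readable and writable: " + perms)
        start_hex, end_hex = addresses.split("-")
        # Addresses are hexadecimal strings, so convert them with base 16.
        return int(start_hex, 16), int(end_hex, 16)
    return None
```

Since a file object yields its lines when you loop over it, you could call it as find_heap(f) inside a "with open('/proc/<PID>/maps', 'r') as f:" block, replacing <PID> with the real process ID.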

After you collect all this info, you can close the file you opened for reading maps. Now you need to open() the PID's mem file in rb+ mode, meaning you open it in binary mode for both reading and writing. Then you need to jump to the start address you got before, that is, go directly to the memory section dedicated to the heap, using the file object's seek() method. Once you are there, you read() the entire heap (end address - start address bytes) into a variable and you can start looking for the string you want to replace. For this, you can use the index() method of the bytes that read() returned. Once you know the exact position of the string inside the heap, you seek() again on the mem file, this time to start address + the index of the string. At this point, you can write() the new string, as bytes, to replace the old one.
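Put together, the search-and-overwrite step might look like the sketch below. The function name and its parameters are hypothetical. I open the file unbuffered (the third argument to open() is 0), which is my own choice so that every seek(), read() and write() goes straight to the file. Keep in mind that opening another process's mem file for writing usually needs elevated privileges, for example running the script with sudo.

```python
def overwrite_in_region(mem_path, start, end, search, replacement):
    """Replace the first occurrence of search (bytes) between start and end.

    Returns the absolute position where replacement was written.
    Raises ValueError if search is not found in the region.
    """
    with open(mem_path, "rb+", 0) as mem:
        # Jump to the beginning of the region and load it entirely.
        mem.seek(start)
        region = mem.read(end - start)
        # index() gives the position relative to the start of the region.
        offset = region.index(search)
        # Go back to the absolute position of the string and overwrite it.
        mem.seek(start + offset)
        mem.write(replacement)
    return start + offset
```

For a real process, mem_path would be /proc/<PID>/mem and start and end would be the heap addresses taken from the maps file. The with block takes care of closing the file even if the string is not found.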

And voilà! You can try it by running a C program that uses strdup() to copy a string, so you will find this string in the heap of the process while it is running. Remember to make the program loop forever, so that it stays alive until you stop it with an external signal, like Ctrl + C. The script only needs the PID of this process, the string that you allocated in the C program and the new string. Of course, you need to do some tweaks first, mainly to make sure the script closes its file descriptors and exits properly in case of failure, and to pad the new string with spaces if it is shorter than the original. But this is, grosso modo, how you can overwrite a string in the heap of a process.
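One of those tweaks, padding the new string, could be sketched as follows. The helper name fit_replacement is invented for this example, and refusing a longer string is my own assumption: writing more bytes than the original string occupied would spill over whatever comes after it in the heap.

```python
def fit_replacement(original, new, fill=b" "):
    """Pad new (bytes) with fill so it is exactly as long as original (bytes)."""
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    if len(new) > len(original):
        # A longer string would overwrite data that follows the original.
        raise ValueError("new string is longer than the original")
    return new + fill * (len(original) - len(new))
```

Padding with spaces keeps the printed output the same width as before. Passing fill=b"\0" instead would make the C program see the new string end right after its last character.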

Sources

The /proc file system

procfs — Wikipedia

Holberton School blog — Hack The Virtual Memory: C strings & /proc
