For years I had been using PE files right under my nose without ever realizing it. When I found out about them, I couldn’t resist digging deeper until I reached their core. The way the format is designed, you can clearly see how closely these executables mirror the operating system. Are you ready for the long dive? Well, take a deep breath, you’re gonna need it 😉
Oh wait! Let’s do some preparation first. It’ll help to revise a good old operating system concept before we start.
Pages
Remember pages? No? Okay, let me give you a brief on them. Pages are fixed-size blocks of virtual memory, and a page is the smallest unit the operating system manages. Pages a process is actively using are kept in main memory (RAM), while others can be swapped out to auxiliary memory (disk) and brought back when the process needs them. The whole time, the process remains unaware of these swaps between main and auxiliary memory (the story behind the scenes).
Using pages has multiple advantages:
- Each program gets its own isolated address space, so programs run independently and cannot overwrite each other’s memory while running.
- Each section of a file loaded into memory can get its own pages, with read, write and execute permissions set according to that section’s configuration.
- Last but not least, pages that are not in use can be moved out to disk (the page file) to make room for a program that needs memory more urgently.
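To make the page idea concrete, here is a minimal C sketch that splits a virtual address into the page it falls in and its offset within that page. It assumes 4 KiB pages (the usual size on x86 Windows); `page_number` and `page_offset` are hypothetical helper names, not Windows APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the usual page size on x86 Windows. */
#define PAGE_SIZE 0x1000u

/* Page sizes are powers of two, so an address splits cleanly into bit fields. */
static_assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0, "page size must be a power of two");

/* Index of the page that contains the virtual address va. */
static uint32_t page_number(uint32_t va)
{
    return va / PAGE_SIZE;
}

/* Position of va inside its page (0 .. PAGE_SIZE - 1). */
static uint32_t page_offset(uint32_t va)
{
    return va % PAGE_SIZE;
}
```

For example, `page_number(0x401234)` is 0x401 and `page_offset(0x401234)` is 0x234.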
Now you know enough to start with PE files. Time to dive. Ready, set, go!
What is a PE file?
In simple words, PE (Portable Executable) is the native Win32 file format. All Win32 executables (except VxDs and 16-bit DLLs) use the PE file format: 32-bit DLLs, COM files, OCX controls, Control Panel applets (.CPL files) and .NET executables are all PE files. Not shocking news, but NT’s kernel-mode drivers also use this format. So, what’s so special about it? Well, let’s find out. Let me show you a picture of what a Windows executable looks like when its parts are simply stacked one above another:
As you can see, it’s easy to understand once we break the file down into parts. Let’s start with the first one:
DOS Header
The DOS header (IMAGE_DOS_HEADER) is the first member of the family: every PE file starts with it. I won’t tell you much of a story about it; instead, I’ll take you directly through its specs:
- Size: 64 bytes
- Identification: The first two bytes contain 4D and 5A (“MZ” in ASCII), the initials of Mark Zbikowski, one of the original architects of MS-DOS.
- Purpose: Used by DOS to validate the executable when the program is run from DOS.
- Point of Interest: The last field of the header, a 4-byte value at offset 3Ch just before the DOS stub begins, is known as “e_lfanew”. It holds the offset of the PE header, relative to the beginning of the file. When the Windows loader loads the program, it reads this value to skip the DOS stub and go directly to the PE header.
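The two checks above (the “MZ” magic and the e_lfanew value at offset 3Ch) can be sketched in C as follows. This is an illustration that works on a raw byte buffer, assuming the file (or at least its first 64 bytes) is already in memory; `pe_header_offset` is a hypothetical name. The value is decoded byte by byte because PE fields are little-endian, which keeps the sketch correct on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_HEADER_SIZE 64    /* the DOS header is always 64 bytes */
#define E_LFANEW_OFFSET 0x3C  /* e_lfanew is its last 4-byte field */

/* Returns the file offset of the PE header, or -1 if the buffer is too
   short or does not start with the "MZ" magic. */
static int64_t pe_header_offset(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < DOS_HEADER_SIZE || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    /* Assemble the little-endian 32-bit value one byte at a time. */
    const uint8_t *p = buf + E_LFANEW_OFFSET;
    uint32_t off = (uint32_t)p[0]
                 | (uint32_t)p[1] << 8
                 | (uint32_t)p[2] << 16
                 | (uint32_t)p[3] << 24;
    return (int64_t)off;
}
```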
Here is a picture showing the DOS header as defined in C:
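For reference, here is a portable C transcription of that structure. Windows’ `winnt.h` calls it `IMAGE_DOS_HEADER` and uses `WORD`/`LONG`; this sketch swaps in `<stdint.h>` types so it compiles anywhere, and the type name `DosHeader` is my own. The static assertions confirm the 64-byte size and that `e_lfanew` lands at offset 3Ch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order follows IMAGE_DOS_HEADER in winnt.h. */
typedef struct {
    uint16_t e_magic;     /* "MZ", read little-endian as 0x5A4D */
    uint16_t e_cblp;      /* Bytes on last page of file */
    uint16_t e_cp;        /* Pages in file */
    uint16_t e_crlc;      /* Relocations */
    uint16_t e_cparhdr;   /* Size of header in paragraphs */
    uint16_t e_minalloc;  /* Minimum extra paragraphs needed */
    uint16_t e_maxalloc;  /* Maximum extra paragraphs needed */
    uint16_t e_ss;        /* Initial (relative) SS value */
    uint16_t e_sp;        /* Initial SP value */
    uint16_t e_csum;      /* Checksum */
    uint16_t e_ip;        /* Initial IP value */
    uint16_t e_cs;        /* Initial (relative) CS value */
    uint16_t e_lfarlc;    /* File address of relocation table */
    uint16_t e_ovno;      /* Overlay number */
    uint16_t e_res[4];    /* Reserved */
    uint16_t e_oemid;     /* OEM identifier */
    uint16_t e_oeminfo;   /* OEM information */
    uint16_t e_res2[10];  /* Reserved */
    int32_t  e_lfanew;    /* File offset of the PE header */
} DosHeader;

static_assert(sizeof(DosHeader) == 64, "DOS header is 64 bytes");
static_assert(offsetof(DosHeader, e_lfanew) == 0x3C, "e_lfanew sits at offset 3Ch");
```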
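DOS Stub
The DOS stub contains the program that runs when the PE file is started from DOS. It begins right after the DOS header (that is, right after the e_lfanew field) and ends where the PE header starts. Its size isn’t fixed by the format; the default stub produced by Microsoft’s linker is 64 bytes, so together with the DOS header it fills the first 128 bytes of the file. By default, it just prints the string “This program cannot be run in DOS mode.” and exits. You can replace it with a DOS program of your own (Microsoft’s linker offers a /STUB option for this) if you want your file to do something useful under DOS as well as under Windows.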
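A quick way to see the default stub for yourself is to search the bytes between the end of the DOS header (offset 64) and the PE header (the e_lfanew value) for that message. Below is a minimal C sketch, assuming the file is already in a buffer and e_lfanew has been read; `stub_has_default_message` is a hypothetical name. A missing message doesn’t mean anything is wrong, since the stub can be replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true if the stub region [64, pe_offset) contains the default
   linker message. pe_offset is the value of e_lfanew. */
static bool stub_has_default_message(const uint8_t *buf, size_t len,
                                     size_t pe_offset)
{
    static const char msg[] = "This program cannot be run in DOS mode.";
    const size_t msg_len = sizeof msg - 1;  /* ignore the trailing NUL */
    const size_t stub_start = 64;           /* stub follows the DOS header */

    assert(buf != NULL);
    if (pe_offset > len || pe_offset < stub_start + msg_len)
        return false;

    /* Plain byte-by-byte search; memmem() is not part of standard C. */
    for (size_t i = stub_start; i + msg_len <= pe_offset; i++)
        if (memcmp(buf + i, msg, msg_len) == 0)
            return true;
    return false;
}
```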
Behind the scenes
Think of those early days of Microsoft Windows, when Windows™ 1.x, 2.x and 3.xx not only lived on the same volumes as Microsoft® DOS but also ran on top of MS-DOS. It was quite likely that a user might try to run a Windows® program under DOS. Then what? What did Microsoft programmers do?
Microsoft® programmers made sure every Windows® executable begins with a simple 16-bit DOS program that alerts the user if they try to run it under DOS. That’s all the DOS “stub” program does.
This is how the DOS stub looks in a PE file:
PE Header
This is where the main story begins. The PE header, also known as IMAGE_NT_HEADERS, contains three main components, as shown in the picture:
Signature
The structure begins with a DWORD containing the bytes 50h, 45h, 00h, 00h (“PE” followed by two zero bytes). As the name suggests, it’s just a signature marking where the PE header starts.
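Checking the signature is a four-byte comparison at the offset taken from e_lfanew. A minimal C sketch, assuming the file bytes are already in memory; `has_pe_signature` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the 4 bytes at pe_offset are 'P', 'E', 0, 0. */
static bool has_pe_signature(const uint8_t *buf, size_t len, size_t pe_offset)
{
    assert(buf != NULL);
    /* Make sure all four signature bytes are inside the buffer. */
    if (pe_offset > len || len - pe_offset < 4)
        return false;
    return buf[pe_offset] == 'P' && buf[pe_offset + 1] == 'E'
        && buf[pe_offset + 2] == 0 && buf[pe_offset + 3] == 0;
}
```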
File Header
The next 20 bytes after the signature make up the file header (IMAGE_FILE_HEADER). It contains information about the physical layout and properties of the file, for example the number of sections and the size of the optional header. Let me show you its structure:
Most of its fields are not important for understanding the format, so I am going to tell you about only three of them:
Number of Sections
The value tells you how many sections the PE file contains. If the value is 04h, 00h (stored little-endian, so it reads as 4), the file contains four sections.
Time Date Stamp
It records when the linker (or, for an OBJ file, the compiler) produced this file, as the number of seconds since January 1, 1970 (UTC).
Characteristics
It contains flags describing the file; among other things, they tell you whether the file is a DLL or an executable.
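Here is a C sketch of the 20-byte file header and of how the fields above could be read from raw bytes. The field order and the two flag values (`IMAGE_FILE_EXECUTABLE_IMAGE` = 0002h, `IMAGE_FILE_DLL` = 2000h) follow `winnt.h`; the `FileHeader` type and the `parse_file_header`/`is_dll` helpers are hypothetical names for this illustration. Reading byte by byte handles the little-endian layout, which is why 04h, 00h decodes to 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout of IMAGE_FILE_HEADER, using fixed-width types. */
typedef struct {
    uint16_t machine;                 /* e.g. 014Ch for x86 */
    uint16_t number_of_sections;
    uint32_t time_date_stamp;         /* seconds since 1970-01-01 UTC */
    uint32_t pointer_to_symbol_table;
    uint32_t number_of_symbols;
    uint16_t size_of_optional_header;
    uint16_t characteristics;
} FileHeader;

static_assert(sizeof(FileHeader) == 20, "file header is 20 bytes");

#define IMAGE_FILE_EXECUTABLE_IMAGE 0x0002u
#define IMAGE_FILE_DLL              0x2000u

/* Little-endian readers. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* p points just past the "PE\0\0" signature. */
static FileHeader parse_file_header(const uint8_t p[20])
{
    FileHeader h;
    h.machine                 = rd16(p + 0);
    h.number_of_sections      = rd16(p + 2);
    h.time_date_stamp         = rd32(p + 4);
    h.pointer_to_symbol_table = rd32(p + 8);
    h.number_of_symbols       = rd32(p + 12);
    h.size_of_optional_header = rd16(p + 16);
    h.characteristics         = rd16(p + 18);
    return h;
}

/* The DLL flag in Characteristics separates DLLs from plain executables. */
static bool is_dll(const FileHeader *h)
{
    return (h->characteristics & IMAGE_FILE_DLL) != 0;
}
```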
Optional Header
This header follows the file header and, in a 32-bit (PE32) file, takes up the next 224 bytes, containing information about the logical layout of the file. Some of its important fields are:
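Address of Entry Point
It holds the address where execution of the file starts. The value is stored as an RVA (relative virtual address), that is, an offset from the image base rather than an absolute address.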
Image Base
It’s the preferred load address of the PE file: the file expects the Windows loader to map it into memory at this address. For executables, its value is usually 400000h.
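Because the entry point is stored relative to the image base, the address the CPU actually jumps to is simply their sum. A tiny C sketch, assuming the loader placed the image at its preferred base; `entry_point_va` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address of the first instruction = ImageBase + AddressOfEntryPoint.
   Assumes the image was loaded at its preferred ImageBase. */
static uint64_t entry_point_va(uint64_t image_base, uint32_t entry_rva)
{
    assert(image_base != 0);
    return image_base + entry_rva;
}
```

With the common ImageBase of 400000h and an entry point RVA of 1000h, execution starts at 401000h.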
Section Alignment
It defines the alignment granularity of sections when the Windows loader loads them into memory. In most cases, its value is 4096 (1000h), which means each section starts at a multiple of 4096 bytes and occupies a whole number of 4096-byte slots, no matter how small or large its actual size is.
File Alignment
It defines the alignment granularity of sections in the file on disk (when it’s not loaded). Sections are stored in the file the same way as described for SectionAlignment; the only difference is the slot size, which is typically 512 bytes (200h).
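Both alignments boil down to rounding a size up to the next multiple of the alignment value. A minimal C sketch, assuming the alignment is a power of two (true for the typical 1000h and 200h); the helper name `align_up` is my own.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

So a section holding 1234h bytes takes `align_up(0x1234, 0x1000)` = 2000h bytes in memory but only `align_up(0x1234, 0x200)` = 1400h bytes in the file.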
Size of Image
It gives the size of the PE image once it’s loaded into memory. It can be calculated as the sum of all headers and sections, each aligned to SectionAlignment.
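Following that description literally, a C sketch might look like this. It assumes the sections sit back to back in memory with no gaps, and `size_of_image` is a hypothetical helper that only illustrates the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Headers plus every section, each rounded up to SectionAlignment.
   Assumes the sections are laid out contiguously in memory. */
static uint32_t size_of_image(uint32_t size_of_headers,
                              const uint32_t *section_virtual_sizes,
                              size_t count, uint32_t section_alignment)
{
    uint32_t total = align_up(size_of_headers, section_alignment);
    for (size_t i = 0; i < count; i++)
        total += align_up(section_virtual_sizes[i], section_alignment);
    return total;
}
```

For headers of 400h bytes and sections of 1234h, 300h and 2000h bytes with 1000h alignment, this gives 1000h + 2000h + 1000h + 2000h = 6000h.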
Data Directory
The last 128 bytes of the optional header hold the DataDirectory, an array of 16 IMAGE_DATA_DIRECTORY structures of 8 bytes each. Each entry points to an important data structure in the PE file, for example the import table or the export table. We’ll talk about them in the next chapter.
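A C sketch of one directory entry and the array it forms. Each entry is just an RVA and a size; the index values 0, 1 and 2 for the export, import and resource tables match `winnt.h`’s `IMAGE_DIRECTORY_ENTRY_*` constants, while `DataDirectory`, the `DIR_*` names and `has_directory` are hypothetical names for this illustration. The static assertion confirms that 16 × 8 = 128 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One DataDirectory entry: where a table lives (RVA) and how big it is. */
typedef struct {
    uint32_t virtual_address;
    uint32_t size;
} DataDirectory;

#define NUMBER_OF_DIRECTORY_ENTRIES 16

/* Indices into the array, matching winnt.h's IMAGE_DIRECTORY_ENTRY_* values. */
enum {
    DIR_EXPORT   = 0,
    DIR_IMPORT   = 1,
    DIR_RESOURCE = 2
};

static_assert(sizeof(DataDirectory) == 8, "each entry is 8 bytes");
static_assert(sizeof(DataDirectory[NUMBER_OF_DIRECTORY_ENTRIES]) == 128,
              "the whole array is 128 bytes");

/* A table is present only if its entry is non-empty. */
static bool has_directory(const DataDirectory dirs[NUMBER_OF_DIRECTORY_ENTRIES],
                          int index)
{
    return dirs[index].virtual_address != 0 && dirs[index].size != 0;
}
```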
Section Table
As the name suggests, it contains information about the sections present in the PE file, such as their attributes and virtual offsets. If a PE file has 5 sections, there are 5 IMAGE_SECTION_HEADER structures of 40 bytes each, immediately after the PE header. The following picture shows what one structure looks like:
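Here is a C sketch of one 40-byte section header, plus two helpers that show how it’s used: locating the section table right after the optional header, and translating an RVA (such as the entry point) into a file offset. Field order follows `winnt.h`’s `IMAGE_SECTION_HEADER`; the type and helper names are hypothetical, and returning 0 for an RVA no section covers is a simplification of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout of IMAGE_SECTION_HEADER, using fixed-width types. */
typedef struct {
    uint8_t  name[8];                /* e.g. ".text", NUL-padded */
    uint32_t virtual_size;           /* Misc.VirtualSize in winnt.h */
    uint32_t virtual_address;        /* RVA of the section in memory */
    uint32_t size_of_raw_data;       /* size on disk (FileAlignment multiple) */
    uint32_t pointer_to_raw_data;    /* file offset of the section data */
    uint32_t pointer_to_relocations;
    uint32_t pointer_to_linenumbers;
    uint16_t number_of_relocations;
    uint16_t number_of_linenumbers;
    uint32_t characteristics;        /* content and permission flags */
} SectionHeader;

static_assert(sizeof(SectionHeader) == 40, "each section header is 40 bytes");

/* The table starts after the signature (4), file header (20) and optional header. */
static size_t section_table_offset(size_t e_lfanew, uint16_t size_of_optional_header)
{
    return e_lfanew + 4 + 20 + size_of_optional_header;
}

/* Find the section whose memory range holds rva and map it into the file.
   Simplification: ignores RVAs in a section's uninitialised tail beyond
   SizeOfRawData, and returns 0 when no section covers the RVA. */
static uint32_t rva_to_file_offset(const SectionHeader *secs, size_t n, uint32_t rva)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t start = secs[i].virtual_address;
        if (rva >= start && rva - start < secs[i].virtual_size)
            return secs[i].pointer_to_raw_data + (rva - start);
    }
    return 0;
}
```

With e_lfanew at 80h and a 224-byte optional header, the section table starts at 80h + 4 + 20 + 224 = 178h.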
Well Known Sections
The section table is followed by the sections themselves. Here is a brief on some well-known sections:
The .text section, also known as CODE, is where all the instructions reside; these are the instructions the CPU executes. This is the section that contains the entry point mentioned earlier.
The .rdata section typically holds import and export information. It can also store other read-only data used by the program, such as literals and constant strings.
The .data section contains the program’s global data, which can be accessed from anywhere in the program.
The .rsrc section contains the resources used by the executable, like icons, images, cursor groups and menus. If you open it in a resource editor like Resource Hacker (ResHacker), this section appears as a resource tree.
Have a look at the following picture showing the sections of a PE file with their attributes:
That’s all for this brief on the PE file structure.
I know that’s not enough, especially for those who don’t want to stop just when it started to get interesting. Don’t worry, I won’t keep you hanging here either. This is just the beginning of malware analysis. In the next chapter, we are going to learn about file entropy and how understanding it can help you in static malware analysis. Come on, join me!!