
# Introduction

In malware analysis, calculating a file's entropy has its own importance. It gives the analyst a rough idea of which analysis methods to use and what results to expect before going any further.

Let’s make this clearer with two files that have different entropy values. First, a file with a low value suggests no obfuscation or compression. In such cases, the analyst can move straight to static analysis. A file with a high value, on the other hand, usually indicates a packed sample. That points the analyst towards dynamic analysis, unless they are keen to spend more time on unpacking.

Before going further, we need to stop and think about what the term “entropy” actually means. Only then can I explain its value in malware analysis in more detail.

# What is Entropy?

In simple terms, entropy is a measure of randomness, or how unpredictable something is. In more technical terms, Wikipedia defines it as follows: “In computing, entropy is the randomness collected by an operating system or application for use in cryptography or other uses that require random data. This randomness is often collected from hardware sources, either pre-existing ones such as mouse movements or specially provided randomness generators.”

Applied to a file, entropy measures how disordered the bytes in that file are. It can be expressed in several units, such as the nat, the shannon or the hartley. The most common unit is the shannon, which is equivalent to the bit. Measured per byte with Shannon's formula, a file's entropy ranges from 0 to 8. A value of 0 means the outcome is completely certain. A value of 8, on the contrary, means the outcome is as unpredictable as it can be.

## Shannon’s Formula

The formula given by Shannon to measure the randomness in the outcome of events is:

$$H = -\sum_{i} p_i \log_2 p_i$$

where $i$ is an event with probability $p_i$.

When the events are the 256 possible byte values of a file, this equation always gives a result between 0 and 8.
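As a short worked check of that range, assume the events are the 256 possible byte values and $p_i$ is the fraction of bytes in the file equal to value $i$. The lower bound is reached when a single value has probability 1, and the upper bound when all 256 values are equally likely:

$$
H_{\min} = -(1 \cdot \log_2 1) = 0, \qquad
H_{\max} = -\sum_{i=0}^{255} \frac{1}{256} \log_2 \frac{1}{256} = \log_2 256 = 8
$$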

Let’s try to use this formula on different files of different bytes formation.

### Case 1

The file consists of 100 bytes, all with the value zero. The byte value zero occurs with probability 1 and every other value with probability 0, so the formula gives $-(1 \cdot \log_2 1) = 0$. The entropy is zero, which means the pattern is perfectly certain.

### Case 2

This file consists of 100 bytes, half of them zeros and the other half ones. Each of the two values occurs with probability 0.5, so Shannon’s formula gives $-(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$. The entropy is one.

### Case 3

When the same file is compressed with RAR and the same formula is applied to the result, the entropy comes out to be:

Entropy = 5.059246862

### Case 4

If the same file with half zeros and half ones is encrypted with the PGP tool and the entropy of the output file is calculated with Shannon’s formula, it comes out to be:

Entropy = 7.8347915272

After looking at these examples, one fact stands out: the more random a file's bytes are, the higher its entropy.
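The cases above can be tried out with a few lines of Python. This is only a minimal sketch: the function name `shannon_entropy` is made up for illustration, `zlib` and `os.urandom` are stand-ins for the RAR and PGP tools used in the text, and the input for the compression test is an invented block of text. The numbers it prints for the last two rows will therefore not match the values quoted above, but the trend should be the same.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0  # assumption: treat empty input as zero entropy
    total = len(data)
    # Sum p * log2(1/p) over every byte value that occurs in the data.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


case1 = bytes(100)                 # Case 1: 100 zero bytes
case2 = bytes(50) + b"\x01" * 50   # Case 2: half zeros, half ones

# Stand-ins for Cases 3 and 4 (NOT the RAR/PGP files from the text).
text = "\n".join(f"record {i}: value={i * i}" for i in range(2000)).encode()
packed = zlib.compress(text)       # compressed data
random_like = os.urandom(4096)     # behaves like well-encrypted data

for label, blob in [
    ("all zeros", case1),
    ("half/half", case2),
    ("plain text", text),
    ("compressed", packed),
    ("random", random_like),
]:
    print(f"{label:<12} {shannon_entropy(blob):.4f}")
```

Counting byte frequencies with `Counter` keeps the sketch short. For very large files, reading in chunks and updating the counts incrementally would be a reasonable alternative.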

## How useful is it in Terms of Malware Analysis?

As mentioned, entropy values provide a rough estimate of whether a file is encrypted or not. This is important because it sets the analyst's expectations and guides the choice of methods for further analysis.

Also, the file's entropy profile diagram can reveal “interesting” areas of the malware. Some malware is not encrypted entirely. Take the Gauss APT as an example: in this case, the payload was heavily encrypted. In certain scenarios, such as incident response, analysts want to know exactly what the attackers were doing in order to perform damage control. Based on the entropy graph, analysts can quickly identify and prioritize those crucial areas for analysis. Here is an example of what an entropy graph might look like for SPA encrypted ciphertext:

[Figure: entropy graph of SPA encrypted ciphertext]

Furthermore, entropy can be helpful in malware classification. Combining similarities in entropy profile with other identifiers might help classify a sample more easily and with a high degree of accuracy. This is because the same hacker groups usually use similar encryption routines and the same malware platform to launch their attacks.
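An entropy profile like the one described above can be approximated by computing the entropy of consecutive fixed-size blocks of a file. The sketch below is one possible way to do it, not the method of any particular tool: the names `shannon_entropy` and `entropy_profile`, the 256-byte block size, and the synthetic sample data are all assumptions made for illustration.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0  # assumption: treat empty input as zero entropy
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def entropy_profile(data: bytes, window: int = 256) -> list[tuple[int, float]]:
    """Return (offset, entropy) pairs for consecutive blocks of `window` bytes."""
    return [
        (offset, shannon_entropy(data[offset:offset + window]))
        for offset in range(0, len(data), window)
    ]


if __name__ == "__main__":
    # Synthetic sample: a random-looking region between two constant regions.
    sample = b"A" * 1024 + os.urandom(1024) + b"B" * 1024
    for offset, h in entropy_profile(sample):
        # Crude text "graph": one '#' per quarter bit of entropy.
        print(f"{offset:#06x}  {h:5.2f}  {'#' * round(h * 4)}")
```

Blocks with noticeably higher entropy than their neighbours are the candidate "interesting" areas. A small block size gives finer resolution but caps the measurable entropy, since a block of 256 bytes rarely contains every byte value with equal frequency.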

Learn about the well-known techniques that malware authors use to hide their malicious code from antivirus software in our next chapter:

Obfuscation – Base of Packers (Malware Analysis – Chapter 3)

Thanks for reading the article. If you liked this post, please comment with your suggestions and feedback. All criticism is welcome.