Email headers have always been a pain because you are never sure how truthful they are. In an ideal world, you could trust that all the information passed through an email's metadata is true and was never modified at the sender's end or while in transit. Well, time to come back to the real world: trust me, it's a nightmare. I have seen extremely badly formatted mails, most of them intended to be that way.
According to the standard defined for ARPA (Advanced Research Projects Agency) Network messages in RFC 822, an email consists of headers and a body. RFC 822 is a globally recognized and widely followed standard for sending messages between computers around the world. Think of an email as postal mail: the sender writes the message as text and packs it in an envelope carrying all the information required to deliver it to the intended receiver. The standard says very little about the envelope itself; it mainly specifies the format of the message contents. However, information in a message may sometimes help the sending system build the envelope. The standard applies strictly to passing messages between two systems, not to the way those systems store them.
Structure of Headers:
A message as you see it in mail user agents like Microsoft Outlook or Thunderbird, or in free online mail services like Gmail or Yahoo, may look very different in structure from its source. The following image shows a mail source. Go through it once, line by line, and you'll start finding familiar terms like the sender's mail address, your own mail address, the subject of the mail and, at the bottom, the content the sender intended to send (if it is not encoded).
Like I said, the top part of the mail contains the headers. How do you identify them? Well, it's simple. Just look for lines starting with a field name like Delivered-to, Received etc. followed by a colon ":".
From the above sample, we can see that the headers containing a colon are Delivered-to, Received, Received-SPF, Authentication-Results, Message-Id, Mime-Version, From, To, Date, Subject, Content-type and, last, Content-Transfer-Encoding. Yeah, that may be a lot for some of you, but trust me, this is about the simplest sample there is, with the smallest number of headers. If you don't believe me, no worries: I have a bigger example that I'll be covering in other posts, after explaining this one first.
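To make the "name, colon, value" idea concrete, here is a minimal sketch that splits a raw message into headers and body using Python's standard email package. The message text, addresses and host names below are made-up placeholders (example.com style), not the sample discussed above.

```python
from email import message_from_string

# Hypothetical raw message source; every address and host name is a placeholder.
RAW = (
    "Delivered-To: bob@example.org\n"
    "Received: from mail.example.com (mail.example.com [192.0.2.10])\n"
    "        by mx.example.org with ESMTP id abc123;\n"
    "        Thu, 11 Feb 2016 00:04:07 -0800 (PST)\n"
    "From: Alice <alice@example.com>\n"
    "To: bob@example.org\n"
    "Subject: Hello\n"
    "\n"
    "This is the body.\n"
)

msg = message_from_string(RAW)

# Headers are everything before the first empty line; each one is "Name: value".
header_names = [name for name, _value in msg.items()]
body = msg.get_payload()

for name, value in msg.items():
    print(f"{name} -> {value!r}")
print("body:", body.strip())
```

Note that the indented lines under Received are continuation lines of the same header, which is why they do not show up as separate names.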
Most Basic Headers:
I'm going to explain the headers in the above example line by line. I'll also try to give you a brief idea of their importance in e-mail communication. Let's start with the first one:
1. Delivered-to: [email protected]
This is the simplest one to understand: it shows the final destination of an e-mail. This header holds the address of the final receiver of the e-mail. Final receiver, not just receiver, huh? Well, e-mails can be redirected to other addresses depending on the configuration of the receiving system, so this address might differ from the one the sender actually intended to send to. In our case, it's the same.
2. Received: from mail.traffictocash.com (amin175.hrn9.com. [126.96.36.199]) by mx.google.com with ESMTP id 195si11036403pfa.12.2016.02.11.00.04.07 for <[email protected]>; Thu, 11 Feb 2016 00:04:07 -0800 (PST)
This one may look a bit long, but stay with me: it's easy and self-explanatory once you know how the chain of Received/X-Received headers is formed. Let me give you an overview with an example:
Let's say there is a sender 'A' using a mail user agent (MUA) 'X' who writes a subject and some content, enters the e-mail address of receiver 'B' in the 'To' field and clicks Send. The mail now passes through many MTAs before it reaches its final destination. MTA stands for Mail Transfer Agent; it is also known as a mail relay and simply acts as a transmission agent. Typically the MUA hands the mail to an MSA (Mail Submission Agent), which passes it to the first MTA; the MTAs then route it from one to another until it reaches an MDA (Mail Delivery Agent), which stores the mail and is responsible for delivering it to the destination. So, if the process is defined through a pic, it'd look like this:
While an e-mail passes through different MTAs, each one of them adds a Received header which provides its tracing details. The above mentioned header can be explained in parts as follows:
- Received: from <the name of the sending computer> (<the name associated with that computer's IP address> [<its IP address>]) by <the receiving computer's name> (<the software used by that computer (usually Sendmail, qmail or Postfix)>) with <protocol (usually SMTP or ESMTP)> id <id assigned by local computer for logging>; <timestamp (usually given in the computer's localtime)>
By comparing this format to our example,
mail.traffictocash.com - the name of the sending computer
amin175.hrn9.com - the name associated with that computer's IP address
[188.8.131.52] - Associated Computer's IP address
mx.google.com - the receiving computer's name
ESMTP - Protocol
195si11036403pfa.12.2016.02.11.00.04.07 - id assigned by local computer for logging
Thu, 11 Feb 2016 00:04:07 -0800 (PST) - timestamp usually given in the computer's localtime
Since you can see multiple Received headers, the e-mail has hopped across multiple MTAs during its transmission from sender to receiver. Each MTA adds its header on top of the existing ones, so it's easy to trace back to the sender by reading them from top to bottom: the topmost Received header shows the receiver's end and the bottommost shows the sender's end. Applying the same theory to our example, amin175.hrn9.com is the actual sender's computer name.
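The hop tracing described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the two Received headers are invented placeholders, `trace_hops` is a hypothetical helper name, and the regular expression handles just the simple "from ... by ..." shape shown earlier, not every variant real MTAs produce.

```python
import re
from email import message_from_string

# Hypothetical message with two hops; newest Received header is on top.
RAW = (
    "Received: from mx.example.org (mx.example.org [198.51.100.7])\n"
    "        by inbox.example.org with ESMTP id zzz999;\n"
    "        Thu, 11 Feb 2016 00:04:09 -0800 (PST)\n"
    "Received: from mail.example.com (relay.example.com [192.0.2.10])\n"
    "        by mx.example.org with ESMTP id abc123;\n"
    "        Thu, 11 Feb 2016 00:04:07 -0800 (PST)\n"
    "Subject: Hello\n"
    "\n"
    "Body\n"
)

# Simplified pattern: the name after "from" and the name after "by".
HOP = re.compile(r"from\s+(?P<sender>\S+).*?\bby\s+(?P<receiver>\S+)", re.DOTALL)


def trace_hops(raw):
    """Return (sending host, receiving host) pairs ordered from sender to receiver."""
    msg = message_from_string(raw)
    hops = []
    # Headers appear newest first, so reverse them to follow the mail's journey.
    for value in reversed(msg.get_all("Received", [])):
        match = HOP.search(value)
        if match:
            hops.append((match.group("sender"), match.group("receiver")))
    return hops


hops = trace_hops(RAW)
for sender, receiver in hops:
    print(f"{sender} -> {receiver}")
```

Keep in mind that only the headers added by systems you trust are reliable; anything below them could have been written by the sender.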
The Message-Id header holds a globally unique identity given to an e-mail, added either by the Mail User Agent (MUA) or by the Mail Submission Agent (MSA) during transmission. A message-id is not mandatory for an e-mail to be delivered, though every e-mail should have one. No two different e-mails are supposed to share the same message-id; otherwise one of them may be treated as a duplicate and discarded. RFC 2822 defines the format: the id is written inside angle brackets, with no comments or folding white space inside them, and the generating host has to guarantee its uniqueness. The most commonly used message-id is a combination of a time-stamp and the sender's host name.
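As a small illustration of how a Message-Id can be generated, Python's standard library ships `email.utils.make_msgid`, which combines a time-stamp, the process id and a random number with a host name. The domain below is a placeholder, and real MUAs and MSAs use their own schemes.

```python
from email.utils import make_msgid

# "mail.example.com" is a placeholder for the generating host's name.
first_id = make_msgid(domain="mail.example.com")
second_id = make_msgid(domain="mail.example.com")

print(first_id)
print(second_id)

# Each call should yield a different id, wrapped in angle brackets.
print("unique:", first_id != second_id)
```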
As its name suggests, the From header contains the identity of the sender. It is generally formed from the name the sender is known by in email communication, followed by his email address in angle brackets. It's very common for the name to be missing from the From address, especially in those cases when
Like the From header, the To header uses the same combination of name and address. The only difference is that it provides the identity of the receiver. This header may contain multiple addresses, since sending an email from one person to many is possible.
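The "name followed by address in angle brackets" form used by the From and To headers can be taken apart with `email.utils.parseaddr` and `email.utils.getaddresses` from Python's standard library. The names and addresses here are invented for the example.

```python
from email.utils import getaddresses, parseaddr

# A From value with a display name, and one without.
name, address = parseaddr("Alice Example <alice@example.com>")
bare_name, bare_address = parseaddr("alice@example.com")

# A To value may list several receivers separated by commas.
recipients = getaddresses(["Bob <bob@example.org>, carol@example.net"])

print(name, address)
print(repr(bare_name), bare_address)
print(recipients)
```

When the display name is absent, the first element of the pair is simply an empty string.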
The Date header stores the time-stamp of the moment the sender clicks the Send button or puts the email into the outgoing queue to be sent later.
The Subject header stores the subject line of the message, as decided by the sender. It's logically a single line, although a long subject may be folded across several lines in the source. It's not mandatory to put anything into this header.
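A Date value can be turned into a timezone-aware object with `email.utils.parsedate_to_datetime`. The sketch below reuses the time-stamp format from the Received example earlier; treating it as a Date value is just for illustration.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

sent_at = parsedate_to_datetime("Thu, 11 Feb 2016 00:04:07 -0800")

# The offset in the header says how far the sender's clock is from UTC.
print(sent_at.isoformat())
print(sent_at.astimezone(timezone.utc).isoformat())
```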
Email can be sent in multiple forms, and the Content-Type header says which one is used: text/plain, in which the body is plain text only; text/html, in which the body is presented in a more graphical form by embedding the content, including images and links, into HTML; multipart/alternative, in which both the text/plain and the text/html forms are present; and lastly multipart/mixed, in which one or more attachments are appended to any of the other three. When the message has multiple parts, for example when attachments come into play, a special parameter called boundary is added to this header. The boundary is a value generated by the sender's MUA or the first MTA that helps identify the different parts of the message by separating them. The other parameter this header carries is charset, which defines the character set used in the body text. The default character set in email communication is US-ASCII. It may be changed depending on the language one uses while writing the message.
As per the definition (RFC 821), emails are limited to lines of 1000 characters of 7 bits each, which means that no byte you send down the pipe can have its most significant ("highest-order") bit set to "1" if the SMTP protocol is followed. But the content people around the world want to send might not obey this restriction, for example an image or a text file containing Unicode characters. Well, where there is a will, there is a way. The Content-Transfer-Encoding header provides the easiest solution. Several transfer encodings are available to carry high-bit values through email. Some of them are:
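Here is a minimal sketch of how the Content-Type header, its boundary and its charset parameter show up when a message with both a plain text and an HTML version is built with Python's `email.message.EmailMessage`. The subject and body strings are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Demo"
msg.set_content("Plain text version")  # starts as a text/plain message
msg.add_alternative("<p>HTML version</p>", subtype="html")  # becomes multipart/alternative

# Serializing the message generates the boundary if none was set yet.
raw = msg.as_string()

content_type = msg.get_content_type()
boundary = msg.get_boundary()
part_types = [part.get_content_type() for part in msg.iter_parts()]

print(content_type)
print("boundary:", boundary)
for part in msg.iter_parts():
    # Each part carries its own Content-Type, including the charset parameter.
    print(part.get_content_type(), part.get_content_charset())
```

In the serialized source, every part starts after a line made of two dashes followed by the boundary value.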
- 7Bit Encoding : It simply means your data consists only of US-ASCII characters. When you use this encoding, you are also agreeing that your lines are no longer than 1000 characters. This encoding demands no extra work, so it is the easiest and the most widely used one. As per the ASCII table, the first 128 characters are covered by this encoding and can be used without any alteration.
- 8Bit Encoding : If your data may include extended ASCII characters, the 8th (highest) bit may be used, which extends the range of characters you can use. Thus, you can use the "Extended ASCII Codes" as described in the ASCII table. As with 7bit, there's still a 1000-character line limit.
- Binary Encoding : This one is the same as 8bit, just with no line length restriction. You can still include any characters you want with no extra encoding.
- Quoted Printable : Some content, such as HTML files with international characters, may contain non-ASCII characters and lines longer than 1000 characters. To support it, the quoted-printable encoding (RFC 1341) was designed. It does two things:
  - It escapes non-US-ASCII characters so that they can be represented using only 7-bit characters. (Each one is written as an equals sign plus two 7-bit characters.)
  - It keeps every line at 76 characters or fewer; where a longer line has to be split, the inserted line break is marked with a special (escaped) character.
Because of the escaping and short lines, Quoted Printable is much harder for a human to read than 8bit, but it supports a much wider range of possible content.
- Base64 Encoding : What if you want to send an image file to someone? Of the encodings above, only quoted-printable would get it through, but it is inefficient because every such byte is represented by 3 characters. Well, base64 comes into use here. How does it work? It basically encodes 3 raw bytes into 4 US-ASCII characters, which makes it more efficient than quoted-printable. As per RFC 1341, the output has to be fitted to a line length of 76 characters to stay under SMTP limits, but that's easy to manage when you're just splitting or concatenating arbitrary characters at fixed lengths. The bad part is that it's completely unreadable by humans, even though it is plain US-ASCII text.
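The difference between these encodings is easier to see on real bytes. The sketch below uses Python's standard `quopri` and `base64` modules on a short UTF-8 string and on 128 made-up "binary" bytes that all have the high bit set; `is_7bit` is a hypothetical helper written for this example.

```python
import base64
import quopri


def is_7bit(data):
    """True when no byte has its highest-order bit set."""
    return all(byte < 128 for byte in data)


text = "Café".encode("utf-8")  # the "é" becomes two high-bit bytes

qp_text = quopri.encodestring(text)   # escapes each high-bit byte as =XX
b64_text = base64.b64encode(text)     # 3 raw bytes -> 4 ASCII characters

print(is_7bit(text), is_7bit(qp_text), is_7bit(b64_text))
print(qp_text, b64_text)

# Pretend image data: 128 bytes, every one with the high bit set.
binary = bytes(range(128, 256))
qp_binary = quopri.encodestring(binary)
b64_binary = base64.b64encode(binary)
print(len(binary), len(qp_binary), len(b64_binary))

# encodebytes() wraps the base64 output into lines of at most 76 characters.
wrapped = base64.encodebytes(bytes(200))
longest_line = max(len(line) for line in wrapped.splitlines())
print("longest base64 line:", longest_line)
```

For text that is mostly ASCII, quoted-printable stays readable and small; for data where almost every byte needs escaping, base64 is the more compact choice.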