The HTTPS protocol explained! — Under the Hood

Anusha Dasari
8 min read · Feb 18, 2020

--

Ask most software engineers what HTTPS is, and the answer you are most likely to hear is that it is secure. And the answer ends there. If that sounds like you and you are eager to know in depth how HTTPS works, this article is for you. If you haven’t read my previous article on the HTTPS basics, you may want to read it first before going into detail.

So let’s delve in!

HTTPS essentially involves preserving 3 things:

  • Confidentiality: Ensuring that sensitive data on the wire cannot be read by eavesdroppers, because it is encrypted. Only the server can read what you send it, and only you can read what it sends back.
  • Integrity: Ensuring that data cannot be altered in transit without the change being detected.
  • Authenticity: Verifying that you are talking directly to the server that you think you are talking to.
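Integrity is the least intuitive of the three, so here is a minimal sketch of the idea using a keyed hash (HMAC) from the Python standard library. It assumes both sides already share a secret key; the function names `make_tag` and `is_untampered` are made up for this illustration, and real TLS cipher suites build integrity protection into the record layer rather than calling code like this.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute a keyed SHA-256 tag that travels alongside the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_untampered(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), received_tag)


shared_key = b"illustrative shared secret"   # assumption: agreed beforehand
original = b"transfer 10 EUR to Alice"
tag = make_tag(shared_key, original)

# The receiver accepts the original message but rejects a modified one.
accepted = is_untampered(shared_key, original, tag)
rejected = is_untampered(shared_key, b"transfer 9999 EUR to Mallory", tag)
```

An attacker who changes the message in transit cannot produce a matching tag without the key, so the change is detected.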

HTTPS protects data using one of two protocols: SSL or TLS. Let’s see what they are.

SSL: SSL is the abbreviation of Secure Sockets Layer. It uses public-key encryption. It was originally built by Netscape in the early ’90s; version 2.0 was launched in ’95, which was the first time a secure transport layer was used en masse in web browsers. Version 3.0 was released in ’96.

TLS: TLS means Transport Layer Security. It is the current industry-standard cryptographic protocol and the successor to SSL. The initial version was defined in 1999 as an upgrade to SSL 3.0. Version 1.1 was released in 2006, 1.2 in Aug 2008, and 1.3 in Aug 2018, each fixing design flaws in the previous versions or adding features. With each new release, there are improvements in security and speed.

One of the biggest turning points in internet history was the infamous POODLE attack in 2014, which was SSL’s death knell. It is a man-in-the-middle attack that exploits the fallback to SSL 3.0 in browsers and other client software. On average, an attacker only needs to make 256 SSL 3.0 requests to reveal one byte of an encrypted message. SSL support started being removed soon after this; SSL 3.0 is now considered broken and should no longer be used.

As standards evolved, SSL has been deprecated and replaced by Transport Layer Security (TLS). Because so many people got used to saying ‘SSL’, the name is still in use. The two terms are used interchangeably today, but the protocol actually in use is TLS, not SSL. TLS 1.0 is closely based on SSL 3.0 and even identifies itself as version 3.1 on the wire, although the two do not interoperate.

At the heart of both SSL and TLS is the handshake that takes place between a client and a server. The handshake doesn’t encrypt any application data itself; it just agrees on a shared secret and the type of encryption that is going to be used.

So how does HTTPS or SSL/TLS work?

HTTPS is a secure form of the HTTP protocol. It wraps an encrypted layer, Transport Layer Security (TLS), around HTTP.

HTTP is just a protocol, but when paired with TLS it becomes encrypted.

TLS and SSL are socket-oriented protocols: they encrypt the socket, or transmission channel, between sender and receiver rather than individual pieces of application data. This is the main reason these two protocols are independent of the application layer.
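To make the layering concrete, here is a rough sketch using Python's standard `socket` and `ssl` modules. The HTTP request is built as ordinary bytes with no knowledge of TLS, and the only TLS-specific step is wrapping the TCP socket. The helper names `build_request` and `fetch_status_line` are invented for this example, and the second function needs network access to run.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Plain HTTP/1.1 request bytes; nothing here is TLS-specific."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_status_line(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Send the same HTTP bytes through a TLS-wrapped socket (needs network)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # The handshake happens here; afterwards the socket encrypts everything.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(build_request(host))
            response = b""
            while b"\r\n" not in response:
                chunk = tls_sock.recv(4096)
                if not chunk:
                    break
                response += chunk
    return response.split(b"\r\n", 1)[0].decode("iso-8859-1")
```

Calling `fetch_status_line("example.com")` would return the first line of the server's response; the application code on either side of the wrapped socket stays plain HTTP.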

The HTTPS Stack

HTTPS is based on public/private-key cryptography:

  • The public key is used for encryption
  • The secret private key is required for decryption.
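The two bullet points above can be tried out with a toy version of RSA. This is a sketch for intuition only: the primes are tiny textbook values, there is no padding, and real keys are 2048 bits or longer. The names `PUBLIC_KEY`, `PRIVATE_KEY`, `encrypt` and `decrypt` are chosen for this illustration.

```python
# Toy RSA with textbook-sized numbers. Never use this for real data.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)

PUBLIC_KEY = (e, n)            # can be handed to anyone
PRIVATE_KEY = (d, n)           # stays on the server


def encrypt(message: int, public_key: tuple) -> int:
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


secret = 65                               # must be smaller than n
ciphertext = encrypt(secret, PUBLIC_KEY)  # anyone can do this step
recovered = decrypt(ciphertext, PRIVATE_KEY)  # only the key owner can
```

Anyone holding the public key can produce `ciphertext`, but turning it back into `secret` requires the private exponent.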

A certificate is a public key with a label identifying the owner.

The beauty is that anyone can intercept every single one of the messages you exchange with a server, including the ones where you are agreeing on the key and encryption strategy to use, and still not be able to read any of the actual data you send.

Let’s now get acquainted with one more process you need to know about, before looking at the handshake, which is acquiring the digital certificate.

HTTPS requires a TLS certificate to be installed on your server. When you set one up, a randomly generated key pair (public and private) is created. The certificate contains the public key, while the private key is stored separately on your server and must be kept secret. The public key is verified by the client, and the private key is used in the decryption process.

Certificate Authority (CA): In cryptography, a CA is an entity that issues digital certificates. A digital certificate certifies the ownership of a public key by the named subject of the certificate. Every time you browse a website using HTTPS, the owner of the site has used a CA to verify their ownership of the domain and obtain the certificate that the site serves. There are a huge number of CAs, and the mechanics work as follows:

  1. Your machine needs to trust a CA
  2. The CA signs the website’s certificate
  3. The website returns the signed certificate to your browser
  4. Your machine validates that the certificate is legitimate by referring to your local list of trusted authorities
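As a small illustration of the "local list of trusted authorities", Python's `ssl` module can build a client context that uses the trust store available to it and insists on validation. This is a sketch of one client, not how browsers implement it, and `describe_default_trust` is a name made up for this example.

```python
import ssl


def describe_default_trust() -> dict:
    """Report how a default client context treats server certificates."""
    context = ssl.create_default_context()
    return {
        # The certificate must chain up to a trusted CA.
        "verifies_certificates": context.verify_mode == ssl.CERT_REQUIRED,
        # The name in the certificate must match the requested host.
        "checks_hostname": context.check_hostname,
        # CA certificates loaded so far; this can be 0 on systems that
        # load CAs lazily from a directory.
        "loaded_ca_count": len(context.get_ca_certs()),
    }


trust_info = describe_default_trust()
```

With these defaults, a connection to a server whose certificate was not signed by a trusted CA fails during the handshake instead of silently proceeding.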

Different browsers and operating systems have different procedures. For example, Chrome takes the trust store of the operating system (except for Extended Validation (EV) certificates), as seen in the Root CA Policy of Chromium. Firefox, on the other hand, maintains its own list of CAs and doesn’t use the system’s store at all. Mozilla also has a published Inclusion Policy. And then there is Apple’s Root Certificate Program.

Generally, all of them require that the CA is certified by an acknowledged authority like WebTrust or something equivalent. To get certified, the CA has to prove a few things, such as how it determines the owner of a domain, how it keeps the root CA’s private key secure, what its processes are, and a lot of other things. There is a PDF online from WebTrust with all the requirements.

The SSL/TLS certificate is kind of like your passport. It contains various pieces of data, including the name of the owner, the property (e.g. domain) it is attached to, the certificate’s public key, the digital signature and information about the certificate’s validity dates. The client checks that it either implicitly trusts the certificate, or that it is verified and trusted by one of several Certificate Authorities (CAs) that it also implicitly trusts. Note that the server is also allowed to require a certificate to prove the client’s identity, but this typically only happens in very sensitive applications. You can take a look at the certificate of any HTTPS website by clicking on the padlock in the address bar.
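The same fields can be read programmatically. The sketch below summarises a certificate in the dictionary form that Python's `SSLSocket.getpeercert()` returns. The `SAMPLE_CERT` values are hand-written placeholders rather than a real certificate, and `summarize_certificate` is a hypothetical helper name.

```python
import ssl
from datetime import datetime, timezone

# Hand-written placeholder in the shape returned by SSLSocket.getpeercert().
SAMPLE_CERT = {
    "subject": ((("commonName", "example.org"),),),
    "issuer": (
        (("organizationName", "Example CA"),),
        (("commonName", "Example Root"),),
    ),
    "subjectAltName": (("DNS", "example.org"), ("DNS", "www.example.org")),
    "notBefore": "Jan  5 09:34:43 2018 GMT",
    "notAfter": "Jan  5 09:34:43 2019 GMT",
}


def summarize_certificate(cert: dict) -> dict:
    """Pull out the owner, issuer, covered names and expiry date."""
    subject = {key: value for rdn in cert.get("subject", ()) for key, value in rdn}
    issuer = {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}
    dns_names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    return {
        "common_name": subject.get("commonName"),
        "issuer": issuer.get("commonName"),
        "dns_names": dns_names,
        "expires": expires,
    }


summary = summarize_certificate(SAMPLE_CERT)
```

On a live connection, the dictionary would come from `tls_sock.getpeercert()` after the handshake instead of a hard-coded sample.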

Certificates are not the same as protocols

Before anyone starts worrying that they need to replace their existing SSL Certificates with TLS Certificates, it’s important to note that certificates are not dependent on protocols. That is, you don’t need to use a TLS Certificate vs. an SSL Certificate. While many vendors tend to use the phrase “SSL/TLS Certificate”, it may be more accurate to call them “Certificates for use with SSL and TLS”, since the protocols are determined by your server configuration, not the certificates themselves.
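As a sketch of what "determined by your server configuration" can look like, here is a server-side context built with Python's `ssl` module. The protocol floor is a property of the context, while the certificate and key are loaded separately. The function name `make_server_context` and its optional file-path parameters are assumptions for this example.

```python
import ssl


def make_server_context(certfile=None, keyfile=None) -> ssl.SSLContext:
    """Build a server context that refuses anything older than TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Protocol choice lives in the configuration, not in the certificate.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile is not None:
        # The same certificate file works whichever protocol floor is set.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


server_context = make_server_context()
```

Raising the floor to TLS 1.3 later would mean changing one line of configuration; the certificate itself would not need to be reissued.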

Now that we are familiar with some required jargon such as SSL, TLS and CA, we are ready to look at the actual handshake itself.

The HTTPS Handshake

When your browser connects to an HTTPS server, the server will answer with its certificate. The browser checks if the certificate is valid:

  • The owner information needs to match the server name that the user requested.
  • The certificate needs to be signed by a trusted certification authority.

If one of these conditions is not met, the user is informed about the problem.

HTTPS Connection Sequence Diagram

The connection is set up through a handshake. The client opens the exchange with a hello message, and the server responds with its certificate so that the client can verify it is talking to the intended server. This initialisation only needs to occur once for each connection. HTTP/2 has a distinct advantage over HTTP/1.1 here, since it multiplexes many requests over a single connection instead of opening multiple connections.

Once the connection is established, both parties can use the agreed algorithm and keys to securely send messages to each other. We will break the handshake up into 3 main phases — Hello, Certificate Exchange and Key Exchange.

Client and server negotiate on how to communicate securely

  1. ClientHello: The handshake begins with the client sending a ClientHello message. This contains all the information the server needs to connect to the client via TLS, including the various cipher suites it supports in order of preference and the maximum TLS version that it supports.
  2. ServerHello: The server responds with a ServerHello, which confirms the protocol version and cipher suite to use, and also sends its certificate, containing its public key, back to the client.

The client verifies the certificate against its list of trusted CAs. The data is not yet encrypted; this is still the negotiation phase. A ‘man in the middle’ can see the identity of the server and that the two parties are trying to communicate, but no content of the communication has been shared yet.
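To get a feel for what a ClientHello offers, the snippet below lists the cipher suites that a default Python client context has enabled. This is only an approximation of the Hello phase: the exact list depends on the local OpenSSL build, and `offered_cipher_suites` is a name invented for this sketch.

```python
import ssl


def offered_cipher_suites() -> list:
    """Names of the cipher suites enabled on a default client context."""
    context = ssl.create_default_context()
    # Each entry is a dict describing one suite; we keep only its name.
    return [suite["name"] for suite in context.get_ciphers()]


suites = offered_cipher_suites()
```

Printing `suites` shows names that each bundle a key-exchange method, a symmetric cipher and a hash, which is what the server picks from in its ServerHello.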

3. Client key exchange: The client generates a secret, encrypts it with the server’s public key and sends it to the server, which decrypts it with its private key. The encryption of the actual message data exchanged by the client and server will be done using a symmetric algorithm, the exact nature of which was already agreed during the Hello phase. Both parties need to agree on this single, symmetric key, a process that is accomplished securely using asymmetric encryption and the server’s public/private keys.

4. Server Finished: The server now returns a Finished message, and secure communication can begin.
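The key-exchange step can be sketched end to end with toy numbers. The code below assumes an RSA-style exchange as described above, uses two small well-known primes instead of a real key, and replaces the real TLS key-derivation function with a single SHA-256 hash. `derive_session_key` and `run_key_exchange` are hypothetical names, and none of this is safe for real use.

```python
import hashlib
import secrets

# Toy server key pair built from two Mersenne primes. Real keys are far
# larger and are always used with padding.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def derive_session_key(pre_master: int, client_random: bytes, server_random: bytes) -> bytes:
    """Stand-in for the TLS key derivation: hash the secret with both randoms."""
    secret_bytes = pre_master.to_bytes((N.bit_length() + 7) // 8, "big")
    return hashlib.sha256(secret_bytes + client_random + server_random).digest()


def run_key_exchange():
    # Hello phase: each side contributes a public random value.
    client_random = secrets.token_bytes(32)
    server_random = secrets.token_bytes(32)

    # Client: pick a secret and encrypt it with the server's public key.
    pre_master = secrets.randbelow(N)
    encrypted_pre_master = pow(pre_master, E, N)

    # Server: recover the secret with its private key.
    recovered = pow(encrypted_pre_master, D, N)

    # Both sides now derive the same symmetric key independently.
    client_key = derive_session_key(pre_master, client_random, server_random)
    server_key = derive_session_key(recovered, client_random, server_random)
    return client_key, server_key


client_key, server_key = run_key_exchange()
```

An eavesdropper sees both random values and `encrypted_pre_master`, but without the private exponent cannot recover the secret that the symmetric key is derived from.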

At this point, the real communication can proceed. The initial handshake steps take place in a matter of milliseconds.

When HTTPS is used, Which Elements of the Communication are Encrypted?

Once the HTTPS handshake is complete, all communication between the client and the server is encrypted. This includes the full URL, data (plain text or binary), cookies and other headers. The only part of the communication not encrypted is the domain or host that the client requested a connection to. This is because the hostname is needed to set up the secure connection in the first place: the client looks it up via DNS and typically sends it unencrypted in the ClientHello (the Server Name Indication extension). Once HTTPS is established, the full URL is sent only inside the encrypted channel.
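A quick way to picture this split is to take a URL apart. The sketch below uses `urllib.parse` to separate the parts an observer on the network can typically see (host and port) from the parts that travel only inside the encrypted channel. The function name `split_by_visibility` and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit


def split_by_visibility(url: str) -> dict:
    """Group URL components by whether the network can see them."""
    parts = urlsplit(url)
    return {
        # Needed to reach the server, so observable outside the tunnel.
        "visible_on_network": {
            "host": parts.hostname,
            "port": parts.port or 443,
        },
        # Sent only after the handshake, inside the encrypted channel.
        "encrypted_by_tls": {
            "path": parts.path or "/",
            "query": parts.query,
        },
    }


view = split_by_visibility("https://example.com/account/settings?user=42")
```

Here an observer could learn that the client talked to `example.com` on port 443, but not which page or which query parameters were requested.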

Is HTTPS Secure?

In a purely technical sense, yes. There are ways to attack and potentially break the protocol, but they require compromising components outside the control of the client and server, and these attacks are not easy:

  • Break into any Certificate Authority
  • Compromise a router near any Certificate Authority
  • Compromise a Certificate Authority’s recursive DNS server
  • Attack some other network protocol, such as TCP or BGP
  • A government could order a Certificate Authority to produce a malicious certificate (only speculative)

Check out https://badssl.com/, which is an interesting site to play around with. Wireshark is also an interesting tool for seeing the handshake in action and in more detail. It helps debug connection-related issues as well.

To sum up, we have seen how HTTPS ensures confidentiality, message integrity and authenticity. It uses public key infrastructure (PKI) and public/private key pairs for a flexible encryption scheme. It uses asymmetric cryptography for key exchange and thereafter symmetric cryptography for channel encryption.
