
In-depth understanding of how HTTPS works

Foreword


In recent years, the Internet has changed dramatically. In particular, the HTTP protocol we have long been accustomed to is gradually being replaced by HTTPS. Jointly promoted by browsers, search engines, CAs, and large Internet companies, the Internet has entered the "HTTPS encryption era", and HTTPS is expected to replace HTTP as the mainstream transfer protocol within the next few years.

After reading this article, I hope you can understand:

If you want to read more high-quality articles, please visit the GitHub blog, where fifty high-quality articles a year are waiting for you!

 

1. What is HTTPS


HTTPS adds an SSL encryption layer beneath HTTP and encrypts the transmitted data; it is the secure version of the HTTP protocol. It is now widely used for security-sensitive communication on the World Wide Web, such as payment transactions.

The main functions of HTTPS are:

(1) Encrypt data and establish an information security channel to ensure data security during transmission;

(2) Authenticate the real identity of the website server.

We often meet HTTPS on web login pages and shopping checkout pages. With HTTPS, the URL begins with https:// instead of http://. In addition, when the browser visits a site where HTTPS is in effect, a lock icon appears in the address bar. How HTTPS is indicated varies by browser.

 

2. Why do you need HTTPS?


HTTP is exposed to security problems such as information theft and identity spoofing. The HTTPS communication mechanism can effectively prevent these problems. First, let's look at what problems exist in the HTTP protocol:

Since HTTP itself has no encryption capability, it cannot encrypt the communication as a whole (the content of the requests and responses exchanged over HTTP). That is, HTTP messages are sent in plaintext (unencrypted).

The flaws of HTTP as a plaintext protocol are a major cause of security problems such as data leakage, data tampering, traffic hijacking, and phishing attacks. HTTP cannot encrypt data, so all traffic travels across the network "naked" in plaintext. With network sniffing equipment and some technical means, the content of HTTP messages can be reconstructed.
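To make the plaintext problem concrete, here is a minimal sketch of what an eavesdropper sees. The host, path, and credentials are made up for illustration, and no network traffic is involved: the code only builds the bytes that HTTP would put on the wire and then reads them back the way a sniffer could.

```python
def build_http_request(username: str, password: str) -> bytes:
    """Build the raw bytes of a hypothetical login request sent over plain HTTP."""
    body = f"username={username}&password={password}"
    request = (
        "POST /login HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )
    return request.encode("ascii")


def sniff_password(captured: bytes) -> str:
    """What an eavesdropper can do with captured plaintext bytes."""
    # The body follows the blank line that ends the headers.
    body = captured.split(b"\r\n\r\n", 1)[1].decode("ascii")
    fields = dict(pair.split("=", 1) for pair in body.split("&"))
    return fields["password"]


wire_bytes = build_http_request("alice", "s3cret")
print(sniff_password(wire_bytes))  # the password is readable as-is
```

Nothing clever is needed to recover the password: the request is just text.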

Integrity here refers to the accuracy of information. If integrity cannot be proven, the information usually cannot be judged accurate. Since HTTP cannot prove the integrity of a message, there is no way to know whether a request or response was tampered with between the time it was sent and the time the other party received it.

In other words, there is no way to confirm that the request/response sent and the request/response received are the same.

Requests and responses in HTTP do not verify the communicating parties. Because HTTP has no step for confirming who the other party is, anyone can initiate a request. In addition, a server returns a response to any request it receives, no matter who sent it (provided the Web server does not restrict the sender's IP address or port number).

HTTP cannot verify the identity of the other party, so anyone can set up a fake server to deceive users and carry out phishing without the user noticing.

In contrast, the HTTPS protocol has the following advantages over the HTTP protocol (described in detail below):

 

3. How does HTTPS solve the above problems of HTTP?


HTTPS is not a new application-layer protocol. It is HTTP with its communication interface replaced by the SSL and TLS protocols.

Typically, HTTP communicates directly with TCP. With SSL, HTTP first communicates with SSL, and SSL then communicates with TCP. In short, HTTPS is simply HTTP wrapped in the shell of the SSL protocol.

Once SSL is adopted, HTTP gains the encryption, certificates, and integrity protection of HTTPS. In other words, HTTP plus encryption, authentication, and integrity protection is HTTPS.

The main functions of HTTPS depend on the TLS/SSL protocol, which in turn relies on three basic kinds of algorithm: hash functions, symmetric encryption, and asymmetric encryption. Asymmetric encryption provides identity authentication and key negotiation, symmetric encryption uses the negotiated key to encrypt the data, and hash functions verify the integrity of the information.
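The layering can be sketched with Python's standard socket and ssl modules. This is only an illustration: the function names are made up for this example, port 443 is assumed as the HTTPS default, and error handling is omitted. The point is that the HTTP message is unchanged; it is simply written to a TLS socket instead of a bare TCP socket.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    # The default context verifies the server certificate and the host name.
    return ssl.create_default_context()


def fetch_over_tls(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send an ordinary HTTP request through a TLS layer on top of TCP."""
    context = make_tls_context()
    # Layer 1: a plain TCP connection.
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        # Layer 2: TLS wraps the TCP socket and performs the handshake.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Layer 3: the HTTP message itself is the same as over plain HTTP.
            request = (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {host}\r\n"
                "Connection: close\r\n\r\n"
            )
            tls_sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = tls_sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling fetch_over_tls with a real host name requires network access, so it is left out of the sketch.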
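As a small illustration of the hash-based integrity check, the sketch below uses HMAC-SHA-256 from Python's standard library. The choice of HMAC-SHA-256 and the function names are assumptions made for this example; the actual algorithm in a TLS connection depends on the negotiated cipher suite.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute a keyed hash (MAC) over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_intact(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"negotiated-session-key"  # stands in for the negotiated key
message = b"transfer 100 to account 42"
tag = make_tag(shared_key, message)

print(is_intact(shared_key, message, tag))                         # unmodified
print(is_intact(shared_key, b"transfer 900 to account 42", tag))   # tampered
```

Because the tag depends on the shared key, someone who alters the message in transit cannot produce a matching tag without that key.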

 

1. Solve the problem that the content may be eavesdropped - encryption

Method 1. Symmetric encryption

In this method, the same key is used for both encryption and decryption. Without the key the ciphertext cannot be decrypted; conversely, anyone who has the key can decrypt it.

When encrypting with symmetric encryption, the key must also be sent to the other party. But how can it be transferred safely? If the key is sent over the Internet and the communication is intercepted, the key falls into the hands of the attacker and the encryption becomes meaningless. There is also the question of how to store the received key securely.
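The "same key on both sides" property can be shown with a toy cipher. The sketch below is an assumption made purely for illustration: it XORs the data with a keystream derived from SHA-256 and is not a cipher used by HTTPS, nor is it safe for real use. It only demonstrates that whoever holds the key can both encrypt and decrypt.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = b"shared secret key"
ciphertext = xor_cipher(key, b"card number 1234")
print(xor_cipher(key, ciphertext))           # the key holder recovers the text
print(xor_cipher(b"wrong key", ciphertext))  # any other key yields garbage
```

The sketch also makes the distribution problem visible: both parties must somehow end up holding the same `key` value.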

 

Method 2. Asymmetric encryption

Public-key encryption uses a pair of asymmetric keys: a private key and a public key. As the names suggest, the private key must not be known to anyone else, while the public key can be published freely and is available to anyone.

With public-key encryption, the party sending the ciphertext encrypts with the other party's public key, and the other party decrypts with its own private key after receiving the encrypted information. In this way, the private key used for decryption never has to be sent, and there is no need to worry about the key being eavesdropped on and stolen by an attacker.

Asymmetric encryption suits one-to-many communication: the server only needs to keep a single private key to communicate in encrypted form with many clients.
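Here is a toy sketch of the public/private key idea using textbook RSA. Everything in it is an assumption for illustration: the two primes are small Mersenne primes, there is no padding, and the function names are made up. Real deployments use vetted libraries and much larger keys.

```python
# Toy key pair built from two Mersenne primes (far too small for real use).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)

PUBLIC_KEY = (E, N)    # can be handed to anyone
PRIVATE_KEY = (D, N)   # stays with the server


def encrypt(message: bytes, public_key: tuple) -> int:
    """Anyone holding the public key can encrypt."""
    e, n = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this toy key")
    return pow(m, e, n)


def decrypt(ciphertext: int, private_key: tuple) -> bytes:
    """Only the private key holder can decrypt."""
    d, n = private_key
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


ciphertext = encrypt(b"hello server", PUBLIC_KEY)
print(decrypt(ciphertext, PRIVATE_KEY))
```

Note that only the public pair `(E, N)` ever needs to travel over the network.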

This approach has the following disadvantages:

Method 3. Symmetric encryption + asymmetric encryption (HTTPS adopts this method)

The advantage of a symmetric key is that encryption and decryption are fast. The advantage of an asymmetric key is that intercepted content cannot be read, because without the corresponding private key the data cannot be decrypted. It is like grabbing a safe without having the key to open it. So we combine symmetric and asymmetric encryption to make full use of the strengths of each: asymmetric encryption is used for the key exchange, and symmetric encryption is used afterwards, once the connection is established and messages are being exchanged.

Specifically, the party sending the ciphertext encrypts the "symmetric key" with the other party's public key, and the other party decrypts it with its own private key to obtain the "symmetric key". This ensures that the key is exchanged safely, and with that in place the two sides communicate using symmetric encryption. HTTPS therefore adopts a hybrid encryption mechanism that uses both symmetric and asymmetric encryption.
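The hybrid scheme can be sketched end to end with toy building blocks. All of the following are assumptions for illustration only: textbook RSA with small Mersenne primes stands in for the asymmetric part, and a SHA-256-based XOR keystream stands in for the symmetric cipher. Neither is what a real TLS stack uses.

```python
import hashlib
import secrets

# Server's toy RSA key pair (illustrative primes, no padding).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))
KEY_LEN = 16  # bytes in the symmetric session key


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def client_wrap_key(session_key: bytes) -> int:
    """Client: encrypt the symmetric key with the server's public key."""
    return pow(int.from_bytes(session_key, "big"), E, N)


def server_unwrap_key(wrapped: int) -> bytes:
    """Server: recover the symmetric key with its private key."""
    return pow(wrapped, D, N).to_bytes(KEY_LEN, "big")


# Key exchange: asymmetric.
session_key = secrets.token_bytes(KEY_LEN)
server_key = server_unwrap_key(client_wrap_key(session_key))

# Message exchange: symmetric, using the key both sides now share.
ciphertext = xor_cipher(session_key, b"GET /account HTTP/1.1")
plaintext = xor_cipher(server_key, ciphertext)
print(plaintext)
```

The slow asymmetric operation happens once per session key, while every message after that uses the fast symmetric cipher.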

 

2. Solve the problem that the message may be tampered with - digital signature

Data passes through many intermediate nodes on its way across the network. Although these nodes cannot decrypt the data, they may still tamper with it. How can the integrity of the data be verified? By verifying a digital signature.

 

Digital signatures have two functions:

How digital signatures are generated:

The sender first runs the text through a hash function to produce a message digest, then encrypts the digest with the sender's private key to produce the digital signature, which is sent to the receiver together with the original text. The receiver then verifies the signature.

 

Digital signature verification process:

The receiver first decrypts the signature with the sender's public key to recover the digest, then runs the received text through the same hash function to produce a second digest, and compares the two. If they are the same, the received information is complete and was not modified in transit; otherwise, the information has been modified. This is how a digital signature verifies the integrity of the information.
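The generate-then-verify flow can be sketched with textbook RSA and SHA-256. The key values and function names are assumptions for illustration (small Mersenne primes, no padding scheme); real signatures use standardized padding and far larger keys.

```python
import hashlib

# Sender's toy RSA key pair (illustrative values only).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: transform the digest with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the digest with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(message)


original = b"pay 10 to Kobe"
signature = sign(original)
print(verify(original, signature))             # untouched message
print(verify(b"pay 99 to Kobe", signature))    # modified in transit
```

Verification needs only the public values `E` and `N`, so any receiver can check the signature, but only the holder of `D` can create one.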

Suppose the messages pass between Kobe and James. James sends a message together with its digital signature to Kobe. After receiving it, Kobe can confirm that the message was sent by James by verifying the signature. Of course, this presupposes that Kobe knows James's public key. The crux of the problem is that, like the message itself, the public key cannot simply be sent to Kobe over an insecure network: how could Kobe prove that the public key he received really belongs to James?

At this point, a Certificate Authority (CA) needs to be introduced. There are relatively few CAs, and Kobe's client ships with the certificates of all trusted CAs built in. The CA digitally signs James's public key (together with other information) to produce a certificate.
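A toy sketch of the idea: the CA signs James's name and public key, and Kobe's client, which has the CA's public key built in, checks that signature. The key values, the Certificate class, and the function names are all assumptions for illustration; real certificates follow the X.509 format and carry many more fields.

```python
import hashlib
from dataclasses import dataclass

# The CA's toy RSA key pair. The client ships with (CA_E, CA_N) built in.
CA_P, CA_Q = 2**61 - 1, 2**89 - 1
CA_N, CA_E = CA_P * CA_Q, 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))

# James's public key as an (e, n) pair; the values are illustrative.
JAMES_PUBLIC_KEY = (65537, (2**107 - 1) * (2**127 - 1))


@dataclass(frozen=True)
class Certificate:
    subject: str
    public_key: tuple   # (e, n) of the subject
    signature: int      # the CA's signature over subject and public key


def _digest(subject: str, public_key: tuple) -> int:
    payload = f"{subject}|{public_key[0]}|{public_key[1]}".encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % CA_N


def ca_issue(subject: str, public_key: tuple) -> Certificate:
    """CA: sign the binding between a name and a public key."""
    signature = pow(_digest(subject, public_key), CA_D, CA_N)
    return Certificate(subject, public_key, signature)


def client_verify(cert: Certificate) -> bool:
    """Client: check the CA's signature using the built-in CA public key."""
    return pow(cert.signature, CA_E, CA_N) == _digest(cert.subject, cert.public_key)


cert = ca_issue("James", JAMES_PUBLIC_KEY)
forged = Certificate("James", (65537, 12345), cert.signature)
print(client_verify(cert), client_verify(forged))
```

An attacker who swaps in a different public key cannot reuse the CA's signature, because the signature covers the key itself.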

 

3. Solve the problem that the identity of the communication party may be disguised - digital certificate

The digital certificate authority acts as a third party trusted by both the client and the server. Let's introduce the business process of the digital certificate authority:

 

4. HTTPS workflow


1. The client initiates an HTTPS request (for example https://juejin.im/user/5a9a9cdcf265da238b7d771c). According to RFC 2818, the client knows that it needs to connect to port 443 (the default) on the server.

2. The server returns the pre-configured public key certificate to the client.

3. The client verifies the public key certificate: for example, whether it is within its validity period, whether the certificate's purpose matches the site the client requested, whether it appears on a certificate revocation list (CRL), and whether its issuing (upper-level) certificate is valid. This is a recursive process that continues up to the root certificate (a root certificate built into the operating system or into the client). If verification passes, the handshake continues; otherwise, a warning is displayed.

4. The client uses the pseudo-random number generator to generate the symmetric key used for encryption, and then encrypts the symmetric key with the public key of the certificate and sends it to the server.

5. The server decrypts the message with its own private key and obtains the symmetric key. At this point, both the client and the server hold the same symmetric key.

6. The server encrypts "plaintext content A" with a symmetric key and sends it to the client.

7. The client decrypts the ciphertext of the response using the symmetric key and obtains "plaintext content A".

8. The client initiates an HTTPS request again, and uses the symmetric key to encrypt the "plaintext content B" of the request, and then the server uses the symmetric key to decrypt the ciphertext to obtain "plaintext content B".
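The recursive certificate check in step 3 can be sketched as follows. This is a simplified model under stated assumptions: the Cert class and function names are hypothetical, host matching is a plain string comparison, and cryptographic signature verification is deliberately omitted so that the recursion up to the root stays visible.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Cert:
    serial: str
    subject: str
    not_before: datetime
    not_after: datetime
    issuer: Optional["Cert"] = None  # None marks a self-signed root


def chain_is_valid(cert: Cert, now: datetime, trusted_roots: set, revoked: set) -> bool:
    """Check one certificate, then recurse into its issuer up to the root."""
    if not (cert.not_before <= now <= cert.not_after):
        return False  # outside the validity period
    if cert.serial in revoked:
        return False  # listed in the CRL
    if cert.issuer is None:
        return cert.serial in trusted_roots  # must be a built-in root
    return chain_is_valid(cert.issuer, now, trusted_roots, revoked)


def verify_server_cert(leaf: Cert, host: str, now: datetime,
                       trusted_roots: set, revoked: set) -> bool:
    """The leaf must match the requested site and chain to a trusted root."""
    return leaf.subject == host and chain_is_valid(leaf, now, trusted_roots, revoked)


root = Cert("root-1", "Example Root CA", datetime(2020, 1, 1), datetime(2040, 1, 1))
mid = Cert("mid-1", "Example Intermediate CA", datetime(2021, 1, 1), datetime(2031, 1, 1), root)
leaf = Cert("leaf-1", "juejin.im", datetime(2024, 1, 1), datetime(2025, 1, 1), mid)

print(verify_server_cert(leaf, "juejin.im", datetime(2024, 6, 1), {"root-1"}, set()))
```

If any link in the chain is expired, revoked, or does not end at a built-in root, the whole verification fails and the client shows a warning instead of continuing.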

 

5. The difference between HTTP and HTTPS

Regarding security, the simplest analogy for the relationship between the two is a truck delivering goods. The HTTP truck is open-topped and its cargo is exposed, whereas the HTTPS truck is a sealed container vehicle, so security is naturally much better.

 

6. Why Not All Websites Use HTTPS


Since HTTPS is so secure and reliable, why don't all web sites use HTTPS?

First of all, many people still think that HTTPS has a barrier to entry. The barrier is the need for an SSL certificate issued by an authoritative CA. In the traditional model, selecting, purchasing, and deploying a certificate is time-consuming and labor-intensive.

Second, HTTPS is generally considered to perform worse than HTTP, because encrypted communication consumes more CPU and memory than plaintext communication. If every exchange is encrypted, a considerable amount of resources is consumed, and the number of requests a single machine can handle drops accordingly. In practice this need not be the case. The problem can be addressed by performance tuning and by deploying certificates on an SLB or CDN. As a practical example, during "Double Eleven", Taobao and Tmall ran HTTPS site-wide and still kept website and mobile access, browsing, and transactions running smoothly. Testing found that many pages, once optimized, performed as well as or even slightly better than over HTTP, so HTTPS is not slow after optimization.

In addition, wanting to save the cost of purchasing certificates is also one of the reasons. For HTTPS communication, a certificate is essential. The certificate used must be purchased from a Certificate Authority (CA).

Finally, there is security awareness. Compared with China, the security awareness and technology adoption of the Internet industry abroad are relatively mature, and the trend toward HTTPS deployment is jointly promoted by society, enterprises, and the government.