In-depth understanding of how HTTPS works
Foreword
In recent years, the Internet has changed dramatically. In particular, the HTTP protocol we have long been accustomed to is gradually being replaced by HTTPS. Pushed jointly by browsers, search engines, certificate authorities (CAs), and large Internet companies, the Internet has entered the "HTTPS encryption era", and HTTPS is expected to replace HTTP as the mainstream transmission protocol over the next few years.
After reading this article, I hope you can understand:
- What's wrong with HTTP communication
- How HTTPS addresses those problems
- How HTTPS Works
1. What is HTTPS
HTTPS places an SSL encryption layer underneath HTTP and encrypts the transmitted data. It is the secure version of the HTTP protocol, and it is now widely used for security-sensitive communication on the World Wide Web, such as payment transactions.
The main functions of HTTPS are:
(1) Encrypt data and establish an information security channel to ensure data security during transmission;
(2) Perform real identity authentication on the website server.
We often meet HTTPS on web login pages and shopping checkout pages. With HTTPS, the URL starts with https:// instead of http://. In addition, when the browser visits a site with a valid HTTPS connection, a lock icon appears in the address bar. Exactly how HTTPS is indicated varies by browser.
2. Why do you need HTTPS?
There may be security issues such as information theft or identity masquerading in the HTTP protocol. Using the HTTPS communication mechanism can effectively prevent these problems. Next, let's first understand what problems exist in the HTTP protocol:
- Communication is in clear text (not encrypted), so the content may be eavesdropped
HTTP itself has no encryption capability, so it cannot encrypt the communication as a whole (the content of the requests and responses exchanged over HTTP). In other words, HTTP messages are sent in plaintext, meaning unencrypted.
This plaintext design is a major cause of security problems such as data leakage, data tampering, traffic hijacking, and phishing. Because HTTP cannot encrypt data, everything travels across the network fully exposed. With a network sniffer and some basic tooling, the content of HTTP packets can be reconstructed.
- The integrity of the message cannot be proven, so it may be tampered with
Integrity here means that the information is accurate and unmodified. If integrity cannot be proven, the receiver cannot judge whether the information is accurate. Because HTTP has no way to prove the integrity of a message, nobody can tell if the content of a request or response was altered between the moment it was sent and the moment it was received.
In other words, there is no way to confirm that the request/response that was sent and the request/response that was received are the same.
- The identity of the communicating party is not verified, so there is a risk of masquerading
HTTP requests and responses do not verify who the communicating parties are. Because the protocol has no step for confirming the other party, anyone can initiate a request. Likewise, as long as the server receives a request, it returns a response no matter who sent it (provided the web server does not restrict the sender's IP address or port number).
Because HTTP cannot verify the identity of the other party, anyone can set up a fake server to deceive users and carry out phishing without the user noticing.
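The first two problems can be made concrete with a few lines of Python. This is only a sketch: the host `shop.example`, the form fields, and the helper names `eavesdrop` and `tamper` are made up for illustration, and no real network traffic is involved. It shows that the bytes of an HTTP request are readable as-is, and that a rewritten copy looks just as valid as the original.

```python
# A login form as it would travel over plain HTTP (hypothetical host and fields).
request = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: shop.example\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 27\r\n"
    b"\r\n"
    b"user=alice&password=hunter2"
)


def eavesdrop(raw: bytes) -> str:
    """What a sniffer on the path sees: the body is readable without any key."""
    _, _, body = raw.partition(b"\r\n\r\n")
    return body.decode("ascii")


def tamper(raw: bytes) -> bytes:
    """An intermediate node rewrites the body; nothing in HTTP flags the change."""
    return raw.replace(b"user=alice", b"user=admin")


print(eavesdrop(request))          # the password is visible in clear text
print(eavesdrop(tamper(request)))  # the receiver cannot tell this was modified
```

The replacement keeps the body the same length, so even the Content-Length header still matches after tampering.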
In contrast, the HTTPS protocol has the following advantages over the HTTP protocol (described in detail below):
- Data privacy: content is symmetrically encrypted, and each connection generates a unique encryption key
- Data Integrity: Content transmission is integrity checked
- Identity authentication: a third party cannot forge the identity of the server (or of the client, when client certificates are used)
3. How does HTTPS solve the above problems of HTTP?
HTTPS is not a new protocol at the application layer. It is HTTP with the communication interface part replaced by the SSL and TLS protocols.
Normally, HTTP communicates directly with TCP. With SSL in place, HTTP communicates with SSL first, and SSL then communicates with TCP. In short, HTTPS is simply HTTP wrapped in the shell of the SSL protocol.
Once SSL is adopted, HTTP gains encryption, certificates, and integrity protection. That is to say, HTTP plus encryption, authentication, and integrity protection is HTTPS.
The main functions of HTTPS basically depend on the TLS/SSL protocol. TLS/SSL in turn relies on three kinds of basic algorithms: hash functions, symmetric encryption, and asymmetric encryption. Asymmetric encryption is used for identity authentication and key negotiation, symmetric encryption uses the negotiated key to encrypt the data, and a hash function is used to verify the integrity of the information.
1. Solve the problem that the content may be eavesdropped - encryption
Method 1. Symmetric encryption
With this method, the same key is used for both encryption and decryption. The ciphertext cannot be decrypted without the key, and conversely, anyone who has the key can decrypt it.
With symmetric encryption, the key must also be delivered to the other party. But how can it be transferred safely? If the key is sent over the Internet and the communication is intercepted, the key falls into the attacker's hands and the encryption becomes meaningless. The received key also has to be stored securely.
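As a rough illustration of "one key for both directions", here is a toy symmetric cipher built only from the Python standard library. It is an assumption-laden sketch, not what TLS uses: real deployments rely on vetted ciphers such as AES-GCM or ChaCha20-Poly1305, and this toy has no nonce and no authentication. The names `keystream` and `xor_cipher` are invented for the example.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (SHA-256 in counter mode)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = secrets.token_bytes(32)                 # the shared secret
ciphertext = xor_cipher(key, b"card=4111")    # sender encrypts with the key
plaintext = xor_cipher(key, ciphertext)       # receiver decrypts with the same key
wrong = xor_cipher(secrets.token_bytes(32), ciphertext)  # a different key yields garbage

print(plaintext)
```

The sketch also makes the weakness visible: everything depends on `key` reaching the other side without being seen.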
Method 2. Asymmetric encryption
Asymmetric (public-key) encryption uses a pair of different keys. One is called the private key and the other the public key. As the names suggest, the private key must never be known to anyone else, while the public key can be published freely and is available to anyone.
With public key encryption, the party sending the ciphertext uses the other party's public key for encryption, and the other party uses its own private key for decryption after receiving the encrypted information. In this way, there is no need to send the private key for decryption, and there is no need to worry about the key being eavesdropped and stolen by an attacker.
Asymmetric encryption is characterized by one-to-many information transmission, and the server only needs to maintain a private key to perform encrypted communication with multiple clients.
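The one-to-many property can be sketched with textbook RSA on deliberately tiny numbers. This is a teaching toy under loud assumptions: the primes 61 and 53 are far too small to be secure, there is no padding, and the client names and message values are invented. It needs Python 3.8+ for the modular inverse via `pow`.

```python
# Server key pair built from two tiny primes (illustrative only, not secure).
p, q = 61, 53
n = p * q                           # modulus, part of both keys
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

public_key, private_key = (e, n), (d, n)


def encrypt(message: int, key: tuple) -> int:
    """Anyone holding the public key can encrypt a number smaller than n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple) -> int:
    """Only the holder of the private key can recover the message."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


# Several clients encrypt with the same public key; one private key serves them all.
client_messages = {"client_a": 65, "client_b": 1234, "client_c": 42}
ciphertexts = {name: encrypt(m, public_key) for name, m in client_messages.items()}
recovered = {name: decrypt(c, private_key) for name, c in ciphertexts.items()}

print(recovered == client_messages)
```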
This approach has the following disadvantages:
- The public key is public, so data that the server encrypts with its private key can be decrypted by any attacker who intercepts it and holds the public key. Asymmetric encryption alone therefore only protects the client-to-server direction;
- The public key carries no information about the server. Asymmetric encryption by itself cannot establish that the server is who it claims to be, so there is a risk of man-in-the-middle attacks: the public key sent by the server may be intercepted and replaced by an intermediary in transit;
- Asymmetric encryption and decryption are computationally expensive, which reduces data transmission efficiency.
Method 3. Symmetric encryption + asymmetric encryption (HTTPS adopts this method)
The advantage of a symmetric key is speed: encryption and decryption are fast. The advantage of an asymmetric key is that intercepted content cannot be read, because without the corresponding private key the data cannot be decrypted. It is like stealing a safe: without the key to the safe, you still cannot open it. So we combine symmetric and asymmetric encryption and use each where it is strongest. Asymmetric encryption is used for the key exchange, and symmetric encryption is used afterwards, once communication is established and messages are being exchanged.
Concretely, the party sending the ciphertext encrypts the "symmetric key" with the other party's public key, and the other party decrypts it with its own private key to obtain the "symmetric key". This ensures that the key is exchanged safely, and on that basis both sides then communicate using symmetric encryption. HTTPS therefore uses a hybrid mechanism that combines symmetric and asymmetric encryption.
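The hybrid scheme can be sketched end to end with toy primitives. Everything here is an assumption made for illustration: the RSA key is built from two known Mersenne primes and is far too small for real use, there is no padding, the symmetric cipher is a home-made XOR keystream, and the names are invented. Python 3.8+ is assumed for the modular inverse.

```python
import hashlib
import secrets

# Toy RSA key pair for the server (two Mersenne primes; NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                   # public key (E, N)
D = pow(E, -1, (P - 1) * (Q - 1))     # private exponent, stays on the server


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter-mode keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client: generate a random symmetric key and encrypt it with the server's public key.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)

# Server: decrypt with the private key to obtain the same symmetric key.
server_key = pow(wrapped_key, D, N).to_bytes(16, "big")

# Both sides now switch to the fast symmetric cipher for the actual messages.
ciphertext = xor_cipher(session_key, b"GET /cart HTTP/1.1")
received = xor_cipher(server_key, ciphertext)

print(received)
```

Only the short session key goes through the slow asymmetric step; the bulk data uses the symmetric cipher.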
2. Solve the problem that the message may be tampered with - digital signature
Data passes through many intermediate nodes in transit. Even if it cannot be decrypted, it may still be tampered with. How can the integrity of the data be verified? By verifying a digital signature.
Digital signatures have two functions:
- They establish that the message was really signed and sent by the sender, because nobody else can forge the sender's signature.
- They confirm the integrity of the message, proving that the data has not been tampered with.
How digital signatures are generated:
The sender first runs the text through a hash function to produce a message digest, then encrypts the digest with its own private key to produce the digital signature, which is sent to the receiver together with the original text. The receiver then verifies the signature.
How digital signatures are verified:
The receiver decrypts the signature with the sender's public key to recover the digest, then applies the same hash function to the received text to compute a second digest, and compares the two. If they are the same, the received information is complete and was not modified in transit. If they differ, the information was modified. This is how a digital signature verifies integrity.
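The sign-then-verify steps above can be sketched with a hash from the standard library and a toy RSA key. The assumptions are stated loudly: the key is built from two Mersenne primes and is not secure, the SHA-256 digest is reduced modulo N only so that it fits this tiny key, and real systems use padded schemes such as RSA-PSS or ECDSA. The function names are invented for the example.

```python
import hashlib

# Toy RSA key pair for the sender (NOT secure; Python 3.8+ for pow with -1).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def digest_as_int(message: bytes) -> int:
    """Hash the message and squeeze the digest into the toy key's range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: 'encrypt' the digest with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the digest with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(message)


original = b"transfer 100 to account 42"
signature = sign(original)

print(verify(original, signature))                         # untouched message
print(verify(b"transfer 900 to account 42", signature))    # modified in transit
```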
Suppose the exchange is between Kobe and James. James sends Kobe a message together with its digital signature. When Kobe receives it, he verifies the signature to confirm that the message really came from James. Of course, this only works if Kobe knows James's public key. The crux of the problem is that the public key, like the message itself, cannot simply be sent to Kobe over an insecure network: how would Kobe know that the key he received really belongs to James?
This is where a Certificate Authority (CA) comes in. The number of CAs is small, and Kobe's client ships with the certificates of all trusted CAs built in. The CA digitally signs James's public key (along with other information) to produce a certificate.
3. Solve the problem that the identity of the communication party may be disguised - digital certificate
The certificate authority acts as a third party trusted by both the client and the server. The CA's business process is as follows:
- The server operator submits the public key, organization information, personal information (domain name), and other details to the CA and applies for certification;
- The CA verifies the applicant's information through online and offline means, for example whether the organization exists, whether the enterprise is legitimate, and whether it owns the domain name;
- If the information passes review, the CA issues the applicant a certification document, the certificate. The certificate contains the applicant's public key, the applicant's organization and personal information, information about the issuing CA, the validity period, the certificate serial number, and other fields in plaintext, plus a signature. The signature is generated by first computing a digest of the plaintext fields with a hash function and then encrypting that digest with the CA's private key. The resulting ciphertext is the signature;
- When the client sends a request to the server, the server returns the certificate file;
- The client reads the plaintext information in the certificate, computes its digest with the same hash function, decrypts the signature with the corresponding CA's public key, and compares the two digests. If they match, the certificate is legitimate, which means the server's public key can be trusted;
- The client also checks the domain name, validity period, and other information in the certificate. The client has the trusted CAs' certificate information (including their public keys) built in. If the CA is not trusted, no matching CA certificate is found, and the certificate is likewise judged invalid.
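The issuing and checking steps above can be sketched with a toy certificate. This is a simplified model under stated assumptions, not the X.509 format: the certificate is a plain dictionary, the CA name "Example Root CA", the host `shop.example`, and the function names `issue` and `verify` are all invented, and the CA key is a toy RSA key that is not secure.

```python
import hashlib
import json
from datetime import datetime, timezone

# Toy CA key pair (NOT secure; Python 3.8+ for pow with -1).
P, Q = 2**61 - 1, 2**89 - 1
CA_N, CA_E = P * Q, 65537
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))

# The client's built-in list of trusted CAs and their public keys.
TRUSTED_CAS = {"Example Root CA": (CA_E, CA_N)}


def digest_as_int(info: dict, modulus: int) -> int:
    """Hash the certificate's plaintext fields in a canonical encoding."""
    encoded = json.dumps(info, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % modulus


def issue(info: dict) -> dict:
    """CA side: sign the digest of the plaintext fields with the CA private key."""
    return {"info": info, "signature": pow(digest_as_int(info, CA_N), CA_D, CA_N)}


def verify(cert: dict, hostname: str, now: datetime) -> bool:
    """Client side: trusted CA, valid signature, matching domain, within validity."""
    info = cert["info"]
    ca_key = TRUSTED_CAS.get(info["issuer"])
    if ca_key is None:                       # CA is not in the trust store
        return False
    e, n = ca_key
    if pow(cert["signature"], e, n) != digest_as_int(info, n):
        return False                         # fields were altered after signing
    not_before = datetime.fromisoformat(info["not_before"])
    not_after = datetime.fromisoformat(info["not_after"])
    return info["subject"] == hostname and not_before <= now <= not_after


cert = issue({
    "subject": "shop.example",
    "public_key": "server-public-key-placeholder",
    "issuer": "Example Root CA",
    "not_before": "2024-01-01T00:00:00+00:00",
    "not_after": "2025-01-01T00:00:00+00:00",
})
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(verify(cert, "shop.example", now))
```

Swapping the public key inside `cert["info"]` without re-signing makes `verify` fail, which is exactly what stops a man in the middle from substituting their own key.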
4. HTTPS workflow
1. The client initiates an HTTPS request (for example https://juejin.im/user/5a9a9cdcf265da238b7d771c). According to RFC 2818, the client knows that it needs to connect to port 443 (the default) on the server.
2. The server returns its pre-configured public key certificate to the client.
3. The client verifies the public key certificate: whether it is within its validity period, whether its purpose matches the site the client requested, whether it appears on a certificate revocation list (CRL), and whether its issuing certificate is valid. This check is recursive and continues up to the root certificate (a root certificate built into the operating system or into the client). If verification passes, the process continues. Otherwise a warning is displayed.
4. The client uses a pseudo-random number generator to produce the symmetric key used for encryption, encrypts that symmetric key with the public key from the certificate, and sends it to the server.
5. The server decrypts the message with its own private key and obtains the symmetric key. At this point, both client and server hold the same symmetric key.
6. The server encrypts "plaintext content A" with the symmetric key and sends it to the client.
7. The client decrypts the ciphertext of the response with the symmetric key and obtains "plaintext content A".
8. The client sends another HTTPS request, encrypting the request's "plaintext content B" with the symmetric key. The server then decrypts the ciphertext with the symmetric key to obtain "plaintext content B".
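In practice a client does not carry out these steps by hand. A TLS library performs the handshake, and the exact key exchange it negotiates may differ from the simplified description above. The following sketch uses Python's standard `ssl` and `socket` modules. The helper names `make_client_context` and `fetch_head` are invented for this example, and the host passed in is whatever site you choose to test against.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Default client settings: verify the certificate chain against the
    system's trusted roots and check that the certificate matches the host."""
    return ssl.create_default_context()


def fetch_head(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect, complete the TLS handshake, and send one encrypted request."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # The handshake happens here: the server sends its certificate, the
        # client verifies it, and both sides agree on symmetric keys.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))   # encrypted on the wire
            status = tls.recv(4096).split(b"\r\n", 1)[0].decode("ascii", "replace")
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
                "subject": tls.getpeercert().get("subject"),
                "status": status,
            }
```

Calling `fetch_head("example.com")` needs network access. If certificate verification fails, `wrap_socket` raises `ssl.SSLCertVerificationError`, which corresponds to the warning in step 3.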
5. The difference between HTTP and HTTPS
- HTTP is a clear text transmission protocol, while HTTPS is a network protocol built from SSL plus HTTP that provides encrypted transmission and identity authentication, which makes it more secure than HTTP.
As a simple analogy for the security difference, think of trucks delivering goods. The HTTP truck is open-top, with the goods exposed. The HTTPS truck is a sealed container vehicle, so security is naturally much better.
- HTTPS is more secure than HTTP, friendlier to search engines, and beneficial to SEO. Google and Baidu give preference to HTTPS pages when indexing;
- HTTPS requires an SSL certificate, while HTTP does not;
- The standard port for HTTPS is 443, and the standard port for HTTP is 80;
- HTTP runs directly on top of TCP, while HTTPS runs HTTP on top of an SSL/TLS layer that sits between the application layer and the transport layer;
- Browsers display a lock icon for HTTPS sites and not for HTTP sites.
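The scheme and default-port difference is visible in Python's standard library, which exposes both port numbers as constants. A small sketch, where the helper name `endpoint` is made up for this example:

```python
import http.client
from urllib.parse import urlsplit

# Default ports as defined by the standard library: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": http.client.HTTP_PORT, "https": http.client.HTTPS_PORT}


def endpoint(url: str) -> tuple:
    """Return (scheme, host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port


print(endpoint("https://juejin.im/user/5a9a9cdcf265da238b7d771c"))
print(endpoint("http://juejin.im/"))
```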
6. Why Not All Websites Use HTTPS
If HTTPS is so secure and reliable, why don't all websites use it?
First, many people still believe there is a barrier to adopting HTTPS. That barrier is the need for an SSL certificate issued by an authoritative CA. In the traditional model, selecting, purchasing, and deploying a certificate is time-consuming and labor-intensive.
Second, HTTPS is generally believed to perform worse than HTTP, because encrypted communication consumes more CPU and memory than plaintext communication. If every exchange is encrypted, considerable resources are consumed, and the number of requests a single machine can handle drops accordingly. In practice this need not be the case. The problem can be addressed by tuning performance and by deploying the certificate on an SLB or CDN. As a practical example, during "Double Eleven", Taobao and Tmall ran HTTPS across their entire sites and still kept website and mobile access, browsing, and transactions running smoothly. Testing showed that many optimized pages performed the same as under HTTP or even slightly better, so HTTPS is not actually slow once optimized.
In addition, wanting to save the cost of a certificate is another reason. A certificate is essential for HTTPS communication, and it has traditionally had to be purchased from a Certificate Authority (CA).
Lastly, there is security awareness. Compared with China, the Internet industry abroad is relatively mature in both security awareness and the application of the technology, and HTTPS deployment there is promoted jointly by society, enterprises, and governments.