MEMEPh. ideas that are worth sharing...

Deciphering HTTP2 and HTTP3 Features

Foreword


Compared with HTTP/1.1, HTTP/2 greatly improves web page performance: simply upgrading to the protocol eliminates much of the optimization work that used to be necessary. Of course, compatibility issues and the question of how to degrade gracefully are among the reasons it is not yet commonly used in China.

Although HTTP/2 improves the performance of web pages, it does not mean that it is perfect. HTTP/3 was introduced to solve some of the problems existing in HTTP/2.

 

1. What has changed since the invention of HTTP/1.1?


If you look closely at the downloads required to open the home pages of the most popular websites, you will find a very clear trend. In recent years, the amount of downloaded data required to load the homepage of a website has gradually increased and has exceeded 2100K. But what we should be more concerned about here is that the average number of resources that each page needs to download in order to complete the display and rendering has exceeded 100.

As the graph below shows, since 2011, the size of the transferred data and the average number of requested resources have continued to grow and show no signs of slowing down. The green line in the graph shows the increase in the size of the transferred data, and the red line shows the increase in the average number of requested resources.

Since HTTP/1.1 was released in 1997, we have been using HTTP/1.x for a long time. But with the explosive growth of the Internet over the past decade, web content has shifted from text to rich media (such as images, audio, and video), and more and more applications (such as chat and live streaming) place high real-time demands on page content. Some features specified in the protocol back then can no longer meet the needs of the modern web.

 

2. Defects of HTTP/1.1


1. High Latency - Bringing down the page loading speed

Although network bandwidth has grown very rapidly in recent years, we have not seen a corresponding reduction in network latency. The latency problem is mainly caused by Head-Of-Line Blocking, which prevents the available bandwidth from being fully utilized.

Head-of-line blocking means that when one request in a sequence of sequentially sent requests is blocked for some reason, all the requests queued behind it are blocked as well, delaying the client's receipt of data. People have tried the following workarounds for head-of-line blocking:

Domain sharding distributes the resources of one page across different domain names to raise the connection limit. By default, Chrome allows up to 6 concurrent persistent TCP connections for the same domain name. With persistent connections, a TCP pipe can be reused, but only one request can be processed in the pipe at a time; until it finishes, the other requests can only wait in a blocked state. Moreover, if 10 requests are issued at the same time under one domain name, 4 of them must queue until the in-flight requests complete.

Spriting combines multiple small images into one large image, which is then "cut" back into the individual small images with JavaScript or CSS.

Inlining is another technique for avoiding many small image requests: the raw image data is embedded in a URL inside the CSS file, reducing the number of network requests.

.icon1 {
    background: url(data:image/png;base64,<data>) no-repeat;
}
.icon2 {
    background: url(data:image/png;base64,<data>) no-repeat;
}

Concatenation uses tools such as webpack to bundle several smaller JavaScript files into one larger file, but if any one of those files changes, the whole bundle has to be re-downloaded.

 

2. Stateless features - huge HTTP headers

Because the message header usually carries many fixed fields such as "User Agent", "Cookie", "Accept", and "Server" (as shown in the figure below), it can reach hundreds or even thousands of bytes, while the body is often only dozens of bytes (for example, GET requests and 204/301/304 responses). The header has become the dominant part of the message out of all proportion. Carrying so much content in the header increases the cost of transmission, and what's worse, many field values are repeated across thousands of request and response messages, which is very wasteful.
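A quick back-of-envelope calculation makes the imbalance concrete. The field values below are made up but typical for a 304 Not Modified response:

```python
# Hypothetical but typical header fields for a 304 Not Modified response
headers = {
    "Server": "nginx/1.24.0",
    "Date": "Mon, 01 Jan 2024 00:00:00 GMT",
    "ETag": '"5e1f-18d2b3c"',
    "Cache-Control": "max-age=3600",
    "Set-Cookie": "session=abcdef0123456789; Path=/; HttpOnly",
}

# In HTTP/1.x each field travels on the wire as 'Name: value\r\n'
header_bytes = sum(len(f"{k}: {v}\r\n") for k, v in headers.items())
body_bytes = 0  # a 304 response carries no body at all

print(header_bytes)  # well over a hundred bytes of header, zero bytes of body
```

Multiply that by thousands of requests per session, most of them repeating the same values, and the waste adds up quickly.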

 

3. Clear text transmission - insecurity

HTTP/1.1 transmits all content in clear text, and neither the client nor the server can verify the identity of the other party, so the security of the data cannot be guaranteed.

 

Have you heard the news about the "free WiFi trap"?

Hackers take advantage of HTTP's plaintext transmission and set up WiFi hotspots in public places to go "phishing", tricking users into connecting. Once you join such a hotspot, all of your traffic can be intercepted and saved. If it contains sensitive information such as bank card numbers or website passwords, you are in danger: the hacker can impersonate you and do whatever they want.

4. Does not support server push messages

In HTTP/1.1 the server can only answer requests passively; it has no way to proactively push a resource, so the client must explicitly request every resource it needs.

 

3. Introduction to SPDY Protocol and HTTP/2


1. SPDY protocol

As we mentioned above, because of the defects of HTTP/1.x, we resort to sprite images, inlined small images, multiple domain names, and so on to improve performance. These optimizations, however, work around the protocol rather than fixing it. In 2009, Google unveiled its self-developed SPDY protocol, mainly to solve the inefficiency of HTTP/1.1. With SPDY, Google set out to transform the HTTP protocol itself: reducing latency, compressing headers, and so on. The practice of SPDY proved the effect of these optimizations and ultimately led to the birth of HTTP/2.

HTTP/1.1 has two main shortcomings: insufficient security and low performance. Because of the huge historical burden of HTTP/1.x, compatibility had to be the primary goal of any protocol modification; otherwise countless existing assets on the Internet would be broken. As shown in the figure above,

SPDY sits below HTTP and above TCP and SSL, so it can easily stay compatible with older versions of the HTTP protocol (by encapsulating HTTP/1.x content in a new frame format) while reusing existing SSL functionality.

After the SPDY protocol proved feasible in the Chrome browser, it was used as the basis of HTTP/2, and its main features were inherited by HTTP/2.

 

2. Introduction to HTTP/2

In 2015, HTTP/2 was released. HTTP/2 is a replacement for the current HTTP protocol (HTTP/1.x), but it is not a rewrite: HTTP methods, status codes, and semantics are the same as in HTTP/1.x. HTTP/2 is based on SPDY, focuses on performance, and one of its biggest goals is to use only one connection between the user and the website. Judging from the current situation, major sites in China and abroad have largely deployed HTTP/2, and using HTTP/2 can bring roughly a 20% to 60% efficiency improvement.

HTTP/2 consists of two specifications:

  1. Hypertext Transfer Protocol version 2 - RFC7540
  2. HPACK - Header Compression for HTTP/2 - RFC7541

 

4. New features of HTTP/2


1. Binary transmission

There are two main reasons for the substantial reduction in the amount of data transmitted by HTTP/2: binary transmission and header compression. Let's first look at binary transmission. HTTP/2 transmits data in a binary format rather than the plain-text messages of HTTP/1.x, and a binary protocol is more efficient to parse. HTTP/2 splits request and response data into smaller frames, which are binary encoded.

It moves some features of the TCP protocol up to the application layer, "breaking up" the original "Header+Body" message into several small binary "frames": "HEADERS" frames store header data and "DATA" frames store entity data. Once HTTP/2 data is framed, the "Header+Body" message structure disappears completely; the protocol sees only one "fragment" after another.

In HTTP/2, all communication under the same domain name is done on a single connection, which can carry any number of bidirectional data streams. Each data stream is sent as a message, which in turn consists of one or more frames. Frames can be sent out of order and are reassembled according to the stream identifier in the frame header.
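Every HTTP/2 frame starts with the same fixed 9-octet header (RFC 7540, section 4.1): a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a 31-bit stream identifier. A small sketch of parsing it:

```python
# Subset of the frame types defined in RFC 7540
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x8: "WINDOW_UPDATE"}

def parse_frame_header(buf: bytes):
    """Parse the fixed 9-octet HTTP/2 frame header (RFC 7540, section 4.1)."""
    if len(buf) < 9:
        raise ValueError("need at least 9 octets")
    length = int.from_bytes(buf[0:3], "big")          # 24-bit payload length
    ftype, flags = buf[3], buf[4]                     # 8-bit type, 8-bit flags
    stream_id = int.from_bytes(buf[5:9], "big") & 0x7FFFFFFF  # clear reserved bit
    return length, FRAME_TYPES.get(ftype, hex(ftype)), flags, stream_id

# A HEADERS frame header: 13-byte payload, END_HEADERS flag (0x4), stream 1
hdr = (13).to_bytes(3, "big") + bytes([0x1, 0x4]) + (1).to_bytes(4, "big")
print(parse_frame_header(hdr))  # (13, 'HEADERS', 4, 1)
```

It is this stream identifier in every frame header that lets the receiver reassemble interleaved frames back into their original streams.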

 

2. Header compression

HTTP/2 does not use a traditional compression algorithm; instead it uses a purpose-built "HPACK" algorithm, which maintains a "dictionary" at both the client and the server, represents repeated strings with index numbers, and compresses integers and strings with Huffman coding, achieving a high compression rate of 50% to 90%.

 

Specifically: in the two requests in the figure below, the first request sends all of the header fields, while the second request only needs to send the data that differs, which reduces redundant data and lowers overhead.
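The real HPACK wire format is more involved (a static table, a dynamic table, Huffman coding), but the core indexing idea can be sketched with a toy encoder. Everything below is illustrative only, not actual HPACK:

```python
class ToyHeaderTable:
    """Illustrative only: mimics HPACK's idea of replacing repeated
    header fields with small index numbers (not the real wire format)."""

    def __init__(self):
        self.table = []  # dynamic table, mirrored by encoder and decoder

    def encode(self, headers):
        out = []
        for field in headers:
            if field in self.table:
                out.append(("idx", self.table.index(field)))  # one small integer
            else:
                self.table.append(field)
                out.append(("lit", field))                    # full literal once
        return out

enc = ToyHeaderTable()
first = enc.encode([(":method", "GET"), ("user-agent", "Mozilla/5.0")])
second = enc.encode([(":method", "GET"), ("user-agent", "Mozilla/5.0")])
print(first)   # [('lit', (':method', 'GET')), ('lit', ('user-agent', 'Mozilla/5.0'))]
print(second)  # [('idx', 0), ('idx', 1)]
```

The first request pays the full literal cost; every later request that repeats the same fields sends only tiny index numbers, which is where the 50% to 90% savings come from.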

 

3. Multiplexing

Multiplexing was introduced in HTTP/2. It neatly solves the problem of browsers limiting the number of requests under the same domain name, and it also makes it easier to reach full transmission speed; after all, every new TCP connection must slowly ramp up its transfer rate.

You can intuitively feel how much faster HTTP/2 is than HTTP/1 through this link. With binary framing, HTTP/2 no longer relies on multiple TCP connections to achieve multi-stream parallelism. This feature greatly improves performance:

As shown in the figure above, multiplexing can transmit all request data through a single TCP connection.
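Conceptually, multiplexing just interleaves frames from several streams onto one connection; the stream identifiers let the receiver sort them back out. A toy round-robin scheduler (the frame names are placeholders):

```python
def interleave(streams):
    """Round-robin frames from several streams onto one connection.
    `streams` maps stream id -> list of pending frames; returns the
    (stream_id, frame) order as sent on the wire."""
    wire = []
    while any(streams.values()):
        for sid, frames in streams.items():
            if frames:
                wire.append((sid, frames.pop(0)))
    return wire

streams = {1: ["HEADERS", "DATA"], 3: ["HEADERS", "DATA", "DATA"]}
print(interleave(streams))
# [(1, 'HEADERS'), (3, 'HEADERS'), (1, 'DATA'), (3, 'DATA'), (3, 'DATA')]
```

Real implementations additionally apply per-stream priorities and flow-control windows, but the interleaving principle is the same.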

 

4. Server Push

HTTP/2 also changes the traditional "request-response" working mode to some extent: the server no longer responds only passively, but can also create a new "stream" to proactively send messages to the client. For example, when the browser requests just the HTML, the server can send along the JS and CSS files that will likely be needed, reducing waiting latency. This is called "Server Push" (also called cache push).

For example, as shown in the figure below, the server actively pushes JS and CSS files to the client without sending these requests when the client parses HTML.

In addition, it should be noted that while the server can push proactively, the client retains the right to refuse: if the pushed resource is already in the browser cache, the browser can reject it by sending a RST_STREAM frame. Server push also obeys the same-origin policy; in other words, the server cannot arbitrarily push third-party resources to the client, and pushes must be acceptable to both parties.

 

5. Improve security

For compatibility reasons, HTTP/2 continues the "plaintext" tradition of HTTP/1: data may be transmitted unencrypted as before, and encrypted communication is not mandatory. The format is still binary, it just does not require decryption.

However, since HTTPS is the general trend, and mainstream browsers such as Chrome and Firefox have publicly announced that they will only support encrypted HTTP/2, HTTP/2 is de facto encrypted. That is to say, the HTTP/2 usually seen on the Internet uses the "https" scheme and runs over TLS. The HTTP/2 protocol defines two string identifiers: "h2" for HTTP/2 over TLS, and "h2c" for plaintext HTTP/2.
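The "h2" identifier is negotiated during the TLS handshake via the ALPN extension. With Python's standard ssl module, a client advertises its willingness to speak HTTP/2 roughly like this (sketch only; no connection is made here):

```python
import ssl

# Advertise h2 first, falling back to HTTP/1.1 if the server declines
ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["h2", "http/1.1"])

# After wrapping a socket and completing the handshake,
# ssl_sock.selected_alpn_protocol() would report which protocol
# the server actually chose (e.g. "h2").
```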
 

5. New features of HTTP/3


1. Disadvantages of HTTP/2

Although HTTP/2 solves many problems of previous versions, it still has a big problem, mainly caused by the underlying TCP protocol. The main disadvantages of HTTP/2 are as follows:

HTTP/2 is transmitted over TCP, and if HTTPS is used, the TLS protocol is also needed for secure transmission. TLS requires its own handshake, so there are two handshake delays:

① Establishing a TCP connection requires a three-way handshake with the server to confirm the connection, which costs roughly 1 to 1.5 RTTs before data transmission can begin.

② Establishing a TLS connection. Two versions of TLS are in common use, TLS 1.2 and TLS 1.3, and each takes a different time to establish a connection: roughly 1 RTT for TLS 1.3 and 2 RTTs for TLS 1.2.

In short, we spend roughly 2 to 4 RTTs before we can transmit any data.
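One common way to count these round trips, as a back-of-envelope sketch (it deliberately ignores session resumption, 0-RTT, and TCP Fast Open):

```python
def handshake_rtts(transport, tls=None):
    """Rough count of round trips before the first request byte can be
    sent on a brand-new connection. A simplification for illustration."""
    if transport == "quic":
        return 1          # QUIC folds transport setup and TLS 1.3 into one handshake
    rtts = 1              # TCP three-way handshake
    if tls == "1.2":
        rtts += 2         # full TLS 1.2 handshake
    elif tls == "1.3":
        rtts += 1         # full TLS 1.3 handshake
    return rtts

rtt_ms = 50  # an assumed network round-trip time
print(handshake_rtts("tcp", "1.2") * rtt_ms)  # 150 ms before the first byte
print(handshake_rtts("quic") * rtt_ms)        # 50 ms
```

On a 50 ms link, that is the difference between waiting 150 ms and waiting 50 ms before the first request byte even leaves the client.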

We mentioned above that in HTTP/2, multiple requests run in one TCP pipe. But when packet loss occurs, HTTP/2 actually performs worse than HTTP/1. To guarantee reliable transmission, TCP has a "retransmission on loss" mechanism: lost packets must wait to be retransmitted and acknowledged. So when a packet is lost under HTTP/2, the entire TCP connection has to wait for the retransmission, which blocks all requests on that connection (as shown below). HTTP/1.1, by contrast, can open multiple TCP connections, so a loss affects only one of them, and the remaining connections can still transmit data normally.

Reading this, some may wonder: why not simply modify the TCP protocol? In fact, this is an impossible task. TCP has existed for so long that it is baked into all kinds of devices, and the protocol is implemented by the operating system, so updating it is not realistic.

 

2. Introduction to HTTP/3

Google was already aware of these problems when it launched SPDY, so it built a new "QUIC" protocol on top of UDP, letting HTTP run over QUIC instead of TCP.

And this "HTTP over QUIC" is the next major version of the HTTP protocol, HTTP/3. It achieves a qualitative leap on the basis of HTTP/2, and truly "perfectly" solves the "head of line blocking" problem.

Although QUIC is based on UDP, many new functions have been added on the original basis. Next, we will focus on several new functions of QUIC. However, HTTP/3 is still in the draft stage and may change before the official release, so this article tries not to cover those unstable details.

 

3. QUIC new features

We mentioned above that QUIC is based on UDP, and UDP is "connectionless": it needs no "handshake" or "teardown" at all, so it is faster than TCP. On top of that, QUIC implements reliable transmission, ensuring that data reaches its destination. It also introduces "streams" and "multiplexing" similar to HTTP/2: a single "stream" is ordered and may block on packet loss, but other "streams" are unaffected. Specifically, the QUIC protocol has the following features:

Although UDP does not provide reliable transmission, QUIC adds a layer on top of UDP to ensure reliable data transmission. It provides packet retransmission, congestion control, and some other features present in TCP.

Since QUIC is based on UDP, it can use 0-RTT or 1-RTT connection establishment, meaning QUIC can send and receive data with minimal delay, which greatly improves the speed of the first page load. 0-RTT connection establishment can be said to be QUIC's biggest performance advantage over HTTP/2.

QUIC currently uses TLS 1.3, which has more advantages than the earlier TLS 1.2, the most important being a reduced number of RTTs spent in the handshake.

Unlike TCP, QUIC allows multiple independent logical data streams on the same physical connection (as shown in the figure below). Streams are transmitted separately, which solves TCP's head-of-line blocking problem.
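The difference can be illustrated with a toy delivery model: in-order delivery is enforced per stream, so a lost packet stalls only later packets of the same stream, whereas with a single TCP byte stream one loss stalls everything behind it. This is a simplification for illustration, not QUIC's actual loss-recovery logic:

```python
def deliverable(packets, lost):
    """Toy per-stream in-order delivery: a lost packet stalls only later
    packets of the *same* stream, as in QUIC. `packets` is a list of
    (sequence_number, stream_id) in arrival order."""
    delivered, stalled = [], set()
    for seq, stream in packets:
        if seq in lost or stream in stalled:
            stalled.add(stream)      # this stream must wait for a retransmit
        else:
            delivered.append((seq, stream))
    return delivered

packets = [(1, "A"), (2, "B"), (3, "A"), (4, "B")]
print(deliverable(packets, lost={1}))  # [(2, 'B'), (4, 'B')] - stream B still progresses
```

Under TCP, losing packet 1 would have stalled all four packets; under QUIC's per-stream model, stream B is delivered without waiting for stream A's retransmission.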

 

6. Summary