Understanding Content-Length: Avoiding Common Pitfalls in HTTP Message Handling

Contents hide

Introduction

What is Content-Length

How Content-Length Works

Content-Length > Actual Length

Content-Length < Actual Length

What to Do If You Can’t Determine the Value of Content-Length

What is Transfer-Encoding: chunked

How Transfer-Encoding: chunked Works

Conclusion

Introduction

Content-Length, the HTTP message length, is represented by the number of bytes in decimal digits. In general, many tasks are handled by frameworks, so we rarely pay attention to this content, but in rare cases where Content-Length doesn’t match the actual message length, the program may experience rather strange anomalies, such as:

No response until timeout.
The request gets truncated, and the parsing of the next request becomes confused.

What is Content-Length

Content-Length is the length of the HTTP message, represented by the number of bytes in decimal digits, and is a common field in Headers. Content-Length should be precise, otherwise it will lead to anomalies (specifically, in HTTP1.0, this field is optional).

The Content-Length header indicates the byte size of the entity body in the message. This size includes all content encoding, for example, if a text file is compressed with gzip, the Content-Length header refers to the size after compression and not the original size.

How Content-Length Works

Content-Length uses decimal numbers to represent the message’s length, allowing the server/client to know the length of the message to be read next.

Content-Length >

If this length is incorrect, the following situations occur:

Content-Length > Actual Length

If the Content-Length is larger than the actual length, after reading to the end of the message, the server/client will wait for the next byte, naturally leading to no response until timeout.

Similarly, if the Content-Length in the response message exceeds the actual length, the effect is the same:

Content-Length < Actual Length

If this length is less than the actual length, the first request’s message will be truncated, for example, the parameter is param=piaoruiqing, and Content-Length is 10, then the message of this request will be truncated to: param=piao, as shown:

But, is that it? Of course not, let’s see what unexpected things occur during the second request, as shown:

During two consecutive requests, the first message is truncated, while the second doesn’t experience the expected cutoff, instead the server throws an exception: Request method 'ruiqingPOST' not supported. Isn’t it exciting? (ﾉ)ﾟДﾟ( )

What kind of magical method is ruiqingPOST? At this point, with the sensitivity cultivated through years of development (DEBUG) experience, we can generally guess that the leftover message from the previous request appeared in this request. Whip out Wireshark to verify, as shown:

The cause of this situation is that Connection: keep-alive is enabled. If Connection: close is used, the phenomenon produced is each request being truncated, but no parsing confusion occurs (such as concatenating the leftover message from the previous request to subsequent request messages).

Copyright Notice

 This article was published on [Piaoruiqing's Blog](https://blog.piaoruiqing.com/), allowing for non-commercial use reproduction. However, reproduction must retain the original author [Piaoruiqing](https://blog.piaoruiqing.com/) and the link: [https://blog.piaoruiqing.com](https://blog.piaoruiqing.com/).

 For authorization negotiations or cooperation, please contact: [[email protected]](https://blog.piaoruiqing.com/mailto:[email protected]).

What to Do If You Can’t Determine the Value of Content-Length

The Content-Length header indicates the byte size of the entity body in the message. However, if the message length cannot be obtained before the request processing is completed, we cannot explicitly specify Content-Length, and instead should use Transfer-Encoding: chunked.

What is Transfer-Encoding: chunked

The data is sent in a series of chunks. The Content-Length header is not sent in this case. At the beginning of each chunk, the current chunk’s length must be explicitly indicated in hexadecimal followed by \r\n, followed by the chunk itself, also ending with \r\n. The terminating block is a regular chunk except for its length being 0.

How Transfer-Encoding: chunked Works

Let’s examine how Transfer-Encoding: chunked works using a file download example. The server code is as follows:

Use postman to initiate a request and observe it with Wireshark, as shown:

In Wireshark, you can clearly see the chunked data structure, which generally consists of: the returned message is divided into multiple data blocks, each contains two parts, length + data, both parts end with CRLF (\r\n). The terminating block is a special data block with a length of 0, as shown:

Thus, the chunked encoding is completed. It is mainly used in scenarios where a large amount of data is transferred, but the length of the response is not available until the request is fully processed. For instance, when a large HTML table generated from data queried from a database needs to be transmitted, or when transmitting large amounts of images.

Conclusion

If Content-Length exists and takes effect, it must be correct, or exceptions will occur. (Greater than the actual value causes timeout, less than the actual value causes truncation and may lead to subsequent data parsing confusion)
If the message contains the Transfer-Encoding: chunked header, then Content-Length will be ignored.

Unicorn Network Threat Analyzer

Understanding Content-Length: Avoiding Common Pitfalls in HTTP Message Handling

Introduction

What is Content-Length

How Content-Length Works

Content-Length > Actual Length

Content-Length < Actual Length

What to Do If You Can’t Determine the Value of Content-Length

What is Transfer-Encoding: chunked

How Transfer-Encoding: chunked Works

Conclusion

Real-time,Accuracy and Efficiency

Products

Quick Links

Company

Unicorn Network Threat Analyzer

Understanding Content-Length: Avoiding Common Pitfalls in HTTP Message Handling

Introduction

What is Content-Length

How Content-Length Works

Content-Length > Actual Length

Content-Length < Actual Length

What to Do If You Can’t Determine the Value of Content-Length

What is Transfer-Encoding: chunked

How Transfer-Encoding: chunked Works

Conclusion

Related posts:

Real-time,Accuracy and Efficiency

Products

Quick Links

Company