Git development
 help / color / mirror / Atom feed
From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Jeff King" <peff@peff.net>, "Patrick Steinhardt" <ps@pks.im>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: [PATCH v3 02/11] git-zlib: handle data streams larger than 4GB
Date: Fri, 08 May 2026 08:16:40 +0000	[thread overview]
Message-ID: <c611913194cab1fcba5f990bf44ba15f721e1223.1778228209.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2102.v3.git.1778228209.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

On Windows, zlib's `uLong` type is 32-bit even on 64-bit systems. When
processing data streams larger than 4GB, the `total_in` and `total_out`
fields in zlib's `z_stream` structure wrap around, which caused the
sanity checks in `zlib_post_call()` to trigger `BUG()` assertions.

The git_zstream wrapper now tracks its own 64-bit totals rather than
copying them from zlib. The sanity checks compare only the low bits,
using `maximum_unsigned_value_of_type(uLong)` to mask appropriately for
the platform's `uLong` size.

This is based on work by LordKiRon in git-for-windows#6076.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 git-zlib.c    | 25 +++++++++++++++++--------
 git-zlib.h    |  4 ++--
 object-file.c |  2 +-
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/git-zlib.c b/git-zlib.c
index df9604910e..b91cb323ae 100644
--- a/git-zlib.c
+++ b/git-zlib.c
@@ -30,6 +30,9 @@ static const char *zerr_to_string(int status)
  */
 /* #define ZLIB_BUF_MAX ((uInt)-1) */
 #define ZLIB_BUF_MAX ((uInt) 1024 * 1024 * 1024) /* 1GB */
+
+/* uLong is 32-bit on Windows, even on 64-bit systems */
+#define ULONG_MAX_VALUE maximum_unsigned_value_of_type(uLong)
 static inline uInt zlib_buf_cap(unsigned long len)
 {
 	return (ZLIB_BUF_MAX < len) ? ZLIB_BUF_MAX : len;
@@ -39,31 +42,37 @@ static void zlib_pre_call(git_zstream *s)
 {
 	s->z.next_in = s->next_in;
 	s->z.next_out = s->next_out;
-	s->z.total_in = s->total_in;
-	s->z.total_out = s->total_out;
+	s->z.total_in = (uLong)(s->total_in & ULONG_MAX_VALUE);
+	s->z.total_out = (uLong)(s->total_out & ULONG_MAX_VALUE);
 	s->z.avail_in = zlib_buf_cap(s->avail_in);
 	s->z.avail_out = zlib_buf_cap(s->avail_out);
 }
 
 static void zlib_post_call(git_zstream *s, int status)
 {
-	unsigned long bytes_consumed;
-	unsigned long bytes_produced;
+	size_t bytes_consumed;
+	size_t bytes_produced;
 
 	bytes_consumed = s->z.next_in - s->next_in;
 	bytes_produced = s->z.next_out - s->next_out;
-	if (s->z.total_out != s->total_out + bytes_produced)
+	/*
+	 * zlib's total_out/total_in are uLong which may wrap for >4GB.
+	 * We track our own totals and verify only the low bits match.
+	 */
+	if ((s->z.total_out & ULONG_MAX_VALUE) !=
+	    ((s->total_out + bytes_produced) & ULONG_MAX_VALUE))
 		BUG("total_out mismatch");
 	/*
 	 * zlib does not update total_in when it returns Z_NEED_DICT,
 	 * causing a mismatch here. Skip the sanity check in that case.
 	 */
 	if (status != Z_NEED_DICT &&
-	    s->z.total_in != s->total_in + bytes_consumed)
+	    (s->z.total_in & ULONG_MAX_VALUE) !=
+	    ((s->total_in + bytes_consumed) & ULONG_MAX_VALUE))
 		BUG("total_in mismatch");
 
-	s->total_out = s->z.total_out;
-	s->total_in = s->z.total_in;
+	s->total_out += bytes_produced;
+	s->total_in += bytes_consumed;
 	/* zlib-ng marks `next_in` as `const`, so we have to cast it away. */
 	s->next_in = (unsigned char *) s->z.next_in;
 	s->next_out = s->z.next_out;
diff --git a/git-zlib.h b/git-zlib.h
index 0e66fefa8c..44380e8ad3 100644
--- a/git-zlib.h
+++ b/git-zlib.h
@@ -7,8 +7,8 @@ typedef struct git_zstream {
 	struct z_stream_s z;
 	unsigned long avail_in;
 	unsigned long avail_out;
-	unsigned long total_in;
-	unsigned long total_out;
+	size_t total_in;
+	size_t total_out;
 	unsigned char *next_in;
 	unsigned char *next_out;
 } git_zstream;
diff --git a/object-file.c b/object-file.c
index 2acc9522df..086b2b65ff 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1118,7 +1118,7 @@ int odb_source_loose_write_stream(struct odb_source *source,
 	} while (ret == Z_OK || ret == Z_BUF_ERROR);
 
 	if (stream.total_in != len + hdrlen)
-		die(_("write stream object %ld != %"PRIuMAX), stream.total_in,
+		die(_("write stream object %"PRIuMAX" != %"PRIuMAX), (uintmax_t)stream.total_in,
 		    (uintmax_t)len + hdrlen);
 
 	/*
-- 
gitgitgadget


  parent reply	other threads:[~2026-05-08  8:16 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28 16:26 [PATCH 0/6] Handle cloning of objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 1/6] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-04-30 14:13   ` Torsten Bögershausen
2026-05-03 14:46     ` Johannes Schindelin
2026-04-28 16:26 ` [PATCH 2/6] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 3/6] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 4/6] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-04-29 13:28   ` Derrick Stolee
2026-05-03 14:49     ` Johannes Schindelin
2026-04-28 16:26 ` [PATCH 5/6] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 6/6] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-04-29 13:34   ` Derrick Stolee
2026-05-01  6:38     ` Jeff King
2026-05-01 13:19       ` Derrick Stolee
2026-05-04 17:07         ` Johannes Schindelin
2026-04-29 13:35 ` [PATCH 0/6] Handle cloning of objects larger than 4GB on Windows Derrick Stolee
2026-05-04 17:08 ` [PATCH v2 00/11] " Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 01/11] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-05-05 19:11     ` Torsten Bögershausen
2026-05-08  7:36       ` Johannes Schindelin
2026-05-08 19:09         ` Torsten Bögershausen
2026-05-10  2:41           ` Junio C Hamano
2026-05-10  9:14             ` Torsten Bögershausen
2026-05-04 17:08   ` [PATCH v2 02/11] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 03/11] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-05-05 19:27     ` Torsten Bögershausen
2026-05-08  7:38       ` Johannes Schindelin
2026-05-04 17:08   ` [PATCH v2 04/11] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 05/11] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 06/11] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 07/11] test-tool synthesize: use the unsafe hash for speed Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 08/11] test-tool synthesize: precompute pack for 4 GiB + 1 Johannes Schindelin via GitGitGadget
2026-05-04 18:27     ` Derrick Stolee
2026-05-05 20:54       ` Johannes Schindelin
2026-05-04 17:08   ` [PATCH v2 09/11] test-tool synthesize: add precomputed SHA-256 " Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 10/11] t5608: mark >4GB tests as EXPENSIVE Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 11/11] ci: run expensive tests on push builds to integration branches Johannes Schindelin via GitGitGadget
2026-05-04 18:35     ` Derrick Stolee
2026-05-05 12:56       ` Junio C Hamano
2026-05-05 23:07         ` Junio C Hamano
2026-05-06  8:33           ` Johannes Schindelin
2026-05-07  9:18             ` Junio C Hamano
2026-05-07 10:24               ` Patrick Steinhardt
2026-05-08  2:50         ` Junio C Hamano
2026-05-08  8:16   ` [PATCH v3 00/11] Handle cloning of objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 01/11] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` Johannes Schindelin via GitGitGadget [this message]
2026-05-08  8:16     ` [PATCH v3 03/11] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 04/11] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 05/11] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 06/11] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 07/11] test-tool synthesize: use the unsafe hash for speed Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 08/11] test-tool synthesize: precompute pack for 4 GiB + 1 Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 09/11] test-tool synthesize: add precomputed SHA-256 " Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 10/11] t5608: mark >4GB tests as EXPENSIVE Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 11/11] ci: run expensive tests on push builds to integration branches Johannes Schindelin via GitGitGadget
2026-05-10 23:51       ` [PATCH] ci: enable EXPENSIVE for contributor builds Junio C Hamano
2026-05-11  7:05         ` Patrick Steinhardt
2026-05-11  8:29           ` Junio C Hamano
2026-05-11 10:02             ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c611913194cab1fcba5f990bf44ba15f721e1223.1778228209.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johannes.schindelin@gmx.de \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox