From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
"Torsten Bögershausen" <tboegi@web.de>,
"Jeff King" <peff@peff.net>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: [PATCH v2 00/11] Handle cloning of objects larger than 4GB on Windows
Date: Mon, 04 May 2026 17:08:17 +0000 [thread overview]
Message-ID: <pull.2102.v2.git.1777914508.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2102.git.1777393580.gitgitgadget@gmail.com>
On Windows, unsigned long is 32-bit even on 64-bit systems. This causes
multiple problems when Git handles objects larger than 4GB. This patch
series is a very targeted fix for a very early part of the problem: it
addresses the most fundamental truncation points that prevent a >4GB object
from surviving a clone at all.
Specifically, this fixes:
* zlib's uLong wrapping and triggering BUG() assertions in the git_zstream
wrapper
* Object sizes being truncated in pack streaming, delta headers, and
index-pack/unpack-objects
* pack-objects re-encoding reused pack entries with a truncated size,
producing corrupt packs on the wire
Many other code paths still use unsigned long for object sizes (e.g.,
cat-file -s, object_info.sizep, the delta machinery) and will need their own
conversions. This series does not attempt to fix those.
Based on work by @LordKiRon in git-for-windows/git#6076.
The last two commits add a test helper that synthesizes a pack with a >4GB
blob and regression tests that clone it via both the unpack-objects and
index-pack code paths using file:// transport.
Changes since v1:
* dramatically accelerated the test helper that generates 4GB pack files,
via two separate strategies:
1. using the "unsafe" SHA-1 for the blob OID computation.
2. using pre-computed "Lego blocks" to construct the 4GB packs needed in
the test cases, where the size (and therefore the involved OIDs) are
well-known in advance.
* even with these improvements, the actual git clone is still slow (of
course, because it cannot use any of those shortcuts), therefore the
tests are marked as EXPENSIVE.
* to exercise those tests nevertheless, the last patch lets all EXPENSIVE
test cases be run for the integration branches other than seen.
Johannes Schindelin (11):
index-pack, unpack-objects: use size_t for object size
git-zlib: handle data streams larger than 4GB
odb, packfile: use size_t for streaming object sizes
delta, packfile: use size_t for delta header sizes
test-tool: add a helper to synthesize large packfiles
t5608: add regression test for >4GB object clone
test-tool synthesize: use the unsafe hash for speed
test-tool synthesize: precompute pack for 4 GiB + 1
test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
t5608: mark >4GB tests as EXPENSIVE
ci: run expensive tests on push builds to integration branches
Makefile | 1 +
builtin/index-pack.c | 9 +-
builtin/pack-objects.c | 23 +-
builtin/unpack-objects.c | 5 +-
ci/lib.sh | 9 +
compat/zlib-compat.h | 2 +
delta.h | 14 +-
git-zlib.c | 25 +-
git-zlib.h | 4 +-
object-file.c | 12 +-
odb/streaming.c | 13 +-
odb/streaming.h | 2 +-
oss-fuzz/fuzz-pack-headers.c | 2 +-
pack-bitmap.c | 2 +-
pack-check.c | 6 +-
packfile.c | 57 ++--
packfile.h | 4 +-
t/helper/meson.build | 1 +
t/helper/test-synthesize.c | 541 +++++++++++++++++++++++++++++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t5608-clone-2gb.sh | 37 +++
22 files changed, 718 insertions(+), 53 deletions(-)
create mode 100644 t/helper/test-synthesize.c
base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2102%2Fdscho%2Ffix-large-clones-on-windows-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2102/dscho/fix-large-clones-on-windows-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2102
Range-diff vs v1:
1: dc660106ea = 1: dc660106ea index-pack, unpack-objects: use size_t for object size
2: 92f4327b1f = 2: 92f4327b1f git-zlib: handle data streams larger than 4GB
3: 3a539061c5 = 3: 3a539061c5 odb, packfile: use size_t for streaming object sizes
4: 3274cba862 = 4: 3274cba862 delta, packfile: use size_t for delta header sizes
5: afa74a3a2b = 5: afa74a3a2b test-tool: add a helper to synthesize large packfiles
6: a3019888d8 = 6: a3019888d8 t5608: add regression test for >4GB object clone
-: ---------- > 7: 859e93e7a9 test-tool synthesize: use the unsafe hash for speed
-: ---------- > 8: 29b9a74e91 test-tool synthesize: precompute pack for 4 GiB + 1
-: ---------- > 9: 8e6e720804 test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
-: ---------- > 10: 5b44410b2f t5608: mark >4GB tests as EXPENSIVE
-: ---------- > 11: 1eaaa7fad7 ci: run expensive tests on push builds to integration branches
--
gitgitgadget
next prev parent reply other threads:[~2026-05-04 17:08 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 16:26 [PATCH 0/6] Handle cloning of objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 1/6] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-04-30 14:13 ` Torsten Bögershausen
2026-05-03 14:46 ` Johannes Schindelin
2026-04-28 16:26 ` [PATCH 2/6] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 3/6] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 4/6] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-04-29 13:28 ` Derrick Stolee
2026-05-03 14:49 ` Johannes Schindelin
2026-04-28 16:26 ` [PATCH 5/6] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 6/6] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-04-29 13:34 ` Derrick Stolee
2026-05-01 6:38 ` Jeff King
2026-05-01 13:19 ` Derrick Stolee
2026-05-04 17:07 ` Johannes Schindelin
2026-04-29 13:35 ` [PATCH 0/6] Handle cloning of objects larger than 4GB on Windows Derrick Stolee
2026-05-04 17:08 ` Johannes Schindelin via GitGitGadget [this message]
2026-05-04 17:08 ` [PATCH v2 01/11] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-05-05 19:11 ` Torsten Bögershausen
2026-05-08 7:36 ` Johannes Schindelin
2026-05-08 19:09 ` Torsten Bögershausen
2026-05-10 2:41 ` Junio C Hamano
2026-05-10 9:14 ` Torsten Bögershausen
2026-05-04 17:08 ` [PATCH v2 02/11] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-05-04 17:08 ` [PATCH v2 03/11] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-05-05 19:27 ` Torsten Bögershausen
2026-05-08 7:38 ` Johannes Schindelin
2026-05-04 17:08 ` [PATCH v2 04/11] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-05-04 17:08 ` [PATCH v2 05/11] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-05-04 17:08 ` [PATCH v2 06/11] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-05-04 17:08 ` [PATCH v2 07/11] test-tool synthesize: use the unsafe hash for speed Johannes Schindelin via GitGitGadget
2026-05-04 17:08 ` [PATCH v2 08/11] test-tool synthesize: precompute pack for 4 GiB + 1 Johannes Schindelin via GitGitGadget
2026-05-04 18:27 ` Derrick Stolee
2026-05-05 20:54 ` Johannes Schindelin
2026-05-04 17:08 ` [PATCH v2 09/11] test-tool synthesize: add precomputed SHA-256 " Johannes Schindelin via GitGitGadget
2026-05-04 17:08 ` [PATCH v2 10/11] t5608: mark >4GB tests as EXPENSIVE Johannes Schindelin via GitGitGadget
2026-05-04 17:08 ` [PATCH v2 11/11] ci: run expensive tests on push builds to integration branches Johannes Schindelin via GitGitGadget
2026-05-04 18:35 ` Derrick Stolee
2026-05-05 12:56 ` Junio C Hamano
2026-05-05 23:07 ` Junio C Hamano
2026-05-06 8:33 ` Johannes Schindelin
2026-05-07 9:18 ` Junio C Hamano
2026-05-07 10:24 ` Patrick Steinhardt
2026-05-08 2:50 ` Junio C Hamano
2026-05-08 8:16 ` [PATCH v3 00/11] Handle cloning of objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 01/11] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 02/11] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 03/11] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 04/11] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 05/11] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 06/11] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 07/11] test-tool synthesize: use the unsafe hash for speed Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 08/11] test-tool synthesize: precompute pack for 4 GiB + 1 Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 09/11] test-tool synthesize: add precomputed SHA-256 " Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 10/11] t5608: mark >4GB tests as EXPENSIVE Johannes Schindelin via GitGitGadget
2026-05-08 8:16 ` [PATCH v3 11/11] ci: run expensive tests on push builds to integration branches Johannes Schindelin via GitGitGadget
2026-05-10 23:51 ` [PATCH] ci: enable EXPENSIVE for contributor builds Junio C Hamano
2026-05-11 7:05 ` Patrick Steinhardt
2026-05-11 8:29 ` Junio C Hamano
2026-05-11 10:02 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.2102.v2.git.1777914508.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=johannes.schindelin@gmx.de \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.