From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Jeff King" <peff@peff.net>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: [PATCH v2 00/11] Handle cloning of objects larger than 4GB on Windows
Date: Mon, 04 May 2026 17:08:17 +0000
Message-ID: <pull.2102.v2.git.1777914508.gitgitgadget@gmail.com>
In-Reply-To: <pull.2102.git.1777393580.gitgitgadget@gmail.com>

On Windows, unsigned long is 32 bits wide even on 64-bit systems. This
causes multiple problems when Git handles objects larger than 4GB. This
patch series is a narrowly targeted fix for the earliest part of the
problem: it addresses the most fundamental truncation points, the ones that
prevent a >4GB object from surviving a clone at all.
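
To make the truncation concrete, here is a minimal sketch that is not taken
from the series. It uses uint32_t as a stand-in for Windows' 32-bit unsigned
long so that it behaves the same on every platform; win_ulong,
truncate_size() and fits_in_win_ulong() are hypothetical names chosen for
illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for `unsigned long` on 64-bit Windows (LLP64), where the type
 * is 32 bits wide. Hypothetical typedef, for illustration only.
 */
typedef uint32_t win_ulong;

/* Narrowing a 64-bit size to 32 bits keeps only the low 32 bits. */
static win_ulong truncate_size(uint64_t size)
{
	return (win_ulong)size;
}

/* Returns 1 if `size` can be stored in a 32-bit unsigned long unchanged. */
static int fits_in_win_ulong(uint64_t size)
{
	return size <= UINT32_MAX;
}

/* Walk across the 4 GiB boundary and check what survives the narrowing. */
static void demo_truncation(void)
{
	const uint64_t four_gib = (uint64_t)1 << 32;

	assert(truncate_size(four_gib - 1) == UINT32_MAX);
	assert(truncate_size(four_gib) == 0);
	assert(truncate_size(four_gib + 1) == 1);
	assert(fits_in_win_ulong(four_gib - 1));
	assert(!fits_in_win_ulong(four_gib + 1));
}
```

A blob of 4 GiB + 1 bytes, the size the test helper targets, therefore
appears to be one byte long to any code path that stores its size in such a
type.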

Specifically, this fixes:

 * zlib's uLong wrapping and triggering BUG() assertions in the git_zstream
   wrapper
 * Object sizes being truncated in pack streaming, delta headers, and
   index-pack/unpack-objects
 * pack-objects re-encoding reused pack entries with a truncated size,
   producing corrupt packs on the wire
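
As an illustration of the last two points, the following sketch encodes and
decodes a pack entry header using the variable-length layout described in
Git's pack format documentation: the object type and the low four size bits
go into the first byte, followed by seven size bits per continuation byte.
It is a simplified stand-in, not the code touched by this series;
encode_entry_header() and decode_entry_header() are hypothetical names, and
the type value 3 is assumed to mean a blob.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write a pack entry header for an object of the
 * given type and size into hdr; returns the number of bytes used.
 */
static size_t encode_entry_header(unsigned char *hdr, size_t hdr_len,
				  unsigned type, uint64_t size)
{
	size_t n = 0;
	unsigned char c = (unsigned char)((type << 4) | (size & 0x0f));

	size >>= 4;
	while (size) {
		/* room for this byte and at least one more */
		assert(n + 1 < hdr_len);
		hdr[n++] = c | 0x80;	/* continuation bit set */
		c = size & 0x7f;
		size >>= 7;
	}
	assert(n < hdr_len);
	hdr[n++] = c;
	return n;
}

/*
 * Hypothetical helper: parse a header written by encode_entry_header();
 * returns the number of bytes consumed.
 */
static size_t decode_entry_header(const unsigned char *buf, size_t len,
				  unsigned *type, uint64_t *size)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;

	assert(len > 0);
	c = buf[used++];
	*type = (c >> 4) & 7;
	*size = c & 0x0f;
	while (c & 0x80) {
		assert(used < len && shift < 64);
		c = buf[used++];
		/* widen before shifting so bits above 31 are kept */
		*size |= (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}
	return used;
}
```

With a 64-bit size, 4 GiB + 1 round-trips through a six-byte header. If the
size is first narrowed to 32 bits, the same call emits a one-byte header
that declares a one-byte object, so the header no longer matches the
compressed data that follows it in the pack.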

Many other code paths still use unsigned long for object sizes (e.g.,
cat-file -s, object_info.sizep, the delta machinery) and will need their own
conversions. This series does not attempt to fix those.

Based on work by @LordKiRon in git-for-windows/git#6076.

The last two commits add a test helper that synthesizes a pack with a >4GB
blob and regression tests that clone it via both the unpack-objects and
index-pack code paths using file:// transport.

Changes since v1:

 * dramatically accelerated the test helper that generates 4GB pack files,
   via two separate strategies:
   1. using the "unsafe" SHA-1 for the blob OID computation.
   2. using pre-computed "Lego blocks" to construct the 4GB packs needed in
      the test cases, where the size (and therefore each OID involved) is
      known in advance.
 * even with these improvements, the actual git clone is still slow (of
   course, because it cannot use any of those shortcuts), therefore the
   tests are marked as EXPENSIVE.
 * to exercise those tests nevertheless, the last patch lets all EXPENSIVE
   test cases be run for the integration branches other than seen.

Johannes Schindelin (11):
  index-pack, unpack-objects: use size_t for object size
  git-zlib: handle data streams larger than 4GB
  odb, packfile: use size_t for streaming object sizes
  delta, packfile: use size_t for delta header sizes
  test-tool: add a helper to synthesize large packfiles
  t5608: add regression test for >4GB object clone
  test-tool synthesize: use the unsafe hash for speed
  test-tool synthesize: precompute pack for 4 GiB + 1
  test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
  t5608: mark >4GB tests as EXPENSIVE
  ci: run expensive tests on push builds to integration branches

 Makefile                     |   1 +
 builtin/index-pack.c         |   9 +-
 builtin/pack-objects.c       |  23 +-
 builtin/unpack-objects.c     |   5 +-
 ci/lib.sh                    |   9 +
 compat/zlib-compat.h         |   2 +
 delta.h                      |  14 +-
 git-zlib.c                   |  25 +-
 git-zlib.h                   |   4 +-
 object-file.c                |  12 +-
 odb/streaming.c              |  13 +-
 odb/streaming.h              |   2 +-
 oss-fuzz/fuzz-pack-headers.c |   2 +-
 pack-bitmap.c                |   2 +-
 pack-check.c                 |   6 +-
 packfile.c                   |  57 ++--
 packfile.h                   |   4 +-
 t/helper/meson.build         |   1 +
 t/helper/test-synthesize.c   | 541 +++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c         |   1 +
 t/helper/test-tool.h         |   1 +
 t/t5608-clone-2gb.sh         |  37 +++
 22 files changed, 718 insertions(+), 53 deletions(-)
 create mode 100644 t/helper/test-synthesize.c


base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2102%2Fdscho%2Ffix-large-clones-on-windows-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2102/dscho/fix-large-clones-on-windows-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2102

Range-diff vs v1:

  1:  dc660106ea =  1:  dc660106ea index-pack, unpack-objects: use size_t for object size
  2:  92f4327b1f =  2:  92f4327b1f git-zlib: handle data streams larger than 4GB
  3:  3a539061c5 =  3:  3a539061c5 odb, packfile: use size_t for streaming object sizes
  4:  3274cba862 =  4:  3274cba862 delta, packfile: use size_t for delta header sizes
  5:  afa74a3a2b =  5:  afa74a3a2b test-tool: add a helper to synthesize large packfiles
  6:  a3019888d8 =  6:  a3019888d8 t5608: add regression test for >4GB object clone
  -:  ---------- >  7:  859e93e7a9 test-tool synthesize: use the unsafe hash for speed
  -:  ---------- >  8:  29b9a74e91 test-tool synthesize: precompute pack for 4 GiB + 1
  -:  ---------- >  9:  8e6e720804 test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
  -:  ---------- > 10:  5b44410b2f t5608: mark >4GB tests as EXPENSIVE
  -:  ---------- > 11:  1eaaa7fad7 ci: run expensive tests on push builds to integration branches

-- 
gitgitgadget

Thread overview: 60+ messages
2026-04-28 16:26 [PATCH 0/6] Handle cloning of objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 1/6] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-04-30 14:13   ` Torsten Bögershausen
2026-05-03 14:46     ` Johannes Schindelin
2026-04-28 16:26 ` [PATCH 2/6] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 3/6] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 4/6] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-04-29 13:28   ` Derrick Stolee
2026-05-03 14:49     ` Johannes Schindelin
2026-04-28 16:26 ` [PATCH 5/6] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 6/6] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-04-29 13:34   ` Derrick Stolee
2026-05-01  6:38     ` Jeff King
2026-05-01 13:19       ` Derrick Stolee
2026-05-04 17:07         ` Johannes Schindelin
2026-04-29 13:35 ` [PATCH 0/6] Handle cloning of objects larger than 4GB on Windows Derrick Stolee
2026-05-04 17:08 ` Johannes Schindelin via GitGitGadget [this message]
2026-05-04 17:08   ` [PATCH v2 01/11] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-05-05 19:11     ` Torsten Bögershausen
2026-05-08  7:36       ` Johannes Schindelin
2026-05-08 19:09         ` Torsten Bögershausen
2026-05-10  2:41           ` Junio C Hamano
2026-05-10  9:14             ` Torsten Bögershausen
2026-05-04 17:08   ` [PATCH v2 02/11] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 03/11] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-05-05 19:27     ` Torsten Bögershausen
2026-05-08  7:38       ` Johannes Schindelin
2026-05-04 17:08   ` [PATCH v2 04/11] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 05/11] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 06/11] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 07/11] test-tool synthesize: use the unsafe hash for speed Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 08/11] test-tool synthesize: precompute pack for 4 GiB + 1 Johannes Schindelin via GitGitGadget
2026-05-04 18:27     ` Derrick Stolee
2026-05-05 20:54       ` Johannes Schindelin
2026-05-04 17:08   ` [PATCH v2 09/11] test-tool synthesize: add precomputed SHA-256 " Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 10/11] t5608: mark >4GB tests as EXPENSIVE Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 11/11] ci: run expensive tests on push builds to integration branches Johannes Schindelin via GitGitGadget
2026-05-04 18:35     ` Derrick Stolee
2026-05-05 12:56       ` Junio C Hamano
2026-05-05 23:07         ` Junio C Hamano
2026-05-06  8:33           ` Johannes Schindelin
2026-05-07  9:18             ` Junio C Hamano
2026-05-07 10:24               ` Patrick Steinhardt
2026-05-08  2:50         ` Junio C Hamano
2026-05-08  8:16   ` [PATCH v3 00/11] Handle cloning of objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 01/11] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 02/11] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 03/11] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 04/11] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 05/11] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 06/11] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 07/11] test-tool synthesize: use the unsafe hash for speed Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 08/11] test-tool synthesize: precompute pack for 4 GiB + 1 Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 09/11] test-tool synthesize: add precomputed SHA-256 " Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 10/11] t5608: mark >4GB tests as EXPENSIVE Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 11/11] ci: run expensive tests on push builds to integration branches Johannes Schindelin via GitGitGadget
2026-05-10 23:51       ` [PATCH] ci: enable EXPENSIVE for contributor builds Junio C Hamano
2026-05-11  7:05         ` Patrick Steinhardt
2026-05-11  8:29           ` Junio C Hamano
2026-05-11 10:02             ` Patrick Steinhardt
