Git development
From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Kristofer Karlsson <krka@spotify.com>,
	Patrick Steinhardt <ps@pks.im>,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: [PATCH v2 0/7] More work supporting objects larger than 4GB on Windows
Date: Mon, 15 Jun 2026 11:52:22 +0000
Message-ID: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>
In-Reply-To: <pull.2137.git.1780570272.gitgitgadget@gmail.com>

This patch series tries to address the problems pointed out by the expensive
tests that now run in CI: t5608 and t7508 verify various aspects of
objects larger than 4GB, which Git does not currently handle correctly when
run on a platform where size_t is 64-bit and unsigned long is 32-bit (such
as 64-bit Windows).
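
To illustrate the failure mode described above, here is a minimal sketch
that is not taken from the patches. It models a 32-bit unsigned long with
uint32_t, and the object size with uint64_t, so that the truncation shows up
on any host. The names ulong32, narrow_to_ulong32() and fits_in_ulong32()
are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Stand-in for `unsigned long` on a platform where it is 32 bits wide. */
typedef uint32_t ulong32;

/* Silent narrowing: the upper 32 bits of the size are discarded. */
static ulong32 narrow_to_ulong32(uint64_t size)
{
	return (ulong32)size;
}

/* Returns 1 if the size survives the round trip through the narrow type. */
static int fits_in_ulong32(uint64_t size)
{
	return (uint64_t)narrow_to_ulong32(size) == size;
}
```

For an object of 4GB plus one byte, i.e. ((uint64_t)1 << 32) + 1,
narrow_to_ulong32() yields 1. That is the kind of wrong size that keeping
sizes in size_t throughout is meant to avoid.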

Changes vs v1:

 * Rebased onto master, which merged ps/odb-source-loose (with which these
   patches previously conflicted rather badly).
 * Removed superfluous size_t s variables (thanks, Patrick!).
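
The second item can be pictured with a small sketch (hypothetical names, not
the actual Git code): once the caller's size is itself a size_t, a helper
that takes a size_t * can update it directly, so the temporary size_t s copy
and the assignment back to the original variable are no longer needed.
strip_trailing_newlines() is an invented stand-in for a helper with this
calling convention, such as replace_idents_using_mailmap().

```c
#include <stddef.h>

/* Hypothetical helper that shortens a buffer and reports the new length. */
static char *strip_trailing_newlines(char *buf, size_t *len)
{
	while (*len > 0 && buf[*len - 1] == '\n')
		buf[--*len] = '\0';
	return buf;
}

/* With `size` declared as size_t, no temporary copy is needed. */
static size_t stripped_length(char *buf, size_t size)
{
	strip_trailing_newlines(buf, &size);
	return size;
}
```

Had size still been an unsigned long, the call would have needed a separate
size_t variable plus a checked conversion back, which is the pattern the
range-diff below shows being removed.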

Johannes Schindelin (7):
  compat/msvc: use _chsize_s for ftruncate
  patch-delta: use size_t for sizes
  pack-objects(check_pack_inflate()): use size_t instead of unsigned
    long
  packfile: widen unpack_entry()'s size out-parameter to size_t
  pack-objects: use size_t for in-core object sizes
  packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
  odb: use size_t for object_info.sizep and the size APIs

 apply.c                       |  8 ++--
 archive.c                     |  4 +-
 attr.c                        |  2 +-
 bisect.c                      |  2 +-
 blame.c                       | 15 +++++--
 builtin/cat-file.c            | 61 ++++++++++++++---------------
 builtin/difftool.c            |  2 +-
 builtin/fast-export.c         |  7 +++-
 builtin/fast-import.c         | 29 ++++++++++----
 builtin/fsck.c                |  2 +-
 builtin/grep.c                | 12 +++---
 builtin/index-pack.c          | 10 ++---
 builtin/log.c                 |  2 +-
 builtin/ls-files.c            |  2 +-
 builtin/ls-tree.c             |  4 +-
 builtin/merge-tree.c          |  6 +--
 builtin/mktag.c               |  2 +-
 builtin/notes.c               |  6 +--
 builtin/pack-objects.c        | 73 +++++++++++++++++++++--------------
 builtin/repo.c                |  4 +-
 builtin/tag.c                 |  4 +-
 builtin/unpack-file.c         |  2 +-
 builtin/unpack-objects.c      |  8 ++--
 bundle.c                      |  2 +-
 combine-diff.c                |  4 +-
 commit.c                      | 10 ++---
 compat/msvc-posix.h           | 24 +++++++++++-
 config.c                      |  2 +-
 delta.h                       | 20 +++-------
 diff.c                        |  5 ++-
 dir.c                         |  2 +-
 entry.c                       |  4 +-
 fmt-merge-msg.c               |  4 +-
 fsck.c                        |  2 +-
 grep.c                        |  4 +-
 http-push.c                   |  2 +-
 list-objects-filter.c         |  2 +-
 mailmap.c                     |  2 +-
 match-trees.c                 |  4 +-
 merge-blobs.c                 |  6 +--
 merge-blobs.h                 |  2 +-
 merge-ort.c                   |  2 +-
 notes-cache.c                 |  2 +-
 notes-merge.c                 |  2 +-
 notes.c                       |  8 ++--
 object-file.c                 |  6 +--
 object.c                      |  2 +-
 odb.c                         | 12 +++---
 odb.h                         | 10 ++---
 odb/source-loose.c            | 12 +-----
 odb/streaming.c               | 13 +------
 pack-bitmap.c                 |  4 +-
 pack-check.c                  |  5 +--
 pack-objects.h                |  2 +-
 packfile.c                    | 54 ++++++++++----------------
 packfile.h                    |  5 ++-
 patch-delta.c                 |  8 ++--
 path-walk.c                   |  2 +-
 protocol-caps.c               |  5 ++-
 read-cache.c                  |  6 +--
 ref-filter.c                  |  2 +-
 reflog.c                      |  2 +-
 rerere.c                      |  2 +-
 submodule-config.c            |  2 +-
 t/helper/test-delta.c         | 10 +++--
 t/helper/test-pack-deltas.c   |  3 +-
 t/helper/test-partial-clone.c |  2 +-
 t/unit-tests/u-odb-inmemory.c |  2 +-
 tag.c                         |  4 +-
 tree-walk.c                   | 10 +++--
 tree.c                        |  2 +-
 xdiff-interface.c             |  2 +-
 72 files changed, 300 insertions(+), 271 deletions(-)


base-commit: ea97ad8d017de0c9037451a78008a0fd60abea0c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2137%2Fdscho%2Fobjects-larger-than-4gb-on-windows-pt2-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2137

Range-diff vs v1:

 1:  de9fc5c455 = 1:  531bca775c compat/msvc: use _chsize_s for ftruncate
 2:  1fd7646ca1 = 2:  66a642c39e patch-delta: use size_t for sizes
 3:  ddb75326cd = 3:  271a5299e3 pack-objects(check_pack_inflate()): use size_t instead of unsigned long
 4:  bdebc36f21 = 4:  5c329535df packfile: widen unpack_entry()'s size out-parameter to size_t
 5:  68750ba2d1 = 5:  01b9209b26 pack-objects: use size_t for in-core object sizes
 6:  460d733fee = 6:  12c142f8ab packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
 7:  f3aeae983a ! 7:  37d030d867 odb: use size_t for object_info.sizep and the size APIs
     @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
       	struct object_info oi = OBJECT_INFO_INIT;
       	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
      @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
     - 		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
     - 			size_t s = size;
     - 			buf = replace_idents_using_mailmap(buf, &s);
     + 		if (odb_read_object_info_extended(the_repository->objects, &oid, &oi, flags) < 0)
     + 			die("git cat-file: could not get object info");
     + 
     +-		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
     +-			size_t s = size;
     +-			buf = replace_idents_using_mailmap(buf, &s);
      -			size = cast_size_t_to_ulong(s);
     -+			size = s;
     - 		}
     +-		}
     ++		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG))
     ++			buf = replace_idents_using_mailmap(buf, &size);
       
       		printf("%"PRIuMAX"\n", (uintmax_t)size);
     + 		ret = 0;
      @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
       		break;
       
     @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
       
       	case 'p':
      @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
     - 		if (use_mailmap) {
     - 			size_t s = size;
     - 			buf = replace_idents_using_mailmap(buf, &s);
     + 		if (!buf)
     + 			die("Cannot read object %s", obj_name);
     + 
     +-		if (use_mailmap) {
     +-			size_t s = size;
     +-			buf = replace_idents_using_mailmap(buf, &s);
      -			size = cast_size_t_to_ulong(s);
     -+			size = s;
     - 		}
     +-		}
     ++		if (use_mailmap)
     ++			buf = replace_idents_using_mailmap(buf, &size);
       
       		/* otherwise just spit out the data */
     + 		break;
      @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
     - 		if (use_mailmap) {
     - 			size_t s = size;
     - 			buf = replace_idents_using_mailmap(buf, &s);
     + 		buf = odb_read_object_peeled(the_repository->objects, &oid,
     + 					     exp_type_id, &size, NULL);
     + 
     +-		if (use_mailmap) {
     +-			size_t s = size;
     +-			buf = replace_idents_using_mailmap(buf, &s);
      -			size = cast_size_t_to_ulong(s);
     -+			size = s;
     - 		}
     +-		}
     ++		if (use_mailmap)
     ++			buf = replace_idents_using_mailmap(buf, &size);
       		break;
       	}
     + 	default:
      @@ builtin/cat-file.c: cleanup:
       struct expand_data {
       	struct object_id oid;
     @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
       
       		contents = odb_read_object(the_repository->objects, oid,
      @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
     - 		if (use_mailmap) {
     - 			size_t s = size;
     - 			contents = replace_idents_using_mailmap(contents, &s);
     + 		if (!contents)
     + 			die("object %s disappeared", oid_to_hex(oid));
     + 
     +-		if (use_mailmap) {
     +-			size_t s = size;
     +-			contents = replace_idents_using_mailmap(contents, &s);
      -			size = cast_size_t_to_ulong(s);
     -+			size = s;
     - 		}
     +-		}
     ++		if (use_mailmap)
     ++			contents = replace_idents_using_mailmap(contents, &size);
       
       		if (type != data->type)
     + 			die("object %s changed type!?", oid_to_hex(oid));
      @@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
     + 		}
     + 
     + 		if (use_mailmap && (data->type == OBJ_COMMIT || data->type == OBJ_TAG)) {
     +-			size_t s = data->size;
     + 			char *buf = NULL;
     + 
     + 			buf = odb_read_object(the_repository->objects, &data->oid,
     + 					      &data->type, &data->size);
       			if (!buf)
       				die(_("unable to read %s"), oid_to_hex(&data->oid));
     - 			buf = replace_idents_using_mailmap(buf, &s);
     +-			buf = replace_idents_using_mailmap(buf, &s);
      -			data->size = cast_size_t_to_ulong(s);
     -+			data->size = s;
     ++			buf = replace_idents_using_mailmap(buf, &data->size);
       
       			free(buf);
       		}
     @@ builtin/log.c: static int show_blob_object(const struct object_id *oid, struct r
      
       ## builtin/ls-files.c ##
      @@ builtin/ls-files.c: static void expand_objectsize(struct repository *repo, struct strbuf *line,
     - 			      const enum object_type type, unsigned int padded)
     - {
     + 	size_t len;
     + 
       	if (type == OBJ_BLOB) {
      -		unsigned long size;
      +		size_t size;
     @@ builtin/ls-files.c: static void expand_objectsize(struct repository *repo, struc
      
       ## builtin/ls-tree.c ##
      @@ builtin/ls-tree.c: static void expand_objectsize(struct strbuf *line, const struct object_id *oid,
     - 			      const enum object_type type, unsigned int padded)
     - {
     + 	size_t len;
     + 
       	if (type == OBJ_BLOB) {
      -		unsigned long size;
      +		size_t size;
     @@ notes.c: static void format_note(struct notes_tree *t, const struct object_id *o
       	if (!t)
      
       ## object-file.c ##
     -@@ object-file.c: static int parse_loose_header(const char *hdr, struct object_info *oi)
     +@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
       	}
       
       	if (oi->sizep)
     @@ object-file.c: static int parse_loose_header(const char *hdr, struct object_info
       
       	/*
       	 * The length must be followed by a zero byte
     -@@ object-file.c: static int read_object_info_from_path(struct odb_source *source,
     - 	void *map = NULL;
     - 	git_zstream stream, *stream_to_end = NULL;
     - 	char hdr[MAX_HEADER_LEN];
     --	unsigned long size_scratch;
     -+	size_t size_scratch;
     - 	enum object_type type_scratch;
     - 	struct stat st;
     - 
      @@ object-file.c: int force_object_loose(struct odb_source *source,
     - {
     + 	struct odb_source_files *files = odb_source_files_downcast(source);
       	const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
       	void *buf;
      -	unsigned long len;
     @@ object-file.c: int read_loose_object(struct repository *repo,
       
       	fd = git_open(path);
       	if (fd >= 0)
     -@@ object-file.c: int odb_source_loose_read_object_stream(struct odb_read_stream **out,
     - 	struct object_info oi = OBJECT_INFO_INIT;
     - 	struct odb_loose_read_stream *st;
     - 	unsigned long mapsize;
     --	unsigned long size_ul;
     - 	void *mapped;
     - 
     - 	mapped = odb_source_loose_map_object(source, oid, &mapsize);
     -@@ object-file.c: int odb_source_loose_read_object_stream(struct odb_read_stream **out,
     - 		goto error;
     - 	}
     - 
     --	/*
     --	 * object_info.sizep is unsigned long* (32-bit on Windows), but
     --	 * st->base.size is size_t (64-bit). Use temporary variable.
     --	 * Note: loose objects >4GB would still truncate here, but such
     --	 * large loose objects are uncommon (they'd normally be packed).
     --	 */
     --	oi.sizep = &size_ul;
     -+	oi.sizep = &st->base.size;
     - 	oi.typep = &st->base.type;
     - 
     - 	if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0)
     - 		goto error;
     --	st->base.size = size_ul;
     - 
     - 	st->mapped = mapped;
     - 	st->mapsize = mapsize;
      
       ## object.c ##
      @@ object.c: struct object *parse_object_with_flags(struct repository *r,
     @@ odb.h: int odb_read_object_info_extended(struct object_database *odb,
       enum odb_has_object_flags {
       	/* Retry packed storage after checking packed and loose storage */
      
     + ## odb/source-loose.c ##
     +@@ odb/source-loose.c: static int read_object_info_from_path(struct odb_source_loose *loose,
     + 	void *map = NULL;
     + 	git_zstream stream, *stream_to_end = NULL;
     + 	char hdr[MAX_HEADER_LEN];
     +-	unsigned long size_scratch;
     ++	size_t size_scratch;
     + 	enum object_type type_scratch;
     + 	struct stat st;
     + 
     +@@ odb/source-loose.c: static int odb_source_loose_read_object_stream(struct odb_read_stream **out,
     + 	struct object_info oi = OBJECT_INFO_INIT;
     + 	struct odb_loose_read_stream *st;
     + 	unsigned long mapsize;
     +-	unsigned long size_ul;
     + 	void *mapped;
     + 
     + 	mapped = odb_source_loose_map_object(loose, oid, &mapsize);
     +@@ odb/source-loose.c: static int odb_source_loose_read_object_stream(struct odb_read_stream **out,
     + 		goto error;
     + 	}
     + 
     +-	/*
     +-	 * object_info.sizep is unsigned long* (32-bit on Windows), but
     +-	 * st->base.size is size_t (64-bit). Use temporary variable.
     +-	 * Note: loose objects >4GB would still truncate here, but such
     +-	 * large loose objects are uncommon (they'd normally be packed).
     +-	 */
     +-	oi.sizep = &size_ul;
     ++	oi.sizep = &st->base.size;
     + 	oi.typep = &st->base.type;
     + 
     + 	if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0)
     + 		goto error;
     +-	st->base.size = size_ul;
     + 
     + 	st->mapped = mapped;
     + 	st->mapsize = mapsize;
     +
       ## odb/streaming.c ##
      @@ odb/streaming.c: static int open_istream_incore(struct odb_read_stream **out,
       		.base.read = read_istream_incore,

-- 
gitgitgadget
