From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Kristofer Karlsson <krka@spotify.com>,
Patrick Steinhardt <ps@pks.im>,
Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: [PATCH v2 0/7] More work supporting objects larger than 4GB on Windows
Date: Mon, 15 Jun 2026 11:52:22 +0000
Message-ID: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>
In-Reply-To: <pull.2137.git.1780570272.gitgitgadget@gmail.com>
This patch series tries to address the problems pointed out by the expensive
tests that now run in CI: t5608 and t7508 verify various aspects of
objects larger than 4GB, which Git does not currently handle correctly when
run on a platform where size_t is 64-bit but unsigned long is only 32-bit
(such as 64-bit Windows).
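The failure mode is easy to reproduce. The following is a minimal sketch, not code from this series: it uses `uint32_t` as a stand-in for the 32-bit `unsigned long` of 64-bit Windows so that it behaves identically on any host. The names `narrow_size()` and `size_fits()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for `unsigned long` on LLP64 platforms such as 64-bit Windows,
 * where it is 32 bits wide even though size_t is 64 bits wide. */
typedef uint32_t ulong32;

/* Hypothetical helper: narrowing a 64-bit size silently keeps only its
 * low 32 bits; no error is reported. */
static ulong32 narrow_size(uint64_t size)
{
	return (ulong32)size;
}

/* Returns 1 if the size survives a round trip through 32 bits. */
static int size_fits(uint64_t size)
{
	return (uint64_t)narrow_size(size) == size;
}
```

For a 5 GiB object (`5ULL << 30`), `narrow_size()` yields 1073741824, so the object appears to be exactly 4 GiB smaller than it really is.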
Changes vs v1:
* Rebased onto master, which merged ps/odb-source-loose (with which these
patches previously conflicted rather badly).
* Removed superfluous size_t s variables (thanks, Patrick!).
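The v2 cleanup follows a simple rule. Once a helper and its caller both use `size_t`, the caller can pass the address of its own variable. It no longer needs to copy into a `size_t s` temporary and narrow the result back afterwards. Here is a minimal sketch, in which `append_marker()` is a hypothetical stand-in for a `size_t *`-taking helper such as `replace_idents_using_mailmap()`, and `rewritten_length()` is a hypothetical caller:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new NUL-terminated buffer with a marker
 * appended, frees the input and updates *size in place. On allocation
 * failure it returns NULL and leaves both the input and *size untouched. */
static char *append_marker(char *buf, size_t *size)
{
	static const char marker[] = " [rewritten]";
	size_t extra = sizeof(marker) - 1;
	char *out = malloc(*size + extra + 1);

	if (!out)
		return NULL;
	memcpy(out, buf, *size);
	memcpy(out + *size, marker, extra);
	out[*size + extra] = '\0';
	free(buf);
	*size += extra;
	return out;
}

/* Caller in the v2 style: `size` is already size_t, so &size is passed
 * directly; no temporary copy and no narrowing cast afterwards. */
static size_t rewritten_length(const char *text)
{
	size_t size = strlen(text);
	char *buf = malloc(size + 1);
	size_t result;

	assert(buf);
	memcpy(buf, text, size + 1);
	buf = append_marker(buf, &size);
	assert(buf);
	result = size;
	free(buf);
	return result;
}
```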
Johannes Schindelin (7):
compat/msvc: use _chsize_s for ftruncate
patch-delta: use size_t for sizes
pack-objects(check_pack_inflate()): use size_t instead of unsigned long
packfile: widen unpack_entry()'s size out-parameter to size_t
pack-objects: use size_t for in-core object sizes
packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
odb: use size_t for object_info.sizep and the size APIs
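Patch 1/7 concerns `ftruncate()` in MSVC builds. The MSVC C runtime's `_chsize()` takes a `long`, which is 32 bits on Windows, so it cannot set a file size of 2 GiB or more. `_chsize_s()` instead takes an `__int64` and reports errors through its `errno_t` return value. Below is a minimal sketch of such a wrapper that assumes only the documented CRT signature. `sketch_ftruncate()` and `sketch_file_size()` are hypothetical names, and the actual change to `compat/msvc-posix.h` may look different. On non-Windows hosts the sketch forwards to POSIX `ftruncate()`, so it can be compiled and exercised anywhere.

```c
/* Needed on glibc for ftruncate(), fileno() and fstat() under strict -std=c99. */
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
#include <io.h>

/* Hypothetical wrapper: _chsize_s() takes a 64-bit __int64 and returns 0 on
 * success or an errno value on failure, so translate that into the POSIX
 * convention of returning -1 with errno set. */
static int sketch_ftruncate(int fd, long long length)
{
	errno_t err = _chsize_s(fd, (__int64)length);

	if (err) {
		errno = err;
		return -1;
	}
	return 0;
}
#else
#include <unistd.h>

/* Elsewhere, ftruncate() already takes an off_t, which is 64 bits on common
 * 64-bit platforms (or with _FILE_OFFSET_BITS=64). */
static int sketch_ftruncate(int fd, long long length)
{
	return ftruncate(fd, (off_t)length);
}
#endif

/* Hypothetical helper for checking the result. (On Windows, _fstat64()
 * would be needed to see sizes of 2 GiB or more.) */
static long long sketch_file_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return (long long)st.st_size;
}
```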
apply.c | 8 ++--
archive.c | 4 +-
attr.c | 2 +-
bisect.c | 2 +-
blame.c | 15 +++++--
builtin/cat-file.c | 61 ++++++++++++++---------------
builtin/difftool.c | 2 +-
builtin/fast-export.c | 7 +++-
builtin/fast-import.c | 29 ++++++++++----
builtin/fsck.c | 2 +-
builtin/grep.c | 12 +++---
builtin/index-pack.c | 10 ++---
builtin/log.c | 2 +-
builtin/ls-files.c | 2 +-
builtin/ls-tree.c | 4 +-
builtin/merge-tree.c | 6 +--
builtin/mktag.c | 2 +-
builtin/notes.c | 6 +--
builtin/pack-objects.c | 73 +++++++++++++++++++++--------------
builtin/repo.c | 4 +-
builtin/tag.c | 4 +-
builtin/unpack-file.c | 2 +-
builtin/unpack-objects.c | 8 ++--
bundle.c | 2 +-
combine-diff.c | 4 +-
commit.c | 10 ++---
compat/msvc-posix.h | 24 +++++++++++-
config.c | 2 +-
delta.h | 20 +++-------
diff.c | 5 ++-
dir.c | 2 +-
entry.c | 4 +-
fmt-merge-msg.c | 4 +-
fsck.c | 2 +-
grep.c | 4 +-
http-push.c | 2 +-
list-objects-filter.c | 2 +-
mailmap.c | 2 +-
match-trees.c | 4 +-
merge-blobs.c | 6 +--
merge-blobs.h | 2 +-
merge-ort.c | 2 +-
notes-cache.c | 2 +-
notes-merge.c | 2 +-
notes.c | 8 ++--
object-file.c | 6 +--
object.c | 2 +-
odb.c | 12 +++---
odb.h | 10 ++---
odb/source-loose.c | 12 +-----
odb/streaming.c | 13 +------
pack-bitmap.c | 4 +-
pack-check.c | 5 +--
pack-objects.h | 2 +-
packfile.c | 54 ++++++++++----------------
packfile.h | 5 ++-
patch-delta.c | 8 ++--
path-walk.c | 2 +-
protocol-caps.c | 5 ++-
read-cache.c | 6 +--
ref-filter.c | 2 +-
reflog.c | 2 +-
rerere.c | 2 +-
submodule-config.c | 2 +-
t/helper/test-delta.c | 10 +++--
t/helper/test-pack-deltas.c | 3 +-
t/helper/test-partial-clone.c | 2 +-
t/unit-tests/u-odb-inmemory.c | 2 +-
tag.c | 4 +-
tree-walk.c | 10 +++--
tree.c | 2 +-
xdiff-interface.c | 2 +-
72 files changed, 300 insertions(+), 271 deletions(-)
base-commit: ea97ad8d017de0c9037451a78008a0fd60abea0c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2137%2Fdscho%2Fobjects-larger-than-4gb-on-windows-pt2-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2137
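Patch 6/7 and the range-diff below remove calls to `cast_size_t_to_ulong()`. That helper guards the narrowing from `size_t` to `unsigned long` and errors out instead of silently truncating. The sketch below captures the same idea with two differences: it returns an error code instead of calling `die()`, and it uses `uint32_t` in place of Windows' 32-bit `unsigned long`. The name `narrow_or_fail()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Checked narrowing in the spirit of cast_size_t_to_ulong(); uint32_t
 * stands in for a 32-bit `unsigned long`. Returns -1 and leaves *out
 * untouched when the value would be cut off, 0 otherwise. */
static int narrow_or_fail(uint64_t size, uint32_t *out)
{
	if (size != (uint32_t)size)
		return -1;	/* would lose high bits: refuse instead */
	*out = (uint32_t)size;
	return 0;
}
```

A guard like this turns silent corruption into a clean error, but it still caps objects at 4GB on Windows. Widening the callers to `size_t`, as this series does, removes the cap and makes the wrappers unnecessary.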
Range-diff vs v1:
1: de9fc5c455 = 1: 531bca775c compat/msvc: use _chsize_s for ftruncate
2: 1fd7646ca1 = 2: 66a642c39e patch-delta: use size_t for sizes
3: ddb75326cd = 3: 271a5299e3 pack-objects(check_pack_inflate()): use size_t instead of unsigned long
4: bdebc36f21 = 4: 5c329535df packfile: widen unpack_entry()'s size out-parameter to size_t
5: 68750ba2d1 = 5: 01b9209b26 pack-objects: use size_t for in-core object sizes
6: 460d733fee = 6: 12c142f8ab packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
7: f3aeae983a ! 7: 37d030d867 odb: use size_t for object_info.sizep and the size APIs
@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
struct object_info oi = OBJECT_INFO_INIT;
unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
- if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
- size_t s = size;
- buf = replace_idents_using_mailmap(buf, &s);
+ if (odb_read_object_info_extended(the_repository->objects, &oid, &oi, flags) < 0)
+ die("git cat-file: could not get object info");
+
+- if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
+- size_t s = size;
+- buf = replace_idents_using_mailmap(buf, &s);
- size = cast_size_t_to_ulong(s);
-+ size = s;
- }
+- }
++ if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG))
++ buf = replace_idents_using_mailmap(buf, &size);
printf("%"PRIuMAX"\n", (uintmax_t)size);
+ ret = 0;
@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
break;
@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
case 'p':
@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
- if (use_mailmap) {
- size_t s = size;
- buf = replace_idents_using_mailmap(buf, &s);
+ if (!buf)
+ die("Cannot read object %s", obj_name);
+
+- if (use_mailmap) {
+- size_t s = size;
+- buf = replace_idents_using_mailmap(buf, &s);
- size = cast_size_t_to_ulong(s);
-+ size = s;
- }
+- }
++ if (use_mailmap)
++ buf = replace_idents_using_mailmap(buf, &size);
/* otherwise just spit out the data */
+ break;
@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
- if (use_mailmap) {
- size_t s = size;
- buf = replace_idents_using_mailmap(buf, &s);
+ buf = odb_read_object_peeled(the_repository->objects, &oid,
+ exp_type_id, &size, NULL);
+
+- if (use_mailmap) {
+- size_t s = size;
+- buf = replace_idents_using_mailmap(buf, &s);
- size = cast_size_t_to_ulong(s);
-+ size = s;
- }
+- }
++ if (use_mailmap)
++ buf = replace_idents_using_mailmap(buf, &size);
break;
}
+ default:
@@ builtin/cat-file.c: cleanup:
struct expand_data {
struct object_id oid;
@@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
contents = odb_read_object(the_repository->objects, oid,
@@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
- if (use_mailmap) {
- size_t s = size;
- contents = replace_idents_using_mailmap(contents, &s);
+ if (!contents)
+ die("object %s disappeared", oid_to_hex(oid));
+
+- if (use_mailmap) {
+- size_t s = size;
+- contents = replace_idents_using_mailmap(contents, &s);
- size = cast_size_t_to_ulong(s);
-+ size = s;
- }
+- }
++ if (use_mailmap)
++ contents = replace_idents_using_mailmap(contents, &size);
if (type != data->type)
+ die("object %s changed type!?", oid_to_hex(oid));
@@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
+ }
+
+ if (use_mailmap && (data->type == OBJ_COMMIT || data->type == OBJ_TAG)) {
+- size_t s = data->size;
+ char *buf = NULL;
+
+ buf = odb_read_object(the_repository->objects, &data->oid,
+ &data->type, &data->size);
if (!buf)
die(_("unable to read %s"), oid_to_hex(&data->oid));
- buf = replace_idents_using_mailmap(buf, &s);
+- buf = replace_idents_using_mailmap(buf, &s);
- data->size = cast_size_t_to_ulong(s);
-+ data->size = s;
++ buf = replace_idents_using_mailmap(buf, &data->size);
free(buf);
}
@@ builtin/log.c: static int show_blob_object(const struct object_id *oid, struct r
## builtin/ls-files.c ##
@@ builtin/ls-files.c: static void expand_objectsize(struct repository *repo, struct strbuf *line,
- const enum object_type type, unsigned int padded)
- {
+ size_t len;
+
if (type == OBJ_BLOB) {
- unsigned long size;
+ size_t size;
@@ builtin/ls-files.c: static void expand_objectsize(struct repository *repo, struc
## builtin/ls-tree.c ##
@@ builtin/ls-tree.c: static void expand_objectsize(struct strbuf *line, const struct object_id *oid,
- const enum object_type type, unsigned int padded)
- {
+ size_t len;
+
if (type == OBJ_BLOB) {
- unsigned long size;
+ size_t size;
@@ notes.c: static void format_note(struct notes_tree *t, const struct object_id *o
if (!t)
## object-file.c ##
-@@ object-file.c: static int parse_loose_header(const char *hdr, struct object_info *oi)
+@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
}
if (oi->sizep)
@@ object-file.c: static int parse_loose_header(const char *hdr, struct object_info
/*
* The length must be followed by a zero byte
-@@ object-file.c: static int read_object_info_from_path(struct odb_source *source,
- void *map = NULL;
- git_zstream stream, *stream_to_end = NULL;
- char hdr[MAX_HEADER_LEN];
-- unsigned long size_scratch;
-+ size_t size_scratch;
- enum object_type type_scratch;
- struct stat st;
-
@@ object-file.c: int force_object_loose(struct odb_source *source,
- {
+ struct odb_source_files *files = odb_source_files_downcast(source);
const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
void *buf;
- unsigned long len;
@@ object-file.c: int read_loose_object(struct repository *repo,
fd = git_open(path);
if (fd >= 0)
-@@ object-file.c: int odb_source_loose_read_object_stream(struct odb_read_stream **out,
- struct object_info oi = OBJECT_INFO_INIT;
- struct odb_loose_read_stream *st;
- unsigned long mapsize;
-- unsigned long size_ul;
- void *mapped;
-
- mapped = odb_source_loose_map_object(source, oid, &mapsize);
-@@ object-file.c: int odb_source_loose_read_object_stream(struct odb_read_stream **out,
- goto error;
- }
-
-- /*
-- * object_info.sizep is unsigned long* (32-bit on Windows), but
-- * st->base.size is size_t (64-bit). Use temporary variable.
-- * Note: loose objects >4GB would still truncate here, but such
-- * large loose objects are uncommon (they'd normally be packed).
-- */
-- oi.sizep = &size_ul;
-+ oi.sizep = &st->base.size;
- oi.typep = &st->base.type;
-
- if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0)
- goto error;
-- st->base.size = size_ul;
-
- st->mapped = mapped;
- st->mapsize = mapsize;
## object.c ##
@@ object.c: struct object *parse_object_with_flags(struct repository *r,
@@ odb.h: int odb_read_object_info_extended(struct object_database *odb,
enum odb_has_object_flags {
/* Retry packed storage after checking packed and loose storage */
+ ## odb/source-loose.c ##
+@@ odb/source-loose.c: static int read_object_info_from_path(struct odb_source_loose *loose,
+ void *map = NULL;
+ git_zstream stream, *stream_to_end = NULL;
+ char hdr[MAX_HEADER_LEN];
+- unsigned long size_scratch;
++ size_t size_scratch;
+ enum object_type type_scratch;
+ struct stat st;
+
+@@ odb/source-loose.c: static int odb_source_loose_read_object_stream(struct odb_read_stream **out,
+ struct object_info oi = OBJECT_INFO_INIT;
+ struct odb_loose_read_stream *st;
+ unsigned long mapsize;
+- unsigned long size_ul;
+ void *mapped;
+
+ mapped = odb_source_loose_map_object(loose, oid, &mapsize);
+@@ odb/source-loose.c: static int odb_source_loose_read_object_stream(struct odb_read_stream **out,
+ goto error;
+ }
+
+- /*
+- * object_info.sizep is unsigned long* (32-bit on Windows), but
+- * st->base.size is size_t (64-bit). Use temporary variable.
+- * Note: loose objects >4GB would still truncate here, but such
+- * large loose objects are uncommon (they'd normally be packed).
+- */
+- oi.sizep = &size_ul;
++ oi.sizep = &st->base.size;
+ oi.typep = &st->base.type;
+
+ if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0)
+ goto error;
+- st->base.size = size_ul;
+
+ st->mapped = mapped;
+ st->mapsize = mapsize;
+
## odb/streaming.c ##
@@ odb/streaming.c: static int open_istream_incore(struct odb_read_stream **out,
.base.read = read_istream_incore,
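The `odb/source-loose.c` hunk of the range-diff lets `parse_loose_header()` write straight into the stream's `size_t` field through `oi.sizep`. That change drops the `unsigned long size_ul` bounce variable along with its comment about truncating loose objects larger than 4GB. Below is a simplified sketch of parsing a loose-object header such as `"blob 5368709120"` through such a pointer. `struct sketch_object_info` and `sketch_parse_header()` are hypothetical names, not Git's, and the sketch skips checks that Git performs (for example, on the object type).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down analogue of struct object_info: the caller points
 * sizep at its own size_t, so no narrower temporary is involved. */
struct sketch_object_info {
	size_t *sizep;
};

/* Parses "<type> <decimal size>" (the part of a loose-object header before
 * its NUL). Returns 0 on success, -1 on malformed input or overflow. */
static int sketch_parse_header(const char *hdr, struct sketch_object_info *oi)
{
	const char *p = strchr(hdr, ' ');
	size_t size = 0;

	if (!p || p == hdr || !p[1])
		return -1;
	for (p++; *p; p++) {
		unsigned digit;

		if (*p < '0' || *p > '9')
			return -1;
		digit = (unsigned)(*p - '0');
		if (size > (SIZE_MAX - digit) / 10)
			return -1;	/* does not fit in size_t */
		size = size * 10 + digit;
	}
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}
```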
--
gitgitgadget
Thread overview: 26+ messages
2026-06-04 10:51 [PATCH 0/7] More work supporting objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-06-04 10:51 ` [PATCH 1/7] compat/msvc: use _chsize_s for ftruncate Johannes Schindelin via GitGitGadget
2026-06-04 10:51 ` [PATCH 2/7] patch-delta: use size_t for sizes Johannes Schindelin via GitGitGadget
2026-06-08 13:53 ` Patrick Steinhardt
2026-06-15 9:29 ` Johannes Schindelin
2026-06-04 10:51 ` [PATCH 3/7] pack-objects(check_pack_inflate()): use size_t instead of unsigned long Johannes Schindelin via GitGitGadget
2026-06-08 13:53 ` Patrick Steinhardt
2026-06-15 9:29 ` Johannes Schindelin
2026-06-04 10:51 ` [PATCH 4/7] packfile: widen unpack_entry()'s size out-parameter to size_t Johannes Schindelin via GitGitGadget
2026-06-08 13:53 ` Patrick Steinhardt
2026-06-15 9:29 ` Johannes Schindelin
2026-06-04 10:51 ` [PATCH 5/7] pack-objects: use size_t for in-core object sizes Johannes Schindelin via GitGitGadget
2026-06-04 10:51 ` [PATCH 6/7] packfile,delta: drop the `cast_size_t_to_ulong()` wrappers Johannes Schindelin via GitGitGadget
2026-06-08 13:53 ` Patrick Steinhardt
2026-06-04 10:51 ` [PATCH 7/7] odb: use size_t for object_info.sizep and the size APIs Johannes Schindelin via GitGitGadget
2026-06-08 13:53 ` Patrick Steinhardt
2026-06-15 9:29 ` Johannes Schindelin
2026-06-15 11:52 ` Johannes Schindelin via GitGitGadget [this message]
2026-06-15 11:52 ` [PATCH v2 1/7] compat/msvc: use _chsize_s for ftruncate Johannes Schindelin via GitGitGadget
2026-06-15 11:52 ` [PATCH v2 2/7] patch-delta: use size_t for sizes Johannes Schindelin via GitGitGadget
2026-06-15 11:52 ` [PATCH v2 3/7] pack-objects(check_pack_inflate()): use size_t instead of unsigned long Johannes Schindelin via GitGitGadget
2026-06-15 11:52 ` [PATCH v2 4/7] packfile: widen unpack_entry()'s size out-parameter to size_t Johannes Schindelin via GitGitGadget
2026-06-15 11:52 ` [PATCH v2 5/7] pack-objects: use size_t for in-core object sizes Johannes Schindelin via GitGitGadget
2026-06-15 11:52 ` [PATCH v2 6/7] packfile,delta: drop the `cast_size_t_to_ulong()` wrappers Johannes Schindelin via GitGitGadget
2026-06-15 11:52 ` [PATCH v2 7/7] odb: use size_t for object_info.sizep and the size APIs Johannes Schindelin via GitGitGadget
2026-06-15 14:55 ` [PATCH v2 0/7] More work supporting objects larger than 4GB on Windows Junio C Hamano