Git development


* Re: [PATCH v3 3/3] b4: introduce configuration for the Git project
From: Patrick Steinhardt @ 2026-06-15 12:58 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones,
	SZEDER Gábor, Kristoffer Haugsbakk, Toon Claes
In-Reply-To: <CAOLa=ZQxA52p+9DcZZ=gVTqZ66ETQvZRQYjZNFjzdbsPwTW2iQ@mail.gmail.com>

On Wed, Jun 10, 2026 at 07:13:33AM -0400, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > We're about to extend our documentation to recommend b4 for sending
> 
> Nit: This is in the past now

True, will fix.

Patrick

^ permalink raw reply

* Re: [PATCH v3 1/3] MyFirstContribution: recommend shallow threading of cover letters
From: Patrick Steinhardt @ 2026-06-15 12:58 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones,
	SZEDER Gábor, Kristoffer Haugsbakk, Toon Claes
In-Reply-To: <CAOLa=ZQE-kkpSX=pP2A6SXdbp_O6AHzRmbUDOtKCsvz2Yz66Ng@mail.gmail.com>

On Wed, Jun 10, 2026 at 07:08:33AM -0400, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The "MyFirstContribution" document recommends the use of deep threading
> > of cover letters: every cover letter of subsequent iterations shall be
> > linked to the cover letter of the preceding version. The result of this
> > is that eventually, threads with many versions are getting nested so
> > deep that it becomes hard to follow.
> >
> > Adapt the recommendation to instead propose shallow threading of cover
> > letters: instead of linking the cover letter to the previous cover
> > letter, the user is supposed to always link it to the first cover
> > letter. This still makes it easy to follow the iterations, but has the
> > benefit of nesting to a much shallower level.
> 
> Should we also modify 'Documentation/SubmittingPatches'? Which states:
> 
>   All subsequent versions of a patch series and other related patches
>   should be grouped into their own e-mail thread to help readers find
>   all parts of the series.  To that end, send them as replies to either
>   an additional "cover letter" message (see below), the first patch, or
>   the respective preceding patch. Here is a
>   link:MyFirstContribution.html#v2-git-send-email[step-by-step guide] on
>   how to submit updated versions of a patch series.
> 
> Personally, I find it a bit awkward when new versions are sent as a new
> separate thread, especially when the subject is changed over versions.

I don't necessarily see this as contradicting advice, I rather read it
as "patches of vN+1 should have their own subthread". But it certainly
is confusingly written, and I'm not even sure myself whether I'm reading
it correctly or not.

I kind of feel like this is a bit outside the scope of this series. Also
because I'm not 100% sure how to reword this to make it read nicer :)
But I'm very happy to accept suggestions here.

Patrick

^ permalink raw reply

* Re: [PATCH v5 06/10] reset: introduce ability to skip updating HEAD
From: Patrick Steinhardt @ 2026-06-15 12:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Pablo Sabater, Kristoffer Haugsbakk, Phillip Wood
In-Reply-To: <xmqq33ytneiu.fsf@gitster.g>

On Thu, Jun 11, 2026 at 11:00:25AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Note that in a previous iteration we instead introduced a flag that made
> > callers opt out of updating any references. This was somewhat awkward
> > though because we already have the `UPDATE_ORIG_HEAD` flag, so the
> > result was somewhat inconsistent.
> >
> > Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  builtin/rebase.c | 14 ++++++++++----
> >  reset.c          |  9 +++++++--
> >  reset.h          |  9 ++++++---
> >  sequencer.c      |  4 +++-
> >  4 files changed, 26 insertions(+), 10 deletions(-)
> >
> > diff --git a/reset.c b/reset.c
> > ...
> > @@ -129,7 +133,7 @@ int reset_working_tree(struct repository *r,
> >  		oid = &head_oid;
> >  
> >  	if (refs_only) {
> > -		if (!dry_run)
> > +		if (update_head)
> >  			return update_refs(r, opts, oid, head);
> >  		return 0;
> >  	}
> 
> So when refs_only and update_head are in effect, we will call
> update_refs(), even if dry_run is given.  update_refs() does not
> seem to pay attention to (opts->flags & RESET_WORKING_TREE_DRY_RUN)
> at all, so wouldn't this mean that we would update even in a dry-run
> session?

Ugh, good catch, this is obviously wrong. Will fix.

Patrick

^ permalink raw reply
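
The interaction questioned in the review above can be modelled in
isolation. This is a minimal sketch under stated assumptions: the
`SKETCH_*` flags and both helpers are hypothetical stand-ins, not Git's
reset API, and the combined condition is only one possible correction,
since the thread does not say how the patch will be fixed.

```c
#include <stdbool.h>

/* Hypothetical flags modelled on the discussion; not Git's definitions. */
enum {
	SKETCH_DRY_RUN     = 1 << 0,
	SKETCH_UPDATE_HEAD = 1 << 1,
};

/* Mirrors the quoted hunk: only the "update head" request is consulted. */
static bool writes_refs_as_posted(unsigned flags)
{
	return (flags & SKETCH_UPDATE_HEAD) != 0;
}

/* One possible correction (an assumption): a dry run never writes refs. */
static bool writes_refs_with_dry_run_check(unsigned flags)
{
	return (flags & SKETCH_UPDATE_HEAD) && !(flags & SKETCH_DRY_RUN);
}
```

With both flags set, the first helper reports that refs would be
written while the second does not, which is the difference the review
comment is about.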

* Re: [PATCH 0/9] refs: stop using `chdir_notify_reparent()`
From: Patrick Steinhardt @ 2026-06-15 12:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Karthik Nayak
In-Reply-To: <20260613140024.GA766297@coredump.intra.peff.net>

On Sat, Jun 13, 2026 at 10:00:24AM -0400, Jeff King wrote:
> On Fri, Jun 12, 2026 at 08:18:16AM +0200, Patrick Steinhardt wrote:
> 
> > > If we move to a world of all absolute paths where chdir-notify is not
> > > necessary, will we lose that optimization?
> > 
> > Probably. Unfortunately, the commit doesn't have any repeatable
> > benchmarks in there, so it's hard to say whether we could still
> > reproduce those issues or not.
> 
> Here's an easy-ish reproduction specific to the ref code:
> 
>   rm -rf a/
>   dir=$(perl -e 'print "a/" x 1024')
>   mkdir -p $dir &&
>   cd $dir &&
>   git init &&
>   git commit --allow-empty -m foo &&
>   seq -f 'create refs/heads/foo%05g HEAD' 10000 |
>   git update-ref --stdin &&
>   time git show-ref
> 
> Before your series, I get timings like this:
> 
>   real	0m0.078s
>   user	0m0.020s
>   sys	0m0.057s
> 
> After, I get:
> 
>   real	0m0.876s
>   user	0m0.004s
>   sys	0m0.872s
> 
> So it really is measurable (and I did not expect the effect to be nearly
> so large). Unsurprisingly the extra CPU goes to system time.

This is indeed surprisingly bad.

> But obviously that case is quite silly. It's an absurdly deep hierarchy,
> and 10,000 loose refs is a lot. Just running "git pack-refs --all"
> brings the before/after to roughly the same timings (around 40ms --
> faster even than the before timing).
> 
> So it _can_ matter, but I think ultimately the better direction is
> probably "make fewer syscalls". Which we do via packfiles, and via
> packed-refs, and eventually via reftables, all of which put more data
> into a single file.
> 
> I offer the script above more as food for thought, and not necessarily
> an argument against your series.

Hum, yeah. I'm a bit hesitant to just wave your findings away. I mean I
agree with you that it's unlikely to really matter in practice. But you
never really know, and I'm not sure that I consider dropping the chdir
infra important enough to knowingly take that hit.

I definitely think that we should merge the remainder of this series
though, as these patches simplify "setup.c" and fix a couple of memory
leaks. But maybe we drop the last patch for now and...

> > Ideally, we'd have the best of both worlds: absolute paths everywhere
> > without the performance hit. A while back I had a discussion with
> > Torvalds on the security mailing list around this issue, and ultimately
> > the conclusion was that the best way forward would be to use openat(3p).
> > 
> > This wouldn't only allow us to optimize cases like this, but it also has
> > the added benefit that we're much less prone to TOCTOU-style issues and
> > we might even be able to use flags like O_BENEATH. So it would basically
> > be win-win. The only problem is of course that Windows doesn't have
> > openat(3p), so we'd have to emulate it, and that's where I always lost
> > the desire to do this.
> > 
> > When waking up this morning though I had the thought that we shouldn't
> > try to emulate openat(3p) directly, but instead create a higher-level
> > interface.
> > [...]
> 
> Yeah, I think given a decent interface it might not be so bad. It would
> mean code thinking about filesystem syscalls in a different way, but if
> done subsystem-by-subsystem it might be OK to do incrementally. Much of
> the code that would want to switch to this is using repo_git_path() or
> similar already (and getting rid of those remaining static-buffer
> functions would be a nice bonus).
> 
> I do wonder if your series here to move to absolute paths makes the
> TOCTOU situation a little worse. With a relative path, once we are
> "inside" the repo then we are only susceptible to changes within it.
> Whereas with an absolute path, if one of the intermediate paths changes
> from under us, there may be confusion.
> 
> Without thinking on it too hard, though, I'd guess if any such case is a
> security problem, it already was during the "open" part (because it
> implies that the attacker controls paths below you in the hierarchy, and
> you had to get to your cwd _somehow_, at which point they could have
> attacked you then).

... eventually give this idea here a test?

Patrick

^ permalink raw reply
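
The directory-relative access pattern discussed in this message can be
sketched as follows, assuming a POSIX system that provides openat(2).
The `sketch_` helpers are hypothetical and not part of Git. They only
illustrate that a base directory can be opened once and later lookups
resolved against its file descriptor, independent of the current
working directory and without walking a deep path again for every file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open a directory once so that later lookups can be made relative to
 * it. Returns a directory file descriptor, or -1 on error.
 */
static int sketch_open_base(const char *dir)
{
	return open(dir, O_RDONLY | O_DIRECTORY);
}

/*
 * Read up to `len - 1` bytes of `name`, resolved relative to `base_fd`
 * rather than the current working directory, and NUL-terminate the
 * result. Returns the number of bytes read, or -1 on error.
 */
static ssize_t sketch_read_at(int base_fd, const char *name,
			      char *buf, size_t len)
{
	ssize_t n;
	int fd;

	if (!len)
		return -1;
	fd = openat(base_fd, name, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, len - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

A call such as `sketch_read_at(base_fd, "HEAD", buf, sizeof(buf))` keeps
working after a chdir(2), because the lookup starts at `base_fd`. A
portable interface would still need an equivalent on Windows, which is
the difficulty raised in the thread.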

* Re: [PATCH 9/9] refs: always use absolute paths for reference stores
From: Patrick Steinhardt @ 2026-06-15 12:36 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
In-Reply-To: <CAOLa=ZR60bhH4z9ZoKTCn97QzautcihxPbTZ=_e0raMTjzajZQ@mail.gmail.com>

On Fri, Jun 12, 2026 at 02:58:19AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Both the "files" and "reftable" backends use
> > `refs_compute_filesystem_location()` to figure out the location of both
> > the git and common directories. Depending on how the function is called
> > we may or may not return an absolute path.
> >
> > There isn't really a good reason to use relative paths though. Quite on
> > the contrary, because we sometimes use relative paths we are forced to
> > register for chdir(3p) notifications via `chdir_notify_reparent()`.
> >
> 
> With the previous changes added, we register via
> `chdir_notify_register()`
> 
> > Adapt the function to always return absolute paths. This results in a
> > user-visible change in behaviour where we now unconditionally print
> > absolute paths in error messages. But arguably, that change in behaviour
> > is acceptable and may even be good in cases where a Git command may end
> > up accessing references across multiple different repositories.
> >
> > Furthermore, drop the calls to `chdir_notify_reparent()`, which aren't
> > required anymore now that the paths are always absolute.
> >
> 
> Same here, should be `chdir_notify_register()`

Yes, will fix.

Patrick

^ permalink raw reply
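
Why absolute paths remove the need for chdir notifications can be shown
with a small sketch. It assumes a POSIX system, and the helper name
`sketch_absolute_path()` is hypothetical. Git has its own path helpers,
which this does not reproduce. A path made absolute once keeps pointing
at the same location after a later chdir(2), whereas a stored relative
path would have to be recomputed.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Turn `path` into an absolute path by prefixing the current working
 * directory when needed. Returns 0 on success, -1 on error or if the
 * result does not fit into `out`. Unlike realpath(3), this does not
 * resolve symlinks or ".." components.
 */
static int sketch_absolute_path(const char *path, char *out, size_t outlen)
{
	char cwd[PATH_MAX];
	int n;

	if (path[0] == '/')
		n = snprintf(out, outlen, "%s", path);
	else if (getcwd(cwd, sizeof(cwd)))
		n = snprintf(out, outlen, "%s/%s", cwd, path);
	else
		return -1;
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The trade-off mentioned in the commit message follows directly: any
message that prints the stored path now shows the absolute form.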

* Re: [PATCH 4/9] refs: unregister reference stores from "chdir_notify"
From: Patrick Steinhardt @ 2026-06-15 12:36 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
In-Reply-To: <CAOLa=ZS_0b9o2YucgA6Se_Mq4nLo1Luow7adTLAifbkF9jpUrA@mail.gmail.com>

On Fri, Jun 12, 2026 at 02:18:28AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
[snip]
> > We never noticed either of these symptoms, but they are obviously bad.
> >
> > Partially fix those issues by unregistering the reference stores when
> > releasing them. The leak of the main reference database will be fixed in
> > a subsequent commit.
> >
> > Note that this requires us to use `chdir_notify_register()` instead of
> > `chdir_notify_parent()`, as there is no infrastructure to unregister the
> 
> Shouldn't this be s/chdir_notify_parent/chdir_notify_reparent ?

Yup, good catch.

Patrick

^ permalink raw reply

* Re: [PATCH 2/9] setup: stop applying repository format twice
From: Patrick Steinhardt @ 2026-06-15 12:36 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
In-Reply-To: <CAOLa=ZQC7YCBxjxkbm8qcWqpNFgAKNpvw9B6t=+XnX4bbkGq0Q@mail.gmail.com>

On Fri, Jun 12, 2026 at 02:00:20AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > When discovering the repository in "setup.c" we apply the final
> > repository format multiple times:
> >
> >   - Once via `repository_format_configure()`, where we configure the
> >     repository format for both `struct repository_format` and `struct
> >     repository`.
> >
> >   - And once via `apply_repository_format()`, where we then apply the
> >     `struct repository_format` to the `struct repository` again.
> >
> 
> Okay, so we're talking about applying the repository format to the
> `struct repository` specifically.
> 
> > As the format will be applied to the repository when applying the format
> > it's thus somewhat unnecessary to also apply it to the repository when
> > adapting the discovered format.
> 
> This was a bit confusing to read at first. Okay since we already apply
> the format in the second step, the first is not necessary.

I agree. I'll rephrase this a bit.

Patrick

^ permalink raw reply

* [PATCH] gitlab-ci: migrate Windows builds away from Chocolatey
From: Patrick Steinhardt @ 2026-06-15 12:21 UTC (permalink / raw)
  To: git

The Windows builds in GitLab CI use Chocolatey to install dependencies.
Unfortunately, Chocolatey seems to be very unreliable, which causes the
jobs to fail very regularly. This is a limitation that seems to be
somewhat known [1]:

  As an organization, you want 100% reliability (or at least that
  potential), and you may want full trust and control as well. This is
  something you can get with internally hosted packages, and you are
  unlikely to achieve from use of the Community Package Repository.

So using the Community Package Repository is kind of discouraged in case
one wants reliability. We _do_ want reliability though, and we cannot
easily switch to an enterprise license to fix this issue.

Introduce a new script that downloads and installs dependencies
directly. This has a couple of benefits:

  - We can drop our dependency on Chocolatey completely, thus improving
    reliability.

  - We can easily cache the installers.

  - We get direct control over the exact versions we install.

  - Installing dependencies is sped up from roughly 3 minutes to 1
    minute.

[1]: https://docs.chocolatey.org/en-us/community-repository/community-packages-disclaimer/#summary

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Hi

I've been quite annoyed recently because our Windows builds in GitLab CI
are extremely flakey. All of those flakes come from Chocolatey, which is
why this patch moves away from it.

Thanks!

Patrick
---
 .gitlab-ci.yml              | 11 ++++++---
 ci/install-dependencies.ps1 | 55 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index e0b9a0d82b..87a5343a94 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -161,11 +161,16 @@ test:mingw64:
     TEST_OUTPUT_DIRECTORY: "C:/Git-Test"
   tags:
     - saas-windows-medium-amd64
+  cache:
+    key:
+      files:
+        - ci/install-dependencies.ps1
+    paths:
+      - .dependencies
   before_script:
     - *windows_before_script
-    - choco install -y git meson ninja rust-ms
-    - Import-Module $env:ChocolateyInstall\helpers\chocolateyProfile.psm1
-    - refreshenv
+    - ./ci/install-dependencies.ps1
+    - $env:Path = "C:\Meson;C:\Rust\bin;$env:Path"
     - New-Item -Path $env:TEST_OUTPUT_DIRECTORY -ItemType Directory
 
 build:msvc-meson:
diff --git a/ci/install-dependencies.ps1 b/ci/install-dependencies.ps1
new file mode 100755
index 0000000000..e3b367fa54
--- /dev/null
+++ b/ci/install-dependencies.ps1
@@ -0,0 +1,55 @@
+param(
+    [string]$DownloadDirectory = '.dependencies'
+)
+
+$ErrorActionPreference = 'Stop'
+$ProgressPreference = 'SilentlyContinue'
+
+$GitVersion = '2.54.0.windows.1'
+$MesonVersion = '1.11.0'
+$RustVersion = '1.96.0'
+
+New-Item -Path $DownloadDirectory -ItemType Directory -Force | Out-Null
+New-Item -Path .git/info -ItemType Directory -Force | Out-Null
+New-Item -Path .git/info/exclude -ItemType File -Force | Out-Null
+Add-Content -Path .git/info/exclude -Value "/$DownloadDirectory"
+
+function Get-Installer {
+    param(
+        [Parameter(Mandatory = $true)][string]$Name,
+        [Parameter(Mandatory = $true)][string]$Url
+    )
+
+    $path = Join-Path $DownloadDirectory $Name
+    if (-not (Test-Path $path)) {
+        Write-Host "Downloading $Url"
+        Invoke-WebRequest $Url -OutFile $path -TimeoutSec 300
+    }
+    return $path
+}
+
+function Invoke-Installer {
+    param(
+        [Parameter(Mandatory = $true)][string]$FilePath,
+        [Parameter(Mandatory = $true)][string[]]$ArgumentList
+    )
+
+    Write-Host "Running $FilePath $($ArgumentList -join ' ')"
+    $process = Start-Process -Wait -PassThru -FilePath $FilePath -ArgumentList $ArgumentList
+    if ($process.ExitCode -ne 0) {
+        throw "$FilePath failed with exit code $($process.ExitCode)"
+    }
+}
+
+$gitAssetVersion = $GitVersion -replace '\.windows\.\d+$', ''
+$gitInstaller = Get-Installer "Git-Installer.exe" `
+    "https://github.com/git-for-windows/git/releases/download/v$GitVersion/PortableGit-$gitAssetVersion-64-bit.7z.exe"
+Invoke-Installer $gitInstaller @('-y', '-o"C:\Program Files\Git"')
+
+$mesonMsi = Get-Installer "meson.msi" `
+    "https://github.com/mesonbuild/meson/releases/download/$MesonVersion/meson-$MesonVersion-64.msi"
+Invoke-Installer msiexec.exe @('/i', $mesonMsi, 'INSTALLDIR=C:\Meson', '/quiet', '/norestart')
+
+$rustMsi = Get-Installer "rust.msi" `
+    "https://static.rust-lang.org/dist/rust-$RustVersion-x86_64-pc-windows-msvc.msi"
+Invoke-Installer msiexec.exe @('/i', $rustMsi, 'INSTALLDIR=C:\Rust', 'ADDLOCAL=Rustc,Cargo,Std', '/quiet', '/norestart')

---
base-commit: ea97ad8d017de0c9037451a78008a0fd60abea0c
change-id: 20260615-b4-pks-gitlab-ci-drop-chocolatey-bfe9d4bb1442


^ permalink raw reply related

* [PATCH v2 7/7] odb: use size_t for object_info.sizep and the size APIs
From: Johannes Schindelin via GitGitGadget @ 2026-06-15 11:52 UTC (permalink / raw)
  To: git
  Cc: Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin,
	Johannes Schindelin
In-Reply-To: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When `js/objects-larger-than-4gb-on-windows` widened the streaming,
index-pack and unpack-objects code paths, in the interest of keeping the
patches somewhat reasonably-sized, it left the public ODB API still
typed in `unsigned long`. In particular `struct object_info::sizep` and
the four wrappers built on top of it (`odb_read_object`,
`odb_read_object_peeled`, `odb_read_object_info`, `odb_pretend_object`)
still return the unpacked size through `unsigned long *`, so on Windows
`cat-file -s` and the `git add` / `git status` paths for a >4 GiB blob
silently cap at 4 GiB.

Widen the field and the four wrappers. The previous commits already
widened the `unpack_entry()` cascade and pack-objects' in-core size
accessors, so most of the cascade arrives here with no further work: the
temporary shims in `packed_object_info_with_index_pos()` and in
`unpack_entry()`'s delta-base recovery path go away, the two
`SET_SIZE(entry, cast_size_t_to_ulong(canonical_size))` calls in
`check_object()` and the matching one in `drop_reused_delta()` collapse
to plain `SET_SIZE`, and `oe_get_size_slow()`'s tail
`cast_size_t_to_ulong()` is gone too.

What remains narrow are the boundaries this series does not
intend to touch: the diff, blame, textconv and fast-import machinery.

Even so, this patch is unfortunately quite large.
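
The narrowing problem at those remaining boundaries can be sketched in
a platform-independent way. In the snippet below, `uint32_t` stands in
for the 32-bit `unsigned long` of 64-bit Windows and `uint64_t` for
`size_t`. Both helper names are hypothetical, and the checked variant
only models the idea behind `cast_size_t_to_ulong()`, whose actual
behaviour on overflow may differ.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Checked narrowing: store the value and return 0 if it fits into
 * 32 bits, otherwise return -1 and leave `*out` untouched.
 */
static int sketch_narrow_size(uint64_t size, uint32_t *out)
{
	if (size > UINT32_MAX)
		return -1;
	*out = (uint32_t)size;
	return 0;
}

/* Unchecked narrowing: silently keeps only the low 32 bits. */
static uint32_t sketch_truncate_size(uint64_t size)
{
	return (uint32_t)size;
}
```

A size of 4 GiB plus 7 bytes is rejected by the checked helper, while
the unchecked one reports 7, which is why every place that stays narrow
needs an explicit check.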

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 apply.c                       |  6 ++--
 archive.c                     |  4 +--
 attr.c                        |  2 +-
 bisect.c                      |  2 +-
 blame.c                       | 15 ++++++---
 builtin/cat-file.c            | 61 ++++++++++++++++-------------------
 builtin/difftool.c            |  2 +-
 builtin/fast-export.c         |  7 ++--
 builtin/fast-import.c         | 22 +++++++++----
 builtin/fsck.c                |  2 +-
 builtin/grep.c                | 12 +++----
 builtin/index-pack.c          |  6 ++--
 builtin/log.c                 |  2 +-
 builtin/ls-files.c            |  2 +-
 builtin/ls-tree.c             |  4 +--
 builtin/merge-tree.c          |  6 ++--
 builtin/mktag.c               |  2 +-
 builtin/notes.c               |  6 ++--
 builtin/pack-objects.c        | 33 +++++++++++++------
 builtin/repo.c                |  4 ++-
 builtin/tag.c                 |  4 +--
 builtin/unpack-file.c         |  2 +-
 builtin/unpack-objects.c      |  6 ++--
 bundle.c                      |  2 +-
 combine-diff.c                |  4 ++-
 commit.c                      | 10 +++---
 config.c                      |  2 +-
 diff.c                        |  5 ++-
 dir.c                         |  2 +-
 entry.c                       |  4 +--
 fmt-merge-msg.c               |  4 +--
 fsck.c                        |  2 +-
 grep.c                        |  4 ++-
 http-push.c                   |  2 +-
 list-objects-filter.c         |  2 +-
 mailmap.c                     |  2 +-
 match-trees.c                 |  4 +--
 merge-blobs.c                 |  6 ++--
 merge-blobs.h                 |  2 +-
 merge-ort.c                   |  2 +-
 notes-cache.c                 |  2 +-
 notes-merge.c                 |  2 +-
 notes.c                       |  8 +++--
 object-file.c                 |  6 ++--
 object.c                      |  2 +-
 odb.c                         | 12 +++----
 odb.h                         | 10 +++---
 odb/source-loose.c            | 12 ++-----
 odb/streaming.c               | 13 +-------
 pack-bitmap.c                 |  4 +--
 packfile.c                    | 12 ++-----
 path-walk.c                   |  2 +-
 protocol-caps.c               |  5 +--
 read-cache.c                  |  6 ++--
 ref-filter.c                  |  2 +-
 reflog.c                      |  2 +-
 rerere.c                      |  2 +-
 submodule-config.c            |  2 +-
 t/helper/test-pack-deltas.c   |  3 +-
 t/helper/test-partial-clone.c |  2 +-
 t/unit-tests/u-odb-inmemory.c |  2 +-
 tag.c                         |  4 +--
 tree-walk.c                   | 10 +++---
 tree.c                        |  2 +-
 xdiff-interface.c             |  2 +-
 65 files changed, 209 insertions(+), 191 deletions(-)

diff --git a/apply.c b/apply.c
index 3cf544e9a9..5e54453f79 100644
--- a/apply.c
+++ b/apply.c
@@ -3321,7 +3321,7 @@ static int apply_binary(struct apply_state *state,
 	if (odb_has_object(the_repository->objects, &oid, 0)) {
 		/* We already have the postimage */
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 		char *result;
 
 		result = odb_read_object(the_repository->objects, &oid,
@@ -3384,7 +3384,7 @@ static int read_blob_object(struct strbuf *buf, const struct object_id *oid, uns
 		strbuf_addf(buf, "Subproject commit %s\n", oid_to_hex(oid));
 	} else {
 		enum object_type type;
-		unsigned long sz;
+		size_t sz;
 		char *result;
 
 		result = odb_read_object(the_repository->objects, oid,
@@ -3611,7 +3611,7 @@ static int load_preimage(struct apply_state *state,
 
 static int resolve_to(struct image *image, const struct object_id *result_id)
 {
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	char *data;
 
diff --git a/archive.c b/archive.c
index 51229107a5..59790be986 100644
--- a/archive.c
+++ b/archive.c
@@ -87,7 +87,7 @@ static void *object_file_to_archive(const struct archiver_args *args,
 				    const struct object_id *oid,
 				    unsigned int mode,
 				    enum object_type *type,
-				    unsigned long *sizep)
+				    size_t *sizep)
 {
 	void *buffer;
 	const struct commit *commit = args->convert ? args->commit : NULL;
@@ -158,7 +158,7 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 	write_archive_entry_fn_t write_entry = c->write_entry;
 	int err;
 	const char *path_without_prefix;
-	unsigned long size;
+	size_t size;
 	void *buffer;
 	enum object_type type;
 
diff --git a/attr.c b/attr.c
index 75369547b3..c61472a4e6 100644
--- a/attr.c
+++ b/attr.c
@@ -768,7 +768,7 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
 					      const char *path, unsigned flags)
 {
 	struct object_id oid;
-	unsigned long sz;
+	size_t sz;
 	enum object_type type;
 	void *buf;
 	unsigned short mode;
diff --git a/bisect.c b/bisect.c
index e29d1cbc64..94c7028d2a 100644
--- a/bisect.c
+++ b/bisect.c
@@ -154,7 +154,7 @@ static void show_list(const char *debug, int counted, int nr,
 		struct commit *commit = p->item;
 		unsigned commit_flags = commit->object.flags;
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 		char *buf = odb_read_object(the_repository->objects,
 					    &commit->object.oid, &type,
 					    &size);
diff --git a/blame.c b/blame.c
index 977cbb7097..126e232416 100644
--- a/blame.c
+++ b/blame.c
@@ -1041,10 +1041,13 @@ static void fill_origin_blob(struct diff_options *opt,
 		    textconv_object(opt->repo, o->path, o->mode,
 				    &o->blob_oid, 1, &file->ptr, &file_size))
 			;
-		else
+		else {
+			size_t file_size_st = 0;
 			file->ptr = odb_read_object(the_repository->objects,
 						    &o->blob_oid, &type,
-						    &file_size);
+						    &file_size_st);
+			file_size = cast_size_t_to_ulong(file_size_st);
+		}
 		file->size = file_size;
 
 		if (!file->ptr)
@@ -2869,10 +2872,14 @@ void setup_scoreboard(struct blame_scoreboard *sb,
 		    textconv_object(sb->repo, sb->path, o->mode, &o->blob_oid, 1, (char **) &sb->final_buf,
 				    &sb->final_buf_size))
 			;
-		else
+		else {
+			size_t final_buf_size_st = 0;
 			sb->final_buf = odb_read_object(the_repository->objects,
 							&o->blob_oid, &type,
-							&sb->final_buf_size);
+							&final_buf_size_st);
+			sb->final_buf_size =
+				cast_size_t_to_ulong(final_buf_size_st);
+		}
 
 		if (!sb->final_buf)
 			die(_("cannot read blob %s for path %s"),
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 2b64f8f733..adb2ef5130 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -84,7 +84,7 @@ static char *replace_idents_using_mailmap(char *object_buf, size_t *size)
 
 static int filter_object(const char *path, unsigned mode,
 			 const struct object_id *oid,
-			 char **buf, unsigned long *size)
+			 char **buf, size_t *size)
 {
 	enum object_type type;
 
@@ -120,7 +120,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 	struct object_id oid;
 	enum object_type type;
 	char *buf;
-	unsigned long size;
+	size_t size;
 	struct object_context obj_context = {0};
 	struct object_info oi = OBJECT_INFO_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
@@ -163,11 +163,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 		if (odb_read_object_info_extended(the_repository->objects, &oid, &oi, flags) < 0)
 			die("git cat-file: could not get object info");
 
-		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
-			size_t s = size;
-			buf = replace_idents_using_mailmap(buf, &s);
-			size = cast_size_t_to_ulong(s);
-		}
+		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG))
+			buf = replace_idents_using_mailmap(buf, &size);
 
 		printf("%"PRIuMAX"\n", (uintmax_t)size);
 		ret = 0;
@@ -188,9 +185,15 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 		break;
 
 	case 'c':
-		if (textconv_object(the_repository, path, obj_context.mode,
-				    &oid, 1, &buf, &size))
+	{
+		unsigned long size_ul = 0;
+		int textconv_ret = textconv_object(the_repository, path,
+						   obj_context.mode, &oid, 1,
+						   &buf, &size_ul);
+		size = size_ul;
+		if (textconv_ret)
 			break;
+	}
 		/* else fallthrough */
 
 	case 'p':
@@ -216,11 +219,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 		if (!buf)
 			die("Cannot read object %s", obj_name);
 
-		if (use_mailmap) {
-			size_t s = size;
-			buf = replace_idents_using_mailmap(buf, &s);
-			size = cast_size_t_to_ulong(s);
-		}
+		if (use_mailmap)
+			buf = replace_idents_using_mailmap(buf, &size);
 
 		/* otherwise just spit out the data */
 		break;
@@ -263,11 +263,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 		buf = odb_read_object_peeled(the_repository->objects, &oid,
 					     exp_type_id, &size, NULL);
 
-		if (use_mailmap) {
-			size_t s = size;
-			buf = replace_idents_using_mailmap(buf, &s);
-			size = cast_size_t_to_ulong(s);
-		}
+		if (use_mailmap)
+			buf = replace_idents_using_mailmap(buf, &size);
 		break;
 	}
 	default:
@@ -288,7 +285,7 @@ cleanup:
 struct expand_data {
 	struct object_id oid;
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 	unsigned short mode;
 	off_t disk_size;
 	const char *rest;
@@ -404,7 +401,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 			fflush(stdout);
 		if (opt->transform_mode) {
 			char *contents;
-			unsigned long size;
+			size_t size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
@@ -416,9 +413,12 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					    oid_to_hex(oid), data->rest);
 			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
-				if (!textconv_object(the_repository,
-						     data->rest, 0100644, oid,
-						     1, &contents, &size))
+				unsigned long size_ul = 0;
+				if (textconv_object(the_repository,
+						    data->rest, 0100644, oid,
+						    1, &contents, &size_ul))
+					size = size_ul;
+				else
 					contents = odb_read_object(the_repository->objects,
 								   oid, &type, &size);
 				if (!contents)
@@ -434,7 +434,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	}
 	else {
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 		void *contents;
 
 		contents = odb_read_object(the_repository->objects, oid,
@@ -442,11 +442,8 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 		if (!contents)
 			die("object %s disappeared", oid_to_hex(oid));
 
-		if (use_mailmap) {
-			size_t s = size;
-			contents = replace_idents_using_mailmap(contents, &s);
-			size = cast_size_t_to_ulong(s);
-		}
+		if (use_mailmap)
+			contents = replace_idents_using_mailmap(contents, &size);
 
 		if (type != data->type)
 			die("object %s changed type!?", oid_to_hex(oid));
@@ -546,15 +543,13 @@ static void batch_object_write(const char *obj_name,
 		}
 
 		if (use_mailmap && (data->type == OBJ_COMMIT || data->type == OBJ_TAG)) {
-			size_t s = data->size;
 			char *buf = NULL;
 
 			buf = odb_read_object(the_repository->objects, &data->oid,
 					      &data->type, &data->size);
 			if (!buf)
 				die(_("unable to read %s"), oid_to_hex(&data->oid));
-			buf = replace_idents_using_mailmap(buf, &s);
-			data->size = cast_size_t_to_ulong(s);
+			buf = replace_idents_using_mailmap(buf, &data->size);
 
 			free(buf);
 		}
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 2a21005f2e..26778f8515 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -319,7 +319,7 @@ static char *get_symlink(struct repository *repo,
 		data = strbuf_detach(&link, NULL);
 	} else {
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 		data = odb_read_object(repo->objects, oid, &type, &size);
 		if (!data)
 			die(_("could not read object %s for symlink %s"),
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 2eb43a28da..0be43104dc 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -317,7 +317,10 @@ static void export_blob(const struct object_id *oid)
 		object = (struct object *)lookup_blob(the_repository, oid);
 		eaten = 0;
 	} else {
-		buf = odb_read_object(the_repository->objects, oid, &type, &size);
+		size_t size_st = 0;
+		buf = odb_read_object(the_repository->objects, oid, &type,
+				      &size_st);
+		size = cast_size_t_to_ulong(size_st);
 		if (!buf)
 			die(_("could not read blob %s"), oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
@@ -880,7 +883,7 @@ static char *anonymize_tag(void)
 
 static void handle_tag(const char *name, struct tag *tag)
 {
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	char *buf;
 	const char *tagger, *tagger_end, *message;
diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 3dff898c43..d11a2cc2c1 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1291,7 +1291,10 @@ static void load_tree(struct tree_entry *root)
 			die(_("can't load tree %s"), oid_to_hex(oid));
 	} else {
 		enum object_type type;
-		buf = odb_read_object(the_repository->objects, oid, &type, &size);
+		size_t size_st = 0;
+		buf = odb_read_object(the_repository->objects, oid, &type,
+				      &size_st);
+		size = cast_size_t_to_ulong(size_st);
 		if (!buf || type != OBJ_TREE)
 			die(_("can't load tree %s"), oid_to_hex(oid));
 	}
@@ -2560,7 +2563,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 			die(_("mark :%" PRIuMAX " not a commit"), commit_mark);
 		oidcpy(&commit_oid, &commit_oe->idx.oid);
 	} else if (!repo_get_oid(the_repository, p, &commit_oid)) {
-		unsigned long size;
+		size_t size;
 		char *buf = odb_read_object_peeled(the_repository->objects,
 						   &commit_oid, OBJ_COMMIT, &size,
 						   &commit_oid);
@@ -2627,10 +2630,12 @@ static void parse_from_existing(struct branch *b)
 		oidclr(&b->branch_tree.versions[1].oid, the_repository->hash_algo);
 	} else {
 		unsigned long size;
+		size_t size_st = 0;
 		char *buf;
 
 		buf = odb_read_object_peeled(the_repository->objects, &b->oid,
-					     OBJ_COMMIT, &size, &b->oid);
+					     OBJ_COMMIT, &size_st, &b->oid);
+		size = cast_size_t_to_ulong(size_st);
 		parse_from_commit(b, buf, size);
 		free(buf);
 	}
@@ -2722,7 +2727,7 @@ static struct hash_list *parse_merge(unsigned int *count)
 				die(_("mark :%" PRIuMAX " not a commit"), idnum);
 			oidcpy(&n->oid, &oe->idx.oid);
 		} else if (!repo_get_oid(the_repository, from, &n->oid)) {
-			unsigned long size;
+			size_t size;
 			char *buf = odb_read_object_peeled(the_repository->objects,
 							   &n->oid, OBJ_COMMIT,
 							   &size, &n->oid);
@@ -3330,7 +3335,10 @@ static void cat_blob(struct object_entry *oe, struct object_id *oid)
 	char *buf;
 
 	if (!oe || oe->pack_id == MAX_PACK_ID) {
-		buf = odb_read_object(the_repository->objects, oid, &type, &size);
+		size_t size_st = 0;
+		buf = odb_read_object(the_repository->objects, oid, &type,
+				      &size_st);
+		size = cast_size_t_to_ulong(size_st);
 	} else {
 		type = oe->type;
 		buf = gfi_unpack_entry(oe, &size);
@@ -3438,8 +3446,10 @@ static struct object_entry *dereference(struct object_entry *oe,
 		buf = gfi_unpack_entry(oe, &size);
 	} else {
 		enum object_type unused;
+		size_t size_st = 0;
 		buf = odb_read_object(the_repository->objects, oid,
-				      &unused, &size);
+				      &unused, &size_st);
+		size = cast_size_t_to_ulong(size_st);
 	}
 	if (!buf)
 		die(_("can't load object %s"), oid_to_hex(oid));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 248f8ff5a0..76b723f36d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -724,7 +724,7 @@ static int fsck_loose(const struct object_id *oid, const char *path,
 	struct for_each_loose_cb *data = cb_data;
 	struct object *obj;
 	enum object_type type = OBJ_NONE;
-	unsigned long size;
+	size_t size;
 	void *contents = NULL;
 	int eaten;
 	struct object_info oi = OBJECT_INFO_INIT;
diff --git a/builtin/grep.c b/builtin/grep.c
index 6a09571903..26b85479ca 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -520,7 +520,7 @@ static int grep_submodule(struct grep_opt *opt,
 		enum object_type object_type;
 		struct tree_desc tree;
 		void *data;
-		unsigned long size;
+		size_t size;
 		struct strbuf base = STRBUF_INIT;
 
 		obj_read_lock();
@@ -573,7 +573,7 @@ static int grep_cache(struct grep_opt *opt,
 			enum object_type type;
 			struct tree_desc tree;
 			void *data;
-			unsigned long size;
+			size_t size;
 
 			data = odb_read_object(the_repository->objects, &ce->oid,
 					       &type, &size);
@@ -666,7 +666,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
-			unsigned long size;
+			size_t size;
 
 			data = odb_read_object(the_repository->objects,
 					       &entry.oid, &type, &size);
@@ -730,7 +730,7 @@ static void collect_blob_oids_for_tree(struct repository *repo,
 			enum object_type type;
 			struct tree_desc sub_tree;
 			void *data;
-			unsigned long size;
+			size_t size;
 
 			data = odb_read_object(repo->objects, &entry.oid,
 					       &type, &size);
@@ -764,7 +764,7 @@ static void collect_blob_oids_for_treeish(struct grep_opt *opt,
 {
 	struct tree_desc tree;
 	void *data;
-	unsigned long size;
+	size_t size;
 	struct strbuf base = STRBUF_INIT;
 	int len;
 
@@ -841,7 +841,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 	if (obj->type == OBJ_COMMIT || obj->type == OBJ_TREE) {
 		struct tree_desc tree;
 		void *data;
-		unsigned long size;
+		size_t size;
 		struct strbuf base;
 		int hit, len;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3c4474e681..78da3a6566 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -258,7 +258,7 @@ static unsigned check_object(struct object *obj)
 		return 0;
 
 	if (!(obj->flags & FLAG_CHECKED)) {
-		unsigned long size;
+		size_t size;
 		int type = odb_read_object_info(the_repository->objects,
 						&obj->oid, &size);
 		if (type <= 0)
@@ -905,7 +905,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 	if (collision_test_needed) {
 		void *has_data;
 		enum object_type has_type;
-		unsigned long has_size;
+		size_t has_size;
 		read_lock();
 		has_type = odb_read_object_info(the_repository->objects, oid, &has_size);
 		if (has_type < 0)
@@ -1515,7 +1515,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
 		void *data;
-		unsigned long size;
+		size_t size;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
diff --git a/builtin/log.c b/builtin/log.c
index e464b30af4..d027ce1e0b 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -613,7 +613,7 @@ static int show_blob_object(const struct object_id *oid, struct rev_info *rev, c
 
 static int show_tag_object(const struct object_id *oid, struct rev_info *rev)
 {
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	char *buf = odb_read_object(the_repository->objects, oid, &type, &size);
 	unsigned long offset = 0;
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 12d5d828ff..f30507215a 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -256,7 +256,7 @@ static void expand_objectsize(struct repository *repo, struct strbuf *line,
 	size_t len;
 
 	if (type == OBJ_BLOB) {
-		unsigned long size;
+		size_t size;
 		if (odb_read_object_info(repo->objects, oid, &size) < 0)
 			die(_("could not get object info about '%s'"),
 			    oid_to_hex(oid));
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 57846911ce..46edaffc2e 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -32,7 +32,7 @@ static void expand_objectsize(struct strbuf *line, const struct object_id *oid,
 	size_t len;
 
 	if (type == OBJ_BLOB) {
-		unsigned long size;
+		size_t size;
 		if (odb_read_object_info(the_repository->objects, oid, &size) < 0)
 			die(_("could not get object info about '%s'"),
 			    oid_to_hex(oid));
@@ -220,7 +220,7 @@ static int show_tree_long(const struct object_id *oid, struct strbuf *base,
 		return early;
 
 	if (type == OBJ_BLOB) {
-		unsigned long size;
+		size_t size;
 		if (odb_read_object_info(the_repository->objects, oid, &size) == OBJ_BAD)
 			xsnprintf(size_text, sizeof(size_text), "BAD");
 		else
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 312b595d1e..49f41e520f 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -69,7 +69,7 @@ static const char *explanation(struct merge_list *entry)
 	return "removed in remote";
 }
 
-static void *result(struct merge_list *entry, unsigned long *size)
+static void *result(struct merge_list *entry, size_t *size)
 {
 	enum object_type type;
 	struct blob *base, *our, *their;
@@ -96,7 +96,7 @@ static void *result(struct merge_list *entry, unsigned long *size)
 			   base, our, their, size);
 }
 
-static void *origin(struct merge_list *entry, unsigned long *size)
+static void *origin(struct merge_list *entry, size_t *size)
 {
 	enum object_type type;
 	while (entry) {
@@ -119,7 +119,7 @@ static int show_outf(void *priv UNUSED, mmbuffer_t *mb, int nbuf)
 
 static void show_diff(struct merge_list *entry)
 {
-	unsigned long size;
+	size_t size;
 	mmfile_t src, dst;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
diff --git a/builtin/mktag.c b/builtin/mktag.c
index f40264a878..37c17e6beb 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -50,7 +50,7 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 {
 	int ret;
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 	void *buffer;
 	const struct object_id *repl;
 
diff --git a/builtin/notes.c b/builtin/notes.c
index 9af602bdd7..962df867c8 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -150,7 +150,7 @@ static int list_each_note(const struct object_id *object_oid,
 
 static void copy_obj_to_fd(int fd, const struct object_id *oid)
 {
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	char *buf = odb_read_object(the_repository->objects, oid, &type, &size);
 	if (buf) {
@@ -313,7 +313,7 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset)
 	char *value;
 	struct object_id object;
 	enum object_type type;
-	unsigned long len;
+	size_t len;
 
 	BUG_ON_OPT_NEG(unset);
 
@@ -721,7 +721,7 @@ static int append_edit(int argc, const char **argv, const char *prefix,
 
 	if (note && !edit) {
 		/* Append buf to previous note contents */
-		unsigned long size;
+		size_t size;
 		enum object_type type;
 		struct strbuf buf = STRBUF_INIT;
 		char *prev_buf = odb_read_object(the_repository->objects, note, &type, &size);
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 961d547ef2..b5092d97ee 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -356,14 +356,17 @@ static void *get_delta(struct object_entry *entry)
 	unsigned long size, base_size, delta_size;
 	void *buf, *base_buf, *delta_buf;
 	enum object_type type;
+	size_t size_st = 0, base_size_st = 0;
 
 	buf = odb_read_object(the_repository->objects, &entry->idx.oid,
-			      &type, &size);
+			      &type, &size_st);
+	size = cast_size_t_to_ulong(size_st);
 	if (!buf)
 		die(_("unable to read %s"), oid_to_hex(&entry->idx.oid));
 	base_buf = odb_read_object(the_repository->objects,
 				   &DELTA(entry)->idx.oid, &type,
-				   &base_size);
+				   &base_size_st);
+	base_size = cast_size_t_to_ulong(base_size_st);
 	if (!base_buf)
 		die("unable to read %s",
 		    oid_to_hex(&DELTA(entry)->idx.oid));
@@ -528,9 +531,11 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
 			type = st->type;
 			size = st->size;
 		} else {
+			size_t size_st = 0;
 			buf = odb_read_object(the_repository->objects,
 					      &entry->idx.oid, &type,
-					      &size);
+					      &size_st);
+			size = cast_size_t_to_ulong(size_st);
 			if (!buf)
 				die(_("unable to read %s"),
 				    oid_to_hex(&entry->idx.oid));
@@ -1937,6 +1942,7 @@ static struct pbase_tree_cache *pbase_tree_get(const struct object_id *oid)
 	struct pbase_tree_cache *ent, *nent;
 	void *data;
 	unsigned long size;
+	size_t size_st = 0;
 	enum object_type type;
 	int neigh;
 	int my_ix = pbase_tree_cache_ix(oid);
@@ -1964,7 +1970,8 @@ static struct pbase_tree_cache *pbase_tree_get(const struct object_id *oid)
 	/* Did not find one.  Either we got a bogus request or
 	 * we need to read and perhaps cache.
 	 */
-	data = odb_read_object(the_repository->objects, oid, &type, &size);
+	data = odb_read_object(the_repository->objects, oid, &type, &size_st);
+	size = cast_size_t_to_ulong(size_st);
 	if (!data)
 		return NULL;
 	if (type != OBJ_TREE) {
@@ -2119,13 +2126,15 @@ static void add_preferred_base(struct object_id *oid)
 	struct pbase_tree *it;
 	void *data;
 	unsigned long size;
+	size_t size_st = 0;
 	struct object_id tree_oid;
 
 	if (window <= num_preferred_base++)
 		return;
 
 	data = odb_read_object_peeled(the_repository->objects, oid,
-				      OBJ_TREE, &size, &tree_oid);
+				      OBJ_TREE, &size_st, &tree_oid);
+	size = cast_size_t_to_ulong(size_st);
 	if (!data)
 		return;
 
@@ -2237,7 +2246,7 @@ static void prefetch_to_pack(uint32_t object_index_start) {
 
 static void check_object(struct object_entry *entry, uint32_t object_index)
 {
-	unsigned long canonical_size;
+	size_t canonical_size;
 	enum object_type type;
 	struct object_info oi = {.typep = &type, .sizep = &canonical_size};
 
@@ -2436,7 +2445,7 @@ static void drop_reused_delta(struct object_entry *entry)
 	unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
 	struct object_info oi = OBJECT_INFO_INIT;
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 
 	while (*idx) {
 		struct object_entry *oe = &to_pack.objects[*idx - 1];
@@ -2748,7 +2757,7 @@ size_t oe_get_size_slow(struct packing_data *pack,
 	size_t size;
 
 	if (e->type_ != OBJ_OFS_DELTA && e->type_ != OBJ_REF_DELTA) {
-		unsigned long sz;
+		size_t sz;
 		packing_data_lock(&to_pack);
 		if (odb_read_object_info(the_repository->objects,
 					 &e->idx.oid, &sz) < 0)
@@ -2833,10 +2842,12 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 
 	/* Load data if not already done */
 	if (!trg->data) {
+		size_t sz_st = 0;
 		packing_data_lock(&to_pack);
 		trg->data = odb_read_object(the_repository->objects,
 					    &trg_entry->idx.oid, &type,
-					    &sz);
+					    &sz_st);
+		sz = cast_size_t_to_ulong(sz_st);
 		packing_data_unlock(&to_pack);
 		if (!trg->data)
 			die(_("object %s cannot be read"),
@@ -2848,10 +2859,12 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 		*mem_usage += sz;
 	}
 	if (!src->data) {
+		size_t sz_st = 0;
 		packing_data_lock(&to_pack);
 		src->data = odb_read_object(the_repository->objects,
 					    &src_entry->idx.oid, &type,
-					    &sz);
+					    &sz_st);
+		sz = cast_size_t_to_ulong(sz_st);
 		packing_data_unlock(&to_pack);
 		if (!src->data) {
 			if (src_entry->preferred_base) {
diff --git a/builtin/repo.c b/builtin/repo.c
index 71a5c1c29c..69f3626467 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -784,13 +784,14 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 	for (size_t i = 0; i < oids->nr; i++) {
 		struct object_info oi = OBJECT_INFO_INIT;
 		unsigned long inflated;
+		size_t inflated_st = 0;
 		struct commit *commit;
 		struct object *obj;
 		void *content;
 		off_t disk;
 		int eaten;
 
-		oi.sizep = &inflated;
+		oi.sizep = &inflated_st;
 		oi.disk_sizep = &disk;
 		oi.contentp = &content;
 
@@ -798,6 +799,7 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 						  OBJECT_INFO_SKIP_FETCH_OBJECT |
 						  OBJECT_INFO_QUICK) < 0)
 			continue;
+		inflated = cast_size_t_to_ulong(inflated_st);
 
 		obj = parse_object_buffer(the_repository, &oids->oid[i], type,
 					  inflated, content, &eaten);
diff --git a/builtin/tag.c b/builtin/tag.c
index d51c2e3349..06c125b53c 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -238,7 +238,7 @@ static int git_tag_config(const char *var, const char *value,
 
 static void write_tag_body(int fd, const struct object_id *oid)
 {
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	char *buf, *sp, *orig;
 	struct strbuf payload = STRBUF_INIT;
@@ -388,7 +388,7 @@ static void create_reflog_msg(const struct object_id *oid, struct strbuf *sb)
 	enum object_type type;
 	struct commit *c;
 	char *buf;
-	unsigned long size;
+	size_t size;
 	int subject_len = 0;
 	const char *subject_start;
 
diff --git a/builtin/unpack-file.c b/builtin/unpack-file.c
index 87877a9fab..387389ed49 100644
--- a/builtin/unpack-file.c
+++ b/builtin/unpack-file.c
@@ -12,7 +12,7 @@ static char *create_temp_file(struct object_id *oid)
 	static char path[50];
 	void *buf;
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 	int fd;
 
 	buf = odb_read_object(the_repository->objects, oid, &type, &size);
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e7a50c493c..f3849bb654 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -231,7 +231,7 @@ static int check_object(struct object *obj, enum object_type type,
 		die("object type mismatch");
 
 	if (!(obj->flags & FLAG_OPEN)) {
-		unsigned long size;
+		size_t size;
 		int type = odb_read_object_info(the_repository->objects, &obj->oid, &size);
 		if (type != obj->type || type <= 0)
 			die("object of unexpected type");
@@ -436,6 +436,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 {
 	void *delta_data, *base;
 	unsigned long base_size;
+	size_t base_size_st = 0;
 	struct object_id base_oid;
 
 	if (type == OBJ_REF_DELTA) {
@@ -512,7 +513,8 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 		return;
 
 	base = odb_read_object(the_repository->objects, &base_oid,
-			       &type, &base_size);
+			       &type, &base_size_st);
+	base_size = cast_size_t_to_ulong(base_size_st);
 	if (!base) {
 		error("failed to read delta-pack base object %s",
 		      oid_to_hex(&base_oid));
diff --git a/bundle.c b/bundle.c
index 42327f9739..fd2db2c837 100644
--- a/bundle.c
+++ b/bundle.c
@@ -296,7 +296,7 @@ int list_bundle_refs(struct bundle_header *header, int argc, const char **argv)
 
 static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
 {
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	char *buf = NULL, *line, *lineend;
 	timestamp_t date;
diff --git a/combine-diff.c b/combine-diff.c
index b799862068..3ce71db8bb 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -325,7 +325,9 @@ static char *grab_blob(struct repository *r,
 		*size = fill_textconv(r, textconv, df, &blob);
 		free_filespec(df);
 	} else {
-		blob = odb_read_object(r->objects, oid, &type, size);
+		size_t size_st = 0;
+		blob = odb_read_object(r->objects, oid, &type, &size_st);
+		*size = cast_size_t_to_ulong(size_st);
 		if (!blob)
 			die(_("unable to read %s"), oid_to_hex(oid));
 		if (type != OBJ_BLOB)
diff --git a/commit.c b/commit.c
index fd8723502e..7950effc58 100644
--- a/commit.c
+++ b/commit.c
@@ -395,7 +395,7 @@ const void *repo_get_commit_buffer(struct repository *r,
 	const void *ret = get_cached_commit_buffer(r, commit, sizep);
 	if (!ret) {
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 		ret = odb_read_object(r->objects, &commit->object.oid, &type, &size);
 		if (!ret)
 			die("cannot read commit object %s",
@@ -404,7 +404,7 @@ const void *repo_get_commit_buffer(struct repository *r,
 			die("expected commit for %s, got %s",
 			    oid_to_hex(&commit->object.oid), type_name(type));
 		if (sizep)
-			*sizep = size;
+			*sizep = cast_size_t_to_ulong(size);
 	}
 	return ret;
 }
@@ -437,7 +437,7 @@ static inline void set_commit_tree(struct commit *c, struct tree *t)
 static void load_tree_from_commit_contents(struct repository *r, struct commit *commit)
 {
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 	char *buf;
 	const char *p;
 	struct object_id tree_oid;
@@ -604,7 +604,7 @@ int repo_parse_commit_internal(struct repository *r,
 {
 	enum object_type type;
 	void *buffer;
-	unsigned long size;
+	size_t size;
 	struct object_info oi = {
 		.typep = &type,
 		.sizep = &size,
@@ -1313,7 +1313,7 @@ static void handle_signed_tag(const struct commit *parent, struct commit_extra_h
 	struct merge_remote_desc *desc;
 	struct commit_extra_header *mergetag;
 	char *buf;
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	struct strbuf payload = STRBUF_INIT;
 	struct strbuf signature = STRBUF_INIT;
diff --git a/config.c b/config.c
index a1b92fe083..21b231052c 100644
--- a/config.c
+++ b/config.c
@@ -1442,7 +1442,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 {
 	enum object_type type;
 	char *buf;
-	unsigned long size;
+	size_t size;
 	int ret;
 
 	buf = odb_read_object(repo->objects, oid, &type, &size);
diff --git a/diff.c b/diff.c
index 5a584fa1d5..816b89dc6c 100644
--- a/diff.c
+++ b/diff.c
@@ -4594,8 +4594,9 @@ int diff_populate_filespec(struct repository *r,
 		}
 	}
 	else {
+		size_t size_st = 0;
 		struct object_info info = {
-			.sizep = &s->size
+			.sizep = &size_st
 		};
 
 		if (!(size_only || check_binary))
@@ -4617,6 +4618,7 @@ int diff_populate_filespec(struct repository *r,
 			die("unable to read %s", oid_to_hex(&s->oid));
 
 object_read:
+		s->size = cast_size_t_to_ulong(size_st);
 		if (size_only || check_binary) {
 			if (size_only)
 				return 0;
@@ -4631,6 +4633,7 @@ object_read:
 			if (odb_read_object_info_extended(r->objects, &s->oid, &info,
 							  OBJECT_INFO_LOOKUP_REPLACE))
 				die("unable to read %s", oid_to_hex(&s->oid));
+			s->size = cast_size_t_to_ulong(size_st);
 		}
 		s->should_free = 1;
 	}
diff --git a/dir.c b/dir.c
index 33c81c256e..b6764d98a7 100644
--- a/dir.c
+++ b/dir.c
@@ -324,7 +324,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
 			size_t *size_out, char **data_out)
 {
 	enum object_type type;
-	unsigned long sz;
+	size_t sz;
 	char *data;
 
 	*size_out = 0;
diff --git a/entry.c b/entry.c
index 7817aee362..c444fe5a10 100644
--- a/entry.c
+++ b/entry.c
@@ -92,11 +92,9 @@ static int create_file(const char *path, unsigned int mode)
 void *read_blob_entry(const struct cache_entry *ce, size_t *size)
 {
 	enum object_type type;
-	unsigned long ul;
 	void *blob_data = odb_read_object(the_repository->objects, &ce->oid,
-					  &type, &ul);
+					  &type, size);
 
-	*size = ul;
 	if (blob_data) {
 		if (type == OBJ_BLOB)
 			return blob_data;
diff --git a/fmt-merge-msg.c b/fmt-merge-msg.c
index 45d8b20e97..14441f23ae 100644
--- a/fmt-merge-msg.c
+++ b/fmt-merge-msg.c
@@ -528,11 +528,11 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 	for (i = 0; i < origins.nr; i++) {
 		struct object_id *oid = origins.items[i].util;
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 		char *buf = odb_read_object(the_repository->objects, oid,
 					    &type, &size);
 		char *origbuf = buf;
-		unsigned long len = size;
+		size_t len = size;
 		struct signature_check sigc = { NULL };
 		struct strbuf payload = STRBUF_INIT, sig = STRBUF_INIT;
 
diff --git a/fsck.c b/fsck.c
index b4ffee6a04..94c8651c7d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1328,7 +1328,7 @@ static int fsck_blobs(struct oidset *blobs_found, struct oidset *blobs_done,
 	oidset_iter_init(blobs_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 		char *buf;
 
 		if (oidset_contains(blobs_done, oid))
diff --git a/grep.c b/grep.c
index a54e5d86a9..1d75d31421 100644
--- a/grep.c
+++ b/grep.c
@@ -1931,9 +1931,11 @@ void grep_source_clear_data(struct grep_source *gs)
 static int grep_source_load_oid(struct grep_source *gs)
 {
 	enum object_type type;
+	size_t size_st = 0;
 
 	gs->buf = odb_read_object(gs->repo->objects, gs->identifier,
-				  &type, &gs->size);
+				  &type, &size_st);
+	gs->size = cast_size_t_to_ulong(size_st);
 	if (!gs->buf)
 		return error(_("'%s': unable to read %s"),
 			     gs->name,
diff --git a/http-push.c b/http-push.c
index 520d6c3b6a..c61d9f7e02 100644
--- a/http-push.c
+++ b/http-push.c
@@ -365,7 +365,7 @@ static void start_put(struct transfer_request *request)
 	enum object_type type;
 	char hdr[50];
 	void *unpacked;
-	unsigned long len;
+	size_t len;
 	int hdrlen;
 	ssize_t size;
 	git_zstream stream;
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 78316e7f90..c912ff3079 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -280,7 +280,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 	void *filter_data_)
 {
 	struct filter_blobs_limit_data *filter_data = filter_data_;
-	unsigned long object_length;
+	size_t object_length;
 	enum object_type t;
 
 	switch (filter_situation) {
diff --git a/mailmap.c b/mailmap.c
index 3b2691781d..72b639e602 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -186,7 +186,7 @@ int read_mailmap_blob(struct repository *repo, struct string_list *map,
 {
 	struct object_id oid;
 	char *buf;
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 
 	if (!name)
diff --git a/match-trees.c b/match-trees.c
index 4216933d06..2a43c0fa1a 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -61,7 +61,7 @@ static void *fill_tree_desc_strict(struct repository *r,
 {
 	void *buffer;
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 
 	buffer = odb_read_object(r->objects, hash, &type, &size);
 	if (!buffer)
@@ -186,7 +186,7 @@ static int splice_tree(struct repository *r,
 	char *subpath;
 	int toplen;
 	char *buf;
-	unsigned long sz;
+	size_t sz;
 	struct tree_desc desc;
 	unsigned char *rewrite_here;
 	const struct object_id *rewrite_with;
diff --git a/merge-blobs.c b/merge-blobs.c
index 6fc2799417..16a75bd1e3 100644
--- a/merge-blobs.c
+++ b/merge-blobs.c
@@ -9,7 +9,7 @@
 static int fill_mmfile_blob(mmfile_t *f, struct blob *obj)
 {
 	void *buf;
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 
 	buf = odb_read_object(the_repository->objects, &obj->object.oid,
@@ -35,7 +35,7 @@ static void *three_way_filemerge(struct index_state *istate,
 				 mmfile_t *base,
 				 mmfile_t *our,
 				 mmfile_t *their,
-				 unsigned long *size)
+				 size_t *size)
 {
 	enum ll_merge_result merge_status;
 	mmbuffer_t res;
@@ -61,7 +61,7 @@ static void *three_way_filemerge(struct index_state *istate,
 
 void *merge_blobs(struct index_state *istate, const char *path,
 		  struct blob *base, struct blob *our,
-		  struct blob *their, unsigned long *size)
+		  struct blob *their, size_t *size)
 {
 	void *res = NULL;
 	mmfile_t f1, f2, common;
diff --git a/merge-blobs.h b/merge-blobs.h
index 13cf9669e5..5797517a06 100644
--- a/merge-blobs.h
+++ b/merge-blobs.h
@@ -6,6 +6,6 @@ struct index_state;
 
 void *merge_blobs(struct index_state *, const char *,
 		  struct blob *, struct blob *,
-		  struct blob *, unsigned long *);
+		  struct blob *, size_t *);
 
 #endif /* MERGE_BLOBS_H */
diff --git a/merge-ort.c b/merge-ort.c
index 544be9e466..4f6273bd51 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3716,7 +3716,7 @@ static int read_oid_strbuf(struct merge_options *opt,
 {
 	void *buf;
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 	buf = odb_read_object(opt->repo->objects, oid, &type, &size);
 	if (!buf) {
 		path_msg(opt, ERROR_OBJECT_READ_FAILED, 0,
diff --git a/notes-cache.c b/notes-cache.c
index bf5bb1f6c1..74cef802bd 100644
--- a/notes-cache.c
+++ b/notes-cache.c
@@ -82,7 +82,7 @@ char *notes_cache_get(struct notes_cache *c, struct object_id *key_oid,
 	const struct object_id *value_oid;
 	enum object_type type;
 	char *value;
-	unsigned long size;
+	size_t size;
 
 	value_oid = get_note(&c->tree, key_oid);
 	if (!value_oid)
diff --git a/notes-merge.c b/notes-merge.c
index b9322abbcb..118cad2518 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -339,7 +339,7 @@ static void write_note_to_worktree(const struct object_id *obj,
 				   const struct object_id *note)
 {
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 	void *buf = odb_read_object(the_repository->objects, note, &type, &size);
 
 	if (!buf)
diff --git a/notes.c b/notes.c
index 8f315e2a00..ec9c2cb150 100644
--- a/notes.c
+++ b/notes.c
@@ -811,7 +811,8 @@ int combine_notes_concatenate(struct object_id *cur_oid,
 			      const struct object_id *new_oid)
 {
 	char *cur_msg = NULL, *new_msg = NULL, *buf;
-	unsigned long cur_len, new_len, buf_len;
+	unsigned long buf_len;
+	size_t cur_len, new_len;
 	enum object_type cur_type, new_type;
 	int ret;
 
@@ -875,7 +876,7 @@ static int string_list_add_note_lines(struct string_list *list,
 				      const struct object_id *oid)
 {
 	char *data;
-	unsigned long len;
+	size_t len;
 	enum object_type t;
 
 	if (is_null_oid(oid))
@@ -1282,7 +1283,8 @@ static void format_note(struct notes_tree *t, const struct object_id *object_oid
 	static const char utf8[] = "utf-8";
 	const struct object_id *oid;
 	char *msg, *msg_p;
-	unsigned long linelen, msglen;
+	unsigned long linelen;
+	size_t msglen;
 	enum object_type type;
 
 	if (!t)
diff --git a/object-file.c b/object-file.c
index bce941874e..3a21c14027 100644
--- a/object-file.c
+++ b/object-file.c
@@ -300,7 +300,7 @@ int parse_loose_header(const char *hdr, struct object_info *oi)
 	}
 
 	if (oi->sizep)
-		*oi->sizep = cast_size_t_to_ulong(size);
+		*oi->sizep = size;
 
 	/*
 	 * The length must be followed by a zero byte
@@ -931,7 +931,7 @@ int force_object_loose(struct odb_source *source,
 	struct odb_source_files *files = odb_source_files_downcast(source);
 	const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
 	void *buf;
-	unsigned long len;
+	size_t len;
 	struct object_info oi = OBJECT_INFO_INIT;
 	struct object_id compat_oid;
 	enum object_type type;
@@ -1614,7 +1614,7 @@ int read_loose_object(struct repository *repo,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	unsigned long *size = oi->sizep;
+	size_t *size = oi->sizep;
 
 	fd = git_open(path);
 	if (fd >= 0)
diff --git a/object.c b/object.c
index 465902ecc6..23b84aa7e2 100644
--- a/object.c
+++ b/object.c
@@ -325,7 +325,7 @@ struct object *parse_object_with_flags(struct repository *r,
 {
 	int skip_hash = !!(flags & PARSE_OBJECT_SKIP_HASH_CHECK);
 	int discard_tree = !!(flags & PARSE_OBJECT_DISCARD_TREE);
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	int eaten;
 	const struct object_id *repl = lookup_replace_object(r, oid);
diff --git a/odb.c b/odb.c
index 965ef68e4e..7d555be09f 100644
--- a/odb.c
+++ b/odb.c
@@ -625,7 +625,7 @@ static int oid_object_info_convert(struct repository *r,
 	enum object_type type;
 	struct object_id oid, delta_base_oid;
 	struct object_info new_oi, *oi;
-	unsigned long size;
+	size_t size;
 	void *content;
 	int ret;
 
@@ -716,7 +716,7 @@ int odb_read_object_info_extended(struct object_database *odb,
 /* returns enum object_type or negative */
 int odb_read_object_info(struct object_database *odb,
 			 const struct object_id *oid,
-			 unsigned long *sizep)
+			 size_t *sizep)
 {
 	enum object_type type;
 	struct object_info oi = OBJECT_INFO_INIT;
@@ -730,7 +730,7 @@ int odb_read_object_info(struct object_database *odb,
 }
 
 int odb_pretend_object(struct object_database *odb,
-		       void *buf, unsigned long len, enum object_type type,
+		       void *buf, size_t len, enum object_type type,
 		       struct object_id *oid)
 {
 	hash_object_file(odb->repo->hash_algo, buf, len, type, oid);
@@ -744,7 +744,7 @@ int odb_pretend_object(struct object_database *odb,
 void *odb_read_object(struct object_database *odb,
 		      const struct object_id *oid,
 		      enum object_type *type,
-		      unsigned long *size)
+		      size_t *size)
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT | OBJECT_INFO_LOOKUP_REPLACE;
@@ -762,12 +762,12 @@ void *odb_read_object(struct object_database *odb,
 void *odb_read_object_peeled(struct object_database *odb,
 			     const struct object_id *oid,
 			     enum object_type required_type,
-			     unsigned long *size,
+			     size_t *size,
 			     struct object_id *actual_oid_return)
 {
 	enum object_type type;
 	void *buffer;
-	unsigned long isize;
+	size_t isize;
 	struct object_id actual_oid;
 
 	oidcpy(&actual_oid, oid);
diff --git a/odb.h b/odb.h
index 73553ed5a7..e2f0bbad25 100644
--- a/odb.h
+++ b/odb.h
@@ -228,12 +228,12 @@ struct odb_source *odb_add_to_alternates_memory(struct object_database *odb,
 void *odb_read_object(struct object_database *odb,
 		      const struct object_id *oid,
 		      enum object_type *type,
-		      unsigned long *size);
+		      size_t *size);
 
 void *odb_read_object_peeled(struct object_database *odb,
 			     const struct object_id *oid,
 			     enum object_type required_type,
-			     unsigned long *size,
+			     size_t *size,
 			     struct object_id *oid_ret);
 
 /*
@@ -245,13 +245,13 @@ void *odb_read_object_peeled(struct object_database *odb,
  * that reference it.
  */
 int odb_pretend_object(struct object_database *odb,
-		       void *buf, unsigned long len, enum object_type type,
+		       void *buf, size_t len, enum object_type type,
 		       struct object_id *oid);
 
 struct object_info {
 	/* Request */
 	enum object_type *typep;
-	unsigned long *sizep;
+	size_t *sizep;
 	off_t *disk_sizep;
 	struct object_id *delta_base_oid;
 	void **contentp;
@@ -356,7 +356,7 @@ int odb_read_object_info_extended(struct object_database *odb,
  */
 int odb_read_object_info(struct object_database *odb,
 			 const struct object_id *oid,
-			 unsigned long *sizep);
+			 size_t *sizep);
 
 enum odb_has_object_flags {
 	/* Retry packed storage after checking packed and loose storage */
diff --git a/odb/source-loose.c b/odb/source-loose.c
index 7d7ea2fb84..66e6bb8d3f 100644
--- a/odb/source-loose.c
+++ b/odb/source-loose.c
@@ -72,7 +72,7 @@ static int read_object_info_from_path(struct odb_source_loose *loose,
 	void *map = NULL;
 	git_zstream stream, *stream_to_end = NULL;
 	char hdr[MAX_HEADER_LEN];
-	unsigned long size_scratch;
+	size_t size_scratch;
 	enum object_type type_scratch;
 	struct stat st;
 
@@ -355,7 +355,6 @@ static int odb_source_loose_read_object_stream(struct odb_read_stream **out,
 	struct object_info oi = OBJECT_INFO_INIT;
 	struct odb_loose_read_stream *st;
 	unsigned long mapsize;
-	unsigned long size_ul;
 	void *mapped;
 
 	mapped = odb_source_loose_map_object(loose, oid, &mapsize);
@@ -379,18 +378,11 @@ static int odb_source_loose_read_object_stream(struct odb_read_stream **out,
 		goto error;
 	}
 
-	/*
-	 * object_info.sizep is unsigned long* (32-bit on Windows), but
-	 * st->base.size is size_t (64-bit). Use temporary variable.
-	 * Note: loose objects >4GB would still truncate here, but such
-	 * large loose objects are uncommon (they'd normally be packed).
-	 */
-	oi.sizep = &size_ul;
+	oi.sizep = &st->base.size;
 	oi.typep = &st->base.type;
 
 	if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0)
 		goto error;
-	st->base.size = size_ul;
 
 	st->mapped = mapped;
 	st->mapsize = mapsize;
diff --git a/odb/streaming.c b/odb/streaming.c
index 7602a8d5d8..20531e864c 100644
--- a/odb/streaming.c
+++ b/odb/streaming.c
@@ -157,26 +157,15 @@ static int open_istream_incore(struct odb_read_stream **out,
 		.base.read = read_istream_incore,
 	};
 	struct odb_incore_read_stream *st;
-	unsigned long size_ul;
 	int ret;
 
 	oi.typep = &stream.base.type;
-	/*
-	 * object_info.sizep is unsigned long* (32-bit on Windows), but
-	 * stream.base.size is size_t (64-bit). We use a temporary variable
-	 * because the types are incompatible. Note: this path still truncates
-	 * for >4GB objects, but large objects should use pack streaming
-	 * (packfile_store_read_object_stream) which handles size_t properly.
-	 * This incore fallback is only used for small objects or when pack
-	 * streaming is unavailable.
-	 */
-	oi.sizep = &size_ul;
+	oi.sizep = &stream.base.size;
 	oi.contentp = (void **)&stream.buf;
 	ret = odb_read_object_info_extended(odb, oid, &oi,
 					    OBJECT_INFO_DIE_IF_CORRUPT);
 	if (ret)
 		return ret;
-	stream.base.size = size_ul;
 
 	CALLOC_ARRAY(st, 1);
 	*st = stream;
diff --git a/pack-bitmap.c b/pack-bitmap.c
index f9af8a96bd..e8a82945cc 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1856,7 +1856,7 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 				     uint32_t pos)
 {
-	unsigned long size;
+	size_t size;
 	struct object_info oi = OBJECT_INFO_INIT;
 
 	oi.sizep = &size;
@@ -1891,7 +1891,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
 	}
 
-	return size;
+	return cast_size_t_to_ulong(size);
 }
 
 static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
diff --git a/packfile.c b/packfile.c
index c174982d10..78c389e6f3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1607,13 +1607,10 @@ static int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_off
 	 * a "real" type later if the caller is interested.
 	 */
 	if (oi->contentp) {
-		size_t size_st = 0;
 		*oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset,
-						      &size_st, &type);
+						      oi->sizep, &type);
 		if (!*oi->contentp)
 			type = OBJ_BAD;
-		else if (oi->sizep)
-			*oi->sizep = cast_size_t_to_ulong(size_st);
 	} else if (oi->sizep || oi->typep || oi->delta_base_oid) {
 		type = unpack_object_header(p, &w_curs, &curpos, &size);
 	}
@@ -1633,7 +1630,7 @@ static int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_off
 				goto out;
 			}
 		}
-		*oi->sizep = (unsigned long)size;
+		*oi->sizep = size;
 	}
 
 	if (oi->disk_sizep || (oi->mtimep && p->is_cruft)) {
@@ -1919,7 +1916,6 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			struct object_id base_oid;
 			if (!(offset_to_pack_pos(p, obj_offset, &pos))) {
 				struct object_info oi = OBJECT_INFO_INIT;
-				unsigned long bsz_ul = 0;
 
 				nth_packed_object_id(&base_oid, p,
 						     pack_pos_to_index(p, pos));
@@ -1930,13 +1926,11 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 				mark_bad_packed_object(p, &base_oid);
 
 				oi.typep = &type;
-				oi.sizep = &bsz_ul;
+				oi.sizep = &base_size;
 				oi.contentp = &base;
 				if (odb_read_object_info_extended(r->objects, &base_oid,
 								  &oi, 0) < 0)
 					base = NULL;
-				else
-					base_size = bsz_ul;
 
 				external_base = base;
 			}
diff --git a/path-walk.c b/path-walk.c
index 94ff90bd15..edc8e736d7 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -368,7 +368,7 @@ static int walk_path(struct path_walk_context *ctx,
 		struct oid_array filtered = OID_ARRAY_INIT;
 
 		for (size_t i = 0; i < list->oids.nr; i++) {
-			unsigned long size;
+			size_t size;
 
 			if (odb_read_object_info(ctx->repo->objects,
 						 &list->oids.oid[i],
diff --git a/protocol-caps.c b/protocol-caps.c
index 35072ed60b..8858ea4489 100644
--- a/protocol-caps.c
+++ b/protocol-caps.c
@@ -50,7 +50,7 @@ static void send_info(struct repository *r, struct packet_writer *writer,
 	for_each_string_list_item (item, oid_str_list) {
 		const char *oid_str = item->string;
 		struct object_id oid;
-		unsigned long object_size;
+		size_t object_size;
 
 		if (get_oid_hex_algop(oid_str, &oid, r->hash_algo) < 0) {
 			packet_writer_error(
@@ -66,7 +66,8 @@ static void send_info(struct repository *r, struct packet_writer *writer,
 			if (odb_read_object_info(r->objects, &oid, &object_size) < 0) {
 				strbuf_addstr(&send_buffer, " ");
 			} else {
-				strbuf_addf(&send_buffer, " %lu", object_size);
+				strbuf_addf(&send_buffer, " %"PRIuMAX,
+					    (uintmax_t)object_size);
 			}
 		}
 
diff --git a/read-cache.c b/read-cache.c
index 21829102ae..21ca58beea 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -250,7 +250,7 @@ static int ce_compare_link(const struct cache_entry *ce, size_t expected_size)
 {
 	int match = -1;
 	void *buffer;
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 	struct strbuf sb = STRBUF_INIT;
 
@@ -3462,7 +3462,7 @@ void *read_blob_data_from_index(struct index_state *istate,
 				const char *path, unsigned long *size)
 {
 	int pos, len;
-	unsigned long sz;
+	size_t sz;
 	enum object_type type;
 	void *data;
 
@@ -3490,7 +3490,7 @@ void *read_blob_data_from_index(struct index_state *istate,
 		return NULL;
 	}
 	if (size)
-		*size = sz;
+		*size = cast_size_t_to_ulong(sz);
 	return data;
 }
 
diff --git a/ref-filter.c b/ref-filter.c
index 1da4c0e60d..8ba91c72a1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -86,7 +86,7 @@ struct ref_trailer_buf {
 static struct expand_data {
 	struct object_id oid;
 	enum object_type type;
-	unsigned long size;
+	size_t size;
 	off_t disk_size;
 	struct object_id delta_base_oid;
 	void *content;
diff --git a/reflog.c b/reflog.c
index 82337078d0..04edbe5670 100644
--- a/reflog.c
+++ b/reflog.c
@@ -154,7 +154,7 @@ static int tree_is_complete(const struct object_id *oid)
 
 	if (!tree->buffer) {
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 		void *data = odb_read_object(the_repository->objects, oid,
 					     &type, &size);
 		if (!data) {
diff --git a/rerere.c b/rerere.c
index 0296700f9f..068321b24f 100644
--- a/rerere.c
+++ b/rerere.c
@@ -990,7 +990,7 @@ static int handle_cache(struct index_state *istate,
 
 	while (pos < istate->cache_nr) {
 		enum object_type type;
-		unsigned long size;
+		size_t size;
 
 		ce = istate->cache[pos++];
 		if (ce_namelen(ce) != len || memcmp(ce->name, path, len))
diff --git a/submodule-config.c b/submodule-config.c
index a81897b4e0..f75997402a 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -694,7 +694,7 @@ static const struct submodule *config_from(struct submodule_cache *cache,
 		enum lookup_type lookup_type)
 {
 	struct strbuf rev = STRBUF_INIT;
-	unsigned long config_size;
+	size_t config_size;
 	char *config = NULL;
 	struct object_id oid;
 	enum object_type type;
diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
index c493b75e02..840797cf0d 100644
--- a/t/helper/test-pack-deltas.c
+++ b/t/helper/test-pack-deltas.c
@@ -48,7 +48,8 @@ static void write_ref_delta(struct hashfile *f,
 			    struct object_id *base)
 {
 	unsigned char header[MAX_PACK_OBJECT_HEADER];
-	unsigned long size, base_size, delta_size, compressed_size, hdrlen;
+	unsigned long delta_size, compressed_size, hdrlen;
+	size_t size, base_size;
 	enum object_type type;
 	void *base_buf, *delta_buf;
 	void *buf = odb_read_object(the_repository->objects,
diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
index a7aab426d0..87c59108e0 100644
--- a/t/helper/test-partial-clone.c
+++ b/t/helper/test-partial-clone.c
@@ -17,7 +17,7 @@ static void object_info(const char *gitdir, const char *oid_hex)
 {
 	struct repository r;
 	struct object_id oid;
-	unsigned long size;
+	size_t size;
 	struct object_info oi = {.sizep = &size};
 	const char *p;
 
diff --git a/t/unit-tests/u-odb-inmemory.c b/t/unit-tests/u-odb-inmemory.c
index 482502ef4b..6844bfc37c 100644
--- a/t/unit-tests/u-odb-inmemory.c
+++ b/t/unit-tests/u-odb-inmemory.c
@@ -20,7 +20,7 @@ static void cl_assert_object_info(struct odb_source_inmemory *source,
 				  const char *expected_content)
 {
 	enum object_type actual_type;
-	unsigned long actual_size;
+	size_t actual_size;
 	void *actual_content;
 	struct object_info oi = {
 		.typep = &actual_type,
diff --git a/tag.c b/tag.c
index 2f12e51024..1a00ded6eb 100644
--- a/tag.c
+++ b/tag.c
@@ -49,7 +49,7 @@ int gpg_verify_tag(struct repository *r, const struct object_id *oid,
 {
 	enum object_type type;
 	char *buf;
-	unsigned long size;
+	size_t size;
 	int ret;
 
 	type = odb_read_object_info(r->objects, oid, NULL);
@@ -207,7 +207,7 @@ int parse_tag(struct repository *r, struct tag *item)
 {
 	enum object_type type;
 	void *data;
-	unsigned long size;
+	size_t size;
 	int ret;
 
 	if (item->object.parsed)
diff --git a/tree-walk.c b/tree-walk.c
index 7e1b956f27..a67f06b9eb 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -87,7 +87,7 @@ void *fill_tree_descriptor(struct repository *r,
 			   struct tree_desc *desc,
 			   const struct object_id *oid)
 {
-	unsigned long size = 0;
+	size_t size = 0;
 	void *buf = NULL;
 
 	if (oid) {
@@ -610,7 +610,7 @@ int get_tree_entry(struct repository *r,
 {
 	int retval;
 	void *tree;
-	unsigned long size;
+	size_t size;
 	struct object_id root;
 
 	tree = odb_read_object_peeled(r->objects, tree_oid, OBJ_TREE, &size, &root);
@@ -682,7 +682,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 		if (!t.buffer) {
 			void *tree;
 			struct object_id root;
-			unsigned long size;
+			size_t size;
 			tree = odb_read_object_peeled(r->objects, &current_tree_oid,
 						      OBJ_TREE, &size, &root);
 			if (!tree)
@@ -778,6 +778,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 		} else if (S_ISLNK(*mode)) {
 			/* Follow a symlink */
 			unsigned long link_len;
+			size_t link_len_st = 0;
 			size_t len;
 			char *contents, *contents_start;
 			struct dir_state *parent;
@@ -797,7 +798,8 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 
 			contents = odb_read_object(r->objects,
 						   &current_tree_oid, &type,
-						   &link_len);
+						   &link_len_st);
+			link_len = cast_size_t_to_ulong(link_len_st);
 
 			if (!contents)
 				goto done;
diff --git a/tree.c b/tree.c
index d703ab97c8..53f7395e9f 100644
--- a/tree.c
+++ b/tree.c
@@ -188,7 +188,7 @@ int repo_parse_tree_gently(struct repository *r, struct tree *item,
 {
 	 enum object_type type;
 	 void *buffer;
-	 unsigned long size;
+	 size_t size;
 
 	if (item->object.parsed)
 		return 0;
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 5ee2b96d0a..db6938689f 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -179,7 +179,7 @@ int read_mmfile(mmfile_t *ptr, const char *filename)
 void read_mmblob(mmfile_t *ptr, struct object_database *odb,
 		 const struct object_id *oid)
 {
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 
 	if (is_null_oid(oid)) {
-- 
gitgitgadget


* [PATCH v2 6/7] packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
From: Johannes Schindelin via GitGitGadget @ 2026-06-15 11:52 UTC (permalink / raw)
  To: git
  Cc: Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin,
	Johannes Schindelin
In-Reply-To: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When I started the transition from `unsigned long` to `size_t`, in the
interest of keeping the patches reviewable, I introduced narrow wrappers
that go through `cast_size_t_to_ulong()`, so that data type narrowing
would fail loudly instead of silently mishandling large object sizes. I
also introduced `*_sz()` variants that would allow most of the callers
to keep using that `unsigned long` that the 90s kindly asked to be
returned.

After the preceding commits, the only places that called the narrow
wrappers either no longer exist or already use the `_sz` form
internally, so the wrappers just narrow values back through
`cast_size_t_to_ulong()` for no reason.

Drop them and rename the `_sz` variants back to the natural names.
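
To illustrate the wrapper pattern being removed, here is a minimal
standalone sketch, not code from this series. `decode_size()` and
`narrow_checked()` are hypothetical names, `uint32_t` stands in for the
32-bit `unsigned long` on Windows, and the checked cast returns an error
where Git's `cast_size_t_to_ulong()` would die. The decoder assumes the
delta header's size encoding of 7 bits per byte, least significant
group first, with the high bit meaning "more bytes follow".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wide decoder (plays the role of the `_sz` variant): reads one
 * variable-length size and advances *datap past it.
 */
static uint64_t decode_size(const unsigned char **datap,
			    const unsigned char *top)
{
	const unsigned char *data = *datap;
	uint64_t byte, size = 0;
	unsigned int shift = 0;

	assert(data < top); /* caller must pass a non-empty buffer */
	do {
		byte = *data++;
		size |= (byte & 0x7f) << shift;
		shift += 7;
	} while ((byte & 0x80) && data < top && shift < 64);

	*datap = data;
	return size;
}

/*
 * Checked narrowing (stand-in for the narrow wrapper's cast): returns
 * -1 and leaves *out untouched when the value does not fit in 32 bits.
 */
static int narrow_checked(uint64_t wide, uint32_t *out)
{
	if (wide > UINT32_MAX)
		return -1;
	*out = (uint32_t)wide;
	return 0;
}
```

A caller that decodes a 5 GiB result size gets the full value from the
wide decoder, while routing it through the narrowing step can only
fail. Once every caller takes the wide type, the narrowing step has no
purpose left, which is the situation this commit cleans up.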

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 delta.h       | 14 ++------------
 packfile.c    | 28 ++++++++--------------------
 packfile.h    |  2 +-
 patch-delta.c |  4 ++--
 4 files changed, 13 insertions(+), 35 deletions(-)

diff --git a/delta.h b/delta.h
index bb149dc82b..eb5c6d2fdb 100644
--- a/delta.h
+++ b/delta.h
@@ -86,11 +86,8 @@ void *patch_delta(const void *src_buf, size_t src_size,
  * This must be called twice on the delta data buffer, first to get the
  * expected source buffer size, and again to get the target buffer size.
  */
-/*
- * Size_t variant that doesn't truncate - use for >4GB objects on Windows.
- */
-static inline size_t get_delta_hdr_size_sz(const unsigned char **datap,
-					   const unsigned char *top)
+static inline size_t get_delta_hdr_size(const unsigned char **datap,
+					const unsigned char *top)
 {
 	const unsigned char *data = *datap;
 	size_t cmd, size = 0;
@@ -104,11 +101,4 @@ static inline size_t get_delta_hdr_size_sz(const unsigned char **datap,
 	return size;
 }
 
-static inline unsigned long get_delta_hdr_size(const unsigned char **datap,
-					       const unsigned char *top)
-{
-	size_t size = get_delta_hdr_size_sz(datap, top);
-	return cast_size_t_to_ulong(size);
-}
-
 #endif
diff --git a/packfile.c b/packfile.c
index dab0a9b16d..c174982d10 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1164,11 +1164,12 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 }
 
 /*
- * Size_t variant for >4GB delta results on Windows.
+ * Read a delta object's header at curpos in p (already inflated as needed)
+ * and return the size of the result object (the post-application target).
  */
-static size_t get_size_from_delta_sz(struct packed_git *p,
-				     struct pack_window **w_curs,
-				     off_t curpos)
+size_t get_size_from_delta(struct packed_git *p,
+			   struct pack_window **w_curs,
+			   off_t curpos)
 {
 	const unsigned char *data;
 	unsigned char delta_head[20], *in;
@@ -1215,18 +1216,10 @@ static size_t get_size_from_delta_sz(struct packed_git *p,
 	data = delta_head;
 
 	/* ignore base size */
-	get_delta_hdr_size_sz(&data, delta_head+sizeof(delta_head));
+	get_delta_hdr_size(&data, delta_head+sizeof(delta_head));
 
 	/* Read the result size */
-	return get_delta_hdr_size_sz(&data, delta_head+sizeof(delta_head));
-}
-
-unsigned long get_size_from_delta(struct packed_git *p,
-				  struct pack_window **w_curs,
-				  off_t curpos)
-{
-	size_t size = get_size_from_delta_sz(p, w_curs, curpos);
-	return cast_size_t_to_ulong(size);
+	return get_delta_hdr_size(&data, delta_head+sizeof(delta_head));
 }
 
 int unpack_object_header(struct packed_git *p,
@@ -1634,12 +1627,7 @@ static int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_off
 				ret = -1;
 				goto out;
 			}
-			/*
-			 * Use size_t variant to avoid die() on >4GB deltas.
-			 * oi->sizep is unsigned long, so truncation may occur,
-			 * but streaming code uses its own size_t tracking.
-			 */
-			size = get_size_from_delta_sz(p, &w_curs, tmp_pos);
+			size = get_size_from_delta(p, &w_curs, tmp_pos);
 			if (size == 0) {
 				ret = -1;
 				goto out;
diff --git a/packfile.h b/packfile.h
index 0b5ae3f9fc..bd4494906d 100644
--- a/packfile.h
+++ b/packfile.h
@@ -458,7 +458,7 @@ int is_pack_valid(struct packed_git *);
 void *unpack_entry(struct repository *r, struct packed_git *, off_t,
 		   enum object_type *, size_t *);
 unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, size_t *sizep);
-unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
+size_t get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
 int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, size_t *);
 off_t get_delta_base(struct packed_git *p, struct pack_window **w_curs,
 		     off_t *curpos, enum object_type type,
diff --git a/patch-delta.c b/patch-delta.c
index 44cda97994..42199fa956 100644
--- a/patch-delta.c
+++ b/patch-delta.c
@@ -27,12 +27,12 @@ void *patch_delta(const void *src_buf, size_t src_size,
 	top = (const unsigned char *) delta_buf + delta_size;
 
 	/* make sure the orig file size matches what we expect */
-	size = get_delta_hdr_size_sz(&data, top);
+	size = get_delta_hdr_size(&data, top);
 	if (size != src_size)
 		return NULL;
 
 	/* now the result size */
-	size = get_delta_hdr_size_sz(&data, top);
+	size = get_delta_hdr_size(&data, top);
 	dst_buf = xmallocz(size);
 
 	out = dst_buf;
-- 
gitgitgadget



* [PATCH v2 5/7] pack-objects: use size_t for in-core object sizes
From: Johannes Schindelin via GitGitGadget @ 2026-06-15 11:52 UTC (permalink / raw)
  To: git
  Cc: Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin,
	Johannes Schindelin
In-Reply-To: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

`pack-objects` stores per-entry object sizes in either the 31-bit
`size_` member of the `struct object_entry` or, when the value does not
fit, the `pack->delta_size[]` spill array.  The accessors (`oe_size`,
`oe_delta_size`, `oe_get_size_slow`, `oe_size_*_than`) and the setters
(`oe_set_size`, `oe_set_delta_size`) used `unsigned long` for the spill
type, which on Windows means the spill silently caps at 4 GiB per entry.
That is what made `upload-pack` die with "object too large to read on
this platform" while serving the >4 GiB blob in tests 5 and 6 of `t5608`
when run with `GIT_TEST_CLONE_2GB`.

Widen them all to `size_t` (including `pack->delta_size`) and drop the
three `cast_size_t_to_ulong()` calls in `check_object()` that guarded
`in_pack_size`.  The two `SET_SIZE(entry, canonical_size)` calls in the
same function stay cast-free as before, since `canonical_size` is still
`unsigned long` until a later commit widens `object_info::sizep`.
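
The inline-field-plus-spill-array layout described above can be
sketched in isolation as follows. This is a simplified illustration
under assumed names (`struct entry`, `struct table`,
`table_set_size()`, `table_get_size()`), not the actual `pack-objects`
code: small values live in a 31-bit bitfield, larger ones go to a
lazily allocated `size_t` array, so the widest value the table can
hold is bounded by the element type of that array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {
	unsigned size_:31;      /* inline storage for small sizes */
	unsigned size_valid:1;  /* 1 if size_ holds the real value */
};

struct table {
	struct entry *entries;
	size_t *spill;          /* one slot per entry, allocated on demand */
	size_t nr;
	size_t inline_limit;    /* sizes below this fit into size_ */
};

/* Store a size, spilling when it does not fit the bitfield. */
static int table_set_size(struct table *t, size_t i, size_t size)
{
	assert(i < t->nr);
	if (size < t->inline_limit) {
		t->entries[i].size_ = (unsigned)size;
		t->entries[i].size_valid = 1;
		return 0;
	}
	if (!t->spill) {
		t->spill = calloc(t->nr, sizeof(*t->spill));
		if (!t->spill)
			return -1;
	}
	t->entries[i].size_valid = 0;
	t->spill[i] = size;
	return 0;
}

/* Read a size back from whichever place it was stored. */
static size_t table_get_size(const struct table *t, size_t i)
{
	assert(i < t->nr);
	if (t->entries[i].size_valid)
		return t->entries[i].size_;
	return t->spill ? t->spill[i] : 0;
}
```

With an `unsigned long` spill array, the second branch would cap at
4 GiB wherever that type is 32 bits wide; declaring the array as
`size_t` removes that cap on 64-bit platforms.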

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/pack-objects.c | 35 ++++++++++++++++++-----------------
 pack-objects.h         |  2 +-
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 56d1bb498d..961d547ef2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -66,8 +66,8 @@ static inline struct object_entry *oe_delta(
 		return &pack->objects[e->delta_idx - 1];
 }
 
-static inline unsigned long oe_delta_size(struct packing_data *pack,
-					  const struct object_entry *e)
+static inline size_t oe_delta_size(struct packing_data *pack,
+				   const struct object_entry *e)
 {
 	if (e->delta_size_valid)
 		return e->delta_size_;
@@ -83,11 +83,11 @@ static inline unsigned long oe_delta_size(struct packing_data *pack,
 	return pack->delta_size[e - pack->objects];
 }
 
-unsigned long oe_get_size_slow(struct packing_data *pack,
-			       const struct object_entry *e);
+size_t oe_get_size_slow(struct packing_data *pack,
+			const struct object_entry *e);
 
-static inline unsigned long oe_size(struct packing_data *pack,
-				    const struct object_entry *e)
+static inline size_t oe_size(struct packing_data *pack,
+			     const struct object_entry *e)
 {
 	if (e->size_valid)
 		return e->size_;
@@ -145,7 +145,7 @@ static inline void oe_set_delta_sibling(struct packing_data *pack,
 
 static inline void oe_set_size(struct packing_data *pack,
 			       struct object_entry *e,
-			       unsigned long size)
+			       size_t size)
 {
 	if (size < pack->oe_size_limit) {
 		e->size_ = size;
@@ -159,7 +159,7 @@ static inline void oe_set_size(struct packing_data *pack,
 
 static inline void oe_set_delta_size(struct packing_data *pack,
 				     struct object_entry *e,
-				     unsigned long size)
+				     size_t size)
 {
 	if (size < pack->oe_delta_size_limit) {
 		e->delta_size_ = size;
@@ -496,7 +496,7 @@ static void copy_pack_data(struct hashfile *f,
 
 static inline int oe_size_greater_than(struct packing_data *pack,
 				       const struct object_entry *lhs,
-				       unsigned long rhs)
+				       size_t rhs)
 {
 	if (lhs->size_valid)
 		return lhs->size_ > rhs;
@@ -2279,7 +2279,7 @@ static void check_object(struct object_entry *entry, uint32_t object_index)
 		default:
 			/* Not a delta hence we've already got all we need. */
 			oe_set_type(entry, entry->in_pack_type);
-			SET_SIZE(entry, cast_size_t_to_ulong(in_pack_size));
+			SET_SIZE(entry, in_pack_size);
 			entry->in_pack_header_size = used;
 			if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > OBJ_BLOB)
 				goto give_up;
@@ -2333,8 +2333,8 @@ static void check_object(struct object_entry *entry, uint32_t object_index)
 		if (have_base &&
 		    can_reuse_delta(&base_ref, entry, &base_entry)) {
 			oe_set_type(entry, entry->in_pack_type);
-			SET_SIZE(entry, cast_size_t_to_ulong(in_pack_size)); /* delta size */
-			SET_DELTA_SIZE(entry, cast_size_t_to_ulong(in_pack_size));
+			SET_SIZE(entry, in_pack_size); /* delta size */
+			SET_DELTA_SIZE(entry, in_pack_size);
 
 			if (base_entry) {
 				SET_DELTA(entry, base_entry);
@@ -2357,7 +2357,8 @@ static void check_object(struct object_entry *entry, uint32_t object_index)
 			 * object size from the delta header.
 			 */
 			delta_pos = entry->in_pack_offset + entry->in_pack_header_size;
-			canonical_size = get_size_from_delta(p, &w_curs, delta_pos);
+			canonical_size = get_size_from_delta(p, &w_curs,
+							     delta_pos);
 			if (canonical_size == 0)
 				goto give_up;
 			SET_SIZE(entry, canonical_size);
@@ -2713,7 +2714,7 @@ static pthread_mutex_t progress_mutex;
 
 static inline int oe_size_less_than(struct packing_data *pack,
 				    const struct object_entry *lhs,
-				    unsigned long rhs)
+				    size_t rhs)
 {
 	if (lhs->size_valid)
 		return lhs->size_ < rhs;
@@ -2736,8 +2737,8 @@ static inline void oe_set_tree_depth(struct packing_data *pack,
  * reconstruction (so non-deltas are true object sizes, but deltas
  * return the size of the delta data).
  */
-unsigned long oe_get_size_slow(struct packing_data *pack,
-			       const struct object_entry *e)
+size_t oe_get_size_slow(struct packing_data *pack,
+			const struct object_entry *e)
 {
 	struct packed_git *p;
 	struct pack_window *w_curs;
@@ -2771,7 +2772,7 @@ unsigned long oe_get_size_slow(struct packing_data *pack,
 
 	unuse_pack(&w_curs);
 	packing_data_unlock(&to_pack);
-	return cast_size_t_to_ulong(size);
+	return size;
 }
 
 static int try_delta(struct unpacked *trg, struct unpacked *src,
diff --git a/pack-objects.h b/pack-objects.h
index 83299d4732..e97e84ddcb 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -141,7 +141,7 @@ struct packing_data {
 	uint32_t index_size;
 
 	unsigned int *in_pack_pos;
-	unsigned long *delta_size;
+	size_t *delta_size;
 
 	/*
 	 * Only one of these can be non-NULL and they have different
-- 
gitgitgadget



* [PATCH v2 4/7] packfile: widen unpack_entry()'s size out-parameter to size_t
From: Johannes Schindelin via GitGitGadget @ 2026-06-15 11:52 UTC (permalink / raw)
  To: git
  Cc: Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin,
	Johannes Schindelin
In-Reply-To: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The topic `js/objects-larger-than-4gb-on-windows` widened the streaming,
index-pack and unpack-objects paths to `size_t` but deliberately stopped
at the in-memory `unpack_entry()` cascade, which still hands back the
unpacked size through `unsigned long *`.  On Windows that boundary
truncates above 4 GiB because that data type is only 32 bits wide on
that platform.

Widen the code path, with two exceptions.
`packed_object_info_with_index_pos()` cannot yet pass `oi->sizep`
directly because the field is still `unsigned long *`; bridge it with a
`size_t` temporary that narrows back, and let a later commit drop the
bridge once the field is wide too. `gfi_unpack_entry()` keeps its narrow
signature because fast-import tracks sizes through `unsigned long`
everywhere it crosses subsystem boundaries; leaving that signature alone
keeps the scope of this commit reasonable.
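
The bridge mentioned above follows a common shape, sketched here in a
self-contained form. The names `query_size_wide()` and
`query_size_narrow()` are hypothetical, `uint32_t` and `uint64_t` stand
in for a 32-bit `unsigned long` and a 64-bit `size_t`, and the overflow
case returns an error where Git's `cast_size_t_to_ulong()` would die.
The temporary is needed because a pointer to the narrow type cannot be
passed where a pointer to the wide type is expected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Already-widened callee: reports the size via a 64-bit out-parameter. */
static int query_size_wide(uint64_t stored, uint64_t *sizep)
{
	assert(sizep != NULL); /* this sketch requires an out-parameter */
	*sizep = stored;
	return 0;
}

/*
 * Caller that still exposes a narrow out-parameter: receive the size
 * in a wide temporary, then narrow it only if it fits and only if the
 * caller asked for it.
 */
static int query_size_narrow(uint64_t stored, uint32_t *sizep)
{
	uint64_t size_wide = 0; /* the bridge temporary */

	if (query_size_wide(stored, &size_wide) < 0)
		return -1;
	if (size_wide > UINT32_MAX)
		return -1; /* would truncate; refuse instead */
	if (sizep)
		*sizep = (uint32_t)size_wide;
	return 0;
}
```

Once the narrow out-parameter itself becomes wide, the temporary and
the range check can both go away and the pointer can be handed through
unchanged.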

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/fast-import.c |  7 ++++++-
 pack-check.c          |  5 ++---
 packfile.c            | 28 +++++++++++++++++-----------
 packfile.h            |  3 ++-
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 82bc6dcc00..3dff898c43 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1239,6 +1239,8 @@ static void *gfi_unpack_entry(
 	unsigned long *sizep)
 {
 	enum object_type type;
+	size_t size_st = 0;
+	void *data;
 	struct packed_git *p = all_packs[oe->pack_id];
 	if (p == pack_data && p->pack_size < (pack_size + the_hash_algo->rawsz)) {
 		/* The object is stored in the packfile we are writing to
@@ -1260,7 +1262,10 @@ static void *gfi_unpack_entry(
 		 */
 		p->pack_size = pack_size + the_hash_algo->rawsz;
 	}
-	return unpack_entry(the_repository, p, oe->idx.offset, &type, sizep);
+	data = unpack_entry(the_repository, p, oe->idx.offset, &type, &size_st);
+	if (sizep)
+		*sizep = cast_size_t_to_ulong(size_st);
+	return data;
 }
 
 static void load_tree(struct tree_entry *root)
diff --git a/pack-check.c b/pack-check.c
index 2792f34d25..5adfb3f272 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -143,9 +143,8 @@ static int verify_packfile(struct repository *r,
 			data = NULL;
 			data_valid = 0;
 		} else {
-			unsigned long sz;
-			data = unpack_entry(r, p, entries[i].offset, &type, &sz);
-			size = sz;
+			data = unpack_entry(r, p, entries[i].offset, &type,
+					    &size);
 			data_valid = 1;
 		}
 
diff --git a/packfile.c b/packfile.c
index e202f48837..dab0a9b16d 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1454,7 +1454,7 @@ struct delta_base_cache_entry {
 	struct delta_base_cache_key key;
 	struct list_head lru;
 	void *data;
-	unsigned long size;
+	size_t size;
 	enum object_type type;
 };
 
@@ -1525,7 +1525,7 @@ static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent)
 }
 
 static void *cache_or_unpack_entry(struct repository *r, struct packed_git *p,
-				   off_t base_offset, unsigned long *base_size,
+				   off_t base_offset, size_t *base_size,
 				   enum object_type *type)
 {
 	struct delta_base_cache_entry *ent;
@@ -1558,8 +1558,8 @@ void clear_delta_base_cache(void)
 }
 
 static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
-				 void *base, unsigned long base_size,
-				 unsigned long delta_base_cache_limit,
+				 void *base, size_t base_size,
+				 size_t delta_base_cache_limit,
 				 enum object_type type)
 {
 	struct delta_base_cache_entry *ent;
@@ -1614,10 +1614,13 @@ static int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_off
 	 * a "real" type later if the caller is interested.
 	 */
 	if (oi->contentp) {
-		*oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep,
-						      &type);
+		size_t size_st = 0;
+		*oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset,
+						      &size_st, &type);
 		if (!*oi->contentp)
 			type = OBJ_BAD;
+		else if (oi->sizep)
+			*oi->sizep = cast_size_t_to_ulong(size_st);
 	} else if (oi->sizep || oi->typep || oi->delta_base_oid) {
 		type = unpack_object_header(p, &w_curs, &curpos, &size);
 	}
@@ -1735,7 +1738,7 @@ int packed_object_info(struct packed_git *p, off_t obj_offset,
 static void *unpack_compressed_entry(struct packed_git *p,
 				    struct pack_window **w_curs,
 				    off_t curpos,
-				    unsigned long size)
+				    size_t size)
 {
 	int st;
 	git_zstream stream;
@@ -1790,11 +1793,11 @@ int do_check_packed_object_crc;
 struct unpack_entry_stack_ent {
 	off_t obj_offset;
 	off_t curpos;
-	unsigned long size;
+	size_t size;
 };
 
 void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
-		   enum object_type *final_type, unsigned long *final_size)
+		   enum object_type *final_type, size_t *final_size)
 {
 	struct pack_window *w_curs = NULL;
 	off_t curpos = obj_offset;
@@ -1911,7 +1914,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 		void *delta_data;
 		void *base = data;
 		void *external_base = NULL;
-		unsigned long delta_size, base_size = size;
+		size_t delta_size, base_size = size;
 		int i;
 		off_t base_obj_offset = obj_offset;
 
@@ -1928,6 +1931,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			struct object_id base_oid;
 			if (!(offset_to_pack_pos(p, obj_offset, &pos))) {
 				struct object_info oi = OBJECT_INFO_INIT;
+				unsigned long bsz_ul = 0;
 
 				nth_packed_object_id(&base_oid, p,
 						     pack_pos_to_index(p, pos));
@@ -1938,11 +1942,13 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 				mark_bad_packed_object(p, &base_oid);
 
 				oi.typep = &type;
-				oi.sizep = &base_size;
+				oi.sizep = &bsz_ul;
 				oi.contentp = &base;
 				if (odb_read_object_info_extended(r->objects, &base_oid,
 								  &oi, 0) < 0)
 					base = NULL;
+				else
+					base_size = bsz_ul;
 
 				external_base = base;
 			}
diff --git a/packfile.h b/packfile.h
index 49d6bdecf6..0b5ae3f9fc 100644
--- a/packfile.h
+++ b/packfile.h
@@ -455,7 +455,8 @@ off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
 off_t find_pack_entry_one(const struct object_id *oid, struct packed_git *);
 
 int is_pack_valid(struct packed_git *);
-void *unpack_entry(struct repository *r, struct packed_git *, off_t, enum object_type *, unsigned long *);
+void *unpack_entry(struct repository *r, struct packed_git *, off_t,
+		   enum object_type *, size_t *);
 unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, size_t *sizep);
 unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
 int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, size_t *);
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 3/7] pack-objects(check_pack_inflate()): use size_t instead of unsigned long
From: Johannes Schindelin via GitGitGadget @ 2026-06-15 11:52 UTC (permalink / raw)
  To: git
  Cc: Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin,
	Johannes Schindelin
In-Reply-To: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

`write_reuse_object()` learned to track its packed-object size as
`size_t` in 606c192380 (odb, packfile: use size_t for streaming
object sizes, 2026-05-08), but the comparison sink it feeds,
`check_pack_inflate()`, still takes the expected decompressed size
as `unsigned long`. The call site bridges the mismatch with
`cast_size_t_to_ulong()`, which on Windows turns a >4 GiB object
into an immediate die().

That function only uses `expect` once: as the right-hand side of a
`stream.total_out == expect` equality test against zlib's counter.
zlib's own `total_out` counter is `uLong` and is therefore still
32-bit-bound on Windows. Widening `expect` to `size_t` cannot fix that,
but it is a strict improvement nonetheless: instead of dying outright,
an oversized object now simply makes the equality fail and lets
`write_reuse_object()` fall back to `write_no_reuse_object()`, which
decompresses and re-deflates the content (and which the larger
pack-objects widening series targets separately).

Drop the `cast_size_t_to_ulong()` shim at the call site now that
the receiving parameter speaks the same type as `entry_size`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
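A minimal sketch of the comparison described above, not the code in
`check_pack_inflate()`. It assumes a 32-bit counter that wraps around,
modelled here with `uint32_t` in place of zlib's `uLong` on Windows. The
names `counter_after()` and `inflated_size_matches()` are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a 32-bit byte counter: it keeps only the low
 * 32 bits of the number of bytes inflated so far.
 */
static uint32_t counter_after(uint64_t bytes_inflated)
{
	return (uint32_t)bytes_inflated;
}

/*
 * Compare the 32-bit counter against a 64-bit expected size. The
 * counter is promoted to 64 bits for the comparison, so an expected
 * size of 4 GiB or more can never compare equal.
 */
static int inflated_size_matches(uint32_t total_out, uint64_t expect)
{
	return total_out == expect;
}
```

With a wide `expect`, an oversized object makes the check report a
mismatch instead of aborting. Had the expected size been silently
truncated to 32 bits as well, the two wrapped values would compare
equal; this sketch suggests that is why a silent narrowing would not
have been a safe alternative to the `die()`.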
 builtin/pack-objects.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 50675481e1..56d1bb498d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -453,7 +453,7 @@ static int check_pack_inflate(struct packed_git *p,
 		struct pack_window **w_curs,
 		off_t offset,
 		off_t len,
-		unsigned long expect)
+		size_t expect)
 {
 	git_zstream stream;
 	unsigned char fakebuf[4096], *in;
@@ -671,8 +671,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
 	datalen -= entry->in_pack_header_size;

 	if (!pack_to_stdout && p->index_version == 1 &&
-	    check_pack_inflate(p, &w_curs, offset, datalen,
-			       cast_size_t_to_ulong(entry_size))) {
+	    check_pack_inflate(p, &w_curs, offset, datalen, entry_size)) {
 		error(_("corrupt packed object for %s"),
 		      oid_to_hex(&entry->idx.oid));
 		unuse_pack(&w_curs);
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v2 2/7] patch-delta: use size_t for sizes
From: Johannes Schindelin via GitGitGadget @ 2026-06-15 11:52 UTC (permalink / raw)
  To: git
  Cc: Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin,
	Johannes Schindelin
In-Reply-To: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

`patch_delta()` takes the source and delta sizes by value and writes
back the reconstructed target size through an `unsigned long *`. That
datatype cannot represent a value of 4 GiB or more on systems where
`unsigned long` is 32-bit (notably 64-bit Windows builds), even though
the delta encoding itself, the on-disk layout, and the in-memory
buffers happily carry such sizes. A `size_t` companion to
`get_delta_hdr_size()`, `get_delta_hdr_size_sz()`, was introduced in
17fa077596 (delta, packfile: use size_t for delta header sizes,
2026-05-08) precisely so that `patch_delta()` could be widened without
changing the on-the-wire decoding helper's signature.

Widen `patch_delta()`'s three size parameters to `size_t` and switch
its internal use of `get_delta_hdr_size()` to the `_sz` variant.
Then propagate the wider type through the callers.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
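A minimal sketch of why the width of the size type matters here, under
stated assumptions. It decodes a delta header size, stored as 7 bits
per byte with the least significant group first and the high bit as a
continuation flag, into a 64-bit value, and then narrows that value to
32 bits the way a 32-bit `unsigned long` would. The names
`decode_delta_hdr_size()` and `narrow_to_32()` are hypothetical.
Unlike Git's helpers, this sketch does not report overflow.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode a variable-length size: each byte contributes its low 7 bits,
 * least significant group first; the high bit means "more bytes follow".
 * Advances *datap past the bytes consumed.
 */
static uint64_t decode_delta_hdr_size(const unsigned char **datap,
				      const unsigned char *top)
{
	const unsigned char *data = *datap;
	uint64_t size = 0;
	unsigned char cmd;
	int shift = 0;

	assert(data < top); /* caller must supply at least one byte */
	do {
		cmd = *data++;
		size |= (uint64_t)(cmd & 0x7f) << shift;
		shift += 7;
	} while ((cmd & 0x80) && data < top && shift < 64);
	*datap = data;
	return size;
}

/* Model storing the decoded size in a 32-bit variable. */
static uint32_t narrow_to_32(uint64_t size)
{
	return (uint32_t)size;
}
```

For example, the five bytes `80 80 80 80 14` decode to 5 GiB
(5368709120), which a 32-bit variable would hold as 1 GiB
(1073741824).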
 apply.c                  |  2 +-
 builtin/index-pack.c     |  4 ++--
 builtin/unpack-objects.c |  2 +-
 delta.h                  |  6 +++---
 packfile.c               |  4 +---
 patch-delta.c            | 12 ++++++------
 t/helper/test-delta.c    | 10 ++++++----
 7 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/apply.c b/apply.c
index 249248d4f2..3cf544e9a9 100644
--- a/apply.c
+++ b/apply.c
@@ -3232,7 +3232,7 @@ static int apply_binary_fragment(struct apply_state *state,
 				 struct patch *patch)
 {
 	struct fragment *fragment = patch->fragments;
-	unsigned long len;
+	size_t len;
 	void *dst;
 
 	if (!fragment)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cf0bd8280d..3c4474e681 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -71,7 +71,7 @@ struct base_data {
 	/* Not initialized by make_base(). */
 	struct list_head list;
 	void *data;
-	unsigned long size;
+	size_t size;
 };
 
 /*
@@ -1048,7 +1048,7 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 {
 	void *delta_data, *result_data;
 	struct base_data *result;
-	unsigned long result_size;
+	size_t result_size;
 
 	if (show_stat) {
 		int i = delta_obj - objects;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 59e9b8711e..e7a50c493c 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -314,7 +314,7 @@ static void resolve_delta(unsigned nr, enum object_type type,
 			  void *delta, unsigned long delta_size)
 {
 	void *result;
-	unsigned long result_size;
+	size_t result_size;
 
 	result = patch_delta(base, base_size,
 			     delta, delta_size,
diff --git a/delta.h b/delta.h
index fad68cfc45..bb149dc82b 100644
--- a/delta.h
+++ b/delta.h
@@ -75,9 +75,9 @@ diff_delta(const void *src_buf, unsigned long src_bufsize,
  * *trg_bufsize is updated with its size.  On failure a NULL pointer is
  * returned.  The returned buffer must be freed by the caller.
  */
-void *patch_delta(const void *src_buf, unsigned long src_size,
-		  const void *delta_buf, unsigned long delta_size,
-		  unsigned long *dst_size);
+void *patch_delta(const void *src_buf, size_t src_size,
+		  const void *delta_buf, size_t delta_size,
+		  size_t *dst_size);
 
 /* the smallest possible delta size is 4 bytes */
 #define DELTA_SIZE_MIN	4
diff --git a/packfile.c b/packfile.c
index 89366abfe3..e202f48837 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1964,10 +1964,8 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			      (uintmax_t)curpos, p->pack_name);
 			data = NULL;
 		} else {
-			unsigned long sz;
 			data = patch_delta(base, base_size, delta_data,
-					   delta_size, &sz);
-			size = sz;
+					   delta_size, &size);
 
 			/*
 			 * We could not apply the delta; warn the user, but
diff --git a/patch-delta.c b/patch-delta.c
index b5c8594db6..44cda97994 100644
--- a/patch-delta.c
+++ b/patch-delta.c
@@ -12,13 +12,13 @@
 #include "git-compat-util.h"
 #include "delta.h"
 
-void *patch_delta(const void *src_buf, unsigned long src_size,
-		  const void *delta_buf, unsigned long delta_size,
-		  unsigned long *dst_size)
+void *patch_delta(const void *src_buf, size_t src_size,
+		  const void *delta_buf, size_t delta_size,
+		  size_t *dst_size)
 {
 	const unsigned char *data, *top;
 	unsigned char *dst_buf, *out, cmd;
-	unsigned long size;
+	size_t size;
 
 	if (delta_size < DELTA_SIZE_MIN)
 		return NULL;
@@ -27,12 +27,12 @@ void *patch_delta(const void *src_buf, unsigned long src_size,
 	top = (const unsigned char *) delta_buf + delta_size;
 
 	/* make sure the orig file size matches what we expect */
-	size = get_delta_hdr_size(&data, top);
+	size = get_delta_hdr_size_sz(&data, top);
 	if (size != src_size)
 		return NULL;
 
 	/* now the result size */
-	size = get_delta_hdr_size(&data, top);
+	size = get_delta_hdr_size_sz(&data, top);
 	dst_buf = xmallocz(size);
 
 	out = dst_buf;
diff --git a/t/helper/test-delta.c b/t/helper/test-delta.c
index 52ea00c937..8223a60229 100644
--- a/t/helper/test-delta.c
+++ b/t/helper/test-delta.c
@@ -21,7 +21,7 @@ int cmd__delta(int argc, const char **argv)
 	int fd;
 	struct strbuf from = STRBUF_INIT, data = STRBUF_INIT;
 	char *out_buf;
-	unsigned long out_size;
+	size_t out_size;
 
 	if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p")))
 		usage(usage_str);
@@ -31,11 +31,13 @@ int cmd__delta(int argc, const char **argv)
 	if (strbuf_read_file(&data, argv[3], 0) < 0)
 		die_errno("unable to read '%s'", argv[3]);
 
-	if (argv[1][1] == 'd')
+	if (argv[1][1] == 'd') {
+		unsigned long delta_size;
 		out_buf = diff_delta(from.buf, from.len,
 				     data.buf, data.len,
-				     &out_size, 0);
-	else
+				     &delta_size, 0);
+		out_size = delta_size;
+	} else
 		out_buf = patch_delta(from.buf, from.len,
 				      data.buf, data.len,
 				      &out_size);
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 1/7] compat/msvc: use _chsize_s for ftruncate
From: Johannes Schindelin via GitGitGadget @ 2026-06-15 11:52 UTC (permalink / raw)
  To: git
  Cc: Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin,
	Johannes Schindelin
In-Reply-To: <pull.2137.v2.git.1781524349.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

On Windows, `unsigned long` and `long` are 32 bits even on 64-bit
builds. The MSVC compatibility header has shimmed `ftruncate()` with

	#define ftruncate _chsize

ever since `compat/msvc-posix.h` was introduced. `_chsize()` takes a
32-bit `long` for the new length, which silently truncates files (and
the requested size) to 2 GiB. That is enough to make t7508 test 126
"git add fails gracefully with 4 GiB and 8 GiB files" fail under
MSVC: `test-tool truncate` creates a sparse 4 GiB or 8 GiB file via
the shimmed `ftruncate()`, and the test never gets off the ground.

`_chsize_s()` is the modern replacement, accepts a 64-bit `__int64`
length, and is the only sensible target on Windows. The catch is that
it does not follow the POSIX `-1` + `errno` convention: it returns
`0` on success and an errno value (a small positive integer) on
failure. A plain `#define ftruncate _chsize_s` would therefore
silently break callers that test the return value as `< 0` or against
`-1`, of which there are several: `http.c`, `parallel-checkout.c`,
and `t/helper/test-truncate.c` among them.

Introduce a `static inline` wrapper that calls `_chsize_s()`, copies
its errno return into `errno`, and translates the result to the
familiar `-1` / `0` convention, then point `ftruncate` at the
wrapper. Place the wrapper after `#include "mingw-posix.h"` so the
`off_t` parameter resolves to the already-widened `off64_t` rather
than the 32-bit `_off_t` from `compat/vcbuild/include/unistd.h`.

MinGW is unaffected: its `ftruncate()` already takes `off_t` and
routes through `ftruncate64()` when `_FILE_OFFSET_BITS=64`, which is
the default in our build.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
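A portable sketch of the return-value translation described above, so
it can be tried outside of MSVC. `fake_chsize_s()` is a hypothetical
stand-in for `_chsize_s()` that only mimics its "0 on success, errno
value on failure" convention and never touches a file;
`wrapped_ftruncate()` is likewise an illustrative name, not the wrapper
added by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for _chsize_s(): returns 0 on success and an
 * errno value (a positive integer) on failure.
 */
static int fake_chsize_s(int fd, int64_t length)
{
	if (fd < 0)
		return EBADF;
	if (length < 0)
		return EINVAL;
	return 0;
}

/*
 * Translate the "errno as return value" convention into the POSIX
 * convention: -1 with errno set on failure, 0 on success.
 */
static int wrapped_ftruncate(int fd, int64_t length)
{
	int err = fake_chsize_s(fd, length);

	if (err) {
		assert(err > 0); /* errno values are positive */
		errno = err;
		return -1;
	}
	return 0;
}
```

Callers that test the result as `< 0` or against `-1` keep working
with such a wrapper, whereas the raw return value would be a positive
number on failure and slip past both checks.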
 compat/msvc-posix.h | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/compat/msvc-posix.h b/compat/msvc-posix.h
index c500b8b4aa..7ce39b8d3f 100644
--- a/compat/msvc-posix.h
+++ b/compat/msvc-posix.h
@@ -16,7 +16,6 @@
 #define __attribute__(x)
 #define strcasecmp   _stricmp
 #define strncasecmp  _strnicmp
-#define ftruncate    _chsize
 #define strtoull     _strtoui64
 #define strtoll      _strtoi64

@@ -30,4 +29,27 @@ typedef int sigset_t;

 #include "mingw-posix.h"

+/*
+ * MSVC's `_chsize()` takes a 32-bit `long` and silently truncates files
+ * to 2 GiB. `_chsize_s()` accepts a 64-bit length but returns 0 on
+ * success or an errno value on failure, rather than the -1/errno
+ * convention POSIX `ftruncate()` callers expect. Wrap it so callers
+ * that test the return value as `< 0` or against `-1` keep working.
+ *
+ * Note: this declaration must follow `#include "mingw-posix.h"` so
+ * `off_t` resolves to `off64_t` and the parameter type matches the
+ * underlying `_chsize_s()` width.
+ */
+static inline int msvc_ftruncate(int fd, off_t length)
+{
+	int err = _chsize_s(fd, length);
+
+	if (err) {
+		errno = err;
+		return -1;
+	}
+	return 0;
+}
+#define ftruncate msvc_ftruncate
+
 #endif /* COMPAT_MSVC_POSIX_H */
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v2 0/7] More work supporting objects larger than 4GB on Windows
From: Johannes Schindelin via GitGitGadget @ 2026-06-15 11:52 UTC (permalink / raw)
  To: git; +Cc: Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin
In-Reply-To: <pull.2137.git.1780570272.gitgitgadget@gmail.com>

This patch series tries to address the problems pointed out by the expensive
tests that now run in CI: t5608 and t7508 verify various aspects about
objects larger than 4GB, which Git does not currently handle correctly when
run on a platform where size_t is 64-bit and unsigned long is 32-bit.

Changes vs v1:

 * Rebased onto master, which merged ps/odb-source-loose (with which these
   patches previously conflicted rather badly).
 * Removed superfluous size_t s variables (thanks, Patrick!).

Johannes Schindelin (7):
  compat/msvc: use _chsize_s for ftruncate
  patch-delta: use size_t for sizes
  pack-objects(check_pack_inflate()): use size_t instead of unsigned
    long
  packfile: widen unpack_entry()'s size out-parameter to size_t
  pack-objects: use size_t for in-core object sizes
  packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
  odb: use size_t for object_info.sizep and the size APIs

 apply.c                       |  8 ++--
 archive.c                     |  4 +-
 attr.c                        |  2 +-
 bisect.c                      |  2 +-
 blame.c                       | 15 +++++--
 builtin/cat-file.c            | 61 ++++++++++++++---------------
 builtin/difftool.c            |  2 +-
 builtin/fast-export.c         |  7 +++-
 builtin/fast-import.c         | 29 ++++++++++----
 builtin/fsck.c                |  2 +-
 builtin/grep.c                | 12 +++---
 builtin/index-pack.c          | 10 ++---
 builtin/log.c                 |  2 +-
 builtin/ls-files.c            |  2 +-
 builtin/ls-tree.c             |  4 +-
 builtin/merge-tree.c          |  6 +--
 builtin/mktag.c               |  2 +-
 builtin/notes.c               |  6 +--
 builtin/pack-objects.c        | 73 +++++++++++++++++++++--------------
 builtin/repo.c                |  4 +-
 builtin/tag.c                 |  4 +-
 builtin/unpack-file.c         |  2 +-
 builtin/unpack-objects.c      |  8 ++--
 bundle.c                      |  2 +-
 combine-diff.c                |  4 +-
 commit.c                      | 10 ++---
 compat/msvc-posix.h           | 24 +++++++++++-
 config.c                      |  2 +-
 delta.h                       | 20 +++-------
 diff.c                        |  5 ++-
 dir.c                         |  2 +-
 entry.c                       |  4 +-
 fmt-merge-msg.c               |  4 +-
 fsck.c                        |  2 +-
 grep.c                        |  4 +-
 http-push.c                   |  2 +-
 list-objects-filter.c         |  2 +-
 mailmap.c                     |  2 +-
 match-trees.c                 |  4 +-
 merge-blobs.c                 |  6 +--
 merge-blobs.h                 |  2 +-
 merge-ort.c                   |  2 +-
 notes-cache.c                 |  2 +-
 notes-merge.c                 |  2 +-
 notes.c                       |  8 ++--
 object-file.c                 |  6 +--
 object.c                      |  2 +-
 odb.c                         | 12 +++---
 odb.h                         | 10 ++---
 odb/source-loose.c            | 12 +-----
 odb/streaming.c               | 13 +------
 pack-bitmap.c                 |  4 +-
 pack-check.c                  |  5 +--
 pack-objects.h                |  2 +-
 packfile.c                    | 54 ++++++++++----------------
 packfile.h                    |  5 ++-
 patch-delta.c                 |  8 ++--
 path-walk.c                   |  2 +-
 protocol-caps.c               |  5 ++-
 read-cache.c                  |  6 +--
 ref-filter.c                  |  2 +-
 reflog.c                      |  2 +-
 rerere.c                      |  2 +-
 submodule-config.c            |  2 +-
 t/helper/test-delta.c         | 10 +++--
 t/helper/test-pack-deltas.c   |  3 +-
 t/helper/test-partial-clone.c |  2 +-
 t/unit-tests/u-odb-inmemory.c |  2 +-
 tag.c                         |  4 +-
 tree-walk.c                   | 10 +++--
 tree.c                        |  2 +-
 xdiff-interface.c             |  2 +-
 72 files changed, 300 insertions(+), 271 deletions(-)


base-commit: ea97ad8d017de0c9037451a78008a0fd60abea0c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2137%2Fdscho%2Fobjects-larger-than-4gb-on-windows-pt2-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2137

Range-diff vs v1:

 1:  de9fc5c455 = 1:  531bca775c compat/msvc: use _chsize_s for ftruncate
 2:  1fd7646ca1 = 2:  66a642c39e patch-delta: use size_t for sizes
 3:  ddb75326cd = 3:  271a5299e3 pack-objects(check_pack_inflate()): use size_t instead of unsigned long
 4:  bdebc36f21 = 4:  5c329535df packfile: widen unpack_entry()'s size out-parameter to size_t
 5:  68750ba2d1 = 5:  01b9209b26 pack-objects: use size_t for in-core object sizes
 6:  460d733fee = 6:  12c142f8ab packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
 7:  f3aeae983a ! 7:  37d030d867 odb: use size_t for object_info.sizep and the size APIs
     @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
       	struct object_info oi = OBJECT_INFO_INIT;
       	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
      @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
     - 		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
     - 			size_t s = size;
     - 			buf = replace_idents_using_mailmap(buf, &s);
     + 		if (odb_read_object_info_extended(the_repository->objects, &oid, &oi, flags) < 0)
     + 			die("git cat-file: could not get object info");
     + 
     +-		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
     +-			size_t s = size;
     +-			buf = replace_idents_using_mailmap(buf, &s);
      -			size = cast_size_t_to_ulong(s);
     -+			size = s;
     - 		}
     +-		}
     ++		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG))
     ++			buf = replace_idents_using_mailmap(buf, &size);
       
       		printf("%"PRIuMAX"\n", (uintmax_t)size);
     + 		ret = 0;
      @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
       		break;
       
     @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
       
       	case 'p':
      @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
     - 		if (use_mailmap) {
     - 			size_t s = size;
     - 			buf = replace_idents_using_mailmap(buf, &s);
     + 		if (!buf)
     + 			die("Cannot read object %s", obj_name);
     + 
     +-		if (use_mailmap) {
     +-			size_t s = size;
     +-			buf = replace_idents_using_mailmap(buf, &s);
      -			size = cast_size_t_to_ulong(s);
     -+			size = s;
     - 		}
     +-		}
     ++		if (use_mailmap)
     ++			buf = replace_idents_using_mailmap(buf, &size);
       
       		/* otherwise just spit out the data */
     + 		break;
      @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
     - 		if (use_mailmap) {
     - 			size_t s = size;
     - 			buf = replace_idents_using_mailmap(buf, &s);
     + 		buf = odb_read_object_peeled(the_repository->objects, &oid,
     + 					     exp_type_id, &size, NULL);
     + 
     +-		if (use_mailmap) {
     +-			size_t s = size;
     +-			buf = replace_idents_using_mailmap(buf, &s);
      -			size = cast_size_t_to_ulong(s);
     -+			size = s;
     - 		}
     +-		}
     ++		if (use_mailmap)
     ++			buf = replace_idents_using_mailmap(buf, &size);
       		break;
       	}
     + 	default:
      @@ builtin/cat-file.c: cleanup:
       struct expand_data {
       	struct object_id oid;
     @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
       
       		contents = odb_read_object(the_repository->objects, oid,
      @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
     - 		if (use_mailmap) {
     - 			size_t s = size;
     - 			contents = replace_idents_using_mailmap(contents, &s);
     + 		if (!contents)
     + 			die("object %s disappeared", oid_to_hex(oid));
     + 
     +-		if (use_mailmap) {
     +-			size_t s = size;
     +-			contents = replace_idents_using_mailmap(contents, &s);
      -			size = cast_size_t_to_ulong(s);
     -+			size = s;
     - 		}
     +-		}
     ++		if (use_mailmap)
     ++			contents = replace_idents_using_mailmap(contents, &size);
       
       		if (type != data->type)
     + 			die("object %s changed type!?", oid_to_hex(oid));
      @@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
     + 		}
     + 
     + 		if (use_mailmap && (data->type == OBJ_COMMIT || data->type == OBJ_TAG)) {
     +-			size_t s = data->size;
     + 			char *buf = NULL;
     + 
     + 			buf = odb_read_object(the_repository->objects, &data->oid,
     + 					      &data->type, &data->size);
       			if (!buf)
       				die(_("unable to read %s"), oid_to_hex(&data->oid));
     - 			buf = replace_idents_using_mailmap(buf, &s);
     +-			buf = replace_idents_using_mailmap(buf, &s);
      -			data->size = cast_size_t_to_ulong(s);
     -+			data->size = s;
     ++			buf = replace_idents_using_mailmap(buf, &data->size);
       
       			free(buf);
       		}
     @@ builtin/log.c: static int show_blob_object(const struct object_id *oid, struct r
      
       ## builtin/ls-files.c ##
      @@ builtin/ls-files.c: static void expand_objectsize(struct repository *repo, struct strbuf *line,
     - 			      const enum object_type type, unsigned int padded)
     - {
     + 	size_t len;
     + 
       	if (type == OBJ_BLOB) {
      -		unsigned long size;
      +		size_t size;
     @@ builtin/ls-files.c: static void expand_objectsize(struct repository *repo, struc
      
       ## builtin/ls-tree.c ##
      @@ builtin/ls-tree.c: static void expand_objectsize(struct strbuf *line, const struct object_id *oid,
     - 			      const enum object_type type, unsigned int padded)
     - {
     + 	size_t len;
     + 
       	if (type == OBJ_BLOB) {
      -		unsigned long size;
      +		size_t size;
     @@ notes.c: static void format_note(struct notes_tree *t, const struct object_id *o
       	if (!t)
      
       ## object-file.c ##
     -@@ object-file.c: static int parse_loose_header(const char *hdr, struct object_info *oi)
     +@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
       	}
       
       	if (oi->sizep)
     @@ object-file.c: static int parse_loose_header(const char *hdr, struct object_info
       
       	/*
       	 * The length must be followed by a zero byte
     -@@ object-file.c: static int read_object_info_from_path(struct odb_source *source,
     - 	void *map = NULL;
     - 	git_zstream stream, *stream_to_end = NULL;
     - 	char hdr[MAX_HEADER_LEN];
     --	unsigned long size_scratch;
     -+	size_t size_scratch;
     - 	enum object_type type_scratch;
     - 	struct stat st;
     - 
      @@ object-file.c: int force_object_loose(struct odb_source *source,
     - {
     + 	struct odb_source_files *files = odb_source_files_downcast(source);
       	const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
       	void *buf;
      -	unsigned long len;
     @@ object-file.c: int read_loose_object(struct repository *repo,
       
       	fd = git_open(path);
       	if (fd >= 0)
     -@@ object-file.c: int odb_source_loose_read_object_stream(struct odb_read_stream **out,
     - 	struct object_info oi = OBJECT_INFO_INIT;
     - 	struct odb_loose_read_stream *st;
     - 	unsigned long mapsize;
     --	unsigned long size_ul;
     - 	void *mapped;
     - 
     - 	mapped = odb_source_loose_map_object(source, oid, &mapsize);
     -@@ object-file.c: int odb_source_loose_read_object_stream(struct odb_read_stream **out,
     - 		goto error;
     - 	}
     - 
     --	/*
     --	 * object_info.sizep is unsigned long* (32-bit on Windows), but
     --	 * st->base.size is size_t (64-bit). Use temporary variable.
     --	 * Note: loose objects >4GB would still truncate here, but such
     --	 * large loose objects are uncommon (they'd normally be packed).
     --	 */
     --	oi.sizep = &size_ul;
     -+	oi.sizep = &st->base.size;
     - 	oi.typep = &st->base.type;
     - 
     - 	if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0)
     - 		goto error;
     --	st->base.size = size_ul;
     - 
     - 	st->mapped = mapped;
     - 	st->mapsize = mapsize;
      
       ## object.c ##
      @@ object.c: struct object *parse_object_with_flags(struct repository *r,
     @@ odb.h: int odb_read_object_info_extended(struct object_database *odb,
       enum odb_has_object_flags {
       	/* Retry packed storage after checking packed and loose storage */
      
     + ## odb/source-loose.c ##
     +@@ odb/source-loose.c: static int read_object_info_from_path(struct odb_source_loose *loose,
     + 	void *map = NULL;
     + 	git_zstream stream, *stream_to_end = NULL;
     + 	char hdr[MAX_HEADER_LEN];
     +-	unsigned long size_scratch;
     ++	size_t size_scratch;
     + 	enum object_type type_scratch;
     + 	struct stat st;
     + 
     +@@ odb/source-loose.c: static int odb_source_loose_read_object_stream(struct odb_read_stream **out,
     + 	struct object_info oi = OBJECT_INFO_INIT;
     + 	struct odb_loose_read_stream *st;
     + 	unsigned long mapsize;
     +-	unsigned long size_ul;
     + 	void *mapped;
     + 
     + 	mapped = odb_source_loose_map_object(loose, oid, &mapsize);
     +@@ odb/source-loose.c: static int odb_source_loose_read_object_stream(struct odb_read_stream **out,
     + 		goto error;
     + 	}
     + 
     +-	/*
     +-	 * object_info.sizep is unsigned long* (32-bit on Windows), but
     +-	 * st->base.size is size_t (64-bit). Use temporary variable.
     +-	 * Note: loose objects >4GB would still truncate here, but such
     +-	 * large loose objects are uncommon (they'd normally be packed).
     +-	 */
     +-	oi.sizep = &size_ul;
     ++	oi.sizep = &st->base.size;
     + 	oi.typep = &st->base.type;
     + 
     + 	if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0)
     + 		goto error;
     +-	st->base.size = size_ul;
     + 
     + 	st->mapped = mapped;
     + 	st->mapsize = mapsize;
     +
       ## odb/streaming.c ##
      @@ odb/streaming.c: static int open_istream_incore(struct odb_read_stream **out,
       		.base.read = read_istream_incore,

-- 
gitgitgadget

^ permalink raw reply

* Re: [PATCH] commit-graph: use timestamp_t for max parent generation accumulator
From: Derrick Stolee @ 2026-06-15 11:44 UTC (permalink / raw)
  To: Patrick Steinhardt, Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
In-Reply-To: <ai-zzWn9Ls6-j9h8@pks.im>

On 6/15/26 4:11 AM, Patrick Steinhardt wrote:
> On Sun, Jun 14, 2026 at 06:57:50AM +0000, Elijah Newren via GitGitGadget wrote:
>>      commit-graph: use timestamp_t for max parent generation accumulator
>>      
>>      We found a few repositories in the wild with commits whose authors were
>>      apparently on a computer in the year 2120 when they recorded their
>>      commits. Apparently, in a century from now, some folks are going to have
>>      a really weird timezone as well (-13068837), though the timezone doesn't
>>      factor into this patch at all.

>> @@ -1669,7 +1669,7 @@ static void compute_reachable_generation_numbers(
>>   			struct commit *current = list->item;
>>   			struct commit_list *parent;
>>   			int all_parents_computed = 1;
>> -			uint32_t max_gen = 0;
>> +			timestamp_t max_gen = 0;
>>   
>>   			for (parent = current->parents; parent; parent = parent->next) {
>>   				repo_parse_commit(info->r, parent->item);
> 
> This looks obviously correct.

I agree. I was surprised this was the only necessary change, but
your message clearly describes how the timing of the patch that
delivered this change contributed to the mismatch.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH] builtin/history: unuse the commit buffer after use
From: Patrick Steinhardt @ 2026-06-15  9:48 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Git mailing list
In-Reply-To: <20260614141600.620272-1-kaartic.sivaraam@gmail.com>

On Sun, Jun 14, 2026 at 02:15:40PM +0000, Kaartic Sivaraam wrote:
> While running `git history reword` using a Git built with the `SANITIZE` flag set
> to `address,leak`, we could observe the following leak being reported:

Huh, curious. That seems to hint that we're missing test coverage for
this specific scenario, as our test suite doesn't detect this leak.

[snip]
> A deeper investigation on this reveals the following as the root cause.
> 
> As part of rewording a commit in `git history`, we get the commit message
> buffer in the `commit_tree_ext` function. This in turn obtains the buffer
> from `repo_logmsg_reencode`. Given how `commit_tree_ext` is invoking the
> function with the last two parameters as NULL, we are clearly not expecting
> a reencode to happen. In this case, the buffer that we receive from
> `repo_logmsg_reencode` ends up always being obtained from a call to
> `repo_get_commit_buffer`.
> 
> This buffer is expected to be released with an accompanying call to
> `repo_unuse_commit_buffer` which takes care of freeing it. This call
> is missing in the `commit_tree_ext` flow thus resulting in the leak.

So this doesn't read as specific to any particular scenario at all, and
I would have expected us to hit this leak. Puzzling.

> Fix this by ensuring we call `repo_unuse_commit_buffer` on the
> original_message buffer.
> 
> Signed-off-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> ---
> I must mention that I also noticed the following comment in `commit_tree_ext`:
> 
> 	/* We retain authorship of the original commit. */
> 	original_message = repo_logmsg_reencode(repo, commit_with_message, NULL, NULL);
> 
> ... but I'm not quite sure why we don't unuse the buffer after its purpose is
> done. Kindly enlighten me in case I missed something.

Did you maybe confuse "authorship" with "ownership" while reading the
comment? The comment only mentions that we retain the original "Author"
commit metadata, it doesn't refer to ownership of the underlying
objects.

> diff --git a/builtin/history.c b/builtin/history.c
> index 091465a59e..0e9259b5d7 100644
> --- a/builtin/history.c
> +++ b/builtin/history.c
> @@ -154,6 +154,7 @@ static int commit_tree_ext(struct repository *repo,
>  	free_commit_extra_headers(original_extra_headers);
>  	strbuf_release(&commit_message);
>  	free(original_author);
> +	repo_unuse_commit_buffer(repo, commit_with_message, original_message);
>  	return ret;
>  }

Yup, this makes sense to me.
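
For readers following along, the ownership rule behind this fix can be
shown with a small standalone sketch. It does not use Git's real API:
`struct fake_commit`, `get_buffer()` and `unuse_buffer()` are
hypothetical stand-ins for the commit, `repo_get_commit_buffer()` and
`repo_unuse_commit_buffer()`, and the caching behaviour is a simplifying
assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a commit; not Git's struct commit. */
struct fake_commit {
	char *cached;       /* buffer owned by a cache, or NULL */
	const char *stored; /* stands in for the object database */
};

/* Counts buffers handed out that the caller still has to give back. */
static int live_buffers;

/* Lend out the cached buffer if there is one, else allocate a copy. */
static const char *get_buffer(struct fake_commit *c)
{
	size_t len;
	char *copy;

	if (c->cached)
		return c->cached;
	len = strlen(c->stored) + 1;
	copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, c->stored, len);
	live_buffers++;
	return copy;
}

/* Counterpart of get_buffer(): frees only what the cache does not own. */
static void unuse_buffer(struct fake_commit *c, const char *buf)
{
	if (buf && buf != c->cached) {
		free((char *)buf);
		live_buffers--;
	}
}
```

A caller that forgets the `unuse_buffer()` call leaves `live_buffers`
at 1 in the uncached case, which is the shape of the leak described for
`commit_tree_ext`.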

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH v14 3/6] branch: prepare delete_branches for a bulk caller
From: Phillip Wood @ 2026-06-15  9:47 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget, git
  Cc: Kristoffer Haugsbakk, Johannes Sixt, Harald Nordgren
In-Reply-To: <259113e304c4085c2bd90cce3a40c965744d5a00.1780999917.git.gitgitgadget@gmail.com>

Hi Harald

On 09/06/2026 11:11, Harald Nordgren via GitGitGadget wrote:
>
> @@ -240,7 +245,7 @@ static int delete_branches(int argc, const char **argv, int kinds,
>   	int i;
>   	int ret = 0;
>   	int remote_branch = 0;
> -	int force, quiet;
> +	int force, quiet, dry_run, no_head_fallback;

As with the previous patch it would be safer to initialize the new 
variables where they are declared.

>   	for_each_string_list_item(item, &refs_to_delete) {
>   		char *describe_ref = item->util;
>   		char *name = item->string;
> -		if (!refs_ref_exists(get_main_ref_store(the_repository), name)) {
> +		if (dry_run) {
> +			if (!quiet)
> +				printf(remote_branch
> +					? _("Would delete remote-tracking branch %s (was %s).\n")
> +					: _("Would delete branch %s (was %s).\n"),

I wondered what the "was %s" was about but it prints the symref target 
or oid of the ref.

Thanks

Phillip

> +					name + branch_name_pos, describe_ref);
> +		} else if (!refs_ref_exists(get_main_ref_store(the_repository), name)) {
>   			char *refname = name + branch_name_pos;
>   			if (!quiet)
>   				printf(remote_branch


^ permalink raw reply

* Re: [PATCH v14 1/6] branch: add --forked filter for --list mode
From: Phillip Wood @ 2026-06-15  9:46 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget, git
  Cc: Kristoffer Haugsbakk, Johannes Sixt, Harald Nordgren
In-Reply-To: <7383872f4b2f422ec36b11ab5fb31cce08e6106a.1780999917.git.gitgitgadget@gmail.com>

Hi Harald

On 09/06/2026 11:11, Harald Nordgren via GitGitGadget wrote:
> From: Harald Nordgren <haraldnordgren@gmail.com>
> 
> Add a --forked option to "git branch" list mode that lists only
> branches whose configured upstream matches <branch>. The argument
> can be a ref (e.g. "origin/main", "master") or a shell glob
> (e.g. "origin/*"), and may be repeated to widen the filter.
> 
> It is an ordinary list filter, so it combines with the others:
> 
>      git branch --merged origin/main --forked 'origin/*'
> 
> lists branches forked from origin that are already merged into
> origin/main, and --no-merged inverts the question.
> 
> This is the building block for --prune-merged, which deletes the
> listed branches once they have landed on their upstream.
> 
> Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
> ---
>   Documentation/git-branch.adoc | 10 +++-
>   builtin/branch.c              | 18 ++++++-
>   ref-filter.c                  | 70 ++++++++++++++++++++++++++
>   ref-filter.h                  | 10 ++++
>   t/t3200-branch.sh             | 92 +++++++++++++++++++++++++++++++++++
>   5 files changed, 197 insertions(+), 3 deletions(-)

It's nice to see that moving the code into ref-filter.c has reduced
the overall number of additions by ~50 lines. The documentation and
implementation look fine, though I have a couple of thoughts:

  - Previous iterations supported "origin" as a shorthand for the branch
    that origin/HEAD points to. That was nice because it meant we could
    use the same syntax for "git checkout -b" and "git branch --forked".
    It would probably be a good idea to support it.

  - We could probably be a bit smarter about the way we handle patterns
    by copying what dwim_ref() does to support things like
    remotes/origin/* but I don't think we need to do that now.
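
As an illustration of the filter semantics described in the commit
message (literal names and globs, repeated to widen the filter), here
is a rough standalone sketch. It is not the code added to ref-filter.c:
`upstream_matches()` is a hypothetical name, POSIX `fnmatch()` stands in
for Git's wildmatch, and upstreams are assumed to be compared in their
short form such as "origin/one".

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Return 1 if the branch's upstream matches any of the given patterns.
 * A pattern without wildcards only matches the identical name, so
 * literals and globs can be handled by the same call.
 */
static int upstream_matches(const char *upstream,
			    const char *const *patterns, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!fnmatch(patterns[i], upstream, 0))
			return 1;
	return 0;
}
```

With the patterns "origin/*" and "local-base", an upstream of
"origin/one" or "local-base" is accepted while "other/foreign" is not.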

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index e7829c2c4b..4e7deddc04 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -1717,4 +1717,96 @@ test_expect_success 'errors if given a bad branch name' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success '--forked: setup' '
> +	test_create_repo forked-upstream &&
> +	test_commit -C forked-upstream base &&
> +	git -C forked-upstream branch one base &&
> +	git -C forked-upstream branch two base &&
> +
> +	test_create_repo forked-other &&
> +	test_commit -C forked-other other-base &&
> +	git -C forked-other branch foreign other-base &&
> +
> +	git clone forked-upstream forked &&
> +	git -C forked remote add other ../forked-other &&

We can use "add -f" to fetch here rather than doing it separately.

> +	git -C forked fetch other &&
> +	git -C forked branch local-base &&
> +	git -C forked branch --track local-one origin/one &&
> +	git -C forked branch --track local-two origin/two &&
> +	git -C forked branch --track local-foreign other/foreign &&
> +	git -C forked branch detached &&

Normally we use "detached" to mean no branch, let's read on and see how
this is used ...

> +	git -C forked branch --track local-trunk local-base
> +'
> +
> +test_expect_success '--forked <upstream-tracking-branch> filters by upstream' '
> +	git -C forked branch --forked origin/one --format="%(refname:short)" >actual &&

origin/one and origin/two point to the same commit, so this demonstrates 
that we're checking the branch names, not the topology which is good. 
All of the local branches point at their upstream which isn't very 
realistic - I wonder if we should add some local commits?

The tests all look sensible, but there is no coverage for combining 
--forked with branch names as in

     git branch --forked <arg> <branch>

Thanks

Phillip


> +	echo local-one >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--forked <glob> filters by wildmatch' '
> +	git -C forked branch --forked "origin/*" --format="%(refname:short)" >actual &&
> +	cat >expect <<-\EOF &&
> +	local-one
> +	local-two
> +	main
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--forked <local-branch> matches branches with local upstream' '
> +	git -C forked branch --forked local-base --format="%(refname:short)" >actual &&
> +	echo local-trunk >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--forked can be repeated to widen the filter' '
> +	git -C forked branch --forked origin/one --forked other/foreign --format="%(refname:short)" >actual &&
> +	cat >expect <<-\EOF &&
> +	local-foreign
> +	local-one
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--forked combines literal and glob arguments' '
> +	git -C forked branch --forked local-base --forked "other/*" --format="%(refname:short)" >actual &&
> +	cat >expect <<-\EOF &&
> +	local-foreign
> +	local-trunk
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--forked "*/*" covers every remote-tracking upstream' '
> +	git -C forked branch --forked "*/*" --format="%(refname:short)" >actual &&
> +	cat >expect <<-\EOF &&
> +	local-foreign
> +	local-one
> +	local-two
> +	main
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--forked composes with --no-merged' '
> +	test_when_finished "git -C forked checkout detached" &&
> +	git -C forked checkout local-one &&
> +	test_commit -C forked local-only &&
> +	git -C forked branch --forked "origin/*" --no-merged origin/one \
> +		--format="%(refname:short)" >actual &&
> +	echo local-one >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--forked rejects unknown branch/pattern' '
> +	test_must_fail git -C forked branch --forked nope 2>err &&
> +	test_grep "not a valid branch or pattern" err
> +'
> +
> +test_expect_success '--forked requires a value' '
> +	test_must_fail git -C forked branch --forked 2>err &&
> +	test_grep "requires a value" err
> +'
> +
>   test_done


^ permalink raw reply

* Re: [PATCH v14 4/6] branch: add --prune-merged <branch>
From: Phillip Wood @ 2026-06-15  9:46 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget, git
  Cc: Kristoffer Haugsbakk, Johannes Sixt, Harald Nordgren
In-Reply-To: <9924373da0a0598cabe4f08f3bc4200833679171.1780999917.git.gitgitgadget@gmail.com>

Hi Harald

On 09/06/2026 11:11, Harald Nordgren via GitGitGadget wrote:
> From: Harald Nordgren <haraldnordgren@gmail.com>
> 
> 	git branch --prune-merged <branch>...

Please see my comments on the previous version about the naming of this
option. I really think we need to start a discussion to find a better
name for it, as the other options that delete a branch are named
"delete" rather than "prune", and this option does not remove the
branches listed by "--merged".

> deletes the local branches that "--forked <branch>" would list,
> keeping only those whose tip is reachable from their configured
> upstream: the work has already landed on the upstream they track,
> so the local copy is no longer needed.
> 
> Reachability is read from local refs; nothing is fetched. Run
> "git fetch" first if you want fresh upstream refs.

I don't think this sentence adds anything - git never fetches unless
the user explicitly asks it to.

> 
> Three kinds of branches are spared:
> 
>    * any branch checked out in any worktree;
>    * any branch whose upstream no longer resolves locally, since a
>      missing upstream is not by itself a sign of integration;
>    * any branch whose push destination equals its upstream
>      (<branch>@{push} is the same as <branch>@{upstream}), such as
>      a local "main" that tracks and pushes to "origin/main". Right
>      after a pull it just looks "fully merged", so it is left
>      alone. Only branches that push somewhere other than their
>      upstream, typically topics in a fork workflow, are candidates.
> 
> Branches that are not yet merged into their upstream are reported
> as a short warning and skipped, so one unmerged topic does not
> abort the whole sweep.

I'm not sure about this warning. The user has asked us to delete the
branches whose upstreams match those passed on the command line and that
have been merged, so do they really want to hear about the ones that
have not been merged? In the future it might be useful to have a way to
list those that have not been merged.

> Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
> ---
>   Documentation/git-branch.adoc |  24 ++++
>   builtin/branch.c              |  67 +++++++++++-
>   t/t3200-branch.sh             | 201 ++++++++++++++++++++++++++++++++++
>   3 files changed, 290 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-branch.adoc b/Documentation/git-branch.adoc
> index 62ebab6051..fdaccc9662 100644
> --- a/Documentation/git-branch.adoc
> +++ b/Documentation/git-branch.adoc
> @@ -25,6 +25,7 @@ git branch (-m|-M) [<old-branch>] <new-branch>
>   git branch (-c|-C) [<old-branch>] <new-branch>
>   git branch (-d|-D) [-r] <branch-name>...
>   git branch --edit-description [<branch-name>]
> +git branch --prune-merged <branch>...
>   
>   DESCRIPTION
>   -----------
> @@ -201,6 +202,29 @@ This option is only applicable in non-verbose mode.
>   	Print the name of the current branch. In detached `HEAD` state,
>   	nothing is printed.
>   
> +`--prune-merged <branch>...`::
> +	Delete the local branches that `--forked` would list for the
> +	given _<branch>_ arguments, but only those whose tip is
> +	reachable from their configured upstream. In other words, the
> +	work on the branch has already landed on the upstream it
> +	tracks, so the local copy is no longer needed. Several
> +	_<branch>_ patterns may be given, e.g. `git branch
> +	--prune-merged origin/main 'feature*'`.
> ++
> +Reachability is checked against whatever the upstream refs say
> +locally; nothing is fetched. Run `git fetch` first if you want
> +the upstream refs refreshed.
Maybe

Reachability is checked against the remote-tracking branch. Run `git
fetch` first if you want to update the remote-tracking branch.

> ++
> +A branch is left alone if any of the following holds:

s/left alone/not deleted/

> +its upstream no longer resolves locally; it is checked out in any

s/upstream no longer resolves locally/upstream remote-tracking branch no 
longer exists/

> +worktree; or its push destination (`<branch>@{push}`) equals its
> +upstream (`<branch>@{upstream}`), so it cannot be distinguished
> +from a freshly pulled trunk that just looks "fully merged".

What's a "freshly pulled trunk"? "trunk" does not appear in gitglossary(7)

> ++
> +Branches refused by the "fully merged" safety check are listed as
> +warnings and skipped; pass them to `git branch -D` explicitly if
> +you want them gone.

s/them gone/to delete them/

> +
>   `-v`::
>   `-vv`::
>   `--verbose`::
> diff --git a/builtin/branch.c b/builtin/branch.c
> index 2cc5a8cde0..af37a0ceb7 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -38,6 +38,7 @@ static const char * const builtin_branch_usage[] = {
>   	N_("git branch [<options>] (-c | -C) [<old-branch>] <new-branch>"),
>   	N_("git branch [<options>] [-r | -a] [--points-at]"),
>   	N_("git branch [<options>] [-r | -a] [--format]"),
> +	N_("git branch [<options>] --prune-merged <branch>..."),
>   	NULL
>   };
>   
> @@ -715,6 +716,61 @@ static int parse_opt_forked(const struct option *opt, const char *arg, int unset
>   	return 0;
>   }
>   
> +static int prune_merged_branches(int argc, const char **argv,
> +				 int quiet)
> +{
> +	struct ref_store *refs = get_main_ref_store(the_repository);
> +	struct ref_filter filter = REF_FILTER_INIT;
> +	struct ref_array candidates;
> +	struct strvec deletable = STRVEC_INIT;
> +	int i, ret = 0;
> +
> +	if (!argc)
> +		die(_("--prune-merged requires at least one <branch>"));
> +
> +	for (i = 0; i < argc; i++)
> +		if (ref_filter_forked_add(&filter, argv[i]) < 0)
> +			die(_("'%s' is not a valid branch or pattern"), argv[i]);
> +
> +	filter.kind = FILTER_REFS_BRANCHES;
> +	memset(&candidates, 0, sizeof(candidates));

It would be nicer to add "= { 0 }" to the declaration of candidates above.
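
A minimal sketch of the two styles, for illustration only.
`struct item_array` is a hypothetical stand-in and not Git's
`struct ref_array`:

```c
#include <stddef.h>

/* Hypothetical growable array; not Git's struct ref_array. */
struct item_array {
	void **items;
	size_t nr, alloc;
};

/* Returns 1 if every member is zero or NULL. */
static int item_array_is_empty(const struct item_array *a)
{
	return !a->items && !a->nr && !a->alloc;
}

/*
 * Zero-initialize at the point of declaration. All members are set to
 * zero or NULL, so no separate memset() call is needed later and there
 * is no window in which the variable is uninitialized.
 */
static struct item_array new_item_array(void)
{
	struct item_array candidates = { 0 };

	return candidates;
}
```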

> +	filter_refs(&candidates, &filter, filter.kind);
> +
> +	for (i = 0; i < candidates.nr; i++) {
> +		const char *full_name = candidates.items[i]->refname;
> +		const char *short_name;
> +		struct branch *branch;
> +		const char *upstream, *push;
> +
> +		if (!skip_prefix(full_name, "refs/heads/", &short_name))
> +			continue;

If we've set filter.kind = FILTER_REFS_BRANCHES, how can this condition fail?

> +		if (branch_checked_out(full_name))
> +			continue;
> +
> +		branch = branch_get(short_name);
> +		upstream = branch ? branch_get_upstream(branch, NULL) : NULL;

How can branch be NULL? Don't we require branch_get() to succeed in 
order to filter it?

> +		if (!upstream || !refs_ref_exists(refs, upstream))
> +			continue;
> +		push = branch ? branch_get_push(branch, NULL) : NULL;
> +		if (!push || !strcmp(push, upstream))
> +			continue;

By the time we've reached this point we know that
branch@{upstream} exists and does not match branch@{push} - good
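
The checks in the loop above boil down to a single predicate. A
standalone sketch, with a hypothetical `struct candidate` whose fields
stand in for the results of `branch_checked_out()`,
`branch_get_upstream()`, `refs_ref_exists()` and `branch_get_push()`:

```c
#include <string.h>

/* Hypothetical summary of one branch; not a structure from the patch. */
struct candidate {
	int checked_out;      /* checked out in some worktree */
	int upstream_exists;  /* upstream ref still resolves locally */
	const char *upstream; /* full upstream ref name, or NULL */
	const char *push;     /* full push destination ref, or NULL */
};

/*
 * Return 1 if the branch should be handed to the deletion step.
 * The "fully merged" check is not done here; in the patch it is left
 * to delete_branches().
 */
static int should_try_delete(const struct candidate *c)
{
	if (c->checked_out)
		return 0;
	if (!c->upstream || !c->upstream_exists)
		return 0;
	if (!c->push || !strcmp(c->push, c->upstream))
		return 0;
	return 1;
}
```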

> +		strvec_push(&deletable, short_name);
> +	}
> +
> +	if (deletable.nr)
> +		ret = delete_branches(deletable.nr, deletable.v,
> +				      FILTER_REFS_BRANCHES,
> +				      DELETE_BRANCH_WARN_ONLY |
> +				      DELETE_BRANCH_NO_HEAD_FALLBACK |
> +				      (quiet ? DELETE_BRANCH_QUIET : 0));

Here we delete the branches - good.
> +		OPT_BOOL(0, "prune-merged", &prune_merged,
> +			N_("delete local branches whose upstream matches <branch> and is merged")),

s/is/are/

Sorry I didn't get round to reviewing these last week. I'll try and take
a look at the tests and the other patches tomorrow.

Thanks

Phillip

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index 4e7deddc04..27ea1319bb 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -1809,4 +1809,205 @@ test_expect_success '--forked requires a value' '
>   	test_grep "requires a value" err
>   '
>   
> +test_expect_success '--prune-merged: setup' '
> +	test_create_repo pm-upstream &&
> +	test_commit -C pm-upstream base &&
> +	git -C pm-upstream checkout -b next &&
> +	test_commit -C pm-upstream one-commit &&
> +	test_commit -C pm-upstream two-commit &&
> +	git -C pm-upstream branch one HEAD~ &&
> +	git -C pm-upstream branch two HEAD &&
> +	git -C pm-upstream branch wip main &&
> +	git -C pm-upstream checkout main &&
> +	test_create_repo pm-fork
> +'
> +
> +test_expect_success '--prune-merged deletes branches integrated into upstream' '
> +	test_when_finished "rm -rf pm-merged" &&
> +	git clone pm-upstream pm-merged &&
> +	git -C pm-merged remote add fork ../pm-fork &&
> +	test_config -C pm-merged remote.pushDefault fork &&
> +	test_config -C pm-merged push.default current &&
> +	git -C pm-merged branch one one-commit &&
> +	git -C pm-merged branch --set-upstream-to=origin/next one &&
> +	git -C pm-merged branch two two-commit &&
> +	git -C pm-merged branch --set-upstream-to=origin/next two &&
> +
> +	git -C pm-merged branch --prune-merged "origin/*" &&
> +
> +	test_must_fail git -C pm-merged rev-parse --verify refs/heads/one &&
> +	test_must_fail git -C pm-merged rev-parse --verify refs/heads/two
> +'
> +
> +test_expect_success '--prune-merged accepts a literal upstream' '
> +	test_when_finished "rm -rf pm-literal" &&
> +	git clone pm-upstream pm-literal &&
> +	git -C pm-literal remote add fork ../pm-fork &&
> +	test_config -C pm-literal remote.pushDefault fork &&
> +	test_config -C pm-literal push.default current &&
> +	git -C pm-literal branch one one-commit &&
> +	git -C pm-literal branch --set-upstream-to=origin/next one &&
> +
> +	git -C pm-literal branch --prune-merged origin/next &&
> +
> +	test_must_fail git -C pm-literal rev-parse --verify refs/heads/one
> +'
> +
> +test_expect_success '--prune-merged unions multiple <branch> arguments' '
> +	test_when_finished "rm -rf pm-union" &&
> +	git clone pm-upstream pm-union &&
> +	git -C pm-union remote add fork ../pm-fork &&
> +	test_config -C pm-union remote.pushDefault fork &&
> +	test_config -C pm-union push.default current &&
> +	git -C pm-union branch one one-commit &&
> +	git -C pm-union branch --set-upstream-to=origin/next one &&
> +	git -C pm-union branch two base &&
> +	git -C pm-union branch --set-upstream-to=origin/main two &&
> +	git -C pm-union checkout --detach &&
> +
> +	git -C pm-union branch --prune-merged origin/next origin/main &&
> +
> +	test_must_fail git -C pm-union rev-parse --verify refs/heads/one &&
> +	test_must_fail git -C pm-union rev-parse --verify refs/heads/two
> +'
> +
> +test_expect_success '--prune-merged accepts a local upstream' '
> +	test_when_finished "rm -rf pm-local" &&
> +	git clone pm-upstream pm-local &&
> +	git -C pm-local remote add fork ../pm-fork &&
> +	test_config -C pm-local remote.pushDefault fork &&
> +	test_config -C pm-local push.default current &&
> +	git -C pm-local checkout -b trunk &&
> +	git -C pm-local branch one one-commit &&
> +	git -C pm-local branch --set-upstream-to=trunk one &&
> +	git -C pm-local merge --ff-only one-commit &&
> +
> +	git -C pm-local branch --prune-merged trunk &&
> +
> +	test_must_fail git -C pm-local rev-parse --verify refs/heads/one
> +'
> +
> +test_expect_success '--prune-merged warns instead of erroring on un-integrated commits' '
> +	test_when_finished "rm -rf pm-unmerged" &&
> +	git clone pm-upstream pm-unmerged &&
> +	git -C pm-unmerged remote add fork ../pm-fork &&
> +	test_config -C pm-unmerged remote.pushDefault fork &&
> +	test_config -C pm-unmerged push.default current &&
> +	git -C pm-unmerged checkout -b wip origin/wip &&
> +	git -C pm-unmerged branch --set-upstream-to=origin/next wip &&
> +	test_commit -C pm-unmerged local-only &&
> +	git -C pm-unmerged checkout - &&
> +
> +	git -C pm-unmerged branch --prune-merged "origin/*" 2>err &&
> +	test_grep "not fully merged" err &&
> +	test_grep ! "If you are sure you want to delete it" err &&
> +	git -C pm-unmerged rev-parse --verify refs/heads/wip
> +'
> +
> +test_expect_success '--prune-merged is silent about not-merged-to-HEAD' '
> +	test_when_finished "rm -rf pm-nohead" &&
> +	git clone pm-upstream pm-nohead &&
> +	git -C pm-nohead remote add fork ../pm-fork &&
> +	test_config -C pm-nohead remote.pushDefault fork &&
> +	test_config -C pm-nohead push.default current &&
> +	git -C pm-nohead branch topic one-commit &&
> +	git -C pm-nohead branch --set-upstream-to=origin/next topic &&
> +
> +	git -C pm-nohead branch --prune-merged "origin/*" 2>err &&
> +
> +	test_grep ! "not yet merged to HEAD" err &&
> +	test_must_fail git -C pm-nohead rev-parse --verify refs/heads/topic
> +'
> +
> +test_expect_success '--prune-merged skips branches whose upstream is gone' '
> +	test_when_finished "rm -rf pm-upstream-gone" &&
> +	git clone pm-upstream pm-upstream-gone &&
> +	git -C pm-upstream-gone remote add fork ../pm-fork &&
> +	test_config -C pm-upstream-gone remote.pushDefault fork &&
> +	test_config -C pm-upstream-gone push.default current &&
> +	git -C pm-upstream-gone branch one one-commit &&
> +	git -C pm-upstream-gone branch --set-upstream-to=origin/next one &&
> +
> +	git -C pm-upstream-gone update-ref -d refs/remotes/origin/next &&
> +	git -C pm-upstream-gone branch --prune-merged "origin/*" &&
> +
> +	git -C pm-upstream-gone rev-parse --verify refs/heads/one
> +'
> +
> +test_expect_success '--prune-merged never deletes the checked-out branch' '
> +	test_when_finished "rm -rf pm-head" &&
> +	git clone pm-upstream pm-head &&
> +	git -C pm-head remote add fork ../pm-fork &&
> +	test_config -C pm-head remote.pushDefault fork &&
> +	test_config -C pm-head push.default current &&
> +	git -C pm-head checkout -b one one-commit &&
> +	git -C pm-head branch --set-upstream-to=origin/next one &&
> +
> +	git -C pm-head branch --prune-merged "origin/*" &&
> +
> +	git -C pm-head rev-parse --verify refs/heads/one
> +'
> +
> +test_expect_success '--prune-merged spares branches that push back to their upstream' '
> +	test_when_finished "rm -rf pm-push-eq" &&
> +	git clone pm-upstream pm-push-eq &&
> +	git -C pm-push-eq checkout --detach &&
> +
> +	git -C pm-push-eq branch --prune-merged "origin/*" &&
> +
> +	git -C pm-push-eq rev-parse --verify refs/heads/main
> +'
> +
> +test_expect_success '--prune-merged spares a per-branch pushRemote==upstream remote' '
> +	test_when_finished "rm -rf pm-push-branch" &&
> +	git clone pm-upstream pm-push-branch &&
> +	git -C pm-push-branch remote add fork ../pm-fork &&
> +	test_config -C pm-push-branch remote.pushDefault fork &&
> +	test_config -C pm-push-branch push.default current &&
> +	test_config -C pm-push-branch branch.main.pushRemote origin &&
> +	git -C pm-push-branch checkout --detach &&
> +
> +	git -C pm-push-branch branch --prune-merged "origin/*" &&
> +
> +	git -C pm-push-branch rev-parse --verify refs/heads/main
> +'
> +
> +test_expect_success '--prune-merged prunes when @{push} differs from @{upstream}' '
> +	test_when_finished "rm -rf pm-push-diff" &&
> +	git clone pm-upstream pm-push-diff &&
> +	git -C pm-push-diff remote add fork ../pm-fork &&
> +	test_config -C pm-push-diff remote.pushDefault fork &&
> +	test_config -C pm-push-diff push.default current &&
> +	git -C pm-push-diff branch topic one-commit &&
> +	git -C pm-push-diff branch --set-upstream-to=origin/next topic &&
> +	git -C pm-push-diff checkout --detach &&
> +
> +	git -C pm-push-diff branch --prune-merged "origin/*" &&
> +
> +	test_must_fail git -C pm-push-diff rev-parse --verify refs/heads/topic
> +'
> +
> +test_expect_success '--prune-merged requires at least one <branch>' '
> +	test_must_fail git -C forked branch --prune-merged 2>err &&
> +	test_grep "requires at least one <branch>" err
> +'
> +
> +test_expect_success '--prune-merged takes positional <branch> arguments' '
> +	test_when_finished "rm -rf pm-positional" &&
> +	git clone pm-upstream pm-positional &&
> +	git -C pm-positional remote add fork ../pm-fork &&
> +	test_config -C pm-positional remote.pushDefault fork &&
> +	test_config -C pm-positional push.default current &&
> +	git -C pm-positional branch one one-commit &&
> +	git -C pm-positional branch --set-upstream-to=origin/next one &&
> +	git -C pm-positional branch two base &&
> +	git -C pm-positional branch --set-upstream-to=origin/main two &&
> +	git -C pm-positional checkout --detach &&
> +
> +	git -C pm-positional branch --prune-merged origin/next origin/main &&
> +
> +	test_must_fail git -C pm-positional rev-parse --verify refs/heads/one &&
> +	test_must_fail git -C pm-positional rev-parse --verify refs/heads/two
> +'
> +
>   test_done


^ permalink raw reply

* Re: [PATCH 7/7] odb: use size_t for object_info.sizep and the size APIs
From: Johannes Schindelin @ 2026-06-15  9:29 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Johannes Schindelin via GitGitGadget, git, Kristofer Karlsson
In-Reply-To: <aibJZ8EXoQSD2lsB@pks.im>

Hi Patrick,

On Mon, 15 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 10:51:12AM +0000, Johannes Schindelin via GitGitGadget wrote:
> > diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> > index fa45f774d7..fa6e396ddc 100644
> > --- a/builtin/cat-file.c
> > +++ b/builtin/cat-file.c
> > @@ -120,7 +120,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
> >  	struct object_id oid;
> >  	enum object_type type;
> >  	char *buf;
> > -	unsigned long size;
> > +	size_t size;
> >  	struct object_context obj_context = {0};
> >  	struct object_info oi = OBJECT_INFO_INIT;
> >  	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
> > @@ -166,7 +166,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
> >  		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
> >  			size_t s = size;
> >  			buf = replace_idents_using_mailmap(buf, &s);
> > -			size = cast_size_t_to_ulong(s);
> > +			size = s;
> >  		}
> >  
> >  		printf("%"PRIuMAX"\n", (uintmax_t)size);
> 
> Can't we drop this local variable completely and instead supply `&size`
> directly?

Well spotted!

> > @@ -219,7 +225,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
> >  		if (use_mailmap) {
> >  			size_t s = size;
> >  			buf = replace_idents_using_mailmap(buf, &s);
> > -			size = cast_size_t_to_ulong(s);
> > +			size = s;
> >  		}
> >  
> >  		/* otherwise just spit out the data */
> > @@ -266,7 +272,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
> >  		if (use_mailmap) {
> >  			size_t s = size;
> >  			buf = replace_idents_using_mailmap(buf, &s);
> > -			size = cast_size_t_to_ulong(s);
> > +			size = s;
> >  		}
> >  		break;
> >  	}
> > @@ -446,7 +455,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
> >  		if (use_mailmap) {
> >  			size_t s = size;
> >  			contents = replace_idents_using_mailmap(contents, &s);
> > -			size = cast_size_t_to_ulong(s);
> > +			size = s;
> >  		}
> >  
> >  		if (type != data->type)
> 
> Likewise for these three instances.

I totally agree.

> > @@ -555,7 +564,7 @@ static void batch_object_write(const char *obj_name,
> >  			if (!buf)
> >  				die(_("unable to read %s"), oid_to_hex(&data->oid));
> >  			buf = replace_idents_using_mailmap(buf, &s);
> > -			data->size = cast_size_t_to_ulong(s);
> > +			data->size = s;
> >  
> >  			free(buf);
> >  		}
> 
> And I think this site here can be adapted, as well.

Indeed!

> > diff --git a/diff.c b/diff.c
> > index 5a584fa1d5..816b89dc6c 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -4594,8 +4594,9 @@ int diff_populate_filespec(struct repository *r,
> >  		}
> >  	}
> >  	else {
> > +		size_t size_st = 0;
> >  		struct object_info info = {
> > -			.sizep = &s->size
> > +			.sizep = &size_st
> >  		};
> >  
> >  		if (!(size_only || check_binary))
> > @@ -4617,6 +4618,7 @@ int diff_populate_filespec(struct repository *r,
> >  			die("unable to read %s", oid_to_hex(&s->oid));
> >  
> >  object_read:
> > +		s->size = cast_size_t_to_ulong(size_st);
> >  		if (size_only || check_binary) {
> >  			if (size_only)
> >  				return 0;
> > @@ -4631,6 +4633,7 @@ object_read:
> >  			if (odb_read_object_info_extended(r->objects, &s->oid, &info,
> >  							  OBJECT_INFO_LOOKUP_REPLACE))
> >  				die("unable to read %s", oid_to_hex(&s->oid));
> > +			s->size = cast_size_t_to_ulong(size_st);
> >  		}
> >  		s->should_free = 1;
> >  	}
> 
> The flow in this function is quite weird if you ask me, but that's a
> preexisting issue. This does look correct to me, even if it's awkward.

Yes, on all four counts.

Ciao,
Johannes

^ permalink raw reply

* Re: [PATCH 4/7] packfile: widen unpack_entry()'s size out-parameter to size_t
From: Johannes Schindelin @ 2026-06-15  9:29 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Johannes Schindelin via GitGitGadget, git, Kristofer Karlsson
In-Reply-To: <aibJW3h4PaYhOqFb@pks.im>

Hi Patrick,

On Mon, 15 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 10:51:09AM +0000, Johannes Schindelin via GitGitGadget wrote:
> > diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> > index 82bc6dcc00..3dff898c43 100644
> > --- a/builtin/fast-import.c
> > +++ b/builtin/fast-import.c
> > @@ -1239,6 +1239,8 @@ static void *gfi_unpack_entry(
> >  	unsigned long *sizep)
> >  {
> >  	enum object_type type;
> > +	size_t size_st = 0;
> > +	void *data;
> >  	struct packed_git *p = all_packs[oe->pack_id];
> >  	if (p == pack_data && p->pack_size < (pack_size + the_hash_algo->rawsz)) {
> >  		/* The object is stored in the packfile we are writing to
> > @@ -1260,7 +1262,10 @@ static void *gfi_unpack_entry(
> >  		 */
> >  		p->pack_size = pack_size + the_hash_algo->rawsz;
> >  	}
> > -	return unpack_entry(the_repository, p, oe->idx.offset, &type, sizep);
> > +	data = unpack_entry(the_repository, p, oe->idx.offset, &type, &size_st);
> > +	if (sizep)
> > +		*sizep = cast_size_t_to_ulong(size_st);
> > +	return data;
> >  }
> 
> Nit, please feel free to ignore: do we want to add a NEEDSWORK comment
> here?

Hehe... My mind translates the `cast_size_t_to_ulong()` function to
"NEEDSWORK!" already ;-)
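
For readers unfamiliar with the helper: as noted elsewhere in this
thread, `cast_size_t_to_ulong()` dies when the value does not fit into
`unsigned long`. The following standalone sketch shows the same kind of
checked narrowing. It is not Git's implementation: `narrow_to_u32()` is
a hypothetical name, it reports failure through its return value
instead of dying, and fixed-width types are used so that the behaviour
is the same on every platform (`uint32_t` plays the role of a 32-bit
`unsigned long`, `uint64_t` that of a 64-bit `size_t`).

```c
#include <stdint.h>

/*
 * Store `value` in `*out` and return 0 if it fits into 32 bits.
 * Return -1 and leave `*out` untouched otherwise.
 */
static int narrow_to_u32(uint64_t value, uint32_t *out)
{
	if (value > UINT32_MAX)
		return -1;
	*out = (uint32_t)value;
	return 0;
}
```

Every remaining call to such a helper marks a place where a size of
4 GiB or more cannot be carried through yet.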

Ciao,
Johannes

^ permalink raw reply

* Re: [PATCH 3/7] pack-objects(check_pack_inflate()): use size_t instead of unsigned long
From: Johannes Schindelin @ 2026-06-15  9:29 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Johannes Schindelin via GitGitGadget, git, Kristofer Karlsson
In-Reply-To: <aibJVSrKPCfDVXw7@pks.im>

Hi Patrick,

On Mon, 15 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 10:51:08AM +0000, Johannes Schindelin via GitGitGadget wrote:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > 
> > `write_reuse_object()` learned to track its packed-object size as
> > `size_t` in 606c192380 (odb, packfile: use size_t for streaming
> > object sizes, 2026-05-08), but the comparison sink it feeds,
> > `check_pack_inflate()`, still takes the expected decompressed size
> > as `unsigned long`. The call site bridges the mismatch with
> > `cast_size_t_to_ulong()`, which on Windows turns a >4 GiB object
> > into an immediate die().
> > 
> > That function only uses `expect` once: as the right-hand side of a
> > `stream.total_out == expect` equality test against zlib's counter.
> > zlib's own `total_out` counter is `uLong` and is therefore still
> > 32-bit-bound on Windows. Widening `expect` to `size_t` cannot fix that,
> > but it is a strict improvement nonetheless: instead of dying outright,
> > an oversized object now simply makes the equality fail and lets
> > `write_reuse_object()` fall back to `write_no_reuse_object()`, which
> > decompresses and re-deflates the content (and which the larger
> > pack-objects widening series targets separately).
> 
> Hm. I wonder whether it's possible to reset `stream.total_out` on every
> iteration and instead have a local `size_t` variable that we use to
> track the total number of inflated bytes?

Possible? Yes. Appropriate? Unlikely. We would now pretend to have
inflated fewer bytes, _just_ to appease a data type limitation that we
already worked around in d05d666977 (git-zlib: handle data streams larger
than 4GB, 2026-05-08).
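
To make the idea under discussion concrete for readers, one generic way
to keep a wide total next to a 32-bit counter is to add up the
counter's progress after each call. This is only an illustration: it
uses no zlib, `struct wide_total` and `wide_total_update()` are
hypothetical names, and it is not a description of what d05d666977
does.

```c
#include <stdint.h>

/* Hypothetical bookkeeping next to a 32-bit "bytes produced" counter. */
struct wide_total {
	uint32_t last_seen; /* last observed value of the narrow counter */
	uint64_t total;     /* running total, wide enough for >4 GiB */
};

/*
 * Fold the narrow counter's progress since the previous call into the
 * wide total. The subtraction is done modulo 2^32, so the delta stays
 * correct across a wraparound as long as less than 4 GiB is produced
 * between two calls.
 */
static void wide_total_update(struct wide_total *w, uint32_t narrow_now)
{
	uint32_t delta = narrow_now - w->last_seen;

	w->total += delta;
	w->last_seen = narrow_now;
}
```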

Ciao,
Johannes
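
The two behaviours discussed in this exchange can be sketched in
isolation. This is a minimal illustration and not code from the series:
it does not call zlib, `narrow_count` is a hypothetical stand-in for a
32-bit `uLong`, and `totals_match()` and `accumulate_chunks()` are
made-up helper names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for zlib's `uLong` on platforms where
 * `unsigned long` is 32 bits wide (e.g. 64-bit Windows).
 */
typedef uint32_t narrow_count;

/*
 * Compare a 32-bit running total against a wide expected size. An
 * expected size above UINT32_MAX can never match, so the comparison
 * fails instead of the caller having to die on a narrowing cast.
 */
static int totals_match(narrow_count total_out, uint64_t expect)
{
	return (uint64_t)total_out == expect;
}

/*
 * The alternative raised in the review: reset the narrow counter for
 * every chunk and accumulate into a wide local variable instead.
 */
static uint64_t accumulate_chunks(const narrow_count *chunk_sizes, size_t nr)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		narrow_count per_call = 0;   /* counter reset for this chunk */
		per_call += chunk_sizes[i];  /* bytes produced by this call */
		total += per_call;           /* wide running total */
	}
	return total;
}
```

With a 5 GiB object, the narrow counter wraps around to 1 GiB, so
`totals_match()` reports a mismatch against the real size, whereas
`accumulate_chunks()` keeps the full total as long as no single chunk
exceeds 32 bits.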

* Re: [PATCH 2/7] patch-delta: use size_t for sizes
From: Johannes Schindelin @ 2026-06-15  9:29 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Johannes Schindelin via GitGitGadget, git, Kristofer Karlsson
In-Reply-To: <aibJTHKsmqe_EJHc@pks.im>

Hi Patrick,

On Mon, 15 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 10:51:07AM +0000, Johannes Schindelin via GitGitGadget wrote:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > 
> > `patch_delta()` takes the source and delta sizes by value and writes
> > back the reconstructed target size through an `unsigned long *`.  That
> > datatype cannot represent a value that exceeds 4 GiB on systems where
> > `unsigned long` is 32-bit (notably 64-bit Windows builds), though, even
> > though the delta encoding itself, the on-disk layout, and the in-memory
> > buffers happily carry such sizes. A `size_t` companion to
> > `get_delta_hdr_size()`, `get_delta_hdr_size_sz()`, was introduced in
> > 17fa077596 (delta, packfile: use size_t for delta header sizes,
> > 2026-05-08) precisely so that `patch_delta()` could be widened without
> > changing the on-the-wire decoding helper's signature.
> > 
> > Widen `patch_delta()`'s three size parameters to `size_t` and switch
> > its internal use of `get_delta_hdr_size()` to the `_sz` variant.
> > Then propagate the wider type through the callers.
> 
> Does `get_delta_hdr_size()` have any remaining callers after this patch
> series? I currently only spot two such callers, and you convert both of
> them in this patch.

As you noticed later on in the review: No, there are no such callers left,
and the `_sz` variant gets renamed, concluding the incremental migration
of that function from `unsigned long` to `size_t`.

> And can we reasonably add a test case that exercises this change?

Not reasonably, no. This would require constructing another artificial
_large_ object, this time with an unpacked Git object with a size >=4GB
that needs to be transmogrified into a different object.

Better to leave the verification of this patch to static analysis: GCC and
Clang have become quite good at spotting things like this. Coverity would
be, too, if it ever comes back up from its "upgrades to the Scan servers";
the timestamp of the snapshot at
https://web.archive.org/web/20260516152422/https://scan.coverity.com/
seems to be the start date of this update.

> 
> > diff --git a/packfile.c b/packfile.c
> > index 89366abfe3..e202f48837 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -1964,10 +1964,8 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> >  			      (uintmax_t)curpos, p->pack_name);
> >  			data = NULL;
> >  		} else {
> > -			unsigned long sz;
> >  			data = patch_delta(base, base_size, delta_data,
> > -					   delta_size, &sz);
> > -			size = sz;
> > +					   delta_size, &size);
> 
> Nice that we get rid of this awkward construct.

Awkward, but necessary to allow for an incremental, reviewable conversion
;-)

Ciao,
Johannes
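
Why the width of the size type matters for delta headers can be shown
with a small sketch. It is not the code from `delta.h`:
`decode_delta_size()` is a hypothetical name, and the sketch assumes
that delta header sizes are stored as little-endian base-128 varints
(seven payload bits per byte, high bit set on every byte but the last).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one variable-length size starting at *datap and advance
 * *datap past it. The caller must ensure that *datap < top.
 */
static uint64_t decode_delta_size(const unsigned char **datap,
				  const unsigned char *top)
{
	const unsigned char *data = *datap;
	uint64_t size = 0;
	unsigned int shift = 0;
	unsigned char cmd;

	do {
		cmd = *data++;
		/* The low seven bits carry the payload. */
		size |= (uint64_t)(cmd & 0x7f) << shift;
		shift += 7;
		/* A set high bit means that another byte follows. */
	} while ((cmd & 0x80) && data < top && shift < 64);

	*datap = data;
	return size;
}
```

Under that assumption, the five bytes `80 80 80 80 14` decode to 5 GiB.
Storing the result in a 32-bit `unsigned long` would keep only the low
32 bits, i.e. 1 GiB, which is the kind of silent truncation that the
switch to `size_t` avoids.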
