Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v2 4/5] builtin/refs: add "create" subcommand
From: Patrick Steinhardt @ 2026-06-30 10:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq5x31ukqv.fsf@gitster.g>

On Mon, Jun 29, 2026 at 01:58:32PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > +	if (repo_get_oid_with_flags(repo, argv[1], &newoid, GET_OID_SKIP_AMBIGUITY_CHECK))
> > +		die(_("invalid object ID: '%s'"), argv[1]);
> > +	if (is_null_oid(&newoid))
> > +		die(_("cannot create reference with null old object ID"));
> 
> An apparent typo here, "with null old" -> "with null new object
> name".
> 
> Other than that, I think this one is good.

Yes, indeed, good eyes.

Patrick

^ permalink raw reply

* Re: [PATCH v2 0/5] builtin/refs: add ability to write references
From: Patrick Steinhardt @ 2026-06-30 10:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqik71ul0j.fsf@gitster.g>

On Mon, Jun 29, 2026 at 01:52:44PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Reference-related functionality in Git is currently spread across many
> > different commands: git-update-ref(1), git-for-each-ref(1),
> > git-show-ref(1), git-pack-refs(1) and git-symbolic-ref(1). This makes it
> > hard for users to discover what functionality we have available to work
> > with references.
> >
> > We have thus started to consolidate this functionality into git-refs(1),
> > which is a toolbox of everything related to references. Until now, the
> > command doesn't handle functionality of git-update-ref(1).
> 
> This unfortunately hasn't heard any responses since June 17th, so I
> took a look at it again myself.  All the things we discussed during
> the review of the initial round has been addressed, it seems.
> 
> Shall we mark the topic ready for 'next' now?

Let me send one last reroll to fix the typo you pointed out. But other
than that I think this should be ready.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH 2/2] format-patch: fix leak of rev_info in prepare_bases()
From: Patrick Steinhardt @ 2026-06-30 10:26 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Karthik Nayak
In-Reply-To: <20260630064301.GB3733961@coredump.intra.peff.net>

On Tue, Jun 30, 2026 at 02:43:01AM -0400, Jeff King wrote:
> In prepare_bases() we do a custom revision walk, separate from the main
> format-patch walk. After we finish, we fail to call release_revisions(),
> possibly leaking its contents.
> 
> We failed to notice it so far because the revision machinery doesn't
> always allocate. But at least one case can trigger the leak: if a commit
> graph is present, then the topo-walk allocates revs.topo_walk_info and
> some associated data structures. You can see it in the test suite by
> running:
> 
>   make SANITIZE=leak
>   cd t
>   GIT_TEST_COMMIT_GRAPH=1 ./t4014-format-patch.sh
> 
> which yields many entries like:
> 
>   ==git==3687620==ERROR: LeakSanitizer: detected memory leaks
>   Direct leak of 200 byte(s) in 1 object(s) allocated from:
>       #0 0x7f4ccba185cb in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:74
>       #1 0x55cd452cdd0b in do_xmalloc wrapper.c:55
>       #2 0x55cd452cdd9d in xmalloc wrapper.c:76
>       #3 0x55cd45255473 in init_topo_walk revision.c:3845
>       #4 0x55cd45255bef in prepare_revision_walk revision.c:4017
>       #5 0x55cd44ffec40 in prepare_bases builtin/log.c:1872
>       #6 0x55cd450010ec in cmd_format_patch builtin/log.c:2439

Interesting. Makes me wonder whether we should modify linux-TEST-vars to
also run with the leak checker enabled. Ideally we'd of course just do
this for all jobs, but the overhead is probably way too high... yes,
doing a simple benchmark shows a ~3x hit.

So this is definitely nothing we want to do for all jobs. But for the
linux-TEST-vars job it might make sense, as it exercises a bunch of
non-default code paths.

> The un-released rev_info has been there since the code was added in
> fa2ab86d18 (format-patch: add '--base' option to record base tree info,
> 2016-04-26), but back then we didn't even have a way to release rev_info
> resources! The actual leak probably started around f0d9cc4196
> (revision.c: begin refactoring --topo-order logic, 2018-11-01), but it's
> hard to bisect because there were so many other unrelated leaks back
> then.
> 
> So I'm not sure exactly when the leak started beyond "long ago", but it
> is easy-ish to find now (since we've plugged all those other leaks) and
> the solution is clear.
> 
> I didn't add a new test since we can demonstrate it with the existing
> ones, but it does require tweaking a test variable. We might consider
> ways to get more automatic leak-checking coverage there, but I think it
> should be done outside of this fix.

Yeah, agreed.

One thing worth noting: there are still six test suites that are failing
with this patch: t0095, t3451, t3452, t3453, t4013 and t4211. The t345x
failures are because of the missing call to `repo_unuse_commit_buffer()`
in git-history(1), which we already noted elsewhere.

All of the remaining leaks in t0095, t4013 and t4211 seem to be related
to bloom filters.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/log.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/builtin/log.c b/builtin/log.c
> index d027ce1e0b..350b35c556 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1888,6 +1888,7 @@ static void prepare_bases(struct base_tree_info *bases,
>  		bases->nr_patch_id++;
>  	}
>  	clear_commit_base(&commit_base);
> +	release_revisions(&revs);
>  }

The fix looks sensible to me. We always initialize `revs` before we take
this exit path here, and there is no other early return that we'd have
to adjust.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH v4 3/3] replay: offer an option to linearize the commit topology
From: Johannes Schindelin @ 2026-06-30  9:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Toon Claes, git, Elijah Newren
In-Reply-To: <akInDBlyWbbRFcLH@pks.im>

Hi Patrick & Toon,

On Tue, 30 Jun 2026, Patrick Steinhardt wrote:

> On Fri, Jun 26, 2026 at 07:36:31AM +0200, Toon Claes wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> > 
> > > git-rebase(1) essentially knows about three different modes:
> > >
> > >   - "--no-rebase-merges", which is the default and maps to your
> > >     "--linearize".
> > >
> > >   - "--rebase-merges", which by default doesn't rebase cousins by using
> > >     "--ancestry-path" internally.
> > >
> > >   - "--rebase-merges=rebase-cousins", which doesn't pass the above
> > >     option.
> > >
> > > So it's not a simple boolean there, which makes me wonder whether we
> > > should mirror the same interface so that all of git-rebase(1)'s modes
> > > can be represented, as well.
> > 
> > That's a valid question, although I don't know a good answer to that.
> > 
> > Basically you're asking for what the command line options will look
> > like? Allow me to think out loud.
> > 
> > In this series I'm adding --linearize to git-replay(1). As mentioned, I
> > don't think it makes sense to add it to git-history(1) as well. Without
> > this option, the process aborts when it encounters a merge.
> > 
> > Dscho sent a patch series to properly replay (2-way) merges. I think
> > this should become the default for both git-replay(1) and
> > git-history(1).
> > 
> > But then, do we want to have an option that brings back the current
> > behavior of aborting at merges? Maybe with --no-merges?
> 
> I think that would be a sensible option to have.

I also think that we'll need a way to abort at merges because linearizing
commits is a relatively common operation.

> > Then there's the option of rebasing cousins left. That's something that
> > isn't covered by Dscho's series yet. Maybe --replay-cousins?
> > 
> > To reiterate what the final design could look like:
> > 
> >  * <nothing>: replay merges preserving topology.
> >  * "--linearize": flattens merges (only git-replay(1)).
> >  * "--no-merges": dies when the process tries to replay a merge.
> >  * "--replay-cousins": does what --rebase-merges=rebase-cousins does.
> 
> Right. And if we tried to be consistent with git-rebase(1), then this
> could be done as:
> 
>   - "--rebase-merges" to replay merges preserving topology, which is the
>     default once we support replaying them.
> 
>   - "--no-rebase-merges" to flatten commits.
> 
>   - "--rebase-merges=abort" to explicitly die when seeing merges.
> 
>   - "--rebase-merges=rebase-cousins"

The `git rebase` options are unlikely to be a good precedent to follow.
Their history is full of usability warts, and in hindsight, I would really
have loved a more steady hand in developing and maintaining a good UX. The
fact alone that this is called `rebase` speaks volumes about how hostile
of a user experience this command surfaces.

In any case, these options should use the much more natural term "replay"
instead of "rebase".

But then: you said that `--no-rebase-merges` should flatten the commits?
That's not what this option name conveys to me; It would convey to me that
the operation would _abort_ on encountering merge commits.

In other words, I do think that the --linearize option is conceptually
quite distinct from the different modes in which merge commits could be
handled. As such, this option should probably not be conflated with
the various `--replay-merges=<mode>` modes.

> > Now, all these options are (I think) mutually exclusive, so we could
> > consider an option "--replay-merges=<mode>", but personally I find
> > "--<option>=<value>" arguments harder to use than specifying separate
> > options.
> > 
> > I think I'm avoiding your question, because the design of the command
> > line parameters doesn't need tot 1-on-1 correlate to the internal
> > datastructure. And I agree the mode isn't a boolean, but does that mean
> > we want to use an enum internally? Well, I don't know. And I also don't
> > think that matters right now. Code is easy to change, I think the
> > command line options should be designed with the future in mind, which I
> > believe we do with "--linearize".
> > 
> > Sorry for this long-winded rambling, but bottom line I think it's fine
> > to add --linearize and in the future add more options and see how the
> > code should evolve to support those.
> 
> Hm, I dunno. You basically reasoned that we potentially want to have all
> of the same options that git-rebase(1)'s "--rebase-merges=" already
> supports. So that begs the question why we need to reinvent the wheel
> then and not just use the same syntax.

I would strongly caution against repeating the same UX mistakes as `git
rebase` has to live with.

The _functionality_, yes, I think that'd be good to have in `git replay`.
But we can surface that functionality in much better ways, with option
names that reflect the concepts much more intuitively.

Ciao,
Johannes

> Note that I'm not arguing that we should support all of these options
> now. I'm merely arguing that we should try to be consistent, unless
> there is a good argument not to do that. I'm fine with the interface if
> there indeed is a good argument, but if so we should document why we
> think that the current interface in git-rebase(1) is not a good fit for
> this command.
> 
> Thanks!
> 
> Patrick
> 
> 

^ permalink raw reply

* Re: [PATCH v5 0/4] history: add squash subcommand to fold a range
From: Matt Hunter @ 2026-06-30  9:23 UTC (permalink / raw)
  To: Harald Nordgren
  Cc: phillip.wood, Patrick Steinhardt,
	Harald Nordgren via GitGitGadget, git
In-Reply-To: <CAHwyqnVBEOm+FwD+i9Aa7edTvdnDPJom1zubcXgoExZnp--vWQ@mail.gmail.com>

On Tue Jun 30, 2026 at 3:19 AM EDT, Harald Nordgren wrote:
>> This is probably a larger question, since (according to the man page) it
>> affects the other 'git history' commands as well.  When I run
>> 'git history ...' and discover that I made a mistake after inspecting
>> the results, is there a fool-proof way to undo the change and return to
>> the previous state?  My first thought was to run 'git reset --hard ...',
>> but the default behavior of --update-refs (moving other branches) can
>> make this more complicated.
>
> This is a larger question: But I would love to have a reflog that is
> more human-centered. When e.g. rebasing a series with N commits, it's
> very tricky in the reflog to find what was the state before that.
>
> I feel like branch switching is given too much space in the reflog,
> since it's not a destructive action, I don't care about it.

I share these headaches to an extent.  When dealing with the first
problem (seeing an atomic entry in the reflog), I usually look at the
branch's own reflog instead of HEAD's

But my question is about doing a comprehensive reset from a botched
operation.  If any history operation updates branch refs besides the
current one.  I don't think there's an obvious way to see which other
ones were affected, and a naive 'git reset --hard my-branch@{1}' leaves
them pointed at unwanted commits.  Is this right?

tangent:  I'm pretty sure that git-status relies on checkout / branch
switching reflog entries in order to know which tag your detached HEAD
started from, eg: when it says something like

    HEAD detached from v2.55.0-rc2
    nothing to commit, working tree clean

>
> And when handling multiple commits in on go (squashing, rebasing), I
> would love to see a visual hierarchy (with indentation for sub-steps)
> instead of treating each action as equally important when it isn't.

Sounds compelling!

^ permalink raw reply

* Re: [PATCH 3/3] t5551: pack refs after creating many tags
From: Patrick Steinhardt @ 2026-06-30  9:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Montalbo, git, Junio C Hamano
In-Reply-To: <20260629203527.GA1895313@coredump.intra.peff.net>

On Mon, Jun 29, 2026 at 04:35:27PM -0400, Jeff King wrote:
> There's one other thing you might find interesting. While poking at the
> timings here the other day, I noticed that reftable is very eager to
> stat the tables.list file. Try this:
> 
>   git init --ref-format=reftable
>   blob=$(echo foo | git hash-object -w --stdin)
>   seq -f "create refs/tags/foo-%g $blob" 2000 |
>   strace -c git update-ref --stdin
> 
> We make 2000 fstat, which strace claims takes 85% of the time. I suspect
> this is over-emphasized because strace inherently makes syscalls slow,
> but running with perf also highlights it as a non-trivial cost.

Yeah, this rings a bell. If I remember correctly, this is because we
call `refs_resolve_ref_unsafe()` to verify whether the target already
exists. And as that function is generic, it wasn't easy to optimize it
by reusing the already-loaded reftable stack.

> It has been a long time since I've thought about reftable internals, but
> it feels like we ought to be able to take the lock and then trust that
> the stack has not been manipulated.

Yeah, that would certainly be an option to explore.

> It may not be worth digging into too much, though. I can make 50,000
> refs in 150ms on my system, which is probably good enough (especially
> compared to the files backend).

True, it's going to be much better compared to the "files" backend. But
that isn't enough reason to not optimize it even further -- doubly so if
it actually takes 85% of the time. Sounds like a low-hanging fruit to me
that can result in a significant speedup.

I'll probably not get to it anytime soon, but I'll create an issue to
keep track of it.

Patrick

^ permalink raw reply

* Re: [PATCH 0/3] fixing expensive http test timeouts
From: Patrick Steinhardt @ 2026-06-30  9:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Michael Montalbo, git
In-Reply-To: <xmqqik71xqtc.fsf@gitster.g>

On Mon, Jun 29, 2026 at 09:19:11AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> pushes only to "cast in stone" branches.  If there are other
> >> branches that deserve to be tested with TEST_LONG upon other events
> >> that the existing GitHub Actions CI does not trigger, it may be good
> >> to have GitLab CI cover them, perhaps?
> >
> > I'm a bit hesitant to do such a split, mostly because the canonical
> > source of truth that the project typically uses is GitHub's CI. So I
> > want us at GitLab to be able to catch the same issues that GitHub would
> > flag. And if GitLab's CI stopped detecting everything that GitHub does,
> > then the result would likely be that we often create merge requests on
> > both platforms, which would only result in more wasted resources.
> 
> I didn't suggest splitting them into two circles that overlap but
> each with area only it covers, though.  GitLab's coverage can be
> superset to GitHub's and that would satify what I suggested.

Fair.

> FWIW, I do not consider GitHub's CI "the canonical source" at all.
> It is a very handy service to use to check how well we are doing,
> but from time to time it has its own hiccups ;-).

Well, GitLab of course has its own share of hiccups, like for example
the Chocolatey issues we've been facing.

> What can we do to make the visibility of GitLab's CI more prominent?
> 
> I know where the CI jobs that are triggered when I push out the
> integration branches are found at GitHub's website[*], but I do not
> think I know the corresponding one at GitLab, for example, and I
> think that is a shame.

The pipelines of the official mirror can be found at [1]. We might for
example add something like the below patch to our README.md to make it
more discoverable.

Patrick

[1]: https://gitlab.com/git-scm/git/-/pipelines

diff --git a/README.md b/README.md
index d87bca1b8c..9ad77fdf7e 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,5 @@
-[![Build status](https://github.com/git/git/workflows/CI/badge.svg)](https://github.com/git/git/actions?query=branch%3Amaster+event%3Apush)
+[![GitHub build status](https://github.com/git/git/workflows/CI/badge.svg)](https://github.com/git/git/actions?query=branch%3Amaster+event%3Apush)
+[![GitLab build status](https://gitlab.com/git-scm/git/badges/master/pipeline.svg)](https://gitlab.com/git-scm/git/-/pipelines?ref=master)
 
 Git - fast, scalable, distributed revision control system
 =========================================================

^ permalink raw reply related

* Re: [PATCH 2/6] object-file: propagate files transaction errors
From: Patrick Steinhardt @ 2026-06-30  8:45 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git
In-Reply-To: <akLBFaTfBEq8vHUr@denethor>

On Mon, Jun 29, 2026 at 02:04:08PM -0500, Justin Tobler wrote:
> On 26/06/29 01:58PM, Justin Tobler wrote:
> > On 26/06/24 01:26PM, Patrick Steinhardt wrote:
> > > On Tue, Jun 23, 2026 at 11:19:16PM -0500, Justin Tobler wrote:
> > > > @@ -511,11 +511,15 @@ static void odb_transaction_files_prepare(struct odb_transaction *base)
> > > >  	 * added at the time they call odb_transaction_files_begin.
> > > >  	 */
> > > >  	if (!transaction || transaction->objdir)
> > > > -		return;
> > > > +		return 0;
> > > >  
> > > >  	transaction->objdir = tmp_objdir_create(base->source->odb->repo, "bulk-fsync");
> > > > -	if (transaction->objdir)
> > > > -		tmp_objdir_replace_primary_odb(transaction->objdir, 0);
> > > > +	if (!transaction->objdir)
> > > > +		return -1;
> > > 
> > > Huh. So previously we just didn't handle this error at all and just
> > > continued to tag along? Did that result in anything sensible or was this
> > > just YOLOing it?
> > 
> > Good question. Previously if there was an error, we wouldn't end up
> > creating any tmpdir and would instead continue to use the primary ODB to
> > write objects in. This change would make it a hard error if we fail to
> > create the temp dir. This matches the behavior that git-receive-pack(1)
> > expects, but I didn't consider that the existing callers could
> > transparently handle there being no temp dir.
> > 
> > I suspect we may want existing ODB transaction users to continue being
> > resilient in the same manner. In the next version, I'll maintain the
> > same behavior.
> 
> I think I got a bit ahead of myself. The existing callers of
> odb_transaction_files_prepare() still continue to ignore this error. So
> the behavior already does remain the same here.

Oh, well, okay. I think this behaviour is plain bad -- if the caller
wants to have a transaction, then we should bail in case we cannot
create one. But this doesn't need to be fixed in this patch series.

Patrick

^ permalink raw reply

* Re: [PATCH 2/6] object-file: propagate files transaction errors
From: Patrick Steinhardt @ 2026-06-30  8:45 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git
In-Reply-To: <akK1roQJknYstX0u@denethor>

On Mon, Jun 29, 2026 at 01:58:54PM -0500, Justin Tobler wrote:
> On 26/06/24 01:26PM, Patrick Steinhardt wrote:
> > On Tue, Jun 23, 2026 at 11:19:16PM -0500, Justin Tobler wrote:
> > > diff --git a/object-file.c b/object-file.c
> > > index a3eb8d71dd..18c2df75fb 100644
> > > --- a/object-file.c
> > > +++ b/object-file.c
> > > @@ -499,7 +499,7 @@ struct odb_transaction_files {
> > >  	struct transaction_packfile packfile;
> > >  };
> > >  
> > > -static void odb_transaction_files_prepare(struct odb_transaction *base)
> > > +static int odb_transaction_files_prepare(struct odb_transaction *base)
> > >  {
> > >  	struct odb_transaction_files *transaction =
> > >  		container_of_or_null(base, struct odb_transaction_files, base);
> > 
> > By the way, is there any reason why those functions are still hosted in
> > "object-file.c" instead of in "odb/source-files.c"? I should probably
> > know, but I forgot.
> 
> There are currently a couple spots in the "files" object write path in
> "object-file.c" that still reach into some of these transaction function
> that are not part of the generic ODB transaction interface. I'm hoping
> in a future followup series to detangle this a bit and be able to get
> all the "files" ODB transaction related code moved into
> "odb/source-files.c".

Makes sense. I also revisited that code a couple days ago, and the
answer is "it's messy right now". Hopefully this will become easier to
detangle as we make progress on pluggifying transactions.

> > > @@ -511,11 +511,15 @@ static void odb_transaction_files_prepare(struct odb_transaction *base)
> > >  	 * added at the time they call odb_transaction_files_begin.
> > >  	 */
> > >  	if (!transaction || transaction->objdir)
> > > -		return;
> > > +		return 0;
> > >  
> > >  	transaction->objdir = tmp_objdir_create(base->source->odb->repo, "bulk-fsync");
> > > -	if (transaction->objdir)
> > > -		tmp_objdir_replace_primary_odb(transaction->objdir, 0);
> > > +	if (!transaction->objdir)
> > > +		return -1;
> > 
> > Huh. So previously we just didn't handle this error at all and just
> > continued to tag along? Did that result in anything sensible or was this
> > just YOLOing it?
> 
> Good question. Previously if there was an error, we wouldn't end up
> creating any tmpdir and would instead continue to use the primary ODB to
> write objects in. This change would make it a hard error if we fail to
> create the temp dir. This matches the behavior that git-receive-pack(1)
> expects, but I didn't consider that the existing callers could
> transparently handle there being no temp dir.
> 
> I suspect we may want existing ODB transaction users to continue being
> resilient in the same manner. In the next version, I'll maintain the
> same behavior.

Honestly I'd say that the change is a good one. I cannot think of a
single reason to just blindly not create the transaction and proceed.
But it certainly is something that should be documented as part of the
commit message.

> > > @@ -1670,27 +1678,34 @@ int read_loose_object(struct repository *repo,
> > >  	return ret;
> > >  }
> > >  
> > > -static void odb_transaction_files_commit(struct odb_transaction *base)
> > > +static int odb_transaction_files_commit(struct odb_transaction *base)
> > >  {
> > >  	struct odb_transaction_files *transaction =
> > >  		container_of(base, struct odb_transaction_files, base);
> > >  
> > > -	flush_loose_object_transaction(transaction);
> > > +	if (flush_loose_object_transaction(transaction))
> > > +		return -1;
> > >  	flush_packfile_transaction(transaction);
> > > +
> > > +	return 0;
> > >  }
> > >  
> > > -struct odb_transaction *odb_transaction_files_begin(struct odb_source *source)
> > > +int odb_transaction_files_begin(struct odb_source *source,
> > > +				struct odb_transaction **out)
> > >  {
> > >  	struct odb_transaction_files *transaction;
> > >  	struct object_database *odb = source->odb;
> > >  
> > > -	if (odb->transaction)
> > > -		return NULL;
> > > +	if (odb->transaction) {
> > > +		*out = NULL;
> > > +		return 0;
> > > +	}
> > >  
> > >  	transaction = xcalloc(1, sizeof(*transaction));
> > >  	transaction->base.source = source;
> > >  	transaction->base.commit = odb_transaction_files_commit;
> > >  	transaction->base.write_object_stream = odb_transaction_files_write_object_stream;
> > > +	*out = &transaction->base;
> > >  
> > > -	return &transaction->base;
> > > +	return 0;
> > >  }
> > 
> > It's still somewhat fishy that we have this ODB-level transaction, but
> > that's a preexisting issue and thus outside the scope of this patch
> > series. Ideally though, it would be possible for there to be multiple
> > transactions, and it would be the caller's responsibility for juggling
> > these transactions. Just as it happens with reference transactions.
> 
> I completely agree. One of the current problems preventing this is that
> only a single instance of tmp_objdir is allowed and stored globally.
> This is done to keep atexit() cleanup simple.
> 
> My plan is to eventually convert all existing tmp_objdir callsites to
> use ODB transactions which should hopefully allow us to remove the need
> for a separate tmp_objdir system. At that point, we can also work to
> change how temp dir cleanup is handled at exit.

Great.

> Another problem is that there are also a couple of ODB transaction
> callsites that require to know if there is already a pending transaction
> for the repository and the transaction has not been wired down to these
> callsites. My hope is that this can be addressed though as we expand ODB
> transaction usage for object writes.

Yeah, agreed. Making the use of transactions explicit feels sensible to
me.

Patrick

^ permalink raw reply

* Re: [PATCH 6/6] builtin/receive-pack: stage incoming objects via ODB transactions
From: Patrick Steinhardt @ 2026-06-30  8:45 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git
In-Reply-To: <akLLB_J-pvJ7iR7c@denethor>

On Mon, Jun 29, 2026 at 03:25:18PM -0500, Justin Tobler wrote:
> On 26/06/24 01:26PM, Patrick Steinhardt wrote:
> > On Tue, Jun 23, 2026 at 11:19:20PM -0500, Justin Tobler wrote:
> > > Objects received by git-receive-pack(1) are quarantined in a temporary
> > > "incoming" directory and migrated into the object database prior to the
> > > reference updates. The quarantine is currently managed through
> > > `tmp_objdir` directly. In a pluggable ODB future, how exactly an object
> > > gets written to a transaction may vary for a given ODB source. Refactor
> > > git-receive-pack(1) to use the ODB transaction interfaces to manage the
> > > object staging area in a more agnostic manner accordingly.
> > > 
> > > Note that the temporary directory created for git-receive-pack(1) is
> > > eagerly created and uses a different prefix name. This behavior is
> > 
> > A different prefix name compared to what?
> 
> Currently the temporary directories created for ODB transactions all use
> the prefix "bulk-fsync". The temp dir created by git-receive-pack(1) is
> expected to have the prefix "incoming".
> 
> > > special cased in the "files" backend by having `odb_transaction_begin()`
> > > callers that require this behavior provide an `ODB_TRANSACTION_RECEIVE`
> > > flag.
> > 
> > Okay. I guess this is to retain existing behaviour where the temporary
> > directory is created lazily everywhere else. Makes me wonder whether we
> > should eventually change this to just unconditionally create the
> > directory in all cases so that we can drop this new flag.
> 
> It would be nice to not have to have a flag here, but if we want to also
> keep the existing temp dir prefixes, we would also need to keep the
> flags.

Fair. Makes me wonder whether we really need to keep the exact same
naming for this temporary directory. This is so deep into internals that
I'm not sure whether we really need to treat this as part of our
interface. I'm rather inclined to say it's not necessary.

In any case, I think it's fine to defer that discussion and keep this
as-is for now. But we might keep it in the back of our minds and maybe
simplify this in a subsequent patch series.

> > > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> > > index 19eb6a1b61..ee8e03e2ab 100644
> > > --- a/builtin/receive-pack.c
> > > +++ b/builtin/receive-pack.c
> > > @@ -2326,7 +2323,8 @@ static void push_header_arg(struct strvec *args, struct pack_header *hdr)
> > >  		     ntohl(hdr->hdr_version), ntohl(hdr->hdr_entries));
> > >  }
> > >  
> > > -static const char *unpack(int err_fd, struct shallow_info *si)
> > > +static const char *unpack(int err_fd, struct shallow_info *si,
> > > +			  struct odb_transaction *transaction)
> > >  {
> > >  	struct pack_header hdr;
> > >  	const char *hdr_err;
> > 
> > It feels a bit weird that we sometimes pass the transaction as
> > parameter, whereas othertimes we access it via `the_repository`.
> 
> That's fair. I was trying to avoid the churn of wiring to all its
> callsites, but it's probably best to be consistent. Maybe it would be
> fine to just create a transaction global like we do for the reference
> transaction?

My first kneejerk reaction was "no", but then I noticed that the global
variable you're talking about is local to "builtin/receive-pack.c". So
that might be an okayish solution.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH 2/2] odb: introduce `odb_prepare()`
From: Patrick Steinhardt @ 2026-06-30  8:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Toon Claes, git
In-Reply-To: <xmqqa4sdt3e6.fsf@gitster.g>

On Mon, Jun 29, 2026 at 02:58:41PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> According to my grep results are there 17 callsites for odb_reprepare(),
> >> then I agree it makes sense to create this wrapper.
> >
> > Yeah, I was a bit torn myself whether or not to keep the wrapper. I
> > eventually decided to just keep it because it reduces churn, and it's a
> > trivial wrapper anyway.
> 
> That sounds OK.  Are we all happy with the current shape of the
> topic?  I myself did not find anything iffy in these two patches.

Based on Toon's reply [1] it seems like this series is ready to go.
Thanks!

Patrick

[1]: <87ik704f1j.fsf@emacs.iotcl.com>

^ permalink raw reply

* Re: What's cooking in git.git (Jun 2026, #10)
From: Toon Claes @ 2026-06-30  8:20 UTC (permalink / raw)
  To: Junio C Hamano, git
In-Reply-To: <xmqq5x36dtyf.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> * ps/odb-generalize-prepare (2026-06-22) 3 commits
>  - odb: introduce `odb_prepare()`
>  - odb/source: generalize `reprepare()` callback
>  - Merge branch 'ps/odb-source-packed' into ps/odb-generalize-prepare
>  (this branch uses ps/odb-source-packed.)
>
>  The `reprepare()` callback for object database sources has been
>  generalized into a `prepare()` callback with an optional flush cache
>  flag, and a new `odb_prepare()` wrapper has been introduced to
>  allow pre-opening object database sources.
>
>  Needs review.
>  source: <20260622-b4-pks-odb-generalize-prepare-v1-0-d2a5c5d13144@pks.im>

I did have some questions/remarks, but Patrick answered them, and with
those answers I'm happy about this series.

-- 
Cheers,
Toon

^ permalink raw reply

* Re: [PATCH 1/2] odb/source: generalize `reprepare()` callback
From: Toon Claes @ 2026-06-30  8:18 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <akINy-hP5EPD4Y4e@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Jun 26, 2026 at 02:10:32PM +0200, Toon Claes wrote:
>> > diff --git a/builtin/grep.c b/builtin/grep.c
>> > index 8080d1bf5e..7361bf071e 100644
>> > --- a/builtin/grep.c
>> > +++ b/builtin/grep.c
>> > @@ -1361,10 +1360,8 @@ int cmd_grep(int argc,
>> >  			struct odb_source *source;
>> >  
>> >  			odb_prepare_alternates(the_repository->objects);
>> > -			for (source = the_repository->objects->sources; source; source = source->next) {
>> > -				struct odb_source_files *files = odb_source_files_downcast(source);
>> > -				odb_source_packed_prepare(files->packed);
>> > +			for (source = the_repository->objects->sources; source; source = source->next)
>> > +				odb_source_prepare(source, 0);
>> 
>> So you're downcasting inside the implementation by the backends itself.
>> That makes sense, but would it be worth to say something about that in
>> the commit message?
>
> Hm. Would that provide much value? I'm probably quite a bit biased here,
> but I think that it's implicit that the backends have to eventually cast
> the generic structure to their own backend.
>
> So I wouldn't really know how to clarify this. Did you have anything
> specific in mind?

Ah, I'm sorry, I misread that. I thought you changed the vtable function
to do the downcasting, but you're simply changing from calling a 
`*_packed_*()` to the generic variant that goes through the vtable.

Anyhow, not worth mentioning in the commit message.

>> > diff --git a/odb/source-packed.c b/odb/source-packed.c
>> > index 42c28fba0e..fa5a072488 100644
>> > --- a/odb/source-packed.c
>> > +++ b/odb/source-packed.c
>> > @@ -15,7 +15,7 @@ static int find_pack_entry(struct odb_source_packed *store,
>> >  {
>> >  	struct packfile_list_entry *l;
>> >  
>> > -	odb_source_packed_prepare(store);
>> > +	odb_source_prepare(&store->base, 0);
>> 
>> Why are you not using ODB_PREPARE_FLUSH_CACHES here? It used to do
>> before?
>
> Because this was calling `odb_source_packed_prepare()` before, not
> `odb_source_reprepare()`. So this was calling the non-flushing
> variant.

Again, confusion on my end.

>> >  	if (store->midx && fill_midx_entry(store->midx, oid, e))
>> >  		return 1;
>> >  
>> > @@ -47,7 +47,7 @@ static int odb_source_packed_read_object_info(struct odb_source *source,
>> >  	 * been added since the last time we have prepared the packfile store.
>> >  	 */
>> >  	if (flags & OBJECT_INFO_SECOND_READ)
>> > -		odb_source_reprepare(source);
>> > +		odb_source_prepare(source, ODB_PREPARE_FLUSH_CACHES);
>> 
>> I think the new code is correct, but why wasn't `packed` used here in
>> the past? The old odb_source_reprepare() expected a downcasted, didn't
>> it?
>
> No, `odb_source_reprepare()` is the generic variant. The naming schema
> is typically:
>
>   - `odb_source_frobnicate()` for the generic variants, which receive a
>     `struct odb_source` as input.
>
>   - `odb_source_<type>_frobnitcate()` for their backend-specific
>     implementations, which cast down the generic `struct odb_source` to
>     their backend-specific struct.

Yeah, I understand things better now. Thanks for clarifying.

-- 
Cheers,
Toon

^ permalink raw reply

* Re: [PATCH v5 0/4] history: add squash subcommand to fold a range
From: Harald Nordgren @ 2026-06-30  7:19 UTC (permalink / raw)
  To: Matt Hunter
  Cc: phillip.wood, Patrick Steinhardt,
	Harald Nordgren via GitGitGadget, git
In-Reply-To: <DJM1N17VMUM5.3V5Y6YMFLIFQJ@lfurio.us>

> This is probably a larger question, since (according to the man page) it
> affects the other 'git history' commands as well.  When I run
> 'git history ...' and discover that I made a mistake after inspecting
> the results, is there a fool-proof way to undo the change and return to
> the previous state?  My first thought was to run 'git reset --hard ...',
> but the default behavior of --update-refs (moving other branches) can
> make this more complicated.

This is a larger question: But I would love to have a reflog that is
more human-centered. When e.g. rebasing a series with N commits, it's
very tricky in the reflog to find what was the state before that.

I feel like branch switching is given too much space in the reflog,
since it's not a destructive action, I don't care about it.

And when handling multiple commits in on go (squashing, rebasing), I
would love to see a visual hierarchy (with indentation for sub-steps)
instead of treating each action as equally important when it isn't.

Harald

^ permalink raw reply

* Re: [PATCH] builtin/history: unuse the commit buffer after use
From: Jeff King @ 2026-06-30  6:44 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Patrick Steinhardt, Git mailing list
In-Reply-To: <20260630055026.GE2495216@coredump.intra.peff.net>

On Tue, Jun 30, 2026 at 01:50:26AM -0400, Jeff King wrote:

> t4014 also fails, but with a unique leak. I don't think it's the same
> thing; it looks like we may leak the slab from init_topo_walk().

Fixed over in
https://lore.kernel.org/git/20260630063944.GA3733670@coredump.intra.peff.net/.

I think it can be handled independently of your fix here. You might also
run afoul of the prove TAP-output thing fixed in that thread.

-Peff

^ permalink raw reply

* [PATCH 2/2] format-patch: fix leak of rev_info in prepare_bases()
From: Jeff King @ 2026-06-30  6:43 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Patrick Steinhardt
In-Reply-To: <20260630063944.GA3733670@coredump.intra.peff.net>

In prepare_bases() we do a custom revision walk, separate from the main
format-patch walk. After we finish, we fail to call release_revisions(),
possibly leaking its contents.

We failed to notice it so far because the revision machinery doesn't
always allocate. But at least one case can trigger the leak: if a commit
graph is present, then the topo-walk allocates revs.topo_walk_info and
some associated data structures. You can see it in the test suite by
running:

  make SANITIZE=leak
  cd t
  GIT_TEST_COMMIT_GRAPH=1 ./t4014-format-patch.sh

which yields many entries like:

  ==git==3687620==ERROR: LeakSanitizer: detected memory leaks
  Direct leak of 200 byte(s) in 1 object(s) allocated from:
      #0 0x7f4ccba185cb in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:74
      #1 0x55cd452cdd0b in do_xmalloc wrapper.c:55
      #2 0x55cd452cdd9d in xmalloc wrapper.c:76
      #3 0x55cd45255473 in init_topo_walk revision.c:3845
      #4 0x55cd45255bef in prepare_revision_walk revision.c:4017
      #5 0x55cd44ffec40 in prepare_bases builtin/log.c:1872
      #6 0x55cd450010ec in cmd_format_patch builtin/log.c:2439

The un-released rev_info has been there since the code was added in
fa2ab86d18 (format-patch: add '--base' option to record base tree info,
2016-04-26), but back then we didn't even have a way to release rev_info
resources! The actual leak probably started around f0d9cc4196
(revision.c: begin refactoring --topo-order logic, 2018-11-01), but it's
hard to bisect because there were so many other unrelated leaks back
then.

So I'm not sure exactly when the leak started beyond "long ago", but it
is easy-ish to find now (since we've plugged all those other leaks) and
the solution is clear.

I didn't add a new test since we can demonstrate it with the existing
ones, but it does require tweaking a test variable. We might consider
ways to get more automatic leak-checking coverage there, but I think it
should be done outside of this fix.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/log.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/log.c b/builtin/log.c
index d027ce1e0b..350b35c556 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1888,6 +1888,7 @@ static void prepare_bases(struct base_tree_info *bases,
 		bases->nr_patch_id++;
 	}
 	clear_commit_base(&commit_base);
+	release_revisions(&revs);
 }

 static void print_bases(struct base_tree_info *bases, FILE *file)
-- 
2.55.0.346.g83d0ea82e4

^ permalink raw reply related

* [PATCH 1/2] t: move LSan errors from stdout to stderr
From: Jeff King @ 2026-06-30  6:41 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Patrick Steinhardt
In-Reply-To: <20260630063944.GA3733670@coredump.intra.peff.net>

When we find LSan errors, we dump them via "say_color", which goes to
stdout. This is mostly harmless, since stdout and stderr tend to go to
the same place (either the user's terminal, or to the ".out" file with
--verbose-log).

But when running under a TAP harness like prove, they are split and
stdout is interpreted as TAP output. Historically even this was fine, as
the extra lines on stdout would be ignored. But since 389c83025d (t: let
prove fail when parsing invalid TAP output, 2026-06-04) we instruct the
TAP reader to complain, and a leaking test will result in complaints
like this (this is a real leak which we have yet to fix):

  $ GIT_TEST_COMMIT_GRAPH=1 make SANITIZE=leak test
  [...]
  Test Summary Report
  -------------------
  t4014-format-patch.sh (Wstat: 256 (exited 1) Tests: 226 Failed: 30)
    Failed tests:  197-226
    Non-zero exit status: 1
    Parse errors: Unknown TAP token: ""
                  Unknown TAP token: "================================================================="
                  Unknown TAP token: "==git==3693658==ERROR: LeakSanitizer: detected memory leaks"
                  Unknown TAP token: ""
                  Unknown TAP token: "Direct leak of 200 byte(s) in 1 object(s) allocated from:"
  Displayed the first 5 of 1531 TAP syntax errors.
  Re-run prove with the -p option to see them all.

You still see the failing tests, so it's mostly just an annoyance. We
can fix it by redirecting to stderr (actually descriptor 4, which is our
verbose-respecting variant). I confirmed manually that the output still
appears with --verbose-log, and even with a single-test "-i
--verbose-only=197" going to the terminal.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index ceefb99bff..d390c53ec1 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1217,14 +1217,14 @@ check_test_results_san_file_ () {
 	then
 		return
 	fi &&
-	say_color error "$(cat "$TEST_RESULTS_SAN_FILE".*)" &&
+	say_color >&4 error "$(cat "$TEST_RESULTS_SAN_FILE".*)" &&
 
 	if test "$test_failure" = 0
 	then
-		say "Our logs revealed a memory leak, exit non-zero!" &&
+		say >&4 "Our logs revealed a memory leak, exit non-zero!" &&
 		invert_exit_code=t
 	else
-		say "Our logs revealed a memory leak..."
+		say >&4 "Our logs revealed a memory leak..."
 	fi
 }
 
-- 
2.55.0.346.g83d0ea82e4


^ permalink raw reply related

* [PATCH 0/2] small leak fix in format-patch
From: Jeff King @ 2026-06-30  6:39 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Patrick Steinhardt

This fixes a leak I found while discussing an unrelated leak in another
thread[1]. As a bonus, this fixes some minor recent breakage of
leak-reporting when running the test suite under prove. The patches can
be split into separate topics if we want.

  [1/2]: t: move LSan errors from stdout to stderr
  [2/2]: format-patch: fix leak of rev_info in prepare_bases()

 builtin/log.c | 1 +
 t/test-lib.sh | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

-Peff

[1] https://lore.kernel.org/git/20260630055026.GE2495216@coredump.intra.peff.net/

^ permalink raw reply

* Re: [PATCH] builtin/history: unuse the commit buffer after use
From: Jeff King @ 2026-06-30  5:50 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Patrick Steinhardt, Git mailing list
In-Reply-To: <20260630053825.GC2495216@coredump.intra.peff.net>

On Tue, Jun 30, 2026 at 01:38:25AM -0400, Jeff King wrote:

> Try:
> 
>   make SANITIZE=leak
>   cd t
>   GIT_TEST_COMMIT_GRAPH=1 ./t3451-history-reword.sh -v -i

Of course that made me curious how a full run of the test suite would
react to that flag. We get a few failures of t345x tests, but they all
look like the same leak discussed here.

t4014 also fails, but with a unique leak. I don't think it's the same
thing; it looks like we may leak the slab from init_topo_walk().

-Peff

^ permalink raw reply

* Re: [PATCH] meson: wire up USE_NSEC build knob
From: Jeff King @ 2026-06-30  5:43 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: D. Ben Knoble, git, brian m . carlson, Junio C Hamano,
	Ramsay Jones
In-Reply-To: <akIL6oJgUv8J8SB2@pks.im>

On Mon, Jun 29, 2026 at 08:08:42AM +0200, Patrick Steinhardt wrote:

> > True, but AFAICT it probably is safe these days, at least one some
> > platforms.
> 
> Hm. That makes me wonder whether it is the completely wrong approach to
> make this a build option then. If it works on some systems and only on
> some filesystems, then a build option is just too coarse-grained. A
> distro wouldn't really be able to ever enable the option, unless it knew
> that repositories will only ever exist on a filesystem that works. Which
> I guess is an assumption that no distro can make.
> 
> So instead, I wonder whether we should treat this the same as for
> example "core.ignoreCase", where we only use nanosecond resolution when
> opted in by the user. Ideally, if we had a way to detect brokenness, we
> could even make git-init(1) set it automatically.

Yeah, this came up earlier in the thread. It would be nice if we could
set it automatically, but I'm not sure we have a good way of testing a
particular filesystem. I think the sequence is:

  1. stat() a file, getting nanoseconds

  2. somehow flush the kernel's in-core inode cache

  3. stat() it again and compare

Step 2 is the tricky part. ;) It's not only not portable, but probably
something that would annoy users if we did it for every repo creation.

It would also be nice if we could actually verify that the sequence
above _does_ show the problem. I was not able to come up with a failing
instance on my modern Linux machine (even going as far as unmounting and
re-mounting for step 2).

But I do agree in general that it should be a config flag and not a
build option. Run-time flags are more friendly to users when there is no
good reason to avoid them.

-Peff

^ permalink raw reply

* Re: [PATCH] builtin/history: unuse the commit buffer after use
From: Jeff King @ 2026-06-30  5:38 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Patrick Steinhardt, Git mailing list
In-Reply-To: <94b0bed5-c86a-4291-b958-52f09faebd29@gmail.com>

On Tue, Jun 30, 2026 at 09:13:28AM +0530, Kaartic Sivaraam wrote:

> On 15/06/26 15:18, Patrick Steinhardt wrote:
> > Huh, curious. That seems to hint that we're missing test coverage for
> > this specific scenario, as our test suite doesn't detect this leak.
> > 
> 
> Indeed. The tricky thing is (as mentioned in another thread), this is
> happening only when we get a commit not cached in the commit slab. Once we
> get an idea on how certain commits get cached in the commit slab while
> others don't, we can write a test case that would catch this leak.

Try:

  make SANITIZE=leak
  cd t
  GIT_TEST_COMMIT_GRAPH=1 ./t3451-history-reword.sh -v -i

That shows off the leak. It's possible we should be running the LSan
tests in CI with more feature flags enabled, but I suspect it's just as
likely to miss a case as to add one (i.e., there might be a leak when we
_don't_ use the commit graph). We could do both, but the combinatorics
get expensive.

You could add a specific test that builds a commit graph and runs a
history-reword. That would show off the fix, but it's so specific that I
find it a bit unlikely that it would catch a useful regression in the
future.

So I dunno. I would be content with showing the commands above in the
commit message.

-Peff

^ permalink raw reply

* Re: [RFC] clone: allow sparse-checkout paths to be specified during clone
From: Jeff King @ 2026-06-30  5:32 UTC (permalink / raw)
  To: Pushkar Singh; +Cc: git, Junio C Hamano, ps
In-Reply-To: <CALE2CrTVVQF4rGhGG-9kmjweFHHYw+xnPU6Jtt=QmHpq7L6P2w@mail.gmail.com>

On Mon, Jun 22, 2026 at 05:05:06PM +0530, Pushkar Singh wrote:

> Currently, the workflow for this is:
> 
>     git clone --sparse <repo>
>     cd <repo>
>     git sparse-checkout set <paths>
> 
> While this works as intended, it feels somewhat cumbersome, especially
> for someone who is new to Git or not familiar with sparse-checkout
> workflows.
> 
> Personally, I do not think of the problem as:
>     "I need to initialize sparse-checkout and then configure pathspecs."
> 
> Instead, I usually think:
>     "I only want to clone these directories from the repository."
> 
> With that in mind, I was wondering if it would make sense to allow
> sparse-checkout patterns to be specified directly during clone.

I haven't ever really used sparse-checkout, so I don't have much of an
opinion. IIRC the sparseness is contained in a patterns file, so I'd
have expected the first level of fix to be "you can provide that file at
clone time, rather than afterwards". But maybe nobody really touches
that file manually, and they just use the sparse-checkout helper. Like I
said, I don't have any experience. :)

You might try cc-ing folks who worked on sparse checkouts, especially
Stolee.

One final thought from a non-sparse-checkout user: you're coming at it
from the point of view of ergonomics (it is annoying to clone and then
set up sparsity separately) but there is also a performance question. If
the clone knows which paths are of interest, would that make it possible
to request a partial clone of the specific paths?

I think probably not in practice, because most servers have path-based
filters disabled (because they're expensive and work against bitmaps).
So the strategy is usually more like "don't ask for any blobs at all,
and then let checkout lazy-load them as we increase the sparse
checkout". But I'm also not really a partial-clone user, either. ;)

-Peff

^ permalink raw reply

* Re: [PATCH] builtin/history: unuse the commit buffer after use
From: Jeff King @ 2026-06-30  5:26 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Git mailing list, Patrick Steinhardt
In-Reply-To: <317d0f7b-469f-4456-8808-506e17de264d@gmail.com>

On Tue, Jun 30, 2026 at 09:15:49AM +0530, Kaartic Sivaraam wrote:

> > So what the patch is doing is correct, but the explanation is a little
> > confused. We see the leak only when re-encoding, so we'd probably want a
> > test case that triggers that. Which I assume implies rewriting a commit
> > that was previously generated with an encoding header.
> > 
> 
> Thank you very much for these insights! It has been helpful but on further
> digging I think this is not about reencoding. On testing and digging
> further, the leak appears to be happening when the commit that is being
> reworded we get is a freshly allocated buffer from repo_get_commit_buffer.
> I'm still trying to figure out how specific commits get cached in the slab
> while other commits don't. I'll update this thread shortly once I get an
> idea about the same.
> 
> Meanwhile, if anyone knows offhand about this, kindly chime in.

The most likely cause of an uncached commit buffer is that we loaded the
commit from the commit-graph (and thus never opened the object in the
first place, so there was nothing to cache).

I suppose it is also possible that we might create a commit struct and
then before ever calling parse_commit() on it, ask to see the commit
msg. And thus we never looked at either the commit graph nor the object
itself.

We'd also refuse to cache if save_commit_buffer is disabled (since
that's the point of that flag). If that were the case I'd expect it to
be set consistently within a given program.

So my guess is probably the commit graph, but it could be that the
command tries to format unparsed commits (which is _not_ a bug or error,
but would trigger the situation where we hand back a freshly allocated
buffer).

I suspect the leak could _also_ be caused by re-encoding, but yeah, the
root cause is "repo_logmsg_reencode gave us an allocated string", and
that can happen when re-encoding is necessary, or when we did not have a
cached buffer in the first place.

> > But note that you need to do _both_ the unuse and free calls. If we did
> > re-encode, the former is needed to free the newly allocated buffer. The
> > latter only drops the original buffer in the cache.
> 
> From my understanding, I think we may not need free_commit_buffer for the
> following reasons:
> 
> - The leak was only being reported when the commit did not come
>   from the commit slab.
> - We are not going to be reading too many commit objects into memory in
>   this code path. Hence freeing the commit in the slab isn't strictly
>   necessary.
> 
> Kindly correct me if I missed something, though.

It depends on the definition of "too many" here. ;) Probably nobody will
notice if you keep a dozen in memory, but they may if the command
happens to get thousands of commits as input. That's not likely, but if
there's an easy moment where the program can say "ah, I am done with
commit X, let's free any resources associated with it" then I think it
is worth doing.

If there isn't such an easy moment, it is probably not that big a deal
to leave it as-is. It's a rare case when somebody would notice at all,
and if you think over the implications above, a repo with an up-to-date
commit graph won't have very many cached entries in the first place.

> To conclude, I think the change that the patch proposes if fine but the
> commit message definitely needs updation.

Yep. Whether we should be calling free_commit_buffer() or not is really
an orthogonal question to the leak. If we want to do something about it,
it would be a separate patch anyway.

-Peff

^ permalink raw reply

* Re: [RFC] clone: allow sparse-checkout paths to be specified during clone
From: Pushkar Singh @ 2026-06-30  4:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, ps
In-Reply-To: <CALE2CrTVVQF4rGhGG-9kmjweFHHYw+xnPU6Jtt=QmHpq7L6P2w@mail.gmail.com>

Hi,

Just a gentle ping in case this RFC got buried.

If anyone has any thoughts on the proposal, I'd be happy to hear them.
I'd mainly like to know whether this seems worth pursuing, or whether
I should move on to another idea.

Thanks,
Pushkar

^ permalink raw reply

* Re: [PATCH] builtin/history: unuse the commit buffer after use
From: Kaartic Sivaraam @ 2026-06-30  3:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, Patrick Steinhardt
In-Reply-To: <20260615172946.GD91269@coredump.intra.peff.net>

Hi Peff,

On 15/06/26 22:59, Jeff King wrote:
> On Mon, Jun 15, 2026 at 11:48:10AM +0200, Patrick Steinhardt wrote:
>> Huh, curious. That seems to hint that we're missing test coverage for
>> this specific scenario, as our test suite doesn't detect this leak.
> 
> I think it will only leak when the commit object has an "encoding"
> header. See below.
> 

I'm quite sure this is not about the commit with the encoding header. 
More below.

> The first paragraph is accurate here. We'd generally just get a pointer
> to the buffer cached in the slab, because no re-encoding occurs. And in
> that case you _don't_ need to call unuse_commit_buffer(), because you
> have a read-only copy, and the slab cache will hold it forever[1].
> Calling the unuse function will be a noop.
> 
> But when we _do_ re-encode, then you get a new buffer which must be
> freed. And that is when you have to call the unuse function. And the
> reason it is "unuse" and not just "free" is that you don't necessarily
> know which you have, but that function figures it out (and frees it only
> if necessary).
> 
> So what the patch is doing is correct, but the explanation is a little
> confused. We see the leak only when re-encoding, so we'd probably want a
> test case that triggers that. Which I assume implies rewriting a commit
> that was previously generated with an encoding header.
> 

Thank you very much for these insights! It has been helpful but on 
further digging I think this is not about reencoding. On testing and 
digging further, the leak appears to be happening when the commit that 
is being reworded we get is a freshly allocated buffer from 
repo_get_commit_buffer. I'm still trying to figure out how specific 
commits get cached in the slab while other commits don't. I'll update 
this thread shortly once I get an idea about the same.

Meanwhile, if anyone knows offhand about this, kindly chime in.

> Now back to that [1] note. Even if we didn't re-encode, we'll still hold
> onto that buffer forever. It's not a "leak" in the traditional sense
> because it's still referenced in the commit slab cache. But if you are
> going to walk over a million commits (like git-log does), you probably
> don't want to hold a million commit messages in memory at once.
> 
> For that you'd want to call free_commit_buffer() when you know you're
> totally done with it (again, like git-log does after it finishes showing
> the commit). That might be the case here in commit_tree_ext(), or it
> might happen later (I'm not familiar with the git-history code).
>
> But note that you need to do _both_ the unuse and free calls. If we did
> re-encode, the former is needed to free the newly allocated buffer. The
> latter only drops the original buffer in the cache.
>

 From my understanding, I think we may not need free_commit_buffer for 
the following reasons:

- The leak was only being reported when the commit did not come
   from the commit slab.
- We are not going to be reading too many commit objects into memory in
   this code path. Hence freeing the commit in the slab isn't strictly
   necessary.

Kindly correct me if I missed something, though.

To conclude, I think the change that the patch proposes if fine but the 
commit message definitely needs updation.

--
Sivaraam


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox