Git development
 help / color / mirror / Atom feed
* Re: [PATCH 3/3] t5551: pack refs after creating many tags
From: Patrick Steinhardt @ 2026-06-30  9:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Montalbo, git, Junio C Hamano
In-Reply-To: <20260629203527.GA1895313@coredump.intra.peff.net>

On Mon, Jun 29, 2026 at 04:35:27PM -0400, Jeff King wrote:
> There's one other thing you might find interesting. While poking at the
> timings here the other day, I noticed that reftable is very eager to
> stat the tables.list file. Try this:
> 
>   git init --ref-format=reftable
>   blob=$(echo foo | git hash-object -w --stdin)
>   seq -f "create refs/tags/foo-%g $blob" 2000 |
>   strace -c git update-ref --stdin
> 
> We make 2000 fstat, which strace claims takes 85% of the time. I suspect
> this is over-emphasized because strace inherently makes syscalls slow,
> but running with perf also highlights it as a non-trivial cost.

Yeah, this rings a bell. If I remember correctly, this is because we
call `refs_resolve_ref_unsafe()` to verify whether the target already
exists. And as that function is generic, it wasn't easy to optimize it
by reusing the already-loaded reftable stack.

> It has been a long time since I've thought about reftable internals, but
> it feels like we ought to be able to take the lock and then trust that
> the stack has not been manipulated.

Yeah, that would certainly be an option to explore.

> It may not be worth digging into too much, though. I can make 50,000
> refs in 150ms on my system, which is probably good enough (especially
> compared to the files backend).

True, it's going to be much better compared to the "files" backend. But
that isn't enough reason to not optimize it even further -- doubly so if
it actually takes 85% of the time. Sounds like a low-hanging fruit to me
that can result in a significant speedup.

I'll probably not get to it anytime soon, but I'll create an issue to
keep track of it.

Patrick

^ permalink raw reply

* Re: [PATCH v5 0/4] history: add squash subcommand to fold a range
From: Matt Hunter @ 2026-06-30  9:23 UTC (permalink / raw)
  To: Harald Nordgren
  Cc: phillip.wood, Patrick Steinhardt,
	Harald Nordgren via GitGitGadget, git
In-Reply-To: <CAHwyqnVBEOm+FwD+i9Aa7edTvdnDPJom1zubcXgoExZnp--vWQ@mail.gmail.com>

On Tue Jun 30, 2026 at 3:19 AM EDT, Harald Nordgren wrote:
>> This is probably a larger question, since (according to the man page) it
>> affects the other 'git history' commands as well.  When I run
>> 'git history ...' and discover that I made a mistake after inspecting
>> the results, is there a fool-proof way to undo the change and return to
>> the previous state?  My first thought was to run 'git reset --hard ...',
>> but the default behavior of --update-refs (moving other branches) can
>> make this more complicated.
>
> This is a larger question: But I would love to have a reflog that is
> more human-centered. When e.g. rebasing a series with N commits, it's
> very tricky in the reflog to find what was the state before that.
>
> I feel like branch switching is given too much space in the reflog,
> since it's not a destructive action, I don't care about it.

I share these headaches to an extent.  When dealing with the first
problem (seeing an atomic entry in the reflog), I usually look at the
branch's own reflog instead of HEAD's

But my question is about doing a comprehensive reset from a botched
operation.  If any history operation updates branch refs besides the
current one.  I don't think there's an obvious way to see which other
ones were affected, and a naive 'git reset --hard my-branch@{1}' leaves
them pointed at unwanted commits.  Is this right?

tangent:  I'm pretty sure that git-status relies on checkout / branch
switching reflog entries in order to know which tag your detached HEAD
started from, eg: when it says something like

    HEAD detached from v2.55.0-rc2
    nothing to commit, working tree clean

>
> And when handling multiple commits in on go (squashing, rebasing), I
> would love to see a visual hierarchy (with indentation for sub-steps)
> instead of treating each action as equally important when it isn't.

Sounds compelling!

^ permalink raw reply

* Re: [PATCH v4 3/3] replay: offer an option to linearize the commit topology
From: Johannes Schindelin @ 2026-06-30  9:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Toon Claes, git, Elijah Newren
In-Reply-To: <akInDBlyWbbRFcLH@pks.im>

Hi Patrick & Toon,

On Tue, 30 Jun 2026, Patrick Steinhardt wrote:

> On Fri, Jun 26, 2026 at 07:36:31AM +0200, Toon Claes wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> > 
> > > git-rebase(1) essentially knows about three different modes:
> > >
> > >   - "--no-rebase-merges", which is the default and maps to your
> > >     "--linearize".
> > >
> > >   - "--rebase-merges", which by default doesn't rebase cousins by using
> > >     "--ancestry-path" internally.
> > >
> > >   - "--rebase-merges=rebase-cousins", which doesn't pass the above
> > >     option.
> > >
> > > So it's not a simple boolean there, which makes me wonder whether we
> > > should mirror the same interface so that all of git-rebase(1)'s modes
> > > can be represented, as well.
> > 
> > That's a valid question, although I don't know a good answer to that.
> > 
> > Basically you're asking for what the command line options will look
> > like? Allow me to think out loud.
> > 
> > In this series I'm adding --linearize to git-replay(1). As mentioned, I
> > don't think it makes sense to add it to git-history(1) as well. Without
> > this option, the process aborts when it encounters a merge.
> > 
> > Dscho sent a patch series to properly replay (2-way) merges. I think
> > this should become the default for both git-replay(1) and
> > git-history(1).
> > 
> > But then, do we want to have an option that brings back the current
> > behavior of aborting at merges? Maybe with --no-merges?
> 
> I think that would be a sensible option to have.

I also think that we'll need a way to abort at merges because linearizing
commits is a relatively common operation.

> > Then there's the option of rebasing cousins left. That's something that
> > isn't covered by Dscho's series yet. Maybe --replay-cousins?
> > 
> > To reiterate what the final design could look like:
> > 
> >  * <nothing>: replay merges preserving topology.
> >  * "--linearize": flattens merges (only git-replay(1)).
> >  * "--no-merges": dies when the process tries to replay a merge.
> >  * "--replay-cousins": does what --rebase-merges=rebase-cousins does.
> 
> Right. And if we tried to be consistent with git-rebase(1), then this
> could be done as:
> 
>   - "--rebase-merges" to replay merges preserving topology, which is the
>     default once we support replaying them.
> 
>   - "--no-rebase-merges" to flatten commits.
> 
>   - "--rebase-merges=abort" to explicitly die when seeing merges.
> 
>   - "--rebase-merges=rebase-cousins"

The `git rebase` options are unlikely to be a good precedent to follow.
Their history is full of usability warts, and in hindsight, I would really
have loved a more steady hand in developing and maintaining a good UX. The
fact alone that this is called `rebase` speaks volumes about how hostile
of a user experience this command surfaces.

In any case, these options should use the much more natural term "replay"
instead of "rebase".

But then: you said that `--no-rebase-merges` should flatten the commits?
That's not what this option name conveys to me; It would convey to me that
the operation would _abort_ on encountering merge commits.

In other words, I do think that the --linearize option is conceptually
quite distinct from the different modes in which merge commits could be
handled. As such, this option should probably not be conflated with
the various `--replay-merges=<mode>` modes.

> > Now, all these options are (I think) mutually exclusive, so we could
> > consider an option "--replay-merges=<mode>", but personally I find
> > "--<option>=<value>" arguments harder to use than specifying separate
> > options.
> > 
> > I think I'm avoiding your question, because the design of the command
> > line parameters doesn't need tot 1-on-1 correlate to the internal
> > datastructure. And I agree the mode isn't a boolean, but does that mean
> > we want to use an enum internally? Well, I don't know. And I also don't
> > think that matters right now. Code is easy to change, I think the
> > command line options should be designed with the future in mind, which I
> > believe we do with "--linearize".
> > 
> > Sorry for this long-winded rambling, but bottom line I think it's fine
> > to add --linearize and in the future add more options and see how the
> > code should evolve to support those.
> 
> Hm, I dunno. You basically reasoned that we potentially want to have all
> of the same options that git-rebase(1)'s "--rebase-merges=" already
> supports. So that begs the question why we need to reinvent the wheel
> then and not just use the same syntax.

I would strongly caution against repeating the same UX mistakes as `git
rebase` has to live with.

The _functionality_, yes, I think that'd be good to have in `git replay`.
But we can surface that functionality in much better ways, with option
names that reflect the concepts much more intuitively.

Ciao,
Johannes

> Note that I'm not arguing that we should support all of these options
> now. I'm merely arguing that we should try to be consistent, unless
> there is a good argument not to do that. I'm fine with the interface if
> there indeed is a good argument, but if so we should document why we
> think that the current interface in git-rebase(1) is not a good fit for
> this command.
> 
> Thanks!
> 
> Patrick
> 
> 

^ permalink raw reply

* Re: [PATCH 2/2] format-patch: fix leak of rev_info in prepare_bases()
From: Patrick Steinhardt @ 2026-06-30 10:26 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Karthik Nayak
In-Reply-To: <20260630064301.GB3733961@coredump.intra.peff.net>

On Tue, Jun 30, 2026 at 02:43:01AM -0400, Jeff King wrote:
> In prepare_bases() we do a custom revision walk, separate from the main
> format-patch walk. After we finish, we fail to call release_revisions(),
> possibly leaking its contents.
> 
> We failed to notice it so far because the revision machinery doesn't
> always allocate. But at least one case can trigger the leak: if a commit
> graph is present, then the topo-walk allocates revs.topo_walk_info and
> some associated data structures. You can see it in the test suite by
> running:
> 
>   make SANITIZE=leak
>   cd t
>   GIT_TEST_COMMIT_GRAPH=1 ./t4014-format-patch.sh
> 
> which yields many entries like:
> 
>   ==git==3687620==ERROR: LeakSanitizer: detected memory leaks
>   Direct leak of 200 byte(s) in 1 object(s) allocated from:
>       #0 0x7f4ccba185cb in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:74
>       #1 0x55cd452cdd0b in do_xmalloc wrapper.c:55
>       #2 0x55cd452cdd9d in xmalloc wrapper.c:76
>       #3 0x55cd45255473 in init_topo_walk revision.c:3845
>       #4 0x55cd45255bef in prepare_revision_walk revision.c:4017
>       #5 0x55cd44ffec40 in prepare_bases builtin/log.c:1872
>       #6 0x55cd450010ec in cmd_format_patch builtin/log.c:2439

Interesting. Makes me wonder whether we should modify linux-TEST-vars to
also run with the leak checker enabled. Ideally we'd of course just do
this for all jobs, but the overhead is probably way too high... yes,
doing a simple benchmark shows a ~3x hit.

So this is definitely nothing we want to do for all jobs. But for the
linux-TEST-vars job it might make sense, as it exercises a bunch of
non-default code paths.

> The un-released rev_info has been there since the code was added in
> fa2ab86d18 (format-patch: add '--base' option to record base tree info,
> 2016-04-26), but back then we didn't even have a way to release rev_info
> resources! The actual leak probably started around f0d9cc4196
> (revision.c: begin refactoring --topo-order logic, 2018-11-01), but it's
> hard to bisect because there were so many other unrelated leaks back
> then.
> 
> So I'm not sure exactly when the leak started beyond "long ago", but it
> is easy-ish to find now (since we've plugged all those other leaks) and
> the solution is clear.
> 
> I didn't add a new test since we can demonstrate it with the existing
> ones, but it does require tweaking a test variable. We might consider
> ways to get more automatic leak-checking coverage there, but I think it
> should be done outside of this fix.

Yeah, agreed.

One thing worth noting: there are still six test suites that are failing
with this patch: t0095, t3451, t3452, t3453, t4013 and t4211. The t345x
failures are because of the missing call to `repo_unuse_commit_buffer()`
in git-history(1), which we already noted elsewhere.

All of the remaining leaks in t0095, t4013 and t4211 seem to be related
to bloom filters.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/log.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/builtin/log.c b/builtin/log.c
> index d027ce1e0b..350b35c556 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1888,6 +1888,7 @@ static void prepare_bases(struct base_tree_info *bases,
>  		bases->nr_patch_id++;
>  	}
>  	clear_commit_base(&commit_base);
> +	release_revisions(&revs);
>  }

The fix looks sensible to me. We always initialize `revs` before we take
this exit path here, and there is no other early return that we'd have
to adjust.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH v2 0/5] builtin/refs: add ability to write references
From: Patrick Steinhardt @ 2026-06-30 10:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqik71ul0j.fsf@gitster.g>

On Mon, Jun 29, 2026 at 01:52:44PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Reference-related functionality in Git is currently spread across many
> > different commands: git-update-ref(1), git-for-each-ref(1),
> > git-show-ref(1), git-pack-refs(1) and git-symbolic-ref(1). This makes it
> > hard for users to discover what functionality we have available to work
> > with references.
> >
> > We have thus started to consolidate this functionality into git-refs(1),
> > which is a toolbox of everything related to references. Until now, the
> > command doesn't handle functionality of git-update-ref(1).
> 
> This unfortunately hasn't heard any responses since June 17th, so I
> took a look at it again myself.  All the things we discussed during
> the review of the initial round has been addressed, it seems.
> 
> Shall we mark the topic ready for 'next' now?

Let me send one last reroll to fix the typo you pointed out. But other
than that I think this should be ready.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH v2 4/5] builtin/refs: add "create" subcommand
From: Patrick Steinhardt @ 2026-06-30 10:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq5x31ukqv.fsf@gitster.g>

On Mon, Jun 29, 2026 at 01:58:32PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > +	if (repo_get_oid_with_flags(repo, argv[1], &newoid, GET_OID_SKIP_AMBIGUITY_CHECK))
> > +		die(_("invalid object ID: '%s'"), argv[1]);
> > +	if (is_null_oid(&newoid))
> > +		die(_("cannot create reference with null old object ID"));
> 
> An apparent typo here, "with null old" -> "with null new object
> name".
> 
> Other than that, I think this one is good.

Yes, indeed, good eyes.

Patrick

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox