* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-02 18:48 Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Martin von Zweigbergk
@ 2025-04-02 19:34 ` Remo Senekowitsch
2025-04-02 19:49 ` Konstantin Ryabitsev
2025-04-02 19:45 ` Konstantin Ryabitsev
` (5 subsequent siblings)
6 siblings, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-02 19:34 UTC (permalink / raw)
To: Martin von Zweigbergk, Git Mailing List
Cc: Edwin Kempin, Scott Chacon, philipmetzger@bluewin.ch
Hi,
I would like to add one benefit to the list that I think is very important.
A change-id header would allow standalone code review tooling and Git forges
(e.g. GitHub, GitLab, Forgejo) to reliably track different versions of a patch.
This would enable much improved code review experiences, not unlike what Gerrit
users are enjoying already.
Mailing list oriented projects like Git itself and the Linux kernel actually
do code review more similar to the Gerrit model. There is a focus on the
individual commits -- what will eventually end up on the master branch.
A "force-push" to a PR/MR on a Git forge is conceptually similar to sending a
new patchset to a mailing list. However, some people don't like that, because
it makes it hard for reviewers to track what they have and haven't reviewed
already. Their answer to this problem is to sequentially add meaningless "fixup
commits" on top of a branch, making incremental code review easier. Often the
branch is later squashed, to clean up the history a little bit. But that also
loses important information in the process.
Mailing lists don't suffer from this as much, because mail clients don't go
out of their way to hide old patchset versions from users. However, they also
don't provide any tools to help code reviewers associate old and new versions
of patchsets. A change-id header could be essential in developing tooling
for mailing lists that track patchsets and even individual patches within
them across versions. Easily being able to view the interdiff between the
last-reviewed version of a patchset and its most recent version is tremendously
useful for any code review workflow.
So I think that almost all members of the broader Git ecosystem would benefit
in many ways from Git supporting this header directly.
Remo
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-02 19:34 ` Remo Senekowitsch
@ 2025-04-02 19:49 ` Konstantin Ryabitsev
0 siblings, 0 replies; 118+ messages in thread
From: Konstantin Ryabitsev @ 2025-04-02 19:49 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Wed, Apr 02, 2025 at 09:34:18PM +0200, Remo Senekowitsch wrote:
> Mailing lists don't suffer from this as much, because mail clients don't go
> out of their way to hide old patchset versions from users. However, they also
> don't provide any tools to help code reviewers associate old and new versions
> of patchsets. A change-id header could be essential in developing tooling
> for mailing lists that track patchsets and even individual patches within
> them across versions. Easily being able to view the interdiff between the
> last-reviewed version of a patchset and its most recent version is tremendously
> useful for any code review workflow.
Yes, this already exists, using change-id footers.
E.g. you can run this inside your git checkout:
$ b4 diff 20250331-b4-pks-collect-build-fixes-v2-0-6b06136808f3@pks.im
Grabbing thread from lore.kernel.org/all/20250331-b4-pks-collect-build-fixes-v2-0-6b06136808f3@pks.im/t.mbox.gz
Checking for older revisions
Grabbing search results from lore.kernel.org
---
Analyzing 19 messages in the thread
Preparing fake-am for v1: meson: fix handling of '-Dcurl=auto'
range: 98f2b1fb3587..163b98dec916
Preparing fake-am for v2: meson: fix handling of '-Dcurl=auto'
range: 1406ab7e183b..df3c15fbd1ce
---
Diffing v1 and v2
Running: git range-diff 98f2b1fb3587..163b98dec916 1406ab7e183b..df3c15fbd1ce
---
1: 6587b42aec = 1: f41c06addd meson: fix handling of '-Dcurl=auto'
2: d8d124b84d = 2: 8c2301bcd5 gitweb: fix generation of "gitweb.js"
3: 1f27a035f9 < -: ---------- meson: require Perl when building docs
4: 163b98dec9 = 3: e1962003a5 meson: respect 'tests' build option in contrib
-: ---------- > 4: 6f07c417a8 meson: distinguish build and target host binaries
-: ---------- > 5: df3c15fbd1 ci: use Visual Studio for win+meson job on GitHub Workflows
-K
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-02 18:48 Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Martin von Zweigbergk
2025-04-02 19:34 ` Remo Senekowitsch
@ 2025-04-02 19:45 ` Konstantin Ryabitsev
2025-04-02 19:52 ` Martin von Zweigbergk
` (4 subsequent siblings)
6 siblings, 0 replies; 118+ messages in thread
From: Konstantin Ryabitsev @ 2025-04-02 19:45 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Wed, Apr 02, 2025 at 11:48:01AM -0700, Martin von Zweigbergk wrote:
> Hi,
>
> The Gerrit, GitButler, and Jujutsu projects all have a concept of
> a "change id", and it behaves in a similar way between the three
> tools. The change id is conceptually associated with a commit.
> It follows a commit as it's rewritten (e.g. by amending and
> rebasing). The three projects currently store and format the
> change id differently. We would like to unify that so we can
> interoperate better. We hope the Git project is also interested
> in preserving and using this header.
Notably, b4 also uses change-id, but as a series identifier, not as a commit
identifier. It is not intended to ever make its way into git commits and is
always passed in the patch footer.
The format is an arbitrary unique string. :)
-K
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-02 18:48 Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Martin von Zweigbergk
2025-04-02 19:34 ` Remo Senekowitsch
2025-04-02 19:45 ` Konstantin Ryabitsev
@ 2025-04-02 19:52 ` Martin von Zweigbergk
2025-04-03 9:09 ` Patrick Steinhardt
` (3 subsequent siblings)
6 siblings, 0 replies; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-02 19:52 UTC (permalink / raw)
To: Git Mailing List
Cc: Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
Sorry about the typo in the subject line. I meant "commit *header*".
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-02 18:48 Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Martin von Zweigbergk
` (2 preceding siblings ...)
2025-04-02 19:52 ` Martin von Zweigbergk
@ 2025-04-03 9:09 ` Patrick Steinhardt
2025-04-03 10:38 ` Remo Senekowitsch
2025-04-03 15:56 ` Elijah Newren
2025-04-03 15:39 ` Elijah Newren
` (2 subsequent siblings)
6 siblings, 2 replies; 118+ messages in thread
From: Patrick Steinhardt @ 2025-04-03 9:09 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
Hi Martin,
On Wed, Apr 02, 2025 at 11:48:01AM -0700, Martin von Zweigbergk wrote:
> Hi,
>
> The Gerrit, GitButler, and Jujutsu projects all have a concept of
> a "change id", and it behaves in a similar way between the three
> tools. The change id is conceptually associated with a commit.
> It follows a commit as it's rewritten (e.g. by amending and
> rebasing). The three projects currently store and format the
> change id differently. We would like to unify that so we can
> interoperate better. We hope the Git project is also interested
> in preserving and using this header.
>
> There are many benefits to having a change id even if it's just
> local. I mentioned some in my email to this mailing list in [1].
> For example, it enables
> `git rebase main <change ID>; git switch <change ID>` without
> requiring the user to look up the hash of the rewritten commit.
> If the change id is also transferred between repos and preserved by
> a forge (such as Gerrit), it enables the change id to be used to
> identify a code review.
Agreed, change IDs solve a couple of issues that many users face:
- You can reliably track how a patch evolves over time. This helps
various different tools to track identity of commits, like for
example forges, but also tools like git-range-diff(1).
- It becomes trivial to see whether a commit has been cherry-picked
into another branch. We do have git-cherry(1) to do that right now,
but that command is based on heuristics and fails as soon as the
patch itself needed to be adapted.
- Working with history rewrites becomes easier in the general case as
you don't have to adapt to constantly changing commit IDs.
The mere fact that different tools eventually ended up with similar
designs around change IDs is a good indicator that there is a real need
for them out there.
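To make the cherry-pick point concrete, here is a minimal Python sketch. It assumes each branch is already available as a list of (commit id, change id) pairs; the ids below are made up for illustration, and `cherry_picked` is a hypothetical helper, not an existing Git command.

```python
def cherry_picked(source, target):
    """Pair up commits from two branches that carry the same change id.

    Both arguments are lists of (commit_id, change_id) tuples.
    Returns a list of (source_commit_id, target_commit_id) tuples.
    """
    by_change = {change: commit for commit, change in target}
    return [
        (commit, by_change[change])
        for commit, change in source
        if change in by_change
    ]


# Hypothetical data: the second commit on `main` was cherry-picked to
# `maint` and adapted there, so the commit ids (and patch ids) differ.
main = [("1f27a03", "ywlktllm"), ("163b98d", "kkmpptxz")]
maint = [("e196200", "kkmpptxz"), ("6f07c41", "zsuskuln")]

print(cherry_picked(main, maint))
```

Unlike a patch-id comparison, the lookup does not depend on the diff content, so it keeps working when the patch had to be adapted.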
> Here's how the change ids are currently stored and formatted:
>
> * Gerrit currently stores change ids in a commit trailer called
> `Change-Id`. It always starts with the letter 'I' and is
> followed by 40 hex digits. For example:
> `Change-Id: Ib563e78c3fedcff262255fa025441daa3202311b`.
>
> * GitButler currently stores change ids in a commit footer
> called `gitbutler-change-id` (older versions used
> `change-id`). It's written as 32 hex digits separated by
> dashes as in the UUID format. For example:
> `gitbutler-change-id 7d0fbc63-032d-413c-8ae8-610fbeb713c0`.
>
> * Jujutsu currently stores change ids in a local storage outside
> of the Git repo and is therefore not part of the Git commit
> id. It is stored as 16 bytes. It is rendered to the user as
> "reverse hex" using 'z' through 'k' as hex digits ('z' = 0,
> 'k' = 15). This allows even short prefixes to be distinguished
> from commit ids, which is a very useful property when used in
> the CLI.
>
> As mentioned, the three projects would like to use the same
> storage and format. I think we have a consensus to store it in a
> Git commit header called `change-id` as 32 reverse-hex digits.
> For example: `change-id ywlktllmukprnxnmzzprukpuwyztylwt`.
I don't mind the actual format too much at this point, so I won't
comment on this part.
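For readers unfamiliar with "reverse hex", the mapping described above ('z' = 0 through 'k' = 15) can be sketched in a few lines of Python. The function names are made up for this example, and re-encoding a GitButler UUID this way is only an assumption for illustration, not something the projects have said they will do.

```python
HEX = "0123456789abcdef"
REVERSE_HEX = "zyxwvutsrqponmlk"  # 'z' = 0, 'y' = 1, ..., 'k' = 15

_TO_REVERSE = str.maketrans(HEX, REVERSE_HEX)
_FROM_REVERSE = str.maketrans(REVERSE_HEX, HEX)


def to_reverse_hex(hex_digits):
    """Re-encode a lowercase hex string using the reverse-hex alphabet."""
    return hex_digits.translate(_TO_REVERSE)


def from_reverse_hex(reverse_digits):
    """Inverse of to_reverse_hex()."""
    return reverse_digits.translate(_FROM_REVERSE)


# Assumption: take the UUID from the GitButler example, drop the dashes,
# and re-encode its 32 hex digits.
uuid = "7d0fbc63-032d-413c-8ae8-610fbeb713c0"
change_id = to_reverse_hex(uuid.replace("-", ""))
print(change_id, len(change_id))
```

Because the mapping is a plain digit-for-digit substitution, the 16 bytes of the id are unchanged; only the rendering differs.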
> There is a design doc [2] about the impact on Gerrit and how to
> handle various cases where the client doesn't understand the
> `change-id` header. That also includes some discussion about
> whether cherry-picking should preserve the change id or create a
> new one. I think there is a lot of value in having a
> standardized header regardless of what we decide about
> cherry-picks.
>
> So, to be clear, this is mostly a heads up at this point; we don't
> depend on any immediate changes from the Git project.
Scott had already reached out to me before your mail, and I also
mentioned to him that I have been thinking about the problem of change
IDs for quite a while already. This has mostly been triggered by Jujutsu
and how it uses change IDs, which is one of the good improvements over
Git from my perspective.
While there may not be a need to do anything in Git itself I would think
that supporting change IDs natively in Git would still be sensible.
Sure, you can emulate them via commit trailers. But I don't consider
trailers to be particularly great as a storage format for this metadata.
After all, you will want to filter the commit graph by change ID for
some of the use cases, and doing that based on a loosely-defined format
probably isn't great.
So what would it take to get change IDs into Git? I think the most
important items would be:
- Generating and writing change IDs in commands that support them.
This includes e.g. git-commit(1), git-commit-tree(1), git-merge(1),
git-merge-tree(1). This should of course be completely optional and
probably be disabled by default.
- Making tools that rewrite commits aware of change IDs so that they
know to retain change IDs. This involves e.g. git-cherry-pick(1),
git-rebase(1), git-replay(1).
- Extending revisions to allow specifying commits by change ID.
- Allowing us to filter commit graphs by change ID.
I don't think any of these should be particularly hard to do. Sure,
addressing and filtering commits by change IDs would be slowish at first
because we have to basically read all commits, but this is something
that can be sped up via indices.
The biggest question is of course backwards compatibility -- can we
introduce a change ID into the commit metadata without breaking existing
users? I guess you'll already have a lot of experience with this given
that you essentially already inject change IDs into metadata, and tools
generally handle this just fine?
I'd certainly be happy to help out with an effort to introduce change
IDs into Git if the community is amenable to such a proposal.
Patrick
NB: I'm also quite happy that Jujutsu brings a bit of a new contender
to Git into the picture. It has a lot of nice ideas, and in the best
case Git might be able to learn a few nice tricks from JJ. After
all, I think we can all benefit from some friendly competition.
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 9:09 ` Patrick Steinhardt
@ 2025-04-03 10:38 ` Remo Senekowitsch
2025-04-03 11:06 ` Patrick Steinhardt
2025-04-03 15:56 ` Elijah Newren
1 sibling, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-03 10:38 UTC (permalink / raw)
To: Patrick Steinhardt, Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
Hi Patrick,
On Thu Apr 3, 2025 at 11:09 AM CEST, Patrick Steinhardt wrote:
> On Wed, Apr 02, 2025 at 11:48:01AM -0700, Martin von Zweigbergk wrote:
>>
>> As mentioned, the three projects would like to use the same
>> storage and format. I think we have a consensus to store it in a
>> Git commit header called `change-id` as 32 reverse-hex digits.
>> For example: `change-id ywlktllmukprnxnmzzprukpuwyztylwt`.
>
> I don't mind the actual format too much at this point, so I won't
> comment on this part.
Gerrit and GitButler also did not mind the format, which is why they
agreed to adopt the one of Jujutsu. There is also no technical reason
why Jujutsu wouldn't be able to support a free-form id. However,
discussing a standard for the ecosystem gives us the opportunity to
pick something that everybody can rely on and benefit from.
Some benefits of the proposed format include:
- known memory requirement
- change-id as part of a URL never has to be escaped
- it being a hash means the smallest unambiguous prefix is minimized
So, these are mostly practical considerations. If there are notable
benefits to free-form IDs, Jujutsu can hash that again to get an ID in
its internal format if necessary. But it's always easier to go from a
strict format to a loose one later, as opposed to the other way around.
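The first two benefits can be checked mechanically. The following Python sketch assumes the proposed format of exactly 32 reverse-hex digits; `is_valid_change_id` and `to_bytes` are hypothetical helper names used only here.

```python
from urllib.parse import quote

REVERSE_HEX = "zyxwvutsrqponmlk"  # 'z' = 0 ... 'k' = 15
HEX = "0123456789abcdef"


def is_valid_change_id(value):
    """True if value is exactly 32 characters from the reverse-hex alphabet."""
    return len(value) == 32 and set(value) <= set(REVERSE_HEX)


def to_bytes(change_id):
    """Decode a valid change id into its raw bytes."""
    return bytes.fromhex(change_id.translate(str.maketrans(REVERSE_HEX, HEX)))


cid = "ywlktllmukprnxnmzzprukpuwyztylwt"
print(is_valid_change_id(cid))     # strict format check
print(len(to_bytes(cid)))          # fixed size in memory
print(quote(cid, safe="") == cid)  # nothing needs percent-escaping in a URL
```

A free-form id would give up the fixed size and the URL-safety guarantee, which is why the strict format is easier to loosen later than the reverse.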
> While there may not be a need to do anything in Git itself I would think
> that supporting change IDs natively in Git would still be sensible.
> Sure, you can emulate them via commit trailers. But I don't consider
> trailers to be particularly great as a storage format for this metadata.
> After all, you will want to filter the commit graph by change ID for
> some of the use cases, and doing that based on a loosely-defined format
> probably isn't great.
>
> So what would it take to get change IDs into Git? I think the most
> important items would be:
>
> - Generating and writing change IDs in commands that support them.
> This includes e.g. git-commit(1), git-commit-tree(1), git-merge(1),
> git-merge-tree(1). This should of course be completely optional and
> probably be disabled by default.
>
> - Making tools that rewrite commits aware of change IDs so that they
> know to retain change IDs. This involves e.g. git-cherry-pick(1),
> git-rebase(1), git-replay(1).
>
> - Extending revisions to allow specifying commits by change ID.
>
> - Allowing us to filter commit graphs by change ID.
I agree with all of that. The first two points are the ones that would
actually allow the ecosystem to start relying on this new header as a
standard and develop related features while staying interoperable with
the rest of the ecosystem. E.g. if the header is preserved by
git-rebase, Git & Jujutsu users will enjoy stable change-ids when a
branch is rebase-merged on a forge. And if git generated the header
itself with git-commit, Gerrit could drop its requirement for clients
to generate a change-id footer via their commit-msg hook.
> The biggest question is of course backwards compatibility -- can we
> introduce a change ID into the commit metadata without breaking existing
> users? I guess you'll already have a lot of experience with this given
> that you essentially already inject change IDs into metadata, and tools
> generally handle this just fine?
Jujutsu has been injecting a 'jj:trees' header into commits to track
more metadata around merge conflicts. There weren't any problems with
that, unless one uses git to rewrite these commits with e.g. git-rebase,
in which case that header is simply lost. But commits with conflicts are
usually not pushed to a remote anyway, so the risk there was minimal.
Scott Chacon with GitButler has more experience in this regard, since
they actually push commits with a change-id in its header to remotes.
He told the Jujutsu community that they didn't encounter any problems,
no misbehaving tools that are fussy about unknown headers. The only
problem is unknown commit headers being dropped by Git itself, depending
on how it is invoked by the remote. (GitHub seems to preserve the header
during a rebase-merge, because they use git-replay. GitLab and Forgejo
drop the header.) With these insights from Scott, Jujutsu is moving
forward to put the change-id in the commit header.
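As an illustration of where such a header lives, here is a small Python sketch that reads the header block of a raw commit object (the text printed by `git cat-file commit <rev>`) and looks for a `change-id` header. The sample commit is fabricated, and `parse_headers` and `find_change_id` are names invented for this example.

```python
def parse_headers(raw_commit):
    """Return the (key, value) header pairs of a raw commit object.

    Headers end at the first blank line; a line starting with a space
    continues the previous header's value (as used by e.g. gpgsig).
    """
    head, _, _message = raw_commit.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if line.startswith(" ") and headers:
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers


def find_change_id(raw_commit):
    """Return the value of the `change-id` header, or None if absent."""
    for key, value in parse_headers(raw_commit):
        if key == "change-id":
            return value
    return None


# Fabricated commit object for illustration only.
RAW = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1743619200 +0000\n"
    "committer A U Thor <author@example.com> 1743619200 +0000\n"
    "change-id ywlktllmukprnxnmzzprukpuwyztylwt\n"
    "\n"
    "meson: fix handling of '-Dcurl=auto'\n"
)

print(find_change_id(RAW))
```

A tool that rebuilds commits from only the tree, parents, author, committer, and message would produce a new object without this line, which is the "header is simply lost" case described above.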
> NB: I'm also quite happy that Jujutsu brings a bit of a new contender
> to Git into the picture. It has a lot of nice ideas, and in the best
> case Git might be able to learn a few nice tricks from JJ. After
> all, I think we can all benefit from some friendly competition.
One of the best features of Jujutsu is that it plays nice with Git.
Most users work in "colocated" repos that have both a .git and a .jj
directory. Git commands continue to work as usual (mostly). If Git
adopted the change-id header, interleaving of Git and Jujutsu commands
would work even better.
So, +1 from me for friendly competition / collaboration. :-)
Remo
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 10:38 ` Remo Senekowitsch
@ 2025-04-03 11:06 ` Patrick Steinhardt
0 siblings, 0 replies; 118+ messages in thread
From: Patrick Steinhardt @ 2025-04-03 11:06 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch, Christian Couder
On Thu, Apr 03, 2025 at 12:38:52PM +0200, Remo Senekowitsch wrote:
> On Thu Apr 3, 2025 at 11:09 AM CEST, Patrick Steinhardt wrote:
> > On Wed, Apr 02, 2025 at 11:48:01AM -0700, Martin von Zweigbergk wrote:
> > The biggest question is of course backwards compatibility -- can we
> > introduce a change ID into the commit metadata without breaking existing
> > users? I guess you'll already have a lot of experience with this given
> > that you essentially already inject change IDs into metadata, and tools
> > generally handle this just fine?
>
> Jujutsu has been injecting a 'jj:trees' header into commits to track
> more metadata around merge conflicts. There weren't any problems with
> that, unless one uses git to rewrite these commits with e.g. git-rebase,
> in which case that header is simply lost. But commits with conflicts are
> usually not pushed to a remote anyway, so the risk there was minimal.
> Scott Chacon with GitButler has more experience in this regard, since
> they actually push commits with a change-id in its header to remotes.
> He told the Jujutsu community that they didn't encounter any problems,
> no misbehaving tools that are fussy about unknown headers. The only
> problem is unknown commit headers being dropped by Git itself, depending
> on how it is invoked by the remote. (GitHub seems to preserve the header
> during a rebase-merge, because they use git-replay. GitLab and Forgejo
> drop the header.) With these insights from Scott, Jujutsu is moving
> forward to put the change-id in the commit header.
Yeah, Scott already made me aware of the limitations in GitLab. We
have wanted to migrate to git-replay(1) for a long time, but never
got around to actually doing it. Coincidentally I have recently been
talking with Chris, who proposed to finally go through with this change.
I guess this here is another factor that will make us schedule this
change sooner rather than later.
So: we'll soon start working on it, but I won't promise any timeline.
Patrick
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 9:09 ` Patrick Steinhardt
2025-04-03 10:38 ` Remo Senekowitsch
@ 2025-04-03 15:56 ` Elijah Newren
2025-04-03 16:25 ` Remo Senekowitsch
2025-04-04 9:41 ` Patrick Steinhardt
1 sibling, 2 replies; 118+ messages in thread
From: Elijah Newren @ 2025-04-03 15:56 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, remo, philipmetzger@bluewin.ch
On Thu, Apr 3, 2025 at 2:13 AM Patrick Steinhardt <ps@pks.im> wrote:
>
[...]
> Agreed, change IDs solve a couple of issues that many users face:
>
> - You can reliably track how a patch evolves over time. This helps
> various different tools to track identity of commits, like for
> example forges, but also tools like git-range-diff(1).
>
> - It becomes trivial to see whether a commit has been cherry-picked
> into another branch. We do have git-cherry(1) to do that right now,
> but that command is based on heuristics and fails as soon as the
> patch itself needed to be adapted.
>
> - Working with history rewrites becomes easier in the general case as
> you don't have to adapt to constantly changing commit IDs.
Could you elaborate? I agree with the other points you raise, but I'm
unsure how this helps with a history rewrite. Do you mean the
rewriting of history, or someone trying to consume the history
rewrite? If the former, I don't see it, and if the latter, didn't you
already cover that in the two bullets above? Or is there something
else you are also getting at?
> So what would it take to get change IDs into Git? I think the most
> important items would be:
>
> - Generating and writing change IDs in commands that support them.
> This includes e.g. git-commit(1), git-commit-tree(1), git-merge(1),
> git-merge-tree(1). This should of course be completely optional and
> probably be disabled by default.
>
> - Making tools that rewrite commits aware of change IDs so that they
> know to retain change IDs. This involves e.g. git-cherry-pick(1),
> git-rebase(1), git-replay(1).
And also git-commit(1) [when passing --amend], and git-fast-export(1)
and git-fast-import(1) -- though possibly with options for the last
two to expunge them instead of preserving them, but probably
defaulting to preserving them.
However, I think some of these might already handle this. Commands
which call read_commit_extra_headers() and pass those along to
commit_tree_extended() may already preserve these. It appears commit
--amend and replay both do this. sequencer has some code that looks
relevant, but it appears to only be reading the headers from HEAD (at
the time the nth commit is being replayed), which seems like it'd be
looking at the wrong commit. That might actually be a bug...
> - Extending revisions to allow specifying commits by change ID.
Would this essentially be similar to <rev>^{/<text>} except searching
specifically change-id headers rather than commit message?
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 15:56 ` Elijah Newren
@ 2025-04-03 16:25 ` Remo Senekowitsch
2025-04-03 16:38 ` Elijah Newren
2025-04-04 9:41 ` Patrick Steinhardt
1 sibling, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-03 16:25 UTC (permalink / raw)
To: Elijah Newren, Patrick Steinhardt
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Thu Apr 3, 2025 at 5:56 PM CEST, Elijah Newren wrote:
> On Thu, Apr 3, 2025 at 2:13 AM Patrick Steinhardt <ps@pks.im> wrote:
>>
>> - Extending revisions to allow specifying commits by change ID.
>
> Would this essentially be similar to <rev>^{/<text>} except searching
> specifically change-id headers rather than commit message?
One benefit of using the "reverse-hex" format (hex with a different
alphabet: z(0) through k(15)) we're proposing is that it allows a
change-id or its prefix to be used in the same place as a commit hash,
without ambiguity.
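The "without ambiguity" property follows from the two alphabets being disjoint: hex uses 0-9 and a-f, reverse hex uses k-z. A short Python sketch of a purely lexical classifier (a hypothetical helper, not part of Git or Jujutsu):

```python
HEX_DIGITS = set("0123456789abcdef")
REVERSE_HEX_DIGITS = set("zyxwvutsrqponmlk")


def classify_prefix(prefix):
    """Classify an id prefix by the characters it uses."""
    chars = set(prefix)
    if not chars:
        return "neither"
    if chars <= HEX_DIGITS:
        return "commit-id"
    if chars <= REVERSE_HEX_DIGITS:
        return "change-id"
    return "neither"


for candidate in ["98f2b1fb", "ywlktllm", "main"]:
    print(candidate, classify_prefix(candidate))
```

The check only separates the two id kinds from each other; it says nothing about whether the same string also happens to be a ref name.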
Remo
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 16:25 ` Remo Senekowitsch
@ 2025-04-03 16:38 ` Elijah Newren
2025-04-03 21:46 ` Martin von Zweigbergk
0 siblings, 1 reply; 118+ messages in thread
From: Elijah Newren @ 2025-04-03 16:38 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Patrick Steinhardt, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, philipmetzger@bluewin.ch
On Thu, Apr 3, 2025 at 9:25 AM Remo Senekowitsch <remo@buenzli.dev> wrote:
>
> On Thu Apr 3, 2025 at 5:56 PM CEST, Elijah Newren wrote:
> > On Thu, Apr 3, 2025 at 2:13 AM Patrick Steinhardt <ps@pks.im> wrote:
> >>
> >> - Extending revisions to allow specifying commits by change ID.
> >
> > Would this essentially be similar to <rev>^{/<text>} except searching
> > specifically change-id headers rather than commit message?
>
> One benefit of using the "reverse-hex" format (hex with a different
> alphabet: z(0) through k(15)) we're proposing is that it allows a
> change-id or its prefix to be used in the same place as a commit hash,
> without ambiguity.
I already saw that, but this doesn't address my question.
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 16:38 ` Elijah Newren
@ 2025-04-03 21:46 ` Martin von Zweigbergk
0 siblings, 0 replies; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-03 21:46 UTC (permalink / raw)
To: Elijah Newren
Cc: Remo Senekowitsch, Patrick Steinhardt, Git Mailing List,
Edwin Kempin, Scott Chacon, philipmetzger@bluewin.ch
On Thu, 3 Apr 2025 at 09:39, Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, Apr 3, 2025 at 9:25 AM Remo Senekowitsch <remo@buenzli.dev> wrote:
> >
> > On Thu Apr 3, 2025 at 5:56 PM CEST, Elijah Newren wrote:
> > > On Thu, Apr 3, 2025 at 2:13 AM Patrick Steinhardt <ps@pks.im> wrote:
> > >>
> > >> - Extending revisions to allow specifying commits by change ID.
> > >
> > > Would this essentially be similar to <rev>^{/<text>} except searching
> > > specifically change-id headers rather than commit message?
> >
> > One benefit of using the "reverse-hex" format (hex with a different
> > alphabet: z(0) through k(15)) we're proposing is that it allows a
> > change-id or its prefix to be used in the same place as a commit hash,
> > without ambiguity.
>
> I already saw that, but this doesn't address my question.
Jujutsu has a persistent index of change ids to commit ids. It's
similar to Git's commit graph. As you might expect, this index often
has multiple commits for a single change id. When needed, we then
build an in-memory index of only the subset of commits currently
reachable from the visible head commits (you can think of it as all
currently reachable commits from Git branches, but we don't require a
branch to keep a commit visible). When you do something like `jj show
xyz`, we use that in-memory index to find the relevant commits. That's
usually just one commit.
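A rough Python sketch of the two-level lookup described here, under simplifying assumptions: the persistent index is a plain dictionary, visibility is a precomputed set of commit ids, and all ids are invented. This illustrates the idea only and is not Jujutsu's actual data structure.

```python
from collections import defaultdict


def build_index(commits):
    """Map each change id to every commit id that has carried it."""
    index = defaultdict(list)
    for commit_id, change_id in commits:
        index[change_id].append(commit_id)
    return index


def resolve(index, visible, change_prefix):
    """Return the visible commits whose change id starts with change_prefix."""
    return [
        commit_id
        for change_id, commit_ids in index.items()
        if change_id.startswith(change_prefix)
        for commit_id in commit_ids
        if commit_id in visible
    ]


# Hypothetical history: change "xyzklmno" was rewritten once, so two
# commits share it, but only the newer one is still visible.
commits = [
    ("98f2b1f", "xyzklmno"),
    ("1406ab7", "xyzklmno"),
    ("6587b42", "mnopqrst"),
]
index = build_index(commits)
visible = {"1406ab7", "6587b42"}

print(resolve(index, visible, "xyz"))
```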
By the way, we actually have another level of lookup that happens
before that. You can configure an expression for the set of commits
you want to prioritize short change id prefixes and commit id prefixes
for. If you say that you want all commits that are not ancestors of
any remote-tracking branch to be shorter, for example, then we build
an additional in-memory index of only those commits. That means that
both change id prefixes and commit id prefixes are often just a few
characters long, even in huge repos like the monorepo at Google.
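The effect of the prioritized set on prefix length can be sketched as follows. This is a naive Python illustration with made-up change ids; `shortest_unique_prefix` is a hypothetical helper, and a real implementation would presumably use a sorted index rather than a linear scan.

```python
def shortest_unique_prefix(target, ids):
    """Return the shortest prefix of target that no other id in ids shares."""
    others = [other for other in ids if other != target]
    for length in range(1, len(target) + 1):
        prefix = target[:length]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    return target


# Made-up change ids: the whole repo versus a small prioritized subset
# (e.g. commits not yet on any remote-tracking branch).
all_ids = ["ywlktllm", "ywqsuskn", "yzpmkkrt", "kkmpptxz"]
prioritized = ["ywlktllm", "kkmpptxz"]

print(shortest_unique_prefix("ywlktllm", all_ids))
print(shortest_unique_prefix("ywlktllm", prioritized))
```

Disambiguating only within the small prioritized set is what keeps the prefixes short regardless of how large the full repository is.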
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 15:56 ` Elijah Newren
2025-04-03 16:25 ` Remo Senekowitsch
@ 2025-04-04 9:41 ` Patrick Steinhardt
1 sibling, 0 replies; 118+ messages in thread
From: Patrick Steinhardt @ 2025-04-04 9:41 UTC (permalink / raw)
To: Elijah Newren
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, remo, philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 08:56:01AM -0700, Elijah Newren wrote:
> On Thu, Apr 3, 2025 at 2:13 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> [...]
> > Agreed, change IDs solve a couple of issues that many users face:
> >
> > - You can reliably track how a patch evolves over time. This helps
> > various different tools to track identity of commits, like for
> > example forges, but also tools like git-range-diff(1).
> >
> > - It becomes trivial to see whether a commit has been cherry-picked
> > into another branch. We do have git-cherry(1) to do that right now,
> > but that command is based on heuristics and fails as soon as the
> > patch itself needed to be adapted.
> >
> > - Working with history rewrites becomes easier in the general case as
> > you don't have to adapt to constantly changing commit IDs.
>
> Could you elaborate? I agree with the other points you raise, but I'm
> unsure how this helps with a history rewrite. Do you mean the
> rewriting of history, or someone trying to consume the history
> rewrite? If the former, I don't see it, and if the latter, didn't you
> already cover that in the two bullets above? Or is there something
> else you are also getting at?
Yeah, this point wasn't quite clear. It's mostly based around my own
findings that I always end up copying a lot of object IDs around while
working on rebases. And the most annoying part to me is that those OIDs
also change on every rewrite, and as a consequence I always have to look
up the rewritten object IDs.
By using change IDs this issue would become easier as I only need to
remember one set of constant IDs that don't change on every rewrite.
> > So what would it take to get change IDs into Git? I think the most
> > important items would be:
> >
> > - Generating and writing change IDs in commands that support them.
> > This includes e.g. git-commit(1), git-commit-tree(1), git-merge(1),
> > git-merge-tree(1). This should of course be completely optional and
> > probably be disabled by default.
> >
> > - Making tools that rewrite commits aware of change IDs so that they
> > know to retain change IDs. This involves e.g. git-cherry-pick(1),
> > git-rebase(1), git-replay(1).
>
> And also git-commit(1) [when passing --amend], and git-fast-export(1)
> and git-fast-import(1) -- though possibly with options for the last
> two to expunge them instead of preserving them, but probably
> defaulting to preserving them.
>
> However, I think some of these might already handle this. Commands
> which call read_commit_extra_headers() and pass those along to
> commit_tree_extended() may already preserve these. It appears commit
> --amend and replay both do this. sequencer has some code that looks
> relevant, but it appears to only be reading the headers from HEAD (at
> the time the nth commit is being replayed), which seems like it'd be
> looking at the wrong commit. That might actually be a bug...
Yes, some tools already handle this correctly indeed.
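To make the header handling above concrete, here is a minimal Python sketch of
reading a `change-id` header out of a raw commit object (the text printed by
`git cat-file commit <oid>`). It assumes the header name `change-id` from the
proposal in this thread; Git itself does not write such a header today. The
function names and the example commit are made up for illustration.

```python
from typing import Dict, List, Optional


def parse_commit_headers(raw_commit: str) -> Dict[str, List[str]]:
    """Parse the header section of a raw commit (everything before the first blank line)."""
    headers: Dict[str, List[str]] = {}
    last_key: Optional[str] = None
    header_text = raw_commit.split("\n\n", 1)[0]
    for line in header_text.split("\n"):
        if line.startswith(" ") and last_key is not None:
            # Continuation line of a multi-line header (e.g. gpgsig).
            headers[last_key][-1] += "\n" + line[1:]
            continue
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
        last_key = key
    return headers


def change_id_of(raw_commit: str) -> Optional[str]:
    """Return the first `change-id` header value, or None if the commit has none."""
    values = parse_commit_headers(raw_commit).get("change-id")
    return values[0] if values else None


# Hypothetical commit object; identities and timestamps are placeholders.
EXAMPLE_COMMIT = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1743700000 +0000\n"
    "committer C O Mitter <committer@example.com> 1743700000 +0000\n"
    "change-id ywlktllmukprnxnmzzprukpuwyztylwt\n"
    "\n"
    "Subject line\n"
    "\n"
    "A message body that mentions change-id is not a header.\n"
)

print(change_id_of(EXAMPLE_COMMIT))
```

A rewriting tool that wants to retain the change ID would copy such extra
headers from the commit being rewritten into the new commit, which is roughly
what the read_commit_extra_headers() / commit_tree_extended() pairing
mentioned above does for the commands that already use it.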
> > - Extending revisions to allow specifying commits by change ID.
>
> Would this essentially be similar to <rev>^{/<text>} except searching
> specifically change-id headers rather than commit message?
Yeah, something like that. The exact format for such a new revision
would be up for debate, but I quite like the reverse hex format that JJ
itself uses as it is unambiguous compared to object IDs. Only problem of
course is that it's not unambiguous compared to refnames, so accepting
reverse object IDs as-is without any kind of prefix is probably a no-go.
Patrick
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-02 18:48 Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Martin von Zweigbergk
` (3 preceding siblings ...)
2025-04-03 9:09 ` Patrick Steinhardt
@ 2025-04-03 15:39 ` Elijah Newren
2025-04-03 16:40 ` Remo Senekowitsch
` (2 more replies)
2025-04-07 20:59 ` Junio C Hamano
2025-08-19 14:04 ` Askar Safin
6 siblings, 3 replies; 118+ messages in thread
From: Elijah Newren @ 2025-04-03 15:39 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Wed, Apr 2, 2025 at 11:48 AM Martin von Zweigbergk
<martinvonz@google.com> wrote:
>
> Hi,
>
> The Gerrit, GitButler, and Jujutsu projects all have a concept of
> a "change id", and it behaves in a similar way between the three
> tools. The change id is conceptually associated with a commit.
> It follows a commit as it's rewritten (e.g. by amending and
> rebasing). The three projects currently store and format the
> change id differently. We would like to unify that so we can
> interoperate better. We hope the Git project is also interested
> in preserving and using this header.
>
> There are many benefits to having a change id even if it's just
> local. I mentioned some in my email to this mailing list in [1].
> For example, it enables
> `git rebase main <change ID>; git switch <change ID>` without
> requiring the user to look up the hash of the rewritten commit.
But <change ID> isn't unique, right? The whole point of having the
change ID is to preserve it despite edits (e.g. rebase, commit
--amend, cherry-pick), meaning that you end up with multiple commits
with the same <change ID>.
Why would this work?
And if it does work, isn't it expensive since you'd need to walk
history to find it? Or do you keep an extra lookup table on the side
somewhere?
> If the change id is also transferred between repos and preserved by
> a forge (such as Gerrit), it enables the change id to be used to
> identify a code review.
>
> Here's how the change ids are currently stored and formatted:
>
> * Gerrit currently stores change ids in a commit trailer called
> `Change-Id`. It always starts with the letter 'I' and is
> followed by 40 hex digits. For example:
> `Change-Id: Ib563e78c3fedcff262255fa025441daa3202311b`.
>
> * GitButler currently stores change ids in a commit footer
> called `gitbutler-change-id` (older versions used
> `change-id`). It's written as 32 hex digits separated by
> dashes as in the UUID format. For example:
> `gitbutler-change-id 7d0fbc63-032d-413c-8ae8-610fbeb713c0`.
>
> * Jujutsu currently stores change ids in a local storage outside
> of the Git repo and is therefore not part of the Git commit
> id. It is stored as 16 bytes. It is rendered to the user as
> "reverse hex" using 'z' through 'k' as hex digits ('z' = 0,
> 'k' = 15). This allows even short prefixes to be distinguished
> from commit ids, which is a very useful property when used in
> the CLI.
>
> As mentioned, the three projects would like to use the same
> storage and format. I think we have a consensus to store it in a
> Git commit header called `change-id` as 32 reverse-hex digits.
> For example: `change-id ywlktllmukprnxnmzzprukpuwyztylwt`.
Yaay, I always hated it as a trailer.
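For readers unfamiliar with the "reverse hex" rendering described in the
quoted text, a minimal Python sketch follows. It only assumes what is stated
above: 16 bytes, rendered with 'z' = 0 through 'k' = 15. The function names
are illustrative and are not taken from Jujutsu's code.

```python
HEX_DIGITS = "0123456789abcdef"
REVERSE_HEX_DIGITS = "zyxwvutsrqponmlk"  # 'z' = 0 ... 'k' = 15

_TO_REVERSE = str.maketrans(HEX_DIGITS, REVERSE_HEX_DIGITS)
_TO_HEX = str.maketrans(REVERSE_HEX_DIGITS, HEX_DIGITS)


def to_reverse_hex(raw: bytes) -> str:
    """Render raw bytes as reverse-hex digits."""
    return raw.hex().translate(_TO_REVERSE)


def from_reverse_hex(text: str) -> bytes:
    """Decode reverse-hex digits back into raw bytes."""
    if not is_reverse_hex(text):
        raise ValueError(f"not a reverse-hex string: {text!r}")
    return bytes.fromhex(text.translate(_TO_HEX))


def is_reverse_hex(text: str) -> bool:
    """True if every character is in the 'k'..'z' alphabet."""
    return bool(text) and all(c in REVERSE_HEX_DIGITS for c in text)


change_id = "ywlktllmukprnxnmzzprukpuwyztylwt"  # example from the quoted mail
raw = from_reverse_hex(change_id)
print(len(raw), to_reverse_hex(raw) == change_id)
```

Because the alphabet 'k'..'z' shares no characters with 0-9 and a-f, any
non-empty prefix of a change id fails to parse as a hex object ID, which is
the property the quoted text calls useful in the CLI.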
> There is a design doc [2] about the impact on Gerrit and how to
> handle various cases where the client doesn't understand the
> `change-id` header. That also includes some discussion about
> whether cherry-picking should preserve the change id or create a
> new one. I think there is a lot of value in having a
> standardized header regardless of what we decide about
> cherry-picks.
cherry-pick & rebase preserve author name, email & time, while
creating a new committer name, email, & time. To me, the change-id is
about the authorship, and since these commands already preserve
authorship, it'd seem weird to me to have cherry-pick not preserve the
change-id by default.
> So, to be clear, this is mostly a heads up at this point; we don't
> depend on any immediate changes from the Git project.
I appreciate the heads up, and agree based on what I've seen so far
that you can at least get started without Git changes.
However, I think you'd want git to preserve the change-id headers upon
git commit --amend, rebase, or cherry-pick, which would require some
git changes. And you may want git to preserve them when doing a
fast-export, and be able to read them in with fast-import.
Anyway, I was a voice in the past that was kind of against these,
though that was mostly as a commit footer. Plus, the number of
projects using them, hearing about their experience at Git Merge, and
realizing that part of my objection was due to misuse of Gerrit by
some folks in the past have all led me to change my opinion.
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 15:39 ` Elijah Newren
@ 2025-04-03 16:40 ` Remo Senekowitsch
2025-04-03 22:11 ` Kane York
2025-04-04 2:28 ` Elijah Newren
2025-04-03 17:48 ` Theodore Ts'o
2025-04-03 18:10 ` Nico Williams
2 siblings, 2 replies; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-03 16:40 UTC (permalink / raw)
To: Elijah Newren, Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Thu Apr 3, 2025 at 5:39 PM CEST, Elijah Newren wrote:
> On Wed, Apr 2, 2025 at 11:48 AM Martin von Zweigbergk
> <martinvonz@google.com> wrote:
>>
>> There are many benefits to having a change id even if it's just
>> local. I mentioned some in my email to this mailing list in [1].
>> For example, it enables
>> `git rebase main <change ID>; git switch <change ID>` without
>> requiring the user to look up the hash of the rewritten commit.
>
> But <change ID> isn't unique, right? The whole point of having the
> change ID is to preserve it despite edits (e.g. rebase, commit
> --amend, cherry-pick), meaning that you end up with multiple commits
> with the same <change ID>.
>
> Why would this work?
>
> And if it does work, isn't it expensive since you'd need to walk
> history to find it? Or do you keep an extra lookup table on the side
> somewhere?
For rebase and commit --amend, the way Jujutsu deals with those is that
all descendants are immediately rebased on top of the new commit, and
refs to those descendants are updated as well. That means, the old
version of the patch with the same change-id becomes unreachable. So,
at least most of the time, the change-id is indeed unique.
This doesn't work for cherry-pick, more on that below.
Some of these features are not in Git yet, at least not to my knowledge.
That means getting the full benefit of change-ids with Git itself
would indeed require some more work. I know of rebase.updateRefs
and rebase.rebaseMerges, which move the Git experience closer to
Jujutsu, but don't go all the way. AFAIK it's not possible with Git to
automatically rebase --update-refs all descendants of a commit that is
amended or rebased.
Jujutsu does keep a separate index of change-ids, yes.
>> There is a design doc [2] about the impact on Gerrit and how to
>> handle various cases where the client doesn't understand the
>> `change-id` header. That also includes some discussion about
>> whether cherry-picking should preserve the change id or create a
>> new one. I think there is a lot of value in having a
>> standardized header regardless of what we decide about
>> cherry-picks.
>
> cherry-pick & rebase preserve author name, email & time, while
> creating a new committer name, email, & time. To me, the change-id is
> about the authorship, and since these commands already preserve
> authorship, it'd seem weird to me to have cherry-pick not preserve the
> change-id by default.
I'd say Jujutsu, Gerrit and GitButler think of a change-id as associated
with a unit of review. (Although it will naturally support reviewing
sets of patches as well.) Usually only one person will push commits with
the same change-id, just like people don't usually force-push over each
other's branches. But that's mostly about avoiding logistical problems.
When an employee leaves a company or is on vacation, it can be perfectly
reasonable for someone else to take over their work. In that case, it
would be appropriate to preserve the change-id, even though authorship
has changed, because the history of code review on that patch should
stay associated with the new version.
Cherry-picking on the other hand often represents a separate unit of
review. That review may revolve around whether it makes sense to
backport a bugfix at all or any additional changes that may have been
necessary to make the bugfix work in the different, older codebase.
As mentioned above, there's also the issue that preserving the change-id
on cherry-pick likely results in duplicates. For Jujutsu, it would be
nice if this was avoided. But it's not infeasible to deal with that
either.
For Gerrit, it would be important to be able to track a change across
cherry-picks somehow, since that is a feature they already have. If Git
decides to preserve the change-id on cherry-pick, there's no problem
for Gerrit. Alternatives include storing a separate cherry-picked-from
header or enabling the -x flag on cherry-pick by default.
Remo
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 16:40 ` Remo Senekowitsch
@ 2025-04-03 22:11 ` Kane York
2025-04-04 2:28 ` Elijah Newren
1 sibling, 0 replies; 118+ messages in thread
From: Kane York @ 2025-04-03 22:11 UTC (permalink / raw)
To: remo; +Cc: ekempin, git, martinvonz, newren, philipmetzger, scott
On Thu, 03 Apr 2025 18:40:20 +0200, Remo Senekowitsch wrote:
> On Thu Apr 3, 2025 at 5:39 PM CEST, Elijah Newren wrote:
>> cherry-pick & rebase preserve author name, email & time, while creating a
>> new committer name, email, & time. To me, the change-id is about the
>> authorship, and since these commands already preserve authorship, it'd seem
>> weird to me to have cherry-pick not preserve the change-id by default.
> I'd say Jujutsu, Gerrit and GitButler think of a change-id as associated with
> a unit of review.
>
> [...]
>
> Cherry-picking on the other hand often represents a separate unit of review.
> That review may revolve around whether it makes sense to backport a bugfix at
> all or any additional changes that may have been necessary to make the bugfix
> work in the different, older codebase.
>
> As mentioned above, there's also the issue that preserving the change-id on
> cherry-pick likely results in duplicates. For Jujutsu, it would be nice if
> this was avoided. But it's not infeasible to deal with that either.
>
> For Gerrit, it would be important to be able to track a change across
> cherry-picks somehow, since that is a feature they already have. If Git
> decides to preserve the change-id on cherry-pick, there's no problem for
> Gerrit. Alternatives include storing a separate cherry-picked-from header or
> enabling the -x flag on cherry-pick by default.
I agree, and propose this concrete behavior:
- git cherry-pick generates a fresh `change-id`, and places the old change-id
in a `cherry-picked-change` header
- git cherry-pick preserves the old `change-id` if passed the new
`--preserve-change-id` flag
- git rebase passes the `--preserve-change-id` flag on 'pick' actions, unless
passed the new `--no-preserve-change-id` flag
- git rebase uses the earlier commit's change-id on 'fixup' actions
- git rebase prints the change-ids into the COMMIT_MSG on 'squash' actions, and
tries to read the user's choice of which to use, defaulting to the earlier
commit if both or none are present
- git rebase creates a new change-id on 'merge' actions
- git rebase needs no special behavior specified for 'edit', 'exec', 'break',
'drop', 'label', 'reset', 'update-ref'
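The proposed rules can be restated as a small decision function. The Python
sketch below is only an encoding of the list above, not existing Git
behavior: the `--preserve-change-id` flag and the `cherry-picked-change`
header are part of this proposal, and every name in the code is hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Optional, Sequence

_REVERSE_HEX = str.maketrans("0123456789abcdef", "zyxwvutsrqponmlk")


def fresh_change_id() -> str:
    """16 random bytes as 32 reverse-hex digits ('z' = 0 ... 'k' = 15)."""
    return secrets.token_hex(16).translate(_REVERSE_HEX)


@dataclass
class Headers:
    change_id: str
    cherry_picked_change: Optional[str] = None  # proposed header, not in Git


def cherry_pick(old_id: str, preserve_change_id: bool = False) -> Headers:
    """Proposed default: fresh id, with the old id recorded separately."""
    if preserve_change_id:
        return Headers(change_id=old_id)
    return Headers(change_id=fresh_change_id(), cherry_picked_change=old_id)


def rebase_step(
    action: str,
    ids: Sequence[str],  # change ids of the commits involved, earliest first
    preserve_change_id: bool = True,  # rebase preserves unless told otherwise
    user_choice: Optional[str] = None,  # id left in the message on 'squash'
) -> Optional[Headers]:
    if action == "pick":
        return cherry_pick(ids[0], preserve_change_id)
    if action == "fixup":
        return Headers(change_id=ids[0])
    if action == "squash":
        # Fall back to the earlier commit if the choice is missing or unknown.
        chosen = user_choice if user_choice in ids else ids[0]
        return Headers(change_id=chosen)
    if action == "merge":
        return Headers(change_id=fresh_change_id())
    return None  # edit, exec, break, drop, label, reset, update-ref


print(rebase_step("fixup", ["ywlktllmukprnxnmzzprukpuwyztylwt", "k" * 32]))
```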
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 16:40 ` Remo Senekowitsch
2025-04-03 22:11 ` Kane York
@ 2025-04-04 2:28 ` Elijah Newren
2025-04-04 2:40 ` Elijah Newren
1 sibling, 1 reply; 118+ messages in thread
From: Elijah Newren @ 2025-04-04 2:28 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Thu, Apr 3, 2025 at 9:40 AM Remo Senekowitsch <remo@buenzli.dev> wrote:
>
> On Thu Apr 3, 2025 at 5:39 PM CEST, Elijah Newren wrote:
> > On Wed, Apr 2, 2025 at 11:48 AM Martin von Zweigbergk
> > <martinvonz@google.com> wrote:
> >>
> >> There are many benefits to having a change id even if it's just
> >> local. I mentioned some in my email to this mailing list in [1].
> >> For example, it enables
> >> `git rebase main <change ID>; git switch <change ID>` without
> >> requiring the user to look up the hash of the rewritten commit.
> >
> > But <change ID> isn't unique, right? The whole point of having the
> > change ID is to preserve it despite edits (e.g. rebase, commit
> > --amend, cherry-pick), meaning that you end up with multiple commits
> > with the same <change ID>.
> >
> > Why would this work?
> >
> > And if it does work, isn't it expensive since you'd need to walk
> > history to find it? Or do you keep an extra lookup table on the side
> > somewhere?
>
> For rebase and commit --amend, the way Jujutsu deals with those is that
> all descendants are immediately rebased on top of the new commit, and
> refs to those descendants are updated as well. That means, the old
> version of the patch with the same change-id becomes unreachable. So,
> at least most of the time, the change-id is indeed unique.
>
> This doesn't work for cherry-pick, more on that below.
>
> Some of these features are not in Git yet, at least not to my knowledge.
> That means getting the full benefit of change-ids with Git itself
> would indeed require some more work. I know of rebase.updateRefs
> and rebase.rebaseMerges, which move the Git experience closer to
> Jujutsu, but don't go all the way. AFAIK it's not possible with Git to
> automatically rebase --update-refs all descendants of a commit that is
> amended or rebased.
Correct; that doesn't exist currently.
> Jujutsu does keep a separate index of change-ids, yes.
Thanks.
> >> There is a design doc [2] about the impact on Gerrit and how to
> >> handle various cases where the client doesn't understand the
> >> `change-id` header. That also includes some discussion about
> >> whether cherry-picking should preserve the change id or create a
> >> new one. I think there is a lot of value in having a
> >> standardized header regardless of what we decide about
> >> cherry-picks.
> >
> > cherry-pick & rebase preserve author name, email & time, while
> > creating a new committer name, email, & time. To me, the change-id is
> > about the authorship, and since these commands already preserve
> > authorship, it'd seem weird to me to have cherry-pick not preserve the
> > change-id by default.
>
> I'd say Jujutsu, Gerrit and GitButler think of a change-id as associated
> with a unit of review. (Although it will naturally support reviewing
> sets of patches as well.) Usually only one person will push commits with
> the same change-id, just like people don't usually force-push over each
> other's branches. But that's mostly about avoiding logistical problems.
> When an employee leaves a company or is on vacation, it can be perfectly
> reasonable for someone else to take over their work. In that case, it
> would be appropriate to preserve the change-id, even though authorship
> has changed, because the history of code review on that patch should
> stay associated with the new version.
>
> Cherry-picking on the other hand often represents a separate unit of
> review. That review may revolve around whether it makes sense to
> backport a bugfix at all or any additional changes that may have been
> necessary to make the bugfix work in the different, older codebase.
I've worked with many projects hosted in Gerrit, and they all had a
very different view of change-ids than what you've espoused here.
They cherry-picked changes to other branches, fully expecting the
change-id to be kept the same. They often checked to verify that
important fixes had been backported to all the relevant LTS branches
by looking for the change-id. So, we'd typically have N+1 commits
sharing the same change-id, all reachable from existing branches,
where N is the number of LTS versions still supported at the time (and
the +1 comes from the main branch development).
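As an illustration of that workflow, here is a minimal Python sketch of the
backport check, assuming the set of change ids reachable from each branch has
already been collected. The branch names and the second change id are made up
for the example; the first id is the Gerrit example quoted earlier in the
thread.

```python
from typing import Dict, List, Set


def missing_backports(change_id: str, branch_change_ids: Dict[str, Set[str]]) -> List[str]:
    """Return the branches that do not contain a commit with this change id."""
    return sorted(
        branch for branch, ids in branch_change_ids.items() if change_id not in ids
    )


FIX = "Ib563e78c3fedcff262255fa025441daa3202311b"
OTHER = "I0000000000000000000000000000000000000001"  # hypothetical

# Hypothetical data: change ids reachable from main and three LTS branches.
branches = {
    "main": {FIX, OTHER},
    "lts-1.0": {FIX},
    "lts-2.0": {OTHER},
    "lts-3.0": {FIX, OTHER},
}

print(missing_backports(FIX, branches))
```

With Gerrit-style trailers, the per-branch sets can be gathered with
something like `git log --format='%(trailers:key=Change-Id,valueonly)'
<branch>`; a `change-id` header would need equivalent support in Git before
the same query could be made against it.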
> As mentioned above, there's also the issue that preserving the change-id
> on cherry-pick likely results in duplicates. For Jujutsu, it would be
> nice if this was avoided. But it's not infeasible to deal with that
> either.
>
> For Gerrit, it would be important to be able to track a change across
> cherry-picks somehow, since that is a feature they already have. If Git
> decides to preserve the change-id on cherry-pick, there's no problem
> for Gerrit. Alternatives include storing a separate cherry-picked-from
> header or enabling the -x flag on cherry-pick by default.
Cherry-picked-from trailers can be nice when they exist, but much more
frequently than one would want they lead to a dead end. People will
cherry-pick a commit that was local-only, or only found in some
security-embargoed repository, and you'd end up with dead ends. You
also occasionally get chains: E cherry-picked from D, which was
cherry-picked from C, which was cherry-picked from B, etc. And more
complex structures are possible. And maybe part of that chain was a
local-only commit or some commit from a security-embargoed repository
that you don't have access to. Then folks get to write scripts and
try to deduce relationships from those trailers (e.g. hey, these two
commits both claim they were cherry-picked from the same non-existent
commit, and this other commit was a cherry-pick of one of these two,
so they're a representation of the same logical change on these
different LTS branches). It makes it a hassle to try to determine
which LTS branches have the appropriate fixes backported and applied.
I've done it, but I thought this problem was logically the point of
change-ids as found in Gerrit, honestly (well, that and its byzantine
push to refs/for/$BRANCH stuff so it could automagically determine
which CR that your push was supposed to be correlated with instead of
just letting you specify via a real refname in your push command).
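The kind of script alluded to above might look like the following Python
sketch. It assumes the "cherry-picked from" relationships have already been
extracted into a mapping from commit to claimed source, and it groups commits
with a small union-find so that commits pointing at the same missing source
still land in one group. The commit names are placeholders, and this is one
possible approach rather than a description of any existing tool.

```python
from typing import Dict, List, Set


def group_logical_changes(picked_from: Dict[str, str], present: Set[str]) -> List[Set[str]]:
    """Group commits connected through cherry-picked-from claims.

    `picked_from` maps a commit to the commit it claims to be picked from;
    the source may be absent from the repository (a dead end).
    `present` is the set of commits that actually exist locally.
    """
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for commit, source in picked_from.items():
        root_a, root_b = find(commit), find(source)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[str, Set[str]] = {}
    for node in list(parent):
        if node in present:  # drop commits we cannot inspect
            groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)


# 'ghost' is a local-only or embargoed commit that nobody else can see.
claims = {"lts1": "ghost", "lts2": "ghost", "lts3": "lts1", "other": "main9"}
existing = {"lts1", "lts2", "lts3", "other", "main9"}
print(group_logical_changes(claims, existing))
```

A shared change id would make this grouping unnecessary: the group key is
simply the id itself.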
While I understand that having nearly-unique change-ids lets you use
change-ids interchangeably with commits, that seems like a questionable
benefit over being able to actually track which logical changes are
the same and have been applied to which LTS branches. I fully realize
folks may disagree...but if we're suggesting commands like `git switch
<change-id>` which can only possibly be meaningful if <change-id> is
unique across all branches, then what are we supposed to do for the
many projects which use change-ids for LTS backport tracking? What
does `git switch <change-id>` (and any other command where you attempt
to use a non-unique change-id in place of a unique commit identifier)
do for them?
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 2:28 ` Elijah Newren
@ 2025-04-04 2:40 ` Elijah Newren
2025-04-04 3:47 ` Martin von Zweigbergk
0 siblings, 1 reply; 118+ messages in thread
From: Elijah Newren @ 2025-04-04 2:40 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Thu, Apr 3, 2025 at 7:28 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, Apr 3, 2025 at 9:40 AM Remo Senekowitsch <remo@buenzli.dev> wrote:
> >
> > On Thu Apr 3, 2025 at 5:39 PM CEST, Elijah Newren wrote:
> > > On Wed, Apr 2, 2025 at 11:48 AM Martin von Zweigbergk
> > > <martinvonz@google.com> wrote:
> > >>
> > >> There are many benefits to having a change id even if it's just
> > >> local. I mentioned some in my email to this mailing list in [1].
> > >> For example, it enables
> > >> `git rebase main <change ID>; git switch <change ID>` without
> > >> requiring the user to look up the hash of the rewritten commit.
> > >
> > > But <change ID> isn't unique, right? The whole point of having the
> > > change ID is to preserve it despite edits (e.g. rebase, commit
> > > --amend, cherry-pick), meaning that you end up with multiple commits
> > > with the same <change ID>.
> > >
> > > Why would this work?
> > >
> > > And if it does work, isn't it expensive since you'd need to walk
> > > history to find it? Or do you keep an extra lookup table on the side
> > > somewhere?
> >
> > For rebase and commit --amend, the way Jujutsu deals with those is that
> > all descendants are immediately rebased on top of the new commit, and
> > refs to those descendants are updated as well. That means, the old
> > version of the patch with the same change-id becomes unreachable. So,
> > at least most of the time, the change-id is indeed unique.
> >
> > This doesn't work for cherry-pick, more on that below.
> >
> > Some of these features are not in Git yet, at least not to my knowledge.
> > That means getting the full benefit of change-ids with Git itself
> > would indeed require some more work. I know of rebase.updateRefs
> > and rebase.rebaseMerges, which move the Git experience closer to
> > Jujutsu, but don't go all the way. AFAIK it's not possible with Git to
> > automatically rebase --update-refs all descendants of a commit that is
> > amended or rebased.
>
> Correct; that doesn't exist currently.
>
> > Jujutsu does keep a separate index of change-ids, yes.
>
> Thanks.
>
> > >> There is a design doc [2] about the impact on Gerrit and how to
> > >> handle various cases where the client doesn't understand the
> > >> `change-id` header. That also includes some discussion about
> > >> whether cherry-picking should preserve the change id or create a
> > >> new one. I think there is a lot of value in having a
> > >> standardized header regardless of what we decide about
> > >> cherry-picks.
> > >
> > > cherry-pick & rebase preserve author name, email & time, while
> > > creating a new committer name, email, & time. To me, the change-id is
> > > about the authorship, and since these commands already preserve
> > > authorship, it'd seem weird to me to have cherry-pick not preserve the
> > > change-id by default.
> >
> > I'd say Jujutsu, Gerrit and GitButler think of a change-id as associated
> > with a unit of review. (Although it will naturally support reviewing
> > sets of patches as well.) Usually only one person will push commits with
> > the same change-id, just like people don't usually force-push over each
> > other's branches. But that's mostly about avoiding logistical problems.
> > When an employee leaves a company or is on vacation, it can be perfectly
> > reasonable for someone else to take over their work. In that case, it
> > would be appropriate to preserve the change-id, even though authorship
> > has changed, because the history of code review on that patch should
> > stay associated with the new version.
> >
> > Cherry-picking on the other hand often represents a separate unit of
> > review. That review may revolve around whether it makes sense to
> > backport a bugfix at all or any additional changes that may have been
> > necessary to make the bugfix work in the different, older codebase.
>
> I've worked with many projects hosted in Gerrit, and they all had a
> very different view of change-ids than what you've espoused here.
> They cherry-picked changes to other branches, fully expecting the
> change-id to be kept the same. They often checked to verify that
> important fixes had been backported to all the relevant LTS branches
> by looking for the change-id. So, we'd typically have N+1 commits
> sharing the same change-id, all reachable from existing branches,
> where N is the number of LTS versions still supported at the time (and
> the +1 comes from the main branch development).
>
> > As mentioned above, there's also the issue that preserving the change-id
> > on cherry-pick likely results in duplicates. For Jujutsu, it would be
> > nice if this was avoided. But it's not infeasible to deal with that
> > either.
> >
> > For Gerrit, it would be important to be able to track a change across
> > cherry-picks somehow, since that is a feature they already have. If Git
> > decides to preserve the change-id on cherry-pick, there's no problem
> > for Gerrit. Alternatives include storing a separate cherry-picked-from
> > header or enabling the -x flag on cherry-pick by default.
>
> Cherry-picked-from trailers can be nice when they exist, but much more
> frequently than one would want they lead to a dead end. People will
> cherry-pick a commit that was local-only, or only found in some
> security-embargoed repository, and you'd end up with dead ends. You
> also occasionally get chains: E cherry-picked from D, which was
> cherry-picked from C, which was cherry-picked from B, etc. And more
> complex structures are possible. And maybe part of that chain was a
> local-only commit or some commit from a security-embargoed repository
> that you don't have access to. Then folks get to write scripts and
> try to deduce relationships from those trailers (e.g. hey, these two
> commits both claim they were cherry-picked from the same non-existent
> commit, and this other commit was a cherry-pick of one of these two,
> so they're a representation of the same logical change on these
> different LTS branches). It makes it a hassle to try to determine
> which LTS branches have the appropriate fixes backported and applied.
> I've done it, but I thought this problem was logically the point of
> change-ids as found in Gerrit, honestly (well, that and its byzantine
> push to refs/for/$BRANCH stuff so it could automagically determine
> which CR that your push was supposed to be correlated with instead of
> just letting you specify via a real refname in your push command).
> While I understand that having nearly-unique change-ids lets you use
> change-ids interchangeably with commits, that seems like a questionable
> benefit over being able to actually track which logical changes are
> the same and have been applied to which LTS branches. I fully realize
> folks may disagree...but if we're suggesting commands like `git switch
> <change-id>` which can only possibly be meaningful if <change-id> is
> unique across all branches, then what are we supposed to do for the
> many projects which use change-ids for LTS backport tracking? What
> does `git switch <change-id>` (and any other command where you attempt
> to use a non-unique change-id in place of a unique commit identifier)
> do for them?
One possible simple solution here is just to treat change-ids (or
their abbreviations) kind of like abbreviated hashes -- they aren't
guaranteed to be unique. If the user specifies a change-id and there
are multiple branches with such a change-id, we provide the user an
error much like we do for abbreviated hashes.
Is that what folks have in mind? If so, I'll be happy to drop my
reservations about this aspect.
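A minimal Python sketch of that behavior, assuming a side index from change id
to the visible commits carrying it (Jujutsu is said above to keep such an
index; Git has none today). The exception name, the index layout, and the
commit ids are invented for the example.

```python
from typing import Dict, List


class AmbiguousChangeId(Exception):
    """Raised when a change id (or prefix) matches more than one commit."""


def resolve_change_id(prefix: str, index: Dict[str, List[str]]) -> str:
    """Resolve a change id or unique prefix to a single commit id."""
    candidates = sorted(
        {
            commit
            for change_id, commits in index.items()
            if change_id.startswith(prefix)
            for commit in commits
        }
    )
    if not candidates:
        raise KeyError(f"no commit with change id {prefix!r}")
    if len(candidates) > 1:
        raise AmbiguousChangeId(
            f"change id {prefix!r} is ambiguous; candidates: {', '.join(candidates)}"
        )
    return candidates[0]


# Hypothetical index: the second change was cherry-picked to an LTS branch,
# so two visible commits share its id.
index = {
    "ywlktllmukprnxnmzzprukpuwyztylwt": ["1111111"],
    "ymzzprukpuwyztylwtywlktllmukprnx": ["2222222", "3333333"],
}

print(resolve_change_id("yw", index))
```

As with an ambiguous abbreviated hash, the user can recover by giving a
longer prefix or by falling back to the commit id.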
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 2:40 ` Elijah Newren
@ 2025-04-04 3:47 ` Martin von Zweigbergk
2025-04-04 4:03 ` Nico Williams
2025-04-04 4:59 ` Elijah Newren
0 siblings, 2 replies; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-04 3:47 UTC (permalink / raw)
To: Elijah Newren
Cc: Remo Senekowitsch, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Thu, 3 Apr 2025 at 19:40, Elijah Newren <newren@gmail.com> wrote:
>
> One possible simple solution here is just to treat change-ids (or
> their abbreviations) kind of like abbreviated hashes -- they aren't
> guaranteed to be unique. If the user specifies a change-id and there
> are multiple branches with such a change-id, we provide the user an
> error much like we do for abbreviated hashes.
>
> Is that what folks have in mind? If so, I'll be happy to drop my
> reservations about this aspect.
Yes, that's close to what we have in mind. I think I just didn't
explain clearly that it's mostly harmless in at least Jujutsu if there
are multiple commits with the same change id. If there are multiple
visible commits with the same change id, then you'll just have to
decide what should happen when the user tries to refer to commits by
change id. We currently let it resolve to all the visible commits with
the given change id. We may change that to be an error instead [1].
The user can always fall back to using the commit id in such cases. We
call change ids with multiple visible commits "divergent". They
currently show up in red in `jj log`, which I think we all agree makes
them seem unnecessarily scary. We'll probably change that soon [2]
[3].
So when I said that I think it's quite uncommon to have multiple
commits with the same change id, I didn't mean that as an excuse to
not consider the other cases at all. I just mean that I think the vast
majority of commits are not cherry-picked, so we don't need to
optimize the user experience for that case - it's fine if it's a bit
more complicated to refer to such commits.
I hope that clarifies. Let me know if there are still unanswered
questions that I have missed.
[1] https://github.com/jj-vcs/jj/issues/5632
[2] https://github.com/jj-vcs/jj/pull/5800
[3] https://github.com/jj-vcs/jj/pull/5850
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 3:47 ` Martin von Zweigbergk
@ 2025-04-04 4:03 ` Nico Williams
2025-04-04 4:59 ` Elijah Newren
1 sibling, 0 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-04 4:03 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Elijah Newren, Remo Senekowitsch, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 08:47:37PM -0700, Martin von Zweigbergk wrote:
> Yes, that's close to what we have in mind. I think I just didn't
> explain clearly that it's mostly harmless in at least Jujutsu if there
> are multiple commits with the same change id. If there are multiple
> visible commits with the same change id, then you'll just have to
> decide what should happen when the user tries to refer to commits by
> change id. We currently let it resolve to all the visible commits with
> the given change id. We may change that to be an error instead [1].
> The user can always fall back to using the commit id in such cases. We
> call change ids with multiple visible commits "divergent". They
> currently show up in red in `jj log`, which I think we all agree makes
> them seem unnecessarily scary. We'll probably change that soon [2]
> [3].
To support search by change ID you can use refs named something like
refs/change-IDs/<change-ID>, and the ref can either point to a singular
commit if the change ID is intended to be unique, or it can point to a
root commit which lists the commits with that change ID.
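To make the lookup idea concrete, here is a rough sketch in Python of a
side index from change ID to commits. It is only an illustration: the
commit and change IDs are made up, `build_change_index` and `ref_name`
are hypothetical helpers, and nothing here is existing Git behaviour.

```python
from collections import defaultdict


def build_change_index(commits):
    """Map each change ID to the list of commit IDs that carry it.

    `commits` is an iterable of (commit_id, change_id) string pairs.
    """
    index = defaultdict(list)
    for commit_id, change_id in commits:
        index[change_id].append(commit_id)
    return dict(index)


def ref_name(change_id):
    # Hypothetical ref layout, following the naming suggested above.
    return f"refs/change-IDs/{change_id}"


# Made-up sample data: two commits share one change ID (e.g. a backport).
commits = [
    ("a1b2c3", "Iaaaa"),
    ("d4e5f6", "Ibbbb"),
    ("0789ab", "Iaaaa"),
]
index = build_change_index(commits)
for change_id, commit_ids in sorted(index.items()):
    print(ref_name(change_id), "->", ", ".join(commit_ids))
```

A change ID that maps to a single commit could be stored as a direct
ref; one that maps to several would need the list form.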
> So when I said that I think it's quite uncommon to have multiple
> commits with the same change id, I didn't mean that as an excuse to
> not consider the other cases at all. I just mean that I think the vast
> majority of commits are not cherry-picked, so we don't need to
> optimize the user experience for that case - it's fine if it's a bit
> more complicated to refer to such commits.
Fair.
Nico
--
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 3:47 ` Martin von Zweigbergk
2025-04-04 4:03 ` Nico Williams
@ 2025-04-04 4:59 ` Elijah Newren
2025-04-04 5:21 ` Martin von Zweigbergk
1 sibling, 1 reply; 118+ messages in thread
From: Elijah Newren @ 2025-04-04 4:59 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Remo Senekowitsch, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Thu, Apr 3, 2025 at 8:47 PM Martin von Zweigbergk
<martinvonz@google.com> wrote:
>
> On Thu, 3 Apr 2025 at 19:40, Elijah Newren <newren@gmail.com> wrote:
> >
> > One possible simple solution here is just to treat change-ids (or
> > their abbreviations) kind of like abbreviated hashes -- they aren't
> > guaranteed to be unique. If the user specifies a change-id and there
> > are multiple branches with such a change-id, we provide the user an
> > error much like we do for abbreviated hashes.
> >
> > Is that what folks have in mind? If so, I'll be happy to drop my
> > reservations about this aspect.
>
> Yes, that's close to what we have in mind. I think I just didn't
> explain clearly that it's mostly harmless in at least Jujutsu if there
> are multiple commits with the same change id. If there are multiple
> visible commits with the same change id, then you'll just have to
> decide what should happen when the user tries to refer to commits by
> change id. We currently let it resolve to all the visible commits with
> the given change id.
resolve to all visible commits? So the Jujutsu equivalent of 'git
switch <change-id>' would simultaneously check out N different
branches? Or do commands which cannot accept multiple commits just
throw an error in such a case?
Doing a "git log --no-walk <change-id>" and having it resolve to several
commits would be kinda cool...
> We may change that to be an error instead [1].
> The user can always fall back to using the commit id in such cases. We
> call change ids with multiple visible commits "divergent". They
> currently show up in red in `jj log`, which I think we all agree makes
> them seem unnecessarily scary. We'll probably change that soon [2]
> [3].
>
> So when I said that I think it's quite uncommon to have multiple
> commits with the same change id, I didn't mean that as an excuse to
> not consider the other cases at all. I just mean that I think the vast
> majority of commits are not cherry-picked, so we don't need to
> optimize the user experience for that case - it's fine if it's a bit
> more complicated to refer to such commits.
>
> I hope that clarifies. Let me know if there are still unanswered
> questions that I have missed.
Yep, thanks, that answered all my previous questions...though you
raised one new one that I mentioned above.
> [1] https://github.com/jj-vcs/jj/issues/5632
> [2] https://github.com/jj-vcs/jj/pull/5800
> [3] https://github.com/jj-vcs/jj/pull/5850
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 4:59 ` Elijah Newren
@ 2025-04-04 5:21 ` Martin von Zweigbergk
2025-04-04 9:29 ` Patrick Steinhardt
0 siblings, 1 reply; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-04 5:21 UTC (permalink / raw)
To: Elijah Newren
Cc: Remo Senekowitsch, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Thu, 3 Apr 2025 at 22:00, Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, Apr 3, 2025 at 8:47 PM Martin von Zweigbergk
> <martinvonz@google.com> wrote:
> >
> > On Thu, 3 Apr 2025 at 19:40, Elijah Newren <newren@gmail.com> wrote:
> > >
> > > One possible simple solution here is just to treat change-ids (or
> > > > their abbreviations) kind of like abbreviated hashes -- they aren't
> > > guaranteed to be unique. If the user specifies a change-id and there
> > > are multiple branches with such a change-id, we provide the user an
> > > error much like we do for abbreviated hashes.
> > >
> > > Is that what folks have in mind? If so, I'll be happy to drop my
> > > reservations about this aspect.
> >
> > Yes, that's close to what we have in mind. I think I just didn't
> > explain clearly that it's mostly harmless in at least Jujutsu if there
> > are multiple commits with the same change id. If there are multiple
> > visible commits with the same change id, then you'll just have to
> > decide what should happen when the user tries to refer to commits by
> > change id. We currently let it resolve to all the visible commits with
> > the given change id.
>
> resolve to all visible commits? So the Jujutsu equivalent of 'git
> switch <change-id>' would simultaneously check out N different
> branches? Or do commands which cannot accept multiple commits just
> throw an error in such a case?
Yes, the latter.
The closest equivalent of `git switch` is `jj new` and that command
actually does support multiple commits - that's how you create a merge
commit. But we also have a little safeguard in the form of requiring
you to say `jj new all:xyz` if you really want that. For commands
where it's more expected and harmless to take multiple commits as
input, we don't require the `all:` prefix.
> Doing a "git log --no-walk <change-id>" and having it resolve to several
> commits would be kinda cool...
Yes, this is one of those commands where it's harmless so `jj log -r
xyz` simply shows all (visible) commits with the given change id.
Other examples include `jj abandon`, `jj describe`, `jj diff -r`, `jj
rebase -r`, `jj squash --from`, while e.g. `jj rebase -d` (the
destination) requires the `all:` prefix to allow multiple parents
(making the roots of the rebased set into merge commits).
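The rule described above can be sketched roughly as follows. This is
Python written for illustration only, not how jj implements it:
`resolve_for_command`, its `harmless_with_many` flag, and the sample
index are all assumptions made for the example.

```python
def resolve_for_command(revision, index, harmless_with_many):
    """Resolve a change ID to commits, guarding the multi-commit case.

    `index` maps change IDs to lists of visible commit IDs.
    `harmless_with_many` is True for commands where several commits are
    expected input (the `jj log -r` kind), False for commands that need
    an explicit `all:` prefix before accepting more than one.
    """
    explicit_all = revision.startswith("all:")
    change_id = revision[len("all:"):] if explicit_all else revision

    commits = index.get(change_id, [])
    if not commits:
        raise LookupError(f"no visible commit with change ID {change_id}")
    if len(commits) > 1 and not (harmless_with_many or explicit_all):
        raise ValueError(
            f"change ID {change_id} resolved to {len(commits)} commits; "
            f"use all:{change_id} if that is intended"
        )
    return commits


# Made-up index: "xyz" is divergent, "abc" is not.
index = {"xyz": ["c1", "c2"], "abc": ["c3"]}
print(resolve_for_command("xyz", index, harmless_with_many=True))
print(resolve_for_command("all:xyz", index, harmless_with_many=False))
```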
> Yep, thanks, that answered all my previous questions...though you
> raised one new one that I mentioned above.
Yay!
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 5:21 ` Martin von Zweigbergk
@ 2025-04-04 9:29 ` Patrick Steinhardt
0 siblings, 0 replies; 118+ messages in thread
From: Patrick Steinhardt @ 2025-04-04 9:29 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Elijah Newren, Remo Senekowitsch, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 10:21:57PM -0700, Martin von Zweigbergk wrote:
> On Thu, 3 Apr 2025 at 22:00, Elijah Newren <newren@gmail.com> wrote:
> >
> > On Thu, Apr 3, 2025 at 8:47 PM Martin von Zweigbergk
> > <martinvonz@google.com> wrote:
> > >
> > > On Thu, 3 Apr 2025 at 19:40, Elijah Newren <newren@gmail.com> wrote:
> > > >
> > > > One possible simple solution here is just to treat change-ids (or
> > > > > their abbreviations) kind of like abbreviated hashes -- they aren't
> > > > guaranteed to be unique. If the user specifies a change-id and there
> > > > are multiple branches with such a change-id, we provide the user an
> > > > error much like we do for abbreviated hashes.
> > > >
> > > > Is that what folks have in mind? If so, I'll be happy to drop my
> > > > reservations about this aspect.
> > >
> > > Yes, that's close to what we have in mind. I think I just didn't
> > > explain clearly that it's mostly harmless in at least Jujutsu if there
> > > are multiple commits with the same change id. If there are multiple
> > > visible commits with the same change id, then you'll just have to
> > > decide what should happen when the user tries to refer to commits by
> > > change id. We currently let it resolve to all the visible commits with
> > > the given change id.
> >
> > resolve to all visible commits? So the Jujutsu equivalent of 'git
> > switch <change-id>' would simultaneously check out N different
> > branches? Or do commands which cannot accept multiple commits just
> > throw an error in such a case?
>
> Yes, the latter.
That sounds like sensible behaviour to me. We already know to print
ambiguous hashes, so we could probably do the same with change IDs:
    $ git rev-parse 1234
    error: short object ID 1234 is ambiguous
    hint: The candidates are:
    hint:   1234d8d9179 commit 2018-06-08 - Merge 'add-p-many-files'
    hint:   1234e8297f3 commit 2020-10-26 - Merge branch 'en/sequencer-rollback-lock-cleanup' into next
    hint:   123456ff3f0 tree
    hint:   12349764e65 tree
    hint:   1234b76f424 tree
    hint:   1234c687269 tree
    hint:   1234ebb5d8f blob
    fatal: ambiguous argument '1234': unknown revision or path not in the working tree.
    Use '--' to separate paths from revisions, like this:
    'git <command> [<revision>...] -- [<file>...]'
    1234
It should be rather easy to adapt this mechanism to also handle the case
where the same change IDs (or abbreviated change IDs) exist across
multiple commits.
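As a rough sketch of what that adaptation might look like, here is some
Python that resolves a change ID prefix and reports the candidates when
it is ambiguous. It is not Git code; `resolve_change_id`,
`AmbiguousChangeId`, and the sample commits are invented for the
example.

```python
class AmbiguousChangeId(Exception):
    """Raised when a change ID prefix matches more than one commit."""

    def __init__(self, prefix, candidates):
        super().__init__(f"change ID {prefix} is ambiguous")
        self.prefix = prefix
        self.candidates = candidates


def resolve_change_id(prefix, commits):
    """Return the one commit ID whose change ID starts with `prefix`.

    `commits` is a list of (commit_id, change_id, subject) tuples.
    """
    matches = [c for c in commits if c[1].startswith(prefix)]
    if not matches:
        raise KeyError(prefix)
    if len(matches) > 1:
        raise AmbiguousChangeId(prefix, matches)
    return matches[0][0]


# Made-up commits; the last two share a change ID, like a backport.
commits = [
    ("1111aaa", "kxyz", "add feature"),
    ("2222bbb", "kqrs", "fix bug"),
    ("3333ccc", "kqrs", "fix bug (backport)"),
]

try:
    resolve_change_id("kq", commits)
except AmbiguousChangeId as err:
    print(f"error: {err}")
    print("hint: The candidates are:")
    for commit_id, change_id, subject in err.candidates:
        print(f"hint:   {commit_id} {change_id} - {subject}")
```

The same check covers both full and abbreviated change IDs, since a
full ID is just the longest possible prefix.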
Patrick
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 15:39 ` Elijah Newren
2025-04-03 16:40 ` Remo Senekowitsch
@ 2025-04-03 17:48 ` Theodore Ts'o
2025-04-03 20:31 ` Remo Senekowitsch
2025-04-03 18:10 ` Nico Williams
2 siblings, 1 reply; 118+ messages in thread
From: Theodore Ts'o @ 2025-04-03 17:48 UTC (permalink / raw)
To: Elijah Newren
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, remo, philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 08:39:31AM -0700, Elijah Newren wrote:
>
> But <change ID> isn't unique, right? The whole point of having the
> change ID is to preserve it despite edits (e.g. rebase, commit
> --amend, cherry-pick), meaning that you end up with multiple commits
> with the same <change ID>.
It's supposed to be unique, but it isn't always. I've certainly seen
cases where it might not be, but that's arguably a bug. I suspect in
some cases it's because users are cutting and pasting commit
descriptions, and sometimes when they rebase a patch series, patches
will get collapsed or split apart --- especially when backporting to
an older LTS release.
Perhaps because of this, in some communities, their tooling in front
of Gerrit will always regenerate the Change-ID when doing a
cherry-pick (for example, when cherry-picking from the development
HEAD branch back to a release branch).
So as a cautionary note, as people use Change IDs in Gerrit today,
sometimes the Change ID changes between rebases, and I've certainly
seen cases where the semantic meaning of the commit has changed
significantly without changing the Change ID. So it's great as a
hint, but in practice, at least today, it might not be completely safe
to assume the semantics are as advertised....
- Ted
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 17:48 ` Theodore Ts'o
@ 2025-04-03 20:31 ` Remo Senekowitsch
2025-04-05 2:09 ` Theodore Ts'o
0 siblings, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-03 20:31 UTC (permalink / raw)
To: Theodore Ts'o, Elijah Newren
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Thu Apr 3, 2025 at 7:48 PM CEST, Theodore Ts'o wrote:
> On Thu, Apr 03, 2025 at 08:39:31AM -0700, Elijah Newren wrote:
>>
>> But <change ID> isn't unique, right? The whole point of having the
>> change ID is to preserve it despite edits (e.g. rebase, commit
>> --amend, cherry-pick), meaning that you end up with multiple commits
>> with the same <change ID>.
>
> It's supposed to be unique, but it isn't always. I've certainly seen
> cases where it might not be, but that's arguably a bug. I suspect in
> some cases it's because users are cutting and pasting commit
> descriptions, and sometimes when they rebase a patch series, patches
> will get collapsed or split apart --- especially when backporting to
> an older LTS release.
>
> Perhaps because of this, in some communities, their tooling in front
> of Gerrit will always regenerate the Change-ID when doing a
> cherry-pick (for example, when cherry-picking from the development
> HEAD branch back to a release branch).
>
> So as a cautionary note, as people use Change IDs in Gerrit today,
> sometimes the Change ID changes between rebases, and I've certainly
> seen cases where the semantic meaning of the commit has changed
> significantly without changing the Change ID. So it's great as a
> hint, but in practice, at least today, it might not be completely safe
> to assume the semantics are as advertised....
That is all true. I would just say that some change-ids not being unique
doesn't systematically take away from the benefits. If a given change-id
is unique, you get all the benefits for that patch, independent of
the uniqueness of the other change-ids. If some patch changes a lot
semantically while keeping its change-id, that will degrade its review
history, but without affecting the review history of any other patch.
Remo
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 20:31 ` Remo Senekowitsch
@ 2025-04-05 2:09 ` Theodore Ts'o
0 siblings, 0 replies; 118+ messages in thread
From: Theodore Ts'o @ 2025-04-05 2:09 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Elijah Newren, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 10:31:08PM +0200, Remo Senekowitsch wrote:
> > So as a cautionary note, as people use Change IDs in Gerrit today,
> > sometimes the Change ID changes between rebases, and I've certainly
> > seen cases where the semantic meaning of the commit has changed
> > significantly without changing the Change ID. So it's great as a
> > hint, but in practice, at least today, it might not be completely safe
> > to assume the semantics are as advertised....
>
> That is all true. I would just say that some change-ids not being unique
> doesn't systematically take away from the benefits. If a given change-id
> is unique, you get all the benefits for that patch, independent of
> the uniqueness of the other change-ids. If some patch changes a lot
> semantically while keeping its change-id, that will degrade its review
> history, but without affecting the review history of any other patch.
Oh, agreed. I've certainly used a Change-ID by pasting it into the
Gerrit Search Box, and when I saw two or three unrelated commits where
the one-line subjects were for entirely different kernel subsystems, as
a human I could do the disambiguation myself. And if I didn't find a
particular bug fix for the various release branches, I would cut and
paste the one-line commit summary into the Gerrit Search Box, and find
the right commit, again aided by human intelligence.
So I'm super-glad that it's there, and it *is* useful. But I just
wanted to inject a note of caution if people were thinking about using
it in some kind of automated tooling --- especially if there is a
proposal to inject this into git as a first-class object.
Certainly from a git perspective, I will admit that more often than
not, I generally don't do something like "git log --grep
I23e218cd964f16c0b2b26127d4a5ca6529867673", but rather, "git log
--grep 'mediatek: Don't modify spi_transfer'". If I have to cut and
paste a Change-ID, most of the time I also have access to the one-line
commit description, and grepping for something human readable is just
easier.
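For anyone who does want to script against the trailer, a minimal
sketch in Python might look like the following. It assumes the usual
Gerrit form of "Change-Id: I" followed by 40 hex digits; the commit
message is made up for the example, and `change_ids` is a hypothetical
helper rather than part of any existing tool.

```python
import re

# Assumed trailer format: "Change-Id: I" followed by 40 hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id: (I[0-9a-f]{40})$", re.MULTILINE)


def change_ids(message):
    """Return every Change-Id value found on its own line in `message`."""
    return CHANGE_ID_RE.findall(message)


# Made-up commit message; only the Change-Id value format matters here.
message = (
    "subsystem: example subject line\n"
    "\n"
    "Body text elided for the example.\n"
    "\n"
    "Change-Id: I23e218cd964f16c0b2b26127d4a5ca6529867673\n"
)
print(change_ids(message))
```

Given the caveats above, a script like this should treat a match as a
hint to show a human rather than as proof that two commits are the same
change.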
- Ted
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 15:39 ` Elijah Newren
2025-04-03 16:40 ` Remo Senekowitsch
2025-04-03 17:48 ` Theodore Ts'o
@ 2025-04-03 18:10 ` Nico Williams
2025-04-03 21:45 ` Remo Senekowitsch
2025-04-03 22:05 ` Martin von Zweigbergk
2 siblings, 2 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-03 18:10 UTC (permalink / raw)
To: Elijah Newren
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, remo, philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 08:39:31AM -0700, Elijah Newren wrote:
> On Wed, Apr 2, 2025 at 11:48 AM Martin von Zweigbergk
> > There are many benefits to having a change id even if it's just
> > local. I mentioned some in my email to this mailing list in [1].
> > For example, it enables
> > `git rebase main <change ID>; git switch <change ID>` without
> > requiring the user to look up the hash of the rewritten commit.
>
> But <change ID> isn't unique, right? The whole point of having the
> change ID is to preserve it despite edits (e.g. rebase, commit
> --amend, cherry-pick), meaning that you end up with multiple commits
> with the same <change ID>.
>
> Why would this work?
I agree that `git rebase main <change ID>; git switch <change ID>` is
not a good UI, and I wouldn't want it even though I want change IDs.
Change IDs are documentary, and they help people understand that various
commits are just edits/rewrites of the same original commit.
> And if it does work, isn't it expensive since you'd need to walk
> history to find it? Or do you keep an extra lookup table on the side
> somewhere?
Worse: since there can be many commits with the same change ID they
can't be used as refs because Git can't possibly be expected to find
_the one_ you really intend -- how could it? I suppose Git could let
you pick from a list, but that's not likely going to have enough
context. Maybe Git could give you a list of named branches in which it
found some change ID's commits to pick one branch from, or maybe one
could `git cherry-pick --from $some_branch $cid` and have Git find the
commit(s) on `$some_branch` that match change ID `$cid`.
Also, maybe commits need to support multiple change IDs. For example,
one might want one change ID for a set of commits (eg implementing a
feature), and each commit in the set having a second and more unique
change ID for just that commit (and all edits/rewrites of it).
> Yaay, I always hated it as a trailer.
More headers is better.
> cherry-pick & rebase preserve author name, email & time, while
> creating a new committer name, email, & time. To me, the change-id is
> about the authorship, and since these commands already preserve
> authorship, it'd seem weird to me to have cherry-pick not preserve the
> change-id by default.
+1. Besides, rebase is just a bunch of cherry-picks, so cherry-pick
is the fundamental operation here. If rebase is to preserve change-id
then so is cherry-pick.
Nico
--
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 18:10 ` Nico Williams
@ 2025-04-03 21:45 ` Remo Senekowitsch
[not found] ` <Z+8GoNrdaJlmNpGm@ubby>
2025-04-03 22:05 ` Martin von Zweigbergk
1 sibling, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-03 21:45 UTC (permalink / raw)
To: Nico Williams, Elijah Newren
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Thu Apr 3, 2025 at 8:10 PM CEST, Nico Williams wrote:
> On Thu, Apr 03, 2025 at 08:39:31AM -0700, Elijah Newren wrote:
> Also, maybe commits need to support multiple change IDs. For example,
> one might want one change ID for a set of commits (eg implementing a
> feature), and each commit in the set having a second and more unique
> change ID for just that commit (and all edits/rewrites of it).
Semantically grouping a set of commits is an interesting idea, but not
really related to the proposed change-id? Mercurial has "topics", which
I don't know too much about. Even if such a "topic-identifier" were
to be stored in the commit header, it probably shouldn't be called the
same thing as the change-id header. Otherwise it becomes impossible to
determine if a change-id header is supposed to identify an individual
patch across its versions or associate it semantically with others.
>> cherry-pick & rebase preserve author name, email & time, while
>> creating a new committer name, email, & time. To me, the change-id is
>> about the authorship, and since these commands already preserve
>> authorship, it'd seem weird to me to have cherry-pick not preserve the
>> change-id by default.
>
> +1. Besides, rebase is just a bunch of cherry-picks, so cherry-pick
> is the fundamental operation here. If rebase is to preserve change-id
> then so is cherry-pick.
Well, if rebase is just a bunch of cherry-picks, that sounds like an
implementation detail. The ways rebase and cherry-pick are most often
used are semantically very different from each other. (interactive)
rebase is often used to amend commits that already have descendants. In
that case, it makes sense for the change-id to be preserved. cherry-pick
on the other hand is often used to create a duplicate of a patch at a
different location in the commit tree, e.g. for backporting purposes.
The equivalent of cherry-pick in Jujutsu is called "duplicate" and
doesn't preserve the change-id for that reason. So if cherry-pick
retains the change-id, that would lead to more duplicate change-ids in
the tree, degrading their usefulness in those cases for little benefit.
But again, if Git decides to go that route it's still better for the
other projects than not supporting the header at all.
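The distinction I have in mind can be sketched like this. It is Python
for illustration only, under the assumption that a rewrite keeps the
change-id and a duplicate gets a fresh one; the `Commit` type and the
`rewrite` and `duplicate` helpers are made up and do not correspond to
real Git or Jujutsu code.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Commit:
    commit_id: str
    change_id: str
    parent: str
    message: str


def new_id():
    # Stand-in for a real hash or random change-id generator.
    return uuid.uuid4().hex


def rewrite(commit, **changes):
    """Amend or rebase: the commit id changes, the change-id is kept."""
    return replace(commit, commit_id=new_id(), **changes)


def duplicate(commit, new_parent):
    """Copy a patch elsewhere: new commit id and a new change-id."""
    return replace(
        commit, commit_id=new_id(), change_id=new_id(), parent=new_parent
    )


original = Commit(new_id(), new_id(), parent="main", message="fix bug")
rebased = rewrite(original, parent="main~1")
backport = duplicate(original, new_parent="release-1.0")

print(rebased.change_id == original.change_id)
print(backport.change_id == original.change_id)
```

If cherry-pick instead called something like `rewrite`, the backport
would share the original's change-id, which is the duplicate case
discussed above.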
Remo
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 18:10 ` Nico Williams
2025-04-03 21:45 ` Remo Senekowitsch
@ 2025-04-03 22:05 ` Martin von Zweigbergk
2025-04-03 22:13 ` Nico Williams
1 sibling, 1 reply; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-03 22:05 UTC (permalink / raw)
To: Nico Williams
Cc: Elijah Newren, Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
I think I may have answered some of your questions here in my other
reply to Elijah, so consider reading that too.
On Thu, 3 Apr 2025 at 11:10, Nico Williams <nico@cryptonector.com> wrote:
>
> I agree that `git rebase main <change ID>; git switch <change ID>` is
> not a good UI, and I wouldn't want it even though I want change IDs.
Why do you think it's not a good UI? Is it because the change ID isn't
meaningful? That's correct, but they are also very convenient. The
unique prefix is usually two letters or so, depending on how many
"local" commits you have in your repo. That makes them easy to type. I
basically never refer to a commit by a branch name anymore.
> > And if it does work, isn't it expensive since you'd need to walk
> > history to find it? Or do you keep an extra lookup table on the side
> > somewhere?
>
> Worse: since there can be many commits with the same change ID they
> can't be used as refs because Git can't possibly be expected to find
> _the one_ you really intend -- how could it? I suppose Git could let
> you pick from a list, but that's not likely going to have enough
> context. Maybe Git could give you a list of named branches in which it
> found some change ID's commits to pick one branch from, or maybe one
> could `git cherry-pick --from $some_branch $cid` and have Git find the
> commit(s) on `$some_branch` that match change ID `$cid`.
See my reply to Elijah. There's usually just one visible commit with a
given change id a repo.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 22:05 ` Martin von Zweigbergk
@ 2025-04-03 22:13 ` Nico Williams
2025-04-03 22:47 ` Martin von Zweigbergk
0 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-03 22:13 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Elijah Newren, Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 03:05:41PM -0700, Martin von Zweigbergk wrote:
> On Thu, 3 Apr 2025 at 11:10, Nico Williams <nico@cryptonector.com> wrote:
> >
> > I agree that `git rebase main <change ID>; git switch <change ID>` is
> > not a good UI, and I wouldn't want it even though I want change IDs.
>
> Why do you think it's not a good UI? Is it because the change ID isn't
> meaningful? That's correct, but they are also very convenient. The
> unique prefix is usually two letters or so, depending on how many
> "local" commits you have in your repo. That makes them easy to type. I
> basically never refer to a commit by a branch name anymore.
What would `git rebase main <change ID>` do? I assumed that it would
find a commit `<change ID>` in the current branch and rebase it onto the
main branch. That seems workable, I suppose. It would be akin to
`git cherry-pick --from $branch --change-id $change_id`.
What would `git switch <change ID>` do? `git switch` switches between
branches, but a change ID can't possibly identify a branch since many
commits could exist with the same change ID all in different branches.
> > > And if it does work, isn't it expensive since you'd need to walk
> > > history to find it? Or do you keep an extra lookup table on the side
> > > somewhere?
> >
> > Worse: since there can be many commits with the same change ID they
> > can't be used as refs because Git can't possibly be expected to find
> > _the one_ you really intend -- how could it? I suppose Git could let
> > you pick from a list, but that's not likely going to have enough
> > context. Maybe Git could give you a list of named branches in which it
> > found some change ID's commits to pick one branch from, or maybe one
> > could `git cherry-pick --from $some_branch $cid` and have Git find the
> > commit(s) on `$some_branch` that match change ID `$cid`.
>
> See my reply to Elijah. There's usually just one visible commit with a
> given change id a repo.
s/a repo/in a repo/ ?
s/a repo/in a branch/ ?
I'd expect many commits with the same change ID in a _repo_, but at most
one in a _branch_. Except see my other comments about having a second
kind of change ID to identify a set of related commits (e.g., if you
have a rule that features, bug fixes, and tests all must go in separate
commits, which some do, so you might need to have a main commit for some
feature, several commits with fixes to earlier bugs required for that
feature, and several test commits, all related to that feature).
Nico
--
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 22:13 ` Nico Williams
@ 2025-04-03 22:47 ` Martin von Zweigbergk
2025-04-04 2:06 ` Elijah Newren
2025-04-04 3:11 ` Nico Williams
0 siblings, 2 replies; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-03 22:47 UTC (permalink / raw)
To: Nico Williams
Cc: Elijah Newren, Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Thu, 3 Apr 2025 at 15:13, Nico Williams <nico@cryptonector.com> wrote:
>
> On Thu, Apr 03, 2025 at 03:05:41PM -0700, Martin von Zweigbergk wrote:
> > On Thu, 3 Apr 2025 at 11:10, Nico Williams <nico@cryptonector.com> wrote:
> > >
> > > I agree that `git rebase main <change ID>; git switch <change ID>` is
> > > not a good UI, and I wouldn't want it even though I want change IDs.
> >
> > Why do you think it's not a good UI? Is it because the change ID isn't
> > meaningful? That's correct, but they are also very convenient. The
> > unique prefix is usually two letters or so, depending on how many
> > "local" commits you have in your repo. That makes them easy to type. I
> > basically never refer to a commit by a branch name anymore.
>
> What would `git rebase main <change ID>` do? I assumed that it would
> find a commit `<change ID>` in the current branch and rebase it onto the
> main branch. That seems workable, I suppose. It would be akin to
> `git cherry-pick --from $branch --change-id $change_id`.
I think part of the problem is that I didn't consider that Git doesn't
really like to work in detached HEAD mode and doesn't automatically
update refs pointing to rewritten commits. This does take away a lot
of the usefulness, unfortunately. It would still be a bit useful as an
argument to readonly commands.
> What would `git switch <change ID>` do? `git switch` switches between
> branches, but a change ID can't possibly identify a branch since many
> commits could exist with the same change ID all in different branches.
Yes, the same change id *can* exist on many branches, but it's pretty
uncommon. It might happen after cherry-picking, depending on what we
decide there, but it should very rarely happen in other cases. When
you rewrite a commit, the old commit usually becomes unreachable, so
if your change id resolved to one commit before the rewrite, then it
will resolve to one commit after the rewrite. I know Git often leaves
descendant branches until you manually rebase them, but at least
that's probably typically a pretty short-lived state. I had a script
for this back when I used Git.
> > > > And if it does work, isn't it expensive since you'd need to walk
> > > > history to find it? Or do you keep an extra lookup table on the side
> > > > somewhere?
> > >
> > > Worse: since there can be many commits with the same change ID they
> > > can't be used as refs because Git can't possibly be expected to find
> > > _the one_ you really intend -- how could it? I suppose Git could let
> > > you pick from a list, but that's not likely going to have enough
> > > context. Maybe Git could give you a list of named branches in which it
> > > found some change ID's commits to pick one branch from, or maybe one
> > > could `git cherry-pick --from $some_branch $cid` and have Git find the
> > > commit(s) on `$some_branch` that match change ID `$cid`.
> >
> > See my reply to Elijah. There's usually just one visible commit with a
> > given change id a repo.
>
> s/a repo/in a repo/ ?
> s/a repo/in a branch/ ?
The former.
> I'd expect many commits with the same change ID in a _repo_, but at most
> one in a _branch_.
There may be multiple commits with the same change ID in a repo if you
had cherry-picked the commit, depending on what we decide to do with
the change ID on cherry-pick. But cherry-picks are not very common
anyway. Or maybe it's common in some workflow? Oh, are you thinking of
a scenario where you cherry-pick your own commit to see if an
alternative approach is better? Sure, if cherry-pick preserves the
change ID, then you would have multiple commits with the same change
ID in that case.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 22:47 ` Martin von Zweigbergk
@ 2025-04-04 2:06 ` Elijah Newren
2025-04-04 3:11 ` Nico Williams
1 sibling, 0 replies; 118+ messages in thread
From: Elijah Newren @ 2025-04-04 2:06 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Nico Williams, Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Thu, Apr 3, 2025 at 3:47 PM Martin von Zweigbergk
<martinvonz@google.com> wrote:
>
[...]
> > What would `git switch <change ID>` do? `git switch` switches between
> > branches, but a change ID can't possibly identify a branch since many
> > commits could exist with the same change ID all in different branches.
>
> Yes, the same change id *can* exist on many branches, but it's pretty
> uncommon.
In my experience with Gerrit-based projects, it was actually pretty common...
> > I'd expect many commits with the same change ID in a _repo_, but at most
> > one in a _branch_.
>
> There may be multiple commits with the same change ID in a repo if you
> had cherry-picked the commit, depending on what we decide to do with
> the change ID on cherry-pick. But cherry-picks are not very common
> anyway. Or maybe it's common in some workflow?
It's quite common for projects with multiple supported LTS releases.
Important fixes are backported to each LTS branch, and people
typically check that the fix has been backported by looking for the
same change id on other branches.
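The backport check described above can be sketched in a few lines of Python. This is only an illustration, not how Gerrit or Git implements it: the branch names, commit messages, and shortened change IDs are made up, and a real check would read the messages from `git log` on each branch.

```python
import re

# Hypothetical toy data: branch name -> commit messages on that branch.
BRANCHES = {
    "main": ["Fix overflow in parser\n\nChange-Id: I1234abcd\n"],
    "lts-1.0": ["Fix overflow in parser (backport)\n\nChange-Id: I1234abcd\n"],
    "lts-2.0": ["Unrelated cleanup\n\nChange-Id: I9999ffff\n"],
}

# Matches a "Change-Id: <value>" trailer line anywhere in a message.
TRAILER = re.compile(r"^Change-Id:\s*(\S+)\s*$", re.MULTILINE)


def change_ids(message):
    """Return every Change-Id trailer value found in a commit message."""
    return TRAILER.findall(message)


def branches_with_change(branches, change_id):
    """Return the sorted names of branches carrying a commit with change_id."""
    return sorted(
        name
        for name, messages in branches.items()
        if any(change_id in change_ids(m) for m in messages)
    )
```

Calling `branches_with_change(BRANCHES, "I1234abcd")` on this toy data reports both `main` and `lts-1.0`, which is exactly the situation where one change ID names more than one commit.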
In such projects, 'git switch <change-id>' feels nonsensical to me;
which particular copy of that change-id from which branch would that
switch you to?
Or were the dozens of projects I was using that were hosted in Gerrit
abusing change-ids in your view? If so, how would you get them to
stop?
If you don't have multiple LTS releases to support, or avoid
backporting fixes, then I could see how you'd arrive at the conclusion
that duplicate change-ids were rare, and then you could start using
change-ids interchangeably with commit identifiers in commands so long
as you only tried it on such projects. But I'm a bit uncomfortable
suggesting we just assume all projects fit that mold. Am I still
missing something here?
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-03 22:47 ` Martin von Zweigbergk
2025-04-04 2:06 ` Elijah Newren
@ 2025-04-04 3:11 ` Nico Williams
2025-04-04 4:08 ` Martin von Zweigbergk
1 sibling, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-04 3:11 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Elijah Newren, Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 03:47:30PM -0700, Martin von Zweigbergk wrote:
> I think part of the problem is that I didn't consider that Git doesn't
> really like to work in detached HEAD mode and doesn't automatically
> update refs pointing to rewritten commits. This does take away a lot
> of the usefulness, unfortunately. It would still be a bit useful as an
> argument to readonly commands.
I work in detached HEAD mode almost all the time. And yeah, Git won't
update refs when in detached HEAD mode because... what ref should it
update? The whole point of detached HEAD mode is that it does not.
> > What would `git switch <change ID>` do? `git switch` switches between
> > branches, but a change ID can't possibly identify a branch since many
> > commits could exist with the same change ID all in different branches.
>
> Yes, the same change id *can* exist on many branches, but it's pretty
> uncommon. It might happen after cherry-picking, depending on what we
The whole point of change IDs for me is that if I need to backport bug
fixes [0] then I can identify the bug fixes by change ID and then
cherry-pick them onto support branches, which means that yes, there will
be many commits with the same change ID, each on different branches.
Besides backports another use case that leaves multiple commits with the
same change ID is when I'm working on multiple different approaches to
implementing some feature on a complex codebase. I might have two or
three branches exploring different ways to implement some feature, and
of course I would want to use the same change IDs for similar commits
even if I didn't use cherry-pick to create all but the first.
Here's a third use-case that's quite real for me at $WORK where I have
repos for Debian-style packaging and use different branches for
different OS releases. Some such pkgs are ones that we patch locally,
so each release-specific branch targets a different OS release, but they
all mostly have the same contents, in which case this is similar to
backports. Other pkgs are locally developed and rarely differ on
different OS releases but occasionally they have to for <reasons> --
this is not similar to backports.
[0] Granted, the industry has moved away from backports. But many open
source projects still backport security fixes. E.g., OpenSSL,
PostgreSQL, etc. So this is relevant.
> decide there, but it should very rarely happen in other cases. When
> you rewrite a commit, the old commit usually becomes unreachable, so
> if your change id resolved to one commit before the rewrite, then it
> will resolve to one commit after the rewrite. [...]
No matter what you will not have a guarantee of one commit for any given
change ID. Therefore `git switch $change_id` is not workable, not
without interaction, and even then still not workable because Git might
have to search _many_ branches to find commits matching the given change
ID. (Fossil could have an index on change ID and trivially make that
search possible, but for Git adding an index is more complicated.)
> > I'd expect many commits with the same change ID in a _repo_, but at most
> > one in a _branch_.
>
> There may be multiple commits with the same change ID in a repo if you
> had cherry-picked the commit, depending on what we decide to do with
> the change ID on cherry-pick. But cherry-picks are not very common
> anyway. Or maybe it's common in some workflow? Oh, are you thinking of
> a scenario where you cherry-pick your own commit to see if an
> alternative approach is better? Sure, if cherry-pick preserves the
> change ID, then you would have multiple commits with the same change
> ID in that case.
Even if cherry-picking were not common it's supported, therefore you
can't have change IDs be unique like this. That doesn't make change IDs
useless -- on the contrary, their utility comes from the fact that they
are not repo-wide unique!
Nico
--
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 3:11 ` Nico Williams
@ 2025-04-04 4:08 ` Martin von Zweigbergk
2025-04-04 4:23 ` Nico Williams
0 siblings, 1 reply; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-04 4:08 UTC (permalink / raw)
To: Nico Williams
Cc: Elijah Newren, Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Thu, 3 Apr 2025 at 20:11, Nico Williams <nico@cryptonector.com> wrote:
>
> On Thu, Apr 03, 2025 at 03:47:30PM -0700, Martin von Zweigbergk wrote:
> > I think part of the problem is that I didn't consider that Git doesn't
> > really like to work in detached HEAD mode and doesn't automatically
> > update refs pointing to rewritten commits. This does take away a lot
> > of the usefulness, unfortunately. It would still be a bit useful as an
> > argument to readonly commands.
>
> I work in detached HEAD mode almost all the time. And yeah, Git won't
> update refs when in detached HEAD mode because... what ref should it
> update? The whole point of detached HEAD mode is that it does not.
Jujutsu (and Mercurial) keep track of the set of visible heads. There
can be branches but they are not necessary. When you rewrite a commit,
Jujutsu always rewrites all descendants. It also updates all branches
pointing to those commits automatically. For example, if you update
the description of some commit with `jj describe -m 'new description'
--revision xyz`, then commit xyz and all its descendants will be
updated, and any branches pointing to any of the rewritten commits
will be updated.
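As a rough illustration of that behaviour, here is a toy model in Python. It is not Jujutsu's implementation: the commit store, the `commit_id` hash, and the `rewrite` helper are all hypothetical, and the sketch assumes commits are listed with parents before children.

```python
import hashlib


def commit_id(description, parents):
    """Derive a content-addressed ID from the description and parent IDs."""
    data = description + "\0" + ",".join(parents)
    return hashlib.sha1(data.encode()).hexdigest()[:8]


def rewrite(commits, branches, target, new_description):
    """Rewrite `target` and every descendant, and move branches along.

    commits:  dict of commit ID -> (description, list of parent IDs),
              inserted in topological order (parents before children)
    branches: dict of branch name -> commit ID
    Returns (new_commits, new_branches, mapping of old ID -> new ID).
    """
    mapping = {}
    new_commits = {}
    for old_id, (description, parents) in commits.items():
        # Point at the rewritten version of each parent, if there is one.
        new_parents = [mapping.get(p, p) for p in parents]
        if old_id == target:
            description = new_description
        if old_id == target or new_parents != parents:
            # Content or ancestry changed, so the commit gets a new ID.
            new_id = commit_id(description, new_parents)
            mapping[old_id] = new_id
        else:
            new_id = old_id
        new_commits[new_id] = (description, new_parents)
    # Any branch that pointed at a rewritten commit follows it.
    new_branches = {name: mapping.get(cid, cid) for name, cid in branches.items()}
    return new_commits, new_branches, mapping
```

In this model, ancestors of the rewritten commit keep their IDs, while the commit itself and everything built on top of it get new IDs, and branches pointing at any of those follow automatically.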
I think this is getting off topic. I can provide more detail to you in
a private message if necessary, or maybe we should start a new thread.
I don't know what the convention on this list is. Or join the Discord
channel [1], or the #jujutsu channel on Libera.Chat.
> > > What would `git switch <change ID>` do? `git switch` switches between
> > > branches, but a change ID can't possibly identify a branch since many
> > > commits could exist with the same change ID all in different branches.
> >
> > Yes, the same change id *can* exist on many branches, but it's pretty
> > uncommon. It might happen after cherry-picking, depending on what we
>
> The whole point of change IDs for me is that if I need to backport bug
> fixes [0] then I can identify the bug fixes by change ID and then
> cherry-pick them onto support branches, which means that yes, there will
> be many commits with the same change ID, each on different branches.
I agree that that can be one reason to use change IDs but I disagree
that it's the only reason. Having a stable way to refer to an evolving
commit is also important.
> Besides backports another use case that leaves multiple commits with the
> same change ID is when I'm working on multiple different approaches to
> implementing some feature on a complex codebase. I might have two or
> three branches exploring different ways to implement some feature, and
> of course I would want to use the same change IDs for similar commits
> even if I didn't use cherry-pick to create all but the first.
This sounds more like the idea of "topics". It's an interesting
discussion, but it seems off topic for this thread.
> and even then still not workable because Git might
> have to search _many_ branches to find commits matching the given change
> ID. (Fossil could have an index on change ID and trivially make that
> search possible, but for Git adding an index is more complicated.)
Yes, I understand that it would be significant work to add support in
Git. I hope that Git can gain the feature eventually, but we have no
expectation that it will be implemented soon, especially not the UX
part (the preservation-on-rewrite part should be simpler, I think).
[1] https://discord.gg/dkmfj3aGQN
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 4:08 ` Martin von Zweigbergk
@ 2025-04-04 4:23 ` Nico Williams
2025-04-04 9:34 ` Patrick Steinhardt
0 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-04 4:23 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Elijah Newren, Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 09:08:59PM -0700, Martin von Zweigbergk wrote:
> Jujutsu (and Mercurial) keep track of the set of visible heads. There
> can be branches but they are not necessary. When you rewrite a commit,
> Jujutsu always rewrites all descendants. It also updates all branches
> pointing to those commits automatically. For example, if you update
> the description of some commit with `jj describe -m 'new description'
> --revision xyz`, then commit xyz and all its descendants will be
> updated, and any branches pointing to any of the rewritten commits
> will be updated.
I think I would find that too patronising, probably unbearable.
> > and even then still not workable because Git might
> > have to search _many_ branches to find commits matching the given change
> > ID. (Fossil could have an index on change ID and trivially make that
> > search possible, but for Git adding an index is more complicated.)
>
> Yes, I understand that it would be significant work to add support in
> Git. I hope that Git can gain the feature eventually, but we have no
> expectation that it will be implemented soon, especially not the UX
> part (the preservation-on-rewrite part should be simpler, I think).
Ah, well, Git does have an index: refs. You could use
refs/change-IDs/<change-ID> to index by change ID.
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 4:23 ` Nico Williams
@ 2025-04-04 9:34 ` Patrick Steinhardt
2025-04-04 16:04 ` Nico Williams
0 siblings, 1 reply; 118+ messages in thread
From: Patrick Steinhardt @ 2025-04-04 9:34 UTC (permalink / raw)
To: Nico Williams
Cc: Martin von Zweigbergk, Elijah Newren, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Thu, Apr 03, 2025 at 11:23:43PM -0500, Nico Williams wrote:
> On Thu, Apr 03, 2025 at 09:08:59PM -0700, Martin von Zweigbergk wrote:
> > > and even then still not workable because Git might
> > > have to search _many_ branches to find commits matching the given change
> > > ID. (Fossil could have an index on change ID and trivially make that
> > > search possible, but for Git adding an index is more complicated.)
> >
> > Yes, I understand that it would be significant work to add support in
> > Git. I hope that Git can gain the feature eventually, but we have no
> > expectation that it will be implemented soon, especially not the UX
> > part (the preservation-on-rewrite part should be simpler, I think).
>
> Ah, well, Git does have an index: refs. You could use
> refs/change-IDs/<change-ID> to index by change ID.
I don't think references are a good mechanism to track change IDs. The
expectation around refs is that users can change them basically at will,
but it certainly does not make any sense to let them update change IDs.
Furthermore, as we have already discussed, change IDs are not unique,
but refs can only point to a single commit ID. So that's another
mismatch that we cannot address.
I think caching the information in an auxiliary data structure would
thus be a lot more reasonable. This could for example be part of commit
graphs, but could also be a separate index specific to change IDs
themselves.
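A minimal sketch of what such a separate index could look like, written in Python purely for illustration. The `ChangeIdIndex` class and its method names are hypothetical and say nothing about how Git's commit-graph files are laid out; the point is only that the structure maps one change ID to possibly many commits, which a single ref cannot do.

```python
from collections import defaultdict


class ChangeIdIndex:
    """Toy side index: change ID -> commit IDs, kept outside the ref namespace."""

    def __init__(self):
        self._by_change = defaultdict(list)

    def add(self, commit_id, change_id):
        """Record that commit_id carries change_id (duplicates are ignored)."""
        if commit_id not in self._by_change[change_id]:
            self._by_change[change_id].append(commit_id)

    def lookup(self, change_id):
        """Return all known commits for change_id; may be empty or have many."""
        return list(self._by_change.get(change_id, []))

    def resolve(self, change_id):
        """Return the single commit for change_id, or raise if not unique."""
        commits = self.lookup(change_id)
        if len(commits) != 1:
            raise LookupError(f"{change_id}: {len(commits)} candidate commits")
        return commits[0]
```

Because the index is derived data, it could be thrown away and rebuilt from the commits themselves, which is what makes it a cache rather than a source of truth.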
Patrick
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 9:34 ` Patrick Steinhardt
@ 2025-04-04 16:04 ` Nico Williams
2025-04-07 8:00 ` Patrick Steinhardt
0 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-04 16:04 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Martin von Zweigbergk, Elijah Newren, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Fri, Apr 04, 2025 at 11:34:20AM +0200, Patrick Steinhardt wrote:
> > Ah, well, Git does have an index: refs. You could use
> > refs/change-IDs/<change-ID> to index by change ID.
>
> I don't think references are a good mechanism to track change IDs. The
> expectation around refs is that users can change them basically at will,
> but it certainly does not make any sense to let them update change IDs.
git meta (a separate project) uses refs named refs/commits/$full_hash.
Users can always break their repos in many, many ways. Using refs is
fine.
> Furthermore, as we have already discussed, change IDs are not unique,
> but refs can only point to a single commit ID. So that's another
> mismatch that we cannot address.
Oh I agree with that! But I think I've made that pretty clear by now :]
However one could have refs/change-IDs/<change-ID> ultimately point to
an object that lists the commits with that change ID, and then that
would function as an index for non-unique change IDs.
> I think caching the information in an auxiliary data structure would
> thus be a lot more reasonable. This could for example be part of commit
> graphs, but could also be a separate index specific to change IDs
> themselves.
Sure.
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-04 16:04 ` Nico Williams
@ 2025-04-07 8:00 ` Patrick Steinhardt
0 siblings, 0 replies; 118+ messages in thread
From: Patrick Steinhardt @ 2025-04-07 8:00 UTC (permalink / raw)
To: Nico Williams
Cc: Martin von Zweigbergk, Elijah Newren, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Fri, Apr 04, 2025 at 11:04:30AM -0500, Nico Williams wrote:
> On Fri, Apr 04, 2025 at 11:34:20AM +0200, Patrick Steinhardt wrote:
> > > Ah, well, Git does have an index: refs. You could use
> > > refs/change-IDs/<change-ID> to index by change ID.
> >
> > I don't think references are a good mechanism to track change IDs. The
> > expectation around refs is that users can change them basically at will,
> > but it certainly does not make any sense to let them update change IDs.
>
> git meta (a separate project) uses refs named refs/commits/$full_hash.
> Users can always break their repos in many, many ways. Using refs is
> fine.
True, they can. But if we use refs as a storage mechanism this means
that a remote user can also mess with _your_ repository by introducing a
broken `refs/change-IDs/<change-ID>` ref that points to a commit with a
different change ID.
You can of course work around this by treating such references
specially. But that would introduce more complexity into a central
subsystem of Git, and the result would probably be another mechanism
with leaky abstractions and confusing behaviour.
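To make the failure mode concrete, here is a small Python sketch of the kind of consistency check such special treatment would need. It is an assumption-laden toy: the `refs/change-IDs/` namespace comes from the proposal in this thread, while the `broken_change_refs` helper and the two dictionaries standing in for the ref store and the commits are hypothetical.

```python
PREFIX = "refs/change-IDs/"


def broken_change_refs(refs, commit_change_ids):
    """Return refs under PREFIX whose target commit lacks the named change ID.

    refs:              dict of ref name -> commit ID
    commit_change_ids: dict of commit ID -> change ID recorded in that commit
    """
    broken = []
    for name, commit in sorted(refs.items()):
        if not name.startswith(PREFIX):
            continue  # ordinary refs are not our concern here
        claimed = name[len(PREFIX):]
        if commit_change_ids.get(commit) != claimed:
            broken.append(name)
    return broken
```

Every fetch that can create or update refs in that namespace would have to run a check like this, which is the extra complexity in the ref subsystem being argued against.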
> > Furthermore, as we have already discussed, change IDs are not unique,
> > but refs can only point to a single commit ID. So that's another
> > mismatch that we cannot address.
>
> Oh I agree with that! But I think I've made that pretty clear by now :]
>
> However one could have refs/change-IDs/<change-ID> ultimately point to
> an object that lists the commits with that change ID, and then that
> would function as an index for non-unique change IDs.
That would be possible indeed, but now it's getting even more awkward.
So I'm not yet convinced that refs are a good fit for storing cached
change IDs.
Patrick
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-02 18:48 Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Martin von Zweigbergk
` (4 preceding siblings ...)
2025-04-03 15:39 ` Elijah Newren
@ 2025-04-07 20:59 ` Junio C Hamano
2025-04-07 21:36 ` Nico Williams
` (2 more replies)
2025-08-19 14:04 ` Askar Safin
6 siblings, 3 replies; 118+ messages in thread
From: Junio C Hamano @ 2025-04-07 20:59 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
Martin von Zweigbergk <martinvonz@google.com> writes:
> For example, it enables
> `git rebase main <change ID>; git switch <change ID>` without
> requiring the user to look up the hash of the rewritten commit.
I do not quite see why this can be listed even as an advantage,
unless you are going to allow end users to name the changes, instead
of using auto-generated impossible-to-remember hexadecimal string
(perhaps prefixed with a single "I" or something).
> If the change id is also transferred between repos and preserved by
> a forge (such as Gerrit), it enables the change id to be used to
> identify a code review.
People often talk about rebasing and rewriting in the context of
discussing "change IDs", and for 80% of the use cases where a simple
single-commit topic is involved, it would perfectly work fine.
After making a new commit C0 on top of 'main', updating 'main' with
others' changes, and then rebasing that C0 on top of updated 'main'
to produce C1, you would expect that C0 and C1 are moral equivalents
so it is natural that you wish there is a name to give to these
moral equivalents.
But stepping back a bit, if they are not just moral equivalents but
record identical changes that are so same that an earlier review of
C0 makes it unnecessary to review C1, why are you even rebasing in
the first place? Just merging C0 to the updated 'main' would retain
the earlier review made on C0 and things should merge just fine.
I have more problems with the remaining 20% use case, where you need
to deal with multiple commits.
Perhaps your initial changeset is a single commit C0 that is so
large and does too many things at once, and reviewers would
naturally advise you to split things up. You'll come up with a
series of commits, C1_0 and C1_1. The net effect of applying these
two patches may be the same as applying the original C0, but each of
them is more cleanly separated to address one issue at a time, and
the explanation given in the proposed log message more clearly
describes the issue each of them addresses. Now you gave a change ID
to C0, and want to somehow relate C1_0 and C1_1 to the original C0.
Which one gets the same change ID? Earlier one? The last one?
Both get the same change ID?
Or your initial changeset is a two-commit series, C0_0 and C0_1, but
reviewers find that each one of them alone is not complete, and
because the issue addressed by these two is small and isolated
enough, you are advised to make them into a single commit C1. Did
you start with two change IDs for these two original commits? If
so, whose change ID does the updated commit C1 inherit? Or does C1 have
two change IDs now? Or did you start with a single change ID
assigned to both of these two original commits?
Quite frankly, I think the concept of "change ID" is nice but it is
not mechanically trustable. Recording them in the trailers is fine,
but I somehow doubt that they have clear-cut semantics that everybody
can agree on, which they would need to deserve a place in the header part
of commit objects.
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-07 20:59 ` Junio C Hamano
@ 2025-04-07 21:36 ` Nico Williams
2025-04-08 12:55 ` Theodore Ts'o
2025-04-07 22:51 ` Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Remo Senekowitsch
2025-04-08 0:10 ` Junio C Hamano
2 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-07 21:36 UTC (permalink / raw)
To: Junio C Hamano
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, remo, philipmetzger@bluewin.ch
On Mon, Apr 07, 2025 at 08:59:56PM +0000, Junio C Hamano wrote:
> I have more problems with the remaining 20% use case, where you need
> to deal with multiple commits.
>
> Perhaps your initial changeset is a single commit C0 that is so
> large and does too many things at once, and reviewers would
> naturally advise you to split things up. You'll come up with a
> series of commits, C1_0 and C1_1. The net effect of applying these
> two patches may be the same as applying the original C0, but each of
> them is more cleanly separated to address one issue at a time, and
> the explanation given in the proposed log message more clearly
> describes the issue each of them addresses. Now you gave a change ID
> to C0, and want to somehow relate C1_0 and C1_1 to the original C0.
> Which one gets the same change ID? Earlier one? The last one?
> > Both get the same change ID?
>
> Or your initial changeset is a two-commit series, C0_0 and C0_1, but
> reviewers find that each one of them alone is not complete, and
> because the issue addressed by these two is small and isolated
> enough, you are advised to make them into a single commit C1. Did
> you start with two change IDs for these two original commits? If
> > so, whose change ID does the updated commit C1 inherit? Or does C1 have
> two change IDs now? Or did you start with a single change ID
> assigned to both of these two original commits?
This is why I suggested earlier that there need to be multiple change
IDs, not just one. Perhaps one is a "code review ID" and another is
a "commit change ID". The code review ID would let you link together
all commits that were reviewed together, so if you have to split or
squash commits they would all still have that one code review ID. The
commit change ID would be shared by all sufficiently-similar versions of
a commit. If a commit is dropped or split or squashed then its commit
ID might get dropped too, but the code review ID would stay the same.
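One way to picture the two-ID idea is the following Python sketch. It encodes just one possible policy (split and squash keep the code review ID and drop the commit change ID); the `Commit` record, its field names, and the `split` and `squash` helpers are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    name: str
    review_id: str            # shared by everything reviewed together
    change_id: Optional[str]  # shared by versions of one commit; may be dropped


def split(commit, names):
    """Split one commit into several; keep the review ID, drop the change ID."""
    return [Commit(n, commit.review_id, None) for n in names]


def squash(commits, name):
    """Squash commits from one review into one; keep the review ID only."""
    review_ids = {c.review_id for c in commits}
    if len(review_ids) != 1:
        raise ValueError("commits belong to different reviews")
    return Commit(name, review_ids.pop(), None)
```

Under this policy the questions about which piece inherits the original change ID never arise, because only the review-level ID is expected to survive a split or a squash.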
I think that a code review ID is a lot saner to use as a ref, as in OP's
`git switch ...` example. Though one might still want symbolic tag-like
names for individual commits in a set of related commits (e.g., for
forward- and back-ports), though I'm inclined to say that's not needed.
> Quite frankly, I think the concept of "change ID" is nice but it is
> not mechanically trustable. Recording them in the trailers is fine,
> but I somehow feel that they have a clear-cut semantics everybody
> can agree on to deserve to be in the header part of commit objects.
I don't think they need to have such extremely detailed semantics in
order to be able to get a header. The semantics will ultimately be
somewhat project-defined, typically something like "during code review
you can use these to relate newer updates to an MR/PR/CR to older
versions" and "once integrated you can use these to find the approved
code review as follows [details]". The [details] (probably a URI
template) for finding concluded CRs might vary. The CR tool might vary.
The construction of the change IDs might vary. The intent might not
vary at all.
Nico
--
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-07 21:36 ` Nico Williams
@ 2025-04-08 12:55 ` Theodore Ts'o
2025-04-08 15:53 ` Nico Williams
0 siblings, 1 reply; 118+ messages in thread
From: Theodore Ts'o @ 2025-04-08 12:55 UTC (permalink / raw)
To: Nico Williams
Cc: Junio C Hamano, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Mon, Apr 07, 2025 at 04:36:01PM -0500, Nico Williams wrote:
> This is why I suggested earlier that there need to be multiple change
> IDs, not just one. Perhaps one is a "code review ID" and another is
> a "commit change ID". The code review ID would let you link together
> all commits that were reviewed together, so if you have to split or
> squash commits they would all still have that one code review ID. The
> commit change ID would be shared by all sufficiently-similar versions of
> a commit. If a commit is dropped or split or squashed then its commit
> ID might get dropped too, but the code review ID would stay the same.
I think "code review ID" makes a lot of sense, although what I would
call it is "patch series ID". This has very clear semantics: it ties
commits which should be grouped together as a single higher-level set
of changes. It could be used by "git format-patch" / "git send-email"
to automatically send a group of patches as a logical unit.
I'd include the "patch series ID" in the e-mail that gets sent out, so
that "git apply-message" would be able to retain the patch series ID.
Patchwork could use the Patch series ID to automatically mark a v2
version of patch series as obsoleting the v1 version of the patch
series. So it would be useful for a lot more than just
Gerrit-style workflows, and that's a good sign that the feature makes
sense from a design perspective.
I'll note that even without the "commit change ID", just simply
knowing that one patch series is a newer version of a pre-existing
patch series is enough to allow Gerrit to intuit which commit is a
newer version of another commit. For singleton commits, nothing else
is necessary. For multi-commit patch series, gerrit could use the
one-line commit description to associate commits; it could use
ordering of the patches; it could just see which commit contents are
similar to previous commits, much like how git detects renames.
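A minimal sketch of the subject-line heuristic, in Python, assuming the tool already knows that two series are versions of each other. It is not what Gerrit does today: the `match_series` function, the greedy pairing strategy, and the 0.6 similarity threshold are all illustrative choices.

```python
from difflib import SequenceMatcher


def match_series(old_subjects, new_subjects, threshold=0.6):
    """Greedily pair each new subject with the most similar unused old one.

    Returns a list of (new_index, old_index or None) pairs; None means the
    new commit has no sufficiently similar predecessor.
    """
    unused = set(range(len(old_subjects)))
    pairs = []
    for i, new in enumerate(new_subjects):
        best, best_score = None, threshold
        for j in sorted(unused):
            score = SequenceMatcher(None, new, old_subjects[j]).ratio()
            if score > best_score:
                best, best_score = j, score
        if best is not None:
            unused.discard(best)
        pairs.append((i, best))
    return pairs
```

A greedy pass like this copes with reordered and reworded commits, but it can mispair commits with near-identical subjects, which is the usual price of a heuristic over an explicit identifier.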
In my experience looking at how kernel developers use gerrit versus
e-mail workflows, in general, gerrit patch series tend to involve a
smaller number of commits, because looking at how various files change
between commits is awkward; and with e-mail workflows, the patch
series tend to involve a larger number of commits, because reviewing
smaller commits is easier with e-mail.
So if this is true for other communities using web-based review
workflows, using heuristics instead of a "commit change ID" might
be sufficient --- and for those communities that run into problems,
they could continue to use a gerrit-style "Change-ID: " in the footer,
with the heuristics being used if for some reason commits that don't
have the Change-ID make it into Gerrit.
> > Quite frankly, I think the concept of "change ID" is nice but it is
> > not mechanically trustable. Recording them in the trailers is fine,
> > but I somehow feel that they have a clear-cut semantics everybody
> > can agree on to deserve to be in the header part of commit objects.
>
> I don't think they need to have such extremely detailed semantics in
> order to be able to get a header. The semantics will ultimately be
> somewhat project-defined, typically something like "during code review
> you can use these to relate newer updates to an MR/PR/CR to older
> versions" and "once integrated you can use these to find the approved
> code review as follows [details]". The [details] (probably a URI
> template) for finding concluded CRs might vary. The CR tool might vary.
> The construction of the change IDs might vary. The intent might not
> vary at all.
I disagree. From long experience, allowing something into an
interface that doesn't have strongly defined semantics has led to
*huge* problems. This has certainly been the case for
Kernel<->Userspace interfaces; so my bias is that if we can't define
strong semantics, then we should probably avoid adding that interface
until we can. Otherwise, this can lead to a huge number of headaches,
both for developers and users.
People *will* develop automation tools using an official "commit
change ID", assuming that how their project (or their forge site) uses
the ill-defined Change ID is the One True Way that the badly defined
field should be used. And other people will develop *other* tools
assuming some other interpretation for that field. And then the git
developers and users will be left trying to pick up the pieces.
Cheers,
- Ted
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 12:55 ` Theodore Ts'o
@ 2025-04-08 15:53 ` Nico Williams
2025-04-09 12:19 ` Theodore Ts'o
0 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-08 15:53 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Junio C Hamano, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Tue, Apr 08, 2025 at 08:55:21AM -0400, Theodore Ts'o wrote:
> On Mon, Apr 07, 2025 at 04:36:01PM -0500, Nico Williams wrote:
> > This is why I suggested earlier that there need to be multiple change
> > IDs, not just one. Perhaps one is a "code review ID" and another is
> > a "commit change ID". [...]
>
> I think "code review ID" makes a lot of sense, although what I would
> call it is "patch series ID". This has very clear semantics: it ties
> commits which should be grouped together as a single higher-level set
> of changes. It could be used by "git format-patch" / "git send-email"
> to automatically send a group of patches as a logical unit.
>
> [...]
Yes.
> I'll note that even without the "commit change ID", just simply
> knowing that one patch series is a newer version of a pre-existing
> patch series is enough to allow Gerrit to intuit which commit is a
> newer version of another commit. For singleton commits, nothing else
> is necessary. For multi-commit patch series, gerrit could use the
> one-line commit description to associate commits; it could use
> ordering of the patches; it could just see which commit contents are
> similar to previous commits, much like how git detects renames.
I'm not keen on CR tools "intuiting" from.. similarity checks. I don't
love Git's similarity checks for file renames. I get that for a
distributed VCS assigning something like "inode numbers" is tricky, but
as long as devs don't race to create the same files it was always
possible to have UUIDs as "inode numbers" and avoid the similarity
checks. Strictly speaking we don't even need any of these change IDs to
make it possible for tools to use similarity checks to find all versions
of a commit or patch series or whatever, but it's very nice to have
something less heuristic and more exact.
> In my experience looking at how kernel developers use gerrit versus
> e-mail workflows, in general, gerrit patch series tend to involve a
> smaller number of commits, because looking at how various files change
> between commits is awkward; and with e-mail workflows, the patch
> series tend to involve a larger number of commits, because reviewing
> smaller commits is easier with e-mail.
Yes.
> So if this is true for other communities using web-based review
> workflows, using heuristics instead of a [...]
I'm not keen :)
> > I don't think they need to have such extremely detailed semantics in
> > order to be able to get a header. The semantics will ultimately be
> > somewhat project-defined, typically something like "during code review
> > you can use these to relate newer updates to an MR/PR/CR to older
> > versions" and "once integrated you can use these to find the approved
> > code review as follows [details]". The [details] (probably a URI
> > template) for finding concluded CRs might vary. The CR tool might vary.
> > The construction of the change IDs might vary. The intent might not
> > vary at all.
>
> I disagree. From long experience, allowing something into an
> interface that doesn't have strongly defined semantics has led to
> *huge* problems. This has certainly been the case for
> Kernel<->Userspace interfaces; so my bias is that if we can't define
> strong semantics, then we should probably avoid adding that interface
> until we can. Otherwise, this can lead to a huge number of headaches,
> both for developers and users.
So how much of the [details] do you want specified? If you want to be
able to go from "change ID" to CR generically for all CR tools then the
best -and perhaps only reasonable- way is to make the change ID a
URI. Or if you think the [details] can be elided and still have
semantics that are well-defined enough then I think you agree with me
more than you disagree :)
If we want to leave some details to be site-/project-local then perhaps
change IDs should have some type and domain/project identifier. Users
who cannot make use of that metadata (e.g., because the CR tool is not
reachable) can still use the change IDs to link commits and patch
series. I think that linking is the only thing we absolutely must
define semantics for, and the rest can be site-/project-local. IMO.
Nico
--
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 15:53 ` Nico Williams
@ 2025-04-09 12:19 ` Theodore Ts'o
2025-04-09 12:56 ` Junio C Hamano
2025-04-09 16:54 ` Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer) Nico Williams
0 siblings, 2 replies; 118+ messages in thread
From: Theodore Ts'o @ 2025-04-09 12:19 UTC (permalink / raw)
To: Nico Williams
Cc: Junio C Hamano, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Tue, Apr 08, 2025 at 10:53:06AM -0500, Nico Williams wrote:
> I'm not keen on CR tools "intuiting" from.. similarity checks. I don't
> love Git's similarity checks for file renames. I get that for a
> distributed VCS assigning something like "inode numbers" is tricky, but
> as long as devs don't race to create the same files it was always
> possible to have UUIDs as "inode numbers" and avoid the similarity
> checks.
I'm not keen on fields that can have essentially random semantics.
Part of this is because today Change-ID is in the footer, and so
humans can randomly set it to any value they like. Sometimes they cut
and paste footers, and so completely unrelated commits have the same
Change-Id which show up when you do a Gerrit lookup by Change-Id.
Admittedly, this aspect gets better if we shove it into the git commit
header.
Part of it is because some tools will edit the Change-Id when doing a
cherry-pick. (For example, one tool that I'm familiar with, which is a CLI
front-end to Gerrit, when you run the command "kdt cherry-pick", will
unconditionally edit the Change-Id to a completely new value) --- and
some will not, because they just do a "git cherry-pick" without
doing anything else. And if you live in an ecosystem where some
people use "git cherry-pick", and other people do "kdt cherry-pick",
you basically have *no* guarantees about how Change-Id might behave
for different commits. This *might* get better if we shove it into a
git commit header, although if you give people tools to edit the
Change-Id as part of a "git commit --amend", some tools might end up
changing the Change-Id in random ways again.
But then we have the problem where if patches get merged or split,
what the Change-Id should be is really undefined today. I could imagine
that if a commit gets split, both descendant commits should retain the same
Change-Id. Or maybe if a patch stack gets collapsed, all of the
predecessor Change-Ids should be included in that collapsed commit,
much like how an "Octopus Merge" might have a half-dozen or more
parent commits. Defining the semantics here is part of the battle;
the other part of the battle would be how would the tools make sure
these semantics get obeyed.
Perhaps one approach might be that the heuristics that you hate being
used as an automated way to sort it out, might get used to set the
semantics at commit time, with perhaps a way for the user to override
the heuristics, or where the user has to explicitly acknowledge that
the heuristics correctly noticed that the patch has changed radically
and maybe the Change-Id shouldn't be retained any more?
Finally, perhaps there should be some discussion about whether we
think git should be maintaining indexes based on the Commit-Id.
Personally, cutting and pasting a random 17 character ID is painful
and annoying, and when I see it in my shell history, I have no idea
what might have been going on. So if I need to cut and paste a
Commit-Id, I might as well cut and paste the one-line commit summary,
and do a "git log --grep" search based on that. But if the Commit-Id
is indexed, then maybe it might be more useful? I dunno....
> So how much of the [details] do you want specified? If you want to be
> able to go from "change ID" to CR generically for all CR tools then the
> best -and perhaps only reasonable- way is to make the change ID a
> URI. Or if you think the [details] can be elided and still have
> semantics that are well-defined enough then I think you agree with me
> more than you disagree :)
Well, see above about some possible semantics. I'm *still* not
convinced even with the better-defined semantics it's worth storing
the extra baggage in the commit header. But that's more of a
value/philosophical question, much like how we "could" store explicit
file rename information in the git commit, but in the very early days
of the git design history, although BitKeeper did track file names,
Linus consciously decided to go down a much simpler path. So that's
really more of a SMTP vs X.400 preference of simplicity versus
complexity in the protocol versus implementation, which is something
where people of good will might disagree --- and there Junio's
opinions matter far more than mine. :-)
Cheers,
- Ted
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-09 12:19 ` Theodore Ts'o
@ 2025-04-09 12:56 ` Junio C Hamano
2025-04-09 19:13 ` Nico Williams
2025-04-09 16:54 ` Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer) Nico Williams
1 sibling, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-09 12:56 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
"Theodore Ts'o" <tytso@mit.edu> writes:
> ... This *might* get better if we shove it into a
> git commit header, although if you give people tools to edit the
> Change-Id as part of a "git commit --amend", some tools might end up
> changing the Change-Id in random ways again.
Thanks for pointing these out. I agree with the above, and it is
one of the reasons why I doubt that it would be a win to have this
information in the header part.
> ... So if I need to cut and paste a
> Commit-Id, I might as well cut and paste the one-line commit summary,
> and do a "git log --grep" search based on that. But if the Commit-Id
> is indexed, then maybe it might be more useful? I dunno....
If the information becomes useful enough, we will definitely start
adding an index for it, just like we have only the "parent" header
field in commit objects to represent the transitive NxM parent-child
relationship to start with, but also have, in the form of reachability
bitmaps, an index of which commits can or cannot reach which other
commits. With the squash and split you outlined above (omitted from my
quote), it is likely that you'd want a similar transitive NxM
predecessor-successor relationship among a family of commits that
represents patchset evolution, and an index constructed on a similar
principle should work well.
> Well, see above about some possible semantics. I'm *still* not
> convinced even with the better-defined semantics it's worth storing
> the extra baggage in the commit header. But that's more of a
> value/philosophical question, much like how we "could" store explicit
> file rename information in the git commit, but in the very early days
> of the git design history, although BitKeeper did track file names,
> Linus consciously decided to go down a much simpler path. So that's
> really more of a SMTP vs X.400 preference of simplicity versus
> complexity in the protocol versus implementation...
It is not "simpler is more manageable".
The early days' design decision, which still lives to this day, was
a bit stronger than that. As can be read from [*1*] (which by the
way I consider one of the most important messages regarding the
design in early days of Git), the design started from "recording
renames is pointless".
Thanks.
[Reference]
*1* https://lore.kernel.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-09 12:56 ` Junio C Hamano
@ 2025-04-09 19:13 ` Nico Williams
2025-04-10 8:29 ` Junio C Hamano
0 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-09 19:13 UTC (permalink / raw)
To: Junio C Hamano
Cc: Theodore Ts'o, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Wed, Apr 09, 2025 at 05:56:16AM -0700, Junio C Hamano wrote:
> It is not "simpler is more manageable".
>
> The early days' design decision, which still lives to this day, was
> a bit stronger than that. As can be read from [*1*] (which by the
> way I consider one of the most important messages regarding the
> design in early days of Git), the design started from "recording
> renames is pointless".
>
> *1* https://lore.kernel.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/
Allow me to withdraw my use of similarity heuristics for renames as an
argument against similarity heuristics over change IDs. I still think
that explicit change IDs would be better than using only commit
similarity heuristics.
Nico
--
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-09 19:13 ` Nico Williams
@ 2025-04-10 8:29 ` Junio C Hamano
2025-04-10 21:40 ` Martin von Zweigbergk
0 siblings, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-10 8:29 UTC (permalink / raw)
To: Nico Williams
Cc: Theodore Ts'o, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
Nico Williams <nico@cryptonector.com> writes:
> argument against similarity heuristics over change IDs. I still think
> that explicit change IDs would be better than using only commit
> similarity heuristics.
I do think it makes sense to explicitly record that this commit was
(or "these commits were") created to refine and replace that commit
(or "these other commits"), if we want to keep track of how a set of
patches evolved, if such a determination can be reliably done. And
I suspect that IDEs can do a much better job keeping track of such
correspondence than they can keep track of renames and copies, which
I mentioned in an earlier message.
It is insufficient to just record a single "change ID" to each
commit, in order to handle anything other than "a single commit gets
updated by another single commit" case. It is insufficient to even
keep track of "a single commit gets updated by another single
commit, which in turn gets updated by yet another single commit"
case, without assuming a globally synchronised clock in a distributed
environment, simply because you only have three commit objects that
share the same "change ID" string among themselves, and you cannot
tell between A becoming B becoming C (in which case people would
consider C is the latest in the iterations), or two developers
started from A to produce B and C independently (in which case it is
not yet decided which one between B and C should be considered the
latest).
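To make the ambiguity concrete, here is a minimal Python sketch. The
`Commit` record, the one-letter object names and the change ID "I1" are
all hypothetical, made up for illustration; nothing here is an existing
Git data structure. It only shows that two different rewrite histories
leave behind exactly the same information when each commit carries
nothing but a shared change ID string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    oid: str        # hypothetical short object name
    change_id: str  # the single shared "change ID" string


def visible_relation(commits):
    """Return everything a tool can derive from change IDs alone:
    which commits belong to the same change, with no ordering."""
    groups = {}
    for c in commits:
        groups.setdefault(c.change_id, set()).add(c.oid)
    return {cid: frozenset(oids) for cid, oids in groups.items()}


# History 1: A was rewritten into B, and then B was rewritten into C.
linear = [Commit("A", "I1"), Commit("B", "I1"), Commit("C", "I1")]

# History 2: two developers each started from A and produced B and C
# independently.  The objects carry the same fields as in history 1.
forked = [Commit("A", "I1"), Commit("C", "I1"), Commit("B", "I1")]

# Both histories collapse to the same unordered set {A, B, C}, so the
# change ID by itself cannot say whether C supersedes B.
assert visible_relation(linear) == visible_relation(forked)
```

Telling the two cases apart would take extra information, such as an
explicit predecessor link per commit, which is not part of the single
change ID described above.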
Since we are all human, it is possible that we think things through
and make a design as complete as humanly possible but it later turns
out to be insufficient. If we make such a mistake, we'd then need
to deal with it and that is just simply a part of developers' life.
But something that is _known_ to be structurally insufficient before
it is added to the system? We should refuse to make such a thing a
part of the very core of the data structure, like the header fields
in commit objects.
Thanks.
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-10 8:29 ` Junio C Hamano
@ 2025-04-10 21:40 ` Martin von Zweigbergk
0 siblings, 0 replies; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-10 21:40 UTC (permalink / raw)
To: Junio C Hamano
Cc: Nico Williams, Theodore Ts'o, Git Mailing List, Edwin Kempin,
Scott Chacon, remo, philipmetzger@bluewin.ch
On Thu, 10 Apr 2025 at 01:29, Junio C Hamano <gitster@pobox.com> wrote:
>
> Nico Williams <nico@cryptonector.com> writes:
>
> > argument against similarity heuristics over change IDs. I still think
> > that explicit change IDs would be better than using only commit
> > similarity heuristics.
>
> I do think it makes sense to explicitly record that this commit was
> (or "these commits were") created to refine and replace that commit
> (or "these other commits"), if we want to keep track of how a set of
> patches evolved, if such a determination can be reliably done. And
>
> I suspect that IDEs can do a much better job keeping track of such
> correspondence than they can keep track of renames and copies, which
> I mentioned in an earlier message.
>
> It is insufficient to just record a single "change ID" to each
> commit, in order to handle anything other than "a single commit gets
> updated by another single commit" case. It is insufficient to even
> keep track of "a single commit gets updated by another single
> commit, which in turn gets updated by yet another single commit"
> case, without assuming a globally synchronised clock in a distributed
> environment, simply because you only have three commit objects that
> share the same "change ID" string among themselves, and you cannot
> tell between A becoming B becoming C (in which case people would
> consider C is the latest in the iterations), or two developers
> started from A to produce B and C independently (in which case it is
> not yet decided which one between B and C should be considered the
> latest).
>
> Since we are all human, it is possible that we think things through
> and make a design as complete as humanly possible but it later turns
> out to be insufficient. If we make such a mistake, we'd then need
> to deal with it and that is just simply a part of developers' life.
>
> But something that is _known_ to be structurally insufficient before
> it is added to the system? We should refuse to make such a thing a
> part of the very core of the data structure, like the header fields
> in commit objects.
I think we are talking about slightly different things. The change ID
proposal is about providing a stable way of referring to an evolving
commit. You can think of it almost like an automatically generated Git
branch name that follows the commit as it's rewritten (as if you had
passed `--update-refs` to every command). I think what you're
describing is more like the "Git Evolve" proposal [1] (also linked to
from elsewhere in this thread). I think that's also an interesting
feature but I see it as mostly a separate feature. There is certainly
overlap between the two features, such as how the simple centralized
flow I mentioned can work pretty well by relying only on the change
ID.
Change IDs in Jujutsu are very useful without support for changeset
evolution. With the indexing I mentioned and the prefix lookup
prioritizing "mutable commits" (roughly those that are not on a
remote), it's quite convenient to run `jj log` and see a highlighted
prefix of, say, "xu", and then you can do e.g. `jj show xu` instead of
having to paste a longer ID or type a branch name. A further advantage
of preferring change IDs over commit IDs in commands is that you don't
risk creating "divergent" commits (similar copies) by rewriting the
same commit twice. For example, `jj describe xu -m foo && jj describe
xu -m bar` will rewrite the original commit to have message "foo" and
then rewrite the rewritten commit to have message "bar". I understand
that this use case is less useful to Git users because Git doesn't
like to work in detached HEAD mode and doesn't rewrite descendants
automatically. Consider experimenting with jj to get a better sense of
how it works :)
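As a rough illustration of the prefix lookup described above, here is a
minimal Python sketch. It is not how jj is implemented; the function
name `resolve_prefix`, the two ID pools and the sample IDs are
assumptions made up for this example. The idea shown is only that a
prefix is matched against mutable commits first, so a short prefix can
stay unambiguous even when immutable commits share it.

```python
def resolve_prefix(prefix, mutable_ids, all_ids):
    """Resolve a change-ID prefix, trying mutable commits first.

    Hypothetical helper: falls back to the full set of IDs only when
    no mutable commit matches the prefix.
    """
    for pool in (mutable_ids, all_ids):
        matches = [cid for cid in pool if cid.startswith(prefix)]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            raise ValueError(f"ambiguous prefix {prefix!r}: {sorted(matches)}")
    raise KeyError(f"no change ID starts with {prefix!r}")


# Made-up change IDs; the first two belong to mutable (local) commits.
mutable = ["xuvwqrst", "kpzmnoyl"]
everything = mutable + ["xkqtnrsw", "kpqrlmno"]

# "x" matches two IDs overall, but only one mutable commit, so it resolves.
assert resolve_prefix("x", mutable, everything) == "xuvwqrst"
# A prefix that matches no mutable commit falls back to the full set.
assert resolve_prefix("xk", mutable, everything) == "xkqtnrsw"
```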
[1] https://lore.kernel.org/git/pull.1356.git.1663959324.gitgitgadget@gmail.com/
* Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-09 12:19 ` Theodore Ts'o
2025-04-09 12:56 ` Junio C Hamano
@ 2025-04-09 16:54 ` Nico Williams
2025-04-09 18:02 ` Junio C Hamano
2025-04-14 19:54 ` D. Ben Knoble
1 sibling, 2 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-09 16:54 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Junio C Hamano, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Wed, Apr 09, 2025 at 08:19:24AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 08, 2025 at 10:53:06AM -0500, Nico Williams wrote:
> > I'm not keen on CR tools "intuiting" from.. similarity checks.
> > [...]
>
> I'm not keen on fields that can have essentially random semantics.
> Part of this is because today Change-ID is in the footer, and so
> humans can randomly set it to any value they like. Sometimes they cut
> and paste footers, and so completely unrelated commits have the same
> Change-Id which show up when you do a Gerrit lookup by Change-Id.
> Admittedly, this aspect gets better if we shove it into the git commit
> header.
>
> Part of it is because some tools will edit the Change-Id when doing a
> cherry-pick. [...]
I was only proposing to leave some details out, not to have completely
undefined semantics. The particular details we might want to leave out
are about resolving change IDs to URIs. In particular this editing of
change IDs on cherry-pick you mention has to not be permitted, or
perhaps a new change ID could be added -- i.e., are these headers
single-valued or multi-valued?
Let's nail down the semantics of these change ID headers. Here is a
proposal to bang on:
- change IDs get preserved on cherry-pick and on `pick`s in rebases
- users can manually remove or change these change IDs, naturally,
   though generally they would not
- the actual change IDs are either free-form or they are URIs -- pick
one, but if they are URIs they should be URIs to CRs, and approved
CRs should perhaps have links to integration reports etc.
- there should be one header for a change ID for the patch series (the
MR/PR/whateverR); patch series IDs can be shared by many commits in
one branch, so they are not in any way unique
- there may be one header for a change ID for each commit, which should
be unique in any _branch_, but not unique in any repo (due to back-
and forward-ports for example)
- there should be another header to list change IDs from which a commit
was derived that nonetheless has a different commit change ID
- these headers should be multi-valued to handle squashes and merges
 - if a commit change ID is missing but a patch series change ID is
present then similarity checks could be used to link multiple
versions of any one such commit
Optional:
- a commit change ID could be used as a ref to an object that lists the
commits that have that change ID
- a patch series change ID could be used as a ref to an object that lists
   the head commit of that patch series in every branch that contains
it
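To have something concrete to bang on, here is a minimal Python sketch
of one possible reading of the proposal above. Every name in it
(`CommitMeta`, `series_ids`, `change_id`, `derived_from`, and the three
helper functions) is hypothetical; none of these are existing Git
headers or commands, and the squash rule in particular is just one
interpretation of "multi-valued to handle squashes and merges".

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommitMeta:
    series_ids: Tuple[str, ...] = ()    # patch series IDs, shared by many commits
    change_id: Optional[str] = None     # per-commit change ID, unique per branch
    derived_from: Tuple[str, ...] = ()  # change IDs this commit was derived from


def _unique(items):
    """Drop duplicates while keeping first-seen order."""
    return tuple(dict.fromkeys(items))


def cherry_pick(meta):
    """Cherry-pick (or a rebase `pick`) preserves every ID unchanged."""
    return replace(meta)


def squash(parts, new_change_id):
    """Squash several commits into one that has its own change ID and
    lists the IDs of the commits it replaces in `derived_from`."""
    series = _unique(s for p in parts for s in p.series_ids)
    derived = _unique(
        cid
        for p in parts
        for cid in ((p.change_id,) if p.change_id else ()) + p.derived_from
        if cid != new_change_id
    )
    return CommitMeta(series, new_change_id, derived)


def duplicate_change_ids(branch):
    """Report commit change IDs that appear more than once in a branch."""
    seen, dups = set(), []
    for meta in branch:
        if meta.change_id is None:
            continue
        if meta.change_id in seen:
            dups.append(meta.change_id)
        seen.add(meta.change_id)
    return dups


a = CommitMeta(("S1",), "C1")
b = CommitMeta(("S1",), "C2")
assert cherry_pick(a) == a
assert squash([a, b], "C3") == CommitMeta(("S1",), "C3", ("C1", "C2"))
assert duplicate_change_ids([a, b, cherry_pick(a)]) == ["C1"]
```

The last check shows why uniqueness can only be per branch: a back-port
made with cherry-pick keeps the same commit change ID, so the same ID
legitimately appears on several branches of one repository.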
> Perhaps one approach might be that the heuristics that you hate being
> used as an automated way to sort it out, might get used to set the
> semantics at commit time, with perhaps a way for the user to override
> the heuristics, or where the user has to explicitly acknowledge that
> the heuristics correctly noticed that the patch has changed radically
> and maybe the Change-Id shouldn't be retained any more?
Yes, heuristics can be used to help the user make such decisions. I've
no issue with that.
> Finally, perhaps there should be some discussion about whether we
> think git should be maintaining indexes based on the Commit-Id.
If they can be refs, then they should be. Since they can't be unique
the ref should be to an object listing the actual commits (see above).
There could also be a non-ref index for these.
> Personally, cutting and pasting a random 17 character ID is painful
> and annoying, and when I see it in my shell history, I have no idea
> what might have been going on. So if I need to cut and paste a
> Commit-Id, I might as well cut and paste the one-line commit summary,
> and do a "git log --grep" search based on that. But if the Commit-Id
> is indexed, then maybe it might be more useful? I dunno....
+1
> Well, see above about some possible semantics. I'm *still* not
> convinced even with the better-defined semantics it's worth storing
> the extra baggage in the commit header. But that's more of a
> value/philosophical question, much like how we "could" store explicit
> file rename information in the git commit, but in the very early days
> of the git design history, although BitKeeper did track file names,
> Linus consciously decided to go down a much simpler path. So that's
> really more of a SMTP vs X.400 preference of simplicity versus
> complexity in the protocol versus implementation, which is something
> where people of good will might disagree --- and there Junio's
> opinions matter far more than mine. :-)
I don't find file rename heuristics to be "simple", and they're often
wrong, though I've fully internalized that copies and renames have to be
done alone in separate commits with no contents changes so as to make
incorrect rename determinations much less likely.
Nico
--
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-09 16:54 ` Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer) Nico Williams
@ 2025-04-09 18:02 ` Junio C Hamano
2025-04-09 18:35 ` Nico Williams
2025-04-14 19:54 ` D. Ben Knoble
1 sibling, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-09 18:02 UTC (permalink / raw)
To: Nico Williams
Cc: Theodore Ts'o, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
Nico Williams <nico@cryptonector.com> writes:
> I don't find file rename heuristics to be "simple", and they're often
I think you need to re-read what Tytso wrote before reacting. He
never said rename and copy heuristics are simple. His statement was
that choosing *NOT* to record the "file A was renamed to B and C was
copied to D in this commit" in commit objects can be thought of as a
choice to keep the system simple.
But the decision not to record renames and copies in commits is not
about simplicity but it is more about correctness. People record
wrong renames and not all renaming changes are made with "git mv".
Through your GUI IDE, you may choose "Copy file" menu and make a
copy of an existing file F to file G, and edit file G extensively
and record the result in a commit. No matter how heavy the edit
after copying is, your IDE would remember that G came originally by
copying from F so it should be able to record the fact that G was
copied from F in the resulting commit.
But recording that as a copy may be a *wrong* thing to do in the
first place. The IDE cannot guess *why* you copied F to G in the
first place. Perhaps it was because your project requires you to
reproduce boilerplate license notice in the comment at the beginning
of each and every file, and copying an existing file as a whole,
removing everything after that initial boilerplate comment, and
writing everything else afresh was the easiest way for you to work.
If you inspect the resulting change brought in by the commit, with
intelligence, a human may say "G is a completely new file, even
though it shares the leading boilerplate comment with F and every
other file in the project". Your IDE wouldn't say that.
We designed not to etch such wrong renames/copies in stone by
recording them at the commit time. Instead we compare the before
and after image to intuit the _intention_ of what the user wanted to
do _when_ you _ask_ (i.e. when you run "git diff" or "git log").
This has an additional benefit that the heuristics to find out about
renames and copies *can* be improved long after commits are made, so
a commit that used to be misidentified to have renamed a file (when
there is no such renaming) may later say that a file is removed and
a new file is made independently with newer version of Git.
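The query-time approach can be sketched in a few lines of Python. This
is only an illustration, not Git's actual algorithm: the function
`detect_renames` is made up for this example, and Python's
`difflib.SequenceMatcher` stands in for Git's own similarity scoring.
The `threshold` parameter plays the role that the similarity index
given to "git diff -M" plays in Git.

```python
import difflib


def detect_renames(before, after, threshold=0.5):
    """Pair removed paths with added paths whose contents look similar.

    `before` and `after` map path -> file content (str).  Nothing about
    renames is stored anywhere; the pairing is recomputed on every call.
    """
    removed = {p: c for p, c in before.items() if p not in after}
    added = {p: c for p, c in after.items() if p not in before}

    # Score every (removed, added) pair, best score first.
    scored = sorted(
        (
            (
                difflib.SequenceMatcher(None, old, new, autojunk=False).ratio(),
                old_path,
                new_path,
            )
            for old_path, old in removed.items()
            for new_path, new in added.items()
        ),
        reverse=True,
    )

    renames, used_old, used_new = [], set(), set()
    for score, old_path, new_path in scored:
        if score < threshold:
            break  # every remaining pair scores lower still
        if old_path in used_old or new_path in used_new:
            continue  # each path takes part in at most one rename
        renames.append((old_path, new_path, score))
        used_old.add(old_path)
        used_new.add(new_path)
    return renames


before = {"a.txt": "hello world\n" * 10, "keep.txt": "x"}
after = {
    "b.txt": "hello world\n" * 10 + "extra\n",
    "keep.txt": "x",
    "new.txt": "zzzz completely different 12345",
}

# With the default threshold, a.txt -> b.txt is reported as a rename.
assert [(o, n) for o, n, _ in detect_renames(before, after)] == [("a.txt", "b.txt")]
# A stricter threshold reads the very same commit as a delete plus an add.
assert detect_renames(before, after, threshold=1.0) == []
```

Because the answer depends only on the two snapshots and on the scoring
used at the time of asking, changing the threshold or the scoring
function changes the answer for old commits as well, without rewriting
any of them.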
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-09 18:02 ` Junio C Hamano
@ 2025-04-09 18:35 ` Nico Williams
2025-04-09 19:14 ` Eric Sunshine
2025-04-10 13:44 ` Theodore Ts'o
0 siblings, 2 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-09 18:35 UTC (permalink / raw)
To: Junio C Hamano
Cc: Theodore Ts'o, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Wed, Apr 09, 2025 at 11:02:41AM -0700, Junio C Hamano wrote:
> Nico Williams <nico@cryptonector.com> writes:
>
> > I don't find file rename heuristics to be "simple", and they're often
>
> I think you need to re-read what Tytso wrote before reacting. He
> never said rename and copy heuristics are simple. His statement was
> that choosing *NOT* to record the "file A was renamed to B and C was
> copied to D in this commit" in commit objects can be thought of as a
> choice to keep the system simple.
I was using file rename heuristics to explain that I wouldn't like more
of the same for other things; I was not trying to litigate renames.
I'm trying to litigate the _addition_ of more similarity-based
heuristics for _other_ things.
If similarity heuristics were enough for CR tools then none would have
introduced anything like change IDs. Or perhaps CR tools authors have
been flat out wrong to not try or use similarity heuristics exclusively
over change IDs. That's a topic worth discussing. I've stated my
preference for not relying solely on similarity heuristics.
> [...]
>
> But recording that as a copy may be a *wrong* thing to do in the
> first place. The IDE cannot guess *why* you copied F to G in the
> first place. [...]
Exactly, software can't read human minds, which is why recording
user-expressed intent (rename, copy) would be a nice option to have.
> If you inspect the resulting change brought in by the commit, with
> intelligence, a human may say "G is a completely new file, even
> though it shares the leading boilerplate comment with F and every
> other file in the project". Your IDE wouldn't say that.
Right, the IDE should let one rename/copy the file _and_ choose to either
record that as a rename/copy in the version control system _or not_.
Only the user can truly know their own intent.
Sometimes users may slip up and not record their intent correctly,
perhaps because going back to fix an earlier choice is ETOOHARD to
figure out how to do in their IDE's UI. Fair enough. So similarity
checks can help one understand history, but where one can have intent
recorded, that's a better indicator. IMO.
> We designed not to etch such wrong renames/copies in stone by
> recording them at the commit time. Instead we compare the before
> and after image to intuit the _intention_ of what the user wanted to
> do _when_ you _ask_ (i.e. when you run "git diff" or "git log").
Well, I suspect more likely that Linus didn't want to have some sort of
inode number nor some sort of explicit rename/copy indication as a
significant simplification that allowed Git to get shipped sooner. I'm
not questioning that nor trying to litigate rename/copy.
Nico
--
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-09 18:35 ` Nico Williams
@ 2025-04-09 19:14 ` Eric Sunshine
2025-04-09 19:31 ` Nico Williams
2025-04-10 13:44 ` Theodore Ts'o
1 sibling, 1 reply; 118+ messages in thread
From: Eric Sunshine @ 2025-04-09 19:14 UTC (permalink / raw)
To: Nico Williams
Cc: Junio C Hamano, Theodore Ts'o, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Wed, Apr 9, 2025 at 2:51 PM Nico Williams <nico@cryptonector.com> wrote:
> On Wed, Apr 09, 2025 at 11:02:41AM -0700, Junio C Hamano wrote:
> > We designed not to etch such wrong renames/copies in stone by
> > recording them at the commit time. Instead we compare the before
> > and after image to intuit the _intention_ of what the user wanted to
> > do _when_ you _ask_ (i.e. when you run "git diff" or "git log").
>
> Well, I suspect more likely that Linus didn't want to have some sort of
> inode number nor some sort of explicit rename/copy indication as a
> significant simplification that allowed Git to get shipped sooner. I'm
> not questioning that nor trying to litigate rename/copy.
Contrary to your suspicion, what Junio describes above was a conscious
and deliberate design decision by Linus[*].
[*]: https://lore.kernel.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-09 19:14 ` Eric Sunshine
@ 2025-04-09 19:31 ` Nico Williams
0 siblings, 0 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-09 19:31 UTC (permalink / raw)
To: Eric Sunshine
Cc: Junio C Hamano, Theodore Ts'o, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Wed, Apr 09, 2025 at 03:14:46PM -0400, Eric Sunshine wrote:
> Contrary to your suspicion, what Junio describes above was a conscious
> and deliberate design decision by Linus[*].
Fair enough. Please look past that.
Nico
--
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-09 18:35 ` Nico Williams
2025-04-09 19:14 ` Eric Sunshine
@ 2025-04-10 13:44 ` Theodore Ts'o
2025-04-10 16:18 ` Junio C Hamano
1 sibling, 1 reply; 118+ messages in thread
From: Theodore Ts'o @ 2025-04-10 13:44 UTC (permalink / raw)
To: Nico Williams
Cc: Junio C Hamano, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Wed, Apr 09, 2025 at 01:35:45PM -0500, Nico Williams wrote:
> I was using file rename heuristics to explain that I wouldn't like more
> of the same for other things; I was not trying to litigate renames.
>
> I'm trying to litigate the _addition_ of more similarity-based
> heuristics for _other_ things.
>
> If similarity heuristics were enough for CR tools then none would have
> introduced anything like change IDs. Or perhaps CR tools authors have
> been flat out wrong to not try or use similarity heuristics exclusively
> over change IDs. That's a topic worth discussing. I've stated my
> preference for not relying solely on similarity heuristics.
There is quite a lot of similarity between trying to record file names
(and having the concept of "inode numbers" for files tracked by git),
and the discussion we've had about how to track user intent when a
commit gets split or merged, and having a "Change-ID" which functions
exactly like an "inode number", except for an individual commit
instead of a file.
The arguments about why we don't have an "inode number" for files,
because it *is* complicated and hard to get right, are *precisely*
the same arguments for why I remain unconvinced about having an "inode
number" for the semantic idea of a commit (read: Change-Id).
If you are someone who very much believes in the importance of doing
per-commit Code Review using something like Gerrit, then you might
think that a Change-ID is more *important* than an "inode number", and
so it is therefore worth the greater amount of complexity and/or
ambiguity when the Change-Id gets subject to the same levels of
incorrectness that having the IDE track the user intent behind a file
copy or rename might have. That's a value judgement, and there's no
real right answer here.
After all, there are still people, for example as seen on a thread on
the Unix Heritage Society mailing list, who have argued that git
is a hot mess because we don't track file renames and copies the way
"real" source code management systems like BitKeeper and Perforce do
things. I happen to disagree, but that's a value judgement about
what's important in a SCM design.
Regardless of how we come out on whether having an "inode number" for the
high-level semantic value of a commit is worth it, I do think having a
"patch set ID" which ties related commits together does make sense,
though. That would solve some interesting problems both for the
web/forge review workflow as well as the mailing list review workflow.
I'd be curious what people might think about that.
Cheers,
- Ted
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-10 13:44 ` Theodore Ts'o
@ 2025-04-10 16:18 ` Junio C Hamano
2025-04-11 15:48 ` Theodore Ts'o
0 siblings, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-10 16:18 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
"Theodore Ts'o" <tytso@mit.edu> writes:
> Regardless of how we come out on whether having an "inode number" for the
> high-level semantic value of a commit is worth it, I do think having a
> "patch set ID" which ties related commits together does make sense,
> though. That would solve some interesting problems both for the
> web/forge review workflow as well as the mailing list review workflow.
> I'd be curious what people might think about that.
As a concept, I agree that a mechanism to identify these iterations
of the same topic collectively is a very valuable thing to have.
FWIW, I use the Message-ID of the cover letter e-mail as a rough
approximation for "patch set ID", and it is quite usable once you
train your contributors to always make the cover letter for
iteration N a direct reply to the cover letter for iteration N-1,
and also make the individual patches a direct reply to the cover
letter for the same iteration.
Then visiting lore.kernel.org/$mid/ will give me at a glance some
essential information about the series, like
- how hotly is the topic being discussed?
- is the iteration $mid I happened to have picked the latest, a
bit older, or irrelevantly older?
- has the topic been extending its scope?
Thanks to the "cover for iteration N is a direct response for
iteration N-1" and "cover is marked as [PATCH 0/$n]" conventions,
"b4 am" grabs, by default, the patches from the latest iteration
when given the message-id of the cover letter of any iteration of a
patch set. Because most of the time a consumer of an evolving
patchset is interested in the latest iteration (unless the
contributor screws up, in which case we may need to go back and
explicitly grab an older iteration), I never felt a need for an
official "patch set ID", though.
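As a rough illustration of why the reply convention is enough to find the
latest iteration from any cover letter: the covers form a chain that can be
walked forward. This is only a sketch under the assumption of a strictly
linear chain; it is not how b4 is implemented, and the Cover record and the
example.org Message-IDs are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Cover:
    """A cover letter reduced to the two headers the convention relies on."""
    message_id: str
    in_reply_to: Optional[str]  # Message-ID of the previous iteration's cover


def latest_iteration(covers: Dict[str, Cover], start: str) -> str:
    """Follow 'cover N replies to cover N-1' forward from `start`."""
    # Invert the In-Reply-To links: previous cover -> next cover.
    next_cover = {
        c.in_reply_to: c.message_id
        for c in covers.values()
        if c.in_reply_to is not None
    }
    current = start
    seen = {current}  # guards against a malformed, cyclic thread
    while current in next_cover and next_cover[current] not in seen:
        current = next_cover[current]
        seen.add(current)
    return current


# Hypothetical thread with three iterations of one topic.
covers = {
    c.message_id: c
    for c in [
        Cover("v1@example.org", None),
        Cover("v2@example.org", "v1@example.org"),
        Cover("v3@example.org", "v2@example.org"),
    ]
}
print(latest_iteration(covers, "v1@example.org"))
print(latest_iteration(covers, "v2@example.org"))
```

Starting from either older cover, the walk ends at the v3 cover, which is
the "latest iteration by default" behaviour described above.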
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-10 16:18 ` Junio C Hamano
@ 2025-04-11 15:48 ` Theodore Ts'o
2025-04-11 16:38 ` Konstantin Ryabitsev
2025-04-11 17:44 ` Junio C Hamano
0 siblings, 2 replies; 118+ messages in thread
From: Theodore Ts'o @ 2025-04-11 15:48 UTC (permalink / raw)
To: Junio C Hamano
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Thu, Apr 10, 2025 at 09:18:56AM -0700, Junio C Hamano wrote:
> Thanks to the "cover for iteration N is a direct response for
> iteration N-1" and "cover is marked as [PATCH 0/$n]" conventions,
Even if the cover for iteration N isn't a reply to the cover for
iteration N-1, b4 will search based on the subject line for a cover
letter with a higher version number, and this mostly works.
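To make the subject-line matching concrete, here is a minimal sketch of
picking the newest cover letter by version number. It is not b4's actual
code: the regular expression only understands the plain "[PATCH vN M/T]"
form, and the names parse_subject and newest_cover are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional


class SeriesSubject(NamedTuple):
    version: int  # 1 when no "vN" marker is present
    index: int    # 0 for the cover letter
    total: int
    title: str


# Assumption: only "[PATCH 0/3] ..." and "[PATCH v2 0/3] ..." style prefixes.
_SUBJECT_RE = re.compile(
    r"^\[PATCH(?:\s+v(?P<version>\d+))?\s+(?P<index>\d+)/(?P<total>\d+)\]"
    r"\s*(?P<title>.*)$"
)


def parse_subject(subject: str) -> Optional[SeriesSubject]:
    """Split a patch subject into version, position and title."""
    m = _SUBJECT_RE.match(subject.strip())
    if m is None:
        return None
    return SeriesSubject(
        version=int(m.group("version") or 1),
        index=int(m.group("index")),
        total=int(m.group("total")),
        title=m.group("title"),
    )


def newest_cover(subjects: Iterable[str], title: str) -> Optional[SeriesSubject]:
    """Return the cover letter (index 0) with the highest version for `title`."""
    covers = [
        s for s in map(parse_subject, subjects)
        if s is not None and s.index == 0 and s.title == title
    ]
    return max(covers, key=lambda s: s.version, default=None)


subjects = [
    "[PATCH 0/2] add change-id footer",
    "[PATCH v2 0/3] add change-id footer",
    "[PATCH v2 1/3] parse the footer",
]
print(newest_cover(subjects, "add change-id footer"))
```

A lone follow-up patch without a cover letter carries none of this
information, which is where an explicit patch set ID would help.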
My one (admittedly minor) pain point is where someone replies to a
patch series with something like "you should really also fix FOO", and
then someone replies with a single patch (without a cover letter,
possibly created with git; possibly not) that addresses issue FOO.
This can confuse "b4 am -c" into thinking that the patch to address
FOO was in fact a newer version of the patch being reviewed. It's not
a big deal; I can deal with this manually. But having a patch set ID
would help with this.
The other thing that having an official patch set ID would help with
is allowing patchwork to automatically supersede an older
version of the patch series (possibly with a link to the older version
of the patch series in the Web UI).
I'd also love if lore.kernel.org and maybe b4 also had an automatic
way to get at the older versions of the patch series, and the patch
set ID would help with the automation. Admittedly it's not strictly
speaking necessary, since b4 is already using the cover letter subject
line to search newer versions of the patch series. The number of
messages it would need to search to find older versions would be
greater, though.
Cheers,
- Ted
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-11 15:48 ` Theodore Ts'o
@ 2025-04-11 16:38 ` Konstantin Ryabitsev
2025-04-11 17:44 ` Junio C Hamano
1 sibling, 0 replies; 118+ messages in thread
From: Konstantin Ryabitsev @ 2025-04-11 16:38 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Junio C Hamano, Nico Williams, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Fri, Apr 11, 2025 at 11:48:39AM -0400, Theodore Ts'o wrote:
> On Thu, Apr 10, 2025 at 09:18:56AM -0700, Junio C Hamano wrote:
> > Thanks to the "cover for iteration N is a direct response for
> > iteration N-1" and "cover is marked as [PATCH 0/$n]" conventions,
>
> Even if the cover for iteration N isn't a reply to the cover for
> iteration N-1, b4 will search based on the subject line for a cover
> letter with a higher version number, and this mostly works.
Note that we try to use the series change-id for this, if we find it. We only
fall back to matching by subject (+author) if that's not present.
> My one (admittedly minor) pain point is where someone replies to a
> patch series with something like "you should really also fix FOO", and
> then someone replies with a single patch (without a cover letter,
> possibly created with git; possibly not) that addresses issue FOO.
>
> This can confuse "b4 am -c" into thinking that the patch to address
> FOO was in fact a newer version of the patch being reviewed. It's not
> a big deal; I can deal with this manually. But having a patch set ID
> would help with this.
I'm not sure there's ever going to be a clear "do what I mean" solution here.
We try to pick the most common course of action in such a case, which is to
assume that it's a quick followup bugfix for the patch.
> I'd also love if lore.kernel.org and maybe b4 also had an automatic
> way to get at the older versions of the patch series, and the patch
> set ID would help with the automation.
You can, for patches sent with b4 that contain change-id. E.g.:
https://lore.kernel.org/all/?q=changeid%3A20250313-a4-a5-reset-6696e5b18e10
https://lore.kernel.org/lkml/?q=changeid%3A20250313-try_with-cc9f91dd3b60
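The two links above follow one pattern: a "changeid:" search term,
percent-encoded, in the q parameter of an inbox URL. A small sketch that
rebuilds them; the function name is made up, and only the URL shape visible
in the links is assumed.

```python
from urllib.parse import quote


def changeid_search_url(change_id: str, inbox: str = "all") -> str:
    """Build a lore.kernel.org search URL for a series change-id."""
    # Encode the whole search term so that ':' becomes '%3A', as in the links.
    query = quote(f"changeid:{change_id}", safe="")
    return f"https://lore.kernel.org/{inbox}/?q={query}"


print(changeid_search_url("20250313-a4-a5-reset-6696e5b18e10"))
print(changeid_search_url("20250313-try_with-cc9f91dd3b60", inbox="lkml"))
```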
-K
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-11 15:48 ` Theodore Ts'o
2025-04-11 16:38 ` Konstantin Ryabitsev
@ 2025-04-11 17:44 ` Junio C Hamano
2025-04-12 23:13 ` Theodore Ts'o
1 sibling, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-11 17:44 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
"Theodore Ts'o" <tytso@mit.edu> writes:
> On Thu, Apr 10, 2025 at 09:18:56AM -0700, Junio C Hamano wrote:
>> Thanks to the "cover for iteration N is a direct response for
>> iteration N-1" and "cover is marked as [PATCH 0/$n]" conventions,
>
> Even if the cover for iteration N isn't a reply to the cover for
> iteration N-1, b4 will search based on the subject line for a cover
> letter with a higher version number, and this mostly works.
That is nice.
> My one (admittedly minor) pain point is where someone replies to a
> patch series with something like "you should really also fix FOO", and
> then someone replies with a single patch (without a cover letter,
> possibly created with git; possibly not) that addresses issue FOO.
>
> This can confuse "b4 am -c" into thinking that the patch to address
> FOO was in fact a newer version of the patch being reviewed. It's not
> a big deal; I can deal with this manually. But having a patch set ID
> would help with this.
Excellent. I've seen this happen often; even though I usually pick
these small things up directly from within my newsreader, it would
be unpleasant when it happens when you are trying to grab a large
series with "b4 am".
The submitting contributor must make a conscious arrangement to give
a "patch set ID" shared among the messages in a single iteration,
and everybody who is responding must make sure they do not add the
same ID to the messages they throw at the thread in response. Those
who use format-patch and send-email can do that with convention and
automation and there is no reason to rely on In-Reply-To: header
(which may confuse the automated recipient of manually created
follow-up messages).
> The other thing that having an official patch set ID would help with
> is allowing patchwork to automatically supersede an older
> version of the patch series (possibly with a link to the older version
> of the patch series in the Web UI).
Lovely.
> I'd also love if lore.kernel.org and maybe b4 also had an automatic
> way to get at the older versions of the patch series, and the patch
> set ID would help with the automation. Admittedly it's not strictly
> speaking necessary, since b4 is already using the cover letter subject
> line to search newer versions of the patch series. The number of
> messages it would need to search to find older versions would be
> greater, though.
True.
Thanks.
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-11 17:44 ` Junio C Hamano
@ 2025-04-12 23:13 ` Theodore Ts'o
2025-04-14 15:13 ` Junio C Hamano
2025-04-15 21:38 ` Jacob Keller
0 siblings, 2 replies; 118+ messages in thread
From: Theodore Ts'o @ 2025-04-12 23:13 UTC (permalink / raw)
To: Junio C Hamano
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Fri, Apr 11, 2025 at 10:44:43AM -0700, Junio C Hamano wrote:
>
> The submitting contributor must make a conscious arrangement to give
> a "patch set ID" shared among the messages in a single iteration,
> and everybody who is responding must make sure they do not add the
> same ID to the messages they throw at the thread in response. Those
> who use format-patch and send-email can do that with convention and
> automation and there is no reason to rely on In-Reply-To: header
> (which may confuse the automated recipient of manually created
> follow-up messages).
So it all depends on how the patch set ID is implemented. Here's one
way that I had in mind. The reason why I like this over the
Change-ID approach is that the semantics can be very clearly defined,
and the only thing we rely on is the user saying "this new commit is
part of a patch series which I'm putting together".
By default when creating a new commit, the field is empty (in which
case the patch set ID is presumed to be the same as the commit ID).
Or, if the user gives a command-line flag, say "git commit --series",
which indicates that it is part of a patch series, the
patch set ID of the commit is set to the patch set ID of the current
commit (i.e., eventually, its parent commit).
Whenever the commit is amended or rebased or cherry picked, if the
patch series ID is NULL, then it is set to the original commit ID.
Otherwise, the existing patch set ID is preserved.
The patch set ID will be output by git format-patch (perhaps as "Patch
Series ID: sha hash" immediately after the --- line). And if it is
present, "git am" will import that patch series ID into the git commit
which it creates when it sucks in the e-mail.
The net effect of this is that for new versions of git which implement
the Patch Set ID, all new commits are treated as patch series of
length 1, unless a subsequent commit is created using "git commit
--series". And the Patch Set ID will be preserved across
cherry-picks, rebase operations, and git send-email/git apply-message
operations.
So if someone replies to an existing e-mail thread with a new commit,
git format-patch will give it a different patch set ID, so we can
distinguish it from an amended copy of a patch in the patch series.
It also means that for singleton commits, the patch set ID effectively acts
much like the traditional Change-ID. For multi-commit patch series,
all of the commits will have the same patch set ID.
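The rules above can be modelled in a few lines. This is a toy model under
stated assumptions, not a proposal for Git's implementation: commit IDs are
fake counters rather than hashes, and series_field, commit() and rewrite()
are hypothetical names standing in for the new field, "git commit
[--series]", and amend/rebase/cherry-pick respectively.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_ids = (f"c{n}" for n in itertools.count(1))  # stand-in for real commit hashes


@dataclass(frozen=True)
class Commit:
    commit_id: str
    parent: Optional["Commit"]
    series_field: Optional[str] = None  # empty by default

    @property
    def patch_set_id(self) -> str:
        # An empty field means "presumed to be the same as the commit ID".
        if self.series_field is not None:
            return self.series_field
        return self.commit_id


def commit(parent: Optional[Commit], series: bool = False) -> Commit:
    """Create a commit; with series=True, inherit the parent's patch set ID."""
    field = parent.patch_set_id if (series and parent is not None) else None
    return Commit(next(_ids), parent, field)


def rewrite(old: Commit, new_parent: Optional[Commit] = None) -> Commit:
    """Amend, rebase or cherry-pick: new commit ID, patch set ID carried over."""
    parent = new_parent if new_parent is not None else old.parent
    # If the field was empty it becomes the original commit ID,
    # otherwise the existing value is preserved.
    return Commit(next(_ids), parent, old.patch_set_id)


base = commit(None)
first = commit(base)                   # a series of length 1
second = commit(first, series=True)    # joins the series started by `first`
amended = rewrite(first)               # new commit ID, same patch set ID
follow_up = commit(second)             # someone's unrelated reply patch

print(first.patch_set_id, second.patch_set_id, amended.patch_set_id)
print(follow_up.patch_set_id)
```

The first three commits report the same patch set ID, while the follow-up
gets its own, which is the property that lets a tool tell a reply patch
apart from a new version of the series.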
- Ted
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-12 23:13 ` Theodore Ts'o
@ 2025-04-14 15:13 ` Junio C Hamano
2025-04-15 22:30 ` Remo Senekowitsch
2025-04-15 21:38 ` Jacob Keller
1 sibling, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-14 15:13 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
"Theodore Ts'o" <tytso@mit.edu> writes:
> On Fri, Apr 11, 2025 at 10:44:43AM -0700, Junio C Hamano wrote:
>>
>> The submitting contributor must make a conscious arrangement to give
>> a "patch set ID" shared among the messages in a single iteration,
>> and everybody who is responding must make sure they do not add the
>> same ID to the messages they throw at the thread in response. Those
>> who use format-patch and send-email can do that with convention and
>> automation and there is no reason to rely on In-Reply-To: header
>> (which may confuse the automated recipient of manually created
>> follow-up messages).
>
> So it all depends on how the patch set ID is implemented. Here's one
> way that I had in mind. The reason why I like this over the
> Change-ID approach is that the semantics can be very clearly defined,
> and the only thing we rely on is the user saying "this new commit is
> part of a patch series which I'm putting together".
>
> By default when creating a new commit, the field is empty (in which
> case the patch set ID is presumed to be the same as the commit ID).
> Or, if the user gives a command-line flag, say "git commit --series",
> which indicates that it is part of a patch series, the
> patch set ID of the commit is set to the patch set ID of the current
> commit (i.e., eventually, its parent commit).
>
> Whenever the commit is amended or rebased or cherry picked, if the
> patch series ID is NULL, then it is set to the original commit ID.
> Otherwise, the existing patch set ID is preserved.
>
> The patch set ID will be output by git format-patch (perhaps as "Patch
> Series ID: sha hash" immediately after the --- line). And if it is
> present, "git am" will import that patch series ID into the git commit
> which it creates when it sucks in the e-mail.
>
> The net effect of this is that for new versions of git which implement
> the Patch Set ID, all new commits are treated as patch series of
> length 1, unless a subsequent commit is created using "git commit
> --series". And the Patch Set ID will be preserved across
> cherry-picks, rebase operations, and git send-email/git apply-message
> operations.
>
> So if someone replies to an existing e-mail thread with a new commit,
> git format-patch will give it a different patch set ID, so we can
> distinguish it from an amended copy of a patch in the patch series.
>
> It also means that for singleton commits, the patch set ID effectively acts
> much like the traditional Change-ID. For multi-commit patch series,
> all of the commits will have the same patch set ID.
Yeah, I like that aspect the best---the case for single commit
series falling out as a natural degenerate case of the more general
case to support multi-commit series is a good sign that the design
got something right ;-)
I am still not sure what to think about the lack of an explicit
evolution history of the commits in one patch set that share the same
patch set ID.
When we have 10 commits that share the same patch set ID, I can
imagine that we can easily tell 3 are from one iteration, and 3 and
4 among the rest are from another two iterations by noticing that
there are three strands of pearls, having 3, 3, and 4 commits on them.
And we can identify the initial round by noticing that one of the
commits has its name as the patch set ID, but I am not sure if we
should be OK with not having anything but the committer timestamp to
tell which one among the other two iterations is earlier, and we
cannot tell anything about these two other iterations if they are
independent rewrites of the original round.
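The "strands of pearls" observation can be sketched as follows: group the
commits that share a patch set ID into chains by following parent links
inside the group. This is an illustration only; it assumes each iteration
is a simple linear chain, and the commit names are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    commit_id: str
    parent_id: Optional[str]
    patch_set_id: str


def strands(commits: List[Commit], patch_set_id: str) -> List[List[str]]:
    """Split the commits sharing `patch_set_id` into parent-linked chains."""
    members = {c.commit_id: c for c in commits if c.patch_set_id == patch_set_id}
    # Parent -> child links, restricted to commits inside the patch set.
    child_of = {
        c.parent_id: c.commit_id
        for c in members.values()
        if c.parent_id in members
    }
    result = []
    for c in members.values():
        if c.parent_id in members:
            continue  # not the first pearl on its strand
        strand = [c.commit_id]
        while strand[-1] in child_of:
            strand.append(child_of[strand[-1]])
        result.append(strand)
    return result


def chain(names: List[str], base: str, patch_set_id: str) -> List[Commit]:
    """Helper for the example: build a linear run of commits on top of `base`."""
    out, parent = [], base
    for name in names:
        out.append(Commit(name, parent, patch_set_id))
        parent = name
    return out


# Ten commits sharing the patch set ID "a1": three iterations of 3, 3 and 4.
commits = (
    chain(["a1", "a2", "a3"], "base", "a1")
    + chain(["b1", "b2", "b3"], "base", "a1")
    + chain(["c1", "c2", "c3", "c4"], "newer-base", "a1")
)
found = strands(commits, "a1")
initial = next(s for s in found if "a1" in s)
print([len(s) for s in found])
print(initial)
```

The initial round is recognisable because it contains the commit whose name
is the patch set ID. Nothing in this structure orders the other two strands
relative to each other, which is the gap described above.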
But other than that, I like something with clearly defined semantics
(and the definition coming naturally out of the structure, not out
of some arbitrary convention that forces us to bring in some
semantics), and what you outlined above looks reasonably clean and
easy to use.
Thanks.
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-14 15:13 ` Junio C Hamano
@ 2025-04-15 22:30 ` Remo Senekowitsch
2025-04-16 0:09 ` Junio C Hamano
2025-04-16 0:21 ` Jacob Keller
0 siblings, 2 replies; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-15 22:30 UTC (permalink / raw)
To: Junio C Hamano, Theodore Ts'o
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, philipmetzger@bluewin.ch
On Mon Apr 14, 2025 at 5:13 PM CEST, Junio C Hamano wrote:
> "Theodore Ts'o" <tytso@mit.edu> writes:
>
>> On Fri, Apr 11, 2025 at 10:44:43AM -0700, Junio C Hamano wrote:
>>>
>>> The submitting contributor must make a conscious arrangement to give
>>> a "patch set ID" shared among the messages in a single iteration,
>>> and everybody who is responding must make sure they do not add the
>>> same ID to the messages they throw at the thread in response. Those
>>> who use format-patch and send-email can do that with convention and
>>> automation and there is no reason to rely on In-Reply-To: header
>>> (which may confuse the automated recipient of manually created
>>> follow-up messages).
>>
>> So it all depends on how the patch set ID is implemented. Here's one
>> way that I had in mind. The reason why I like this over the
>> Change-ID approach is that the semantics can be very clearly defined,
>> and the only thing we rely on is the user saying "this new commit is
>> part of a patch series which I'm putting together".
>>
>> By default when creating a new commit, the field is empty (in which
>> case the patch set ID is presumed to be the same as the commit ID).
>> Or, if the user gives a command-line flag, say "git commit --series",
>> which indicates that it is part of a patch series, the
>> patch set ID of the commit is set to the patch set ID of the current
>> commit (i.e., eventually, its parent commit).
>>
>> Whenever the commit is amended or rebased or cherry picked, if the
>> patch series ID is NULL, then it is set to the original commit ID.
>> Otherwise, the existing patch set ID is preserved.
>>
>> The patch set ID will be output by git format-patch (perhaps as "Patch
>> Series ID: sha hash" immediately after the --- line). And if it is
>> present, "git am" will import that patch series ID into the git commit
>> which it creates when it sucks in the e-mail.
>>
>> The net effect of this is that for new versions of git which implement
>> the Patch Set ID, all new commits are treated as patch series of
>> length 1, unless a subsequent commit is created using "git commit
>> --series". And the Patch Set ID will be preserved across
>> cherry-picks, rebase operations, and git send-email/git apply-message
>> operations.
>>
>> So if someone replies to an existing e-mail thread with a new commit,
>> git format-patch will give it a different patch set ID, so we can
>> distinguish it from an amended copy of a patch in the patch series.
>>
>> It also means that for singleton commits, the patch set ID effectively acts
>> much like the traditional Change-ID. For multi-commit patch series,
>> all of the commits will have the same patch set ID.
>
> Yeah, I like that aspect the best---the case for single commit
> series falling out as a natural degenerate case of the more general
> case to support multi-commit series is a good sign that the design
> got something right ;-)
>
> I am still not sure what to think about the lack of an explicit
> evolution history of the commits in one patch set that share the same
> patch set ID.
>
> When we have 10 commits that share the same patch set ID, I can
> imagine that we can easily tell 3 are from one iteration, and 3 and
> 4 among the rest are from another two iterations by noticing that
> there are three strands of pearls, having 3, 3, and 4 commits on them.
> And we can identify the initial round by noticing that one of the
> commits has its name as the patch set ID, but I am not sure if we
> should be OK with not having anything but the committer timestamp to
> tell which one among the other two iterations is earlier, and we
> cannot tell anything about these two other iterations if they are
> independent rewrites of the original round.
>
> But other than that, I like something with clearly defined semantics
> (and the definition coming naturally out of the structure, not out
> of some arbitrary convention that forces us to bring in some
> semantics), and what you outlined above looks reasonably clean and
> easy to use.
Doesn't a patch set ID suffer from the same kind of ambiguity the
change-id supposedly does? Patch sets can be split and merged, a commit
from one patch set can be cherry-picked into another. What patch set ID
should such a cherry-picked commit have?
And I think the argument that a change-id for a singleton patch
set naturally falls out of the patch set ID can easily be reversed.
Admittedly, I don't have the most experience with the mailing list
workflow, but a multi-commit patch set usually comes with a cover
letter, right? And people like to track their cover letter in a commit?
IIUC, b4 is designed around that too.
In that case, the cover letter has its own change-id as any other
commit, which will naturally remain stable across every version of the
patch set. It would be non-sensical to squash, split or cherry-pick the
cover letter commit. Sounds like a great candidate for the patch set ID.
So the patch set ID can just as naturally flow out from the change-id.
I can see two concrete disadvantages of the patch set ID:
* It's strictly less powerful. As explained, the change-id can do
everything the patch set ID can via the cover letter. But the patch
set ID cannot help you track how individual commits within the patch
set evolved.
* It's more complicated. While many Git users work with patch sets every
day, it's not a concept in Git itself. Git only knows about commits.
The patch set ID would introduce a new concept into Git unnecessarily,
while the change-id naturally extends the language Git already speaks,
that of commits.
Remo
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-15 22:30 ` Remo Senekowitsch
@ 2025-04-16 0:09 ` Junio C Hamano
2025-04-16 0:21 ` Jacob Keller
1 sibling, 0 replies; 118+ messages in thread
From: Junio C Hamano @ 2025-04-16 0:09 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Theodore Ts'o, Nico Williams, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
"Remo Senekowitsch" <remo@buenzli.dev> writes:
> Doesn't a patch set ID suffer from the same kind of ambiguity the
> change-id supposedly does? Patch sets can be split and merged, a commit
> from one patch set can be cherry-picked into another. What patch set ID
Correct.
I still prefer it over the change IDs between these two incomplete
mechanisms. To resolve the ambiguity, you'd probably go all the way
to an approach like "in addition to the usual parent-child
relationship, we record the change evolution relationship so that
there is a record on each commit what 'predecessor commit(s)' it was
derived from".
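A sketch of what such a record could look like, purely for illustration.
Git commits have no such field today; the predecessors attribute and the
commit names below are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Commit:
    commit_id: str
    # Hypothetical extra record: the commit(s) this one was derived from.
    predecessors: Tuple[str, ...] = ()


def evolution_history(commits: Dict[str, Commit], tip: str) -> List[str]:
    """List every commit `tip` was derived from, nearest predecessors first."""
    order: List[str] = []
    seen = {tip}
    queue = deque([tip])
    while queue:
        current = queue.popleft()
        for pred in commits[current].predecessors:
            if pred not in seen:
                seen.add(pred)
                order.append(pred)
                queue.append(pred)
    return order


# Two commits squashed into one, which was then amended once more.
history = {
    c.commit_id: c
    for c in [
        Commit("v1-part-a"),
        Commit("v1-part-b"),
        Commit("v2-squashed", predecessors=("v1-part-a", "v1-part-b")),
        Commit("v3-amended", predecessors=("v2-squashed",)),
    ]
}
print(evolution_history(history, "v3-amended"))
```

Because a commit may list several predecessors, squashes need no special
handling, and a split would show up as two commits naming the same
predecessor. A single shared ID cannot express either case.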
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-15 22:30 ` Remo Senekowitsch
2025-04-16 0:09 ` Junio C Hamano
@ 2025-04-16 0:21 ` Jacob Keller
1 sibling, 0 replies; 118+ messages in thread
From: Jacob Keller @ 2025-04-16 0:21 UTC (permalink / raw)
To: Remo Senekowitsch, Junio C Hamano, Theodore Ts'o
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, philipmetzger@bluewin.ch
On 4/15/2025 3:30 PM, Remo Senekowitsch wrote:
> Doesn't a patch set ID suffer from the same kind of ambiguity the
> change-id supposedly does? Patch sets can be split and merged, a commit
> from one patch set can be cherry-picked into another. What patch set ID
> should such a cherry-picked commit have?
>
> And I think the argument that a change-id for a singleton patch
> set naturally falls out of the patch set ID can easily be reversed.
> Admittedly, I don't have the most experience with the mailing list
> workflow, but a multi-commit patch set usually comes with a cover
> letter, right? And people like to track their cover letter in a commit?
> IIUC, b4 is designed around that too.
>
> In that case, the cover letter has its own change-id as any other
> commit, which will naturally remain stable across every version of the
> patch set. It would be non-sensical to squash, split or cherry-pick the
> cover letter commit. Sounds like a great candidate for the patch set ID.
>
If you commit your cover letter, that would work fine. If you don't
commit your cover letter, you could probably also generate this.
However, the commit itself wouldn't necessarily have the patch-id as part
of its metadata so you may not be able to easily look this up without
referring to the external place where you published. Generally cover
letters are commits only for the submitter. Once you submit and it is
applied/merged, the patch-id would vanish unless
> So the patch set ID can just as naturally flow out from the change-id.
>
> I can see two concrete disadvantages of the patch set ID:
>
> * It's strictly less powerful. As explained, the change-id can do
> everything the patch set ID can via the cover letter. But the patch
> set ID cannot help you track how individual commits within the patch
> set evolved.
Fair.
>
> * It's more complicated. While many Git users work with patch sets every
> day, it's not a concept in Git itself. Git only knows about commits.
> The patch set ID would introduce a new concept into Git unnecessarily,
> while the change-id naturally extends the language Git already speaks,
> that of commits.
>
Sure. I guess it depends somewhat on what you want to get out of the
change-ids.
If you only care about "how do commits fit together in series", I think
a set of change-ids per commit is not that helpful on its own. You could
likely build the equivalent of patch-id out of it with proper tooling.
However, if patch-id is not sufficient for what most folks interested in
this topic want, then I would agree with you it's not the right direction.
> Remo
>
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-12 23:13 ` Theodore Ts'o
2025-04-14 15:13 ` Junio C Hamano
@ 2025-04-15 21:38 ` Jacob Keller
1 sibling, 0 replies; 118+ messages in thread
From: Jacob Keller @ 2025-04-15 21:38 UTC (permalink / raw)
To: Theodore Ts'o, Junio C Hamano
Cc: Nico Williams, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On 4/12/2025 4:13 PM, Theodore Ts'o wrote:
> On Fri, Apr 11, 2025 at 10:44:43AM -0700, Junio C Hamano wrote:
>>
>> The submitting contributor must make a conscious arrangement to give
>> a "patch set ID" shared among the messages in a single iteration,
>> and everybody who is responding must make sure they do not add the
>> same ID to the messages they throw at the thread in response. Those
>> who use format-patch and send-email can do that with convention and
>> automation and there is no reason to rely on In-Reply-To: header
>> (which may confuse the automated recipient of manually created
>> follow-up messages).
>
> So it all depends on how the patch set ID is implemented. Here's one
> way that I had in mind. The reason why I like this over the
> Change-ID approach is that the semantics can be very clearly defined,
> and the only thing we rely on is the user saying "this new commit is
> part of a patch series which I'm putting together".
>
I've been catching up on this thread, trying to get a sense of the
discussion. I like the notion of this patch-id. I think dealing with
patch series as a single entity with one patch-id is nice.
In my experience with Gerrit, a patch series being treated as
individual reviews with their own change-ids usually discouraged doing
things in series and especially discouraged splitting a patch into two
after a review started. I prefer being able to collate the series
together, so a patch-id is useful.
Having the singleton change-id semantics naturally emerge is nice.
One thing I really liked from earlier in the thread was the "reverse
hex" idea, where a change-id is encoded using letters from the end of
the alphabet. I really like that it is immediately unambiguous whether
you are looking at a change-id value or a commit-id. Obviously there are
lots of other ways to encode this and I think the thread has discussed
numerous options.
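As a rough illustration of the "reverse hex" idea, here is a minimal
Python sketch. It assumes the alphabet maps hex digit 0 to 'z', 1 to 'y',
and so on down to f mapping to 'k'; that mapping and the helper names are
assumptions made for this example, not something specified in this thread.

```python
# Illustrative only: the exact alphabet (0 -> z ... f -> k) is an assumption.
HEX_DIGITS = "0123456789abcdef"
REVERSE_DIGITS = "zyxwvutsrqponmlk"

_TO_REVERSE = str.maketrans(HEX_DIGITS, REVERSE_DIGITS)
_FROM_REVERSE = str.maketrans(REVERSE_DIGITS, HEX_DIGITS)


def to_reverse_hex(hex_id: str) -> str:
    """Re-encode a lowercase hex string using the letters k..z."""
    return hex_id.lower().translate(_TO_REVERSE)


def from_reverse_hex(reverse_id: str) -> str:
    """Decode a reverse-hex string back into ordinary hex."""
    return reverse_id.translate(_FROM_REVERSE)


def looks_like_change_id(value: str) -> bool:
    """True if every character comes from the reverse-hex alphabet.

    The letters k..z never appear in ordinary hex, so a non-empty string
    that passes this check cannot be mistaken for a commit-id.
    """
    return bool(value) and all(c in REVERSE_DIGITS for c in value)
```

Because the two alphabets do not overlap, a tool can tell the two kinds
of ID apart from the characters alone, without any prefix or extra
context.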
> By default when creating a new commit, the field is empty (in which
> case the patch set ID is presumed to be the same as the commit ID), or
> if the user gives a command-line flag say, "git commit --series"
> which indicates that it is part of a patch series in which case the
> patch set ID of the commit is set to the patch set ID of the current
> commit (i.e., eventually, its parent commit).
>
> Whenever the commit is amended or rebased or cherry picked, if the
> patch series ID is NULL, then it is set to the original commit ID.
> Otherwise, the existing patch set ID is preserved.
Ok, so it starts null, but as soon as you rebase/amend/etc. the commit,
it's set. This is how the change-id semantics fall out for singleton
commits.
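To make the quoted rules concrete, here is a small Python model of them.
It is only a sketch of the semantics as described in the quoted text: the
Commit class, the fake commit IDs, and the function names are all made up
for illustration and do not correspond to any existing Git code.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    commit_id: str
    patch_set_id: Optional[str] = None  # empty (NULL) by default

    def effective_patch_set_id(self) -> str:
        # An empty field is presumed to mean "same as the commit ID".
        return self.patch_set_id or self.commit_id


def _fake_commit_id(payload: str) -> str:
    # Stand-in for a real object name; hypothetical helper.
    return hashlib.sha1(payload.encode()).hexdigest()


def create(payload: str, parent: Optional[Commit] = None,
           series: bool = False) -> Commit:
    """Model of "git commit" with the proposed --series flag."""
    patch_set_id = None
    if series and parent is not None:
        patch_set_id = parent.effective_patch_set_id()
    return Commit(_fake_commit_id(payload), patch_set_id)


def rewrite(original: Commit, new_payload: str) -> Commit:
    """Model of amend/rebase/cherry-pick: NULL becomes the original
    commit ID, an existing patch set ID is preserved."""
    return Commit(_fake_commit_id(new_payload),
                  original.effective_patch_set_id())
```

In this model a lone commit that gets amended twice keeps pointing at its
very first commit ID, which is the singleton change-id behaviour, while
commits created with series=True all share the ID of the first commit in
the series.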
>
> The patch set ID will be output by git format-patch (perhaps as "Patch
> Series ID: sha hash") immediately after the --- line. And if it is
> present, "git am" will import that patch series ID into the git commit
> which it creates when it sucks in the e-mail.
>
> The net effect of this is that for new versions of git which implement
> the Patch Set ID, all new commits are treated as patch series of
> length 1, unless a subsequent commit is created using "git commit
> --series". And the Patch Set ID will be preserved across
> cherry-picks, rebase operations, and git send-email/git apply-message
> operations.
I think it's likely for tooling to emerge to retroactively convert a
branch into a "series" with the same patch-id after the fact once a
series is formed.
>
> So if someone replies to an existing e-mail thread with a new commit,
> git format-patch will give it a different patch set ID, so we can
> distinguish it from an amended copy of a patch in the patch series.
>
> It also means that for singleton commits, the patch ID effectively
> acts much like the traditional Change-ID. For multi-commit patch
> series, all of the commits will have the same patch set ID.
>
> - Ted
>
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-09 16:54 ` Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer) Nico Williams
2025-04-09 18:02 ` Junio C Hamano
@ 2025-04-14 19:54 ` D. Ben Knoble
2025-04-14 21:34 ` Nico Williams
` (2 more replies)
1 sibling, 3 replies; 118+ messages in thread
From: D. Ben Knoble @ 2025-04-14 19:54 UTC (permalink / raw)
To: Nico Williams
Cc: Theodore Ts'o, Junio C Hamano, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Wed, Apr 9, 2025 at 12:56 PM Nico Williams <nico@cryptonector.com> wrote:
>
> On Wed, Apr 09, 2025 at 08:19:24AM -0400, Theodore Ts'o wrote:
> > On Tue, Apr 08, 2025 at 10:53:06AM -0500, Nico Williams wrote:
> > > I'm not keen on CR tools "intuiting" from.. similarity checks.
> > > [...]
> >
> > I'm not keen on fields that can have essentially random semantics.
> > Part of this is because today Change-ID is in the footer, and so
> > humans can randomly set it to any value they like. Sometimes they cut
> > and paste footers, and so completely unrelated commits have the same
> > Change-Id which show up when you do a Gerrit lookup by Change-Id.
> > Admittedly, this aspect gets better if we shove it into the git commit
> > header.
> >
> > Part of it is because some tools will edit the Change-Id when doing a
> > cherry-pick. [...]
>
> I was only proposing to leave some details out, not to have completely
> undefined semantics. The particular details we might want to leave out
> are about resolving change IDs to URIs. In particular this editing of
> change IDs on cherry-pick you mention has to not be permitted, or
> perhaps a new change ID could be added -- i.e., are these headers
> single-valued or multi-valued?
>
> Let's nail down the semantics of these change ID headers. Here is a
> proposal to bang on:
>
> - change IDs get preserved on cherry-pick and on `pick`s in rebases
>
> - users can manually remove or change these change IDs, naturally,
> though generally they would not
>
> - the actual change IDs are either free-form or they are URIs -- pick
> one, but if they are URIs they should be URIs to CRs, and approved
> CRs should perhaps have links to integration reports etc.
Using URIs [to code reviews] looks to me like it makes some
assumptions about what creates or consumes these headers, right?
Especially since the URI should point to a code review… Is there a way
to do that which is downstream-agnostic?
Further, and maybe this is my ignorance of Gerrit showing: how would
you attach a URI to a local commit when authoring it? You don't have
the review URI when running `git commit`, do you? (Maybe I
misunderstood; I'm seeing an odd chicken-egg problem here.)
Which begs another question: what/who applies the initial change ID to
a commit and when?
[…]
I've skimmed most of the discussion, and I think a unique ID for an
in-flight series could be useful for ergonomics and to support more
tools that link between versions of the series.
Re-reading the original post [1] (which didn't mention this kind of
ID?), I'm having a hard time seeing the problem statement. There's a
lot said here about the specifics of the solution, and some other neat
things it might unlock… meanwhile, I'm wondering if all the
consternation about change IDs is because the problem being solved is
underspecified for a core Git feature? (That might tie to Ted's
initial concerns about semantic meaning, on which I think I concur:
the parent and committer/author headers have unambiguous meaning to
Git, independent of anything else.)
It looks to me, an outsider, like the problem is some combination of
"I want to track a commit's evolution" and "I want to see related
commits in review, esp. when it's an identical and already-approved
commit." But I might be misreading, and clarifying the problem
statement might help bring us to a better core solution?
[1]: https://lore.kernel.org/git/xmqqh62tm5fo.fsf@gitster.g/T/#m038be849b9b4020c16c562d810cf77bad91a2c87
Cheers,
D. Ben Knoble
PS This discussion feels somewhat related to the classic GitHub
problem of not presenting interdiffs/range-diffs: GitHub shows a
too-flat source diff on force-pushes. Perhaps better web UI tooling
about interdiff review (which I think is one of the things Gerrit
does/wants to do?) makes change IDs less necessary, since interdiffs
help connect evolutions of commits?
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-14 19:54 ` D. Ben Knoble
@ 2025-04-14 21:34 ` Nico Williams
2025-04-15 21:44 ` Jacob Keller
2025-04-16 11:36 ` Remo Senekowitsch
2 siblings, 0 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-14 21:34 UTC (permalink / raw)
To: D. Ben Knoble
Cc: Theodore Ts'o, Junio C Hamano, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Mon, Apr 14, 2025 at 03:54:23PM -0400, D. Ben Knoble wrote:
> Using URIs [to code reviews] looks to me like it makes some
> assumptions about what creates or consumes these headers, right?
> Especially since the URI should point to a code review… Is there a way
> to do that which is downstream-agnostic?
You could tag the URI(s) with purposes, but URIs are already pretty
agnostic as to what is being referenced by them as they are merely the
reference. That said, if you need to decompose the URI into specific
subitems then you need to understand the underlying application's
local-part (and q-param, if any) scheme.
> Further, and maybe this is my ignorance of Gerrit showing: how would
> you attach a URI to a local commit when authoring it? You don't have
> the review URI when running `git commit`, do you? (Maybe I
> misunderstood; I'm seeing an odd chicken-egg problem here.)
Excellent outlook-changing question. Local tooling would be needed,
which would be annoying if that tooling were not Git itself, but if it's
Git then how would it interface with Gerrit or any other such tools?
We'd have to define APIs for that, and that too would be annoying. So
it has to be `git commit` (or `git rebase -i` and then use a new verb to
stop and set a commit's change ID metadata, like `reword`, but for
metadata), which means the user has to acquire a CR before creating the
CR, so the CR tools would have to support that.
On the other hand if it's not CR URIs but more like ticket URIs (as in
JIRA, bugzilla, etc.) then it's much easier.
People already use ticket IDs all the time, typically in the commit
subject, else in the commit commentary, typically using some specific
form. For example Illumos has devs put one ticket ID in the subject and
if there are more ticket IDs then the body of the commit message must
start with each additional ticket ID on a line by itself with the
ticket's synopsis following the ID. E.g.,
https://src.illumos.org/source/history/illumos-gate/ (I think they
don't allow any actual commentary in the commit message body, with all
commentary having to be in the tickets).
Typically tickets have to exist before the commits get created, and in
cases like Illumos' tickets have to exist before the code review is
created, and the commits have to reference the relevant ticket(s).
OTOH in the Illumos case you see that in a CR one might have multiple
commits for different tickets each, and still all be related. A change
ID/URI could be used to link those together without having to go
spelunking in the ticket system. Also ticket IDs (and URIs) could be
handy as a header in the commits because otherwise one has to know the
commit naming conventions of the project. Illumos, for example, could
link tickets by ID in the subject and commit message body and by URI in
commit headers (or footers).
> Which begs another question: what/who applies the initial change ID to
> a commit and when?
See above.
> I've skimmed most of the discussion, and I think a unique ID for an
> in-flight series could be useful for ergonomics and to support more
> tools that link between versions of the series.
Also to ease back- and forward-ports. Though to be fair that's a mostly
solved problem, as when users do those they typically start with a
ticket, find a CR linked from the ticket, find the corresponding commits
in the main branch, then go from there. A change ID might not be more
helpful than that. Then again, if you're doing a second or third
backport then one could find earlier backports that might be easier to
port from than the main branch commits, and here then a change ID might
help.
> Re-reading the original post [1] (which didn't mention this kind of
> ID?), I'm having a hard time seeing the problem statement. [...]
It mentions a "change ID". I'm supposing it could be a URI, but I don't
care if it's not, and if it's easier then ignore the URI thing.
> It looks to me, an outsider, like the problem is some combination of
> "I want to track a commit's evolution" and "I want to see related
> commits in review, esp. when it's an identical and already-approved
> commit." But I might be misreading, and clarifying the problem
> statement might help bring us to a better core solution?
I'm the one introducing the second of those, and perhaps I should butt
out.
> [1]: https://lore.kernel.org/git/xmqqh62tm5fo.fsf@gitster.g/T/#m038be849b9b4020c16c562d810cf77bad91a2c87
>
> Cheers,
> D. Ben Knoble
>
> PS This discussion feels somewhat related to the classic GitHub
> problem of not presenting interdiffs/range-diffs: GitHub shows a
> too-flat source diff on force-pushes. Perhaps better web UI tooling
> about interdiff review (which I think is one of the things Gerrit
> does/wants to do?) makes change IDs less necessary, since interdiffs
> help connect evolutions of commits?
Nothing here could force GH to make their UI nicer and more featureful.
Nico
--
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-14 19:54 ` D. Ben Knoble
2025-04-14 21:34 ` Nico Williams
@ 2025-04-15 21:44 ` Jacob Keller
2025-04-16 11:36 ` Remo Senekowitsch
2 siblings, 0 replies; 118+ messages in thread
From: Jacob Keller @ 2025-04-15 21:44 UTC (permalink / raw)
To: D. Ben Knoble, Nico Williams
Cc: Theodore Ts'o, Junio C Hamano, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On 4/14/2025 12:54 PM, D. Ben Knoble wrote:
>
> It looks to me, an outsider, like the problem is some combination of
> "I want to track a commit's evolution" and "I want to see related
> commits in review, esp. when it's an identical and already-approved
> commit." But I might be misreading, and clarifying the problem
> statement might help bring us to a better core solution?
>
> [1]: https://lore.kernel.org/git/xmqqh62tm5fo.fsf@gitster.g/T/#m038be849b9b4020c16c562d810cf77bad91a2c87
>
To me, it seems like multiple different and independent problems are
being solved with something that is almost but not quite the same in
each of the major projects shown as examples. All of these projects
would benefit from having something built into git... but it's a
challenge when they don't have the same semantics and don't quite solve
the same use cases.
It is hard to come up with something that is general enough to cover all
of the use cases.
> Cheers,
> D. Ben Knoble
>
> PS This discussion feels somewhat related to the classic GitHub
> problem of not presenting interdiffs/range-diffs: GitHub shows a
> too-flat source diff on force-pushes. Perhaps better web UI tooling
> about interdiff review (which I think is one of the things Gerrit
> does/wants to do?) makes change IDs less necessary, since interdiffs
> help connect evolutions of commits?
>
I think interdiffs and range-diffs are very helpful. More exposure of
these in the various forges would be good. I suspect that creating an
easy to use web UI for these is a hard problem, especially as there are
a number of corner cases to get right.
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-14 19:54 ` D. Ben Knoble
2025-04-14 21:34 ` Nico Williams
2025-04-15 21:44 ` Jacob Keller
@ 2025-04-16 11:36 ` Remo Senekowitsch
2025-04-22 20:17 ` D. Ben Knoble
2 siblings, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-16 11:36 UTC (permalink / raw)
To: D. Ben Knoble, Nico Williams
Cc: Theodore Ts'o, Junio C Hamano, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Mon Apr 14, 2025 at 9:54 PM CEST, D. Ben Knoble wrote:
> On Wed, Apr 9, 2025 at 12:56 PM Nico Williams <nico@cryptonector.com> wrote:
>> Let's nail down the semantics of these change ID headers. Here is a
>> proposal to bang on:
>>
>> - change IDs get preserved on cherry-pick and on `pick`s in rebases
>>
>> - users can manually remove or change these change IDs, naturally,
>> though generally they would not
>>
>> - the actual change IDs are either free-form or they are URIs -- pick
>> one, but if they are URIs they should be URIs to CRs, and approved
>> CRs should perhaps have links to integration reports etc.
>
> Using URIs [to code reviews] looks to me like it makes some
> assumptions about what creates or consumes these headers, right?
> Especially since the URI should point to a code review… Is there a way
> to do that which is downstream-agnostic?
>
> Further, and maybe this is my ignorance of Gerrit showing: how would
> you attach a URI to a local commit when authoring it? You don't have
> the review URI when running `git commit`, do you? (Maybe I
> misunderstood; I'm seeing an odd chicken-egg problem here.)
>
> Which begs another question: what/who applies the initial change ID to
> a commit and when?
These are all great questions, which the originally proposed format
(fixed-width reverse-hex) has answers to. I think a URI would be
strictly worse.
* Using a reverse-hex ID makes no assumptions about what consumes these
headers. There can be multiple different consumers which treat the ID
differently with different URI schemes.
* Attaching a reverse-hex ID to a local commit when authoring it is
trivial: you generate it randomly.
This is one of those cases where being maximally restrictive about the
format will enable maximal flexibility downstream.
One example: GitHub has a URL scheme that looks like this:
github.com/org/repo/compare/<ref1>..<ref2>
This doesn't work if the refs contain slashes, as branches sometimes do
(e.g. feat/foo, username/bar). If the change-id is a URI, this type of
URL scheme doesn't work reliably.
That is not to say we should design the change-id around GitHub, it's
just an example how making the format more free-form (URI is more
free-form than fixed-width reverse-hex) makes it more difficult to get
stuff working downstream.
And lastly, laser-etching the URI scheme of one particular tool into
your commit history means the history is at great risk of degrading
over time. URI schemes change, domains change, tools become outdated
and are replaced.
Adding some ephemeral configuration to a tool that constructs a URI out
of a reverse-hex ID on the other hand is trivial.
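A sketch of what such ephemeral configuration could look like, in
Python. The URL template, the host name, and the 32-character width are
placeholders assumed for this example; no existing tool is being
described here.

```python
import re

# Assumption for illustration: IDs are exactly 32 characters from k..z.
CHANGE_ID_RE = re.compile(r"[k-z]{32}")


def review_url(change_id: str, template: str) -> str:
    """Build a review URL from a change-id and a per-tool template.

    The template lives in local or server-side configuration, so it can
    change without rewriting any commit.
    """
    if not CHANGE_ID_RE.fullmatch(change_id):
        raise ValueError(f"not a reverse-hex change-id: {change_id!r}")
    # The alphabet contains no slashes, dots or other URL metacharacters,
    # so the ID can be substituted without any escaping.
    return template.format(change_id=change_id)


# Hypothetical configuration value, not a real service.
EXAMPLE_TEMPLATE = "https://review.example.org/c/{change_id}"
```

If the review tool moves to another domain or URL layout, only the
template changes; the IDs stored in history stay valid.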
> PS This discussion feels somewhat related to the classic GitHub
> problem of not presenting interdiffs/range-diffs: GitHub shows a
> too-flat source diff on force-pushes. Perhaps better web UI tooling
> about interdiff review (which I think is one of the things Gerrit
> does/wants to do?) makes change IDs less necessary, since interdiffs
> help connect evolutions of commits?
I think it's the other way around: Building a code review UI built on
git and centered around interdiffs today is _hard_, that's why we don't
have it yet. Adding change-ids to commits will make it much easier,
paving the way for these tools to be implemented.
Remo
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-16 11:36 ` Remo Senekowitsch
@ 2025-04-22 20:17 ` D. Ben Knoble
2025-04-22 22:24 ` Remo Senekowitsch
[not found] ` <aAgWytQNqtLzg2TU@ubby>
0 siblings, 2 replies; 118+ messages in thread
From: D. Ben Knoble @ 2025-04-22 20:17 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Nico Williams, Theodore Ts'o, Junio C Hamano,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Wed, Apr 16, 2025 at 7:36 AM Remo Senekowitsch <remo@buenzli.dev> wrote:
>
> On Mon Apr 14, 2025 at 9:54 PM CEST, D. Ben Knoble wrote:
> > On Wed, Apr 9, 2025 at 12:56 PM Nico Williams <nico@cryptonector.com> wrote:
> >> Let's nail down the semantics of these change ID headers. Here is a
> >> proposal to bang on:
> >>
> >> - change IDs get preserved on cherry-pick and on `pick`s in rebases
> >>
> >> - users can manually remove or change these change IDs, naturally,
> >> though generally they would not
> >>
> >> - the actual change IDs are either free-form or they are URIs -- pick
> >> one, but if they are URIs they should be URIs to CRs, and approved
> >> CRs should perhaps have links to integration reports etc.
> >
> > Using URIs [to code reviews] looks to me like it makes some
> > assumptions about what creates or consumes these headers, right?
> > Especially since the URI should point to a code review… Is there a way
> > to do that which is downstream-agnostic?
> >
> > Further, and maybe this is my ignorance of Gerrit showing: how would
> > you attach a URI to a local commit when authoring it? You don't have
> > the review URI when running `git commit`, do you? (Maybe I
> > misunderstood; I'm seeing an odd chicken-egg problem here.)
> >
> > Which begs another question: what/who applies the initial change ID to
> > a commit and when?
>
> These are all great questions, which the originally proposed format
> (fixed-width reverse-hex) has answers to. I think a URI would be
> strictly worse.
Well, I think we still missed "what/who applies the initial change ID
to a commit and when."
But the treatment below is something I agree with and failed to
convey, I think: namely, URIs seem to encode too much
"unportable"/"specific" information in Git. I feel like the current
design is not really "tool-agnostic" as much as "built on a universal
core." That seems valuable and prone to more longevity.
>
> * Using a reverse-hex ID makes no assumptions about what consumes these
> headers. There can be multiple different consumers which treat the ID
> differently with different URI schemes.
>
> * Attaching a reverse-hex ID to a local commit when authoring it is
> trivial: you generate it randomly.
>
> This is one of those cases where being maximally restrictive about the
> format will enable maximal flexibility downstream.
>
> One example: GitHub has a URL scheme that looks like this:
> github.com/org/repo/compare/<ref1>..<ref2>
>
> This doesn't work if the refs contain slashes, as branches sometimes do
> (e.g. feat/foo, username/bar). If the change-id is a URI, this type of
> URL scheme doesn't work reliably.
>
> That is not to say we should design the change-id around GitHub, it's
> just an example how making the format more free-form (URI is more
> free-form than fixed-width reverse-hex) makes it more difficult to get
> stuff working downstream.
>
> And lastly, laser-etching the URI scheme of one particular tool into
> your commit history means the history is at great risk of degrading
> over time. URI schemes change, domains change, tools become outdated
> and are replaced.
>
> Adding some ephemeral configuration to a tool that constructs a URI out
> of a reverse-hex ID on the other hand is trivial.
Yep.
> > PS This discussion feels somewhat related to the classic GitHub
> > problem of not presenting interdiffs/range-diffs: GitHub shows a
> > too-flat source diff on force-pushes. Perhaps better web UI tooling
> > about interdiff review (which I think is one of the things Gerrit
> > does/wants to do?) makes change IDs less necessary, since interdiffs
> > help connect evolutions of commits?
>
> I think it's the other way around: Building a code review UI built on
> git and centered around interdiffs today is _hard_, that's why we don't
> have it yet. Adding change-ids to commits will make it much easier,
> paving the way for these tools to be implemented.
>
> Remo
Fair point, although GitHub's detection of force-pushes makes me think
it could split a PR into versions at that point, cross-link backwards
and forwards by one version (from the force-push detection), show
range-diffs between versions based on the target branch of the merge,
and even follow the cross-links to show an overall sequence of
versions.
But I don't work there, so presumably it's harder than that :)
I sincerely hope to make it easier if it's really that hard! And,
though my opinion matters little, I'm having a hard time piercing the
conversation to see a "universal core" that solves the desired
problem. Maybe I'm not reading carefully enough, and maybe a summary
would help. I greatly appreciated the work of previous folks to
summarize the current thread status.
--
D. Ben Knoble
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 20:17 ` D. Ben Knoble
@ 2025-04-22 22:24 ` Remo Senekowitsch
2025-04-22 22:42 ` Junio C Hamano
[not found] ` <aAgWytQNqtLzg2TU@ubby>
1 sibling, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-22 22:24 UTC (permalink / raw)
To: D. Ben Knoble
Cc: Nico Williams, Theodore Ts'o, Junio C Hamano,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Tue Apr 22, 2025 at 10:17 PM CEST, D. Ben Knoble wrote:
> On Wed, Apr 16, 2025 at 7:36 AM Remo Senekowitsch <remo@buenzli.dev> wrote:
>>
>> On Mon Apr 14, 2025 at 9:54 PM CEST, D. Ben Knoble wrote:
>> > On Wed, Apr 9, 2025 at 12:56 PM Nico Williams <nico@cryptonector.com> wrote:
>> >> Let's nail down the semantics of these change ID headers. Here is a
>> >> proposal to bang on:
>> >>
>> >> - change IDs get preserved on cherry-pick and on `pick`s in rebases
>> >>
>> >> - users can manually remove or change these change IDs, naturally,
>> >> though generally they would not
>> >>
>> >> - the actual change IDs are either free-form or they are URIs -- pick
>> >> one, but if they are URIs they should be URIs to CRs, and approved
>> >> CRs should perhaps have links to integration reports etc.
>> >
>> > Using URIs [to code reviews] looks to me like it makes some
>> > assumptions about what creates or consumes these headers, right?
>> > Especially since the URI should point to a code review… Is there a way
>> > to do that which is downstream-agnostic?
>> >
>> > Further, and maybe this is my ignorance of Gerrit showing: how would
>> > you attach a URI to a local commit when authoring it? You don't have
>> > the review URI when running `git commit`, do you? (Maybe I
>> > misunderstood; I'm seeing an odd chicken-egg problem here.)
>> >
>> > Which begs another question: what/who applies the initial change ID to
>> > a commit and when?
>>
>> These are all great questions, which the originally proposed format
>> (fixed-width reverse-hex) has answers to. I think a URI would be
>> strictly worse.
>
> Well, I think we still missed "what/who applies the initial change ID
> to a commit and when."
The tool that creates the commit, when it creates the commit. In the
context of this discussion, that's Git or Jujutsu. If the commit is
brand new, generate the change-id randomly. If it's the "spiritual
successor" of another commit, use that commit's change-id.
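The rule above fits in a few lines. The following Python sketch assumes
a 128-bit random value written as 32 reverse-hex characters (0 -> z ...
f -> k); both the width and the function names are assumptions for
illustration, not a description of any tool's actual code.

```python
import secrets
from typing import Optional

# Assumed reverse-hex alphabet: hex digit 0 maps to 'z', f maps to 'k'.
_TO_REVERSE = str.maketrans("0123456789abcdef", "zyxwvutsrqponmlk")


def new_change_id() -> str:
    """Generate a fresh random change-id (assumed 128 bits wide)."""
    return secrets.token_hex(16).translate(_TO_REVERSE)


def change_id_for(predecessor_change_id: Optional[str] = None) -> str:
    """Pick the change-id for a commit that is about to be created.

    A brand-new commit gets a random ID; a rewritten commit (amend,
    rebase, cherry-pick) inherits the ID of the commit it replaces.
    """
    if predecessor_change_id:
        return predecessor_change_id
    return new_change_id()
```

Nothing here needs network access or a review server, which is what
resolves the chicken-and-egg problem raised earlier.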
Btw. since the thread was started, the implementation in Jujutsu has
been completed and I've been pushing commits with the change-id header
to various remotes for a while now. It works well. Forges can start
taking advantage of it. (I hope I find time to help work on that.)
> But the treatment below is something I agree with and failed to
> convey, I think: namely, URIs seem to encode too much
> "unportable"/"specific" information in Git. I feel like the current
> design is not really "tool-agnostic" as much as "built on a universal
> core." That seems valuable and prone to more longevity.
>
>>
>> * Using a reverse-hex ID makes no assumptions about what consumes these
>> headers. There can be multiple different consumers which treat the ID
>> differently with different URI schemes.
>>
>> * Attaching a reverse-hex ID to a local commit when authoring it is
>> trivial: you generate it randomly.
>>
>> This is one of those cases where being maximally restrictive about the
>> format will enable maximal flexibility downstream.
>>
>> One example: GitHub has a URL scheme that looks like this:
>> github.com/org/repo/compare/<ref1>..<ref2>
>>
>> This doesn't work if the refs contain slashes, as branches sometimes do
>> (e.g. feat/foo, username/bar). If the change-id is a URI, this type of
>> URL scheme doesn't work reliably.
>>
>> That is not to say we should design the change-id around GitHub, it's
>> just an example how making the format more free-form (URI is more
>> free-form than fixed-width reverse-hex) makes it more difficult to get
>> stuff working downstream.
>>
>> And lastly, laser-etching the URI scheme of one particular tool into
>> your commit history means the history is at great risk of degrading
>> over time. URI schemes change, domains change, tools become outdated
>> and are replaced.
>>
>> Adding some ephemeral configuration to a tool that constructs a URI out
>> of a reverse-hex ID on the other hand is trivial.
>
> Yep.
>
>> > PS This discussion feels somewhat related to the classic GitHub
>> > problem of not presenting interdiffs/range-diffs: GitHub shows a
>> > too-flat source diff on force-pushes. Perhaps better web UI tooling
>> > about interdiff review (which I think is one of the things Gerrit
>> > does/wants to do?) makes change IDs less necessary, since interdiffs
>> > help connect evolutions of commits?
>>
>> I think it's the other way around: Building a code review UI built on
>> git and centered around interdiffs today is _hard_, that's why we don't
>> have it yet. Adding change-ids to commits will make it much easier,
>> paving the way for these tools to be implemented.
>
> Fair point, although GitHub's detection of force-pushes makes me think
> it could split a PR into versions at that point, cross-link backwards
> and forwards by one version (from the force-push detection), show
> range-diffs between versions based on the target branch of the merge,
> and even follow the cross-links to show an overall sequence of
> versions.
>
> But I don't work there, so presumably it's harder than that :)
I agree that forges still have a lot of potential to improve their code
review UIs even without change-ids. But tracking individual patches
across force-pushes is not as easy as calling git range-diff. Its
manpage says the output is not stable for machines to read and the
algorithm it uses has cubic runtime complexity. Not something I'd want
to use on my backend if the user controls the input. So the next best
thing is to reimplement it using cheaper (but worse) heuristics. The
author timestamp would probably be a good start. But at that point,
you're looking at a lot of work for what users will perceive as an
unreliable and inconsistent experience.
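For comparison, here is roughly what the matching step could look like
once change-ids are available. This is a Python sketch under the
assumption that each version of a branch is given as a list of
(commit_id, change_id) pairs and that a change-id appears at most once
per version; the function name and data layout are made up for this
example.

```python
from typing import List, Tuple

Version = List[Tuple[str, str]]  # (commit_id, change_id) in branch order


def match_versions(old: Version, new: Version):
    """Pair up commits across a force-push by change-id.

    Returns (pairs, added, dropped):
      pairs   - (old_commit_id, new_commit_id) for commits present in both
      added   - commit IDs that only exist in the new version
      dropped - commit IDs that only exist in the old version
    Runs in time linear in the number of commits, using dict lookups
    instead of comparing every old patch with every new patch.
    """
    old_by_change = {change_id: commit_id for commit_id, change_id in old}
    pairs, added, seen = [], [], set()
    for commit_id, change_id in new:
        if change_id in old_by_change:
            pairs.append((old_by_change[change_id], commit_id))
            seen.add(change_id)
        else:
            added.append(commit_id)
    dropped = [cid for cid, change_id in old if change_id not in seen]
    return pairs, added, dropped
```

A forge would still need to diff each matched pair to show what changed,
but deciding which commits belong together no longer depends on
heuristics.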
> I sincerely hope to make it easier if it's really that hard! And,
> though my opinion matters little, I'm having a hard time piercing the
> conversation to see a "universal core" that solves the desired
> problem. Maybe I'm not reading carefully enough, and maybe a summary
> would help. I greatly appreciated the work of previous folks to
> summarize the current thread status.
--
Best regards,
Remo
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 22:24 ` Remo Senekowitsch
@ 2025-04-22 22:42 ` Junio C Hamano
2025-04-22 22:51 ` Nico Williams
` (2 more replies)
0 siblings, 3 replies; 118+ messages in thread
From: Junio C Hamano @ 2025-04-22 22:42 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: D. Ben Knoble, Nico Williams, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
"Remo Senekowitsch" <remo@buenzli.dev> writes:
> Btw. since the thread was started, the implementation in Jujutsu has
> been completed and I've been pushing commits with the change-id header
> to various remotes for a while now. It works well. Forges can start
> taking advantage of it. (I hope I find time to help work on that.)
It should work well, until somebody finds your random is not random
enough, right? Unlike our object name that depends on the contents
(hence a duplicate unless the cryptographic hash function collides
means they are truly the same commit), there is no globally unique
ID assigner involved in your implementation, right? Until a
sufficiently large number of people start using it and a large number of
changes get assigned IDs, it won't become an issue, but then how
would it be different from what was raised in the earlier discussion
to use the commit object name itself as the change ID for a commit
that is not derived from anybody else, and copy that ID to commits
that are derived from the original commit as the change ID shared
among them? At least that would give us a much better uniqueness
guarantee, wouldn't it? If you want to be able to tell between
commit object name and change ID, I wouldn't object if you encode
them using whatever mechanism.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 22:42 ` Junio C Hamano
@ 2025-04-22 22:51 ` Nico Williams
2025-04-22 23:47 ` Remo Senekowitsch
2025-04-22 23:49 ` Junio C Hamano
2025-04-22 23:21 ` Remo Senekowitsch
2025-04-23 5:07 ` Martin von Zweigbergk
2 siblings, 2 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-22 22:51 UTC (permalink / raw)
To: Junio C Hamano
Cc: Remo Senekowitsch, D. Ben Knoble, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Tue, Apr 22, 2025 at 03:42:25PM -0700, Junio C Hamano wrote:
> "Remo Senekowitsch" <remo@buenzli.dev> writes:
> > Btw. since the thread was started, the implementation in Jujutsu has
> > been completed and I've been pushing commits with the change-id header
> > to various remotes for a while now. It works well. Forges can start
> > taking advantage of it. (I hope I find time to help work on that.)
>
> It should work well, until somebody finds your random is not random
> enough, right? Unlike our object name that depends on the contents
> (hence a duplicate unless the cryptographic hash function collides
> means they are truly the same commit), there is no globally unique
Eh, if the hash function is weak then collisions might not be so rare,
especially when intentional.
In a content addressed storage system hash collisions are (can be) bad,
but typically the hash functions used are good enough that collisions
are tolerable, and one just assumes that if the hash matches then the
data is right, and there is no way to verify that that does not require
trusting the hash function.
> ID assigner involved in your implementation, right? [...]
Change IDs for this purpose need not be unique, IMO. If they get
duplicated either you can notice the problem before pushing, or you can
accept the dups and use context to resolve them correctly as needed.
One could make every commit have the same change ID, as a joke or spam,
but presumably most upstreams wouldn't do that.
Using ticket IDs as change IDs implies a globally unique ID assigner,
and should work well enough where things like bugzilla are used.
Nico
--
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 22:51 ` Nico Williams
@ 2025-04-22 23:47 ` Remo Senekowitsch
2025-04-23 0:32 ` Nico Williams
2025-04-22 23:49 ` Junio C Hamano
1 sibling, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-22 23:47 UTC (permalink / raw)
To: Nico Williams, Junio C Hamano
Cc: D. Ben Knoble, Theodore Ts'o, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Wed Apr 23, 2025 at 12:51 AM CEST, Nico Williams wrote:
>
> Using ticket IDs as change IDs implies a globally unique ID assigner,
> and should work well enough where things like bugzilla are used.
This email thread contains recurring ideas of stuffing unrelated
metadata into the change-id header (patch-id, ticket-id). I think we
should be careful not to do that.
The purpose of the proposed change-id is to identify and track how a
change evolves over time. We have talked about how those semantics may
or may not be clear enough, and that's a good thing to discuss.
It's also a good thing to discuss potential alternatives. E.g. "If we
have a patch-id, we don't need a change-id." I don't agree with that,
but it's a good thing to discuss.
But _deriving a change-id from_ a patch-id or ticket-id or whatever
completely destroys its purpose of tracking how a change evolves over
time. The change-id can only do that job if it _doesn't_ have other
unrelated semantics attached to it.
A patch can contain multiple changes. A ticket can be associated with
multiple changes. Deriving a change-id from either of them makes it
impossible to identify and track a single change.
So let's avoid mixing these concepts and talk about them as distinct
alternatives.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 23:47 ` Remo Senekowitsch
@ 2025-04-23 0:32 ` Nico Williams
2025-04-23 1:15 ` Remo Senekowitsch
0 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-23 0:32 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Junio C Hamano, D. Ben Knoble, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Wed, Apr 23, 2025 at 01:47:29AM +0200, Remo Senekowitsch wrote:
> On Wed Apr 23, 2025 at 12:51 AM CEST, Nico Williams wrote:
> > Using ticket IDs as change IDs implies a globally unique ID assigner,
> > and should work well enough where things like bugzilla are used.
>
> This email thread contains recurring ideas of stuffing unrelated
> metadata into the change-id header (patch-id, ticket-id). I think we
> should be careful not to do that.
We, the users, have been doing this for decades by convention (i.e.,
starting commit subject lines with ticket IDs and putting all other
related ticket IDs in the rest of the commit comment). Why would that be
wrong _now_?
> The purpose of the proposed change-id is to identify and track how a
> change evolves over time. We have talked about how those semantics may
> or may not be clear enough, and that's a good thing to discuss.
And ticket IDs happen to be very good at being this sort of change ID.
Sun did this starting in the early 90s. They used a rebase workflow at
least since 1992. Folks still do this in Illumos. I'm certain that the
rump Solaris engineering at Oracle still does this. That's more than
three decades of that.
Nico
--
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-23 0:32 ` Nico Williams
@ 2025-04-23 1:15 ` Remo Senekowitsch
2025-04-23 4:45 ` Nico Williams
0 siblings, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-23 1:15 UTC (permalink / raw)
To: Nico Williams
Cc: Junio C Hamano, D. Ben Knoble, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Wed Apr 23, 2025 at 2:32 AM CEST, Nico Williams wrote:
> On Wed, Apr 23, 2025 at 01:47:29AM +0200, Remo Senekowitsch wrote:
>> On Wed Apr 23, 2025 at 12:51 AM CEST, Nico Williams wrote:
>> > Using ticket IDs as change IDs implies a globally unique ID assigner,
>> > and should work well enough where things like bugzilla are used.
>>
>> This email thread contains recurring ideas of stuffing unrelated
>> metadata into the change-id header (patch-id, ticket-id). I think we
>> should be careful not to do that.
>
> We, the users, have been doing this for decades by convention (i.e.,
> starting commit subject lines with ticket IDs and putting all other
> related ticket IDs in the rest of the commit comment). Why would that be
> wrong _now_?
You can stuff as much free-form metadata into the commit message as you
want, because git itself doesn't care much about what's in there. The
better analogy would be to put the names of your mom and dad in the
"parent" header as a free-form piece of metadata about the heritage of
the commit author. That's gonna break stuff.
Putting any sort of unrelated metadata into a change-id breaks how it
works. There is no reason to do it; you can put that metadata in its
own, dedicated place. If not the commit message, then at least in its
own header.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-23 1:15 ` Remo Senekowitsch
@ 2025-04-23 4:45 ` Nico Williams
0 siblings, 0 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-23 4:45 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: Junio C Hamano, D. Ben Knoble, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Wed, Apr 23, 2025 at 03:15:28AM +0200, Remo Senekowitsch wrote:
> You can stuff as much free-form metadata into the commit message as you
> want, because git itself doesn't care much about what's in there. The
> better analogy would be to put the names of your mom and dad in the
> "parent" header as a free-form piece of metadata about the heritage of
> the commit author. That's gonna break stuff.
The point was that lacking a change ID header people have been resorting
to conventions, and to point out that people have been using ticket IDs
as change IDs. It's just an observation.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 22:51 ` Nico Williams
2025-04-22 23:47 ` Remo Senekowitsch
@ 2025-04-22 23:49 ` Junio C Hamano
2025-04-23 1:02 ` Nico Williams
1 sibling, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-22 23:49 UTC (permalink / raw)
To: Nico Williams
Cc: Remo Senekowitsch, D. Ben Knoble, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
Nico Williams <nico@cryptonector.com> writes:
> Change IDs for this purpose need not be unique, IMO. If they get
> duplicated either you can notice the problem before pushing, or you can
> accept the dups and use context to resolve them correctly as needed.
Not having to worry about duplicates (as long as you trust Git well
enough not to worry about object name collisions) is a powerful
thing, though. Unless there is a strong reason to stick to slightly
shorter (like 160-bit vs 128-bit) random strings, being able to
directly compute the object name of the very first iteration is also
a plus. If you choose to use its commit object name as its change ID
(hence you do not have to worry about how you come up with a new ID),
possibly in some encoded form (if you want to be able to use both
object name and change ID in your UI), then having later iterations
record the same change ID in the trailer would be quite simple,
robust, and just as useful, if not more, as a scheme that
uses randomly generated strings as change IDs.
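For illustration, a minimal Python sketch of that scheme. It assumes SHA-1 object names and that the first iteration carries no trailer, so its change ID is implicitly its own object name. `git_object_name` follows Git's documented object hashing; `change_id_for` and its `inherited` parameter are hypothetical names for this sketch only.

```python
import hashlib
from typing import Optional

def git_object_name(kind: str, body: bytes) -> str:
    """SHA-1 object name as Git computes it: hash of '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def change_id_for(commit_body: bytes, inherited: Optional[str] = None) -> str:
    """First iteration: its own object name. Later iterations: the copied ID."""
    if inherited is not None:
        return inherited  # carried over from the commit this one was derived from
    return git_object_name("commit", commit_body)
```

A later rewrite would call `change_id_for(new_body, inherited=original_id)` and record the result in whatever trailer or header ends up being chosen.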
In any case, I am less interested in how a change ID is
assigned than in what its semantics would be and how it would help
keep track of the changes made to changes. Personally, I am
uninterested in a scheme that does not let me even tell which one
among many commit objects that share the same change ID is the
original and how they evolved (in other words, how they relate to
each other).
Thanks.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 23:49 ` Junio C Hamano
@ 2025-04-23 1:02 ` Nico Williams
2025-04-23 4:47 ` Nico Williams
0 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-23 1:02 UTC (permalink / raw)
To: Junio C Hamano
Cc: Remo Senekowitsch, D. Ben Knoble, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Tue, Apr 22, 2025 at 04:49:13PM -0700, Junio C Hamano wrote:
> Not having to worry about duplicates (as long as you trust Git well
> enough not to worry about object name collisions) is a powerful
> thing, though. Unless there is a strong reason to stick to slightly
> [...]
Is that really true? Yes, because if two different objects have the
same hash then the first one pushed to a repository will "win" and the
second's content will never appear, right? I suppose that the server
could detect the collision at push time and reject the push to make sure
that there are no surprises for the pusher.
But Git requires objects and refs to be unique _and_ it indexes by them.
This is great. It might be fine for change IDs too, or not.
But remember that proponents want change IDs to not quite be unique,
since the point is that they tie multiple different versions of a commit
series together for the purpose of code review. In the end, when the
code review is completed and approved then the change ID might be unique
again, but then cherry-picking onto other branches for forward- or
back-porting might render them non-unique again.
If you accept change IDs, I highly recommend not requiring them to be
unique.
Nico
--
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-23 1:02 ` Nico Williams
@ 2025-04-23 4:47 ` Nico Williams
0 siblings, 0 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-23 4:47 UTC (permalink / raw)
To: Junio C Hamano
Cc: Remo Senekowitsch, D. Ben Knoble, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Tue, Apr 22, 2025 at 08:02:20PM -0500, Nico Williams wrote:
> But remember that proponents want change IDs to not quite be unique,
> since the point is that they tie multiple different versions of a commit
> series together for the purpose of code review. In the end, when the
> code review is completed and approved then the change ID might be unique
> again, but then cherry-picking onto other branches for forward- or
> back-porting might render them non-unique again.
Ah, right, the thing about change IDs getting copied to new commits when
cherry-picking was my suggestion. I stand by it, but it's not what
others are proposing.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 22:42 ` Junio C Hamano
2025-04-22 22:51 ` Nico Williams
@ 2025-04-22 23:21 ` Remo Senekowitsch
2025-04-23 5:07 ` Martin von Zweigbergk
2 siblings, 0 replies; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-22 23:21 UTC (permalink / raw)
To: Junio C Hamano
Cc: D. Ben Knoble, Nico Williams, Theodore Ts'o,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Wed Apr 23, 2025 at 12:42 AM CEST, Junio C Hamano wrote:
> "Remo Senekowitsch" <remo@buenzli.dev> writes:
>
>> Btw. since the thread was started, the implementation in Jujutsu has
>> been completed and I've been pushing commits with the change-id header
>> to various remotes for a while now. It works well. Forges can start
>> taking advantage of it. (I hope I find time to help work on that.)
>
> It should work well, until somebody finds your random is not random
> enough, right? Unlike our object name that depends on the contents
> (hence a duplicate unless the cryptographic hash function collides
> means they are truly the same commit), there is no globally unique
> ID assigner involved in your implementation, right? Until a
> sufficiently large number of people start using it and a large number of
> changes get assigned IDs, it won't become an issue, but then how
> would it be different from what was raised in the earlier discussion
> to use the commit object name itself as the change ID for a commit
> that is not derived from anybody else, and copy that ID to commits
> that are derived from the original commit as the change ID shared
> among them? At least that would give us a much better uniqueness
> guarantee, wouldn't it? If you want to be able to tell between
> commit object name and change ID, I wouldn't object if you encode
> them using whatever mechanism.
Random number generators are well suited for this purpose; the only
requirement is that the distribution of generated numbers does not
massively deviate from a uniform probability distribution.
Commit hashes have 160 bits of information, the change-ids used by
Jujutsu today have 128 bits. That's still plenty to make random
collisions practically irrelevant.
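To put rough numbers on that, here is a small Python sketch using the standard birthday-bound approximation. The function name and the figure of 10^12 IDs are arbitrary assumptions for illustration, not anything taken from Jujutsu, and the formula assumes uniformly random IDs.

```python
import math

def collision_probability(n_ids: int, bits: int) -> float:
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    space = 2.0 ** bits
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))

# Compare 128-bit change-ids with 160-bit hashes for a trillion random IDs.
for bits in (128, 160):
    print(bits, collision_probability(10**12, bits))
```

Under these assumptions, a trillion 128-bit IDs give a collision probability on the order of 1e-15.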
And anyway, the situation of colliding change-ids will be somewhat of
a normal event due to cherry-picking etc. Jujutsu can deal with this
situation just fine today. Colliding change-ids are simply not as much
of a problem as colliding commit hashes.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-22 22:42 ` Junio C Hamano
2025-04-22 22:51 ` Nico Williams
2025-04-22 23:21 ` Remo Senekowitsch
@ 2025-04-23 5:07 ` Martin von Zweigbergk
2025-04-23 15:51 ` Junio C Hamano
2 siblings, 1 reply; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-23 5:07 UTC (permalink / raw)
To: Junio C Hamano
Cc: Remo Senekowitsch, D. Ben Knoble, Nico Williams,
Theodore Ts'o, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Tue, 22 Apr 2025 at 15:42, Junio C Hamano <gitster@pobox.com> wrote:
>
> "Remo Senekowitsch" <remo@buenzli.dev> writes:
>
> > Btw. since the thread was started, the implementation in Jujutsu has
> > been completed and I've been pushing commits with the change-id header
> > to various remotes for a while now. It works well. Forges can start
> > taking advantage of it. (I hope I find time to help work on that.)
>
> It should work well, until somebody finds your random is not random
> enough, right? Unlike our object name that depends on the contents
> (hence a duplicate unless the cryptographic hash function collides
> means they are truly the same commit), there is no globally unique
> ID assigner involved in your implementation, right?
A forge can decide to enforce that no two commits on the main branch
have the same change ID, for example.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-23 5:07 ` Martin von Zweigbergk
@ 2025-04-23 15:51 ` Junio C Hamano
2025-04-23 16:19 ` Martin von Zweigbergk
0 siblings, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-23 15:51 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Remo Senekowitsch, D. Ben Knoble, Nico Williams,
Theodore Ts'o, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
Martin von Zweigbergk <martinvonz@google.com> writes:
> On Tue, 22 Apr 2025 at 15:42, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> "Remo Senekowitsch" <remo@buenzli.dev> writes:
>>
>> > Btw. since the thread was started, the implementation in Jujutsu has
>> > been completed and I've been pushing commits with the change-id header
>> > to various remotes for a while now. It works well. Forges can start
>> > taking advantage of it. (I hope I find time to help work on that.)
>>
>> It should work well, until somebody finds your random is not random
>> enough, right? Unlike our object name that depends on the contents
>> (hence a duplicate unless the cryptographic hash function collides
>> means they are truly the same commit), there is no globally unique
>> ID assigner involved in your implementation, right?
>
> A forge can decide to enforce that no two commits on the main branch
> have the same change ID, for example.
Would it make sense, though? Imagine that a contributor in your
project did not refactor code properly and instead made
copy-and-paste duplicates of very similar code. I find a bug in
one of them and create a fix for it, without realizing the old
mistake of duplicating code (instead of making it a shared helper
that is called from the two places). Later somebody else realizes
the same fix is needed for the other copy---attempting to cherry
pick the original fix may find that remaining copy of the buggy code
as the logic to perform a three-way merge across renames that is
sufficiently clever kicks in. Shouldn't these two commits to fix
the same bug in two places share the same change ID so that it is
clear to the later developers that the latter fix was derived from
the former one?
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-23 15:51 ` Junio C Hamano
@ 2025-04-23 16:19 ` Martin von Zweigbergk
2025-06-06 13:04 ` Toon Claes
0 siblings, 1 reply; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-23 16:19 UTC (permalink / raw)
To: Junio C Hamano
Cc: Remo Senekowitsch, D. Ben Knoble, Nico Williams,
Theodore Ts'o, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Wed, 23 Apr 2025 at 08:51, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
> > On Tue, 22 Apr 2025 at 15:42, Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> "Remo Senekowitsch" <remo@buenzli.dev> writes:
> >>
> >> > Btw. since the thread was started, the implementation in Jujutsu has
> >> > been completed and I've been pushing commits with the change-id header
> >> > to various remotes for a while now. It works well. Forges can start
> >> > taking advantage of it. (I hope I find time to help work on that.)
> >>
> >> It should work well, until somebody finds your random is not random
> >> enough, right? Unlike our object name that depends on the contents
> >> (hence a duplicate unless the cryptographic hash function collides
> >> means they are truly the same commit), there is no globally unique
> >> ID assigner involved in your implementation, right?
> >
> > A forge can decide to enforce that no two commits on the main branch
> > have the same change ID, for example.
>
> Would it make sense, though? Imagine that a contributor in your
> project did not refactor code properly and instead made
> copy-and-paste duplicates of very similar code. I find a bug in
> one of them and create a fix for it, without realizing the old
> mistake of duplicating code (instead of making it a shared helper
> that is called from the two places). Later somebody else realizes
> the same fix is needed for the other copy---attempting to cherry
> pick the original fix may find that remaining copy of the buggy code
> as the logic to perform a three-way merge across renames that is
> sufficiently clever kicks in. Shouldn't these two commits to fix
> the same bug in two places share the same change ID so that it is
> clear to the later developers that the latter fix was derived from
> the former one?
Maybe it depends on how the forge uses the change id. If the forge is
Gerrit, it will use the (change id, target branch) to identify a
review (IIUC), so then it will require a new change ID because it
requires a new review. If it doesn't have that requirement (maybe it's
PR-style forge), then it could at least highlight to the reviewer that
there was an old version of the change that has already been merged.
As Remo said, we've implemented this feature in Jujutsu and we'll
probably enable it by default soon. I think GitButler has had it
enabled for quite some time. So maybe we'll see in a year or two what
problems it causes :)
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-23 16:19 ` Martin von Zweigbergk
@ 2025-06-06 13:04 ` Toon Claes
0 siblings, 0 replies; 118+ messages in thread
From: Toon Claes @ 2025-06-06 13:04 UTC (permalink / raw)
To: Martin von Zweigbergk, Junio C Hamano
Cc: Remo Senekowitsch, D. Ben Knoble, Nico Williams,
Theodore Ts'o, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
Martin von Zweigbergk <martinvonz@google.com> writes:
> On Wed, 23 Apr 2025 at 08:51, Junio C Hamano <gitster@pobox.com> wrote:
>> Would it make sense, though? Imagine that a contributor in your
>> project did not refactor code properly and instead made
>> copy-and-paste duplicates of very similar code. I find a bug in
>> one of them and create a fix for it, without realizing the old
>> mistake of duplicating code (instead of making it a shared helper
>> that is called from the two places). Later somebody else realizes
>> the same fix is needed for the other copy---attempting to cherry
>> pick the original fix may find that remaining copy of the buggy code
>> as the logic to perform a three-way merge across renames that is
>> sufficiently clever kicks in. Shouldn't these two commits to fix
>> the same bug in two places share the same change ID so that it is
>> clear to the later developers that the latter fix was derived from
>> the former one?
>
> Maybe it depends on how the forge uses the change id. If the forge is
> Gerrit, it will use the (change id, target branch) to identify a
> review (IIUC), so then it will require a new change ID because it
> requires a new review. If it doesn't have that requirement (maybe it's
> PR-style forge), then it could at least highlight to the reviewer that
> there was an old version of the change that has already been merged.
I've been thinking some more about duplicates.
First, I don't think you can enforce uniqueness. Whenever you're working
with forks, commit A from fork I can have the same change-ID
as commit B in fork II (because either of them might be rebased).
Enforcing uniqueness on change-IDs would prevent the user from fetching from
both forks, or they would need to specify how to resolve the change-id
conflict.
I like Martin's idea of having forges ensure uniqueness, but I was
wondering if we should take it a step further: a commit should not be
able to reach another commit with the same change-id. Verifying this on
the client side helps the user detect issues sooner. Only seeing that
error when pushing to the forge might be annoying.
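For illustration, a minimal Python sketch of such a client-side check. It uses a toy in-memory graph rather than a real repository; the `Graph` shape and the function name are hypothetical, and the quadratic walk is chosen for clarity, not speed.

```python
from typing import Dict, List, Optional, Set, Tuple

# sha -> (change_id, parent shas); a toy stand-in for the commit graph
Graph = Dict[str, Tuple[str, List[str]]]

def find_reachable_duplicate(graph: Graph, tip: str) -> Optional[Tuple[str, str]]:
    """Return (descendant, ancestor) sharing a change-id, or None if none exists."""
    def ancestors(start: str) -> Set[str]:
        seen: Set[str] = set()
        stack = list(graph[start][1])
        while stack:
            sha = stack.pop()
            if sha not in seen:
                seen.add(sha)
                stack.extend(graph[sha][1])
        return seen

    # Check the tip and everything it reaches, in a deterministic order.
    for sha in sorted({tip} | ancestors(tip)):
        change_id = graph[sha][0]
        for anc in sorted(ancestors(sha)):
            if graph[anc][0] == change_id:
                return (sha, anc)
    return None
```

Two sibling commits with the same change-id pass this check, since neither can reach the other. Only a duplicate along one line of ancestry is reported.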
On the other hand, what about merges? If people merge the 'main'
branch into their feature branch, it would no longer be possible to
merge the feature branch into the 'main' branch.
But to circle back to Junio's example. It depends on what you want to
track. You could consider both changes to be different, and thus
requiring different change-ids, because one didn't fix all of it.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 118+ messages in thread
[parent not found: <aAgWytQNqtLzg2TU@ubby>]
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
[not found] ` <aAgWytQNqtLzg2TU@ubby>
@ 2025-04-23 0:25 ` Remo Senekowitsch
2025-04-23 0:45 ` Nico Williams
2025-05-10 19:32 ` Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer) D. Ben Knoble
1 sibling, 1 reply; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-23 0:25 UTC (permalink / raw)
To: Nico Williams, D. Ben Knoble
Cc: Theodore Ts'o, Junio C Hamano, Martin von Zweigbergk,
Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Wed Apr 23, 2025 at 12:23 AM CEST, Nico Williams wrote:
> On Tue, Apr 22, 2025 at 04:17:03PM -0400, D. Ben Knoble wrote:
>> On Wed, Apr 16, 2025 at 7:36 AM Remo Senekowitsch <remo@buenzli.dev> wrote:
>> > On Mon Apr 14, 2025 at 9:54 PM CEST, D. Ben Knoble wrote:
>
> [Responding out of order.]
>
>> > I think it's the other way around: Building a code review UI built on
>> > git and centered around interdiffs today is _hard_, that's why we don't
>> > have it yet. Adding change-ids to commits will make it much easier,
>> > paving the way for these tools to be implemented.
>>
>> Fair point, although GitHub's detection of force-pushes makes me think
>> it could split a PR into versions at that point, cross-link backwards
>> and forwards by one version (from the force-push detection), show
>> range-diffs between versions based on the target branch of the merge,
>> and even follow the cross-links to show an overall sequence of
>> versions.
>
> GitLab does something like this for review comments: when the author
> updates the commented-on code, you get a link you can click on that shows you the
> differences between version n-1 and version n. GitLab does this without
> change IDs.
>
> GitLab seems to figure it out -- an existence proof that it can be done.
> So maybe Junio and Theodore are quite right that similarity checks
> should be enough.
I haven't used GitLab in a while so I had to test it. I found the
feature you describe and it's nice, but it falls short in important and
somewhat predictable ways.
Firstly, it only happens when you make a comment on the whole diff.
Then it shows you an interdiff between the version you commented on and
the immediate next version. (I didn't find a way to make it show the
interdiff between the commented-on and the _latest_ version, but that
seems doable implementation wise.)
But that's not really what we're talking about here. That's taking the
interdiff between two versions _of the entire branch_ and selecting a
relevant _hunk_ from it. Neat, but completely unrelated.
If you make a comment on a _specific commit_, the feature doesn't kick
in at all. GitLab just tells you that this comment was made "on an
outdated change in commit xyz".
A truly patch-based review UI would show the interdiff between the
previous version of the specific commit that was commented on and the
next or latest version of _that specific patch_.
That requires tracking how the individual patches evolve and as far as
I can tell, GitLab makes no attempt to do so. I didn't even give it a
hard time, I kept the number of commits, the commit messages and author
timestamps the same.
So, while I must say GitLab is doing pretty well here, claiming that
they "figured out" patch-based review on top of git without change-ids
is not accurate.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-04-23 0:25 ` Remo Senekowitsch
@ 2025-04-23 0:45 ` Nico Williams
2025-04-23 12:58 ` How GitLab does/doesn't need change IDs (was Re: Semantics of change IDs) Toon Claes
0 siblings, 1 reply; 118+ messages in thread
From: Nico Williams @ 2025-04-23 0:45 UTC (permalink / raw)
To: Remo Senekowitsch
Cc: D. Ben Knoble, Theodore Ts'o, Junio C Hamano,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Wed, Apr 23, 2025 at 02:25:40AM +0200, Remo Senekowitsch wrote:
> On Wed Apr 23, 2025 at 12:23 AM CEST, Nico Williams wrote:
> > GitLab seems to figure it out -- an existence proof that it can be done.
> > So maybe Junio and Theodore are quite right that similarity checks
> > should be enough.
>
> I haven't used GitLab in a while so I had to test it. I found the
> feature you describe and it's nice, but it falls short in important and
> somewhat predictable ways.
True.
> Firstly, it only happens when you make a comment on the whole diff.
> Then it shows you an interdiff between the version you commented on and
> the immediate next version. (I didn't find a way to make it show the
> interdiff between the commented-on and the _latest_ version, but that
> seems doable implementation wise.)
I suspect that's a UI design issue: how cluttered do you want the UI to
be? GitLab's UI is already quite cluttered. Often I struggle to find
the thing I want, and I use it often.
> But that's not really what we're talking about here. That's taking the
> interdiff between two versions _of the entire branch_ and selecting a
> relevant _hunk_ from it. Neat, but completely unrelated.
I think GitHub and GitLab both can handle ... commit ranges, can they
not? The problem is that then you have to find refs (or commit hashes)
for each version you want to compare, and once again UI complexity gets
ugly.
This is why I often do code reviews in the terminal, using `git diff`
and `git log --patch` as needed. The problem then is that actually
leaving a comment on the CR requires going back to the browser,
navigating to the CR/MR/PR/WhateverR, navigating to the file and diff,
finding the relevant hunk, and finally authoring my comment -- this is
very painful. In principle I want to be able to do everything in the
browser because I don't see how to design a TUI that lets me do all of
this, but I'm really a shell and TUI type of user, so if we could design
a TUI I'd rather use that than the browser.
> If you make a comment on a _specific commit_, the feature doesn't kick
> in at all. GitLab just tells you that this comment was made "on an
> outdated change in commit xyz".
I suppose GL could give you a link to a diff view of the same commit in
different versions.
The point is that GL demonstrates that these things can be done. And I
don't see how a change ID would have helped GL much except in cases
where one re-does all the commits with different subject lines etc, but
leaves the actual patches mostly the same. Now it does happen that I
split and squash commits, but it's rare that I completely redo them.
So I think I'm coming back to my original view, that change IDs are
useful for forward- and back-porting, and not much else, and that
existing conventions are probably good enough for forward- and
back-porting anyways. Though I don't object to a change ID header, mind
you.
> A truly patch-based review UI would show the interdiff between the
> previous version of the specific commit that was commented on and the
> next or latest version of _that specific patch_.
I suspect most users are not pedantic like me and don't really go to
that sort of length, know they could, would want to, or would if they
knew they can.
> So, while I must say GitLab is doing pretty well here, claiming that
> they "figured out" patch-based review on top of git without change-ids
> is not accurate.
They've demonstrated that this is possible now and without change IDs --
that is very relevant here since the assertion has been made that change
IDs would help do better than most CR tools do w/o them. GL has probably
done what they think their users want them to do. GL might yet add
more of this functionality.
Nico
--
* How GitLab does/doesn't need change IDs (was Re: Semantics of change IDs)
2025-04-23 0:45 ` Nico Williams
@ 2025-04-23 12:58 ` Toon Claes
2025-04-23 18:59 ` Nico Williams
0 siblings, 1 reply; 118+ messages in thread
From: Toon Claes @ 2025-04-23 12:58 UTC (permalink / raw)
To: Nico Williams, Remo Senekowitsch
Cc: D. Ben Knoble, Theodore Ts'o, Junio C Hamano,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
Nico Williams <nico@cryptonector.com> writes:
> On Wed, Apr 23, 2025 at 02:25:40AM +0200, Remo Senekowitsch wrote:
>> Firstly, it only happens when you make a comment on the whole diff.
>> Then it shows you an interdiff between the version you commented on and
>> the immediate next version. (I didn't find a way to make it show the
>> interdiff between the commented-on and the _latest_ version, but that
>> seems doable implementation wise.)
>
> I suspect that's a UI design issue: how cluttered do you want the UI to
> be? GitLab's UI is already quite cluttered. Often I struggle to find
> the thing I want, and I use it often.
I'm not sure if you've been using Merge Requests here, but if you do you
can compare each version against another. There are two dropdowns near
the top of the "Changes" tab: "Compare <master> and <latest version>".
Here you can compare versions against the target branch and other
versions.
> I think GitHub and GitLab both can handle ... commit ranges, can they
> not? The problem is that then you have to find refs (or commit hashes)
> for each version you want to compare, and once again UI complexity gets
> ugly.
At GitLab we keep track of the commit IDs a branch has been (maybe only
if there is a Merge Request for that branch, I'm not sure). We can use
this information to compare versions. I think, to some people in this
thread, the use-case for Change-IDs is pretty similar: you want to tie
information about two commits together. If there is a branch, you have
this information, but in a mail-based workflow (or in Jujutsu) there is
no branch. In that case a Change-Id can fill that gap.
>> If you make a comment on a _specific commit_, the feature doesn't kick
>> in at all. GitLab just tells you that this comment was made "on an
>> outdated change in commit xyz".
As I mentioned, you might need to create a MR to retain/see this
information, I'm not entirely sure.
> The point is that GL demonstrates that these things can be done. And I
> don't see how a change ID would have helped GL much except in cases
> where one re-does all the commits with different subject lines etc, but
> leaves the actual patches mostly the same. Now it does happen that I
> split and squash commits, but it's rare that I completely redo them.
That's because GL stores history about a branch ref (outside the Git
object/ref database). If you don't do that, you can't. Having a
Change-Id embedded in the commit, retains that information in Git's DB.
--
Toon
* Re: How GitLab does/doesn't need change IDs (was Re: Semantics of change IDs)
2025-04-23 12:58 ` How GitLab does/doesn't need change IDs (was Re: Semantics of change IDs) Toon Claes
@ 2025-04-23 18:59 ` Nico Williams
0 siblings, 0 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-23 18:59 UTC (permalink / raw)
To: Toon Claes
Cc: Remo Senekowitsch, D. Ben Knoble, Theodore Ts'o,
Junio C Hamano, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, philipmetzger@bluewin.ch
On Wed, Apr 23, 2025 at 02:58:49PM +0200, Toon Claes wrote:
> Nico Williams <nico@cryptonector.com> writes:
>
> At GitLab we keep track of the commit IDs a branch has been (maybe only
> if there is a Merge Request for that branch, I'm not sure). [...]
Do you mean "we keep track of the commit _hashes_ a branch has _seen_"?
But it can't be commit hashes, and there's no commit IDs, so GL could be
assigning synthetic, internal commit IDs based on commit similarity,
which proves Junio's and Theodore's point that similarity checking can
be enough.
> > The point is that GL demonstrates that these things can be done. And I
> > don't see how a change ID would have helped GL much except in cases
> > where one re-does all the commits with different subject lines etc, but
> > leaves the actual patches mostly the same. Now it does happen that I
> > split and squash commits, but it's rare that I completely redo them.
>
> That's because GL stores history about a branch ref (outside the Git
> object/ref database). If you don't do that, you can't. Having a
> Change-Id embedded in the commit, retains that information in Git's DB.
I.e., GL has an internal reflog on the server side. I've sometimes
wished that I could push and fetch reflogs (or subsets thereof anyways).
When doing code reviews I use [local, obv.] reflogs to see the diffs
between an earlier version of a branch that I fetched and reviewed
earlier and the latest that I just fetched and am reviewing, and
generally I don't need to see any other versions I never fetched, but
occasionally I've wished I could fetch those other versions, but since
there are no server-side refs for them, I can't. [Or maybe I'm about to
learn of some feature I didn't know about :)]
I agree that change IDs / commit IDs in commit headers can help one keep
track of versions of a branch w/o a server-side reflog, but how would
you keep track of their chronology? I.e., how do you know which is
version 1, which is version 2, .., and which is version N-1? (Version N
being the head of the branch.) If you don't index these then finding
them is a full table scan, and if you index them then you've implemented
a server-side reflog.
Which makes me think that all that's needed for a good CR tool here is
a) a server-side reflog, b) similarity checking for commits. (a)
doesn't seem like a radical idea (that can be implemented with server
side hooks), and (b) is also not radical given that file rename / copy
operations are detected by Git using similarity checking already.
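
To illustrate the two ingredients named above, here is a minimal Python
sketch, not GitLab's actual implementation. It pairs a per-branch log of
pushed tips (a stand-in for a server-side reflog, which also settles the
chronology question because versions are stored in push order) with a
greedy matching of patches across two versions by textual similarity.
The names `BranchLog` and `match_patches`, the 0.6 threshold, and the
use of `difflib` are all assumptions made for this example. Git's
rename detection has its own similarity heuristic, which this does not
reproduce.

```python
from difflib import SequenceMatcher


class BranchLog:
    """Hypothetical server-side log of every tip a branch has pointed at."""

    def __init__(self):
        self.tips = []  # index 0 is version 1, in push order

    def record_push(self, tip):
        """Append the pushed tip and return its 1-based version number."""
        self.tips.append(tip)
        return len(self.tips)


def match_patches(old, new, threshold=0.6):
    """Pair each new patch with the most similar unused old patch.

    `old` and `new` map commit hash -> patch text. A new patch with no
    old patch at or above `threshold` is mapped to None.
    """
    pairs, used = {}, set()
    for new_id, new_patch in new.items():
        best_id, best_score = None, threshold
        for old_id, old_patch in old.items():
            if old_id in used:
                continue
            score = SequenceMatcher(None, old_patch, new_patch).ratio()
            if score >= best_score:
                best_id, best_score = old_id, score
        if best_id is not None:
            used.add(best_id)
        pairs[new_id] = best_id
    return pairs
```

The greedy one-to-one pairing is deliberately simplistic: it has no
answer for a commit that was split in two or for two commits that were
squashed, which are exactly the cases debated in this thread.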
From a UI/UX perspective not having to take extra steps to get those
change IDs into the commits is nice and user-friendly.
So I've come around to not wanting a change ID header :) For the back-
and forward-port use-case having commit subject conventions that make
use of "ticket IDs" or similar is a 90% solution that many users have
been living with for decades.
Nico
--
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
[not found] ` <aAgWytQNqtLzg2TU@ubby>
2025-04-23 0:25 ` Remo Senekowitsch
@ 2025-05-10 19:32 ` D. Ben Knoble
2025-05-10 19:46 ` D. Ben Knoble
1 sibling, 1 reply; 118+ messages in thread
From: D. Ben Knoble @ 2025-05-10 19:32 UTC (permalink / raw)
To: Nico Williams
Cc: Remo Senekowitsch, Theodore Ts'o, Junio C Hamano,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Tue, Apr 22, 2025 at 6:23 PM Nico Williams <nico@cryptonector.com> wrote:
>
> On Tue, Apr 22, 2025 at 04:17:03PM -0400, D. Ben Knoble wrote:
> > On Wed, Apr 16, 2025 at 7:36 AM Remo Senekowitsch <remo@buenzli.dev> wrote:
> > > On Mon Apr 14, 2025 at 9:54 PM CEST, D. Ben Knoble wrote:
>
> [Responding out of order.]
>
> > > I think it's the other way around: Building a code review UI built on
> > > git and centered around interdiffs today is _hard_, that's why we don't
> > > have it yet. Adding change-ids to commits will make it much easier,
> > > paving the way for these tools to be implemented.
> >
> > Fair point, although GitHub's detection of force-pushes makes me think
> > it could split a PR into versions at that point, cross-link backwards
> > and forwards by one version (from the force-push detection), show
> > range-diffs between versions based on the target branch of the merge,
> > and even follow the cross-links to show an overall sequence of
> > versions.
>
> GitLab does something like this for review comments: when the author
> updates the commented-on code, you get a link you can click on that
> shows you the differences between version n-1 and version n. GitLab
> does this without change IDs.
Does GitLab's output resemble that of git-range-diff? If not, it's not
quite what I had in mind ;)
GitHub's "compare versions" button (after a force-push) is more like
"git diff" on the 2 trees, which is only a fraction of the relevant
information.
[…]
> > But the treatment below is something I agree with and failed to
> > convey, I think: namely, URIs seem to encode too much
> > "unportable"/"specific" information in Git. I feel like the current
> > design is not really "tool-agnostic" as much as "built on a universal
> > core." That seems valuable and prone to more longevity.
>
> On the other hand URIs can easily be dereferenced, as long as they are
> not rotted.
Rot happens, though… your point is well-taken! Most of Git seems to be
valuable in the presence of linkrot, though, and I don't see how
adding URIs keeps that principle alive.
> > > > Which begs another question: what/who applies the initial change ID to
> > > > a commit and when?
> > >
> > > These are all great questions, which the originally proposed format
> > > (fixed-width reverse-hex) has answers to. I think a URI would be
> > > strictly worse.
> >
> > Well, I think we still missed "what/who applies the initial change ID
> > to a commit and when."
>
> Some possible answers:
>
> - The user constructs the change ID from other things like
> bugzilla/jira/... ticket IDs.
>
> This is OK, but probably not what OP had in mind.
>
> If the user needs additional sub-IDs just add a -{N} suffix.
>
> It's on the user and utilities to keep usage consistent. CR tooling
> could refuse to allow reuse of change IDs after a CR is merged.
Yep; I think trailers are the common version of this today. And ofc,
see linkrot.
> - CR tooling edits your history to add that change ID when you create a
> CR
>
> This feels wrong.
Yep.
> - CR tooling creates an 'empty' CR just to acquire a change ID that the
> user can then add to their commits, either manually or with a utility
> that does it for them (but which they invoke).
>
> This is alright, though a bit annoying. If the user fails to set the
> change ID, then it's no worse than today.
>
> - The user adds it when they update an existing CR by manually editing
> their history or invoking a utility that does it for them. The
> change ID comes from the CR.
>
> This is alright, though a bit annoying. If the user fails to set the
> change ID, then it's no worse than today.
>
> - The CR tooling adds an empty commit at the head just to hold the
> change ID and any additional metadata for each of the commits in the
> series.
>
> The user has to remember to fetch this commit, unless it's generated
> locally, but they're going to fetch anyways so this is ok.
>
> IMO empty commits are annoying. (That includes merge commits, but
> then I'm a rebase workflow / linear history zealot, so merge commits
> are also annoying because merge workflows are annoying.) But I would
> think that empty commits that include metadata about code review are
> reasonable and tolerable, especially if their subject lines are
> descriptive, like "CR: {cr title here}" or "CR {ID}: {title}", say.
>
> I think I'd only really like the first and last of the above.
>
> Nico
> --
Thanks for thinking out some concrete options on where IDs come from!
I think I've gotten even more confused on how they are supposed to be
used (which should probably inform the implementation), but hopefully
we're getting somewhere :)
--
D. Ben Knoble
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-10 19:32 ` Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer) D. Ben Knoble
@ 2025-05-10 19:46 ` D. Ben Knoble
2025-05-10 20:31 ` Martin von Zweigbergk
0 siblings, 1 reply; 118+ messages in thread
From: D. Ben Knoble @ 2025-05-10 19:46 UTC (permalink / raw)
To: Nico Williams
Cc: Remo Senekowitsch, Theodore Ts'o, Junio C Hamano,
Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On Sat, May 10, 2025 at 3:32 PM D. Ben Knoble <ben.knoble@gmail.com> wrote:
[…]
>
> Thanks for thinking out some concrete options on where IDs come from!
> I think I've gotten even more confused on how they are supposed to be
> used (which should probably inform the implementation), but hopefully
> we're getting somewhere :)
>
> --
> D. Ben Knoble
[Trying to summarize?]
On a re-read of
https://lore.kernel.org/git/CANiSa6gwup5vXU235mG+Ybbc+P=SbwoNFEmuhg=iYu0yGvSXVA@mail.gmail.com/,
I see that change IDs were motivated partly by identifying (related?)
commits after rewrites. I can certainly see how it would be nice to
track down how a commit I'm working on evolved; I can even imagine
most of the problems brought up in this thread wrt splitting or
combining commits (not to mention, say, cherry-picks where the
committer makes non-trivial changes to the patch).
There was also a note about using a change ID to identify a code
review in supporting tools. Neat!
I'll leave it to someone else to summarize the open questions? (I now
have a few of my own about how tools in Git's ecosystem respond to…
unexpected… headers.)
In the meantime, I think I'll repost this, since I'm not sure I ever
got clarity:
Re-reading the original post [1] (which didn't mention this kind of
ID?), I'm having a hard time seeing the problem statement. There's a
lot said here about the specifics of the solution, and some other neat
things it might unlock… meanwhile, I'm wondering if all the
consternation about change IDs is because the problem being solved is
underspecified for a core Git feature? (That might tie to Ted's
initial concerns about semantic meaning, on which I think I concur:
the parent and committer/author headers have unambiguous meaning to
Git, independent of anything else.)
It looks to me, an outsider, like the problem is some combination of
"I want to track a commit's evolution" and "I want to see related
commits in review, esp. when it's an identical and already-approved
commit." But I might be misreading, and clarifying the problem
statement might help bring us to a better core solution?
[1]: https://lore.kernel.org/git/xmqqh62tm5fo.fsf@gitster.g/T/#m038be849b9b4020c16c562d810cf77bad91a2c87
--
D. Ben Knoble
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-10 19:46 ` D. Ben Knoble
@ 2025-05-10 20:31 ` Martin von Zweigbergk
2025-05-12 17:03 ` Junio C Hamano
[not found] ` <aCJi+4q6DZhnfdy+@ubby>
0 siblings, 2 replies; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-05-10 20:31 UTC (permalink / raw)
To: D. Ben Knoble
Cc: Nico Williams, Remo Senekowitsch, Theodore Ts'o,
Junio C Hamano, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
Hi,
On Sat, 10 May 2025 at 12:46, D. Ben Knoble <ben.knoble@gmail.com> wrote:
>
> On a re-read of
> https://lore.kernel.org/git/CANiSa6gwup5vXU235mG+Ybbc+P=SbwoNFEmuhg=iYu0yGvSXVA@mail.gmail.com/,
> I see that change IDs were motivated partly by identifying (related?)
> commits after rewrites. I can certainly see how it would be nice to
> track down how a commit I'm working on evolved; I can even imagine
> most of the problems brought up in this thread wrt splitting or
> combining commits (not to mention, say, cherry-picks where the
> committer makes non-trivial changes to the patch).
>
> There was also a note about using a change ID to identify a code
> review in supporting tools. Neat!
>
> I'll leave it to someone else to summarize the open questions? (I now
> have a few of my own about how tools in Git's ecosystem respond to…
> unexpected… headers.)
>
> In the meantime, I think I'll repost this, since I'm not sure I ever
> got clarity:
>
> Re-reading the original post [1] (which didn't mention this kind of
> ID?), I'm having a hard time seeing the problem statement. There's a
> lot said here about the specifics of the solution, and some other neat
> things it might unlock… meanwhile, I'm wondering if all the
> consternation about change IDs is because the problem being solved is
> underspecified for a core Git feature? (That might tie to Ted's
> initial concerns about semantic meaning, on which I think I concur:
> the parent and committer/author headers have unambiguous meaning to
> Git, independent of anything else.)
>
> It looks to me, an outsider, like the problem is some combination of
> "I want to track a commit's evolution" and "I want to see related
> commits in review, esp. when it's an identical and already-approved
> commit." But I might be misreading, and clarifying the problem
> statement might help bring us to a better core solution?
To me, the main benefit is being able to refer to an evolving change
by a stable ID. That enables things like
`jj describe qx -m 'new description'; jj new qx`
(update commit message, then switch to it) without having to look up
the new commit ID after setting the description. That's sufficient
benefit for me, and I think most
Jujutsu users would agree. That's basically the only benefit we've
gotten from it so far since we have not started transferring it to
remotes. (There are other minor benefits like being able to highlight
to the user if they have two related commits so they may want to
delete one or somehow combine them.)
Given that we already have this stable ID, it would be nice to also
transfer it to remotes and have it be preserved by the remote,
including when the remote rewrites the commit. If we can use it for
things like identifying a code review so we don't need to link it
using a `Change-Id:` commit footer, then that's even better.
If we instead had something like Mercurial's Changeset Evolution
(explicitly recording how commits have evolved), then we could have a
similar identifier that was based on the original version of a commit.
To make lookup by this kind of change ID faster, we could have an
index from commit ID to change ID (i.e. original commit ID). This
seems to imply a commit can have 0 or 1 predecessors (0 for brand new
commits, 1 for rewrites), which is different from Mercurial's
Changeset Evolution, but not necessarily bad. For this kind of change
ID to be the same across repos, and assuming the predecessor pointer
is stored in the commit, we need to make sure to transfer all commits
back to the original commit when we push to a remote. As I think we've
talked about before here, that can be problematic because the user has
to be careful to check that the intermediate commits did not have
anything sensitive in them. It's also often wasteful to share all the
intermediate commits with other developers. Another option is to
transfer the predecessor pointer outside of the commit object. That
has its own problems, like being able to create cycles in the
predecessor graph.
>
> [1]: https://lore.kernel.org/git/xmqqh62tm5fo.fsf@gitster.g/T/#m038be849b9b4020c16c562d810cf77bad91a2c87
>
> --
> D. Ben Knoble
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-10 20:31 ` Martin von Zweigbergk
@ 2025-05-12 17:03 ` Junio C Hamano
2025-05-12 17:19 ` Martin von Zweigbergk
[not found] ` <aCJi+4q6DZhnfdy+@ubby>
1 sibling, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-05-12 17:03 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: D. Ben Knoble, Nico Williams, Remo Senekowitsch,
Theodore Ts'o, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
Martin von Zweigbergk <martinvonz@google.com> writes:
> If we instead had something like Mercurial's Changeset Evolution
> (explicitly recording how commits have evolved), then we could have a
> similar identifier that was based on the original version of a commit.
> To make lookup by this kind of change ID faster, we could have an
> index from commit ID to change ID (i.e. original commit ID). This
> seems to imply a commit can have 0 or 1 predecessors (0 for brand new
> commits, 1 for rewrites), which is different from Mercurial's
> Changeset Evolution, but not necessarily bad. For this kind of change
> ID to be the same across repos, and assuming the predecessor pointer
> is stored in the commit, we need to make sure to transfer all commits
> back to the original commit when we push to a remote. As I think we've
> talked about before here, that can be problematic because the user has
> to be careful to check that the intermediate commits did not have
> anything sensitive in them. It's also often wasteful to share all the
> intermediate commits with other developers. Another option is to
> transfer the predecessor pointer outside of the commit object. That
> has its own problems, like being able to create cycles in the
> predecessor graph.
A few comments (not necessarily strong suggestions).
- I do not think you need to limit the predecessor pointers to 0 or
1; when you started from N commits and worked to produce the
final single commit, the result would naturally have N predecessors.
- The predecessor pointers do not necessarily have to participate
in the object transfer, just like filtered/lazy clones can ignore
the tree pointer in a commit object when making a commit-only
clone and the contained trees are fetched from the promisor
on-demand. It can even be set to be filtered out by default, since
it would incur transfer cost that people would not care about
most of the time, and only made available when the user
expresses that they want to know how the change resulted in the
current shape.
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-12 17:03 ` Junio C Hamano
@ 2025-05-12 17:19 ` Martin von Zweigbergk
2025-05-14 14:38 ` Junio C Hamano
0 siblings, 1 reply; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-05-12 17:19 UTC (permalink / raw)
To: Junio C Hamano
Cc: D. Ben Knoble, Nico Williams, Remo Senekowitsch,
Theodore Ts'o, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Mon, 12 May 2025 at 10:03, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
> > If we instead had something like Mercurial's Changeset Evolution
> > (explicitly recording how commits have evolved), then we could have a
> > similar identifier that was based on the original version of a commit.
> > To make lookup by this kind of change ID faster, we could have an
> > index from commit ID to change ID (i.e. original commit ID). This
> > seems to imply a commit can have 0 or 1 predecessors (0 for brand new
> > commits, 1 for rewrites), which is different from Mercurial's
> > Changeset Evolution, but not necessarily bad. For this kind of change
> > ID to be the same across repos, and assuming the predecessor pointer
> > is stored in the commit, we need to make sure to transfer all commits
> > back to the original commit when we push to a remote. As I think we've
> > talked about before here, that can be problematic because the user has
> > to be careful to check that the intermediate commits did not have
> > anything sensitive in them. It's also often wasteful to share all the
> > intermediate commits with other developers. Another option is to
> > transfer the predecessor pointer outside of the commit object. That
> > has its own problems, like being able to create cycles in the
> > predecessor graph.
>
> A few comments (not necessarily strong suggestions).
>
> - I do not think you need to limit the predecessor pointers to 0 or
> 1; when you started from N commits and worked to produce the
> final single commit, the result would naturally have N predecessors.
>
> - The predecessor pointers do not necessarily have to participate
> in the object transfer, just like filtered/lazy clones can ignore
> the tree pointer in a commit object when making a commit-only
> clone and the contained trees are fetched from the promisor
> on-demand. It can even be set to be filtered out by default, since
> it would incur transfer cost that people would not care about
> most of the time, and only made available when the user
> expresses that they want to know how the change resulted in the
> current shape.
Sorry, what I meant was that those two things (limiting it to 0 or 1
predecessor and transferring predecessors to the remote) would be
needed if you want to be able to use the predecessor pointers to infer
a "change ID" that allows you to do things like
`jj describe qx -m 'new description'; jj new qx`.
Oh, I suppose the former can be relaxed by
simply saying that the change ID is defined as (or derived from) the
last commit ID you reach by walking predecessors backwards by
following the first predecessor pointer if there are several. You
still need to transfer at least that whole chain to the remote if you
want to make sure that others agree about the change ID (though you
only need to transfer the chain of first predecessors).
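
The relaxed definition above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the predecessor
graph is modelled as a plain dict from commit ID to an ordered list of
predecessor IDs, the function name `derive_change_id` and the commit
IDs are made up, and nothing here reflects how Jujutsu or Git stores
such data. The cycle guard is there because predecessor pointers kept
outside the commit object could, as noted earlier in the thread, form
cycles.

```python
def derive_change_id(commit, predecessors):
    """Follow first-predecessor pointers back to the original commit.

    `predecessors` maps a commit ID to its ordered list of predecessor
    IDs (an empty or missing list means a brand new commit). The ID of
    the commit where the walk ends serves as the change ID.
    """
    seen = set()
    current = commit
    while predecessors.get(current):
        if current in seen:
            raise ValueError(f"cycle in predecessor graph at {current}")
        seen.add(current)
        current = predecessors[current][0]  # first predecessor only
    return current


# Hypothetical history: c3 squashes c2 and x1; c2 is a rewrite of c1.
history = {"c3": ["c2", "x1"], "c2": ["c1"], "c1": [], "x1": []}
print(derive_change_id("c3", history))
```

Only the chain c3, c2, c1 is consulted, which is why that chain (and
not x1) is what two repositories would need to share in order to agree
on the result.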
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-12 17:19 ` Martin von Zweigbergk
@ 2025-05-14 14:38 ` Junio C Hamano
2025-05-15 10:31 ` Oswald Buddenhagen
0 siblings, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-05-14 14:38 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: D. Ben Knoble, Nico Williams, Remo Senekowitsch,
Theodore Ts'o, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
Martin von Zweigbergk <martinvonz@google.com> writes:
> Sorry, what I meant was that those two things (limiting it to 0 or 1
> predecessor and transferring predecessors to the remote) would be
> needed if you want to be able to use the predecessor pointers to infer
> a "change ID" that allows you to do things like
> `jj describe qx -m 'new description'; jj new qx`.
If we limit the data model so that it cannot represent one commit
becoming split into two (or vice versa), and instead one old commit
always corresponds to one new commit, our design of convenience
operations may become a lot easier and simpler. No question about
that. I am not sure that is not like a tail wagging a dog, though.
If most use cases are one-to-one without split/merge, then you can
still perform operations that require one-to-one correspondence in
most cases where "predecessor pointer" and "change ID" are moral
equivalents, no?
In any case, that wasn't what I primarily was interested to talk
about.
It sounds like the "change ID" being discussed is a simple and
useful thing that can and should be a commit trailer that is carried
forward automatically across "git commit --amend", "git rebase", and
"git cherry-pick", i.e. those commands that duplicate a new copy, a
refinement, of an existing commit object, without any additional
support from Git proper. More importantly, unlike the "predecessor"
thing, which may be helped if Git knew that it can optionally affect
reachability, it does not look like it needs any meaning that needs
structural support from Git internals.
So I do appreciate that the wider Git ecosystem like Gerrit and JJ
are talking about adopting "Change ID" with the same syntax and semantics
(if they all can agree, that is), but I do not think it needs to
affect Git at the object level. From what I have heard so far, it
certainly does not fit in the commit and tag object headers, but
more like the usual trailer thing.
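
As a rough illustration of the trailer approach, the following Python
sketch reads a `Change-Id:` trailer from a commit message and carries
it forward into a rewritten message. It is a simplification under
stated assumptions: the trailer block is taken to be the last paragraph
when every line in it looks like `Key: value`, which is narrower than
what `git interpret-trailers` accepts, and the function names and the
ID `I0123abcd` are invented for the example.

```python
import re

TRAILER = re.compile(r"^([A-Za-z0-9-]+):\s*(.*)$")


def trailer_block(message):
    """Return the last paragraph's lines if all look like 'Key: value'."""
    paragraphs = message.strip("\n").split("\n\n")
    if len(paragraphs) < 2:  # a lone subject line is never a trailer block
        return []
    lines = paragraphs[-1].splitlines()
    return lines if all(TRAILER.match(line) for line in lines) else []


def read_trailer(message, key="Change-Id"):
    """Return the value of the first `key` trailer, or None."""
    for line in trailer_block(message):
        name, value = TRAILER.match(line).groups()
        if name.lower() == key.lower():
            return value
    return None


def carry_forward(old_message, new_message, key="Change-Id"):
    """Copy the `key` trailer from the old message if the new one lacks it."""
    value = read_trailer(old_message, key)
    if value is None or read_trailer(new_message, key) is not None:
        return new_message
    # Extend an existing trailer block, otherwise start a new paragraph.
    sep = "\n" if trailer_block(new_message) else "\n\n"
    return new_message.rstrip("\n") + sep + f"{key}: {value}\n"
```

A wrapper around `git commit --amend` or a hook could call something
like `carry_forward` with the old and the edited message; whether Git
itself should do that is the question under discussion here.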
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-14 14:38 ` Junio C Hamano
@ 2025-05-15 10:31 ` Oswald Buddenhagen
2025-05-15 16:32 ` Jacob Keller
0 siblings, 1 reply; 118+ messages in thread
From: Oswald Buddenhagen @ 2025-05-15 10:31 UTC (permalink / raw)
To: Junio C Hamano
Cc: Martin von Zweigbergk, D. Ben Knoble, Nico Williams,
Remo Senekowitsch, Theodore Ts'o, Git Mailing List,
Edwin Kempin, Scott Chacon, philipmetzger@bluewin.ch
On Wed, May 14, 2025 at 07:38:58AM -0700, Junio C Hamano wrote:
>So I do appreciate that the wider Git ecosystem like Gerrit and JJ
>are talking about adopting "Change ID" with the same syntax and semantics
>(if they all can agree, that is), but I do not think it needs to
>affect Git at the object level. From what I have heard so far, it
>certainly does not fit in the commit and tag object headers, but
>more like the usual trailer thing.
>
but it doesn't really speak against it, either. it's a rather subjective
call whether 3rd party metadata should be included into the commit
proper or bolted on via the commit message.
this is really quite similar to rfc822 headers, and no-one would
seriously argue that "auxiliary" meta data should be stored somewhere
else. in fact, the current recommendation is even to omit X- prefixes
(rfc6648).
it certainly seems valuable to at least have an example that proves that
git doesn't corrupt extra headers. however, that would be obviously
insufficient for "proper" support in git itself, as creation, editing
and display of these headers needs to be reliable and convenient as
well.
the use case for change-ids is addressing evolving changes (be it
locally or in online review systems), which requires a one-to-one
mapping with a commit in the active working set (sans the fact that a
change can exist on multiple branches). as such, the ambiguity that
tracking the complete evolution of commits would create would be even
actively counter-productive for this use case.
by extension, it is sometimes necessary to make a conscious decision
which change-id a particular derived commit should have. for this, it's
quite convenient to have it in the commit message, because this makes it
easy to drop one (or both) change-ids when squashing (or splitting)
commits. however, this could also be achieved by having a defined format
for editing metadata from within the commit message editor (this can be
already achieved by having prepare-commit-msg and commit-msg hooks, but
it's messy and lacks standardization).
one argument against change-id trailers is that they are eye sores, in
particular in small projects that don't use trailers otherwise. this is
in fact a common argument against even optional use of gerrit for
reviews. having a more "subtle" implementation in git upstream would
certainly alleviate this.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-15 10:31 ` Oswald Buddenhagen
@ 2025-05-15 16:32 ` Jacob Keller
2025-05-15 19:59 ` Junio C Hamano
0 siblings, 1 reply; 118+ messages in thread
From: Jacob Keller @ 2025-05-15 16:32 UTC (permalink / raw)
To: Oswald Buddenhagen
Cc: Junio C Hamano, Martin von Zweigbergk, D. Ben Knoble,
Nico Williams, Remo Senekowitsch, Theodore Ts'o,
Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Thu, May 15, 2025 at 3:49 AM Oswald Buddenhagen
<oswald.buddenhagen@gmx.de> wrote:
> one argument against change-id trailers is that they are eye sores, in
> particular in small projects that don't use trailers otherwise. this is
> in fact a common argument against even optional use of gerrit for
> reviews. having a more "subtle" implementation in git upstream would
> certainly alleviate this.
>
At one point, the driver team I work for wanted to include Change-Id
trailers to commits we submitted to the Linux kernel, for tracking
against our own database (we used Gerrit at the time). They were
rejected for this very reason of being an eye sore -- (possibly other
reasons as well, I can't recall the full discussion). Of course, if
they aren't in the commit message but instead a header, it would have
needed some other way to specify them in emailed patch form if we
wanted them to stick around when using the am-based workflows.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-15 16:32 ` Jacob Keller
@ 2025-05-15 19:59 ` Junio C Hamano
2025-05-15 20:10 ` Nico Williams
0 siblings, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-05-15 19:59 UTC (permalink / raw)
To: Jacob Keller
Cc: Oswald Buddenhagen, Martin von Zweigbergk, D. Ben Knoble,
Nico Williams, Remo Senekowitsch, Theodore Ts'o,
Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
Jacob Keller <jacob.keller@gmail.com> writes:
> At one point, the driver team I work for wanted to include Change-Id
> trailers to commits we submitted to the Linux kernel, for tracking
> against our own database (we used Gerrit at the time). They were
> rejected for this very reason of being an eye sore -- (possibly other
> reasons as well, I can't recall the full discussion).
For something to be an eye sore, it also has to be of no use to
those who consider it an eye sore. The signed-off-by trailer is
noisy and it becomes annoying after reading "git log --no-merges"
for a week's worth of commits, but it serves a useful purpose, so
nobody would complain about it being an eye sore, even if they
complain for other reasons.
Why weren't they seeing any benefit in having such a trailer? Would
they have found a good use of the information if it were hidden in
the header part?
If the answer is "it is only useful to some people", what is the
reason why those other people find it useless? Is it "our own
database" being closed and there were no federated catalog of
change-ids that can be used by all project participants? Or does it
go beyond that, like what a Change-Id trailer means to project
participants from one organization is different to those from
another, or something?
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-15 19:59 ` Junio C Hamano
@ 2025-05-15 20:10 ` Nico Williams
0 siblings, 0 replies; 118+ messages in thread
From: Nico Williams @ 2025-05-15 20:10 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jacob Keller, Oswald Buddenhagen, Martin von Zweigbergk,
D. Ben Knoble, Remo Senekowitsch, Theodore Ts'o,
Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Thu, May 15, 2025 at 12:59:46PM -0700, Junio C Hamano wrote:
> For something to be an eye sore, it also has to be of no use to
> those who consider it an eye sore. The signed-off-by trailer is
> noisy and it becomes annoying after reading "git log --no-merges"
> for a week's worth of commits, but it serves a useful purpose, so
> nobody would complain about it being an eye sore, even if they
> complain for other reasons.
Maybe `git log` can have options for leaving out trailers of no interest
to the user? Email workflows still will see them, of course.
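A display filter of that kind could be prototyped outside Git. The sketch
below is only an illustration: `hide_trailers` is a made-up helper, not an
existing `git log` option, and its idea of a trailer block (a final
paragraph in which every line looks like `Key: value`) is a simplification
of Git's real trailer parsing.

```python
import re

# A line counts as a trailer here if it starts with "Token: " (simplified).
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s")


def hide_trailers(message: str, hidden: set[str]) -> str:
    """Return `message` without the trailers whose keys are in `hidden`.

    `hidden` holds lowercase trailer keys, e.g. {"change-id"}.
    """
    paragraphs = message.rstrip("\n").split("\n\n")
    last = paragraphs[-1].split("\n")
    # Only touch the last paragraph, and only if all of it looks like trailers.
    if len(paragraphs) > 1 and all(TRAILER_RE.match(line) for line in last):
        kept = [
            line for line in last
            if TRAILER_RE.match(line).group(1).lower() not in hidden
        ]
        if kept:
            paragraphs[-1] = "\n".join(kept)
        else:
            paragraphs.pop()  # every trailer was hidden; drop the block
    return "\n\n".join(paragraphs) + "\n"


msg = (
    "Fix bug\n\nDetails here.\n\n"
    "Change-Id: Iabc\nSigned-off-by: A <a@example.com>\n"
)
print(hide_trailers(msg, {"change-id"}))
```

As the message itself notes, this only affects what a local viewer shows;
patches sent by email would still carry the trailer.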
> Why weren't they seeing any benefit in having such a trailer? Would
> they have found a good use of the information if it were hidden in
> the header part?
>
> If the answer is "it is only useful to some people", what is the
> [...]
I think the answer is "they are only useful a tiny fraction of the times
I/we/they look at the git log".
Nico
--
^ permalink raw reply [flat|nested] 118+ messages in thread
[parent not found: <aCJi+4q6DZhnfdy+@ubby>]
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
[not found] ` <aCJi+4q6DZhnfdy+@ubby>
@ 2025-05-12 21:43 ` Martin von Zweigbergk
2025-05-12 22:04 ` brian m. carlson
2025-05-13 21:22 ` D. Ben Knoble
0 siblings, 2 replies; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-05-12 21:43 UTC (permalink / raw)
To: Nico Williams
Cc: D. Ben Knoble, Remo Senekowitsch, Theodore Ts'o,
Junio C Hamano, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Mon, 12 May 2025 at 14:07, Nico Williams <nico@cryptonector.com> wrote:
>
> On Sat, May 10, 2025 at 01:31:32PM -0700, Martin von Zweigbergk wrote:
> > To me, the main benefit is being able to refer to an evolving change
> > by a stable ID. That enables things like `jj describe qx -m 'new
> > description'; jj new qx` (update commit message, then switch to it)
> > without having to look up the new commit ID after setting the
> > description.
>
> Notionally this is not different from renaming a file. You have a name
> (file name, commit message subject) and you have the thing it refers to
> (file contents, tree object).
>
> <insert sub-thread about why Git does not have inode numbers for
> files, does not record rename/copy intent, and depends on file
> content similarity checks to detect renames>
>
> If Git can do file content similarity checking to discover renames, then
> surely so can jj and other CR tools do commit similarity checking to
> discover commit message changes. Is there anything that makes the
> preceding statement incorrect?
That wouldn't work in the `jj describe qx -m 'new description'; jj new
qx` example I used above, right? I think you're suggesting that when
the user runs `jj describe qx -m 'new description'`, we should compare
the reachable commits before the command to the reachable commits
after the command and then record in some storage that the new commit
is part of the same "change" as the old commit. Is that what you
meant? In this particular case, the commit message obviously changed,
so comparing the commit messages will obviously fail. We could of
course make this command record the information itself, however.
> > Given that we already have this stable ID, [...]
>
> "We" == jujutsu?
Yes, sorry :)
> How is this stable ID constructed?
It's just random bytes (16 when using the Git backend, 32 in the
Google backend).
> How would things other than jj construct these? We spent many messages
> trying to work that out and in my estimate that wasn't settled.
Random bytes has worked well for jj.
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-12 21:43 ` Martin von Zweigbergk
@ 2025-05-12 22:04 ` brian m. carlson
2025-06-06 12:28 ` Toon Claes
2025-05-13 21:22 ` D. Ben Knoble
1 sibling, 1 reply; 118+ messages in thread
From: brian m. carlson @ 2025-05-12 22:04 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Nico Williams, D. Ben Knoble, Remo Senekowitsch,
Theodore Ts'o, Junio C Hamano, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
On 2025-05-12 at 21:43:46, Martin von Zweigbergk wrote:
> On Mon, 12 May 2025 at 14:07, Nico Williams <nico@cryptonector.com> wrote:
> >
> > How is this stable ID constructed?
>
> It's just random bytes (16 when using the Git backend, 32 in the
> Google backend).
>
> > How would things other than jj construct these? We spent many messages
> > trying to work that out and in my estimate that wasn't settled.
>
> Random bytes has worked well for jj.
I would like to suggest that we use a deterministic approach. People
rely on Git commits being deterministic, including in my stash
import/export series[0]. In addition, it's important to avoid any
allegations of side channels or leaking information in commits, which
would be a concern in many environments and which a deterministic
approach would avoid[1].
I'd suggest a simple SHA-256 hash of the original commit data (for both
SHA-1 and SHA-256 commits, but one that would change to a new hash if we
added one) or an HMAC-SHA-256 with a fixed and documented key.
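For illustration, the two deterministic options could look roughly like
the sketch below. The function names, the key value, and the choice to
feed in the raw commit object bytes are assumptions made for the example;
the thread does not settle on any of them.

```python
import hashlib
import hmac

# Hypothetical "fixed and documented" key; no such key has been agreed on.
FIXED_KEY = b"change-id-v1"


def change_id_sha256(commit_data: bytes) -> str:
    """Plain SHA-256 over the original commit data."""
    return hashlib.sha256(commit_data).hexdigest()


def change_id_hmac(commit_data: bytes, key: bytes = FIXED_KEY) -> str:
    """HMAC-SHA-256 over the original commit data with a fixed key."""
    return hmac.new(key, commit_data, hashlib.sha256).hexdigest()


# Made-up commit payload, only to show that equal input gives an equal ID.
raw = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 0 +0000\n"
    b"committer A U Thor <author@example.com> 0 +0000\n"
    b"\n"
    b"example\n"
)
print(change_id_sha256(raw) == change_id_sha256(raw))
```

Either variant gives the same ID for the same input on every machine,
which is the property asked for here; the ID would then be carried along
unchanged when the commit is later rewritten.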
I would also recommend a config option to avoid creating these IDs for
those who don't want them included for privacy reasons. I expect to set
such an option, for instance.
[0] That series will definitely require that they be disabled when
creating commits, since the goal is to ensure bit-for-bit
reproducibility between different Git versions so that users can
immediately tell if the stash history is identical.
[1] For instance, it's an easy way to leak keys or other credentials
without people noticing just by pushing an innocuous-looking commit.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-12 22:04 ` brian m. carlson
@ 2025-06-06 12:28 ` Toon Claes
2025-06-06 15:44 ` Junio C Hamano
0 siblings, 1 reply; 118+ messages in thread
From: Toon Claes @ 2025-06-06 12:28 UTC (permalink / raw)
To: brian m. carlson, Martin von Zweigbergk
Cc: Nico Williams, D. Ben Knoble, Remo Senekowitsch,
Theodore Ts'o, Junio C Hamano, Git Mailing List, Edwin Kempin,
Scott Chacon, philipmetzger@bluewin.ch
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> On 2025-05-12 at 21:43:46, Martin von Zweigbergk wrote:
>> Random bytes has worked well for jj.
>
> I would like to suggest that we use a deterministic approach. People
> rely on Git commits being deterministic, including in my stash
> import/export series[0]. In addition, it's important to avoid any
> allegations of side channels or leaking information in commits, which
> would be a concern in many environments and which a deterministic
> approach would avoid[1].
>
> I'd suggest a simple SHA-256 hash of the original commit data (for both
> SHA-1 and SHA-256 commits, but one that would change to a new hash if we
> added one) or an HMAC-SHA-256 with a fixed and documented key.
I was thinking: you cannot guarantee determinism, because the change-ID
would remain stable even if the underlying data from which it was
generated changes. But on second thought, _some_ determinism *can* be
useful, for example when different tools try to generate a change-ID for
the same source commit.
> I would also recommend a config option to avoid creating these IDs for
> those who don't want them included for privacy reasons. I expect to set
> such an option, for instance.
Fair enough.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-06-06 12:28 ` Toon Claes
@ 2025-06-06 15:44 ` Junio C Hamano
0 siblings, 0 replies; 118+ messages in thread
From: Junio C Hamano @ 2025-06-06 15:44 UTC (permalink / raw)
To: Toon Claes
Cc: brian m. carlson, Martin von Zweigbergk, Nico Williams,
D. Ben Knoble, Remo Senekowitsch, Theodore Ts'o,
Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
Toon Claes <toon@iotcl.com> writes:
> ... But on second thought, _some_ determinism *can* be
> useful, for example when different tools try to generate a change-ID for
> the same source commit.
Yup. And once we have such determinism for the change-ID that is
given to a freshly written commit not derived from anything else, as
long as different tools use the same criteria to decide when to and
not to carry the existing change-IDs forward to a commit
they newly create,
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)
2025-05-12 21:43 ` Martin von Zweigbergk
2025-05-12 22:04 ` brian m. carlson
@ 2025-05-13 21:22 ` D. Ben Knoble
1 sibling, 0 replies; 118+ messages in thread
From: D. Ben Knoble @ 2025-05-13 21:22 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Nico Williams, Remo Senekowitsch, Theodore Ts'o,
Junio C Hamano, Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Mon, May 12, 2025 at 5:43 PM Martin von Zweigbergk
<martinvonz@google.com> wrote:
>
> On Mon, 12 May 2025 at 14:07, Nico Williams <nico@cryptonector.com> wrote:
> >
> > On Sat, May 10, 2025 at 01:31:32PM -0700, Martin von Zweigbergk wrote:
> > > To me, the main benefit is being able to refer to an evolving change
> > > by a stable ID. That enables things like `jj describe qx -m 'new
> > > description'; jj new qx` (update commit message, then switch to it)
> > > without having to look up the new commit ID after setting the
> > > description.
> >
> > Notionally this is not different from renaming a file. You have a name
> > (file name, commit message subject) and you have the thing it refers to
> > (file contents, tree object).
> >
> > <insert sub-thread about why Git does not have inode numbers for
> > files, does not record rename/copy intent, and depends on file
> > content similarity checks to detect renames>
> >
> > If Git can do file content similarity checking to discover renames, then
> > surely so can jj and other CR tools do commit similarity checking to
> > discover commit message changes. Is there anything that makes the
> > preceding statement incorrect?
>
> That wouldn't work in the `jj describe qx -m 'new description'; jj new
> qx` example I used above, right? I think you're suggesting that when
> the user runs `jj describe qx -m 'new description'`, we should compare
> the reachable commits before the command to the reachable commits
> after the command and then record in some storage that the new commit
> is part of the same "change" as the old commit. Is that what you
> meant? In this particular case, the commit message obviously changed,
> so comparing the commit messages will obviously fail. We could of
> course make this command record the information itself, however.
I didn't follow the entirety of the example (since I think you'd have
to have change ID "qx" to start with in that case), but:
I think you'd compare commit /contents/ (aka the trees they point to)
rather than /messages/, just like how rename detection compares blobs
or trees? (Although I seem to recall a recent thread where a heuristic
involving the old/new name went wrong because it was too short, so a
heuristic in the messages is probably also reasonable. Doesn't
range-diff do something similar?)
>
> > > Given that we already have this stable ID, [...]
> >
> > "We" == jujutsu?
>
> Yes, sorry :)
>
> > How is this stable ID constructed?
>
> It's just random bytes (16 when using the Git backend, 32 in the
> Google backend).
>
> > How would things other than jj construct these? We spent many messages
> > trying to work that out and in my estimate that wasn't settled.
>
> Random bytes has worked well for jj.
--
D. Ben Knoble
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-07 20:59 ` Junio C Hamano
2025-04-07 21:36 ` Nico Williams
@ 2025-04-07 22:51 ` Remo Senekowitsch
2025-04-08 0:10 ` Junio C Hamano
2 siblings, 0 replies; 118+ messages in thread
From: Remo Senekowitsch @ 2025-04-07 22:51 UTC (permalink / raw)
To: Junio C Hamano, Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon,
philipmetzger@bluewin.ch
On Mon Apr 7, 2025 at 10:59 PM CEST, Junio C Hamano wrote:
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
>> For example, it enables
>> `git rebase main <change ID>; git switch <change ID>` without
>> requiring the user to look up the hash of the rewritten commit.
>
> I do not quite see why this can be listed even as an advantage,
> unless you are going to allow end users to name the changes, instead
> of using auto-generated impossible-to-remember hexadecimal string
> (perhaps prefixed with a single "I" or something).
Since the change-id will use a "reverse-hex" alphabet (z-k instead of
0-f), prefixing an "I" won't be necessary for disambiguation.
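To make the alphabet concrete, here is a small sketch of the encoding as
described in this thread: ordinary hex digits 0-f are mapped to the
letters z-k, applied to random bytes (16 was the figure given for jj's Git
backend). The helper names are invented for the example and this is not
taken from the jj source.

```python
import secrets

HEX = "0123456789abcdef"
REVERSE_HEX = "zyxwvutsrqponmlk"  # '0' -> 'z', '1' -> 'y', ..., 'f' -> 'k'
_TO_REVERSE = str.maketrans(HEX, REVERSE_HEX)
_FROM_REVERSE = str.maketrans(REVERSE_HEX, HEX)


def to_reverse_hex(data: bytes) -> str:
    """Encode bytes using the z-k alphabet instead of 0-f."""
    return data.hex().translate(_TO_REVERSE)


def from_reverse_hex(text: str) -> bytes:
    """Decode a z-k string back into bytes."""
    return bytes.fromhex(text.translate(_FROM_REVERSE))


def new_change_id(num_bytes: int = 16) -> str:
    """Random change ID; 16 bytes is an assumption based on the thread."""
    return to_reverse_hex(secrets.token_bytes(num_bytes))


print(to_reverse_hex(bytes([0x0F, 0xA9])))  # zkpq
print(new_change_id())
```

Because the two alphabets share no characters, a string in this form can
never be mistaken for a hex commit ID prefix.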
In the case of Jujutsu, there is a configurable "immutable revset",
which represents the idea that people usually don't want to force-push
over the master branch. If the user specifies a change-id prefix that's
ambiguous, but only one of the commits it could refer to is "mutable",
that one takes precedence. So in practice, the commits one is currently
working on can be identified with 1-3 characters, which fit comfortably
into short-term memory.
While that feature isn't implemented in Git (yet), it shows how such a
usage pattern can be very ergonomic.
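The precedence rule can be sketched in a few lines. This is a simplified
model of the behaviour described above, not Jujutsu's implementation; the
`Change` record and `resolve_prefix` are hypothetical names, and a real
tool would compute mutability from the configured revset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    change_id: str
    mutable: bool  # False for commits inside the "immutable" set


def resolve_prefix(prefix: str, changes: list[Change]) -> Optional[Change]:
    """Resolve a change-id prefix, letting a single mutable match win."""
    matches = [c for c in changes if c.change_id.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    mutable = [c for c in matches if c.mutable]
    if len(mutable) == 1:
        return mutable[0]
    return None  # no match, or still ambiguous


changes = [
    Change("qxyz", mutable=False),  # e.g. already on master
    Change("qxkl", mutable=True),   # being worked on
    Change("rmno", mutable=True),
]
print(resolve_prefix("qx", changes))
```

Here `qx` matches two changes, but only one of them is mutable, so that
one is returned.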
>> If the change id is also transferred between repos and preserved by
>> a forge (such as Gerrit), it enables the change id to be used to
>> identify a code review.
>
> People often talk about rebasing and rewriting in the context of
> discussing "change IDs", and for 80% of the use cases where a simple
> single-commit topic is involved, it would perfectly work fine.
> After making a new commit C0 on top of 'main', updating 'main' with
> others' changes, and then rebasing that C0 on top of updated 'main'
> to produce C1, you would expect that C0 and C1 are moral equivalents
> so it is natural that you wish there is a name to give to these
> moral equivalents.
>
> But stepping back a bit, if they are not just moral equivalents but
> record identical changes that are so same that an earlier review of
> C0 makes it unnecessary to review C1, why are you even rebasing in
> the first place? Just merging C0 to the updated 'main' would retain
> the earlier review made on C0 and things should merge just fine.
Some people (including me) like to / are used to rebasing often and
keeping the history on master as linear as possible. I'm not aware
that there's anything wrong with that.
In practice, I don't think this will cause any problems. The change-id
will help the code review tool to identify the commits as morally
equivalent. After checking that the other review-relevant data (patch,
message, author...) is identical, the review tool may choose to mark
the commit C1 as "already reviewed" without any user intervention.
And if the data has somehow changed, only the interdiff of that can be
shown for review, which is precisely the kind of benefit we're aiming
for with this header. It has been pointed out that git-range-diff can do
this in a limited fashion already, which we can hopefully expand upon.
> I have more problems with the remaining 20% use case, where you need
> to deal with multiple commits.
>
> Perhaps your initial changeset is a single commit C0 that is so
> large and does too many things at once, and reviewers would
> naturally advise you to split things up. You'll come up with a
> series of commits, C1_0 and C1_1. The net effect of applying these
> two patches may be the same as applying the original C0, but each of
> them is more cleanly separated to address one issue at a time, and
> the explanation given in the proposed log message more clearly
> describes the issue each of them addresses. Now you gave a change ID
> to C0, and want to somehow relate C1_0 and C1_1 to the original C0.
> Which one gets the same change ID? Earlier one? The last one?
> Do both get the same change ID?
>
> Or your initial changeset is a two-commit series, C0_0 and C0_1, but
> reviewers find that each one of them alone is not complete, and
> because the issue addressed by these two is small and isolated
> enough, you are advised to make them into a single commit C1. Did
> you start with two change IDs for these two original commits? If
> so, whose change ID does the updated commit C1 inherit? Or does C1 have
> two change IDs now? Or did you start with a single change ID
> assigned to both of these two original commits?
These are good descriptions of realistic scenarios. At a high level, I'd
say change-ids don't help much with these problems. But that's OK, they
don't need to solve all problems to be worthwhile.
A commit only ever has one change-id. (Other headers may be useful, but
they should have another name and should be discussed separately IMO.)
In both of the described scenarios, it doesn't matter which one
inherits the previous one. When a commit is split into two, the one that
inherits the change-id will show up in review tools with half of its
patch deleted, the other commit will show up as new. When two commits
are combined into one, any one of the change-ids is inherited. Review
tools will show the patch from the commit the ID was inherited from as
preserved (maybe even hidden) while the patches from the other commits
will show up as added. If they were previously reviewed, those reviews
will be "lost".
So, while these scenarios are not ideal and somewhat ambiguous, the
ambiguity doesn't present any real problems in practice.
More importantly, I disagree with the 80%-20% split between these
scenarios. There is another one that comes up a lot, and that scenario
is the one that benefits the most from change-ids. Namely, when multiple
commits are carried together as a patchset. These commits can split
out into little subtrees that are constantly rebased to keep up with
upstream, while the commits are amended, reworded and so on. They can
sometimes be reviewed as a whole, where people are generally looking at
all commits at once. But they can also represent dependencies between
entirely different features. For example, bugfixes will often be at the
base of this tree / patchset, while features that depend on it fan out
from there. Here is where the change-id shines: As commits are amended,
reordered, rebased and reworded, their evolution can be trivially
tracked. There are discussions around the topic of "stacked PRs", which
is basically this exact scenario, just managed in a specific way.
I find myself working this way a lot.
> Quite frankly, I think the concept of "change ID" is nice but it is
> not mechanically trustable. Recording them in the trailers is fine,
> but I somehow feel that they have a clear-cut semantics everybody
> can agree on to deserve to be in the header part of commit objects.
I think the concrete examples above show that while change-ids don't
solve all version control problems, these remaining problems don't
diminish the value of change-ids either. I also don't see any potential
for different tools in the ecosystem to disagree about the semantics so
profoundly that interoperability would be hampered.
Remo
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-07 20:59 ` Junio C Hamano
2025-04-07 21:36 ` Nico Williams
2025-04-07 22:51 ` Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Remo Senekowitsch
@ 2025-04-08 0:10 ` Junio C Hamano
2025-04-08 5:35 ` Martin von Zweigbergk
2 siblings, 1 reply; 118+ messages in thread
From: Junio C Hamano @ 2025-04-08 0:10 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
Junio C Hamano <gitster@pobox.com> writes:
> I have more problems with the remaining 20% use case, where you need
> to deal with multiple commits.
>
> Perhaps your initial changeset is a single commit C0 that is so
> large and does too many things at once, and reviewers would
> naturally advise you to split things up. You'll come up with a
> series of commits, C1_0 and C1_1. The net effect of applying these
> two patches may be the same as applying the original C0, but each of
> them is more cleanly separated to address one issue at a time, and
> the explanation given in the proposed log message more clearly
> describes the issue each of them addresses. Now you gave a change ID
> to C0, and want to somehow relate C1_0 and C1_1 to the original C0.
> Which one gets the same change ID? Earlier one? The last one?
> Do both get the same change ID?
>
> Or your initial changeset is a two-commit series, C0_0 and C0_1, but
> reviewers find that each one of them alone is not complete, and
> because the issue addressed by these two is small and isolated
> enough, you are advised to make them into a single commit C1. Did
> you start with two change IDs for these two original commits? If
> so, whose change ID does the updated commit C1 inherit? Or does C1 have
> two change IDs now? Or did you start with a single change ID
> assigned to both of these two original commits?
Another thing I forgot to mention.
A well-kept reflog on a topic branch can keep track of how the set
of patches that makes the topic evolved. E.g. with something like
this:
$ git switch topic
$ git range-diff @{1}...
$ git range-diff @{2}...@{1}
$ git range-diff @{3}...@{2}
you can view how the topic as a whole has evolved. A downside of
using reflog entries this way is that it is not well-suited for
distributed workflow.
A set of individual commits that share the same "change ID" is,
unlike reflog entries, which are an ordered set of topic tips, not
inherently ordered. This is inevitable in the distributed world
where many people can simultaneously work on improving a single
"change" in many different ways, but it makes it difficult if not
impossible to see how things evolved, simply because you first need
to figure out the order of these commits that share the same "change
ID". Some may be independently evolved from the same ancestor
iteration. Some may be repeatedly worked on on a single strand of
pearls (much like how development recorded in reflog entries of a
single branch in a single user set-up goes). I guess you would need
a way to record the predecessor vs successor relationship of various
commits that share the same "change ID", much like commits form DAG
to represent ancestor vs descendant relationship.
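As a rough sketch of what such a record would buy: if each commit listed
its predecessors, the versions sharing one change ID could be put in
order with an ordinary topological sort. The `predecessors` mapping below
is hypothetical data; nothing in Git records it today.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def order_versions(predecessors: dict[str, set[str]]) -> list[str]:
    """Order commits so that every predecessor comes before its successors.

    `predecessors` maps a commit to the commits it was rewritten from.
    """
    return list(TopologicalSorter(predecessors).static_order())


# A single strand of pearls: C1 was rewritten to C2, then C2 to C3.
strand = {"C2": {"C1"}, "C3": {"C2"}}
print(order_versions(strand))  # ['C1', 'C2', 'C3']

# Two people independently evolving the same ancestor iteration.
fork = {"C2a": {"C1"}, "C2b": {"C1"}}
print(order_versions(fork)[0])  # C1
```

In the forked case only the common ancestor has a fixed position; the two
independent successors have no inherent order relative to each other,
which is the situation the paragraph above describes.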
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 0:10 ` Junio C Hamano
@ 2025-04-08 5:35 ` Martin von Zweigbergk
2025-04-08 14:27 ` Junio C Hamano
2025-04-08 14:27 ` Junio C Hamano
0 siblings, 2 replies; 118+ messages in thread
From: Martin von Zweigbergk @ 2025-04-08 5:35 UTC (permalink / raw)
To: Junio C Hamano
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On Mon, 7 Apr 2025 at 17:10, Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> A well-kept reflog on a topic branch can keep track of how the set
> of patches that makes the topic evolved. E.g. with something like
> this:
>
> $ git switch topic
> $ git range-diff @{1}...
> $ git range-diff @{2}...@{1}
> $ git range-diff @{3}...@{2}
>
> you can view how the topic as a whole has evolved. A downside of
> using reflog entries this way is that it is not well-suited for
> distributed workflow.
>
> A set of individual commits that share the same "change ID" is,
> unlike reflog entries, which are an ordered set of topic tips, not
> inherently ordered. This is inevitable in the distributed world
> where many people can simultaneously work on improving a single
> "change" in many different ways, but it makes it difficult if not
> impossible to see how things evolved, simply because you first need
> to figure out the order of these commits that share the same "change
> ID". Some may be independently evolved from the same ancestor
> iteration. Some may be repeatedly worked on on a single strand of
> pearls (much like how development recorded in reflog entries of a
> single branch in a single user set-up goes). I guess you would need
> a way to record the predecessor vs successor relationship of various
> commits that share the same "change ID", much like commits form DAG
> to represent ancestor vs descendant relationship.
That is correct. The change ID should be sufficient for handling
simple distributed cases involving a single remote but it's not a full
replacement for something like Mercurial's Changeset Evolution [1].
For example:
1. I push commit C1 with change ID X to remote R
2. You fetch commit C1 from remote R
3. You rewrite commit C1 as commit C2 (still change ID X)
4. You push C2 to the remote
5. I fetch from the remote
My client will now be able to tell that the remote rewrote C1 to C2
(because you told it to do so). Any local commits I had on top of C1
can then be rebased on top of C2.
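The single-remote case can be sketched as a comparison of two snapshots
of the same remote. This assumes the client remembers which commit each
change ID pointed to before the fetch; `detect_rewrites` and the
dictionary shape are invented for the illustration.

```python
def detect_rewrites(
    before: dict[str, str], after: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map change ID -> (old commit, new commit) for rewritten changes.

    `before` and `after` map change IDs to commit IDs as seen on one
    remote before and after a fetch.
    """
    return {
        change_id: (old, after[change_id])
        for change_id, old in before.items()
        if change_id in after and after[change_id] != old
    }


before_fetch = {"X": "C1", "Y": "D1"}
after_fetch = {"X": "C2", "Y": "D1"}
print(detect_rewrites(before_fetch, after_fetch))  # {'X': ('C1', 'C2')}
```

The direction (C1 became C2) comes only from the order of the two
snapshots of that one remote, not from the change ID itself.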
However, let's say you had instead pushed C2 to remote R2. If my client
had never fetched from that remote before, then when I fetch from R2,
it would not know whether the new C2 was a successor of C1 or actually
a predecessor of it (or even a sibling in this "obsolescence graph" as
Mercurial calls it).
[1] https://wiki.mercurial-scm.org/ChangesetEvolution
^ permalink raw reply [flat|nested] 118+ messages in thread
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 5:35 ` Martin von Zweigbergk
@ 2025-04-08 14:27 ` Junio C Hamano
2025-04-08 15:58 ` Phillip Wood
` (2 more replies)
2025-04-08 14:27 ` Junio C Hamano
1 sibling, 3 replies; 118+ messages in thread
From: Junio C Hamano @ 2025-04-08 14:27 UTC (permalink / raw)
To: Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
Martin von Zweigbergk <martinvonz@google.com> writes:
>> A set of individual commits that share the same "change ID" is,
>> unlike reflog entries which is an ordered set of tip of topics, not
>> inherently ordered. This is inevitable in the distributed world
>> where many people can simultaneously work on improving a single
>> "change" in many different ways, but making it difficult if not
>> impossible to see how things evolved, simply because you first need
>> to figure out the order of these commits that share the same "change
>> ID". Some may be independently evolved from the same ancestor
>> iteration. Some may be repeatedly worked on on a single strand of
>> pearls (much like how development recorded in reflog entries of a
>> single branch in a single user set-up goes). I guess you would need
>> a way to record the predecessor vs successor relationship of various
>> commits that share the same "change ID", much like commits form DAG
>> to represent ancestor vs descendant relationship.
>
> That is correct. The change ID should be sufficient for handling
> simple distributed cases involving a single remote but it's not a full
> replacement for something like Mercurial's Changeset Evolution [1].
Just a random thought. We could very easily replace "change ID"
with a concept of predecessor-successor commits.
Just as we can represent the parent-child NxM transitive relation
with nothing but zero or more "parent" commit object headers, we can
record zero or more "predecessor" trailers in the commit log.
(1) a commit with no "predecessor" is like "root commit" in the
commit history topology. It is a brand new change that took
inspiration from nobody else and that is not a polished form of
any other existing commit.
(2) a commit created as a refinement of one or more existing
commits records each of them as a "predecessor". Having
more than one of them is like a "merge commit" in the commit
history topology and represents that two patches were squashed
into one.
(3) Splitting an originally large change into multiple changes can
be represented the same way. They share the same commit as
their "predecessor". Perhaps you originally have a two-commit
series, A and B, and split it differently in such a way that
C has half of A and D has the rest of A plus B. In that case,
C has A as its predecessor while D has both A and B as its
predecessors.
(4) Just like we can use auxiliary data structures like bitmaps to
figure out reachability without following all the links in the
commit history topology, we should be able to learn how a new
change was born, and trace how it evolved into newer iterations
of the moral equivalent of the change, possibly as a series
with multiple commits, using an auxiliary data structure, which
would represent the predecessor-successor NxM transitive relation
in a similar way, in a form that is efficient to access.
Something like this should allow us to avoid relying on "change ID"s
that can collide elsewhere in the world without having a central
authority to assign them.
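A minimal sketch of points (1) to (4), assuming the "predecessor" trailers have already been parsed into a plain mapping. The commit names A, B, C, D follow the split example in (3); the function names and the dict-based index are hypothetical stand-ins for whatever auxiliary structure would really be used.

```python
from collections import defaultdict
from typing import Dict, List, Set

# commit -> commits named in its "predecessor" trailers (example from (3)).
predecessors: Dict[str, List[str]] = {
    "A": [],          # (1) brand new change, no predecessor
    "B": [],
    "C": ["A"],       # C has half of A
    "D": ["A", "B"],  # D has the rest of A plus B
}


def build_successor_index(preds: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the predecessor links, as an auxiliary lookup structure."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for commit, olds in preds.items():
        for old in olds:
            index[old].add(commit)
    return index


def latest_versions(commit: str, successors: Dict[str, Set[str]]) -> Set[str]:
    """Follow successor links transitively; return commits with no successor."""
    latest: Set[str] = set()
    visited: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        newer = successors.get(current, set())
        if not newer:
            latest.add(current)
        stack.extend(newer)
    return latest


successors = build_successor_index(predecessors)
from_a = latest_versions("A", successors)  # {"C", "D"}
from_b = latest_versions("B", successors)  # {"D"}
```

The inverted index is what lets a tool go from an old commit to its newer iterations without scanning every commit message.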
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 14:27 ` Junio C Hamano
@ 2025-04-08 15:58 ` Phillip Wood
2025-04-08 16:27 ` Nico Williams
2025-04-12 21:32 ` Junio C Hamano
2025-04-16 0:24 ` Jacob Keller
2025-05-14 15:08 ` Kristoffer Haugsbakk
2 siblings, 2 replies; 118+ messages in thread
From: Phillip Wood @ 2025-04-08 15:58 UTC (permalink / raw)
To: Junio C Hamano, Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On 08/04/2025 15:27, Junio C Hamano wrote:
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
>>> A set of individual commits that share the same "change ID" is,
>>> unlike reflog entries which is an ordered set of tip of topics, not
>>> inherently ordered. This is inevitable in the distributed world
>>> where many people can simultaneously work on improving a single
>>> "change" in many different ways, but making it difficult if not
>>> impossible to see how things evolved, simply because you first need
>>> to figure out the order of these commits that share the same "change
>>> ID". Some may be independently evolved from the same ancestor
>>> iteration. Some may be repeatedly worked on on a single strand of
>>> pearls (much like how development recorded in reflog entries of a
>>> single branch in a single user set-up goes). I guess you would need
>>> a way to record the predecessor vs successor relationship of various
>>> commits that share the same "change ID", much like commits form DAG
>>> to represent ancestor vs descendant relationship.
>>
>> That is correct. The change ID should be sufficient for handling
>> simple distributed cases involving a single remote but it's not a full
>> replacement for something like Mercurial's Changeset Evolution [1].
>
> Just a random thought. We could very easily replace "change ID"
> with a concept of predecessor-successor commits.
>
> Just like we can represent parents-children NxM transitive relation
> only with 0 or more "parent" commit object headers, we can record
> zero or more "predecessor" trailer in the commit log.
>
> (1) a commit with no "predecessor" is like "root commit" in the
> commit history topology. It is a brand new change that took
> inspiration from nobody else and that is not a polished form of
> any other existing commit.
>
> (2) a commit created as a refinement for one or more existing
> commits record each of them as "predecessor" to it. Having
> more than one of them is like a "merge commit" in the commit
> history topology and represents that two patches were squashed
> into one.
>
> (3) Splitting an originally large change into multiple changes can
> be represented the same way. They share the same commit as
> their "predecessor". Perhaps you have originally two-commit
> series, A and B, and split them differently in such a way that
> C has half of A and D has the rest of A plus B. In which case,
> C has A as its predecessor while D has both A and B as its
> predecessor.
>
> (4) Just like we can use auxiliary data structures like bitmaps to
> figure out reachability without following all the links in the
> commit history topology, we should be able to learn how a new
> change was born, and trace how it evolved into newer iteration
> of the moral equivalent of the change, possibly as a series
> with multiple commits, using auxiliary data structure, which
> would represent predecessor-successor NxM transitive relation
> in a similar way in a form that is efficient to access.
>
> Something like this should allow us avoid relying on "change ID"s
> that can collide elsewhere in the world without having a central
> authority to assign them.
This is similar in spirit to the "git evolve" proposal [1]. One of the
objections to that was that it required all of the rewritten commits to
be pushed back to the remote, rather than just the current version. So
if I rewrite a branch three times and push the result for review all of
the intermediate state gets pushed as well. That is because the
intermediate commits were needed to track the chain of rewritten commits
to avoid the problem Elijah described [2] when trying to follow
cherry-picked-from trailers. If the predecessor information was stored
separately to the commit it refers to (in a notes ref for example) then
we could in principle simplify the chain of rewrites when pushing so
that we only need to push the final version of the commit and a mapping
from the version that we fetched from the remote.
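The simplification described above can be sketched like this, assuming the rewrite records are kept outside the commits as a simple old-to-new mapping and that every local rewrite is linear (no splits or squashes). `collapse_rewrites` is a hypothetical helper, and the notes-ref storage itself is not modelled.

```python
from typing import Dict, Iterable


def collapse_rewrites(rewrites: Dict[str, str],
                      fetched: Iterable[str]) -> Dict[str, str]:
    """Map each commit the remote already knows to its final local version.

    `rewrites` holds one entry per local rewrite (old commit -> new commit).
    Intermediate versions are dropped from the result, so only the final
    commit and the collapsed mapping would need to be pushed.
    """
    collapsed: Dict[str, str] = {}
    for start in fetched:
        current = start
        seen = {current}
        while current in rewrites:
            current = rewrites[current]
            if current in seen:
                raise ValueError("cycle in rewrite records")
            seen.add(current)
        if current != start:
            collapsed[start] = current
    return collapsed


# v0 was fetched from the remote and then rewritten three times locally.
local_rewrites = {"v0": "v1", "v1": "v2", "v2": "v3"}
to_push = collapse_rewrites(local_rewrites, ["v0"])  # {"v0": "v3"}
```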
Tracking predecessors as you describe is certainly a more complete
solution to tracking the evolution of commits and it addresses the
shortcomings of change-ids you outlined in your previous mail. It is a
lot more work to implement though.
Best Wishes
Phillip
[1]
https://lore.kernel.org/git/pull.1356.git.1663959324.gitgitgadget@gmail.com/
[2]
https://lore.kernel.org/git/CABPp-BECTrVp9X6bVmzU8LEeYsC3KbzeJvAaDPN+FgZz_uEhmA@mail.gmail.com/
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 15:58 ` Phillip Wood
@ 2025-04-08 16:27 ` Nico Williams
2025-04-12 21:32 ` Junio C Hamano
1 sibling, 0 replies; 118+ messages in thread
From: Nico Williams @ 2025-04-08 16:27 UTC (permalink / raw)
To: phillip.wood
Cc: Junio C Hamano, Martin von Zweigbergk, Git Mailing List,
Edwin Kempin, Scott Chacon, remo, philipmetzger@bluewin.ch
On Tue, Apr 08, 2025 at 04:58:58PM +0100, Phillip Wood wrote:
> On 08/04/2025 15:27, Junio C Hamano wrote:
> > Something like this should allow us avoid relying on "change ID"s
> > that can collide elsewhere in the world without having a central
> > authority to assign them.
>
> This is similar in spirit to the "git evolve" proposal [1]. One of the
> objections to that was that it required all of the rewritten commits to be
> pushed back to the remote, rather than just the current version. So if I
> rewrite a branch three times and push the result for review all of the
> intermediate state gets pushed as well. That is because the intermediate
> commits were needed to track the chain of rewritten commits to avoid the
> problem Elijah described [2] when trying to follow cherry-picked-from
> trailers. If the predecessor information was stored separately to the commit
> it refers to (in a notes ref for example) then we could in principle
> simplify the chain of rewrites when pushing so that we only need to push the
> final version of the commit and a mapping from the version that we fetched
> from the remote.
One might as well have a way to push reflogs. But so much internal
history is just distracting. Half the point of linear history upstream
is to drop all the internal history that no one should care about.
Even during code review this is too much and too distracting. Some
users commit very often, and for them this would make their work rather
difficult to review.
> Tracking predecessors as you describe is certainly a more complete solution
> to tracking the evolution of commits and it addresses the shortcomings of
> change-ids you outlined in your previous mail. It is a lot more work to
> implement though.
There's also that.
IMO that's ETOOMUCH. Just change IDs / series IDs should suffice.
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 15:58 ` Phillip Wood
2025-04-08 16:27 ` Nico Williams
@ 2025-04-12 21:32 ` Junio C Hamano
1 sibling, 0 replies; 118+ messages in thread
From: Junio C Hamano @ 2025-04-12 21:32 UTC (permalink / raw)
To: Phillip Wood
Cc: Martin von Zweigbergk, Git Mailing List, Edwin Kempin,
Scott Chacon, remo, philipmetzger@bluewin.ch
Phillip Wood <phillip.wood123@gmail.com> writes:
> This is similar in spirit to the "git evolve" proposal [1]. One of the
> objections to that was that it required all of the rewritten commits
> to be pushed back to the remote, rather than just the current
> version.
I am not sure if that particular "objection" is valid.
We can make the predecessor links not participate in the reachability
DAG, and even if we made them contribute to commit reachability,
they can be filtered out with the --filter= facility, can't they?
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 14:27 ` Junio C Hamano
2025-04-08 15:58 ` Phillip Wood
@ 2025-04-16 0:24 ` Jacob Keller
2025-05-14 15:08 ` Kristoffer Haugsbakk
2 siblings, 0 replies; 118+ messages in thread
From: Jacob Keller @ 2025-04-16 0:24 UTC (permalink / raw)
To: Junio C Hamano, Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
On 4/8/2025 7:27 AM, Junio C Hamano wrote:
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
>>> A set of individual commits that share the same "change ID" is,
>>> unlike reflog entries which is an ordered set of tip of topics, not
>>> inherently ordered. This is inevitable in the distributed world
>>> where many people can simultaneously work on improving a single
>>> "change" in many different ways, but making it difficult if not
>>> impossible to see how things evolved, simply because you first need
>>> to figure out the order of these commits that share the same "change
>>> ID". Some may be independently evolved from the same ancestor
>>> iteration. Some may be repeatedly worked on on a single strand of
>>> pearls (much like how development recorded in reflog entries of a
>>> single branch in a single user set-up goes). I guess you would need
>>> a way to record the predecessor vs successor relationship of various
>>> commits that share the same "change ID", much like commits form DAG
>>> to represent ancestor vs descendant relationship.
>>
>> That is correct. The change ID should be sufficient for handling
>> simple distributed cases involving a single remote but it's not a full
>> replacement for something like Mercurial's Changeset Evolution [1].
>
> Just a random thought. We could very easily replace "change ID"
> with a concept of predecessor-successor commits.
>
> Just like we can represent parents-children NxM transitive relation
> only with 0 or more "parent" commit object headers, we can record
> zero or more "predecessor" trailer in the commit log.
>
> (1) a commit with no "predecessor" is like "root commit" in the
> commit history topology. It is a brand new change that took
> inspiration from nobody else and that is not a polished form of
> any other existing commit.
>
> (2) a commit created as a refinement for one or more existing
> commits record each of them as "predecessor" to it. Having
> more than one of them is like a "merge commit" in the commit
> history topology and represents that two patches were squashed
> into one.
>
> (3) Splitting an originally large change into multiple changes can
> be represented the same way. They share the same commit as
> their "predecessor". Perhaps you have originally two-commit
> series, A and B, and split them differently in such a way that
> C has half of A and D has the rest of A plus B. In which case,
> C has A as its predecessor while D has both A and B as its
> predecessor.
>
> (4) Just like we can use auxiliary data structures like bitmaps to
> figure out reachability without following all the links in the
> commit history topology, we should be able to learn how a new
> change was born, and trace how it evolved into newer iteration
> of the moral equivalent of the change, possibly as a series
> with multiple commits, using auxiliary data structure, which
> would represent predecessor-successor NxM transitive relation
> in a similar way in a form that is efficient to access.
>
> Something like this should allow us avoid relying on "change ID"s
> that can collide elsewhere in the world without having a central
> authority to assign them.
>
This does seem like the most "powerful" form of this, but does lose one
of the "simplicity"-based advantages of change ids.
Of course, you could simply use the root commit ID in most cases and
that would be sufficient, and in cases where it's not unique you could
have tooling show more data and allow users to disambiguate.
This approach also likely requires the most "work" to implement on the
git side, vs storing a simpler single-value header.
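As a rough illustration of using the root commit as an implicit ID, the sketch below walks predecessor links back to commits that have none. It assumes the links are available as a plain mapping; `root_ids` and the commit names are hypothetical, and "C2" is an extra rewrite of C added only for the example.

```python
from typing import Dict, List, Set

# commit -> its recorded predecessors; A and B are brand new changes.
predecessors: Dict[str, List[str]] = {
    "A": [],
    "B": [],
    "C": ["A"],
    "D": ["A", "B"],
    "C2": ["C"],  # a later rewrite of C
}


def root_ids(commit: str, preds: Dict[str, List[str]]) -> Set[str]:
    """Return every predecessor-less commit reachable from `commit`."""
    roots: Set[str] = set()
    visited: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        older = preds.get(current, [])
        if not older:
            roots.add(current)
        stack.extend(older)
    return roots


unique_root = root_ids("C2", predecessors)    # {"A"}: usable as a single ID
ambiguous_root = root_ids("D", predecessors)  # {"A", "B"}: needs disambiguation
```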
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-08 14:27 ` Junio C Hamano
2025-04-08 15:58 ` Phillip Wood
2025-04-16 0:24 ` Jacob Keller
@ 2025-05-14 15:08 ` Kristoffer Haugsbakk
2 siblings, 0 replies; 118+ messages in thread
From: Kristoffer Haugsbakk @ 2025-05-14 15:08 UTC (permalink / raw)
To: Junio C Hamano, Martin von Zweigbergk
Cc: Git Mailing List, Edwin Kempin, Scott Chacon, remo,
philipmetzger@bluewin.ch
What a thread!
On Tue, Apr 8, 2025, at 16:27, Junio C Hamano wrote:
> Martin von Zweigbergk <martinvonz@google.com> writes:
>
>>> A set of individual commits that share the same "change ID" is,
>>> unlike reflog entries which is an ordered set of tip of topics, not
>>> inherently ordered. This is inevitable in the distributed world
>>> where many people can simultaneously work on improving a single
>>> "change" in many different ways, but making it difficult if not
>>> impossible to see how things evolved, simply because you first need
>>> to figure out the order of these commits that share the same "change
>>> ID". Some may be independently evolved from the same ancestor
>>> iteration. Some may be repeatedly worked on on a single strand of
>>> pearls (much like how development recorded in reflog entries of a
>>> single branch in a single user set-up goes). I guess you would need
>>> a way to record the predecessor vs successor relationship of various
>>> commits that share the same "change ID", much like commits form DAG
>>> to represent ancestor vs descendant relationship.
>>
>> That is correct. The change ID should be sufficient for handling
>> simple distributed cases involving a single remote but it's not a full
>> replacement for something like Mercurial's Changeset Evolution [1].
>
> Just a random thought. We could very easily replace "change ID"
> with a concept of predecessor-successor commits.
>
> Just like we can represent parents-children NxM transitive relation
> only with 0 or more "parent" commit object headers, we can record
> zero or more "predecessor" trailer in the commit log.
>
> (1) a commit with no "predecessor" is like "root commit" in the
> commit history topology. It is a brand new change that took
> inspiration from nobody else and that is not a polished form of
> any other existing commit.
>
> (2) a commit created as a refinement for one or more existing
> commits record each of them as "predecessor" to it. Having
> more than one of them is like a "merge commit" in the commit
> history topology and represents that two patches were squashed
> into one.
>
> (3) Splitting an originally large change into multiple changes can
> be represented the same way. They share the same commit as
> their "predecessor". Perhaps you have originally two-commit
> series, A and B, and split them differently in such a way that
> C has half of A and D has the rest of A plus B. In which case,
> C has A as its predecessor while D has both A and B as its
> predecessor.
>
> (4) Just like we can use auxiliary data structures like bitmaps to
> figure out reachability without following all the links in the
> commit history topology, we should be able to learn how a new
> change was born, and trace how it evolved into newer iteration
> of the moral equivalent of the change, possibly as a series
> with multiple commits, using auxiliary data structure, which
> would represent predecessor-successor NxM transitive relation
> in a similar way in a form that is efficient to access.
>
> Something like this should allow us avoid relying on "change ID"s
> that can collide elsewhere in the world without having a central
> authority to assign them.
I have a few submissions where I recorded the commit hash and the
previous commits in the email headers.
https://lore.kernel.org/git/0ab05a4cf09ba02016b4493936ad1b092b1326aa.1730979849.git.code@khaugsbakk.name/
For this one (v3):[1]
```
X-Commit-Hash: 0ab05a4cf09ba02016b4493936ad1b092b1326aa
X-Previous-Commits: c50f9d405f9043a03cb5ca1855fbf27f9423c759 63a431537b78e2d84a172b5c837adba6184a1f1b
```
• `X-Commit-Hash`: my local commit for this patch
• `X-Previous-Commits`: the two previous commits (v1 and v2 in arbitrary order)
Version 1 just has the hash:
https://lore.kernel.org/git/63a431537b78e2d84a172b5c837adba6184a1f1b.1729451376.git.code@khaugsbakk.name/
```
X-Commit-Hash: 63a431537b78e2d84a172b5c837adba6184a1f1b
```
And v2:
https://lore.kernel.org/git/c50f9d405f9043a03cb5ca1855fbf27f9423c759.1730234365.git.code@khaugsbakk.name/
```
X-Commit-Hash: c50f9d405f9043a03cb5ca1855fbf27f9423c759
X-Previous-Commits: 63a431537b78e2d84a172b5c837adba6184a1f1b
```
† 1: The hash is in the message-id in my case. But I wanted a dedicated
field instead of taking it out of the msg id. And the msg id makeup
doesn’t seem documented. I’ve already seen a thread where someone
relied on parsing data out of the msg id until it changed from under
them.
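For illustration, headers like these could be read back with Python's standard `email` package. This is only a sketch: `commit_lineage` is a hypothetical helper, the two `X-` headers are the custom ones shown above and not something Git emits by itself, and the sample message is a shortened stand-in for the v2 mail.

```python
from email import message_from_string
from typing import List, Optional, Tuple


def commit_lineage(raw_message: str) -> Tuple[Optional[str], List[str]]:
    """Return (commit hash, previous commit hashes) from the custom headers.

    The previous commits are space-separated and, as noted above, in
    arbitrary order, so the list carries no ordering information.
    """
    msg = message_from_string(raw_message)
    commit = (msg.get("X-Commit-Hash") or "").strip() or None
    previous = (msg.get("X-Previous-Commits") or "").split()
    return commit, previous


# Shortened stand-in for the v2 message; only the two headers matter here.
V2_MESSAGE = (
    "X-Commit-Hash: c50f9d405f9043a03cb5ca1855fbf27f9423c759\n"
    "X-Previous-Commits: 63a431537b78e2d84a172b5c837adba6184a1f1b\n"
    "\n"
    "patch body\n"
)

commit, previous = commit_lineage(V2_MESSAGE)
```

A review tool could use the returned pair to link a new mail to earlier versions without parsing the message-id.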
> (4) Just like we can use auxiliary data structures like bitmaps to
> figure out reachability without following all the links in the
> commit history topology, we should be able to learn how a new
> change was born, and trace how it evolved into newer iteration
> of the moral equivalent of the change, possibly as a series
> with multiple commits, using auxiliary data structure, which
> would represent predecessor-successor NxM transitive relation
> in a similar way in a form that is efficient to access.
I don’t know if this is related but it would be amazing if we users
could define custom indexes on the DB. Maybe people won’t agree on what
a change-id should mean (judging by this thread?) but with custom
indexes you could maybe get fast queries for whatever “id” you want to define.
Unrelated example: defining an index on `git patch-id --stable` for
quick *cherry* checks without making your own table with:
```
<rev list> | git diff-tree --patch --stdin \
| git patch-id --stable
```
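A hand-rolled version of such a table might look like the sketch below. It assumes the pipeline's output has been captured as text, with one "<patch-id> <commit-id>" pair per line as `git patch-id` prints them; the function names are hypothetical and the IDs in the sample are short placeholders, not real hashes.

```python
from collections import defaultdict
from typing import Dict, List


def index_patch_ids(output: str) -> Dict[str, List[str]]:
    """Build a patch-id -> [commit-id, ...] lookup from patch-id output."""
    index: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        patch_id, commit_id = parts
        index[patch_id].append(commit_id)
    return dict(index)


def has_equivalent(index: Dict[str, List[str]], patch_id: str) -> bool:
    """Cherry-style check: does any indexed commit carry this patch-id?"""
    return patch_id in index


# Placeholder output; real IDs are full hex hashes.
SAMPLE_OUTPUT = "p1 c1\np2 c2\np1 c3\n"
patch_index = index_patch_ids(SAMPLE_OUTPUT)
```

Here `patch_index["p1"]` lists both c1 and c3, which is the kind of lookup a built-in custom index would make fast.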
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-04-02 18:48 Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer Martin von Zweigbergk
` (5 preceding siblings ...)
2025-04-07 20:59 ` Junio C Hamano
@ 2025-08-19 14:04 ` Askar Safin
2025-08-19 16:44 ` Ben Knoble
6 siblings, 1 reply; 118+ messages in thread
From: Askar Safin @ 2025-08-19 14:04 UTC (permalink / raw)
To: martinvonz; +Cc: ekempin, git, philipmetzger, remo, scott
If your change-id proposal requires incompatible changes in Git itself,
then please make them now! An incompatible release of Git (Git 3.0) is near.
--
Askar Safin
* Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer
2025-08-19 14:04 ` Askar Safin
@ 2025-08-19 16:44 ` Ben Knoble
0 siblings, 0 replies; 118+ messages in thread
From: Ben Knoble @ 2025-08-19 16:44 UTC (permalink / raw)
To: Askar Safin; +Cc: martinvonz, ekempin, git, philipmetzger, remo, scott
> On 19 August 2025 at 10:10, Askar Safin <safinaskar@zohomail.com> wrote:
>
> If your change-id proposal requires some incompatible changes in git itself,
> then, please, do them now! Incompatible release of git (git 3.0) is near.
I’m not the maintainer (so grain of salt), but as far as I know nobody has even picked a date for that release. I’m not sure that qualifies as near.