Git development
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Matt Stark <msta@google.com>
Cc: git@vger.kernel.org, ps@pks.im, gitster@pobox.com,
	phillip.wood@dunelm.org.uk,
	Martin von Zweigbergk <martinvonz@google.com>,
	remo@buenzli.dev, Edwin Kempin <ekempin@google.com>,
	schacon@gmail.com, philipmetzger@bluewin.ch,
	konstantin@linuxfoundation.org, newren@gmail.com, tytso@mit.edu,
	nico@cryptonector.com, rikingcoding@gmail.com
Subject: Re: [PATCH] headers: Preserve 'change-id' header in rebase / cherry-pick.
Date: Tue, 7 Apr 2026 23:28:42 +0000
Message-ID: <adWTKt20ISC3qz2g@fruit.crustytoothpaste.net>
In-Reply-To: <CAH7WC73-4p0RrqKNSh2G-xfpfO7QHZiXHbU_UFRkM3Q=bMWTDw@mail.gmail.com>

On 2026-04-07 at 03:13:18, Matt Stark wrote:
> In the discussions on
> https://lore.kernel.org/git/Z_OGMb-1oV0Ex05e@pks.im/T/#m038be849b9b4020c16c562d810cf77bad91a2c87,
> it seems to be that:
> * There is consensus that a `change-id` header provides good value

I'm not sure I agree.

Absent some well-defined documentation describing what it means, I don't
see how it could provide good value.  It sounds like you're saying a
persistent commit ID is generally useful, but I don't see the value and
I associate persistent IDs with online tracking and advertisements,
which are neither useful to me nor particularly ethical.  Since nobody
has explained the compelling reasons in documentation, I am left to
speculate on them myself and have come up empty.

> * There is not consensus on what precise format that should take

I think stabilizing this before a format is defined is a mistake.

Even if, for the sake of argument, we agree that this is a generally
useful thing to have, we'd want to have a standard format (ideally
produced in a deterministic way for the reproducibility of the test suite
and downstream projects), which we don't have, before we persist this.
As part of the initial change, we would probably want `git fsck` to
verify that the format is correct and that the header is not being used
as a way to store random information.  I assure you that users will very
much try to shovel random, arbitrary, malformed information in there
otherwise, since I've seen this in the author and committer headers[0].
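
To illustrate the kind of format check described above, here is a minimal
Python sketch.  The format (exactly 32 lowercase hexadecimal characters)
and the names `CHANGE_ID_RE`, `parse_headers` and `check_change_id` are
assumptions made up for this example; no format has been agreed on, and
this is not how `git fsck` is implemented.

```python
import re

# Hypothetical format: exactly 32 lowercase hex digits.  No format has been
# agreed on; this pattern is only a placeholder for illustration.
CHANGE_ID_RE = re.compile(r"[0-9a-f]{32}")


def parse_headers(raw_commit):
    """Return (name, value) pairs from the header block of a raw commit."""
    headers = []
    # The header block ends at the first blank line; the message follows.
    header_block = raw_commit.split("\n\n", 1)[0]
    for line in header_block.split("\n"):
        if line.startswith(" ") and headers:
            # Continuation line (used by multi-line headers such as gpgsig).
            name, value = headers[-1]
            headers[-1] = (name, value + "\n" + line[1:])
        elif line:
            name, _, value = line.partition(" ")
            headers.append((name, value))
    return headers


def check_change_id(raw_commit):
    """Return a list of problems found with the change-id header, if any."""
    values = [v for n, v in parse_headers(raw_commit) if n == "change-id"]
    problems = []
    if len(values) > 1:
        problems.append("multiple change-id headers")
    for value in values:
        if not CHANGE_ID_RE.fullmatch(value):
            problems.append("malformed change-id: %r" % value)
    return problems
```

A commit without the header passes, one with a well-formed value passes,
and anything else (free text, a second copy of the header) is reported.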

> This commit, rather than attempting to standardize the format, simply
> preserves the change-id header in whatever format it used previously.
> 
> If we so choose, we can later decide on a standardized format, but since
> git only preserves existing headers, this should not create backwards
> incompatibility.

As I mentioned before in other threads, this needs to be off by default
or configurable.  This kind of ID provides tracking of commits, which is
useful in some situations but may also be undesirable for privacy or
other reasons.  Unlike other headers in commits, it is not easily
visible (one can easily tell if a commit is signed, for instance, or
what its tree is) and therefore has potential privacy implications.
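
As a rough illustration of the visibility point, the following Python
sketch lists header names in a raw commit object that are not among the
ones Git itself commonly writes.  The function name `extra_headers` and
the contents of `KNOWN_HEADERS` are assumptions for this example, and the
list may be incomplete.

```python
# Header names Git itself commonly writes into commit objects.
# Assumption for illustration: this list may not be exhaustive.
KNOWN_HEADERS = {
    "tree", "parent", "author", "committer",
    "encoding", "mergetag", "gpgsig", "gpgsig-sha256",
}


def extra_headers(raw_commit):
    """Return names of headers in a raw commit that are not in KNOWN_HEADERS."""
    # The header block ends at the first blank line; the message follows.
    header_block = raw_commit.split("\n\n", 1)[0]
    names = []
    for line in header_block.split("\n"):
        # Skip empty lines and continuation lines of multi-line headers.
        if not line or line.startswith(" "):
            continue
        name = line.split(" ", 1)[0]
        if name not in KNOWN_HEADERS and name not in names:
            names.append(name)
    return names
```

The raw text can be obtained with `git cat-file commit <rev>`, which
prints the commit object as stored, including any extra headers.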

This is especially true since historically a great deal of information
has been automatically rewritten when rebasing or cherry-picking
(leaving only author and message alone), so users will have come to
expect this.

This is also a great way to leak information, such as secret keys.  I
can shovel sensitive keys or IDs into a commit (in a possibly encrypted
form), push them somewhere I have access to, and then exploit them.
Nobody will ever notice since corporate firewalls don't actually see the
raw object information, only the compressed and deltified packfile.  I
can even have my colleague rebase my commit with --reset-author and push
it so I have plausible deniability.

As an example of a problematic situation, say user A creates a commit
and publishes it somewhere on a remote.  It doesn't get picked up into
the main branch.  A year later, user A changes their name (because they
transition, marry, acquire a new citizenship[1], or for any other good
and valuable reason) and suddenly goes by the name B.  Six months later,
they rebase the patch on the current main branch and, because the
project has advanced quite a bit, it looks completely different (so `git
cherry` will no longer identify it in any meaningful way).  They adjust
the message substantially due to the change and sign it off as user B
and submit it.

The user in this case may not have wanted the two commits to be
associated (especially so if they transitioned), so this poses a
substantial risk of unintended disclosure.  The fact that Git makes this
a problem already is not a good excuse for making it worse here; to the
contrary, we should be making the situation better, not piling on.

[0] For instance, some people want to provide timestamps that are larger
than 2^64, despite the fact that it is remarkably unlikely that humans
will still exist 5×10^11 years in the future, let alone that Git will
still be in use.  Unsurprisingly, most programming languages don't
appreciate these timestamps, so problems ensue.
[1] Some countries require that citizens have a name which can decline
grammatically in the native language or otherwise meets linguistic or
cultural norms in that country.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

Thread overview: 12+ messages
2026-04-07  3:13 [PATCH] headers: Preserve 'change-id' header in rebase / cherry-pick Matt Stark
2026-04-07  4:09 ` Junio C Hamano
2026-04-07  4:58   ` Nico Williams
2026-04-07  5:02     ` Nico Williams
2026-04-07 14:33       ` Junio C Hamano
2026-04-07  9:55     ` Phillip Wood
2026-04-07 15:52       ` Nico Williams
2026-04-07 16:20         ` Junio C Hamano
2026-04-07 20:13           ` Nico Williams
2026-04-07 14:42     ` Junio C Hamano
2026-04-07  9:41   ` Phillip Wood
2026-04-07 23:28 ` brian m. carlson [this message]