git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Phillip Wood via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Patrick Steinhardt <ps@pks.im>,
	Elijah Newren <newren@gmail.com>,
	Phillip Wood <phillip.wood@dunelm.org.uk>
Subject: Re: [PATCH] [RFC] rebase -m: partial support for copying extra commit headers
Date: Tue, 8 Apr 2025 11:15:56 +0100	[thread overview]
Message-ID: <240d1cab-b564-45ae-945e-cba621aa7562@gmail.com> (raw)
In-Reply-To: <Z_R6W_yjJEYuWo0A@tapette.crustytoothpaste.net>

Hi brian

On 08/04/2025 02:22, brian m. carlson wrote:
> On 2025-04-07 at 15:52:43, Phillip Wood via GitGitGadget wrote:
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>>      [RFC] rebase -m: partial support for copying extra commit headers
>>      
>>      This patch is largely a response to
>>      https://lore.kernel.org/git/Z-5rpWKAVPmz32jC@pks.im/ . I'm in two minds
>>      about whether we should consider merging such partial support but if it
>>      helps forges preserve extra commit headers then it may well be worth it.
> 
> I'd like to see command-line options to control this and ideally a
> configuration option.  Right now, we know nothing about these extra
> headers, including an expected format.  If a future version of Git (say,
> 3.0) adds a new header and the user includes invalid data in this extra
> header (which happens all the time with author and committer
> information), then 2.50 will propagate it on rebase and it won't be
> fixed until the user uses a version of Git that understands the header
> and can fsck it correctly.  That's not really great, since it means we
> can unknowingly spread corruption.

We could certainly add some way to make this opt-in if there is a desire 
for it and you make a good point about compatibility if we add a new 
commit header. I'm not sure I'd describe preserving these headers when 
rebasing as spreading corruption though as we're simply rewriting 
existing commits. If the user chose to merge rather than rebase we'd 
still have the same issues without creating any new commits.

> I am pretty sure that at $DAYJOB we'll need to have a discussion about
> whether we want to propagate these headers during rebase and I'm
> personally leaning against it.

My understanding is that GitHub has been using "git replay" for rebases 
and therefore copying extra commit headers since the middle of 2023 
[1,2]. The message I linked to in my original mail suggests that the 
"change-id" header is preserved when rebasing on GitHub.

> Why, you ask?  I've seen at least the following types of corruption:
> 
> * Missing timezones
> * Timezones with less than four digits
> * Valid timezones padded to more than four digits with zeros
> * Timezones which don't exist and never have (e.g., +1700)
> * Timezones which are so absurdly large that they push the date to a
>    year when nobody alive now will still be living
> * Date stamps that are larger than 2^64
> * Date stamps which are smaller than 2^64 but beyond the expected life
>    of the Sun
> * Extra angle brackets in the email field
> * Nothing in between the email brackets
> * Nothing before the email brackets (no name at all)
> * Names which are not UTF-8 but without an encoding header
> * Names which are not valid in the specified encoding
> * Emails which are not valid UTF-8[0]
> * Emails which don't meet the (ludicrously generous to the point of
>    being nearly unparseable) RFC production
> * Encodings which are not valid IANA charsets
> * Messages with no body and no blank line (just the newline at the end
>    of the final header)
> * gpgsig headers that include random non-ASCII bytes and control
>    characters[1]

Thanks for sharing that, it is an interesting list. On the subject of 
encoding I do think our documentation could be clearer that the encoding 
applies to all the headers as well as the commit message. As far as I 
can see it only mentions the commit message, not the author or committer 
identities but repo_logmsg_reencode() re-encodes the whole commit 
buffer. Out of interest do you think we could be doing a better job with 
fsck to pick up some of these problems earlier?

I think "git rebase" only cares that the author identity can be parsed 
by split_ident() which is fairly lenient.

> I see Patrick is CC'd here and I'm interested in his thoughts, as well
> as, of course, those of anyone else as well.

Yes me too

Thanks for your thoughtful and intereting reply

Phillip

[1] 
https://github.blog/changelog/2023-06-28-rebase-commits-now-created-using-the-merge-ort-strategy/
[2] 
https://github.blog/engineering/infrastructure/scaling-merge-ort-across-github/


  reply	other threads:[~2025-04-08 10:16 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-07 15:52 [PATCH] [RFC] rebase -m: partial support for copying extra commit headers Phillip Wood via GitGitGadget
2025-04-08  1:22 ` brian m. carlson
2025-04-08 10:15   ` Phillip Wood [this message]
2025-04-08 14:44     ` Junio C Hamano
2025-04-09 14:11       ` Phillip Wood
2025-04-09 15:36         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=240d1cab-b564-45ae-945e-cba621aa7562@gmail.com \
    --to=phillip.wood123@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    --cc=ps@pks.im \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).