* RFC: indicating diff strategy in format-patch message headers
From: Konstantin Ryabitsev @ 2024-06-05 18:01 UTC (permalink / raw)
To: git
Hello, all:
When developing tooling that attempts to map commits to patches, we often
don't have more than just the git-patch-id to go by. The problem is, there are
any number of ways to generate patches from commits:
- using a different strategy (--histogram, --patience, etc)
- using a different number of context lines (-U5)
Without knowing what options were used by the original author, we cannot be
certain that we'll create a patch with the same git-patch-id, even if it's
from the exact same commit.
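To illustrate the point, here is a minimal sketch of why the number of
context lines changes the resulting ID. It is a simplified stand-in, not
git's actual patch-id algorithm: `toy_patch_id` is a hypothetical helper that
hashes a unified diff produced by Python's difflib, skipping hunk headers and
whitespace.

```python
import difflib
import hashlib


def toy_patch_id(old, new, context):
    """Hash a unified diff, skipping hunk headers and all whitespace.

    A simplified stand-in for git-patch-id, NOT git's real algorithm.
    """
    digest = hashlib.sha1()
    for line in difflib.unified_diff(old, new, "a/f", "b/f",
                                     n=context, lineterm=""):
        if line.startswith("@@"):
            continue  # hunk headers carry line numbers, which are ignored
        digest.update("".join(line.split()).encode())
    return digest.hexdigest()


# The same change, rendered with two different context sizes.
old = [f"line {i}" for i in range(1, 21)]
new = list(old)
new[9] = "line ten, changed"

id_u3 = toy_patch_id(old, new, context=3)
id_u5 = toy_patch_id(old, new, context=5)
print(id_u3 == id_u5)
```

The context lines are part of the hashed text, so the two IDs differ even
though the underlying change is identical.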
Would it make sense to have git-format-patch (and friends) include an
additional header hinting at the options used to generate the patch? E.g.:
X-git-diff-options: algo=myers; context=3;
It won't help for all cases where we need to make a match (e.g. when we match
from git commits to a patch query), but it will help matching the other way.
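As a rough sketch of how a tool might consume such a header: the header name
and the `key=value;` syntax are taken from the example above, while the
function name and the sample message are made up for illustration. Nothing
here is an existing git feature.

```python
from email import message_from_string


def parse_diff_options(raw_message):
    """Return the proposed X-git-diff-options header as a dict.

    The header is only a proposal in this thread; git does not emit it.
    """
    msg = message_from_string(raw_message)
    value = msg.get("X-git-diff-options", "")
    options = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:  # skip empty fragments such as the one after the last ';'
            options[key.strip()] = val.strip()
    return options


# Hypothetical message carrying the proposed header.
raw = (
    "From: author@example.com\n"
    "Subject: [PATCH] example\n"
    "X-git-diff-options: algo=myers; context=3;\n"
    "\n"
    "patch body\n"
)
opts = parse_diff_options(raw)
print(opts)
```

A message without the header yields an empty dict, so a consumer can fall
back to its own defaults.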
Any thoughts?
-K
* Re: RFC: indicating diff strategy in format-patch message headers
From: Junio C Hamano @ 2024-06-05 18:22 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: git
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> Would it make sense to have git-format-patch (and friends) include an
> additional header hinting at the options used to generate the patch? E.g.:
>
> X-git-diff-options: algo=myers; context=3;
I doubt it. And newer versions of Git _will_ try to improve how
patch text appears to be more useful to human users, so you have
more moving parts than you'd want to even think about enumerating.
If you were to add a new e-mail header, wouldn't it make more sense
to add a patch-id header and agree on the set of options to be used
to generate that patch-id (which might be different from the setting
used to format the real patch for human and "git am" consumption)?
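A minimal sketch of that idea, assuming both sides agree on a fixed option
set. The particular options in `CANONICAL_DIFF_ARGS` are an assumption picked
for illustration (the thread does not settle on any), and the function names
are hypothetical.

```python
import subprocess

# Hypothetical agreed-upon options; an assumption, not a proposal from
# the thread.
CANONICAL_DIFF_ARGS = ["--diff-algorithm=myers", "-U3"]


def diff_command(commit):
    """Build the diff-tree command that renders a commit canonically."""
    return ["git", "diff-tree", "-p", *CANONICAL_DIFF_ARGS, commit]


def canonical_patch_id(commit, repo="."):
    """Pipe the canonical diff through `git patch-id --stable`.

    Returns the patch-id as a hex string, or None if there was no diff
    (for example a root commit, which diff-tree skips without --root).
    """
    diff = subprocess.run(diff_command(commit), cwd=repo, check=True,
                          capture_output=True).stdout
    out = subprocess.run(["git", "patch-id", "--stable"], cwd=repo,
                         input=diff, check=True,
                         capture_output=True).stdout
    fields = out.split()
    return fields[0].decode() if fields else None


print(" ".join(diff_command("HEAD")))
```

The patch sent to the list could still be formatted with any options the
author likes; only the ID would be computed from the canonical rendering.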
* Re: RFC: indicating diff strategy in format-patch message headers
From: Konstantin Ryabitsev @ 2024-06-05 18:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Wed, Jun 05, 2024 at 11:22:08AM GMT, Junio C Hamano wrote:
> > Would it make sense to have git-format-patch (and friends) include an
> > additional header hinting at the options used to generate the patch? E.g.:
> >
> > X-git-diff-options: algo=myers; context=3;
>
> I doubt it. And newer versions of Git _will_ try to improve how
> patch text appears to be more useful to human users, so you have
> more moving parts than you'd want to even think about enumerating.
OK, I was afraid that was the case.
> If you were to add a new e-mail header, wouldn't it make more sense
> to add a patch-id header and agree on the set of options to be used
> to generate that patch-id (which might be different from the setting
> used to format the real patch for human and "git am" consumption)?
That would be redundant with the message-id. Unfortunately, this doesn't solve
the problem of how to reliably map a commit to the patch from which it
originated, other than using the Message-ID: or Link: trailers.
I'm always happy to see a Link: trailer pointing at the patch submission,
e.g.:
| Link: https://msgid.link/[msgid]
However, not all maintainers welcome them in commits (e.g. Linus was vocal
about it in the past [1]). It could be the case of "use them anyway," but
we'll still have a large enough subset of commits without this info and
increasingly without a reliable mechanism to map commits to original
submissions and code review systems.
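For commits that do carry the trailer, recovering the message-id is
straightforward. A minimal sketch, assuming the `https://msgid.link/` form
shown above; the function name and the sample commit message are made up for
illustration.

```python
import re

# Matches a Link: trailer of the msgid.link form shown above.
LINK_RE = re.compile(r"^Link:[ \t]*https://msgid\.link/(\S+)[ \t]*$",
                     re.MULTILINE)


def message_ids_from_commit(message):
    """Return every message-id referenced by a msgid.link Link: trailer."""
    return LINK_RE.findall(message)


# Hypothetical commit message.
commit_message = (
    "subsystem: fix thing\n"
    "\n"
    "Body text.\n"
    "\n"
    "Signed-off-by: A Developer <a@example.org>\n"
    "Link: https://msgid.link/20240605-example@example.org\n"
)
ids = message_ids_from_commit(commit_message)
print(ids)
```

Commits without the trailer return an empty list, which is exactly the
subset the rest of this thread is concerned with.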
I'm not sure if there's a reliable way to solve this.
-K
[1]: https://lore.kernel.org/lkml/CAHk-=wgAk3NEJ2PHtb0jXzCUOGytiHLq=rzjkFKfpiuH-SROgA@mail.gmail.com/
* Re: RFC: indicating diff strategy in format-patch message headers
From: Eric Wong @ 2024-06-05 22:26 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: Junio C Hamano, git
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Wed, Jun 05, 2024 at 11:22:08AM GMT, Junio C Hamano wrote:
> > > Would it make sense to have git-format-patch (and friends) include an
> > > additional header hinting at the options used to generate the patch? E.g.:
> > >
> > > X-git-diff-options: algo=myers; context=3;
FWIW, I put the format-patch command line and options into the
--signature= switch when generating patches in my WIP git repo viewer:
https://yhbt.net/lore/pub/scm/git/git.git/6549c41e/s/0001.patch
> > If you were to add a new e-mail header, wouldn't it make more sense
> > to add a patch-id header and agree on the set of options to be used
> > to generate that patch-id (which might be different from the setting
> > used to format the real patch for human and "git am" consumption)?
>
> That would be redundant with the message-id. Unfortunately, this doesn't solve
> the problem of how to reliably map a commit to the patch from which it
> originated, other than using the Message-ID: or Link: trailers.
dfpre:, dfblob:, dfpost: search queries on lore seem to work...
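A rough sketch of how such a query could be derived from a patch: the
abbreviated blob IDs are read from the `index <pre>..<post>` lines of the
diff and turned into `dfpre:` and `dfpost:` terms. The function name and the
sample diff are hypothetical, and joining the terms with OR is just one
possible choice.

```python
import re

# "index <pre>..<post> <mode>" lines as they appear in git diffs.
INDEX_RE = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+)", re.MULTILINE)


def blob_terms(diff_text):
    """Turn each index line into dfpre:/dfpost: search terms."""
    terms = []
    for pre, post in INDEX_RE.findall(diff_text):
        terms.append(f"dfpre:{pre}")
        terms.append(f"dfpost:{post}")
    return terms


# Hypothetical diff header with made-up blob IDs.
sample_diff = (
    "diff --git a/f.c b/f.c\n"
    "index 1234abc..5678def 100644\n"
    "--- a/f.c\n"
    "+++ b/f.c\n"
)
terms = blob_terms(sample_diff)
query = " OR ".join(terms)
print(query)
```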
On my lore + gko mirror, the #related search off a commit mostly
works for a single extindex, but mapping hundreds of forks to
hundreds of inboxes with a presentable UI for w3m||lynx users is
a different matter I haven't been able to solve:
https://yhbt.net/lore/pub/scm/git/git.git/6549c41e/s/#related
> I'm not sure if there's a reliable way to solve this.
I started working on a scoring mechanism a while back but got
distracted with personal matters. UI is hard, and dealing with
my gko mirror has been a source of pain due to various
memory+performance problems.
* Re: RFC: indicating diff strategy in format-patch message headers
From: Konstantin Ryabitsev @ 2024-06-07 19:53 UTC (permalink / raw)
To: Eric Wong; +Cc: Junio C Hamano, git
On Wed, Jun 05, 2024 at 10:26:58PM GMT, Eric Wong wrote:
> > That would be redundant with the message-id. Unfortunately, this doesn't solve
> > the problem of how to reliably map a commit to the patch from which it
> > originated, other than using the Message-ID: or Link: trailers.
>
> dfpre:, dfblob:, dfpost: search queries on lore seem to work...
I've thought about that, but this is also not very reliable, at least not when
patch series are applied as fast-forwards, not merges. Unfortunately, some
projects enforce a flat history (glibc, gcc), with merges being specifically
not allowed -- which means dfblob matching is not going to match a lot of
commits.
> On my lore + gko mirror, the #related search off a commit mostly
> works for a single extindex, but mapping hundreds of forks to
> hundreds of inboxes with a presentable UI for w3m||lynx users is
> a different matter I haven't been able to solve:
>
> https://yhbt.net/lore/pub/scm/git/git.git/6549c41e/s/#related
>
> > I'm not sure if there's a reliable way to solve this.
>
> I started working on a scoring mechanism a while back but got
> distracted with personal matters. UI is hard, and dealing with
> my gko mirror has been a source of pain due to various
> memory+performance problems.
I can totally relate to the pain of memory+performance problems. :) Especially
when a ton of "AI" bots decide to aggressively crawl your sites. :/
-K
* Re: RFC: indicating diff strategy in format-patch message headers
From: Eric Wong @ 2024-06-16 23:47 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: Junio C Hamano, git
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Wed, Jun 05, 2024 at 10:26:58PM GMT, Eric Wong wrote:
> > > That would be redundant with the message-id. Unfortunately, this doesn't solve
> > > the problem of how to reliably map a commit to the patch from which it
> > > originated, other than using the Message-ID: or Link: trailers.
> >
> > dfpre:, dfblob:, dfpost: search queries on lore seem to work...
>
> I've thought about that, but this is also not very reliable, at least not when
> patch series are applied as fast-forwards, not merges. Unfortunately, some
> projects enforce a flat history (glibc, gcc), with merges being specifically
> not allowed -- which means dfblob matching is not going to match a lot of
> commits.
Yeah, my initial plan (way back around 2016 or 2017) was to use
Subject, but I also forgot why I stopped with that idea :x
Anyways, trying to add bs: (body+subject) to queries in addition to dfblob
may prove useful:
https://public-inbox.org/meta/20240616233532.574646-1-e@80x24.org/
Not sure if author + date is necessary...