From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
git@vger.kernel.org
Subject: Re: RFE: git-patch-id should handle patches without leading "diff"
Date: Fri, 07 Dec 2018 23:23:45 +0100 [thread overview]
Message-ID: <87tvjpx9fy.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20181207220116.GB73340@google.com>
On Fri, Dec 07 2018, Jonathan Nieder wrote:
> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting. As Ævar mentioned, the relevant code is
>
> /* Ignore commit comments */
> if (!patchlen && !starts_with(line, "diff "))
> continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this. So if we have a better heuristic for where the
> diff starts, it would be good to use it.
No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too.
See this patch:
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
diff --git x/refs/files-backend.c y/refs/files-backend.c
index 9183875dad..dd8abe9185 100644
--- x/refs/files-backend.c
+++ y/refs/files-backend.c
@@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs,
break;
case REF_TYPE_OTHER_PSEUDOREF:
case REF_TYPE_MAIN_PSEUDOREF:
- return files_reflog_path_other_worktrees(refs, sb, refname);
+ files_reflog_path_other_worktrees(refs, sb, refname);
+ break;
case REF_TYPE_NORMAL:
strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
break;
Observe that the diff --git line matters, we hash it:
$ git diff-tree -p HEAD~.. | git patch-id
5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
$ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
The thing it doesn't care about is the "index" between the "diff" and
patch:
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable
4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
We also care about the +++ and --- lines:
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
56985c2c38cce6079de2690082e1770a8e81214c 0000000000000000000000000000000000000000
Then we normalize the @@ line, e.g.:
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id
4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id
4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
There's other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:
diff --git x/refs/files-backend.c y/refs/files-backend.c
--- x/refs/files-backend.c
+++ y/refs/files-backend.c
@@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs,
break;
case REF_TYPE_OTHER_PSEUDOREF:
case REF_TYPE_MAIN_PSEUDOREF:
- return files_reflog_path_other_worktrees(refs, sb, refname);
+ files_reflog_path_other_worktrees(refs, sb, refname);
+ break;
case REF_TYPE_NORMAL:
strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
break;
Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.
So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
To be:
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
It'll make the state machine a bit more complex, but IMO it would suck
more if we generate a different hash depending on the tool generating
the diff. OTOH the "diff --git" line was never there, and it *does*
matter, so should we be faking it up? Maybe not, bah!
> "git apply" uses apply.c::find_header, which is more permissive.
> Maybe it would be possible to unify these somehow. (I haven't looked
> closely enough to tell how painful that would be.)
>
> Thanks and hope that helps,
> Jonathan
next prev parent reply other threads:[~2018-12-07 22:23 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-07 18:19 RFE: git-patch-id should handle patches without leading "diff" Konstantin Ryabitsev
2018-12-07 19:25 ` Ævar Arnfjörð Bjarmason
2018-12-07 22:01 ` Jonathan Nieder
2018-12-07 22:23 ` Ævar Arnfjörð Bjarmason [this message]
2018-12-07 22:34 ` Jonathan Nieder
2018-12-08 6:01 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tvjpx9fy.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=jrnieder@gmail.com \
--cc=konstantin@linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).