From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
git@vger.kernel.org
Subject: Re: RFE: git-patch-id should handle patches without leading "diff"
Date: Fri, 07 Dec 2018 23:23:45 +0100 [thread overview]
Message-ID: <87tvjpx9fy.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20181207220116.GB73340@google.com>
On Fri, Dec 07 2018, Jonathan Nieder wrote:
> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting. As Ævar mentioned, the relevant code is
>
> /* Ignore commit comments */
> if (!patchlen && !starts_with(line, "diff "))
> continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this. So if we have a better heuristic for where the
> diff starts, it would be good to use it.
No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too.
See this patch:
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
diff --git x/refs/files-backend.c y/refs/files-backend.c
index 9183875dad..dd8abe9185 100644
--- x/refs/files-backend.c
+++ y/refs/files-backend.c
@@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs,
break;
case REF_TYPE_OTHER_PSEUDOREF:
case REF_TYPE_MAIN_PSEUDOREF:
- return files_reflog_path_other_worktrees(refs, sb, refname);
+ files_reflog_path_other_worktrees(refs, sb, refname);
+ break;
case REF_TYPE_NORMAL:
strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
break;
Observe that the diff --git line matters, we hash it:
$ git diff-tree -p HEAD~.. | git patch-id
5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
$ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
The thing it doesn't care about is the "index" between the "diff" and
patch:
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable
4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
We also care about the +++ and --- lines:
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
56985c2c38cce6079de2690082e1770a8e81214c 0000000000000000000000000000000000000000
Then we normalize the @@ line, e.g.:
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id
4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id
4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
There's other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:
diff --git x/refs/files-backend.c y/refs/files-backend.c
--- x/refs/files-backend.c
+++ y/refs/files-backend.c
@@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs,
break;
case REF_TYPE_OTHER_PSEUDOREF:
case REF_TYPE_MAIN_PSEUDOREF:
- return files_reflog_path_other_worktrees(refs, sb, refname);
+ files_reflog_path_other_worktrees(refs, sb, refname);
+ break;
case REF_TYPE_NORMAL:
strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
break;
Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.
So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
To be:
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
It'll make the state machine a bit more complex, but IMO it would suck
more if we generate a different hash depending on the tool generating
the diff. OTOH the "diff --git" line was never there, and it *does*
matter, so should we be faking it up? Maybe not, bah!
> "git apply" uses apply.c::find_header, which is more permissive.
> Maybe it would be possible to unify these somehow. (I haven't looked
> closely enough to tell how painful that would be.)
>
> Thanks and hope that helps,
> Jonathan
next prev parent reply other threads:[~2018-12-07 22:23 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-07 18:19 RFE: git-patch-id should handle patches without leading "diff" Konstantin Ryabitsev
2018-12-07 19:25 ` Ævar Arnfjörð Bjarmason
2018-12-07 22:01 ` Jonathan Nieder
2018-12-07 22:23 ` Ævar Arnfjörð Bjarmason [this message]
2018-12-07 22:34 ` Jonathan Nieder
2018-12-08 6:01 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tvjpx9fy.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=jrnieder@gmail.com \
--cc=konstantin@linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.