git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
	git@vger.kernel.org
Subject: Re: RFE: git-patch-id should handle patches without leading "diff"
Date: Fri, 07 Dec 2018 23:23:45 +0100	[thread overview]
Message-ID: <87tvjpx9fy.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20181207220116.GB73340@google.com>


On Fri, Dec 07 2018, Jonathan Nieder wrote:

> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting.  As Ævar mentioned, the relevant code is
>
> 		/* Ignore commit comments */
> 		if (!patchlen && !starts_with(line, "diff "))
> 			continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this.  So if we have a better heuristic for where the
> diff starts, it would be good to use it.

No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too.

See this patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    index 9183875dad..dd8abe9185 100644
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs,
                    break;
            case REF_TYPE_OTHER_PSEUDOREF:
            case REF_TYPE_MAIN_PSEUDOREF:
    -               return files_reflog_path_other_worktrees(refs, sb, refname);
    +               files_reflog_path_other_worktrees(refs, sb, refname);
    +               break;
            case REF_TYPE_NORMAL:
                    strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
                    break;

Observe that the diff --git line matters, we hash it:

    $ git diff-tree -p HEAD~.. | git patch-id
    5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
    5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

The thing it doesn't care about is the "index" between the "diff" and
patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

We also care about the +++ and --- lines:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
    56985c2c38cce6079de2690082e1770a8e81214c 0000000000000000000000000000000000000000

Then we normalize the @@ line, e.g.:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000


There's other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:
    
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs,
                    break;
            case REF_TYPE_OTHER_PSEUDOREF:
            case REF_TYPE_MAIN_PSEUDOREF:
    -               return files_reflog_path_other_worktrees(refs, sb, refname);
    +               files_reflog_path_other_worktrees(refs, sb, refname);
    +               break;
            case REF_TYPE_NORMAL:
                    strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
                    break;

Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.

So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:

    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

To be:

    diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

It'll make the state machine a bit more complex, but IMO it would suck
more if we generate a different hash depending on the tool generating
the diff. OTOH the "diff --git" line was never there, and it *does*
matter, so should we be faking it up? Maybe not, bah!

> "git apply" uses apply.c::find_header, which is more permissive.
> Maybe it would be possible to unify these somehow.  (I haven't looked
> closely enough to tell how painful that would be.)
>
> Thanks and hope that helps,
> Jonathan

  reply	other threads:[~2018-12-07 22:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-07 18:19 RFE: git-patch-id should handle patches without leading "diff" Konstantin Ryabitsev
2018-12-07 19:25 ` Ævar Arnfjörð Bjarmason
2018-12-07 22:01 ` Jonathan Nieder
2018-12-07 22:23   ` Ævar Arnfjörð Bjarmason [this message]
2018-12-07 22:34     ` Jonathan Nieder
2018-12-08  6:01       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tvjpx9fy.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=konstantin@linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).