From: Michael J Gruber <git@drmicha.warpmail.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH] combine-diff: use textconv for combined diff format
Date: Sat, 16 Apr 2011 12:24:08 +0200 [thread overview]
Message-ID: <4DA96E48.3050008@drmicha.warpmail.net> (raw)
In-Reply-To: <7voc47cqj0.fsf@alter.siamese.dyndns.org>
Junio C Hamano venit, vidit, dixit 15.04.2011 20:34:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
>
>> git diff -m produces a combined diff!
>
> Hmm, what is the rest of your command line? I thought -m was a way to ask
> pairwise diff with each parent.
Sure, but it does not always work like that. Just look at the test from
my patch, or do any "git merge --no-commit" and then "git diff -m". I
would expect that to compare the worktree to each parent, but in fact it
runs "diff --cc".
At least I thought that's the only way how combine-diff would ever have
to deal with a merge result in the worktree as opposed to a blob. And it
seems that "diff -m" does not handle this but relays to "diff --cc" for
current git. I have not checked the "-m" codepath.
>> +static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent, int textconv)
>> {
>> struct diff_queue_struct *q = &diff_queued_diff;
>> struct combine_diff_path *p;
>> @@ -34,9 +34,13 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
>>
>> hashcpy(p->sha1, q->queue[i]->two->sha1);
>> p->mode = q->queue[i]->two->mode;
>> + if (textconv)
>> + p->textconv = get_textconv(q->queue[i]->two);
>> hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
>> p->parent[n].mode = q->queue[i]->one->mode;
>> p->parent[n].status = q->queue[i]->status;
>> + if (textconv)
>> + p->parent[n].textconv = get_textconv(q->queue[i]->one);
>
> This code attempts to handle different textconv set for each different
> parents. But I have to wonder if that is really worth it.
>
> The attribute to decide the content type of the blob is read from the same
> set of .gitattributes files, regardless of which parent you are looking at
> (and this is not likely to change---the exact procedure that is applied
> comes from .git/config that is not even versioned, so there is not much
> point in reading from the .gitattributes from the parent tree, trying to
> be "precise").
>
> If q->queue[i] is not a rename, p->textconv and p->parent[n].textconv
> would be the same because one and two came from the same path. If it is a
> rename, they by definition consist of similar contents, and the user would
> want the same textconv conversion applied to them to make them comparable.
> Even though using p->parent[n].textconv to convert q->queue[i]->one->sha1
> blob and using p->textconv to convert q->queue[i]->two->sha1 blob might be
> the right thing to do in theory, doing so wouldn't make a difference in
> practice. More importantly, even if the two textconvs specify different
> conversions, it is likely that it is an user error (e.g. the preimage had
> "img4433.jqg" that was renamed to img4433.jpg" in the postimage, and the
> attributes mechanism does not say ".jqg" is a JPEG that wants to get
> "exif" run to be texualized for the purpose of diffing, or something).
>
> Besides, if you really want to support "left hand side and right hand
> side, depending on which parent we are talking about, may use different
> textconv", you would need to defeat the optimization in show_patch_diff()
> that calls reuse_combine_diff() when sha1 are the same from other parent
> we have already compared---the parent we are looking at may be using a
> different textconv procedure. Even worse, if parent and child have the
> same sha1, the result of running parent textconv on the parent blob may be
> different from that of the child, which you would never even see in this
> codepath.
>
> So I suspect that using only one textconv per "struct combine_diff_path"
> would make both the code simpler, and more importantly would make the
> result more correct from the end user's point of view.
I'd be happy to take the simpler approach. While I still think the other
one is "more correct" (modulo the reuse issue) it should not matter in
most cases.
>
>> @@ -777,6 +783,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>> close(fd);
>> }
>>
>> + if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) {
>> + struct diff_filespec *df = alloc_filespec(elem->path);
>> + fill_filespec(df, elem->sha1, elem->mode);
>> + result_size = fill_textconv(elem->textconv, df, &result);
>> + }
>
> I suspect that these three lines have to become a small helper function to
> be used to convert the final blob (done here), and parent blob (done in
> combine_diff). With the "binary" support, it would eventually need to be
> enhanced to something like:
>
> if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV)) {
> if (textconv) {
> do these three lines;
> } else if (is binary) {
> "Binary blob $SHA-1";
> }
> }
>
"diff -m --oneline" says something like
aa01ae1 (from 64c0923) Merge branch 'master' into somebranch
diff --git a/a b/a
index 72594ed..d8323da 100644
Binary files a/a and b/a differ
aa01ae1 (from e85049e) Merge branch 'master' into somebranch
diff --git a/a b/a
index 86e041d..d8323da 100644
Binary files a/a and b/a differ
so I'm wondering whether we shouldn't stay closer to that with "--cc
also", e.g.:
aa01ae1 Merge branch 'master' into somebranch
diff --cc a
index 72594ed,86e041d..d8323da
Binary files a/a and b/a differ
BTW: Currently, "--cc --oneline" produces an extra newline before the
diff line, and also note how the diff lines differ ("a/a b/a" vs. "a").
But those are different issues.
> and having a small helper function early in the series would help that
> process.
>
>> + paths = intersect_paths(paths, i, num_parent, DIFF_OPT_TST(opt, ALLOW_TEXTCONV));
>
> As an internal API within this file, I would rather see "opt" as a whole
> passed to intersect_paths(). We may probably want to determine if the
> blob is binary in that function depending on other "opt" fields.
Yep.
Michael
[Resent today, sorry. Couldn't get myself to reboot that box yesterday
after a disk gave up.]
next prev parent reply other threads:[~2011-04-16 10:24 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-11 17:12 textconv not invoked when viewing merge commit Peter Oberndorfer
2011-04-12 9:04 ` Michael J Gruber
2011-04-14 19:09 ` Jeff King
2011-04-14 19:15 ` Jeff King
2011-04-14 19:26 ` Junio C Hamano
2011-04-14 19:28 ` Jeff King
2011-04-14 19:35 ` Michael J Gruber
2011-04-14 19:43 ` Junio C Hamano
2011-04-14 20:06 ` Junio C Hamano
2011-04-14 20:23 ` Jeff King
2011-04-14 21:05 ` Junio C Hamano
2011-04-14 21:30 ` Jeff King
2011-04-15 15:29 ` [PATCH] combine-diff: use textconv for combined diff format Michael J Gruber
2011-04-15 18:34 ` Junio C Hamano
2011-04-16 10:24 ` Michael J Gruber [this message]
2011-04-16 17:19 ` Junio C Hamano
2011-04-16 21:37 ` Jakub Narebski
2011-04-15 23:56 ` Jeff King
2011-04-21 16:08 ` Peter Oberndorfer
2011-04-15 6:54 ` textconv not invoked when viewing merge commit Matthieu Moy
2011-04-15 20:45 ` Junio C Hamano
2011-04-16 1:47 ` Jeff King
2011-04-16 6:10 ` Junio C Hamano
2011-04-16 6:33 ` Jeff King
2011-04-16 16:23 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4DA96E48.3050008@drmicha.warpmail.net \
--to=git@drmicha.warpmail.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.