git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing
@ 2008-12-08  2:57 Jeff King
  2008-12-08  3:55 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2008-12-08  2:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Some history viewers use the diff plumbing to generate diffs
rather than going through the "git diff" porcelain.
Currently, there is no way for them to specify that they
would like to see the text-converted version of the diff.

This patch adds a "--textconv" option to allow such a
plumbing user to allow text conversion.  The user can then
tell the viewer whether or not they would like text
conversion enabled.

While it may be tempting add a configuration option rather
than requiring each plumbing user to be configured to pass
--textconv, that is somewhat dangerous. Text-converted diffs
generally cannot be applied directly, so each plumbing user
should "opt in" to generating such a diff, either by
explicit request of the user or by confirming that their
output will not be fed to patch.

Signed-off-by: Jeff King <peff@peff.net>
---
My ultimate goal is to see these diffs in gitk, which is implemented in
3/3. As a bonus side effect, the --no-textconv option can be used with
"git diff" or "git log" if you really don't want to see them there
(e.g., because you are abusing "git log" to produce a binary patch you
mean to apply).

I know this is not strictly a bugfix and we are in -rc, but:

  1. It is an enhancement to a previously unreleased feature, and
     shouldn't affect anything outside of that.

  2. It affects the scripting interface to textconv, so I would like to
     get it in before textconv is ever released so that it is always the
     "right way" to turn text conversion off or on.

 diff.c |    4 ++++
 diff.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/diff.c b/diff.c
index f644947..e21af3b 100644
--- a/diff.c
+++ b/diff.c
@@ -2477,6 +2477,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_SET(options, ALLOW_EXTERNAL);
 	else if (!strcmp(arg, "--no-ext-diff"))
 		DIFF_OPT_CLR(options, ALLOW_EXTERNAL);
+	else if (!strcmp(arg, "--textconv"))
+		DIFF_OPT_SET(options, ALLOW_TEXTCONV);
+	else if (!strcmp(arg, "--no-textconv"))
+		DIFF_OPT_CLR(options, ALLOW_TEXTCONV);
 	else if (!strcmp(arg, "--ignore-submodules"))
 		DIFF_OPT_SET(options, IGNORE_SUBMODULES);
 
-- 
1.6.1.rc2.285.gc1cf2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing
  2008-12-08  2:57 [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing Jeff King
@ 2008-12-08  3:55 ` Junio C Hamano
  2008-12-08  4:08   ` Junio C Hamano
  2008-12-08  4:59   ` Jeff King
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2008-12-08  3:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> I know this is not strictly a bugfix and we are in -rc, but:
>
>   1. It is an enhancement to a previously unreleased feature, and
>      shouldn't affect anything outside of that.
>
>   2. It affects the scripting interface to textconv, so I would like to
>      get it in before textconv is ever released so that it is always the
>      "right way" to turn text conversion off or on.

I'd agree with #1, especially if you said "doesn't" instead of
"shouldn't".

But I am not 100% sure if the scripting part is "the right way".

If a script wants to take whatever Porcelain users are happy as the
"presentation for human consumption" and pass that through as its own
output to the end user, maybe it is better off reading from Porcelain,
instead of reading from the plumbing (the latter of which requires making
the plumbing output less reliable)?

When we later enhance textconv output from the "diff" Porcelain to benefit
interactive users, it will automatically help the script that passes
through the "diff" output to the end users.

You can certainly argue that this "textconv" feature that is grafted from
Porcelain into plumbing is a special case in that its output is subject to
change any time to help human consumption and we never strive for its
stability as we do for other features in the plumbing to support machine
readability by scripts.  You can propagate the later enhancement of
textconv diff output we'd make for Porcelain to the scripted users that
reads from the plumbing that way.

But then wouldn't it be the same for these scripts that do value the
"presentation meant for human consumption" over "machine readability" to
read from Porcelain?  That would not have to blur the distinction between
the Porcelain and plumbing like the approach you are suggesting here.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing
  2008-12-08  3:55 ` Junio C Hamano
@ 2008-12-08  4:08   ` Junio C Hamano
  2008-12-08  4:59   ` Jeff King
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2008-12-08  4:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> I know this is not strictly a bugfix and we are in -rc, but:
>>
>>   1. It is an enhancement to a previously unreleased feature, and
>>      shouldn't affect anything outside of that.
>>
>>   2. It affects the scripting interface to textconv, so I would like to
>>      get it in before textconv is ever released so that it is always the
>>      "right way" to turn text conversion off or on.
>
> I'd agree with #1, especially if you said "doesn't" instead of
> "shouldn't".
>
> But I am not 100% sure if the scripting part is "the right way".

But I'll apply them anyway.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing
  2008-12-08  3:55 ` Junio C Hamano
  2008-12-08  4:08   ` Junio C Hamano
@ 2008-12-08  4:59   ` Jeff King
  2008-12-08  6:52     ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2008-12-08  4:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, Dec 07, 2008 at 07:55:12PM -0800, Junio C Hamano wrote:

> But I am not 100% sure if the scripting part is "the right way".
> 
> If a script wants to take whatever Porcelain users are happy as the
> "presentation for human consumption" and pass that through as its own
> output to the end user, maybe it is better off reading from Porcelain,
> instead of reading from the plumbing (the latter of which requires making
> the plumbing output less reliable)?

But I don't think having gitk call the porcelain makes sense, either.
That would enable, for example, external diff drivers which might
actually spawn an entirely new window. And I don't think most people
would want that from gitk.

Now, obviously it could call "git diff --no-ext-diff" to opt out of
that. But I think it is backwards to call porcelain but opt out of
features that you don't want. It makes more sense to start with the bare
minimum (i.e., plumbing) and opt in to features that you are OK with.

And yes, you increase the risk that a script which blindly passes
options to plumbing may now be asked to generate a textconv'd diff which
will be useless to the user. But:

  1. that is already a risk with --ext-diff

  2. it is actually a _feature_. It is asking the user to say "is this
     a sane option to pass for my situation or not" instead of forcing
     the script to make that decision.

> When we later enhance textconv output from the "diff" Porcelain to
> benefit interactive users, it will automatically help the script that
> passes through the "diff" output to the end users.

I'm not sure what enhancement you mean. But it is possible that gitk
would not _want_ such an enhancement, and it would then have to opt out
of it.

> You can certainly argue that this "textconv" feature that is grafted from
> Porcelain into plumbing is a special case in that its output is subject to
> change any time to help human consumption and we never strive for its
> stability as we do for other features in the plumbing to support machine
> readability by scripts.  You can propagate the later enhancement of
> textconv diff output we'd make for Porcelain to the scripted users that
> reads from the plumbing that way.

Right. _If_ it's a change that won't upset any plumbing consumers. If it
is, then it needs to be a separate option so that the plumbing consumers
who don't mind the change can start using it.

> But then wouldn't it be the same for these scripts that do value the
> "presentation meant for human consumption" over "machine readability" to
> read from Porcelain?  That would not have to blur the distinction between
> the Porcelain and plumbing like the approach you are suggesting here.

I don't agree that this is blurring the distinction. I don't see
textconv diffs as a "porcelain feature" that has been grafted on to
plumbing. I see it as a core diff feature which can be turned off or on.
Porcelain has it on by default, and plumbing has it off by default. But
you can ask either to change their default settings[1].

-Peff

[1]: You can argue that the current implementation is buggy in this
sense, though, since some porcelains will accept "--textconv" but not
change their behavior. I think blame, for example, will have that
problem, because it accepts diff options but does not necessarily use
them to create diffs. But I think that just means we have a bug and
eventually should handle the case of "git blame --textconv" correctly.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing
  2008-12-08  4:59   ` Jeff King
@ 2008-12-08  6:52     ` Junio C Hamano
  2008-12-08  7:14       ` Jeff King
  2008-12-08  8:27       ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2008-12-08  6:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

>> You can certainly argue that this "textconv" feature that is grafted from
>> Porcelain into plumbing is a special case in that its output is subject to
>> change any time to help human consumption and we never strive for its
>> stability as we do for other features in the plumbing to support machine
>> readability by scripts.  You can propagate the later enhancement of
>> textconv diff output we'd make for Porcelain to the scripted users that
>> reads from the plumbing that way.
>
> Right. _If_ it's a change that won't upset any plumbing consumers. If it
> is, then it needs to be a separate option so that the plumbing consumers
> who don't mind the change can start using it.

In the above argument, you are assuming that you can have a complete
enumeration of "plumbing consumers", so that we can tell what kind of
changes to textconv output do affect them in a negative way, and what kind
of changes are safe.

Yes, you can enumerate in-tree consumers, but that misses the whole point
of having "plumbing", which can be used by any scripts out of tree and
they can rely on stable output from the plumbing.  By giving --textconv to
them to use with the plumbing, you are effecively casting textconv output
in stone.  By definition we will never know what kind of changes to the
output from the textconv filter out of tree consumers would mind.

Currently "diff" machinery uses the output of textconv filter as-is
without any further embellishment, but I think it is too early to tell if
that is really what we would want, or we may need some minimum
postprocessing on what we receive from the textconv filter before feeding
that to the textual diff machinery, because the feature has not been used
in the field at all yet.

In any case, I've applied the series for an entirely different reason.
The patch is the most natural way to allow users of Porcelain to disable
textconv with the --no-textconv option, just like --no-ext-diff can be
used to disable the external diff.

We _might_ want to add another patch to make the plumbing layer ignore (or
error out) --textconv option given from the command line before 1.6.1
ships, and after we gain confidence with the stability of the feature,
revert that patch to allow use of --textconv freely from the plumbing.

It always is easier to introduce a new feature in a more limited form and
then later loosen it, than adding it in an unrestricted form and having
later to limit it for whatever reason.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing
  2008-12-08  6:52     ` Junio C Hamano
@ 2008-12-08  7:14       ` Jeff King
  2008-12-08  8:27       ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff King @ 2008-12-08  7:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, Dec 07, 2008 at 10:52:39PM -0800, Junio C Hamano wrote:

> > Right. _If_ it's a change that won't upset any plumbing consumers. If it
> > is, then it needs to be a separate option so that the plumbing consumers
> > who don't mind the change can start using it.
> 
> In the above argument, you are assuming that you can have a complete
> enumeration of "plumbing consumers", so that we can tell what kind of
> changes to textconv output do affect them in a negative way, and what kind
> of changes are safe.

No, I'm assuming we have an idea of what is a "reasonable" change to the
feature that won't break consumers, and what is a change that will
break them. And it's the same judgement we have to make all the time
when thinking about how changes will impact the scriptable interface.

And furthermore, I'm proposing to take the safest path, which is to
say that the scriptable interface _doesn't_ change unless each consumer
decides to ask it to do so. That is, the output of "git diff-tree" is
unchanged unless the caller explicitly allows textconv. So now we have
given a meaning to "--textconv". If you want a _new_ textconv variant
(let's say you want it to output the textconv patch with some "this
isn't a real patch" munging, and also include the binary patch -- which
would make it both human readable and applicable), then I think it is
safe to say that is an incompatible change. And it gets called
--textconv-with-patch or whatever, and porcelain can move to it by
default. But the plumbing consumers still get the same thing via
--textconv, and they can opt into the new format if that is useful to
them.

So uptake of the new feature is slower (you have to wait for the
maintainer of the script to support it) but you never break
compatibility.

> Yes, you can enumerate in-tree consumers, but that misses the whole point
> of having "plumbing", which can be used by any scripts out of tree and
> they can rely on stable output from the plumbing.  By giving --textconv to
> them to use with the plumbing, you are effecively casting textconv output
> in stone.  By definition we will never know what kind of changes to the
> output from the textconv filter out of tree consumers would mind.

Right. But we _have_ to cast it in stone if we want to make it available
to them at all. So I am proposing to give what exists now a name, and
further enhancements will have to have a different name.

If you want to argue that it is too early to cast in stone, since we
will have to carry around this implementation and the name "--textconv"
forever, then I can accept that. But it is a bit annoying, since I want
to use it in gitk _now_. :)

But I do run 'next' usually, so maybe it is worth pushing it off until
the next release cycle.

> In any case, I've applied the series for an entirely different reason.
> The patch is the most natural way to allow users of Porcelain to disable
> textconv with the --no-textconv option, just like --no-ext-diff can be
> used to disable the external diff.

Yes, I agree that is worthwhile and should have been there since the
beginning (and I think should definitely ship in 1.6.1).

> We _might_ want to add another patch to make the plumbing layer ignore (or
> error out) --textconv option given from the command line before 1.6.1
> ships, and after we gain confidence with the stability of the feature,
> revert that patch to allow use of --textconv freely from the plumbing.

If you want to do that, the simplest thing is to simply take the
"--no-textconv" part of 2/3 and drop the rest. There is not really any
point in supporting --textconv just for porcelain, which already has it
on by default.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing
  2008-12-08  6:52     ` Junio C Hamano
  2008-12-08  7:14       ` Jeff King
@ 2008-12-08  8:27       ` Junio C Hamano
  2008-12-09  5:48         ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2008-12-08  8:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> In any case, I've applied the series for an entirely different reason.
> The patch is the most natural way to allow users of Porcelain to disable
> textconv with the --no-textconv option, just like --no-ext-diff can be
> used to disable the external diff.

I imagine that web developers may want to use a textconv filter that
replaces ">" with "\n>" on the HTML files their designer-colleagues throw
in the source tree to make "git log -p" of the whole project easier to
follow.  When the developers would want to suggest improvements to what
their designer-colleagues did, however, by running "git diff --stat -p" in
their dirty work tree to produce a patch (like I just did just now,
visible from the mnemonic prefixes below), they would want to disable
textconv temporarily to get an appliable patch with --no-textconv option.

Once we have --no-textconv, somebody would inevitably ask about its
positive counterpart, --textconv option.  Even though it might not make
sense from patch applicability viewpoint, the option would allow the end
user to explicitly ask for "git format-patch --textconv" and get a patch
that can only be reviewed by humans but cannot be applied.  Hence the
attached update to the draft release notes to version 1.6.1 [Update #1].

You raised an intriguing possibility to use textconv in blame.  It would
also be useful if we allowed "git show --textconv $blob" to pass the blob
via textconv filter and any other transformation controlled by the
attributes mechanism..  When "git show" sees the above command line, it
only knows the blob object name and not the path, so we may need to allow
a new option to tell the command to pretend as if the content came from a
path, perhaps with a syntax like:

	$ git show --attribute-path=a/b/c $blob
	$ git show --attribute-path=a/b/c --textconv $blob

Note that I envision that the above two commands would produce different
results.  The former would behave as if $blob were recorded at path a/b/c
and what would be checked out (i.e. usual crlf, ident, and smudge that are
in effect for the path are applied) to the standard output.  The latter
would apply the textconv filter that would apply to what is recorded at
the given path and show that.

[Update #1]

 Documentation/RelNotes-1.6.1.txt |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git c/Documentation/RelNotes-1.6.1.txt w/Documentation/RelNotes-1.6.1.txt
index 0405309..09bbcad 100644
--- c/Documentation/RelNotes-1.6.1.txt
+++ w/Documentation/RelNotes-1.6.1.txt
@@ -133,7 +133,8 @@ on.
   contents can be munged into human readable form and the difference
   between the results of the conversion can be viewed (obviously this
   cannot produce a patch that can be applied, so this is disabled in
-  format-patch among other things).
+  format-patch and plumbing, but if you really wanted to, you can enable
+  it by giving them --textconv command line option explicitly).
 
 * "--cached" option to "git diff has an easier to remember synonym "--staged",
   to ask "what is the difference between the given commit and the

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing
  2008-12-08  8:27       ` Junio C Hamano
@ 2008-12-09  5:48         ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2008-12-09  5:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Dec 08, 2008 at 12:27:25AM -0800, Junio C Hamano wrote:

> I imagine that web developers may want to use a textconv filter that
> replaces ">" with "\n>" on the HTML files their designer-colleagues throw
> in the source tree to make "git log -p" of the whole project easier to

Hrm, didn't you argue against textconv'ing non-binary contents earlier
in the cycle? At any rate, I agree that is a reasonable use. Maybe a
better name would have been "normalize" or "canonicalize".

> follow.  When the developers would want to suggest improvements to what
> their designer-colleagues did, however, by running "git diff --stat -p" in
> their dirty work tree to produce a patch (like I just did just now,
> visible from the mnemonic prefixes below), they would want to disable
> textconv temporarily to get an appliable patch with --no-textconv option.

Yep, exactly. I use "git diff >patch" all the time instead of
format-patch.

> You raised an intriguing possibility to use textconv in blame.  It would
> also be useful if we allowed "git show --textconv $blob" to pass the blob
> via textconv filter and any other transformation controlled by the
> attributes mechanism..  When "git show" sees the above command line, it
> only knows the blob object name and not the path, so we may need to allow
> a new option to tell the command to pretend as if the content came from a
> path, perhaps with a syntax like:
> 
> 	$ git show --attribute-path=a/b/c $blob
> 	$ git show --attribute-path=a/b/c --textconv $blob

I think that makes sense. If you are going to implement a "use this as
the attribute path" feature, though, you might also want just "use these
attributes" which I can imagine being simpler in some cases. Like:

  $ git show --attribute diff=whatever $blob

But on the subject of "other places to see textconv", another one I
considered was git log -S. I haven't looked to see if it even checks
binary files at all, but certainly if I was searching for some text
content I would want to find it in the text version (and depending on
the file format, there may be other binary cruft preventing you from
finding it in the binary version at all).

The series I posted was to show the text in gitk diffs. However, it does
nothing for finding text in gitk's "find commits introducing this
string", which I assume just uses git's pickaxe.

Unfortunately, it would probably be painfully slow. But still better
than nothing. One enhancement for textconv that I am considering is to
name the tempfiles based on the blob sha1. Then a smart helper could
cache expensive conversions if it wanted (and dumb helpers would
obviously just ignore the filename).

Side note: For one use case, storing word processor documents, this
approach seems sensible. But for my jpg example, it almost seems that
I could use git more efficiently by having two files: one with the exif
text, and one with the image content. And then I could "build" the
resulting tagged jpgs by combining the two, but operations on the text
would be very efficient. The downside, of course, being that the "build"
step is annoying, and consumes a ton of disk space. But it just occurred
to me as an alternate way to think about the use case.

> @@ -133,7 +133,8 @@ on.
>    contents can be munged into human readable form and the difference
>    between the results of the conversion can be viewed (obviously this
>    cannot produce a patch that can be applied, so this is disabled in
> -  format-patch among other things).
> +  format-patch and plumbing, but if you really wanted to, you can enable
> +  it by giving them --textconv command line option explicitly).

I wonder if we want to word this even more strongly: --textconv is there
and you can play with it if you want, but we are not promising that it
is part of the stable API yet.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-12-09  5:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-08  2:57 [PATCH 2/3] diff: allow turning on textconv explicitly for plumbing Jeff King
2008-12-08  3:55 ` Junio C Hamano
2008-12-08  4:08   ` Junio C Hamano
2008-12-08  4:59   ` Jeff King
2008-12-08  6:52     ` Junio C Hamano
2008-12-08  7:14       ` Jeff King
2008-12-08  8:27       ` Junio C Hamano
2008-12-09  5:48         ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).