git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Giving command line parameter to textconv command?
@ 2009-12-14 22:17 Nanako Shiraishi
  2009-12-14 23:31 ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Nanako Shiraishi @ 2009-12-14 22:17 UTC (permalink / raw)
  To: git

Some text documents in my repository is encoded in EUC-JP 
and I have this line in my .gitattributes file.

	*.txt	diff=eucjp

and these two lines in my .git/config file.

	[diff "eucjp"]
		textconv = nkf-w

And I have ~/bin/nkf-w script that is executable.

	#!/bin/sh
	nkf -w "$@"

The command takes a (Japanese) text file and converts it 
into UTF-8 (it guesses the input encoding).

I need this extra script because setting 'nkf -w' for
textconv like this

	[diff "eucjp"]
		textconv = nkf -w

gives an error.

	% diff --git a/hello.txt b/hello.txt
	index 696acd7..f07aa1a 100644
	error: cannot run nkf -w: No such file or directory
	error: error running textconv command 'nkf -w'
	fatal: unable to read files to diff

Could you fix textconv so that it can be given parameters?

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-14 22:17 Giving command line parameter to textconv command? Nanako Shiraishi
@ 2009-12-14 23:31 ` Junio C Hamano
  2009-12-15  3:11   ` Nanako Shiraishi
                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Junio C Hamano @ 2009-12-14 23:31 UTC (permalink / raw)
  To: Nanako Shiraishi; +Cc: git, Jeff King

Nanako Shiraishi <nanako3@lavabit.com> writes:

> I need this extra script because setting 'nkf -w' for
> textconv like this
>
> 	[diff "eucjp"]
> 		textconv = nkf -w
>
> gives an error.
>
> 	% diff --git a/hello.txt b/hello.txt
> 	index 696acd7..f07aa1a 100644
> 	error: cannot run nkf -w: No such file or directory
> 	error: error running textconv command 'nkf -w'
> 	fatal: unable to read files to diff
>
> Could you fix textconv so that it can be given parameters?

The change to do so looks like this; it has a few side effects:

 - If somebody else were relying on the fact that 'nkf -w' names the
   entire command, it now will run 'nkf' command with '-w' as an argument,
   and it will break such a set-up.  IOW, command that has an IFS white
   space in its path will now need to be quoted from the shell.

   You can see the fallout from this in the damage made to t/ hierarchy in
   the attached patch.

 - You can now use $HOME and other environment variables your shell
   expands when defining your textconv command.

Overall I think it is a good direction to go, but we need to be careful
about how we transition the existing repositories that use the old
semantics.

We might need to introduce diff.*.xtextconv or something.

 diff.c                         |    9 ++++-----
 t/t4030-diff-textconv.sh       |    2 +-
 t/t4031-diff-rewrite-binary.sh |    2 +-
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 08bbd3e..64a1486 100644
--- a/diff.c
+++ b/diff.c
@@ -3760,15 +3760,14 @@ static char *run_textconv(const char *pgm, struct diff_filespec *spec,
 		size_t *outsize)
 {
 	struct diff_tempfile *temp;
-	const char *argv[3];
-	const char **arg = argv;
+	const char *argv[4] = { "sh", "-c", NULL, NULL };
 	struct child_process child;
 	struct strbuf buf = STRBUF_INIT;
+	struct strbuf cmd = STRBUF_INIT;
 
 	temp = prepare_temp_file(spec->path, spec);
-	*arg++ = pgm;
-	*arg++ = temp->name;
-	*arg = NULL;
+	strbuf_addf(&cmd, "%s %s", pgm, temp->name);
+	argv[2] = strbuf_detach(&cmd, NULL);
 
 	memset(&child, 0, sizeof(child));
 	child.argv = argv;
diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh
index a3f0897..3468f77 100755
--- a/t/t4030-diff-textconv.sh
+++ b/t/t4030-diff-textconv.sh
@@ -48,7 +48,7 @@ test_expect_success 'file is considered binary by plumbing' '
 
 test_expect_success 'setup textconv filters' '
 	echo file diff=foo >.gitattributes &&
-	git config diff.foo.textconv "$PWD"/hexdump &&
+	git config diff.foo.textconv \""$PWD"/hexdump\" &&
 	git config diff.fail.textconv false
 '
 
diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh
index a894c60..e6cb30e 100755
--- a/t/t4031-diff-rewrite-binary.sh
+++ b/t/t4031-diff-rewrite-binary.sh
@@ -54,7 +54,7 @@ chmod +x dump
 
 test_expect_success 'setup textconv' '
 	echo file diff=foo >.gitattributes &&
-	git config diff.foo.textconv "$PWD"/dump
+	git config diff.foo.textconv \""$PWD"/dump\"
 '
 
 test_expect_success 'rewrite diff respects textconv' '

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-14 23:31 ` Junio C Hamano
@ 2009-12-15  3:11   ` Nanako Shiraishi
  2009-12-15  5:56     ` Junio C Hamano
  2009-12-15 17:03   ` Jeff King
  2009-12-30  3:13   ` Nanako Shiraishi
  2 siblings, 1 reply; 33+ messages in thread
From: Nanako Shiraishi @ 2009-12-15  3:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Quoting Junio C Hamano <gitster@pobox.com>

> Nanako Shiraishi <nanako3@lavabit.com> writes:
>
>> I need this extra script because setting 'nkf -w' for
>> textconv like this
>>
>> 	[diff "eucjp"]
>> 		textconv = nkf -w
>>
>> gives an error.
>>
>> 	% diff --git a/hello.txt b/hello.txt
>> 	index 696acd7..f07aa1a 100644
>> 	error: cannot run nkf -w: No such file or directory
>> 	error: error running textconv command 'nkf -w'
>> 	fatal: unable to read files to diff
>>
>> Could you fix textconv so that it can be given parameters?
>
> The change to do so looks like this; it has a few side effects:
>
>  - If somebody else were relying on the fact that 'nkf -w' names the
>    entire command, it now will run 'nkf' command with '-w' as an argument,
>    and it will break such a set-up.  IOW, command that has an IFS white
>    space in its path will now need to be quoted from the shell.
>
>    You can see the fallout from this in the damage made to t/ hierarchy in
>    the attached patch.
>
>  - You can now use $HOME and other environment variables your shell
>    expands when defining your textconv command.
>
> Overall I think it is a good direction to go, but we need to be careful
> about how we transition the existing repositories that use the old
> semantics.
>
> We might need to introduce diff.*.xtextconv or something.

I experimented with other variables (eg. smudge and clean) and 
they honor their command line arguments. If textconv is the only 
setting that doesn't, the change may be easier to justify.

By the way, there should be a better description of the filters 
in the gitattributes documentation, similar to how [diff "name"] 
sections in the .git/config file are described.

-- >8 --
Subject: Illustrate "filter" attribute with an example

The example was taken from aa4ed402c9721170fde2e9e43c3825562070e65e
(Add 'filter' attribute and external filter driver definition).

Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
---
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 1f472ce..5a45e51 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -197,6 +197,25 @@ intent is that if someone unsets the filter driver definition,
 or does not have the appropriate filter program, the project
 should still be usable.
 
+For example, in .gitattributes, you would assign the `filter`
+attribute for paths.
+
+------------------------
+*.c	filter=indent
+------------------------
+
+Then you would define a "filter.indent.clean" and "filter.indent.smudge"
+configuration in your .git/config to specify a pair of commands to
+modify the contents of C programs when the source files are checked
+in ("clean" is run) and checked out (no change is made because the
+command is "cat").
+
+------------------------
+[filter "indent"]
+	clean = indent
+	smudge = cat
+------------------------
+
 
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^






-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-15  3:11   ` Nanako Shiraishi
@ 2009-12-15  5:56     ` Junio C Hamano
  2009-12-15 16:49       ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-12-15  5:56 UTC (permalink / raw)
  To: Nanako Shiraishi; +Cc: git, Jeff King

Nanako Shiraishi <nanako3@lavabit.com> writes:

> I experimented with other variables (eg. smudge and clean) and 
> they honor their command line arguments. If textconv is the only 
> setting that doesn't, the change may be easier to justify.

Yes, as you found out, convert.c::apply_filter() is aware of the command
line arguments.

Let's try to do a bit more work to make the coverage complete.  After
scanning "git grep -e start_async -e run_command" output, here is what I
came up with:

 - editor.c::launch_editor() that allows a custom editor named via
   GIT_EDITOR does seem to honor your command line arguments.

 - pager.c::setup_pager() is used for GIT_PAGER and it does honor your
   command line arguments.

 - ll-merge.c::ll_ext_merge() that is used to handle custom merge drivers
   lets the user specify command line via templating to replace %O %A %B
   and naturally it needs to be aware of the command line arguments.

 - diff.c::run_external_diff() that runs GIT_EXTERNAL_DIFF defines that
   the command has to take 7 parameters in a fixed order, and is not
   designed to permute its arguments like ll_ext_merge() does, but these
   days people don't use it directly (they use it indirectly via
   "difftool" wrapper), so it probably is not an issue.

 - merge-index.c::merge_entry() also defines a strict order and semantics
   to its parameters, but similar to GIT_EXTERNAL_DIFF, it is not
   something you would throw a ready-made program (like an editor or an
   pager) and expect it to work, so it wouldn't be an issue either.

Hooks do not even take arbitrary command line arguments, so we don't have
to worry about them.

So it does look like that textconv is the only odd-man out.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-15  5:56     ` Junio C Hamano
@ 2009-12-15 16:49       ` Jeff King
  2009-12-16  1:05         ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-12-15 16:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

On Mon, Dec 14, 2009 at 09:56:28PM -0800, Junio C Hamano wrote:

> Let's try to do a bit more work to make the coverage complete.  After
> scanning "git grep -e start_async -e run_command" output, here is what I
> came up with:
> 
>  - editor.c::launch_editor() that allows a custom editor named via
>    GIT_EDITOR does seem to honor your command line arguments.
> 
>  - pager.c::setup_pager() is used for GIT_PAGER and it does honor your
>    command line arguments.
> 
>  - ll-merge.c::ll_ext_merge() that is used to handle custom merge drivers
>    lets the user specify command line via templating to replace %O %A %B
>    and naturally it needs to be aware of the command line arguments.

I think it is important to note in user-facing documentation (like
release notes describing the proposed textconv change) that these are
not just "honor your command line arguments" but "execute your command
with /bin/sh". Maybe that is obviously how it is implemented, but I
think it should be made clear that you have the power to use (and
responsibility to protect yourself from) arbitrary shell code.

>  - diff.c::run_external_diff() that runs GIT_EXTERNAL_DIFF defines that
>    the command has to take 7 parameters in a fixed order, and is not
>    designed to permute its arguments like ll_ext_merge() does, but these
>    days people don't use it directly (they use it indirectly via
>    "difftool" wrapper), so it probably is not an issue.

There is also diff.*.command, which I think people _do_ set manually (I
used to, until I wrote textconv. :) ). It does not use the shell
currently. I agree that people almost certainly have to write a
shell-script wrapper anyway. But I wonder if we should pass it through
the shell, just for the sake of consistency with the other variables (in
particular, if textconv changes, I think diff.*.command is closely
related).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-14 23:31 ` Junio C Hamano
  2009-12-15  3:11   ` Nanako Shiraishi
@ 2009-12-15 17:03   ` Jeff King
  2009-12-15 17:23     ` Junio C Hamano
  2009-12-30  3:13   ` Nanako Shiraishi
  2 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-12-15 17:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

On Mon, Dec 14, 2009 at 03:31:38PM -0800, Junio C Hamano wrote:

> The change to do so looks like this; it has a few side effects:
> 
>  - If somebody else were relying on the fact that 'nkf -w' names the
>    entire command, it now will run 'nkf' command with '-w' as an argument,
>    and it will break such a set-up.  IOW, command that has an IFS white
>    space in its path will now need to be quoted from the shell.
> 
>    You can see the fallout from this in the damage made to t/ hierarchy in
>    the attached patch.
> 
>  - You can now use $HOME and other environment variables your shell
>    expands when defining your textconv command.
> 
> Overall I think it is a good direction to go, but we need to be careful
> about how we transition the existing repositories that use the old
> semantics.

I agree that this is a good change. My only reservation would be that
spawning a shell would be slightly slower (think "git log -p"). However:

  1. textconv is dog-slow already, as it has to dump each blob to a
     tempfile[1].

  2. There is an obvious optimization, which is that you can skip the
     shell if there are no metacharacters (in fact, we seem to be doing
     that already in launch_editor).

> We might need to introduce diff.*.xtextconv or something.

I am torn on this. I don't like introducing behavior changes that might
hurt people (and really, I think we are just talking about people with
textconv pointing to a program by its full path that has a space in it).
But I also hate carrying around baggage crap like xtextconv forever. It
makes git harder to explain why there are two variables that do almost
the same thing, and the one they want to use is probably the one with
the crappier, unlikely-to-be-guessed name.

So I am somewhat inclined to call it a bug fix, but I don't feel very
strongly.

-Peff

[1] The current textconv interface is really nice for things like just
using "antiword" out of the box. But I wrote a new interface which can
be much faster: it calls the textconv filter with the blob name, and
then the filter is responsible for using cat-file to get at the blob.
This means the filter can look at only part of a blob (e.g., if we are
interested in the metadata tags at the beginning of a large media file),
and it can cache answers as it sees fit, avoiding access to the blob
entirely.

I need to polish the code a bit and submit it. Obviously this is not
meant to replace the existing textconv, but rather to supplement it, for
when "fast and inconvenient" is better than "slow and simple". What is
the best way to configure this? I can imagine "diff.*.textconvType =
fast", or also "diff.*.fastTextconv".

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-15 17:03   ` Jeff King
@ 2009-12-15 17:23     ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2009-12-15 17:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

Jeff King <peff@peff.net> writes:

> [1] The current textconv interface is really nice for things like just
> using "antiword" out of the box. But I wrote a new interface which can
> be much faster: it calls the textconv filter with the blob name, and
> then the filter is responsible for using cat-file to get at the blob.
> This means the filter can look at only part of a blob (e.g., if we are
> interested in the metadata tags at the beginning of a large media file),
> and it can cache answers as it sees fit, avoiding access to the blob
> entirely.
>
> I need to polish the code a bit and submit it. Obviously this is not
> meant to replace the existing textconv, but rather to supplement it, for
> when "fast and inconvenient" is better than "slow and simple". What is
> the best way to configure this? I can imagine "diff.*.textconvType =
> fast", or also "diff.*.fastTextconv".

"diff.*.blobfilter"?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-15 16:49       ` Jeff King
@ 2009-12-16  1:05         ` Junio C Hamano
  2009-12-16  1:13           ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-12-16  1:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Nanako Shiraishi, git

Jeff King <peff@peff.net> writes:

>>  - diff.c::run_external_diff() that runs GIT_EXTERNAL_DIFF defines that
>>    the command has to take 7 parameters in a fixed order, and is not
>>    designed to permute its arguments like ll_ext_merge() does, but these
>>    days people don't use it directly (they use it indirectly via
>>    "difftool" wrapper), so it probably is not an issue.
>
> There is also diff.*.command, which I think people _do_ set manually (I
> used to, until I wrote textconv. :) ).

I had to spend fair amount of time to find where "diff.*.command" is
described.  We may want to update the documentation.

> .... I agree that people almost certainly have to write a shell-script
> wrapper anyway. But I wonder if we should pass it through the shell,
> just for the sake of consistency with the other variables (in
> particular, if textconv changes,

This covers GIT_EXTERNAL_DIFF, diff.external, and diff.<driver>.command
trio, if I am not mistaken.

If we changed run_external_diff(), in practice nobody would notice, except
for people who have installed the difftool helper in a directory with IFS
in the path.  That's one downside but I don't offhand see a use case where
the change would make somebody vastly happier.

But maybe people will find good uses and we'll never know until we try.
Care to roll a patch for that as well, to be queued for 1.7.0 (which will
be the one after 1.6.6)?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-16  1:05         ` Junio C Hamano
@ 2009-12-16  1:13           ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2009-12-16  1:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

On Tue, Dec 15, 2009 at 05:05:52PM -0800, Junio C Hamano wrote:

> > There is also diff.*.command, which I think people _do_ set manually (I
> > used to, until I wrote textconv. :) ).
> 
> I had to spend fair amount of time to find where "diff.*.command" is
> described.  We may want to update the documentation.

Yeah, I think textconv is similarly hard to find. We should probably
have a pointer in "git-diff.txt" to the attributes documentation.

I also think it would be much more obvious as "diff.*.external", but it
is probably not worth changing at this point.

> > .... I agree that people almost certainly have to write a shell-script
> > wrapper anyway. But I wonder if we should pass it through the shell,
> > just for the sake of consistency with the other variables (in
> > particular, if textconv changes,
> 
> This covers GIT_EXTERNAL_DIFF, diff.external, and diff.<driver>.command
> trio, if I am not mistaken.

It does cover all three. We could do them separately, I guess, but I
think that is just making things confusingly more inconsistent.

> If we changed run_external_diff(), in practice nobody would notice, except
> for people who have installed the difftool helper in a directory with IFS
> in the path.  That's one downside but I don't offhand see a use case where
> the change would make somebody vastly happier.

Yeah, the only upside I can see is consistency. Which I do value, but it
will be a hard sell to somebody whose setup has been broken. ;P

> But maybe people will find good uses and we'll never know until we try.
> Care to roll a patch for that as well, to be queued for 1.7.0 (which will
> be the one after 1.6.6)?

Will do, but it will probably be a few days as I'm about to do some
holiday traveling. I'll do the textconv change with it as a series.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-14 23:31 ` Junio C Hamano
  2009-12-15  3:11   ` Nanako Shiraishi
  2009-12-15 17:03   ` Jeff King
@ 2009-12-30  3:13   ` Nanako Shiraishi
  2009-12-30  8:05     ` Junio C Hamano
  2 siblings, 1 reply; 33+ messages in thread
From: Nanako Shiraishi @ 2009-12-30  3:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Junio, could you tell us what happened to this thread?

I use this patch myself, and it works really well.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-30  3:13   ` Nanako Shiraishi
@ 2009-12-30  8:05     ` Junio C Hamano
  2009-12-30  9:56       ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-12-30  8:05 UTC (permalink / raw)
  To: Nanako Shiraishi; +Cc: git, Jeff King

Nanako Shiraishi <nanako3@lavabit.com> writes:

> Junio, could you tell us what happened to this thread?
>
> I use this patch myself, and it works really well.

I stopped promoting the patch after Peff mentioned he was planning a
rework of textconv, but now I re-read it, I think his rework would be
orthogonal to the patch.

Also Peff gives a good hint about borrowing how launch_editor() calls out
to the shell.  I think the codepath we fork&exec user-defined commands
(not hooks, but filters like smudge/clean/textconv and EDITOR/PAGER that
take a command line) need to be first enumerated, then we need to see if
we can refactor what launch_editor() does into a common helper function.
I felt it was unclear what we would want to do with GIT_EXTERNAL_DIFF,
diff.external, and diff.<driver>.command trio, but I tend to agree that we
should run things the same way, honoring $IFS and shell everywhere.

http://thread.gmane.org/gmane.comp.version-control.git/135250/focus=135312

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Giving command line parameter to textconv command?
  2009-12-30  8:05     ` Junio C Hamano
@ 2009-12-30  9:56       ` Jeff King
  2009-12-30 10:53         ` [PATCH 1/6] run-command: add "use shell" option Jeff King
                           ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Jeff King @ 2009-12-30  9:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

On Wed, Dec 30, 2009 at 12:05:20AM -0800, Junio C Hamano wrote:

> I stopped promoting the patch after Peff mentioned he was planning a
> rework of textconv, but now I re-read it, I think his rework would be
> orthogonal to the patch.

Actually, I thought I volunteered to produce a patch series converting
textconv and diff.

> Also Peff gives a good hint about borrowing how launch_editor() calls out
> to the shell.  I think the codepath we fork&exec user-defined commands
> (not hooks, but filters like smudge/clean/textconv and EDITOR/PAGER that
> take a command line) need to be first enumerated, then we need to see if
> we can refactor what launch_editor() does into a common helper function.

I think the helper function is not too hard. Whereas the patch you
posted earlier actually runs { "sh", "-c", "%s %s" } for textconv, it is
a bit simpler to do { "sh", "-c", "%s \"$@\"", ... } as launch_editor
does. And then we don't have to worry about quoting the arguments (which
your patch gets wrong).

> I felt it was unclear what we would want to do with GIT_EXTERNAL_DIFF,
> diff.external, and diff.<driver>.command trio, but I tend to agree that we
> should run things the same way, honoring $IFS and shell everywhere.

I thought it was left at "maybe for 1.7.0", but perhaps we don't have
time now to issue sufficient warning.

Anyway, I'm working on a series which will hopefully be done in a few
minutes.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/6] run-command: add "use shell" option
  2009-12-30  9:56       ` Jeff King
@ 2009-12-30 10:53         ` Jeff King
  2009-12-30 13:55           ` Erik Faye-Lund
  2010-01-01 22:12           ` Johannes Sixt
  2009-12-30 10:53         ` [PATCH 2/6] run-command: convert simple callsites to use_shell Jeff King
                           ` (4 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Jeff King @ 2009-12-30 10:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

Many callsites run "sh -c $CMD" to run $CMD. We can make it
a little simpler for them by factoring out the munging of
argv.

For simple cases with no arguments, this doesn't help much, but:

  1. For cases with arguments, we save the caller from
     having to build the appropriate shell snippet.

  2. We can later optimize to avoid the shell when
     there are no metacharacters in the program.

Signed-off-by: Jeff King <peff@peff.net>
---
I made the matching tweak to the Windows half of run-command, but I
don't actually have a box to test it on.

I modeled this after execv_git_cmd. Like that function, I try to release
the allocated argv on error. However, we do actually leak the strbuf
memory in one case. I'm not sure how much we care. On unix, this will
always happen in a forked process which will either exec or die. On
Windows, we seem to already be leaking the prepared argv for the git_cmd
case (and now we leak the shell_cmd case, too).

 run-command.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 run-command.h |    2 ++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/run-command.c b/run-command.c
index cf2d8f7..dc65903 100644
--- a/run-command.c
+++ b/run-command.c
@@ -15,6 +15,46 @@ static inline void dup_devnull(int to)
 	close(fd);
 }
 
+static const char **prepare_shell_cmd(const char **argv)
+{
+	int argc, nargc = 0;
+	const char **nargv;
+
+	for (argc = 0; argv[argc]; argc++)
+		; /* just counting */
+	/* +1 for NULL, +3 for "sh -c" plus extra $0 */
+	nargv = xmalloc(sizeof(*nargv) * (argc + 1 + 3));
+
+	if (argc < 1)
+		die("BUG: shell command is empty");
+
+	nargv[nargc++] = "sh";
+	nargv[nargc++] = "-c";
+
+	if (argc < 2)
+		nargv[nargc++] = argv[0];
+	else {
+		struct strbuf arg0 = STRBUF_INIT;
+		strbuf_addf(&arg0, "%s \"$@\"", argv[0]);
+		nargv[nargc++] = strbuf_detach(&arg0, NULL);
+	}
+
+	for (argc = 0; argv[argc]; argc++)
+		nargv[nargc++] = argv[argc];
+	nargv[nargc] = NULL;
+
+	return nargv;
+}
+
+static int execv_shell_cmd(const char **argv)
+{
+	const char **nargv = prepare_shell_cmd(argv);
+	trace_argv_printf(nargv, "trace: exec:");
+	execvp(nargv[0], (char **)nargv);
+	free(nargv);
+	return -1;
+}
+
 int start_command(struct child_process *cmd)
 {
 	int need_in, need_out, need_err;
@@ -123,6 +163,8 @@ fail_pipe:
 			cmd->preexec_cb();
 		if (cmd->git_cmd) {
 			execv_git_cmd(cmd->argv);
+		} else if (cmd->use_shell) {
+			execv_shell_cmd(cmd->argv);
 		} else {
 			execvp(cmd->argv[0], (char *const*) cmd->argv);
 		}
@@ -179,6 +221,8 @@ fail_pipe:
 
 	if (cmd->git_cmd) {
 		cmd->argv = prepare_git_cmd(cmd->argv);
+	} else if (cmd->use_shell) {
+		cmd->argv = prepare_shell_cmd(cmd->argv);
 	}
 
 	cmd->pid = mingw_spawnvpe(cmd->argv[0], cmd->argv, env);
@@ -297,6 +341,7 @@ static void prepare_run_command_v_opt(struct child_process *cmd,
 	cmd->git_cmd = opt & RUN_GIT_CMD ? 1 : 0;
 	cmd->stdout_to_stderr = opt & RUN_COMMAND_STDOUT_TO_STDERR ? 1 : 0;
 	cmd->silent_exec_failure = opt & RUN_SILENT_EXEC_FAILURE ? 1 : 0;
+	cmd->use_shell = opt & RUN_USING_SHELL ? 1 : 0;
 }
 
 int run_command_v_opt(const char **argv, int opt)
diff --git a/run-command.h b/run-command.h
index fb34209..967ba8c 100644
--- a/run-command.h
+++ b/run-command.h
@@ -33,6 +33,7 @@ struct child_process {
 	unsigned git_cmd:1; /* if this is to be git sub-command */
 	unsigned silent_exec_failure:1;
 	unsigned stdout_to_stderr:1;
+	unsigned use_shell:1;
 	void (*preexec_cb)(void);
 };
 
@@ -46,6 +47,7 @@ extern int run_hook(const char *index_file, const char *name, ...);
 #define RUN_GIT_CMD	     2	/*If this is to be git sub-command */
 #define RUN_COMMAND_STDOUT_TO_STDERR 4
 #define RUN_SILENT_EXEC_FAILURE 8
+#define RUN_USING_SHELL 16
 int run_command_v_opt(const char **argv, int opt);
 
 /*
-- 
1.6.6.65.g050d2.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/6] run-command: convert simple callsites to use_shell
  2009-12-30  9:56       ` Jeff King
  2009-12-30 10:53         ` [PATCH 1/6] run-command: add "use shell" option Jeff King
@ 2009-12-30 10:53         ` Jeff King
  2009-12-30 10:55         ` [PATCH 3/6] run-command: optimize out useless shell calls Jeff King
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2009-12-30 10:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

Now that we have the use_shell feature, these callsites can
all be converted with small changes.

Signed-off-by: Jeff King <peff@peff.net>
---
These should be non-controversial.

 convert.c   |    3 ++-
 imap-send.c |    8 ++------
 ll-merge.c  |    6 +++---
 pager.c     |    5 +++--
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index 491e714..950b1f9 100644
--- a/convert.c
+++ b/convert.c
@@ -249,10 +249,11 @@ static int filter_buffer(int fd, void *data)
 	struct child_process child_process;
 	struct filter_params *params = (struct filter_params *)data;
 	int write_err, status;
-	const char *argv[] = { "sh", "-c", params->cmd, NULL };
+	const char *argv[] = { params->cmd, NULL };
 
 	memset(&child_process, 0, sizeof(child_process));
 	child_process.argv = argv;
+	child_process.use_shell = 1;
 	child_process.in = -1;
 	child_process.out = fd;
 
diff --git a/imap-send.c b/imap-send.c
index de8114b..51f371b 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -965,17 +965,13 @@ static struct store *imap_open_store(struct imap_server_conf *srvc)
 	/* open connection to IMAP server */
 
 	if (srvc->tunnel) {
-		const char *argv[4];
+		const char *argv[] = { srvc->tunnel, NULL };
 		struct child_process tunnel = {0};
 
 		imap_info("Starting tunnel '%s'... ", srvc->tunnel);
 
-		argv[0] = "sh";
-		argv[1] = "-c";
-		argv[2] = srvc->tunnel;
-		argv[3] = NULL;
-
 		tunnel.argv = argv;
+		tunnel.use_shell = 1;
 		tunnel.in = -1;
 		tunnel.out = -1;
 		if (start_command(&tunnel))
diff --git a/ll-merge.c b/ll-merge.c
index 2d6b6d6..18511e2 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -175,7 +175,7 @@ static int ll_ext_merge(const struct ll_merge_driver *fn,
 		{ "B", temp[2] },
 		{ NULL }
 	};
-	const char *args[] = { "sh", "-c", NULL, NULL };
+	const char *args[] = { NULL, NULL };
 	int status, fd, i;
 	struct stat st;
 
@@ -190,8 +190,8 @@ static int ll_ext_merge(const struct ll_merge_driver *fn,
 
 	strbuf_expand(&cmd, fn->cmdline, strbuf_expand_dict_cb, &dict);
 
-	args[2] = cmd.buf;
-	status = run_command_v_opt(args, 0);
+	args[0] = cmd.buf;
+	status = run_command_v_opt(args, RUN_USING_SHELL);
 	fd = open(temp[1], O_RDONLY);
 	if (fd < 0)
 		goto bad;
diff --git a/pager.c b/pager.c
index 92c03f6..2c7e8ec 100644
--- a/pager.c
+++ b/pager.c
@@ -28,7 +28,7 @@ static void pager_preexec(void)
 }
 #endif
 
-static const char *pager_argv[] = { "sh", "-c", NULL, NULL };
+static const char *pager_argv[] = { NULL, NULL };
 static struct child_process pager_process;
 
 static void wait_for_pager(void)
@@ -81,7 +81,8 @@ void setup_pager(void)
 	spawned_pager = 1; /* means we are emitting to terminal */
 
 	/* spawn the pager */
-	pager_argv[2] = pager;
+	pager_argv[0] = pager;
+	pager_process.use_shell = 1;
 	pager_process.argv = pager_argv;
 	pager_process.in = -1;
 	if (!getenv("LESS")) {
-- 
1.6.6.65.g050d2.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/6] run-command: optimize out useless shell calls
  2009-12-30  9:56       ` Jeff King
  2009-12-30 10:53         ` [PATCH 1/6] run-command: add "use shell" option Jeff King
  2009-12-30 10:53         ` [PATCH 2/6] run-command: convert simple callsites to use_shell Jeff King
@ 2009-12-30 10:55         ` Jeff King
  2009-12-31 16:54           ` Johannes Sixt
  2009-12-30 10:56         ` [PATCH 4/6] editor: use run_command's shell feature Jeff King
                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-12-30 10:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

If there are no metacharacters in the program to be run, we
can just skip running the shell entirely and directly exec
the program.

The metacharacter test is pulled verbatim from
launch_editor, which already implements this optimization.

Signed-off-by: Jeff King <peff@peff.net>
---
Something inside me feels wrong with a catch-known-metacharacter test
instead of an allow-known-good check. But this is the same test we have
been using with launch_editor for some time, so I decided not to mess
with it.

 run-command.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/run-command.c b/run-command.c
index dc65903..22e2777 100644
--- a/run-command.c
+++ b/run-command.c
@@ -28,15 +28,17 @@ static const char **prepare_shell_cmd(const char **argv)
 	if (argc < 1)
 		die("BUG: shell command is empty");
 
-	nargv[nargc++] = "sh";
-	nargv[nargc++] = "-c";
-
-	if (argc < 2)
-		nargv[nargc++] = argv[0];
-	else {
-		struct strbuf arg0 = STRBUF_INIT;
-		strbuf_addf(&arg0, "%s \"$@\"", argv[0]);
-		nargv[nargc++] = strbuf_detach(&arg0, NULL);
+	if (strcspn(argv[0], "|&;<>()$`\\\"' \t\n*?[#~=%") != strlen(argv[0])) {
+		nargv[nargc++] = "sh";
+		nargv[nargc++] = "-c";
+
+		if (argc < 2)
+			nargv[nargc++] = argv[0];
+		else {
+			struct strbuf arg0 = STRBUF_INIT;
+			strbuf_addf(&arg0, "%s \"$@\"", argv[0]);
+			nargv[nargc++] = strbuf_detach(&arg0, NULL);
+		}
 	}
 
 	for (argc = 0; argv[argc]; argc++)
-- 
1.6.6.65.g050d2.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/6] editor: use run_command's shell feature
  2009-12-30  9:56       ` Jeff King
                           ` (2 preceding siblings ...)
  2009-12-30 10:55         ` [PATCH 3/6] run-command: optimize out useless shell calls Jeff King
@ 2009-12-30 10:56         ` Jeff King
  2009-12-30 11:01         ` [PATCH 5/6] textconv: use shell to run helper Jeff King
  2009-12-30 11:03         ` [PATCH 6/6] diff: run external diff helper with shell Jeff King
  5 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2009-12-30 10:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

Now that run_command implements the same code in a more
general form, we can make use of it.

Signed-off-by: Jeff King <peff@peff.net>
---
Should also be non-controversial, and the diffstat shows that this is
the payoff for all of the added code earlier in the series. :)

 editor.c |   21 ++-------------------
 1 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/editor.c b/editor.c
index 615f575..d834003 100644
--- a/editor.c
+++ b/editor.c
@@ -36,26 +36,9 @@ int launch_editor(const char *path, struct strbuf *buffer, const char *const *en
 		return error("Terminal is dumb, but EDITOR unset");
 
 	if (strcmp(editor, ":")) {
-		size_t len = strlen(editor);
-		int i = 0;
-		int failed;
-		const char *args[6];
-		struct strbuf arg0 = STRBUF_INIT;
+		const char *args[] = { editor, path, NULL };
 
-		if (strcspn(editor, "|&;<>()$`\\\"' \t\n*?[#~=%") != len) {
-			/* there are specials */
-			strbuf_addf(&arg0, "%s \"$@\"", editor);
-			args[i++] = "sh";
-			args[i++] = "-c";
-			args[i++] = arg0.buf;
-		}
-		args[i++] = editor;
-		args[i++] = path;
-		args[i] = NULL;
-
-		failed = run_command_v_opt_cd_env(args, 0, NULL, env);
-		strbuf_release(&arg0);
-		if (failed)
+		if (run_command_v_opt_cd_env(args, RUN_USING_SHELL, NULL, env))
 			return error("There was a problem with the editor '%s'.",
 					editor);
 	}
-- 
1.6.6.65.g050d2.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/6] textconv: use shell to run helper
  2009-12-30  9:56       ` Jeff King
                           ` (3 preceding siblings ...)
  2009-12-30 10:56         ` [PATCH 4/6] editor: use run_command's shell feature Jeff King
@ 2009-12-30 11:01         ` Jeff King
  2009-12-30 11:03         ` [PATCH 6/6] diff: run external diff helper with shell Jeff King
  5 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2009-12-30 11:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

Currently textconv helpers are run directly. Running through
the shell is useful because the user can provide a program
with command line arguments, like "antiword -f".

It also makes textconv more consistent with other parts of
git, most of which run their helpers using the shell.

The downside is that textconv helpers with shell
metacharacters (like space) in the filename will be broken.

Signed-off-by: Jeff King <peff@peff.net>
---
This is the first actual change in behavior. This patch and 6/6 should
perhaps be held back with a warning period. I am inclined to consider it
a bug that they didn't use the shell, and this is fixing it.

 diff.c                         |    1 +
 t/t4030-diff-textconv.sh       |    2 +-
 t/t4031-diff-rewrite-binary.sh |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index d14a575..2794238 100644
--- a/diff.c
+++ b/diff.c
@@ -3818,6 +3818,7 @@ static char *run_textconv(const char *pgm, struct diff_filespec *spec,
 	*arg = NULL;
 
 	memset(&child, 0, sizeof(child));
+	child.use_shell = 1;
 	child.argv = argv;
 	child.out = -1;
 	if (start_command(&child) != 0 ||
diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh
index a3f0897..c16d538 100755
--- a/t/t4030-diff-textconv.sh
+++ b/t/t4030-diff-textconv.sh
@@ -48,7 +48,7 @@ test_expect_success 'file is considered binary by plumbing' '
 
 test_expect_success 'setup textconv filters' '
 	echo file diff=foo >.gitattributes &&
-	git config diff.foo.textconv "$PWD"/hexdump &&
+	git config diff.foo.textconv "\"$PWD\""/hexdump &&
 	git config diff.fail.textconv false
 '
 
diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh
index a894c60..27fb31b 100755
--- a/t/t4031-diff-rewrite-binary.sh
+++ b/t/t4031-diff-rewrite-binary.sh
@@ -54,7 +54,7 @@ chmod +x dump
 
 test_expect_success 'setup textconv' '
 	echo file diff=foo >.gitattributes &&
-	git config diff.foo.textconv "$PWD"/dump
+	git config diff.foo.textconv "\"$PWD\""/dump
 '
 
 test_expect_success 'rewrite diff respects textconv' '
-- 
1.6.6.65.g050d2.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/6] diff: run external diff helper with shell
  2009-12-30  9:56       ` Jeff King
                           ` (4 preceding siblings ...)
  2009-12-30 11:01         ` [PATCH 5/6] textconv: use shell to run helper Jeff King
@ 2009-12-30 11:03         ` Jeff King
  2010-01-01 22:14           ` [PATCH 7/6] t0021: use $SHELL_PATH for the filter script Johannes Sixt
  5 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-12-30 11:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git

This is mostly to make it more consistent with the rest of
git, which uses the shell to exec helpers.

Signed-off-by: Jeff King <peff@peff.net>
---
Consistency is the main advantage here. Though I think you can actually
do craziness like setting

  GIT_EXTERNAL_DIFF='my-command $2 $3 && true' git diff

to pick out some subset of the parameters, I can't imagine it is
actually a good idea.

 diff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diff.c b/diff.c
index 2794238..9bb40bc 100644
--- a/diff.c
+++ b/diff.c
@@ -2294,7 +2294,7 @@ static void run_external_diff(const char *pgm,
 	}
 	*arg = NULL;
 	fflush(NULL);
-	retval = run_command_v_opt(spawn_arg, 0);
+	retval = run_command_v_opt(spawn_arg, RUN_USING_SHELL);
 	remove_tempfile();
 	if (retval) {
 		fprintf(stderr, "external diff died, stopping at %s.\n", name);
-- 
1.6.6.65.g050d2.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] run-command: add "use shell" option
  2009-12-30 10:53         ` [PATCH 1/6] run-command: add "use shell" option Jeff King
@ 2009-12-30 13:55           ` Erik Faye-Lund
  2010-01-01 22:12           ` Johannes Sixt
  1 sibling, 0 replies; 33+ messages in thread
From: Erik Faye-Lund @ 2009-12-30 13:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

On Wed, Dec 30, 2009 at 11:53 AM, Jeff King <peff@peff.net> wrote:
> Many callsites run "sh -c $CMD" to run $CMD. We can make it
> a little simpler for them by factoring out the munging of
> argv.
>
> For simple cases with no arguments, this doesn't help much, but:
>
>  1. For cases with arguments, we save the caller from
>     having to build the appropriate shell snippet.
>
>  2. We can later optimize to avoid the shell when
>     there are no metacharacters in the program.
>

Nice. This could be helpful if we ever decide not to depend on a
sh-compatible shell on Windows (since one isn't included by default no
Windows, and installing msys/cygwin has some compatibility-issues with
previous installations).

-- 
Erik "kusma" Faye-Lund

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] run-command: optimize out useless shell calls
  2009-12-30 10:55         ` [PATCH 3/6] run-command: optimize out useless shell calls Jeff King
@ 2009-12-31 16:54           ` Johannes Sixt
  2009-12-31 19:47             ` Junio C Hamano
  2009-12-31 21:41             ` Jeff King
  0 siblings, 2 replies; 33+ messages in thread
From: Johannes Sixt @ 2009-12-31 16:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

Jeff King schrieb:
> If there are no metacharacters in the program to be run, we
> can just skip running the shell entirely and directly exec
> the program.
> 
> The metacharacter test is pulled verbatim from
> launch_editor, which already implements this optimization.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Something inside me feels wrong with a catch-known-metacharacter test
> instead of an allow-known-good check. But this is the same test we have
> been using with launch_editor for some time, so I decided not to mess
> with it.

The git version that msysgit ships (and the one that I use on Windows) has 
this sequence in pager.c:

static const char *pager_argv[] = { "sh", "-c", NULL, NULL };
...
	pager_process.argv = &pager_argv[2];
	pager_process.in = -1;
	if (start_command(&pager_process)) {
		pager_process.argv = pager_argv;
		pager_process.in = -1;
		if (start_command(&pager_process))
			return;
	}

to help people set

  PAGER=C:\Program Files\cygwin\bin\less

That is, we first try to run the program without the shell, then retry 
wrapped in sh -c.

Wouldn't it be possible to do the same here, assuming that we don't have 
programs such as "editor -f" in the path?

It does assume that we are able to detect execvp failure due to ENOENT 
which is currently proposed elsewhere by Ilari Liusvaara (and which is 
already possible on Windows).

-- Hannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] run-command: optimize out useless shell calls
  2009-12-31 16:54           ` Johannes Sixt
@ 2009-12-31 19:47             ` Junio C Hamano
  2009-12-31 21:41               ` Johannes Sixt
  2009-12-31 21:41             ` Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-12-31 19:47 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, Nanako Shiraishi, git

Johannes Sixt <j6t@kdbg.org> writes:

> to help people set
>
>  PAGER=C:\Program Files\cygwin\bin\less
>
> That is, we first try to run the program without the shell, then retry
> wrapped in sh -c.
>
> Wouldn't it be possible to do the same here, assuming that we don't
> have programs such as "editor -f" in the path?

It is a cute idea that covers 70-80% of the cases, as you also have to
assume that you don't have to specify your own pager on a path with IFS
(e.g. "Program files" in your example) and give your parameter to the
pager at the same time, e.g.

    PAGER="C:\Program Files\cygwin\bin\less -FRSX"

Because it has its own LESS environment to set FRSX and you can get away
with:

    PAGER="C:\Program Files\cygwin\bin\less"
    LESS=FRSX

"less" is not a representative example for this issue.  In real life I
suspect that custom programs that we don't ship with git (or you don't
ship with msysgit) would lack such a workaround, (and that is why I didn't
say "98% of the cases").

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] run-command: optimize out useless shell calls
  2009-12-31 16:54           ` Johannes Sixt
  2009-12-31 19:47             ` Junio C Hamano
@ 2009-12-31 21:41             ` Jeff King
  2009-12-31 22:16               ` Johannes Sixt
  1 sibling, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-12-31 21:41 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Nanako Shiraishi, git

On Thu, Dec 31, 2009 at 05:54:37PM +0100, Johannes Sixt wrote:

> The git version that msysgit ships (and the one that I use on
> Windows) has this sequence in pager.c:
> 
> static const char *pager_argv[] = { "sh", "-c", NULL, NULL };
> ...
> 	pager_process.argv = &pager_argv[2];
> 	pager_process.in = -1;
> 	if (start_command(&pager_process)) {
> 		pager_process.argv = pager_argv;
> 		pager_process.in = -1;
> 		if (start_command(&pager_process))
> 			return;
> 	}

As a side note to what you are saying, my patch 2/6 will break this. The
"new" way to do it would be:

  /* do not set pager_argv[2] specially */
  pager_process.in = -1;
  if (start_command(&pager_process)) {
          pager_process.use_shell = 1;
          pager_process.in = -1;
          if (start_command(&pager_process))
                  return;
  }

though I am happy to also just revert the pager.c changes and leave it
to handle the shell itself.

But of course if we use your trick internally in run-command, then your
pager-specific change can just go away.

> to help people set
> 
>  PAGER=C:\Program Files\cygwin\bin\less
> 
> That is, we first try to run the program without the shell, then
> retry wrapped in sh -c.
> 
> Wouldn't it be possible to do the same here, assuming that we don't
> have programs such as "editor -f" in the path?

Hmm. That is somewhat clever. And it would actually solve the
backward-compatibility problem for helpers moving to shell execution. If
you have a textconv of "/path with space/foo", it will continue to
work transparently, but now "/path_without_space/foo --option" will also
Just Work.

The two downsides I see are:

  1. You want to run "foo" with "-bar" in your path but you have "foo
     -bar" in your path (your "editor -f" example). This just seems
     insane to me.

  2. There is a little bit of an interface inconsistency. You can do
     PAGER="/path with space/foo", but once you want to add an option,
     PAGER="/path with space/foo -bar" does not do what you mean. You
     have to understand the magic that is going on, and that you now
     need to quote the first part.

     I'm not sure that is not a feature, though. You are paying that
     cost in the transition, but getting the DWYM magic for the common
     case.

> It does assume that we are able to detect execvp failure due to
> ENOENT which is currently proposed elsewhere by Ilari Liusvaara (and
> which is already possible on Windows).

We could also simply do the path lookup ourselves, decide whether to use
the shell, and then exec. Path lookup is not that hard, and I think we
already have mingw compat routines for it anyway.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] run-command: optimize out useless shell calls
  2009-12-31 19:47             ` Junio C Hamano
@ 2009-12-31 21:41               ` Johannes Sixt
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Sixt @ 2009-12-31 21:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Nanako Shiraishi, git

On Donnerstag, 31. Dezember 2009, Junio C Hamano wrote:
> It is a cute idea that covers 70-80% of the cases, as you also have to
> assume that you don't have to specify your own pager on a path with IFS
> (e.g. "Program files" in your example) and give your parameter to the
> pager at the same time, e.g.
>
>     PAGER="C:\Program Files\cygwin\bin\less -FRSX"
>
> Because it has its own LESS environment to set FRSX and you can get away
> with:
>
>     PAGER="C:\Program Files\cygwin\bin\less"
>     LESS=FRSX
>
> "less" is not a representative example for this issue.  In real life I
> suspect that custom programs that we don't ship with git (or you don't
> ship with msysgit) would lack such a workaround, (and that is why I didn't
> say "98% of the cases").

OTOH, once you see that you would have to set

	PAGER: C:\Program Files\cygwin\bin\less -FRSX

(I'm not using shell syntax here; think of a dialog that has name and value in 
separate edit boxes) then it is rather obvious that this cannot work. If you 
are clever (and you probably are - after all, you are modifying something 
esoteric: the environment!), then you will have heard about the magic 
double-quotes, and you will write this as

	PAGER: "C:\Program Files\cygwin\bin\less" -FRSX

instead, and it will work as intended.

Granted, "less" is not representative.

	GIT_EDITOR: "C:\Program Files\Notepad++\notepad++" -multiInst

is probably more realistic (but I didn't test it).

-- Hannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] run-command: optimize out useless shell calls
  2009-12-31 21:41             ` Jeff King
@ 2009-12-31 22:16               ` Johannes Sixt
  2010-01-01  4:50                 ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Sixt @ 2009-12-31 22:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

On Donnerstag, 31. Dezember 2009, Jeff King wrote:
> But of course if we use your trick internally in run-command, then your
> pager-specific change can just go away.

This is what I had in mind.

> > It does assume that we are able to detect execvp failure due to
> > ENOENT which is currently proposed elsewhere by Ilari Liusvaara (and
> > which is already possible on Windows).
>
> We could also simply do the path lookup ourselves, decide whether to use
> the shell, and then exec.

I tried to convince Ilari that this is the way to go, but...

-- Hannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] run-command: optimize out useless shell calls
  2009-12-31 22:16               ` Johannes Sixt
@ 2010-01-01  4:50                 ` Jeff King
  2010-01-01 10:08                   ` Johannes Sixt
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2010-01-01  4:50 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Nanako Shiraishi, git

On Thu, Dec 31, 2009 at 11:16:47PM +0100, Johannes Sixt wrote:

> > > It does assume that we are able to detect execvp failure due to
> > > ENOENT which is currently proposed elsewhere by Ilari Liusvaara (and
> > > which is already possible on Windows).
> >
> > We could also simply do the path lookup ourselves, decide whether to use
> > the shell, and then exec.
> 
> I tried to convince Ilari that this is the way to go, but...

How should we proceed, then? The "DWIM with spaces" magic seems like
something that can come later, so I am tempted to recommend taking my
series now, fixing up msysgit as mentioned earlier (or just dropping the
pager.c portion of my 2/6), and then implementing DWIM once Ilari's
topic matures.

We might want to hold my 5/6 and 6/6 back from master until the DWIM
(which would make both totally safe, I think).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] run-command: optimize out useless shell calls
  2010-01-01  4:50                 ` Jeff King
@ 2010-01-01 10:08                   ` Johannes Sixt
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Sixt @ 2010-01-01 10:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

Jeff King schrieb:
> How should we proceed, then? The "DWIM with spaces" magic seems like
> something that can come later, so I am tempted to recommend taking my
> series now, fixing up msysgit as mentioned earlier (or just dropping the
> pager.c portion of my 2/6), and then implementing DWIM once Ilari's
> topic matures.

I think this procedure is fine. msysgit can resolve the conflict in 
pager.c as suggested earlier.

> We might want to hold my 5/6 and 6/6 back from master until the DWIM
> (which would make both totally safe, I think).

Would make sense, too.

-- Hannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] run-command: add "use shell" option
  2009-12-30 10:53         ` [PATCH 1/6] run-command: add "use shell" option Jeff King
  2009-12-30 13:55           ` Erik Faye-Lund
@ 2010-01-01 22:12           ` Johannes Sixt
  1 sibling, 0 replies; 33+ messages in thread
From: Johannes Sixt @ 2010-01-01 22:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

Jeff King schrieb:
> Many callsites run "sh -c $CMD" to run $CMD. We can make it
> a little simpler for them by factoring out the munging of
> argv.
> 
> For simple cases with no arguments, this doesn't help much, but:
> 
>   1. For cases with arguments, we save the caller from
>      having to build the appropriate shell snippet.
> 
>   2. We can later optimize to avoid the shell when
>      there are no metacharacters in the program.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I made the matching tweak to the Windows half of run-command, but I
> don't actually have a box to test it on.
> 
> I modeled this after execv_git_cmd. Like that function, I try to release
> the allocated argv on error. However, we do actually leak the strbuf
> memory in one case. I'm not sure how much we care. On unix, this will
> always happen in a forked process which will either exec or die. On
> Windows, we seem to already be leaking the prepared argv for the git_cmd
> case (and now we leak the shell_cmd case, too).

That is OK. We can fix this when we find a work-load where this is
a problem.

But would you please squash this in to avoid a warning about an unused
static function on Windows.

---
  run-command.c |    2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/run-command.c b/run-command.c
index 22e2777..47ced57 100644
--- a/run-command.c
+++ b/run-command.c
@@ -48,6 +48,7 @@ static const char **prepare_shell_cmd(const char **argv)
  	return nargv;
  }

+#ifndef WIN32
  static int execv_shell_cmd(const char **argv)
  {
  	const char **nargv = prepare_shell_cmd(argv);
@@ -56,6 +57,7 @@ static int execv_shell_cmd(const char **argv)
  	free(nargv);
  	return -1;
  }
+#endif

  int start_command(struct child_process *cmd)
  {
-- 
1.6.6.1073.gd853b.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/6] t0021: use $SHELL_PATH for the filter script
  2009-12-30 11:03         ` [PATCH 6/6] diff: run external diff helper with shell Jeff King
@ 2010-01-01 22:14           ` Johannes Sixt
  2010-01-01 22:15             ` [PATCH 8/6] t4030, t4031: work around bogus MSYS bash path conversion Johannes Sixt
  2010-01-03  7:24             ` [PATCH 7/6] t0021: use $SHELL_PATH for the filter script Jeff King
  0 siblings, 2 replies; 33+ messages in thread
From: Johannes Sixt @ 2010-01-01 22:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

On Windows, we need the shbang line to correctly invoke shell scripts via
a POSIX shell, except when the script is invoked via 'sh -c' because
sh (a bash) does "the right thing". Since nowadays the clean and smudge
filters are not always invoked via 'sh -c' anymore, we have to mark the
the one in t0021-conversion with #!$SHELL_PATH.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
  t/t0021-conversion.sh    |    3 ++-
  t/t4030-diff-textconv.sh |    6 ++++--
  2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 8fc39d7..6cb8d60 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -4,7 +4,8 @@ test_description='blob conversion via gitattributes'

  . ./test-lib.sh

-cat <<\EOF >rot13.sh
+cat <<EOF >rot13.sh
+#!$SHELL_PATH
  tr \
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
    'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh
index c16d538..3cb7e63 100755
--- a/t/t4030-diff-textconv.sh
+++ b/t/t4030-diff-textconv.sh
@@ -19,10 +19,12 @@ cat >expect.text <<'EOF'
  +1
  EOF

-cat >hexdump <<'EOF'
-#!/bin/sh
+{
+	echo "#!$SHELL_PATH"
+	cat <<'EOF'
  perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1"
  EOF
+} >hexdump
  chmod +x hexdump

  test_expect_success 'setup binary file with history' '
-- 
1.6.6.1073.gd853b.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 8/6] t4030, t4031: work around bogus MSYS bash path conversion
  2010-01-01 22:14           ` [PATCH 7/6] t0021: use $SHELL_PATH for the filter script Johannes Sixt
@ 2010-01-01 22:15             ` Johannes Sixt
  2010-01-03  7:24             ` [PATCH 7/6] t0021: use $SHELL_PATH for the filter script Jeff King
  1 sibling, 0 replies; 33+ messages in thread
From: Johannes Sixt @ 2010-01-01 22:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

Recall that MSYS bash converts POSIX style absolute paths to Windows style
absolute paths. Unfortunately, it converts a program argument that begins
with a double-quote and otherwise looks like an absolute POSIX path, but
in doing so, it strips everything past the second double-quote[*]. This
case is triggered in the two test scripts. The work-around is to place the
Windows style path between the quotes to avoid the path conversion.

[*] It is already bogus that a conversion is even considered when a program
argument begins with a double-quote because it cannot be an absolute POSIX
path.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
  t/t4030-diff-textconv.sh       |    2 +-
  t/t4031-diff-rewrite-binary.sh |    2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh
index 3cb7e63..9b94647 100755
--- a/t/t4030-diff-textconv.sh
+++ b/t/t4030-diff-textconv.sh
@@ -50,7 +50,7 @@ test_expect_success 'file is considered binary by plumbing' '

  test_expect_success 'setup textconv filters' '
  	echo file diff=foo >.gitattributes &&
-	git config diff.foo.textconv "\"$PWD\""/hexdump &&
+	git config diff.foo.textconv "\"$(pwd)\""/hexdump &&
  	git config diff.fail.textconv false
  '

diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh
index 27fb31b..7e7b307 100755
--- a/t/t4031-diff-rewrite-binary.sh
+++ b/t/t4031-diff-rewrite-binary.sh
@@ -54,7 +54,7 @@ chmod +x dump

  test_expect_success 'setup textconv' '
  	echo file diff=foo >.gitattributes &&
-	git config diff.foo.textconv "\"$PWD\""/dump
+	git config diff.foo.textconv "\"$(pwd)\""/dump
  '

  test_expect_success 'rewrite diff respects textconv' '
-- 
1.6.6.1073.gd853b.dirty

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/6] t0021: use $SHELL_PATH for the filter script
  2010-01-01 22:14           ` [PATCH 7/6] t0021: use $SHELL_PATH for the filter script Johannes Sixt
  2010-01-01 22:15             ` [PATCH 8/6] t4030, t4031: work around bogus MSYS bash path conversion Johannes Sixt
@ 2010-01-03  7:24             ` Jeff King
  2010-01-04 15:50               ` Johannes Sixt
  1 sibling, 1 reply; 33+ messages in thread
From: Jeff King @ 2010-01-03  7:24 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Nanako Shiraishi, git

On Fri, Jan 01, 2010 at 11:14:06PM +0100, Johannes Sixt wrote:

> On Windows, we need the shbang line to correctly invoke shell scripts via
> a POSIX shell, except when the script is invoked via 'sh -c' because
> sh (a bash) does "the right thing". Since nowadays the clean and smudge
> filters are not always invoked via 'sh -c' anymore, we have to mark the
> the one in t0021-conversion with #!$SHELL_PATH.

Hrm. This does mean we might be breaking users who have helper scripts
in a similar state to those in the test suite (of course, so does your
pager hack, or anything which might optimize out a shell call).  But
perhaps given that scripts without a shebang generally don't work on
Windows, they are not too common and we don't need to worry about it.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/6] t0021: use $SHELL_PATH for the filter script
  2010-01-03  7:24             ` [PATCH 7/6] t0021: use $SHELL_PATH for the filter script Jeff King
@ 2010-01-04 15:50               ` Johannes Sixt
  2010-01-04 16:03                 ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Sixt @ 2010-01-04 15:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

Jeff King schrieb:
> On Fri, Jan 01, 2010 at 11:14:06PM +0100, Johannes Sixt wrote:
> 
>> On Windows, we need the shbang line to correctly invoke shell scripts via
>> a POSIX shell, except when the script is invoked via 'sh -c' because
>> sh (a bash) does "the right thing". Since nowadays the clean and smudge
>> filters are not always invoked via 'sh -c' anymore, we have to mark the
>> the one in t0021-conversion with #!$SHELL_PATH.
> 
> Hrm. This does mean we might be breaking users who have helper scripts
> in a similar state to those in the test suite...

Not helper scripts in general, but only clean and smudge filters, because 
these have been invoked with "sh -c" so far. Everything else, that was not 
run via "sh -c", but now is, is safe.

-- Hannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/6] t0021: use $SHELL_PATH for the filter script
  2010-01-04 15:50               ` Johannes Sixt
@ 2010-01-04 16:03                 ` Jeff King
  2010-01-04 16:46                   ` Johannes Sixt
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2010-01-04 16:03 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Nanako Shiraishi, git

On Mon, Jan 04, 2010 at 04:50:39PM +0100, Johannes Sixt wrote:

> >>On Windows, we need the shbang line to correctly invoke shell scripts via
> >>a POSIX shell, except when the script is invoked via 'sh -c' because
> >>sh (a bash) does "the right thing". Since nowadays the clean and smudge
> >>filters are not always invoked via 'sh -c' anymore, we have to mark the
> >>the one in t0021-conversion with #!$SHELL_PATH.
> >
> >Hrm. This does mean we might be breaking users who have helper scripts
> >in a similar state to those in the test suite...
> 
> Not helper scripts in general, but only clean and smudge filters,
> because these have been invoked with "sh -c" so far. Everything else,
> that was not run via "sh -c", but now is, is safe.

I converted more than that; see my 2/6. It is also the pager, the
imap-send tunnel helper, and external merge helpers. Not the editor,
since it already had the no-metacharacters optimization (though it, too,
could be affected if we implement your DWIM trick instead of the
metacharacter thing).

So I think we need to make a conscious decision that this is an
acceptable change of behavior (and I am totally fine with the change
happening -- I just want to be clear about the extent of what is being
changed).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/6] t0021: use $SHELL_PATH for the filter script
  2010-01-04 16:03                 ` Jeff King
@ 2010-01-04 16:46                   ` Johannes Sixt
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Sixt @ 2010-01-04 16:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git

Jeff King schrieb:
> I converted more than that; see my 2/6. It is also the pager, the
> imap-send tunnel helper, and external merge helpers. Not the editor,
> since it already had the no-metacharacters optimization (though it, too,
> could be affected if we implement your DWIM trick instead of the
> metacharacter thing).
> 
> So I think we need to make a conscious decision that this is an
> acceptable change of behavior (and I am totally fine with the change
> happening -- I just want to be clear about the extent of what is being
> changed).

Hm, ok, I see.

- The clean and smudge filters are probably the most important cases.

- I *did* write my own merge script (to merge PNGs!), but I made sure to 
begin it with #!/bin/bash, and I don't think anybody else is crazy enough 
to write a custom merge script ;)

- imap-send on Windows is so new that I don't think anyone is already 
using it with a custom tunneling script.

- The change in pager.c is unimportant because all versions shipped so far 
(via msysgit) have the conflicting change that tried without "sh -c" first.

I think that these can be handled with an entry in the release notes.

-- Hannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2010-01-04 16:46 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-14 22:17 Giving command line parameter to textconv command? Nanako Shiraishi
2009-12-14 23:31 ` Junio C Hamano
2009-12-15  3:11   ` Nanako Shiraishi
2009-12-15  5:56     ` Junio C Hamano
2009-12-15 16:49       ` Jeff King
2009-12-16  1:05         ` Junio C Hamano
2009-12-16  1:13           ` Jeff King
2009-12-15 17:03   ` Jeff King
2009-12-15 17:23     ` Junio C Hamano
2009-12-30  3:13   ` Nanako Shiraishi
2009-12-30  8:05     ` Junio C Hamano
2009-12-30  9:56       ` Jeff King
2009-12-30 10:53         ` [PATCH 1/6] run-command: add "use shell" option Jeff King
2009-12-30 13:55           ` Erik Faye-Lund
2010-01-01 22:12           ` Johannes Sixt
2009-12-30 10:53         ` [PATCH 2/6] run-command: convert simple callsites to use_shell Jeff King
2009-12-30 10:55         ` [PATCH 3/6] run-command: optimize out useless shell calls Jeff King
2009-12-31 16:54           ` Johannes Sixt
2009-12-31 19:47             ` Junio C Hamano
2009-12-31 21:41               ` Johannes Sixt
2009-12-31 21:41             ` Jeff King
2009-12-31 22:16               ` Johannes Sixt
2010-01-01  4:50                 ` Jeff King
2010-01-01 10:08                   ` Johannes Sixt
2009-12-30 10:56         ` [PATCH 4/6] editor: use run_command's shell feature Jeff King
2009-12-30 11:01         ` [PATCH 5/6] textconv: use shell to run helper Jeff King
2009-12-30 11:03         ` [PATCH 6/6] diff: run external diff helper with shell Jeff King
2010-01-01 22:14           ` [PATCH 7/6] t0021: use $SHELL_PATH for the filter script Johannes Sixt
2010-01-01 22:15             ` [PATCH 8/6] t4030, t4031: work around bogus MSYS bash path conversion Johannes Sixt
2010-01-03  7:24             ` [PATCH 7/6] t0021: use $SHELL_PATH for the filter script Jeff King
2010-01-04 15:50               ` Johannes Sixt
2010-01-04 16:03                 ` Jeff King
2010-01-04 16:46                   ` Johannes Sixt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).