* "git add -i" with path gives "Argument list too long"
@ 2010-01-04 18:43 Wincent Colaiuta
2010-01-05 4:14 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Wincent Colaiuta @ 2010-01-04 18:43 UTC (permalink / raw)
To: git
Just ran "git add -i <path>" with "<path>" pointing to a subdirectory
which happens to have a bunch of files in it (about 7k) and it barfed
thusly:
Can't exec "git": Argument list too long at /usr/local/libexec/git-
core/git-add--interactive line 158.
Died at /usr/local/libexec/git-core/git-add--interactive line 158.
I see that what it's trying to do under the hood is:
git diff-index --cached --numstat --summary HEAD -- <7,000+ paths...>
Sure, we could divide the paths into smaller groups, run multiple
invocations of "git diff-index", and concatenate the results. But it
would be nicer if there was some other way that we could get at the
same information without having to pass 7,000 paths explicitly on the
command line; is there any which I am overlooking?
The enormous file list is the result of passing <path> into "git ls-
files -- <path>". Would it be worth:
- either, modifying "git diff-index" to accept a list of paths over
stdin so that we could at least pipe the output from "git ls-files"
into "git diff-index"
- or, preferably, teach "git diff index" to recurse into directories
rather than expect a list of paths-of-blobs (possibly with a command
line switch to activate the behaviour if it were deemed a dangerous
default)
This is one piece of plumbing that I've never dabbled with, so forgive
me if my questions are a little dumb.
Cheers,
Wincent
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long"
2010-01-04 18:43 Wincent Colaiuta
@ 2010-01-05 4:14 ` Jeff King
2010-01-05 5:31 ` Junio C Hamano
2010-01-05 12:34 ` Wincent Colaiuta
0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2010-01-05 4:14 UTC (permalink / raw)
To: Wincent Colaiuta; +Cc: Junio C Hamano, git
[cc'd Junio: there is a question of path limiters versus pathspecs near
the bottom].
On Mon, Jan 04, 2010 at 07:43:10PM +0100, Wincent Colaiuta wrote:
> Just ran "git add -i <path>" with "<path>" pointing to a subdirectory
> which happens to have a bunch of files in it (about 7k) and it barfed
> thusly:
>
> Can't exec "git": Argument list too long at /usr/local/libexec/git-
> core/git-add--interactive line 158.
> Died at /usr/local/libexec/git-core/git-add--interactive line 158.
>
> I see that what it's trying to do under the hood is:
>
> git diff-index --cached --numstat --summary HEAD -- <7,000+ paths...>
Yep, and there is a similar diff-files call after that.
> Sure, we could divide the paths into smaller groups, run multiple
> invocations of "git diff-index", and concatenate the results. But it
> would be nicer if there was some other way that we could get at the
> same information without having to pass 7,000 paths explicitly on the
> command line; is there any which I am overlooking?
No, I don't think there is way to do it with the current code short of
breaking up the output.
> The enormous file list is the result of passing <path> into "git ls-
> files -- <path>". Would it be worth:
>
> - either, modifying "git diff-index" to accept a list of paths over
> stdin so that we could at least pipe the output from "git ls-files"
> into "git diff-index"
We could do that. It would also need a patch for diff-files. And
unfortunately it makes their interface somewhat inconsistent then with
diff-tree, which already has a --stdin but uses it for a list of
tree-ishes.
> - or, preferably, teach "git diff index" to recurse into directories
> rather than expect a list of paths-of-blobs (possibly with a command
> line switch to activate the behaviour if it were deemed a dangerous
> default)
Doesn't it already do this? If I say "git diff index subdir" it
will limit the diff only to things inside subdir/.
In fact, it is tempting to do this:
diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index bfd1003..a8b3e30 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -275,15 +275,6 @@ sub list_modified {
my ($only) = @_;
my (%data, @return);
my ($add, $del, $adddel, $file);
- my @tracked = ();
-
- if (@ARGV) {
- @tracked = map {
- chomp $_;
- unquote_path($_);
- } run_cmd_pipe(qw(git ls-files --), @ARGV);
- return if (!@tracked);
- }
my $reference;
if (defined $patch_mode_revision and $patch_mode_revision ne 'HEAD') {
@@ -295,7 +286,7 @@ sub list_modified {
}
for (run_cmd_pipe(qw(git diff-index --cached
--numstat --summary), $reference,
- '--', @tracked)) {
+ '--', @ARGV)) {
if (($add, $del, $file) =
/^([-\d]+) ([-\d]+) (.*)/) {
my ($change, $bin);
@@ -320,7 +311,7 @@ sub list_modified {
}
}
- for (run_cmd_pipe(qw(git diff-files --numstat --summary --), @tracked)) {
+ for (run_cmd_pipe(qw(git diff-files --numstat --summary --), @ARGV)) {
if (($add, $del, $file) =
/^([-\d]+) ([-\d]+) (.*)/) {
$file = unquote_path($file);
but note that the pathspecs given to ls-files and the path limiters
given to diff are not quite the same. So "git add -i '*.c'" will
currently find "subdir/foo.c", but would not with the above patch. Is
that what you meant when you said "recurse into directories"?
I seem to recall Junio noting in the past the inconsistencies in git
about what is a path and what is a pathspec. Is this one of those
inconsistencies, and would it be a positive thing to fix it?
-Peff
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long"
2010-01-05 4:14 ` Jeff King
@ 2010-01-05 5:31 ` Junio C Hamano
2010-01-05 12:34 ` Wincent Colaiuta
1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2010-01-05 5:31 UTC (permalink / raw)
To: Jeff King; +Cc: Wincent Colaiuta, git
Jeff King <peff@peff.net> writes:
> I seem to recall Junio noting in the past the inconsistencies in git
> about what is a path and what is a pathspec. Is this one of those
> inconsistencies, and would it be a positive thing to fix it?
We actually never take paths and everything we take is pathspec. One
longstanding gripe I had around this area is there are two kinds of
pathspecs. "diff-tree" family of pathspecs only match "leading
directories" while "dir.c" and "builtin-grep.c" pathspecs allow both
"leading directories" and "glob" pathspecs. Teaching "diff-tree" family
to also grok globs would be a very nice thing to do, and the listing of
paths with ls-files in the sample patch you are removing is indeed a
workaround for this issue.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long"
2010-01-05 4:14 ` Jeff King
2010-01-05 5:31 ` Junio C Hamano
@ 2010-01-05 12:34 ` Wincent Colaiuta
2010-01-06 12:19 ` Jeff King
1 sibling, 1 reply; 10+ messages in thread
From: Wincent Colaiuta @ 2010-01-05 12:34 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
El 05/01/2010, a las 05:14, Jeff King escribió:
>> - or, preferably, teach "git diff index" to recurse into directories
>> rather than expect a list of paths-of-blobs (possibly with a command
>> line switch to activate the behaviour if it were deemed a dangerous
>> default)
>
> Doesn't it already do this? If I say "git diff index subdir" it
> will limit the diff only to things inside subdir/.
[snip patch]
I tried out the patch and it obviously does avoid the "Argument list
too long" problem. At least for my usage patterns the superficial
differences in behavior that you note would not be a problem (I
usually want to limit things to a subdir, and seldom if ever pass in
things like '*.c').
> but note that the pathspecs given to ls-files and the path limiters
> given to diff are not quite the same. So "git add -i '*.c'" will
> currently find "subdir/foo.c", but would not with the above patch. Is
> that what you meant when you said "recurse into directories"?
In my relative ignorance of the finer details here, I meant that I
would want "diff-index" to give us the exact same set of blobs as we
get from "ls-files", so as to fix the error without modifying the user
visible behavior.
As I said, I personally wouldn't be impacted by the change in behavior
that your patch produces, but maybe others might.
Cheers,
Wincent
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long"
2010-01-05 12:34 ` Wincent Colaiuta
@ 2010-01-06 12:19 ` Jeff King
0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2010-01-06 12:19 UTC (permalink / raw)
To: Wincent Colaiuta; +Cc: Junio C Hamano, git
On Tue, Jan 05, 2010 at 01:34:32PM +0100, Wincent Colaiuta wrote:
> >but note that the pathspecs given to ls-files and the path limiters
> >given to diff are not quite the same. So "git add -i '*.c'" will
> >currently find "subdir/foo.c", but would not with the above patch. Is
> >that what you meant when you said "recurse into directories"?
>
> In my relative ignorance of the finer details here, I meant that I
> would want "diff-index" to give us the exact same set of blobs as we
> get from "ls-files", so as to fix the error without modifying the
> user visible behavior.
>
> As I said, I personally wouldn't be impacted by the change in
> behavior that your patch produces, but maybe others might.
I also don't care about the globbing feature, though I suppose some
people do. However, I'm not sure add's behavior is all that sensible
now. Interactive add respects the globs, but regular add does not.
Worse, it seems that it notes that the pathspec is a wildcard and does
not even flag an error for failing to find any files. For example:
$ git init
$ echo content >foo.c
$ mkdir subdir && echo content >subdir/bar.c
$ git add .
$ echo change >foo.c && echo change >subdir/bar.c
$ git diff --name-only
foo.c
subdir/bar.c
$ git add foobar ;# should barf, and does
fatal: pathspec 'foobar' did not match any files
$ git add '*.c' ;# does not barf, but does not respect wildcard
$ git diff --name-only
foo.c
subdir/bar.c
$ yes | git add -p '*.c' ;# respects glob
$ git diff --name-only ;# empty
So it's an consistency that should probably be fixed. And of course it
is tempting to fix it by disabling the globs for the interactive case,
which would not involve writing any new code. ;) But I don't think it is
a good idea to punish people by taking away their feature in the name of
consistency.
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* "git add -i" with path gives "Argument list too long"
@ 2010-03-21 20:52 Wincent Colaiuta
2010-03-22 0:39 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Wincent Colaiuta @ 2010-03-21 20:52 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano
Back in January I reported a wart in "git add -i" and discussion kind of petered out and I forgot about the bug until today when I ran into it again. Now I have a few more cycles to burn I have the time to sort it out.
Just to revise what was discussed back in January:
- my original report: http://lists-archives.org/git/708569-git-add-i-with-path-gives-argument-list-too-long.html
- quick and easy fix posted by Jeff King: http://lists-archives.org/git/708591-git-add-i-with-path-gives-argument-list-too-long.html
- notes on inconsistencies between "git add" and "git add -i" by Jeff king: http://lists-archives.org/git/708698-git-add-i-with-path-gives-argument-list-too-long.html
To summarize and expand a bit upon what Jeff noted in that last post:
- given _tracked_ file "foo.c" and _untracked_ file "bar.c"
- "git add '*.c'" ignores the tracked file and stages the untracked file
- "git add -i '*.c'" ignores the untracked file and operates on the tracked file
So, I gather that we can agree that this behavior isn't right. Either we treat the arguments as pathspecs or we don't, but it doesn't seem right to do one thing for tracked files and another for untracked ones, and it seems especially wrong for the differential treatment to apply in one direction with "git add" and in the other with "git add -i".
So, as a first step I'd like to make some test cases to serve as a specification for what we want the desired behavior to be. I saw in one of the "what's cooking messages" in February that there was some changes cooking in the "cp/add-u-pathspec" topic, so I am not entirely clear on if there is consensus about the desired behaviour.
Before I run ahead and start writing test, do we have consensus?
(Will have to worry about the initial cause of my bug report -- the "argument list too long" error -- later on.)
Cheers,
Wincent
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long"
2010-03-21 20:52 "git add -i" with path gives "Argument list too long" Wincent Colaiuta
@ 2010-03-22 0:39 ` Jeff King
2010-03-22 1:23 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2010-03-22 0:39 UTC (permalink / raw)
To: Wincent Colaiuta; +Cc: git, Junio C Hamano
On Sun, Mar 21, 2010 at 09:52:56PM +0100, Wincent Colaiuta wrote:
> Back in January I reported a wart in "git add -i" and discussion kind
> of petered out and I forgot about the bug until today when I ran into
> it again. Now I have a few more cycles to burn I have the time to sort
> it out.
Thanks for following up.
> To summarize and expand a bit upon what Jeff noted in that last post:
>
> - given _tracked_ file "foo.c" and _untracked_ file "bar.c"
> - "git add '*.c'" ignores the tracked file and stages the untracked file
> - "git add -i '*.c'" ignores the untracked file and operates on the
> tracked file
> [...]
> So, as a first step I'd like to make some test cases to serve as a
> specification for what we want the desired behavior to be. I saw in
I'm pretty sure the behavior that we want eventually is for both the
diff family (including diff-files), and "git add" to learn about the
globbing pathspecs that ls-files uses. That will make "git add" and "git
add -i" consistent, and once diff-files understands it, we can drop the
call to ls-files in add--interactive, which will solve your "argument
list too long" problem.
And yes, I am being intentionally vague about the behavior beyond saying
"do what ls-files does". Personally I am not too familiar with the
globbing pathspecs, so I am not qualified to give an exhaustive list of
their behavior. Junio could probably say more, or you will have to read
the code.
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long"
2010-03-22 0:39 ` Jeff King
@ 2010-03-22 1:23 ` Junio C Hamano
2010-03-22 1:41 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2010-03-22 1:23 UTC (permalink / raw)
To: Jeff King; +Cc: Wincent Colaiuta, git
Jeff King <peff@peff.net> writes:
> their behavior. Junio could probably say more, or you will have to read
> the code.
Or read what I already said here a few times ;-) I generally do not want
to repeat myself.
There are two semantics of pathspecs:
(1) exact match, or leading path. e.g.
git ls-files Makefile Documentation/
(2) exact match, leading path, or fnmatch(3).
git ls-files Makefile Documentation/ '*.txt'
The former is used by the diff family, and the latter by pretty much
everything else. In very old days, the former was the only kind we
supported, and originally, ls-files didn't even take any pathspecs.
The 5be4efb ([PATCH] Make "git-ls-files" work in subdirectories,
2005-08-21) taught ls-files to take pathspec that can glob, but diff
family never got updated to match that.
In order to operate on set of paths, you would need to (a) enumerate your
paths, and (b) filter that enumeration efficiently with pathspecs. If you
are iterating over the index (e.g. "ls-files", "diff-files", "grep"),
there is nothing tricky in the enumeration step. We have a flat array of
names in the index and you just walk active_cache[] from 0 to active_nr.
If on the other hand you are walking in an inherently hierarchical
namespace (e.g. "ls-tree", "diff-tree", "grep --no-index") with non empty
set of pathspecs, you need to take advantage of the filtering behaviour
while enumerating the paths---otherwise your performance will suck.
"Leading path" semantics is easier to understand; if a tree entry you are
looking at is "contrib", and the pathspecs you have are "Makefile" and
"Documentation/", then there is no way for anything underneath it
(e.g. "contrib/README") would survive the filtering process, so you can
skip the entry without even opening the sub-tree object. Linus's argument
was "teaching globs to pathspec code would suck in performance" and he is
right in general. Because diff-tree is inherently about walking the two
tree objects in parallel, it does not extend its pathspec semantics to
globbing (i.e. if the user asked for '*.txt', you have to open _all_ the
tree objects down to the leaf level to see if they contain any file whose
name ends with .txt), and other family members of diff (namely, diff-files
and "diff" without --cached nor any tree-ish argument) match this
behaviour for consistency, even though theoretically "diff-files" could
easily do globbing, as it walks the flat index namespace.
But I think it is Ok to sacrifice the optimization and descend into any
and all subtrees/directories to see if a path that might match the pattern
exists when the user asks for '*.txt', as long as (and this is a _very_
important point) an update to pathspec logic on the diff side does not
break the optimization unnecessarily. E.g.
git diff v1.0.0 v1.2.0 -- Makefile 'Documentation/*.txt'
should still skip opening tree object for 'contrib/' (because anything
underneath contrib/ would never match either pathspecs given), but can
and should descend into Documentation. And it should _not_ skip 'howto'
subdirectory in Documentation/ directory, as it could find a match with
'*.txt' in that subdirectory.
To prepare for this, later reimplementations of pathspec matching logic
(the one used by "git grep") can compute hints meant to be used by the
path enumeration step, as I explained earlier, enumeration needs to take
advantage of what filtering would do to paths that it will find.
By the way I threw this "pathspec unification" to the list of possible
GSoC ideas, but I suspect it might be a bit too much work to do this
properly for a summer student (and also we might not want to trust this
important part of the system to a summer student). Other things that
probably needs to be thought through (and I haven't) that may be related
to this codepath is how to handle case-insensitive filesystems. I think
we currently do not match paths that we obtain from the filesystem case
insensitively with the given pathspecs (we probably shouldn't go case
insensitive when we are walking the index or the tree objects, on the
other hand).
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long"
2010-03-22 1:23 ` Junio C Hamano
@ 2010-03-22 1:41 ` Jeff King
2010-03-22 1:55 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2010-03-22 1:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Wincent Colaiuta, git
On Sun, Mar 21, 2010 at 06:23:38PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > their behavior. Junio could probably say more, or you will have to read
> > the code.
>
> Or read what I already said here a few times ;-) I generally do not want
> to repeat myself.
Well, yeah. :) What I meant was:
> There are two semantics of pathspecs:
>
> (1) exact match, or leading path. e.g.
>
> git ls-files Makefile Documentation/
>
> (2) exact match, leading path, or fnmatch(3).
>
> git ls-files Makefile Documentation/ '*.txt'
Is type (2) exactly fnmatch? Or are there subtle corner cases that
should be tested? For the implementation it shouldn't matter, as
hopefully the same code paths will be used, and "just like ls-files"
should be specification enough. But for testing, it might be nice to
enumerate some of the more subtle cases.
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long"
2010-03-22 1:41 ` Jeff King
@ 2010-03-22 1:55 ` Junio C Hamano
0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2010-03-22 1:55 UTC (permalink / raw)
To: Jeff King; +Cc: Wincent Colaiuta, git
Jeff King <peff@peff.net> writes:
> Is type (2) exactly fnmatch?
It calls fnmatch(3) but without FNM_PATHNAME nor FNM_PERIOD, which makes
me sometimes wonder if it was a historical mistake and go to "git log" to
check, but it is more-or-less deliberate choice, as illustrated by the
'arch/i386/*.[ch]' example in 5be4efb ([PATCH] Make "git-ls-files" work in
subdirectories, 2005-08-21).
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2010-03-22 1:56 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-21 20:52 "git add -i" with path gives "Argument list too long" Wincent Colaiuta
2010-03-22 0:39 ` Jeff King
2010-03-22 1:23 ` Junio C Hamano
2010-03-22 1:41 ` Jeff King
2010-03-22 1:55 ` Junio C Hamano
-- strict thread matches above, loose matches on Subject: below --
2010-01-04 18:43 Wincent Colaiuta
2010-01-05 4:14 ` Jeff King
2010-01-05 5:31 ` Junio C Hamano
2010-01-05 12:34 ` Wincent Colaiuta
2010-01-06 12:19 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).