* "git add -i" with path gives "Argument list too long" @ 2010-03-21 20:52 Wincent Colaiuta 2010-03-22 0:39 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Wincent Colaiuta @ 2010-03-21 20:52 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano Back in January I reported a wart in "git add -i" and discussion kind of petered out and I forgot about the bug until today when I ran into it again. Now I have a few more cycles to burn I have the time to sort it out. Just to revise what was discussed back in January: - my original report: http://lists-archives.org/git/708569-git-add-i-with-path-gives-argument-list-too-long.html - quick and easy fix posted by Jeff King: http://lists-archives.org/git/708591-git-add-i-with-path-gives-argument-list-too-long.html - notes on inconsistencies between "git add" and "git add -i" by Jeff king: http://lists-archives.org/git/708698-git-add-i-with-path-gives-argument-list-too-long.html To summarize and expand a bit upon what Jeff noted in that last post: - given _tracked_ file "foo.c" and _untracked_ file "bar.c" - "git add '*.c'" ignores the tracked file and stages the untracked file - "git add -i '*.c'" ignores the untracked file and operates on the tracked file So, I gather that we can agree that this behavior isn't right. Either we treat the arguments as pathspecs or we don't, but it doesn't seem right to do one thing for tracked files and another for untracked ones, and it seems especially wrong for the differential treatment to apply in one direction with "git add" and in the other with "git add -i". So, as a first step I'd like to make some test cases to serve as a specification for what we want the desired behavior to be. I saw in one of the "what's cooking messages" in February that there was some changes cooking in the "cp/add-u-pathspec" topic, so I am not entirely clear on if there is consensus about the desired behaviour. Before I run ahead and start writing test, do we have consensus? (Will have to worry about the initial cause of my bug report -- the "argument list too long" error -- later on.) Cheers, Wincent ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long" 2010-03-21 20:52 "git add -i" with path gives "Argument list too long" Wincent Colaiuta @ 2010-03-22 0:39 ` Jeff King 2010-03-22 1:23 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Jeff King @ 2010-03-22 0:39 UTC (permalink / raw) To: Wincent Colaiuta; +Cc: git, Junio C Hamano On Sun, Mar 21, 2010 at 09:52:56PM +0100, Wincent Colaiuta wrote: > Back in January I reported a wart in "git add -i" and discussion kind > of petered out and I forgot about the bug until today when I ran into > it again. Now I have a few more cycles to burn I have the time to sort > it out. Thanks for following up. > To summarize and expand a bit upon what Jeff noted in that last post: > > - given _tracked_ file "foo.c" and _untracked_ file "bar.c" > - "git add '*.c'" ignores the tracked file and stages the untracked file > - "git add -i '*.c'" ignores the untracked file and operates on the > tracked file > [...] > So, as a first step I'd like to make some test cases to serve as a > specification for what we want the desired behavior to be. I saw in I'm pretty sure the behavior that we want eventually is for both the diff family (including diff-files), and "git add" to learn about the globbing pathspecs that ls-files uses. That will make "git add" and "git add -i" consistent, and once diff-files understands it, we can drop the call to ls-files in add--interactive, which will solve your "argument list too long" problem. And yes, I am being intentionally vague about the behavior beyond saying "do what ls-files does". Personally I am not too familiar with the globbing pathspecs, so I am not qualified to give an exhaustive list of their behavior. Junio could probably say more, or you will have to read the code. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long" 2010-03-22 0:39 ` Jeff King @ 2010-03-22 1:23 ` Junio C Hamano 2010-03-22 1:41 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2010-03-22 1:23 UTC (permalink / raw) To: Jeff King; +Cc: Wincent Colaiuta, git Jeff King <peff@peff.net> writes: > their behavior. Junio could probably say more, or you will have to read > the code. Or read what I already said here a few times ;-) I generally do not want to repeat myself. There are two semantics of pathspecs: (1) exact match, or leading path. e.g. git ls-files Makefile Documentation/ (2) exact match, leading path, or fnmatch(3). git ls-files Makefile Documentation/ '*.txt' The former is used by the diff family, and the latter by pretty much everything else. In very old days, the former was the only kind we supported, and originally, ls-files didn't even take any pathspecs. The 5be4efb ([PATCH] Make "git-ls-files" work in subdirectories, 2005-08-21) taught ls-files to take pathspec that can glob, but diff family never got updated to match that. In order to operate on set of paths, you would need to (a) enumerate your paths, and (b) filter that enumeration efficiently with pathspecs. If you are iterating over the index (e.g. "ls-files", "diff-files", "grep"), there is nothing tricky in the enumeration step. We have a flat array of names in the index and you just walk active_cache[] from 0 to active_nr. If on the other hand you are walking in an inherently hierarchical namespace (e.g. "ls-tree", "diff-tree", "grep --no-index") with non empty set of pathspecs, you need to take advantage of the filtering behaviour while enumerating the paths---otherwise your performance will suck. "Leading path" semantics is easier to understand; if a tree entry you are looking at is "contrib", and the pathspecs you have are "Makefile" and "Documentation/", then there is no way for anything underneath it (e.g. "contrib/README") would survive the filtering process, so you can skip the entry without even opening the sub-tree object. Linus's argument was "teaching globs to pathspec code would suck in performance" and he is right in general. Because diff-tree is inherently about walking the two tree objects in parallel, it does not extend its pathspec semantics to globbing (i.e. if the user asked for '*.txt', you have to open _all_ the tree objects down to the leaf level to see if they contain any file whose name ends with .txt), and other family members of diff (namely, diff-files and "diff" without --cached nor any tree-ish argument) match this behaviour for consistency, even though theoretically "diff-files" could easily do globbing, as it walks the flat index namespace. But I think it is Ok to sacrifice the optimization and descend into any and all subtrees/directories to see if a path that might match the pattern exists when the user asks for '*.txt', as long as (and this is a _very_ important point) an update to pathspec logic on the diff side does not break the optimization unnecessarily. E.g. git diff v1.0.0 v1.2.0 -- Makefile 'Documentation/*.txt' should still skip opening tree object for 'contrib/' (because anything underneath contrib/ would never match either pathspecs given), but can and should descend into Documentation. And it should _not_ skip 'howto' subdirectory in Documentation/ directory, as it could find a match with '*.txt' in that subdirectory. To prepare for this, later reimplementations of pathspec matching logic (the one used by "git grep") can compute hints meant to be used by the path enumeration step, as I explained earlier, enumeration needs to take advantage of what filtering would do to paths that it will find. By the way I threw this "pathspec unification" to the list of possible GSoC ideas, but I suspect it might be a bit too much work to do this properly for a summer student (and also we might not want to trust this important part of the system to a summer student). Other things that probably needs to be thought through (and I haven't) that may be related to this codepath is how to handle case-insensitive filesystems. I think we currently do not match paths that we obtain from the filesystem case insensitively with the given pathspecs (we probably shouldn't go case insensitive when we are walking the index or the tree objects, on the other hand). ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long" 2010-03-22 1:23 ` Junio C Hamano @ 2010-03-22 1:41 ` Jeff King 2010-03-22 1:55 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Jeff King @ 2010-03-22 1:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: Wincent Colaiuta, git On Sun, Mar 21, 2010 at 06:23:38PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > their behavior. Junio could probably say more, or you will have to read > > the code. > > Or read what I already said here a few times ;-) I generally do not want > to repeat myself. Well, yeah. :) What I meant was: > There are two semantics of pathspecs: > > (1) exact match, or leading path. e.g. > > git ls-files Makefile Documentation/ > > (2) exact match, leading path, or fnmatch(3). > > git ls-files Makefile Documentation/ '*.txt' Is type (2) exactly fnmatch? Or are there subtle corner cases that should be tested? For the implementation it shouldn't matter, as hopefully the same code paths will be used, and "just like ls-files" should be specification enough. But for testing, it might be nice to enumerate some of the more subtle cases. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long" 2010-03-22 1:41 ` Jeff King @ 2010-03-22 1:55 ` Junio C Hamano 0 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2010-03-22 1:55 UTC (permalink / raw) To: Jeff King; +Cc: Wincent Colaiuta, git Jeff King <peff@peff.net> writes: > Is type (2) exactly fnmatch? It calls fnmatch(3) but without FNM_PATHNAME nor FNM_PERIOD, which makes me sometimes wonder if it was a historical mistake and go to "git log" to check, but it is more-or-less deliberate choice, as illustrated by the 'arch/i386/*.[ch]' example in 5be4efb ([PATCH] Make "git-ls-files" work in subdirectories, 2005-08-21). ^ permalink raw reply [flat|nested] 10+ messages in thread
* "git add -i" with path gives "Argument list too long" @ 2010-01-04 18:43 Wincent Colaiuta 2010-01-05 4:14 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Wincent Colaiuta @ 2010-01-04 18:43 UTC (permalink / raw) To: git Just ran "git add -i <path>" with "<path>" pointing to a subdirectory which happens to have a bunch of files in it (about 7k) and it barfed thusly: Can't exec "git": Argument list too long at /usr/local/libexec/git- core/git-add--interactive line 158. Died at /usr/local/libexec/git-core/git-add--interactive line 158. I see that what it's trying to do under the hood is: git diff-index --cached --numstat --summary HEAD -- <7,000+ paths...> Sure, we could divide the paths into smaller groups, run multiple invocations of "git diff-index", and concatenate the results. But it would be nicer if there was some other way that we could get at the same information without having to pass 7,000 paths explicitly on the command line; is there any which I am overlooking? The enormous file list is the result of passing <path> into "git ls- files -- <path>". Would it be worth: - either, modifying "git diff-index" to accept a list of paths over stdin so that we could at least pipe the output from "git ls-files" into "git diff-index" - or, preferably, teach "git diff index" to recurse into directories rather than expect a list of paths-of-blobs (possibly with a command line switch to activate the behaviour if it were deemed a dangerous default) This is one piece of plumbing that I've never dabbled with, so forgive me if my questions are a little dumb. Cheers, Wincent ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long" 2010-01-04 18:43 Wincent Colaiuta @ 2010-01-05 4:14 ` Jeff King 2010-01-05 5:31 ` Junio C Hamano 2010-01-05 12:34 ` Wincent Colaiuta 0 siblings, 2 replies; 10+ messages in thread From: Jeff King @ 2010-01-05 4:14 UTC (permalink / raw) To: Wincent Colaiuta; +Cc: Junio C Hamano, git [cc'd Junio: there is a question of path limiters versus pathspecs near the bottom]. On Mon, Jan 04, 2010 at 07:43:10PM +0100, Wincent Colaiuta wrote: > Just ran "git add -i <path>" with "<path>" pointing to a subdirectory > which happens to have a bunch of files in it (about 7k) and it barfed > thusly: > > Can't exec "git": Argument list too long at /usr/local/libexec/git- > core/git-add--interactive line 158. > Died at /usr/local/libexec/git-core/git-add--interactive line 158. > > I see that what it's trying to do under the hood is: > > git diff-index --cached --numstat --summary HEAD -- <7,000+ paths...> Yep, and there is a similar diff-files call after that. > Sure, we could divide the paths into smaller groups, run multiple > invocations of "git diff-index", and concatenate the results. But it > would be nicer if there was some other way that we could get at the > same information without having to pass 7,000 paths explicitly on the > command line; is there any which I am overlooking? No, I don't think there is way to do it with the current code short of breaking up the output. > The enormous file list is the result of passing <path> into "git ls- > files -- <path>". Would it be worth: > > - either, modifying "git diff-index" to accept a list of paths over > stdin so that we could at least pipe the output from "git ls-files" > into "git diff-index" We could do that. It would also need a patch for diff-files. And unfortunately it makes their interface somewhat inconsistent then with diff-tree, which already has a --stdin but uses it for a list of tree-ishes. > - or, preferably, teach "git diff index" to recurse into directories > rather than expect a list of paths-of-blobs (possibly with a command > line switch to activate the behaviour if it were deemed a dangerous > default) Doesn't it already do this? If I say "git diff index subdir" it will limit the diff only to things inside subdir/. In fact, it is tempting to do this: diff --git a/git-add--interactive.perl b/git-add--interactive.perl index bfd1003..a8b3e30 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -275,15 +275,6 @@ sub list_modified { my ($only) = @_; my (%data, @return); my ($add, $del, $adddel, $file); - my @tracked = (); - - if (@ARGV) { - @tracked = map { - chomp $_; - unquote_path($_); - } run_cmd_pipe(qw(git ls-files --), @ARGV); - return if (!@tracked); - } my $reference; if (defined $patch_mode_revision and $patch_mode_revision ne 'HEAD') { @@ -295,7 +286,7 @@ sub list_modified { } for (run_cmd_pipe(qw(git diff-index --cached --numstat --summary), $reference, - '--', @tracked)) { + '--', @ARGV)) { if (($add, $del, $file) = /^([-\d]+) ([-\d]+) (.*)/) { my ($change, $bin); @@ -320,7 +311,7 @@ sub list_modified { } } - for (run_cmd_pipe(qw(git diff-files --numstat --summary --), @tracked)) { + for (run_cmd_pipe(qw(git diff-files --numstat --summary --), @ARGV)) { if (($add, $del, $file) = /^([-\d]+) ([-\d]+) (.*)/) { $file = unquote_path($file); but note that the pathspecs given to ls-files and the path limiters given to diff are not quite the same. So "git add -i '*.c'" will currently find "subdir/foo.c", but would not with the above patch. Is that what you meant when you said "recurse into directories"? I seem to recall Junio noting in the past the inconsistencies in git about what is a path and what is a pathspec. Is this one of those inconsistencies, and would it be a positive thing to fix it? -Peff ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long" 2010-01-05 4:14 ` Jeff King @ 2010-01-05 5:31 ` Junio C Hamano 2010-01-05 12:34 ` Wincent Colaiuta 1 sibling, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2010-01-05 5:31 UTC (permalink / raw) To: Jeff King; +Cc: Wincent Colaiuta, git Jeff King <peff@peff.net> writes: > I seem to recall Junio noting in the past the inconsistencies in git > about what is a path and what is a pathspec. Is this one of those > inconsistencies, and would it be a positive thing to fix it? We actually never take paths and everything we take is pathspec. One longstanding gripe I had around this area is there are two kinds of pathspecs. "diff-tree" family of pathspecs only match "leading directories" while "dir.c" and "builtin-grep.c" pathspecs allow both "leading directories" and "glob" pathspecs. Teaching "diff-tree" family to also grok globs would be a very nice thing to do, and the listing of paths with ls-files in the sample patch you are removing is indeed a workaround for this issue. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long" 2010-01-05 4:14 ` Jeff King 2010-01-05 5:31 ` Junio C Hamano @ 2010-01-05 12:34 ` Wincent Colaiuta 2010-01-06 12:19 ` Jeff King 1 sibling, 1 reply; 10+ messages in thread From: Wincent Colaiuta @ 2010-01-05 12:34 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git El 05/01/2010, a las 05:14, Jeff King escribió: >> - or, preferably, teach "git diff index" to recurse into directories >> rather than expect a list of paths-of-blobs (possibly with a command >> line switch to activate the behaviour if it were deemed a dangerous >> default) > > Doesn't it already do this? If I say "git diff index subdir" it > will limit the diff only to things inside subdir/. [snip patch] I tried out the patch and it obviously does avoid the "Argument list too long" problem. At least for my usage patterns the superficial differences in behavior that you note would not be a problem (I usually want to limit things to a subdir, and seldom if ever pass in things like '*.c'). > but note that the pathspecs given to ls-files and the path limiters > given to diff are not quite the same. So "git add -i '*.c'" will > currently find "subdir/foo.c", but would not with the above patch. Is > that what you meant when you said "recurse into directories"? In my relative ignorance of the finer details here, I meant that I would want "diff-index" to give us the exact same set of blobs as we get from "ls-files", so as to fix the error without modifying the user visible behavior. As I said, I personally wouldn't be impacted by the change in behavior that your patch produces, but maybe others might. Cheers, Wincent ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: "git add -i" with path gives "Argument list too long" 2010-01-05 12:34 ` Wincent Colaiuta @ 2010-01-06 12:19 ` Jeff King 0 siblings, 0 replies; 10+ messages in thread From: Jeff King @ 2010-01-06 12:19 UTC (permalink / raw) To: Wincent Colaiuta; +Cc: Junio C Hamano, git On Tue, Jan 05, 2010 at 01:34:32PM +0100, Wincent Colaiuta wrote: > >but note that the pathspecs given to ls-files and the path limiters > >given to diff are not quite the same. So "git add -i '*.c'" will > >currently find "subdir/foo.c", but would not with the above patch. Is > >that what you meant when you said "recurse into directories"? > > In my relative ignorance of the finer details here, I meant that I > would want "diff-index" to give us the exact same set of blobs as we > get from "ls-files", so as to fix the error without modifying the > user visible behavior. > > As I said, I personally wouldn't be impacted by the change in > behavior that your patch produces, but maybe others might. I also don't care about the globbing feature, though I suppose some people do. However, I'm not sure add's behavior is all that sensible now. Interactive add respects the globs, but regular add does not. Worse, it seems that it notes that the pathspec is a wildcard and does not even flag an error for failing to find any files. For example: $ git init $ echo content >foo.c $ mkdir subdir && echo content >subdir/bar.c $ git add . $ echo change >foo.c && echo change >subdir/bar.c $ git diff --name-only foo.c subdir/bar.c $ git add foobar ;# should barf, and does fatal: pathspec 'foobar' did not match any files $ git add '*.c' ;# does not barf, but does not respect wildcard $ git diff --name-only foo.c subdir/bar.c $ yes | git add -p '*.c' ;# respects glob $ git diff --name-only ;# empty So it's an consistency that should probably be fixed. And of course it is tempting to fix it by disabling the globs for the interactive case, which would not involve writing any new code. ;) But I don't think it is a good idea to punish people by taking away their feature in the name of consistency. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2010-03-22 1:56 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-21 20:52 "git add -i" with path gives "Argument list too long" Wincent Colaiuta 2010-03-22 0:39 ` Jeff King 2010-03-22 1:23 ` Junio C Hamano 2010-03-22 1:41 ` Jeff King 2010-03-22 1:55 ` Junio C Hamano -- strict thread matches above, loose matches on Subject: below -- 2010-01-04 18:43 Wincent Colaiuta 2010-01-05 4:14 ` Jeff King 2010-01-05 5:31 ` Junio C Hamano 2010-01-05 12:34 ` Wincent Colaiuta 2010-01-06 12:19 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).