* Simulating an empty git repository without having said repository on disk
From: Richard Hartmann @ 2012-01-16 18:34 UTC
To: Git List

Hi all,

for vcsh[1], I need a rather hackish feature: List all files untracked
by vcsh. The plan to achieve this is:

Get lists of all files by all repos whose GIT_WORK_TREE is in one
directory ($HOME, by default), merge all lists into one and use that
as a .gitignore or exclude. Then run `git status` with $GIT_WORK_TREE
pointing to $HOME while using said ignore/exclude. That will give me a
list of all files & directories which are not tracked by any of the
git repos managed by vcsh.

I could create an empty git repo to run this operation in, but that
seems wasteful. Thus, I would prefer to run this command against a
non-existing, empty git repo. Problem is, I could not find a way to do
this.

I know the empty tree's SHA is hard-coded into git so I was hoping
there would be a way to trick git using this, but I couldn't find a
way.

Any and all help appreciated, even if it's just a "no, this is not possible"

Thanks,
Richard

[1] https://github.com/RichiH/vcsh

* Re: Simulating an empty git repository without having said repository on disk
From: Jeff King @ 2012-01-16 20:41 UTC
To: Richard Hartmann; +Cc: Git List

On Mon, Jan 16, 2012 at 07:34:04PM +0100, Richard Hartmann wrote:

> for vcsh[1], I need a rather hackish feature: List all files untracked
> by vcsh. The plan to achieve this is:
>
> Get lists of all files by all repos whose GIT_WORK_TREE is in one
> directory ($HOME, by default), merge all lists into one and use that
> as a .gitignore or exclude. Then run `git status` with $GIT_WORK_TREE
> pointing to $HOME while using said ignore/exclude. That will give me a
> list of all files & directories which are not tracked by any of the
> git repos managed by vcsh.

I don't use vcsh, but I seem to recall that it works by overlaying the
working trees of different repositories on each other, right? So you
can't just say "oh, files in foo/ belong to repository 'bar'". You must
take the union of the set of tracked files from all repos, then find the
difference of that from the set of all files.

Can individual repos mark things as excluded, too? Or do you have a
master exclusion list? I.e., if I decide that I don't want "foo" tracked
at all, how do I tell vcsh?

> I could create an empty git repo to run this operation in, but that
> seems wasteful. Thus, I would prefer to run this command against a
> non-existing, empty git repo. Problem is, I could not find a way to do
> this.
>
> I know the empty tree's SHA is hard-coded into git so I was hoping
> there would be a way to trick git using this, but I couldn't find a
> way.

I'm not sure why you care about the empty tree if you are only looking
at untracked files.
Or perhaps the problem is that you are using "git status", which
fundamentally cares about looking at differences between HEAD and the
index, even though you don't care in this case. In that case, maybe
"git ls-files -o" would be more appropriate?

The most straightforward way in git would be to generate a temporary
index that mentions all of the tracked files, like this:

  # $repos is a placeholder for the list of repository git dirs
  tmp=/some/tmp/index
  rm -f $tmp
  for repo in $repos; do
    git --git-dir=$repo ls-files -z |
    GIT_INDEX_FILE=$tmp xargs -0 git update-index --add
  done
  GIT_INDEX_FILE=$tmp git ls-files -o

but that is very close to your "create an empty git repo" (in fact, you
might even need to in order for update-index to be happy). But it would
give you a place to put a master exclusion list (you would use it as
--exclude=... in the final ls-files).

If you have per-repo exclusion lists, then you could take a different
approach, and simply run "git ls-files -o" for each repository. Any
files which appeared in _every_ output would be untracked (since tracked
files or individually-excluded files would be omitted from at least one
repo). Like:

  # get the list of untracked files from each repo's perspective
  count=0
  for repo in $repos; do
    count=$(($count + 1))
    git --git-dir=$repo ls-files -o
  done >output

  # now count how many times each entry appears. Truly untracked things
  # appear $count times.
  sort <output |
  uniq -c |
  perl -lne "/^\s*$count (.*)/ and print \$1"

The downside is that you are doing $count traversals of the untracked
directories. On an OS with a reasonable lstat and a directory structure
that fits into cache, that is probably not too big a deal, though.

> Any and all help appreciated, even if it's just a "no, this is not possible"

I took a lot of guesses at exactly what you want. It might be more clear
if you gave us an example situation along with the output you expect.

-Peff

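A minimal sketch of the counting step described above, using only sort,
uniq and awk. The function name common_lines and the layout of one list
file per repository are assumptions made for illustration, not something
from the message. It also assumes each path appears at most once per
list and that no path contains a newline.

```shell
# Print the lines that occur in every one of the given list files.
# common_lines is a hypothetical helper; each argument is a file holding
# one path per line, e.g. the saved "ls-files -o" output of one repo.
common_lines() {
    n=$#    # number of lists; a line present in all of them is counted n times
    LC_ALL=C sort -- "$@" |
    uniq -c |
    # uniq -c prefixes each line with "<spaces><count><space>"; keep the
    # lines whose count equals n and strip that prefix again.
    awk -v n="$n" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

Called as "common_lines list1 list2 list3", it prints only the paths
present in all three files. Matching on the count field rather than on a
pattern keeps paths that contain spaces intact.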
* Re: Simulating an empty git repository without having said repository on disk
From: Richard Hartmann @ 2012-01-16 23:09 UTC
To: Jeff King; +Cc: Git List

On Mon, Jan 16, 2012 at 21:41, Jeff King <peff@peff.net> wrote:

> I don't use vcsh, but I seem to recall that it works by overlaying the
> working trees of different repositories on each other, right?

In parallel, but yes.

> So you
> can't just say "oh, files in foo/ belong to repository 'bar'". You must
> take the union of the set of tracked files from all repos, then find the
> difference of that from the set of all files.

Correct.

> Can individual repos mark things as excluded, too? Or do you have a
> master exclusion list? I.e., if I decide that I don't want "foo" tracked
> at all, how do I tell vcsh?

That's something I am still contemplating as there are several ways:

* excludes
* pre-/appends to the gitignore of every repo
* runtime magic

Feedback welcome :)

> I'm not sure why you care about the empty tree if you are only looking
> at untracked files. Or perhaps the problem is that you are using "git
> status", which fundamentally cares about looking at differences between
> HEAD and the index, even though you don't care in this case. In that case,
> maybe "git ls-files -o" would be more appropriate?

--others does not work as I need to look at several repos. I tried to
get the union of --others, but that creates 'argument too large' very
quickly. Initially, I tried with find, but as that is depth-first, it
takes ages when compared to git's early stopping at directories.

> The most straightforward way in git would be to generate a temporary
> index that mentions all of the tracked files, like this:
>
>   tmp=/some/tmp/index
>   rm -f $tmp
>   for repo in $repos; do
>     git --git-dir=$repo ls-files -z |
>     GIT_INDEX_FILE=$tmp xargs -0 git update-index --add
>   done
>   GIT_INDEX_FILE=$tmp git ls-files -o
>
> but that is very close to your "create an empty git repo" (in fact, you
> might even need to in order for update-index to be happy). But it would
> give you a place to put a master exclusion list (you would use it as
> --exclude=... in the final ls-files).
>
> If you have per-repo exclusion lists, then you could take a different
> approach, and simply run "git ls-files -o" for each repository. Any
> files which appeared in _every_ output would be untracked (since tracked
> files or individually-excluded files would be omitted from at least one
> repo). Like:

See above, but I will try yours as well.

>   perl -lne "/^\s*$count (.*)/ and print \$1"

I know I sound picky, but I would also like to avoid any third-party
dependencies if possible. Perl is common, but not installed everywhere.

> The downside is that you are doing $count traversals of the untracked
> directories. On an OS with a reasonable lstat and a directory structure
> that fits into cache, that is probably not too big a deal, though.

With cold cache, it can take ages. Especially once you have a few
git-annex repos in $HOME.

> I took a lot of guesses at exactly what you want. It might be more clear
> if you gave us an example situation along with the output you expect.

repo foo tracks .foo and .foo.d, bar .bar, etc

  % vcsh list  # lists repos
  foo
  bar
  baz
  % ls -aR
  .foo
  .foo.d/
  .bar
  .baz.d/
  .quux.d/
  .quux.d/foo
  .quux.d/quux.d/quux
  .quux.d/quux.d/quuux
  .quux.d/quux.d/quuuux
  .quux.d/quuuux.d/quuuux
  pants
  shirts
  % vcsh run foo git ls-files  # run command in context of repo foo
  .foo
  .foo.d
  .quux.d/foo
  % vcsh list-untracked  # with the code I want in it
  .quux.d/quux.d/
  .quux.d/quuuux.d/
  pants
  shirts
  %

I hope that makes sense. The only part that does not already work today
is list-untracked.

For failed attempts look at
https://github.com/RichiH/vcsh/tree/list-untracked
https://github.com/RichiH/vcsh/tree/list-untracked-2

Thanks,
Richard
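
The set operation agreed on earlier in the thread (take the union of
every repo's tracked files, then subtract it from the set of all files)
can be sketched with sort and comm alone, without perl. The function
name untracked_paths and the list files it reads are hypothetical, and
mktemp is assumed to be available. How the all-files list gets produced
is left open here. Unlike the list-untracked output wanted above, this
works path by path and does not collapse wholly untracked directories
into a single entry.

```shell
# Print the paths from the "all files" list that no tracked list mentions.
# untracked_paths is a hypothetical helper.
#   $1      file listing every path under the work tree, one per line
#   $2...   one file per repo listing that repo's tracked paths
untracked_paths() {
    all=$1
    shift
    tracked=$(mktemp) || return 1
    sorted_all=$(mktemp) || return 1
    # Union of all tracked lists; comm needs both inputs sorted the same way.
    LC_ALL=C sort -u -- "$@" > "$tracked"
    LC_ALL=C sort -u -- "$all" > "$sorted_all"
    # -23 drops lines found only in the tracked union and lines found in
    # both inputs, leaving the paths that appear only in the all-files list.
    LC_ALL=C comm -23 "$sorted_all" "$tracked"
    rm -f "$tracked" "$sorted_all"
}
```

Called as "untracked_paths all.list foo.list bar.list baz.list", with
one tracked list per repository, it prints every path that none of the
repos track.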