* Caching directories
From: Pavel Roskin @ 2006-01-24 4:20 UTC
To: git
Hello!
I'm thinking of moving cg-clean functionality to git. After having
switched to StGIT, it's the last cogito command I'm still using. I
think git can do it much better, since it's a recursive command
traversing the whole repository.
To be safe and useful, the new command should distinguish between
tracked and untracked directories. Untracked files in tracked
directories are usually the first target for cleaning, as they are
mostly automatic backups and temporary files. Untracked directories are
more likely candidates to be preserved, as they can hold external
sources, build output, extensive test data etc.
cg-clean considers a directory untracked if it has no cached files in
it. This carries a significant speed penalty, even if not coded in
bash.
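The rule cg-clean applies can be sketched in a few lines. This is a minimal Python illustration, not cg-clean's code: the index is modelled as a plain list of tracked paths, and `is_tracked_dir` and `classify_untracked` are hypothetical names. Each directory check scans the whole list, which is one way such a rule becomes expensive.

```python
from pathlib import PurePosixPath


def is_tracked_dir(directory: str, index_paths: list[str]) -> bool:
    """A directory is tracked if at least one cached path lives under it.

    Hypothetical helper: a linear scan over every index entry.
    """
    prefix = directory.rstrip("/") + "/"
    return any(path.startswith(prefix) for path in index_paths)


def classify_untracked(untracked: list[str], index_paths: list[str]) -> dict[str, str]:
    """Label each untracked file by whether its parent directory is tracked."""
    labels = {}
    for path in untracked:
        parent = str(PurePosixPath(path).parent)
        # Top-level files (parent ".") sit in the repository root,
        # which always counts as tracked.
        if parent == "." or is_tracked_dir(parent, index_paths):
            labels[path] = "in tracked dir"    # likely a backup or temp file
        else:
            labels[path] = "in untracked dir"  # likely worth preserving
    return labels
```

For example, with `Makefile`, `src/main.c` and `src/util.c` in the index, `src/main.c~` lands in a tracked directory while `build/main.o` lands in an untracked one.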
Maybe it's time to start caching directories in git? I mean,
directories corresponding to tree objects could have their stats
recorded in the cache. This would allow us to distinguish between tracked
and untracked directories without scanning them recursively.
--
Regards,
Pavel Roskin
* Re: Caching directories
From: Junio C Hamano @ 2006-01-25 5:52 UTC
To: Pavel Roskin; +Cc: git
Pavel Roskin <proski@gnu.org> writes:
> Maybe it's time to start caching directories in git? I mean,
> directories corresponding to tree objects could have their stats
> recorded in the cache. This would allow us to distinguish between tracked
> and untracked directories without scanning them recursively.
I do not understand the above logic.
Given a directory path, finding out if the directory has
something tracked in it is an O(log n) operation in the current
index that does not "cache directory". Your message implies
that you feel we could use the index file to list "untracked
directories" without recursively scanning the directory tree,
but to me, the only way to do that is to record a new directory
in the index file every time somebody (either Makefile or the
user) creates a junk directory. That does not make much sense
to me, so I am probably misreading what you really meant.
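The O(log n) figure follows from the index being sorted by path: every entry under `dir/` sits in one contiguous run, so a single binary search answers the question. A minimal Python sketch, assuming the index is modelled as a sorted list of ASCII path strings (the real index is a binary file, and `has_tracked_under` is a hypothetical name):

```python
from bisect import bisect_left


def has_tracked_under(directory: str, sorted_paths: list[str]) -> bool:
    """Return True if any path in sorted_paths lies under directory/.

    All paths starting with "directory/" form one contiguous run in a
    sorted list, so binary search finds the first candidate, and one
    prefix check on it settles the answer.
    """
    prefix = directory.rstrip("/") + "/"
    i = bisect_left(sorted_paths, prefix)
    return i < len(sorted_paths) and sorted_paths[i].startswith(prefix)
```

A sibling such as `srcx/a` cannot confuse the check for `src`, because `/` sorts before any letter, so `src/...` entries come first.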
I've been meaning to explore the possibility of recording 0^{40}
SHA1 in the index file to mean "I do not want to place anything
on this path when I write the index out to a tree yet, but keep
an eye on the path in the working tree for me".
You can consider this as an "intent to add"; for example, with
such an index file, you could do something like this:
$ git update-index --intent-to-add foo
This would record a 0^{40} SHA1 with mode 0 in the index at
"foo". Then:
$ git diff-files -p
diff --git a/foo b/foo
new file mode 100644
index 0000000..6690023
--- /dev/null
+++ b/foo
@@ -0,0 +1,24 @@
+...
+....
...
The index has heard about it but does not actually have it, so
diff-files reports an addition. We currently have no such
entries: after a "git add", the index has not just heard about
the file but actually has it, and as a consequence there is no
way to get "new file" out of diff-files.
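The proposed entry can be modelled as data to make the diff-files behaviour concrete. This is a toy Python model of the idea, not git code: `IndexEntry`, `NULL_SHA1` and `diff_files_status` are hypothetical names, an all-zero SHA1 with mode 0 marks "intent to add", and comparing hashes stands in for git's real stat-and-content check.

```python
from dataclasses import dataclass
from typing import Optional

NULL_SHA1 = "0" * 40


@dataclass
class IndexEntry:
    mode: int  # e.g. 0o100644 for a regular file; 0 for "intent to add"
    sha1: str  # 40 hex digits

    @property
    def intent_to_add(self) -> bool:
        return self.mode == 0 and self.sha1 == NULL_SHA1


def diff_files_status(entry: Optional[IndexEntry],
                      worktree_sha1: Optional[str]) -> Optional[str]:
    """Compare one index entry with the working tree (toy model).

    worktree_sha1 is the hash of the working tree file, or None if absent.
    """
    if entry is None:
        # The index never heard of the path: diff-files ignores it.
        return None
    if entry.intent_to_add:
        # Heard about but not held: whatever is on disk is a new file.
        return "new file" if worktree_sha1 is not None else None
    if worktree_sha1 is None:
        return "deleted"
    return "modified" if worktree_sha1 != entry.sha1 else None
```

The contrast with today's behaviour is the last two lines: an entry added with "git add" holds a real SHA1, so the best diff-files can say is "modified" or nothing.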
$ git diff-index --cached HEAD ;# nothing
The index has heard about it, but does not have it. If the HEAD
commit did not have it, diff-index --cached would report
nothing.
$ git diff-index HEAD
diff --git a/foo b/foo
new file mode 100644
index 0000000..6690023
--- /dev/null
+++ b/foo
@@ -0,0 +1,24 @@
+...
+....
...
The index has heard about it, and without --cached it uses the
working tree file, so if HEAD did not have it you would see "new
file" out of diff-index. If the comparison were with a tree
that has "foo" in it, diff-index using an index that does not
have "foo" would not say anything in the current system, but
with "intent to add", it would say "Oh, your index knows about
it so let me look in the working tree; ah, you have something
there. Let me compare it with the version in the tree in
question".
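The diff-index cases can be modelled the same way. A toy Python sketch for a path whose index entry is "intent to add", not git code: `diff_index_intent_to_add` is a hypothetical name, and both "deleted" results are assumptions, since the message only covers a tree that lacks the path or a working tree that has it.

```python
from typing import Optional


def diff_index_intent_to_add(tree_sha1: Optional[str],
                             worktree_sha1: Optional[str],
                             cached: bool) -> Optional[str]:
    """What diff-index might report for an "intent to add" entry (toy model).

    tree_sha1: blob id of the path in the tree being compared, or None.
    worktree_sha1: hash of the working tree file, or None if it is absent.
    """
    if cached:
        # --cached reads only the index, which has heard about the path but
        # does not have it, so the entry behaves as if it were absent.
        # Assumption: a copy in the tree then shows up as a deletion.
        return "deleted" if tree_sha1 is not None else None
    # Without --cached, the entry points diff-index at the working tree.
    if worktree_sha1 is None:
        # Assumption: nothing on disk either, so only the tree side remains.
        return "deleted" if tree_sha1 is not None else None
    if tree_sha1 is None:
        return "new file"
    # The tree has the path: compare its version with the working tree file.
    return "modified" if worktree_sha1 != tree_sha1 else None
```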
One interesting thing the "intent to add" entries would do is
this:
$ git diff-files --abbrev foo
:000000 100644 0000... 0000...
Note that two "0^{40}" mean quite different things. The one on
the LHS means "we've heard about it but we do not have it". On
the other hand, the one on the RHS means "we do not cache the
SHA1 --- go look at the working tree file".
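The two meanings of the zeros can be spelled out by reading only the object-name fields of a raw line. A minimal Python sketch under the proposal, not a parser for git's full raw format: `is_null_name` and `describe_raw_sides` are hypothetical, and trailing dots from abbreviated names are stripped before the check.

```python
def is_null_name(name: str) -> bool:
    """True for an all-zero object name, full or abbreviated ("0000...")."""
    digits = name.rstrip(".")
    return digits != "" and set(digits) == {"0"}


def describe_raw_sides(raw_line: str) -> tuple[str, str]:
    """Explain the two object names on a raw diff-files line (toy sketch)."""
    if not raw_line.startswith(":"):
        raise ValueError("raw diff lines start with ':'")
    fields = raw_line[1:].split()
    if len(fields) < 4:
        raise ValueError("expected ':<mode> <mode> <sha1> <sha1> ...'")
    lhs, rhs = fields[2], fields[3]
    # Left-hand side (the index in diff-files): under the proposal, zeros
    # mean the index has heard about the path but does not have it.
    left = ("heard about, not held (intent to add)" if is_null_name(lhs)
            else f"index blob {lhs.rstrip('.')}")
    # Right-hand side: zeros mean the SHA1 is not cached, so the reader
    # must look at the working tree file itself.
    right = ("not hashed: read the working tree file" if is_null_name(rhs)
             else f"blob {rhs.rstrip('.')}")
    return left, right
```

Feeding it the line `:000000 100644 0000... 0000...` yields the two different readings of the same zeros.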
We might want to represent the existence of a tree that does not
have anything under it using 0^{40} as well. Or it might be better
kept out of the main index entries list as extra data, just as we
have been discussing for storing "bind" entries in the
"Subprojects" thread. I dunno.
I have no idea what 'clean' does, so would not comment on that
part of your message.
* Re: Caching directories
From: Pavel Roskin @ 2006-01-25 7:29 UTC
To: Junio C Hamano; +Cc: git
On Tue, 2006-01-24 at 21:52 -0800, Junio C Hamano wrote:
> Pavel Roskin <proski@gnu.org> writes:
>
> > Maybe it's time to start caching directories in git? I mean,
> > directories corresponding to tree objects could have their stats
> > recorded in the cache. This would allow us to distinguish between tracked
> > and untracked directories without scanning them recursively.
>
> I do not understand the above logic.
>
> Given a directory path, finding out if the directory has
> something tracked in it is an O(log n) operation in the current
> index that does not "cache directory". Your message implies
> that you feel we could use the index file to list "untracked
> directories" without recursively scanning the directory tree,
> but to me, the only way to do that is to record a new directory
> in the index file every time somebody (either Makefile or the
> user) creates a junk directory. That does not make much sense
> to me, so I am probably misreading what you really meant.
Sorry, it looks like my post was based on incorrect assumptions. The
new --directory option to git-ls-files seems to be exactly what I want.
It allowed me to simplify cg-clean immensely. Further simplification
will be possible once the support for .gitignore in parent directories
is fixed.
> I have no idea what 'clean' does, so would not comment on that
> part of your message.
It means removing untracked files and directories.
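With `--directory`, `git ls-files --others` lists a wholly untracked directory once, by name with a trailing slash, instead of listing its contents, so a clean command can tell stray files in tracked directories apart from untracked directories without walking into them. A minimal Python sketch of the policy from the start of the thread, not cg-clean's code: `list_untracked`, `plan_clean` and its `include_dirs` switch are hypothetical, and no exclude options are passed, so ignored files are listed too.

```python
import subprocess


def list_untracked(repo: str = ".") -> list[str]:
    """Run `git ls-files -z --others --directory` and return its entries."""
    result = subprocess.run(
        ["git", "ls-files", "-z", "--others", "--directory"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    # -z separates entries with NUL; the trailing empty string is skipped later.
    return result.stdout.split("\0")


def plan_clean(entries: list[str], include_dirs: bool = False) -> tuple[list[str], list[str]]:
    """Split the listing into paths to remove and directories to keep.

    Untracked files (mostly backups and temporaries) are always removed;
    untracked directories, marked by a trailing "/", are kept unless
    include_dirs is set.
    """
    remove, keep = [], []
    for path in entries:
        if not path:
            continue
        if path.endswith("/") and not include_dirs:
            keep.append(path)
        else:
            remove.append(path)
    return remove, keep
```

Keeping the planning step separate from the git call makes the policy easy to check on a captured listing before anything is deleted.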
--
Regards,
Pavel Roskin