* git prune pig slow
@ 2006-07-29 9:02 Russell King
2006-07-29 11:40 ` Johannes Schindelin
2006-07-29 18:14 ` Linus Torvalds
0 siblings, 2 replies; 6+ messages in thread
From: Russell King @ 2006-07-29 9:02 UTC (permalink / raw)
To: git
Hi,
git 1.4.0, P4 2.6GHz, 1GB.
I'm trying to use "git prune" to remove some unreachable objects from
my git tree. However, it appears to be _extremely_ expensive:
rmk 13376 91.3 15.7 165980 161556 pts/0 R+ 09:50 5:14 git-fsck-object
stracing it shows that it's doing lots and lots of brk() calls.
I killed it after 10 minutes and decided to do the job manually -
git-fsck-objects --unreachable and deleting the objects one by one is
_much_ quicker than git-fsck-objects --full --cache --unreachable.
--
Russell King
* Re: git prune pig slow
2006-07-29 9:02 git prune pig slow Russell King
@ 2006-07-29 11:40 ` Johannes Schindelin
2006-07-29 18:14 ` Linus Torvalds
1 sibling, 0 replies; 6+ messages in thread
From: Johannes Schindelin @ 2006-07-29 11:40 UTC (permalink / raw)
To: Russell King; +Cc: git
Hi,
On Sat, 29 Jul 2006, Russell King wrote:
> Hi,
>
> git 1.4.0, P4 2.6GHz, 1GB.
>
> I'm trying to use "git prune" to remove some unreachable objects from
> my git tree. However, it appears to be _extremely_ expensive:
>
> rmk 13376 91.3 15.7 165980 161556 pts/0 R+ 09:50 5:14 git-fsck-object
>
> stracing it shows that it's doing lots and lots of brk() calls.
Does git-count-objects show a high amount of unpacked objects? You should
try "git-repack -a -d" _before_ git-prune, then.
Hth,
Dscho
* Re: git prune pig slow
2006-07-29 9:02 git prune pig slow Russell King
2006-07-29 11:40 ` Johannes Schindelin
@ 2006-07-29 18:14 ` Linus Torvalds
2006-07-29 20:03 ` Linus Torvalds
1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2006-07-29 18:14 UTC (permalink / raw)
To: Russell King; +Cc: git
On Sat, 29 Jul 2006, Russell King wrote:
>
> I killed it after 10 minutes and decided to do the job manually -
> git-fsck-objects --unreachable and deleting the objects one by one is
> _much_ quicker than git-fsck-objects --full --cache --unreachable.
It's also very dangerous.
If you have partial packing (which you can get if you fetch data using
rsync or http, for example), not having the "--full" means that
git-fsck-objects will report on objects being "unreachable" if they are
only reachable from another object that is packed.
Now, in practice, if you only use the git native protocol, this should
never happen, and you're fine. But there's a _very_ real reason why "git
prune" passes the "--full" flag to git-fsck-objects. "git prune" is simply
too dangerous without it.
That said, the current "git prune" in 1.4.2-rc is much faster, because it
does the reachability analysis on its own, and doesn't do all the other
things that git-fsck-objects does.
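
A minimal sketch of what "reachability analysis" means here, written in Python as a toy model and not taken from git's source. The object names, the `objects` mapping, and the functions `reachable` and `prunable` are all hypothetical; the only claim is the general technique: walk the object graph from the roots and treat any loose object the walk never visits as a prune candidate.

```python
from collections import deque

def reachable(objects, roots):
    """Return the set of object ids reachable from `roots`.

    `objects` maps an object id to the list of ids it points to
    (a commit to its tree and parents, a tree to its blobs, and so on).
    """
    seen = set()
    queue = deque(r for r in roots if r in objects)
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        queue.extend(child for child in objects[oid] if child in objects)
    return seen

def prunable(objects, loose, roots):
    """Loose objects that no root can reach; only these are prune candidates."""
    return sorted(loose - reachable(objects, roots))

# Toy repository (hypothetical ids): one commit, its tree and blob,
# plus one orphaned blob that nothing points to.
objects = {
    "commit1": ["tree1"],
    "tree1": ["blob1"],
    "blob1": [],
    "orphan": [],
}
loose = {"commit1", "tree1", "blob1", "orphan"}
print(prunable(objects, loose, roots=["commit1"]))
```

The walk does none of the integrity checking that a full fsck performs, which is the point being made about why a dedicated reachability pass can be cheaper.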
Btw, another alternative to "git prune" is actually to do
git repack -a -d
and then just delete all unpacked objects.
Linus
* Re: git prune pig slow
2006-07-29 18:14 ` Linus Torvalds
@ 2006-07-29 20:03 ` Linus Torvalds
0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2006-07-29 20:03 UTC (permalink / raw)
To: Russell King; +Cc: git
On Sat, 29 Jul 2006, Linus Torvalds wrote:
>
> It's also very dangerous.
>
> If you have partial packing (which you can get if you fetch data using
> rsync or http, for example), not havign the "--full" means that
> git-fsck-objects will report on objects being "unreachable" if they are
> only reachable from another object that is packed.
>
> Now, in practice, if you only use the git native protocol, this should
> never happen, and you're fine.
Side note: in _practice_, it probably doesn't happen even with rsync and
http, so in that sense, it's true that "--full" is almost always likely to
just be a waste of time, and I can't come up with a scenario where you
really need "--full" for pruning unless you did something strange. All the
normal workflows mean that if you have an object that is in a pack,
everything it points to will _also_ be in a pack, and as such, "git prune"
would never remove anything that wasn't safe to remove, even without the
"--full".
But just to get an example of how a _strange_ scenario could happen,
let's say that
- you're tracking an upstream's repository using rsync or http (ie you will
get the objects in the same format that upstream tracks them, either as
individual objects, or as "packs")
- that upstream's repository does _incremental_ repacks every once in a
while.
- the last time you fetched was _just_ before upstream did an incremental
pack, we call this "State A".
As a result, you now have his old state A all as individual
objects in your object database.
- you fetch again, now after upstream has done _two_ incremental packs
(one to pack all the loose objects that you already had, and one to
pack the new state). Upstream is now at "State B"
As a result, you get all of his _new_ objects as one nice pack:
you do not get his other pack, because you already have all
_those_ objects (which are "state A") as individual objects.
- so now, since you're only tracking the other end's state, and have no
objects of your own (in particular, the last fetch/pull did _not_
generate a merge object of your own to connect the new pack with the
old objects), what has happened is that all your heads point into the
new incremental pack you just fetched, and that pack itself will have
pointers to the individual objects that you fetched last time, because
it was an incremental pack to "state A".
- what happens now is that if you run "git-fsck-objects" without the
"--full", it will claim that _all_ of your unpacked objects are
unreachable, because they really are reachable only through that new
pack.
So in this (very very unusual) circumstance, "git prune" without the
"--full" would literally prune away objects that you very much need.
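
The scenario above can be reproduced in a toy model. This is a sketch under stated assumptions, not git's actual fsck logic: it assumes that a walk without "--full" treats a packed object as present but never parses it, so links stored inside a pack are not followed. The object ids, the `packed` set, and the function `reported_unreachable` are hypothetical.

```python
def reported_unreachable(objects, packed, roots, full):
    """Toy model: loose objects that a reachability walk fails to mark.

    When `full` is False the walk does not open packed objects, so links
    that exist only inside a pack are never followed.
    """
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid in packed and not full:
            continue  # packed object is taken on trust and not parsed
        stack.extend(objects[oid])
    loose = set(objects) - packed
    return sorted(loose - seen)

# "State A" arrived as loose objects; "state B" arrived as one pack
# whose commit points back at the state A commit.
objects = {
    "commitB": ["treeB", "commitA"],
    "treeB": ["blobB"],
    "blobB": [],
    "commitA": ["treeA"],
    "treeA": ["blobA"],
    "blobA": [],
}
packed = {"commitB", "treeB", "blobB"}
heads = ["commitB"]  # every head points into the new pack

print(reported_unreachable(objects, packed, heads, full=False))
print(reported_unreachable(objects, packed, heads, full=True))
```

In this model the non-full walk reports every loose state A object as unreachable, while the full walk reports none, which matches the outcome described in the list above.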
I hope this explains why that "unnecessary" (and admittedly much more
expensive) --full is there. It really is unnecessary in practice: partly
because Junio has made "git repack -a -d" so efficient that doing
incremental packs isn't even worth it for most people, and partly because
you probably use the native git protocol and repack yourself, and thus
never use another person's pack directly (which also avoids this problem).
But yeah, the old "git prune" was really very expensive. It's much better
in the current git branch, although it's still not _cheap_ (because it
does do the whole reachability analysis, through all pack-files, because it
wants to get the above special case right).
If we really wanted to, we could add a "core.fullpacks" flag that you
could set, and that would cause the non-native protocols to not work (or
alternatively force a re-pack after they have fetched a pack), and that
would disallow incremental repacking locally, and then we could optimize
the hell out of "git prune" and say that it never needs to look at any
reachability for an object that is already packed.
That would make "git prune" basically instantaneous, the way "git
fsck-objects" is by default. But to be safe, it really needs to have some
per-repository flag that is honored by the other git commands.
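
To make the proposed optimization concrete, here is a hedged Python sketch. "core.fullpacks" is only an idea floated in this message, not an existing git option, and the functions `packs_are_closed` and `prunable_fast` are hypothetical names. The sketch assumes the invariant such a flag would guarantee: a packed object only ever points at other packed objects, so the walk can stop as soon as it reaches a pack.

```python
def packs_are_closed(objects, packed):
    """True if every object referenced from inside a pack is itself packed."""
    return all(child in packed for oid in packed for child in objects[oid])

def prunable_fast(objects, packed, roots):
    """Prune candidates when packs are closed: the walk never enters a pack."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen or oid in packed:
            continue  # packed objects need no reachability work at all
        seen.add(oid)
        stack.extend(objects[oid])
    return sorted(set(objects) - packed - seen)

# Closed case: the old history is packed, the new work is loose.
objects = {
    "commitA": ["treeA"], "treeA": ["blobA"], "blobA": [],
    "commitB": ["treeB", "commitA"], "treeB": ["blobB"], "blobB": [],
    "orphan": [],
}
closed_pack = {"commitA", "treeA", "blobA"}
print(packs_are_closed(objects, closed_pack))
print(prunable_fast(objects, closed_pack, roots=["commitB"]))

# The incremental-pack case from earlier: the pack points at loose objects,
# so the invariant fails and the shortcut must not be used.
open_pack = {"commitB", "treeB", "blobB"}
print(packs_are_closed(objects, open_pack))
```

The check and the shortcut belong together: the fast walk is only safe in a repository where the closure check would pass, which is why a per-repository flag honored by the other commands would be needed.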
Linus
* Re: git prune pig slow
@ 2006-07-29 22:41 linux
2006-07-29 23:48 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: linux @ 2006-07-29 22:41 UTC (permalink / raw)
To: git; +Cc: torvalds
> Btw, another alternative to "git prune" is actually to do
>
> git repack -a -d
>
> and then just delete all unpacked objects.
No, that's dangerous too. The index file is considered part of the root
set for git-fsck-objects, but not for git-repack.
Example script:
$ git-init-db
$ cat > hello.c
#include <stdio.h>

int
main(void)
{
	puts("Hello, world!");
	return 0;
}
$ git-update-index --add hello.c
$ git-repack -a -d
Generating pack...
Done counting 0 objects.
Nothing new to pack.
$ rm .git/objects/67/159ba959e0a0cd6157bf04d5dad66af59383c2
rm: remove write-protected regular file `.git/objects/67/159ba959e0a0cd6157bf04d5dad66af59383c2'? y
$ git commit
error: invalid object 67159ba959e0a0cd6157bf04d5dad66af59383c2
fatal: git-write-tree: error building trees
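
The difference in root sets can be shown with a small Python model of the transcript above. This is an illustrative sketch, not git's code: the function `live` and the variable names are hypothetical, and the only inputs taken from the transcript are the blob id and the fact that hello.c was staged but never committed.

```python
def live(objects, roots):
    """Objects reachable from `roots` in a toy object graph."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(objects[oid])
    return seen

BLOB = "67159ba959e0a0cd6157bf04d5dad66af59383c2"  # hello.c, id from the transcript
objects = {BLOB: []}  # only the staged blob exists; nothing is committed yet
refs = []             # no branch points at any commit
index = [BLOB]        # git-update-index --add recorded the blob here

kept_by_repack = live(objects, refs)        # roots: refs only
kept_by_fsck = live(objects, refs + index)  # roots: refs plus the index

print(BLOB in kept_by_repack)
print(BLOB in kept_by_fsck)
```

With refs as the only roots the staged blob is not considered live, so it is never packed and "delete all unpacked objects" removes it; adding the index to the roots keeps it.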
* Re: git prune pig slow
2006-07-29 22:41 linux
@ 2006-07-29 23:48 ` Linus Torvalds
0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2006-07-29 23:48 UTC (permalink / raw)
To: linux; +Cc: git
On Sat, 29 Jul 2006, linux@horizon.com wrote:
>
> No, that's dangerous too. The index file is considered part of the root
> set for git-fsck-objects, but not for git-repack.
Indeed.
Although at least you won't lose any history - at worst you'll have to
basically do a "git reset HEAD" to make things right again.
I was careful when I wrote the new git-prune to take the index into
account, but I'd forgotten about it wrt the "git repack -a -d" suggestion.
Linus