* [RFC] git-fetch - repack in the background after fetching
@ 2006-05-30 4:42 Martin Langhoff
2006-05-30 4:51 ` Linus Torvalds
0 siblings, 1 reply; 11+ messages in thread
From: Martin Langhoff @ 2006-05-30 4:42 UTC (permalink / raw)
To: git; +Cc: Martin Langhoff
Check whether we have a large set of unpacked objects and repack
after the fetch, but don't make the user wait for us.
---
There's been some discussion about repacking proactively without
preventing further work. But as Linus said, repacking on an active
repo is _safe_, so repack in the background.
If we like this approach, we should at least respect a git-repo-config
entry saying core.noautorepack for users who don't want it. I don't
really know if there is any convention for checking whether we are in
a resource-constrained situation (e.g. a laptop on battery). If there
is, we should respect that as well. I suspect anacron and others
do this already but I can't find any references.
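On Linux there is at least one place to look. This is an assumption about the platform, not an established git convention: the sysfs power-supply class, where a mains adapter reports `Mains` in its `type` file and `1` in its `online` file while plugged in. A minimal sketch; the function name `on_battery` and its optional directory argument (there only so it can be tested against a fake tree) are hypothetical:

```shell
# Hypothetical helper: succeed (exit 0) only when at least one mains
# adapter is listed under sysfs and none of them is online. This is a
# Linux-specific assumption; elsewhere it simply reports "not on battery".
on_battery() {
	psdir=${1:-/sys/class/power_supply}
	found_mains=no
	for dev in "$psdir"/*
	do
		# Only consider readable entries whose type is "Mains".
		test -r "$dev/type" || continue
		test "$(cat "$dev/type")" = Mains || continue
		test -r "$dev/online" || continue
		found_mains=yes
		# Any mains adapter that is online means we are plugged in.
		if test "$(cat "$dev/online")" = 1
		then
			return 1
		fi
	done
	# No usable mains adapter found: we cannot tell, so say "no".
	test "$found_mains" = yes
}
```

The fetch script could then wrap the background repack in `if ! on_battery; then ...; fi`. When no mains adapter is visible at all, the sketch deliberately answers "not on battery" rather than guessing.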
We can potentially do it on commit, merge and push as well.
---
git-fetch.sh | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
5498d015eb1062928a504af3c6b3cb9b776088e8
diff --git a/git-fetch.sh b/git-fetch.sh
index 69bd810..4d64cdb 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -424,3 +424,9 @@ case ",$update_head_ok,$orig_head," in
fi
;;
esac
+
+if test $(git rev-list --unpacked --all | wc -l) -gt 1000
+then
+ echo "Repacking in the background"
+ nice git repack -a -d -q &
+fi
--
1.3.2.g82000
* Re: [RFC] git-fetch - repack in the background after fetching
2006-05-30 4:42 Martin Langhoff
@ 2006-05-30 4:51 ` Linus Torvalds
2006-05-30 5:14 ` Martin Langhoff
2006-05-30 6:37 ` Daniel Barkalow
0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-05-30 4:51 UTC (permalink / raw)
To: Martin Langhoff; +Cc: git
On Tue, 30 May 2006, Martin Langhoff wrote:
>
> There's been some discussion about repacking proactively without
> preventing further work. But as Linus said, repacking on an active
> repo is _safe_
Repacking is, but "-d" is not necessarily.
You really should do the prune-packed only _after_ you've repacked, and no
old git programs are around.
Some long-running (in git terms) git programs will look up the pack-files
when they start, and if you repack after that, they won't see the new
pack-file, but they _will_ notice that the unpacked files are no longer
there, and will be very unhappy indeed.
So the "-d" part really isn't necessarily safe.
Of course, in -practice- you won't likely see this, and the archive itself
is never corrupted, but concurrent git ops can fail due to it in theory,
and quite frankly, that's not the kind of SCM I like to use.
So either just do "git repack -a", or do things synchronously.
Linus
* Re: [RFC] git-fetch - repack in the background after fetching
2006-05-30 4:51 ` Linus Torvalds
@ 2006-05-30 5:14 ` Martin Langhoff
2006-05-30 6:37 ` Daniel Barkalow
1 sibling, 0 replies; 11+ messages in thread
From: Martin Langhoff @ 2006-05-30 5:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Martin Langhoff, git
On 5/30/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Repacking is, but "-d" is not necessarily.
Ok -- strawman knocked down. Next try...
> Some long-running (in git terms) git programs will look up the pack-files
> when they start, and if you repack after that, they won't see the new
> pack-file, but they _will_ notice that the unpacked files are no longer
> there, and will be very unhappy indeed.
>
> So the "-d" part really isn't necessarily safe.
>
> Of course, in -practice- you won't likely see this, and the archive itself
> is never corrupted, but concurrent git ops can fail due to it in theory,
> and quite frankly, that's not the kind of SCM I like to use.
Would it be safe to do "git repack -a && sleep 180 && git prune-packed"?
> So either just do "git repack -a", or do things synchronously.
Which I take to mean 'prune synchronously'. So what about...
+
+if test $(git rev-list --unpacked --all | wc -l) -gt 1000
+then
+ echo "Repacking in the background"
+ git prune-packed
+ nice git repack -a -q &
+fi
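This would mean that at any given time there's a bit of overlap
between packed and unpacked objects, but it will be resolved over repeated
commands, since each run prunes the loose objects that an earlier
background repack has already packed.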
martin
* Re: [RFC] git-fetch - repack in the background after fetching
2006-05-30 4:51 ` Linus Torvalds
2006-05-30 5:14 ` Martin Langhoff
@ 2006-05-30 6:37 ` Daniel Barkalow
2006-05-30 14:53 ` Linus Torvalds
1 sibling, 1 reply; 11+ messages in thread
From: Daniel Barkalow @ 2006-05-30 6:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Martin Langhoff, git
On Mon, 29 May 2006, Linus Torvalds wrote:
> Some long-running (in git terms) git programs will look up the pack-files
> when they start, and if you repack after that, they won't see the new
> pack-file, but they _will_ notice that the unpacked files are no longer
> there, and will be very unhappy indeed.
We should be able to fix this, right? If an object isn't found in packs or
unpacked, look for new packs; if there are any, look for the object in
them; if it's not there, then give up. The only tricky thing is making it
possible to scan through the available packs without installing any that
are already installed. I think the not-found case is only on a critical
path in the history-walking fetch code, which should probably disable
this (or defer it to after trying to download the object).
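The lookup order described above can be modelled in a few lines of shell. This is a toy under loudly stated assumptions: a "pack" here is just a hypothetical text file `pack/*.idx` listing object names, loose objects are plain files split after the first two hex digits, and `OBJDIR`, `KNOWN_PACKS`, `install_new_packs`, `search_packs` and `find_object` are illustrative names, not git internals. The point is the order of the lookups and the fact that an already-installed pack is never registered twice:

```shell
# Toy model only (hypothetical layout, not git's real on-disk format):
#   $OBJDIR/pack/*.idx   one "pack" per file, one object name per line
#   $OBJDIR/ab/cdef...   a loose object, split after two hex digits
OBJDIR=${OBJDIR:-.git/objects}
KNOWN_PACKS='|'		# packs installed so far, "|"-separated

is_known() {
	case "$KNOWN_PACKS" in *"|$1|"*) return 0 ;; esac
	return 1
}

# Install every pack on disk that is not installed yet (never twice).
install_new_packs() {
	for idx in "$OBJDIR"/pack/*.idx
	do
		test -f "$idx" || continue
		is_known "$idx" || KNOWN_PACKS="$KNOWN_PACKS$idx|"
	done
}

# Search installed packs for object $1, skipping those listed in $2.
search_packs() {
	for idx in "$OBJDIR"/pack/*.idx
	do
		is_known "$idx" || continue
		case "$2" in *"|$idx|"*) continue ;; esac
		if grep -qxF -- "$1" "$idx"
		then
			return 0
		fi
	done
	return 1
}

find_object() {
	# 1. Packs we already know about, 2. the loose object.
	search_packs "$1" '' && return 0
	rest=${1#??}
	dir=${1%"$rest"}
	test -f "$OBJDIR/$dir/$rest" && return 0
	# 3. Not found: a concurrent repack may have packed and pruned it.
	#    Install only the new packs and look in those before giving up.
	before=$KNOWN_PACKS
	install_new_packs
	search_packs "$1" "$before"
}
```

A long-running process would call `install_new_packs` once at startup; the rescan in step 3 only costs anything when an object is genuinely missing from what it already knows.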
-Daniel
*This .sig left intentionally blank*
* Re: [RFC] git-fetch - repack in the background after fetching
2006-05-30 6:37 ` Daniel Barkalow
@ 2006-05-30 14:53 ` Linus Torvalds
0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-05-30 14:53 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Martin Langhoff, git
On Tue, 30 May 2006, Daniel Barkalow wrote:
>
> We should be able to fix this, right? If an object isn't found in packs or
> unpacked, look for new packs; if there are any, look for the object in
> them; if it's not there, then give up.
Yes. That sounds fine.
Linus
* [RFC] git-fetch - repack in the background after fetching
@ 2006-06-24 11:30 Martin Langhoff
2006-06-25 3:12 ` Junio C Hamano
2006-06-25 3:53 ` Linus Torvalds
0 siblings, 2 replies; 11+ messages in thread
From: Martin Langhoff @ 2006-06-24 11:30 UTC (permalink / raw)
To: git, junkio; +Cc: Martin Langhoff
Check whether we have a large set of unpacked objects and repack
after the fetch, but don't make the user wait for us. Conditional
on core.autorepack != no.
With Jeff King's 'handle concurrent pruning of packed objects'
(637cdd9d1d997fca34a1fc668fed1311e30fe95f) in place, it should
be safe to repack and prune in the background.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
---
This is a follow up to a similar patch earlier
http://www.gelato.unsw.edu.au/archives/git/0605/21401.html -- is there
interest in making GIT more friendly to users who don't know or care
about packing and repacking their repos?
I'm loath to make this conditional only on the count of unpacked
objects. If there's a quick'n'dirty, portable way of asking whether
the machine is busy or otherwise resource-constrained (i.e. on battery),
the script should use it to avoid running repack at inconvenient times.
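For the "is the machine busy" half, one non-portable possibility (assumed here; other systems expose load differently) is Linux's /proc/loadavg, whose first field is the 1-minute load average. A sketch with a hypothetical helper name `machine_busy`; awk does the comparison because test(1) only compares integers:

```shell
# Hypothetical helper: succeed if the 1-minute load average is at or
# above threshold $1. Reads Linux's /proc/loadavg unless $2 names a file.
machine_busy() {
	threshold=$1
	loadfile=${2:-/proc/loadavg}
	test -r "$loadfile" || return 1	# can't tell: assume not busy
	# First field is the 1-minute average, e.g. "0.42"; compare as a number.
	awk -v t="$threshold" '{ exit !($1 >= t) }' "$loadfile"
}
```

Combined with a battery check, the repack could be skipped with something like `machine_busy 2 && exit 0`, where the threshold of 2 is an arbitrary example.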
---
git-fetch.sh | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/git-fetch.sh b/git-fetch.sh
index 48818f8..7211318 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -427,3 +427,12 @@ case ",$update_head_ok,$orig_head," in
fi
;;
esac
+
+if test "$(git-repo-config --get core.autorepack)" != 'no'
+then
+ if test $(git rev-list --unpacked --all | wc -l) -gt 1000
+ then
+ echo "Repacking in the background"
+ nice git repack -a -d -q &
+ fi
+fi
--
1.4.1.rc1.g59c8
* Re: [RFC] git-fetch - repack in the background after fetching
2006-06-24 11:30 Martin Langhoff
@ 2006-06-25 3:12 ` Junio C Hamano
2006-06-25 3:53 ` Linus Torvalds
1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2006-06-25 3:12 UTC (permalink / raw)
To: Martin Langhoff; +Cc: git
Martin Langhoff <martin@catalyst.net.nz> writes:
> This is a follow up to a similar patch earlier
> http://www.gelato.unsw.edu.au/archives/git/0605/21401.html -- is there
> interest in making GIT more friendly to users who don't know or care
> about packing and repacking their repos?
I would be a bit worried about the niced background repack
racing against another instance of itself spawned by the same
parent.
> I'm loath to make this conditional only on the count of unpacked
> objects. If there's a quick'n'dirty, portable way of asking whether
> the machine is busy or otherwise resource-constrained (i.e. on battery),
> the script should use it to avoid running repack at inconvenient times.
count-objects might be lighter weight than rev-list --unpacked.
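A sketch of that suggestion: `git count-objects` prints a one-line summary of the form "N objects, M kilobytes", so the loose-object count is the leading number. The helper name is hypothetical, and it takes the summary as an argument so it can be exercised without a repository:

```shell
# Hypothetical helper: extract the loose-object count from the summary
# printed by "git count-objects", e.g. "1234 objects, 5678 kilobytes".
loose_object_count() {
	summary=$1
	count=${summary%% objects*}
	case "$count" in
	''|*[!0-9]*) echo 0 ;;	# unexpected format: report nothing to pack
	*) echo "$count" ;;
	esac
}
```

In git-fetch.sh the test might then read `test "$(loose_object_count "$(git count-objects)")" -gt 1000`.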
If you mean core.autorepack to be a boolean, checking for the
string 'no' is not the right way; let git interpret the value instead:
git repo-config --bool --get core.autorepack
But it does not matter if that variable is meant to be a string
that counts as true unless its value is exactly "no".
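A sketch of the boolean variant, assuming the value comes from the command above: with `--bool`, git prints `true` or `false` for a set variable and nothing when it is unset, so only an explicit `false` needs to disable the feature. The function name is hypothetical; later git spells the command `git config --bool --get`:

```shell
# Hypothetical helper: $1 is the output of
#   git repo-config --bool --get core.autorepack
# i.e. "true" or "false" when the variable is set, empty when it is not.
# Unset means the default (enabled); only "false" disables autorepack.
autorepack_enabled() {
	test "$1" != false
}
```

Usage: `if autorepack_enabled "$(git repo-config --bool --get core.autorepack)"; then ...; fi`.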
* Re: [RFC] git-fetch - repack in the background after fetching
2006-06-24 11:30 Martin Langhoff
2006-06-25 3:12 ` Junio C Hamano
@ 2006-06-25 3:53 ` Linus Torvalds
2006-06-25 9:25 ` Johannes Schindelin
1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2006-06-25 3:53 UTC (permalink / raw)
To: Martin Langhoff; +Cc: git, junkio
On Sat, 24 Jun 2006, Martin Langhoff wrote:
>
> Check whether we have a large set of unpacked objects and repack
> after the fetch, but don't make the user wait for us. Conditional
> on core.autorepack != no.
I don't think this is safe.
It's also done stupidly.
Instead of asking how many unpacked objects we have with the (expensive)
git-rev-list, why not just do
ls "$GIT_DIR/objects/00" | wc -l
which is pretty much guaranteed to be faster and easier.
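This works because loose objects are spread over 256 subdirectories named after the first two hex digits of their SHA-1, so with uniformly distributed hashes the count in objects/00 is roughly 1/256 of the total. The 1000-object threshold from the patch therefore corresponds to only about 4 entries in that one directory, so the estimate is coarse. A sketch (helper name hypothetical) that scales the sample back up:

```shell
# Hypothetical helper: estimate the number of loose objects by counting
# one of the 256 fan-out directories and multiplying by 256.
# $1 defaults to $GIT_DIR, then to .git.
estimate_loose_objects() {
	gitdir=${1:-${GIT_DIR:-.git}}
	# A missing objects/00 just means no samples: count 0.
	n=$(ls "$gitdir/objects/00" 2>/dev/null | wc -l | tr -d ' ')
	echo $((n * 256))
}
```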
However, the more worrisome thing about background repacking is that while
it should be safe against normal users, if you have two _repacks_ at the
same time, they can decide to remove each other's packs. Yeah, yeah, that's
pretty damn unlikely, but hey, "pretty damn unlikely" is not "impossible".
Also, I think you'd want to repack with "-l", in case the thing is set up
with an alternate object directory.
Linus
* Re: [RFC] git-fetch - repack in the background after fetching
2006-06-25 3:53 ` Linus Torvalds
@ 2006-06-25 9:25 ` Johannes Schindelin
2006-06-25 17:29 ` Linus Torvalds
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin @ 2006-06-25 9:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Martin Langhoff, git, junkio
Hi,
On Sat, 24 Jun 2006, Linus Torvalds wrote:
> However, the more worrisome thing about background repacking is that while
> it should be safe against normal users, if you have two _repacks_ at the
> same time, they can decide to remove each other's packs. Yeah, yeah, that's
> pretty damn unlikely, but hey, "pretty damn unlikely" is not "impossible".
Why not introduce a lock file for repack?
Ciao,
Dscho
* Re: [RFC] git-fetch - repack in the background after fetching
2006-06-25 9:25 ` Johannes Schindelin
@ 2006-06-25 17:29 ` Linus Torvalds
0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-06-25 17:29 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Martin Langhoff, git, junkio
On Sun, 25 Jun 2006, Johannes Schindelin wrote:
>
> On Sat, 24 Jun 2006, Linus Torvalds wrote:
>
> > However, the more worrisome thing about background repacking is that while
> > it should be safe against normal users, if you have two _repacks_ at the
> > same time, they can decide to remove each other's packs. Yeah, yeah, that's
> > pretty damn unlikely, but hey, "pretty damn unlikely" is not "impossible".
>
> Why not introduce a lock file for repack?
You can do that. The problem is, lock-files are really hard to do
right, and portably. Especially from scripts.
But _I_ think the basic issue is that it's wrong to even try to do this
background repack.
Git does explicit repacking. That's just how it is. If the worry is that
people forget to pack often enough, why not just have the "git pull"
script _tell_ the user, something like
if [lots of unpacked objects]; then
echo "You've got a boatload of unpacked objects now."
echo "Maybe you'd like to repack using"
echo " git repack -a -d"
echo "Thank you for not smoking"
fi >&2
which is educational on so many levels.
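A runnable version of that message, as a sketch: the "[lots of unpacked objects]" placeholder is filled in here, as an assumption, with the objects/00 sampling trick and a default threshold of 1000 taken from the patch; the function name `suggest_repack` is hypothetical:

```shell
# Hypothetical helper: nag (on stderr) instead of repacking behind the
# user's back. The threshold test is an assumed stand-in for the
# "[lots of unpacked objects]" placeholder: sample objects/00, scale by 256.
suggest_repack() {
	gitdir=${1:-${GIT_DIR:-.git}}
	threshold=${2:-1000}
	n=$(ls "$gitdir/objects/00" 2>/dev/null | wc -l | tr -d ' ')
	if test $((n * 256)) -gt "$threshold"
	then
		echo "You've got a boatload of unpacked objects now."
		echo "Maybe you'd like to repack using"
		echo "  git repack -a -d"
		echo "Thank you for not smoking"
	fi >&2
}
```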
Linus
* Re: [RFC] git-fetch - repack in the background after fetching
@ 2006-06-25 17:53 linux
0 siblings, 0 replies; 11+ messages in thread
From: linux @ 2006-06-25 17:53 UTC (permalink / raw)
To: git
How about a post-fetch hook script that can do this? With an example
of either printing a message or repacking in the background?
procmail includes a lockfile(1) utility useful for shell scripts, but
it also wouldn't be hard to add a "git-lock-file <file> <command>..."
utility that would create the given lock file, fork the command, and
clean up again when it exited, relaying its exit status.
(I can write one if there's interest.)
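A rough sketch of the proposed utility as a shell function. It is hypothetical (no such git command is assumed to exist) and swaps in the common mkdir(1) trick for procmail's lockfile(1): creating a directory either succeeds atomically or fails because it already exists. It does not detect stale locks left behind by a process killed with SIGKILL, which is one reason lock files are hard to get right from scripts:

```shell
# Hypothetical stand-in for the proposed "git-lock-file <file> <command>...":
# take the lock, run the command, drop the lock, relay the exit status.
# mkdir is the atomic test-and-set; stale locks are NOT cleaned up.
git_lock_file() {
	lock=$1
	shift
	if ! mkdir "$lock" 2>/dev/null
	then
		echo "lock $lock is held by someone else" >&2
		return 75	# EX_TEMPFAIL: try again later
	fi
	# Release the lock if we are interrupted or terminated.
	trap 'rmdir "$lock"; exit 1' INT TERM
	status=0
	"$@" || status=$?
	rmdir "$lock"
	trap - INT TERM
	return $status
}
```

Wrapped around the earlier patch, the background job might become `git_lock_file "$GIT_DIR/repack.lock" nice git repack -a -d -l -q &`, where the lock path is made up for this example.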
I agree with Linus that *defaulting* to background repack has problems,
but it does seem useful to provide enough hooks to easily implement
the option. Even printing the warning in a script seems like it would
simplify internationalization, and different sites (e.g. reiserfs
developers) might have different policies about what constitutes
"a boatload".