* [RFC] git-fetch - repack in the background after fetching
@ 2006-05-30  4:42 Martin Langhoff
  2006-05-30  4:51 ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Langhoff @ 2006-05-30  4:42 UTC (permalink / raw)
  To: git; +Cc: Martin Langhoff

Check whether we have a large set of unpacked objects and repack
after the fetch, but don't force the user to wait for us.

---

There's been some discussion about repacking proactively without
preventing further work. But as Linus said, repacking on an active
repo is _safe_, so repack in the background. 

If we like this approach, we should at least respect a git-repo-config
entry saying core.noautorepack for users who don't want it. I don't
really know if there is any convention for us to check if we are in
a resource-constrained situation (aka laptops on battery). If there
is, we should respect that as well. I suspect anacron and others 
do this already but I can't find any references.

We can potentially do it on commit, merge and push as well. 
---

 git-fetch.sh |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

5498d015eb1062928a504af3c6b3cb9b776088e8
diff --git a/git-fetch.sh b/git-fetch.sh
index 69bd810..4d64cdb 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -424,3 +424,9 @@ case ",$update_head_ok,$orig_head," in
 	fi
 	;;
 esac
+
+if test $(git rev-list --unpacked --all | wc -l) -gt 1000
+then
+	echo "Repacking in the background"
+	nice git repack -a -d -q &
+fi
-- 
1.3.2.g82000

* Re: [RFC] git-fetch - repack in the background after fetching
  2006-05-30  4:42 Martin Langhoff
@ 2006-05-30  4:51 ` Linus Torvalds
  2006-05-30  5:14   ` Martin Langhoff
  2006-05-30  6:37   ` Daniel Barkalow
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-05-30  4:51 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git



On Tue, 30 May 2006, Martin Langhoff wrote:
> 
> There's been some discussion about repacking proactively without
> preventing further work. But as Linus said, repacking on an active
> repo is _safe_

Repacking is, but "-d" is not necessarily.

You really should do the prune-packed only _after_ you've repacked, and 
once no old git programs are around.

Some long-running (in git terms) git programs will look up the pack-files 
when they start, and if you repack after that, they won't see the new 
pack-file, but they _will_ notice that the unpacked files are no longer 
there, and will be very unhappy indeed.

So the "-d" part really isn't necessarily safe.

Of course, in -practice- you won't likely see this, and the archive itself 
is never corrupted, but concurrent git ops can fail due to it in theory, 
and quite frankly, that's not the kind of SCM I like to use.

So either just do "git repack -a", or do things synchronously.

		Linus
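
A minimal sketch of the two options above, assuming a hypothetical helper
named repack_after_fetch and a hypothetical DRY_RUN switch (neither is part
of git). The background branch leaves out "-d", so it only adds a pack and
deletes nothing. The foreground branch keeps "-d" but makes the caller wait;
that only removes the race with the calling script itself, and git processes
started elsewhere could still be caught out.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (hypothetical switch,
# handy for trying the sketch without touching a repository).
run() {
	if test -n "${DRY_RUN:-}"
	then
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical helper: $1 is "background" or anything else for foreground.
repack_after_fetch() {
	if test "$1" = background
	then
		# No -d: a new pack is written, but loose objects and old packs
		# stay where they are, so nothing disappears under another process.
		run nice git repack -a -q &
	else
		# -d removes redundant packs and loose objects, so do it
		# synchronously and let the caller wait for it to finish.
		run git repack -a -d -q
	fi
}
```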

* Re: [RFC] git-fetch - repack in the background after fetching
  2006-05-30  4:51 ` Linus Torvalds
@ 2006-05-30  5:14   ` Martin Langhoff
  2006-05-30  6:37   ` Daniel Barkalow
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Langhoff @ 2006-05-30  5:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Langhoff, git

On 5/30/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Repacking is, but "-d" is not necessarily.

Ok -- strawman knocked down. Next try...

> Some long-running (in git terms) git programs will look up the pack-files
> when they start, and if you repack after that, they won't see the new
> pack-file, but they _will_ notice that the unpacked files are no longer
> there, and will be very unhappy indeed.
>
> So the "-d" part really isn't necessarily safe.
>
> Of course, in -practice- you won't likely see this, and the archive itself
> is never corrupted, but concurrent git ops can fail due to it in theory,
> and quite frankly, that's not the kind of SCM I like to use.

Would it be safe to repack -a && sleep 180 && git prune-packed ?

> So either just do "git repack -a", or do things synchronously.

Which I take to mean 'prune synchronously'. So what about...

+
+if test $(git rev-list --unpacked --all | wc -l) -gt 1000
+then
+       echo "Repacking in the background"
+       git prune-packed
+       nice git repack -a -q &
+fi

This would mean that at any given time there's a bit of overlap
between packed and unpacked objects, but it will be resolved over
repeated commands.

martin

* Re: [RFC] git-fetch - repack in the background after fetching
  2006-05-30  4:51 ` Linus Torvalds
  2006-05-30  5:14   ` Martin Langhoff
@ 2006-05-30  6:37   ` Daniel Barkalow
  2006-05-30 14:53     ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Barkalow @ 2006-05-30  6:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Langhoff, git

On Mon, 29 May 2006, Linus Torvalds wrote:

> Some long-running (in git terms) git programs will look up the pack-files 
> when they start, and if you repack after that, they won't see the new 
> pack-file, but they _will_ notice that the unpacked files are no longer 
> there, and will be very unhappy indeed.

We should be able to fix this, right? If an object isn't found in packs or 
unpacked, look for new packs; if there are any, look for the object in 
them; if it's not there, then give up. The only tricky thing is making it 
possible to scan through the available packs without installing any that 
are already installed. I think the failure case is only a critical path in 
the history-walking fetch code, which should probably disable this (or 
defer it to after trying to download the object).

	-Daniel
*This .sig left intentionally blank*
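
The lookup order described above can be illustrated with a toy model.
Everything in it is an assumption made for illustration: a "pack" is a plain
text file named *.list with one object name per line, loose objects are plain
files, and the function names are hypothetical. Git's real object lookup is
written in C and handles real pack files; only the order of the steps is the
point here.

```shell
#!/bin/sh
# List of "packs" this process knows about, filled in by scan_packs.
known_packs=

# Record the pack lists currently present under <objdir>/pack.
scan_packs() {
	known_packs=$(ls "$1"/pack/*.list 2>/dev/null)
}

# Succeed if object name $1 is listed in any known pack.
in_known_packs() {
	for p in $known_packs
	do
		grep -qx "$1" "$p" && return 0
	done
	return 1
}

# have_object <objdir> <name>: loose file, then known packs, then one rescan.
have_object() {
	objdir=$1
	name=$2
	test -f "$objdir/$name" && return 0
	in_known_packs "$name" && return 0
	# Miss: a repack may have added packs since we last looked.
	old=$known_packs
	scan_packs "$objdir"
	# Retry only if the rescan found something new; otherwise give up.
	test "$known_packs" != "$old" && in_known_packs "$name"
}
```

The rescan happens only on the failure path, so a process that finds all its
objects never pays for it.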

* Re: [RFC] git-fetch - repack in the background after fetching
  2006-05-30  6:37   ` Daniel Barkalow
@ 2006-05-30 14:53     ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-05-30 14:53 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Martin Langhoff, git



On Tue, 30 May 2006, Daniel Barkalow wrote:
> 
> We should be able to fix this, right? If an object isn't found in packs or 
> unpacked, look for new packs; if there are any, look for the object in 
> them; if it's not there, then give up.

Yes. That sounds fine.

		Linus

* [RFC] git-fetch - repack in the background after fetching
@ 2006-06-24 11:30 Martin Langhoff
  2006-06-25  3:12 ` Junio C Hamano
  2006-06-25  3:53 ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: Martin Langhoff @ 2006-06-24 11:30 UTC (permalink / raw)
  To: git, junkio; +Cc: Martin Langhoff

Check whether we have a large set of unpacked objects and repack
after the fetch, but don't force the user to wait for us. Conditional
on core.autorepack != no.

Having 'handle concurrent pruning of packed objects'
(637cdd9d1d997fca34a1fc668fed1311e30fe95f) from Jeff King, it should
be safe to repack and prune in the background.

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>

---

This is a follow up to a similar patch earlier
http://www.gelato.unsw.edu.au/archives/git/0605/21401.html -- is there 
interest in making GIT more friendly to users who don't know or care
about packing and repacking their repos?

I am loath to do this conditionally only on the count of unpacked
objects. If there's a quick'n'dirty way of asking portably whether
the machine is busy or otherwise resource-constrained (e.g. on battery)
it should use it to avoid running repack at inconvenient times.

---
 git-fetch.sh |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/git-fetch.sh b/git-fetch.sh
index 48818f8..7211318 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -427,3 +427,12 @@ case ",$update_head_ok,$orig_head," in
 	fi
 	;;
 esac
+
+if test "$(git-repo-config --get core.autorepack)" != 'no'
+then
+	if test $(git rev-list --unpacked --all | wc -l) -gt 1000
+	then
+		echo "Repacking in the background"
+		nice git repack -a -d -q &
+	fi
+fi
-- 
1.4.1.rc1.g59c8

* Re: [RFC] git-fetch - repack in the background after fetching
  2006-06-24 11:30 Martin Langhoff
@ 2006-06-25  3:12 ` Junio C Hamano
  2006-06-25  3:53 ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2006-06-25  3:12 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git

Martin Langhoff <martin@catalyst.net.nz> writes:

> This is a follow up to a similar patch earlier
> http://www.gelato.unsw.edu.au/archives/git/0605/21401.html -- is there 
> interest in making GIT more friendly to users who don't know or care
> about packing and repacking their repos?

I would be a bit worried about the niced background repack
racing against another instance of itself spawned by the same
parent.

> I am loath to do this conditionally only on the count of unpacked
> objects. If there's a quick'n'dirty way of asking portably whether
> the machine is busy or otherwise resource-constrained (e.g. on battery)
> it should use it to avoid running repack at inconvenient times.

count-objects might be lighter weight than rev-list --unpacked.

If you mean core.autorepack to be a boolean, checking for the
string 'no' is not the right way.

	git repo-config --bool --get core.autorepack

But it does not matter if that variable is a string that is
almost always true unless the value is "no".
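
A sketch of the boolean check, assuming the hypothetical core.autorepack
variable from the patch and using "git config", the current name for what the
thread calls "git repo-config". With --bool, spellings such as "no", "off",
"0" and "false" all come back as the string "false". The function name
autorepack_enabled is made up for this example.

```shell
#!/bin/sh
# Succeed unless core.autorepack (hypothetical variable) is set to a false
# value. An unset variable counts as enabled, matching the patch's default;
# in this sketch a value git cannot parse as a boolean also counts as enabled.
autorepack_enabled() {
	value=$(git config --bool --get core.autorepack 2>/dev/null) || value=true
	test "$value" = true
}
```

A caller would write "if autorepack_enabled; then ...; fi" in place of the
string comparison against 'no'.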

* Re: [RFC] git-fetch - repack in the background after fetching
  2006-06-24 11:30 Martin Langhoff
  2006-06-25  3:12 ` Junio C Hamano
@ 2006-06-25  3:53 ` Linus Torvalds
  2006-06-25  9:25   ` Johannes Schindelin
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2006-06-25  3:53 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git, junkio



On Sat, 24 Jun 2006, Martin Langhoff wrote:
>
> Check whether we have a large set of unpacked objects and repack
> after the fetch, but don't force the user to wait for us. Conditional
> on core.autorepack != no.

I don't think this is safe.

It's also done stupidly.

Instead of asking how many unpacked objects we have with the (expensive) 
git-rev-list, why not just do

	ls "$GIT_DIR/objects/00" | wc -l

which is pretty much guaranteed to be faster and easier.

However, the more worrisome thing about background repacking is that while 
it should be safe against normal users, if you have two _repacks_ at the 
same time, they can decide to remove each other's packs. Yeah, yeah, that's 
pretty damn unlikely, but hey, "pretty damn unlikely" is not "impossible".

Also, I think you'd want to repack with "-l", in case the thing is set up 
with an alternate object directory.

			Linus
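
The sampling idea above can be turned into an estimate. A sketch, assuming a
hypothetical function name estimate_loose_objects: object names are SHA-1
hashes, so they spread about evenly over the 256 fan-out directories
objects/00 to objects/ff, and the count in objects/00 times 256 approximates
the total.

```shell
#!/bin/sh
# estimate_loose_objects <objects-dir>
# Print an estimate of the number of loose objects, based on one of the
# 256 fan-out directories.
estimate_loose_objects() {
	objdir=$1
	if test -d "$objdir/00"
	then
		sample=$(ls "$objdir/00" | wc -l)
	else
		# No objects/00 directory means no loose objects hashed into it.
		sample=0
	fi
	# The arithmetic expansion also drops the padding some wc versions print.
	echo $(( $sample * 256 ))
}
```

In a script that has GIT_DIR set, the test from the patch could then read:
test "$(estimate_loose_objects "$GIT_DIR/objects")" -gt 1000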

* Re: [RFC] git-fetch - repack in the background after fetching
  2006-06-25  3:53 ` Linus Torvalds
@ 2006-06-25  9:25   ` Johannes Schindelin
  2006-06-25 17:29     ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin @ 2006-06-25  9:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Langhoff, git, junkio

Hi,

On Sat, 24 Jun 2006, Linus Torvalds wrote:

> However, the more worrisome thing about background repacking is that while 
> it should be safe against normal users, if you have two _repacks_ at the 
> same time, they can decide to remove each other's packs. Yeah, yeah, that's 
> pretty damn unlikely, but hey, "pretty damn unlikely" is not "impossible".

Why not introduce a lock file for repack?

Ciao,
Dscho

* Re: [RFC] git-fetch - repack in the background after fetching
  2006-06-25  9:25   ` Johannes Schindelin
@ 2006-06-25 17:29     ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-06-25 17:29 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Martin Langhoff, git, junkio



On Sun, 25 Jun 2006, Johannes Schindelin wrote:
> 
> On Sat, 24 Jun 2006, Linus Torvalds wrote:
> 
> > However, the more worrisome thing about background repacking is that while 
> > it should be safe against normal users, if you have two _repacks_ at the 
> > same time, they can decide to remove each other's packs. Yeah, yeah, that's 
> > pretty damn unlikely, but hey, "pretty damn unlikely" is not "impossible".
> 
> Why not introduce a lock file for repack?

You can do that. The problem is, lock-files are really hard to do 
right, and portably. Especially from scripts.

But _I_ think the basic issue is that it's wrong to even try to do this 
background repack.

Git does explicit repacking. That's just how it is. If the worry is that 
people forget to pack often enough, why not just have the "git pull" 
script _tell_ the user, something like

	if [lots of unpacked objects]; then
		echo "You've got a boatload of unpacked objects now."
		echo "Maybe you'd like to repack using"
		echo "   git repack -a -d"
		echo "Thank you for not smoking"
	fi >&2

which is educational on so many levels.

		Linus
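
A runnable variant of the reminder, as a sketch only. The function name
remind_to_repack and the default limit of 1000 (borrowed from the earlier
patch) are assumptions; how the count is obtained is left to the caller.

```shell
#!/bin/sh
# remind_to_repack <count> [<limit>]
# Print a reminder on stderr when <count> loose objects exceed <limit>.
# Nothing is repacked; the user stays in charge of when that happens.
remind_to_repack() {
	count=$1
	limit=${2:-1000}
	if test "$count" -gt "$limit"
	then
		echo "You have about $count unpacked objects."
		echo "Maybe you'd like to repack using"
		echo "   git repack -a -d"
	fi >&2
}
```

Because the message goes to stderr, it does not disturb scripts that parse
the command's standard output.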

* Re: [RFC] git-fetch - repack in the background after fetching
@ 2006-06-25 17:53 linux
  0 siblings, 0 replies; 11+ messages in thread
From: linux @ 2006-06-25 17:53 UTC (permalink / raw)
  To: git

How about a post-fetch hook script that can do this?  With an example
of either printing a message or repacking in the background?

procmail includes a lockfile(1) utility useful for shell scripts, but
it also wouldn't be hard to add a "git-lock-file <file> <command>..."
utility that would create the given lock file, fork the command, and
clean up again when it exited, relaying its exit status.
(I can write one if there's interest.)

I agree with Linus that *defaulting* to background repack has problems,
but it does seem useful to provide enough hooks to easily implement
the option.  Even printing the warning in a script seems like it would
simplify internationalization, and different sites (e.g. reiserfs
developers) might have different policies about what constitutes
"a boatload".
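
The lock-file wrapper described above could look roughly like this as a shell
function. It is a sketch under assumptions: the name with_lock is made up,
the lock is taken with the shell's noclobber option, and stale locks left by
a killed process are not handled, which is part of what makes lock files hard
to get right from scripts. Behaviour on network filesystems is not
considered either.

```shell
#!/bin/sh
# with_lock <lockfile> <command>...
# Create <lockfile>, run the command, remove the lock, and relay the
# command's exit status. Returns 1 without running anything if the lock
# already exists (a real tool would want a distinct code for that case).
with_lock() {
	lockfile=$1
	shift
	# With noclobber (set -C) the redirection fails if the file exists,
	# so taking the lock is a single create-if-absent step.
	if ! (set -C; echo "$$" >"$lockfile") 2>/dev/null
	then
		echo "lock $lockfile is held; not running: $*" >&2
		return 1
	fi
	"$@"
	status=$?
	rm -f "$lockfile"
	return $status
}
```

For the case discussed in this thread, a call might look like:
with_lock "$GIT_DIR/repack.lock" git repack -a -d -q
where repack.lock is a hypothetical file name.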
