git.vger.kernel.org archive mirror
* Remove unneeded packs
@ 2005-11-12 13:04 Marcel Holtmann
  2005-11-12 13:13 ` Andreas Ericsson
  2005-11-12 13:40 ` Craig Schlenter
  0 siblings, 2 replies; 18+ messages in thread
From: Marcel Holtmann @ 2005-11-12 13:04 UTC (permalink / raw)
  To: git

Hi guys,

every time Linus re-creates the pack for his linux-2.6 tree, I end up
with another pack. I use HTTP as transport and thus the new pack will be
downloaded (which is almost 100 MB), but that is fine. However, it seems
that the old (previous) pack is never deleted. For object files that are
no longer needed I can use git-prune-packed, but the old pack I have to
identify and delete by myself. Is there an easy and nice way to get rid
of old unneeded packs? Can't git-prune-packed also do this job?

Regards

Marcel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Remove unneeded packs
  2005-11-12 13:04 Remove unneeded packs Marcel Holtmann
@ 2005-11-12 13:13 ` Andreas Ericsson
  2005-11-12 13:30   ` Marcel Holtmann
  2005-11-12 13:40 ` Craig Schlenter
  1 sibling, 1 reply; 18+ messages in thread
From: Andreas Ericsson @ 2005-11-12 13:13 UTC (permalink / raw)
  To: git

Marcel Holtmann wrote:
> Hi guys,
> 
> every time Linus re-creates the pack for his linux-2.6 tree, I end up
> with another pack. I use HTTP as transport and thus the new pack will be
> downloaded (which is almost 100 MB), but that is fine. However, it seems
> that the old (previous) pack is never deleted. For object files that are
> no longer needed I can use git-prune-packed, but the old pack I have to
> identify and delete by myself. Is there an easy and nice way to get rid
> of old unneeded packs? Can't git-prune-packed also do this job?
> 

A patchset was posted to the list 2005-11-09 by Lukas Sandström, adding 
"git-pack-intersect" which was subsequently renamed to the more 
appropriate "git-pack-redundant".

If I remember the commit messages and understand your question correctly 
it does what you want.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: Remove unneeded packs
  2005-11-12 13:13 ` Andreas Ericsson
@ 2005-11-12 13:30   ` Marcel Holtmann
  2005-11-12 22:02     ` Lukas Sandström
  0 siblings, 1 reply; 18+ messages in thread
From: Marcel Holtmann @ 2005-11-12 13:30 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]

Hi Andreas,

> > every time Linus re-creates the pack for his linux-2.6 tree, I end up
> > with another pack. I use HTTP as transport and thus the new pack will be
> > downloaded (which is almost 100 MB), but that is fine. However, it seems
> > that the old (previous) pack is never deleted. For object files that are
> > no longer needed I can use git-prune-packed, but the old pack I have to
> > identify and delete by myself. Is there an easy and nice way to get rid
> > of old unneeded packs? Can't git-prune-packed also do this job?
> > 
> 
> A patchset was posted to the list 2005-11-09 by Lukas Sandström, adding 
> "git-pack-intersect" which was subsequently renamed to the more 
> appropriate "git-pack-redundant".
> 
> If I remember the commit messages and understand your question correctly 
> it does what you want.

You are right. It is exactly what I was looking for. I just saw it a few
minutes ago, when I pulled the latest git tree. However, to make an old
GCC 2.95 happy (it does not accept a declaration after a statement), the
attached patch is needed.

I am not sure if it is fully working. It deletes a lot of old packs, but
in the case of the linux-2.6 tree it leaves one additional pack behind.

.git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.idx
.git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.pack
.git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.pack
.git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.idx

The 4d76... is the current pack, but the b3c6... is an old one that is
not needed anymore.

Regards

Marcel


[-- Attachment #2: patch --]
[-- Type: text/x-patch, Size: 440 bytes --]

diff --git a/pack-redundant.c b/pack-redundant.c
index 1f8c577..4ed974e 100644
--- a/pack-redundant.c
+++ b/pack-redundant.c
@@ -358,11 +358,11 @@ size_t sizeof_union(struct packed_git *p
 size_t get_pack_redundancy(struct pack_list *pl)
 {
 	struct pack_list *subset;
+	size_t ret = 0;
 
 	if (pl == NULL)
 		return 0;
 
-	size_t ret = 0;
 	while ((subset = pl->next)) {
 		while(subset) {
 			ret += sizeof_union(pl->pack, subset->pack);


* Re: Remove unneeded packs
  2005-11-12 13:04 Remove unneeded packs Marcel Holtmann
  2005-11-12 13:13 ` Andreas Ericsson
@ 2005-11-12 13:40 ` Craig Schlenter
  2005-11-12 13:59   ` Balanced packing strategy Petr Baudis
  1 sibling, 1 reply; 18+ messages in thread
From: Craig Schlenter @ 2005-11-12 13:40 UTC (permalink / raw)
  To: git

On 12 Nov 2005, at 3:04 PM, Marcel Holtmann wrote:

> every time Linus re-creates the pack for his linux-2.6 tree, I end up
> with another pack. I use HTTP as transport and thus the new pack will be
> downloaded (which is almost 100 MB), but that is fine.
> [snip]

The 100MB situation is not cool for those of us on a tight bandwidth
budget or slow links. Can anyone tell me if the native git protocol is
any better at this stuff please?

Thanks,

--Craig


* Balanced packing strategy
  2005-11-12 13:40 ` Craig Schlenter
@ 2005-11-12 13:59   ` Petr Baudis
  2005-11-12 15:14     ` Craig Schlenter
  2005-11-13 20:06     ` Josef Weidendorfer
  0 siblings, 2 replies; 18+ messages in thread
From: Petr Baudis @ 2005-11-12 13:59 UTC (permalink / raw)
  To: Craig Schlenter; +Cc: git

Dear diary, on Sat, Nov 12, 2005 at 02:40:50PM CET, I got a letter
where Craig Schlenter <craig@codefountain.com> said that...
> On 12 Nov 2005, at 3:04 PM, Marcel Holtmann wrote:
> 
> >every time Linus re-creates the pack for his linux-2.6 tree, I end up
> >with another pack. I use HTTP as transport and thus the new pack will be
> >downloaded (which is almost 100 MB), but that is fine.
> >[snip]
> 
> The 100MB situation is not cool for those of us on a tight bandwidth
> budget or slow links. Can anyone tell me if the native git protocol is
> any better at this stuff please?

Yes, the native GIT protocol transfers only the objects you need.

But the 100MB situation is still bad. FWIW, this is the proposal I sent
about a month ago to a packs-related discussion on the kernel.org
mailing list (OK, I have updated it a little):


The repacking should be done in a way that minimizes the overhead for
dumb transport users. Ideal for this is a structure like the following
(as of the end of October):

	year2003.pack
	year2004.pack
	halfyear2004-2.pack
	halfyear2005-1.pack
	month4.pack
	month5.pack
	month6.pack
	month7.pack
	month8.pack
	month9.pack
	week37.pack
	week38.pack
	week39.pack
	week40.pack
	week41.pack
	week42.pack
	week43.pack
	<individual objects for weeks 43, 44>


This has the property that the second half of a given pack is also
covered by packs one level finer in granularity. This is a relatively
high overhead (it can be balanced by only keeping the last third or
whatever), but it is designed to reduce the overhead of fetching packs
over dumb transport. E.g. if it is almost the end of July and you last
fetched at the start of June, you will not have to get the whole
halfyear2005-1 pack, but can catch up by just fetching the month6 pack
and then a few week packs.

For the autopacker (which should ideally be run by a cron job), this
means packing a new week pack each week and getting rid of a week's worth
of objects, packing a new month pack each month and getting rid of a
month's worth of objects, etc.
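A minimal sketch of how a dumb-transport client might choose packs under this layout, assuming each pack is described by a hypothetical day range [start, end) (day 0 = 1 January). Nothing like struct pack_range or plan_fetch exists in git; the greedy rule "take the narrowest pack that covers the current point" is a simplification chosen for illustration and does not necessarily minimize the bytes downloaded:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a pack on a dumb server. */
struct pack_range {
	const char *name;
	int start; /* first day covered (day 0 = 1 January) */
	int end;   /* first day NOT covered */
};

/* Greedy plan: starting at the last fetch, repeatedly take the narrowest
 * pack that covers the current day, so a recent fetcher gets month/week
 * packs instead of a half-year pack. Stops when no pack covers the cursor
 * (or out is full); *loose_from is then the day from which individual
 * objects must be fetched. Returns the number of packs written to out. */
size_t plan_fetch(const struct pack_range *packs, size_t nr,
		  int last_fetch, int now,
		  const struct pack_range **out, size_t max_out,
		  int *loose_from)
{
	size_t n = 0;
	int cursor = last_fetch; /* everything before this is already local */

	assert(loose_from != NULL);
	while (cursor < now && n < max_out) {
		const struct pack_range *best = NULL;

		for (size_t i = 0; i < nr; i++) {
			const struct pack_range *p = &packs[i];
			int covers = p->start <= cursor && cursor < p->end;

			if (covers && (best == NULL ||
				       p->end - p->start < best->end - best->start))
				best = p;
		}
		if (best == NULL)
			break; /* not packed yet: fall back to individual objects */
		out[n++] = best;
		cursor = best->end;
	}
	*loose_from = cursor;
	return n;
}
```

With packs halfyear2005-1 (days 0-180), month6 (151-180) and three July week packs, a client that last fetched on day 152 (2 June) and runs on day 207 (27 July) would get month6 plus the three week packs, then fetch individual objects from day 202 on.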

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.


* Re: Balanced packing strategy
  2005-11-12 13:59   ` Balanced packing strategy Petr Baudis
@ 2005-11-12 15:14     ` Craig Schlenter
  2005-11-13  2:34       ` Junio C Hamano
  2005-11-13 20:06     ` Josef Weidendorfer
  1 sibling, 1 reply; 18+ messages in thread
From: Craig Schlenter @ 2005-11-12 15:14 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

On 12 Nov 2005, at 3:59 PM, Petr Baudis wrote:

> Dear diary, on Sat, Nov 12, 2005 at 02:40:50PM CET, I got a letter
> where Craig Schlenter <craig@codefountain.com> said that...
>> The 100MB situation is not cool for those of us on a tight bandwidth
>> budget or slow links. Can anyone tell me if the native git protocol is
>> any better at this stuff please?
>
> Yes, the native GIT protocol transfers only the objects you need.

Ah, magic, thanks!

> But the 100MB situation is still bad. FWIW, this is the proposal I sent
> about a month ago to a packs-related discussion on the kernel.org
> mailing list (OK, I have updated it a little):

It would be nice if there was some meaningful automatic packing that
didn't hurt "non-git-aware protocol" users.

Does the pack index file contain enough information to enable a client
to send http byte range requests to grab individual objects from a pack?
It does seem to store object offsets but maybe I'm missing something ...
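As a sketch of why the offsets are enough to locate an object, assuming they have already been read out of the .idx into an array: pack entries are stored back to back after the 12-byte pack header, and a .pack file ends with a 20-byte SHA-1 checksum, so an object's bytes run from its own offset up to the next larger offset (or up to the trailer for the last object). The function name object_range is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define PACK_TRAILER 20 /* a .pack file ends with a 20-byte SHA-1 checksum */

/* Hypothetical helper: given every object offset listed in a pack's .idx
 * (in any order) and the size of the .pack file, compute the half-open
 * byte range [start, end) occupied by the object stored at `off`. */
void object_range(const unsigned long *offsets, size_t nr,
		  unsigned long pack_size, unsigned long off,
		  unsigned long *start, unsigned long *end)
{
	unsigned long next = pack_size - PACK_TRAILER;

	assert(pack_size > PACK_TRAILER && off < next);
	/* The object ends where the closest following object begins. */
	for (size_t i = 0; i < nr; i++)
		if (offsets[i] > off && offsets[i] < next)
			next = offsets[i];
	*start = off;
	*end = next;
}
```

HTTP byte ranges are inclusive at both ends, so the request for such an object would use Range: bytes=start-(end-1).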

Thank you,

--Craig


* Re: Remove unneeded packs
  2005-11-12 13:30   ` Marcel Holtmann
@ 2005-11-12 22:02     ` Lukas Sandström
  2005-11-12 22:13       ` Marcel Holtmann
  2005-11-13  2:38       ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Lukas Sandström @ 2005-11-12 22:02 UTC (permalink / raw)
  To: git; +Cc: Marcel Holtmann

Marcel Holtmann wrote:
> you are right. It is exactly what I was looking for. I just saw it some
> minutes ago, when I pulled the latest git tree. However to make an old
> GCC 2.95 happy, the attached patch is needed.
> 
> I am not sure if it is fully working. It deletes a lot of old packs, but
> in the case of the linux-2.6 tree it leaves one additional pack behind.
> 
> .git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.idx
> .git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.pack
> .git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.pack
> .git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.idx
> 
> The 4d76... is the current pack, but the b3c6... is an old one that is
> not needed anymore.
> 
> Regards
> 
> Marcel

This is most likely because the pack b3c6... contains unreachable objects.
git-pack-redundant only makes sure that all objects present in packfiles
are still present in packfiles after the redundant packs have been removed.

Thus, unreachable objects are also considered required.

Note that I haven't checked whether this is the cause in this particular
case, but I have the same packfiles (I use the HTTP transport too).

I'm thinking of passing a list of objects to be ignored on stdin to
git-pack-redundant. This would hopefully solve this problem.
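A toy model of the rule described above, including the proposed ignore list, could look like the following. It assumes objects are small integer ids and uses a simple greedy pass; this is not the algorithm in pack-redundant.c, and struct pack and mark_redundant are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a pack is just an array of object ids. */
struct pack {
	const int *objs;
	size_t nr;
	bool removed;
};

static bool in_pack(const struct pack *p, int obj)
{
	for (size_t i = 0; i < p->nr; i++)
		if (p->objs[i] == obj)
			return true;
	return false;
}

static bool is_ignored(const int *ignore, size_t nr_ignore, int obj)
{
	for (size_t i = 0; i < nr_ignore; i++)
		if (ignore[i] == obj)
			return true;
	return false;
}

/* Greedy pass: drop a pack when every object it holds (except ignored
 * ones, e.g. known-unreachable objects) is still available from some other
 * pack that is being kept. Returns the number of packs dropped. */
size_t mark_redundant(struct pack *packs, size_t nr_packs,
		      const int *ignore, size_t nr_ignore)
{
	size_t dropped = 0;

	assert(ignore != NULL || nr_ignore == 0);
	for (size_t p = 0; p < nr_packs; p++) {
		bool covered = true;

		for (size_t i = 0; i < packs[p].nr && covered; i++) {
			int obj = packs[p].objs[i];
			bool found = false;

			if (is_ignored(ignore, nr_ignore, obj))
				continue;
			for (size_t q = 0; q < nr_packs && !found; q++)
				if (q != p && !packs[q].removed &&
				    in_pack(&packs[q], obj))
					found = true;
			covered = found;
		}
		if (covered) {
			packs[p].removed = true;
			dropped++;
		}
	}
	return dropped;
}
```

If only the old pack holds an unreachable object, nothing is dropped; once that object is in the ignore list, the old pack becomes redundant while the new one is kept, which matches the situation suspected above.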

/Lukas Sandström


* Re: Remove unneeded packs
  2005-11-12 22:02     ` Lukas Sandström
@ 2005-11-12 22:13       ` Marcel Holtmann
  2005-11-13  2:38       ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Marcel Holtmann @ 2005-11-12 22:13 UTC (permalink / raw)
  To: Lukas Sandström; +Cc: git

Hi Lukas,

> > you are right. It is exactly what I was looking for. I just saw it some
> > minutes ago, when I pulled the latest git tree. However to make an old
> > GCC 2.95 happy, the attached patch is needed.
> > 
> > I am not sure if it is fully working. It deletes a lot of old packs, but
> > in the case of the linux-2.6 tree it leaves one additional pack behind.
> > 
> > .git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.idx
> > .git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.pack
> > .git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.pack
> > .git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.idx
> > 
> > The 4d76... is the current pack, but the b3c6... is an old one that is
> > not needed anymore.
> 
> This is most likely because the pack b3c6... contains unreachable objects.
> git-pack-redundant only makes sure that all objects present in packfiles
> still are present in packfiles after the redundant packs have been removed.
> 
> Thus, unreachable objects will also be considered as required.
> 
> Note that I haven't checked if this is the cause in this particular case,
> but I have the same packfiles (I use the HTTP transport too).

Maybe these packs are from an earlier bad update. The cloned repository
where I found it is actually quite old. When I checked some other
repositories, it seemed to work perfectly.

> I'm thinking of the possibility of passing a list of objects to be ignored
> on stdin to git-pack-redundant. This would hopefully solve this problem.

Sounds good, but I don't even know which objects are involved in this
case and stop the pack from being marked as redundant.

Regards

Marcel


* Re: Balanced packing strategy
  2005-11-12 15:14     ` Craig Schlenter
@ 2005-11-13  2:34       ` Junio C Hamano
  2005-11-13 11:00         ` Petr Baudis
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2005-11-13  2:34 UTC (permalink / raw)
  To: Craig Schlenter; +Cc: git

Craig Schlenter <craig@codefountain.com> writes:

> Does the pack index file contain enough information to enable a client
> to send http byte range requests to grab individual objects from a pack?
> It does seem to store object offsets...

Yes, it is certainly doable; there is enough information.  I am
not sure if it is worth the complexity, though.

Many objects are stored deltified, so your byte range requests
would return a delta and the name of its base object.  After you
read what was returned and find out the base object name, you
would need to get it, which can be another delta against its base
object.  This would turn untangling a delta chain into a
serialized sequence of requests.
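To make that cost concrete, here is a toy model in which each entry records only the index of its delta base (standing in for the base object's name), or -1 for a full object. The struct and function names are hypothetical; the point is that each response only reveals the next base to request, so the requests cannot be issued in parallel:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pack entry as seen by the client. */
struct entry {
	int base; /* index of the delta base, or -1 for a full object */
};

/* Count the byte-range requests needed to reconstruct object `obj`.
 * Returns -1 on a bad index or if the chain exceeds max_depth
 * (e.g. a corrupt, cyclic chain). */
int serialized_requests(const struct entry *pack, size_t nr, int obj,
			int max_depth)
{
	int requests = 0;

	assert(max_depth > 0);
	while (obj != -1) {
		if (obj < 0 || (size_t)obj >= nr || requests == max_depth)
			return -1;
		requests++;           /* one round trip for this entry */
		obj = pack[obj].base; /* only now do we know what to ask for */
	}
	return requests;
}
```

A chain of depth d therefore costs d round trips, however small the individual deltas are.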


* Re: Remove unneeded packs
  2005-11-12 22:02     ` Lukas Sandström
  2005-11-12 22:13       ` Marcel Holtmann
@ 2005-11-13  2:38       ` Junio C Hamano
  2005-11-13 10:58         ` Lukas Sandström
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2005-11-13  2:38 UTC (permalink / raw)
  To: Lukas Sandström; +Cc: git

[-- Attachment #1: Type: text/plain; charset=iso-2022-jp-2, Size: 558 bytes --]

Lukas Sandström <lukass@etek.chalmers.se> writes:

> This is most likely because the pack b3c6... contains unreachable objects.
> git-pack-redundant only makes sure that all objects present in packfiles
> still are present in packfiles after the redundant packs have been removed.
> ...
> I'm thinking of the possibility of passing a list of objects to be ignored
> on stdin to git-pack-redundant. This would hopefully solve this problem.

But once you go down that path, wouldn't simply doing 'repack -a -d'
start to look simpler and more attractive, I wonder?


* Re: Remove unneeded packs
  2005-11-13  2:38       ` Junio C Hamano
@ 2005-11-13 10:58         ` Lukas Sandström
  2005-11-13 12:00           ` Sergey Vlasov
  0 siblings, 1 reply; 18+ messages in thread
From: Lukas Sandström @ 2005-11-13 10:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Lukas Sandström

Junio C Hamano wrote:
> Lukas Sandström <lukass@etek.chalmers.se> writes:
>>This is most likely because the pack b3c6... contains unreachable objects.
>>git-pack-redundant only makes sure that all objects present in packfiles
>>still are present in packfiles after the redundant packs have been removed.
>>...
>>I'm thinking of the possibility of passing a list of objects to be ignored
>>on stdin to git-pack-redundant. This would hopefully solve this problem.
> 
> 
> But once you go down that path, wouldn't simply doing 'repack -a -d'
> start to look simpler and more attractive, I wonder?
> 
> 

It depends on how expensive git-fsck-objects --full --unreachable is
compared to a full repack.

However, if I read the source correctly, git-fsck-objects doesn't currently
test the reachability of packed objects. This would have to change, and I'm
not certain how to do that properly.

Note that the following patch is required if git-repack -a -d is to work as
expected (remove all packs except the new one).

Btw, I'm sending this patch in utf8, let's see if it works...

----
Subject: [PATCH] Make sure all old packfiles are removed when doing a full repack

This is necessary because unreachable objects in packfiles make
git-pack-redundant flag those packfiles as non-redundant.

Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>

---

 git-repack.sh |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
08df1f641bd3f98a607a8413d647667adc18a633
diff --git a/git-repack.sh b/git-repack.sh
index f347207..293bb50 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -32,6 +32,8 @@ case ",$all_into_one," in
 	rev_list=
 	rev_parse='--all'
 	pack_objects=
+	existing=`cd "$PACKDIR" && \
+	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
 	;;
 esac
 if [ "$local" ]; then
@@ -60,7 +62,19 @@ mv .tmp-pack-$name.pack "$PACKDIR/pack-$
 mv .tmp-pack-$name.idx  "$PACKDIR/pack-$name.idx" ||
 exit
 
-if test "$remove_redandant" = t
+if test "$all_into_one" = t
+then
+	sync
+	( cd "$PACKDIR" &&
+		for e in $existing
+		do
+		case "$e" in
+		./pack-$name.pack | ./pack-$name.idx) ;;
+		*)	rm -f $e ;;
+		esac
+		done
+	)
+else if test "$remove_redandant" = t
 then
 	sync
 	redundant=$(git-pack-redundant --all)
---
0.99.9.GIT


* Re: Balanced packing strategy
  2005-11-13  2:34       ` Junio C Hamano
@ 2005-11-13 11:00         ` Petr Baudis
  0 siblings, 0 replies; 18+ messages in thread
From: Petr Baudis @ 2005-11-13 11:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Craig Schlenter, git

Dear diary, on Sun, Nov 13, 2005 at 03:34:02AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Craig Schlenter <craig@codefountain.com> writes:
> 
> > Does the pack index file contain enough information to enable a client
> > to send http byte range requests to grab individual objects from a pack?
> > It does seem to store object offsets...
> 
> Yes, it is certainly doable; there is enough information.  I am
> not sure if it is worth the complexity, though.

I think we need either the balanced packing or this.

> Many objects are stored deltified, so your byte range requests
> would return a delta and the name of its base object.  After you
> read what was returned and find out the base object name, you
> would need to get it, which can be another delta against its base
> object.  This would turn untangling a delta chain into a
> serialized sequence of requests.

Sort the objects topologically, then get everything from the old heads
on. Obviously, this will not work so well when we have multiple heads in
a single pack, but either don't do that (would it actually be so bad to
create one pack per head?), or:

  (i) objects are topologically sorted
  (ii) objects introduced by a commit/tree are right after the commit or
       tree in the pack file
  (iii) index file contains parents list for each commit

This way, you can read through small gaps, or restart the request if a
gap is big enough. You will still miss objects introduced by commits on
different branches, but for trees you can again slurp whole trees at
once, and pick the individual objects otherwise; while doing this
second pass, you can apply the gap strategy again.
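The gap strategy could be sketched as merging the byte ranges of wanted objects whenever the gap between them is small enough to read through, and starting a new request otherwise. This assumes the wanted objects' ranges are already known and sorted by offset; struct span, coalesce and max_gap are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the half-open byte range [start, end) of one wanted
 * object inside the pack. */
struct span {
	unsigned long start;
	unsigned long end;
};

/* Merge wanted objects (sorted by offset, non-overlapping) into byte-range
 * requests: if the gap to the next wanted object is at most max_gap bytes,
 * keep reading through it; otherwise start a new request. `out` needs room
 * for nr entries. Returns the number of requests. */
size_t coalesce(const struct span *want, size_t nr, unsigned long max_gap,
		struct span *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		assert(i == 0 || want[i].start >= want[i - 1].end);
		if (n > 0 && want[i].start - out[n - 1].end <= max_gap)
			out[n - 1].end = want[i].end; /* read through the gap */
		else
			out[n++] = want[i];           /* restart the request */
	}
	return n;
}
```

Raising max_gap trades extra bytes read through the gaps for fewer requests.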

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.


* Re: Remove unneeded packs
  2005-11-13 10:58         ` Lukas Sandström
@ 2005-11-13 12:00           ` Sergey Vlasov
  2005-11-13 12:07             ` Lukas Sandström
  0 siblings, 1 reply; 18+ messages in thread
From: Sergey Vlasov @ 2005-11-13 12:00 UTC (permalink / raw)
  To: Lukas Sandström; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1526 bytes --]

On Sun, 13 Nov 2005 11:58:11 +0100 Lukas Sandström wrote:

> Subject: [PATCH] Make sure all old packfiles are removed when doing a full repack
> 
> This is necessary because unreachable objects in packfiles make
> git-pack-redundant flag those packfiles as non-redundant.
> 
> Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
> 
> ---
> 
>  git-repack.sh |   16 +++++++++++++++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
> 
> applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
> 08df1f641bd3f98a607a8413d647667adc18a633
> diff --git a/git-repack.sh b/git-repack.sh
> index f347207..293bb50 100755
> --- a/git-repack.sh
> +++ b/git-repack.sh
> @@ -32,6 +32,8 @@ case ",$all_into_one," in
>  	rev_list=
>  	rev_parse='--all'
>  	pack_objects=
> +	existing=`cd "$PACKDIR" && \
> +	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
>  	;;
>  esac
>  if [ "$local" ]; then
> @@ -60,7 +62,19 @@ mv .tmp-pack-$name.pack "$PACKDIR/pack-$
>  mv .tmp-pack-$name.idx  "$PACKDIR/pack-$name.idx" ||
>  exit
>  
> -if test "$remove_redandant" = t
> +if test "$all_into_one" = t

This should be

if test "$all_into_one$remove_redandant" = tt

(otherwise "git repack -a" becomes the same as "git repack -a -d").

> +then
> +	sync
> +	( cd "$PACKDIR" &&
> +		for e in $existing
> +		do
> +		case "$e" in
> +		./pack-$name.pack | ./pack-$name.idx) ;;
> +		*)	rm -f $e ;;
> +		esac
> +		done
> +	)
> +else if test "$remove_redandant" = t
>  then
>  	sync
>  	redundant=$(git-pack-redundant --all)
> ---
> 0.99.9.GIT



* Re: Remove unneeded packs
  2005-11-13 12:00           ` Sergey Vlasov
@ 2005-11-13 12:07             ` Lukas Sandström
  2005-11-13 12:20               ` Sergey Vlasov
  0 siblings, 1 reply; 18+ messages in thread
From: Lukas Sandström @ 2005-11-13 12:07 UTC (permalink / raw)
  To: git; +Cc: Sergey Vlasov, Junio C Hamano

Sergey Vlasov wrote:
> On Sun, 13 Nov 2005 11:58:11 +0100 Lukas Sandström wrote:
> 
> 
>>Subject: [PATCH] Make sure all old packfiles are removed when doing a full repack
>>
>>This is necessary because unreachable objects in packfiles make
>>git-pack-redundant flag those packfiles as non-redundant.
>>
>>Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
>>
>>---
>>
>> git-repack.sh |   16 +++++++++++++++-
>> 1 files changed, 15 insertions(+), 1 deletions(-)
>>
>>applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
>>08df1f641bd3f98a607a8413d647667adc18a633
>>diff --git a/git-repack.sh b/git-repack.sh
>>index f347207..293bb50 100755
>>--- a/git-repack.sh
>>+++ b/git-repack.sh
>>@@ -32,6 +32,8 @@ case ",$all_into_one," in
>> 	rev_list=
>> 	rev_parse='--all'
>> 	pack_objects=
>>+	existing=`cd "$PACKDIR" && \
>>+	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
>> 	;;
>> esac
>> if [ "$local" ]; then
>>@@ -60,7 +62,19 @@ mv .tmp-pack-$name.pack "$PACKDIR/pack-$
>> mv .tmp-pack-$name.idx  "$PACKDIR/pack-$name.idx" ||
>> exit
>> 
>>-if test "$remove_redandant" = t
>>+if test "$all_into_one" = t
> 
> 
> This should be
> 
> if test "$all_into_one$remove_redandant" = tt
> 
> (otherwise "git repack -a" becomes the same as "git repack -a -d").
> 
> 

This was the behaviour before git-pack-redundant; I just restored it.
Someone else gets to decide whether git repack -a implies "remove all old packs".


* Re: Remove unneeded packs
  2005-11-13 12:07             ` Lukas Sandström
@ 2005-11-13 12:20               ` Sergey Vlasov
  2005-11-13 12:31                 ` Lukas Sandström
  0 siblings, 1 reply; 18+ messages in thread
From: Sergey Vlasov @ 2005-11-13 12:20 UTC (permalink / raw)
  To: Lukas Sandström; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

On Sun, Nov 13, 2005 at 01:07:50PM +0100, Lukas Sandström wrote:
> Sergey Vlasov wrote:
> > On Sun, 13 Nov 2005 11:58:11 +0100 Lukas Sandström wrote:

> >>-if test "$remove_redandant" = t
> >>+if test "$all_into_one" = t
> > 
> > 
> > This should be
> > 
> > if test "$all_into_one$remove_redandant" = tt
> > 
> > (otherwise "git repack -a" becomes the same as "git repack -a -d").
> > 
> > 
> 
> This was the behaviour before git-pack-redundant, I just restored it.

But the old code was:

if test "$remove_redandant" = t
then
	# We know $existing are all redandant only when
	# all-into-one is used.
	if test "$all_into_one" != '' && test "$existing" != ''
	then
		sync
		( cd "$PACKDIR" &&
		  for e in $existing
		  do
			case "$e" in
			./pack-$name.pack | ./pack-$name.idx) ;;
			*)	rm -f $e ;;
			esac
		  done
		)
	fi
fi

So without the -d option nothing was removed, even with -a.

(And test "$existing" != '' might also be needed for some shells which
are confused by the empty list in the for statement.)

> Someone else gets to decide if git repack -a implies "remove all old packs".

If there is a separate -d option for this, just using -a probably
should not remove anything.



* Re: Remove unneeded packs
  2005-11-13 12:20               ` Sergey Vlasov
@ 2005-11-13 12:31                 ` Lukas Sandström
  0 siblings, 0 replies; 18+ messages in thread
From: Lukas Sandström @ 2005-11-13 12:31 UTC (permalink / raw)
  To: Sergey Vlasov; +Cc: git, Junio C Hamano

Sergey Vlasov wrote:
> On Sun, Nov 13, 2005 at 01:07:50PM +0100, Lukas Sandström wrote:
> 
>>Sergey Vlasov wrote:
>>
>>>On Sun, 13 Nov 2005 11:58:11 +0100 Lukas Sandström wrote:
> 
> 
>>>>-if test "$remove_redandant" = t
>>>>+if test "$all_into_one" = t
>>>
>>>
>>>This should be
>>>
>>>if test "$all_into_one$remove_redandant" = tt
>>>
>>>(otherwise "git repack -a" becomes the same as "git repack -a -d").
>>>
>>>
>>
>>This was the behaviour before git-pack-redundant, I just restored it.
> 
> 
> But the old code was:
> 
> if test "$remove_redandant" = t
> then
> 	# We know $existing are all redandant only when
> 	# all-into-one is used.
> 	if test "$all_into_one" != '' && test "$existing" != ''
> 	then
> 		sync
> 		( cd "$PACKDIR" &&
> 		  for e in $existing
> 		  do
> 			case "$e" in
> 			./pack-$name.pack | ./pack-$name.idx) ;;
> 			*)	rm -f $e ;;
> 			esac
> 		  done
> 		)
> 	fi
> fi
> 
> So without the -d option nothing was removed, even with -a.
> 
True. I forgot to look at the context around the changed lines...
Btw, remove_redundant is misspelled (the script says remove_redandant).

> (And test "$existing" != '' might also be needed for some shells which
> are confused by the empty list in the for statement.)
> 
> 
>>Someone else gets to decide if git repack -a implies "remove all old packs".
> 
> 
> If there is a separate -d option for this, just using -a probably
> should not remove anything.

True, but you will have trouble removing stale packfiles if they contain
unreachable objects unless you remove them when you create the -a pack.

Anyway, ignore the patch above.


* Re: Balanced packing strategy
  2005-11-12 13:59   ` Balanced packing strategy Petr Baudis
  2005-11-12 15:14     ` Craig Schlenter
@ 2005-11-13 20:06     ` Josef Weidendorfer
  2005-11-13 23:13       ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Josef Weidendorfer @ 2005-11-13 20:06 UTC (permalink / raw)
  To: git; +Cc: Petr Baudis, Craig Schlenter

On Saturday 12 November 2005 14:59, Petr Baudis wrote:
> The repacking should be done in a way that minimizes the overhead for
> dumb transport users. Ideal for this is a structure like the following
> (as of the end of October):
> 
> 	year2003.pack
> 	year2004.pack
> ...
> 	week42.pack
> 	week43.pack
> 	<individual objects for weeks 43, 44>

I am not sure if it is really beneficial, as packs are required to be
self-contained, so you get a lot of objects undeltified which could be
deltified in a better scheme (as e.g. in the git native protocol).

AFAICS, the git native protocol (which sends nothing more than a pack for
each transfer) has this problem, too: if you update every day via the
native protocol, the total number of bytes transferred in a month will be
a multiple of what a single transfer of all the month's changes would
take.

To keep the self-containment property of packs but work better with dumb
transports, we could introduce incremental packs:

Instead of fully repacking, create a new pack by only appending new
objects at the end of the existing pack. Thus, most objects will be
appended in deltified form, making the incremental addition quite small.
The outcome would be a totally new pack.

Unfortunately, I do not know the pack format in detail, and hope that
this is possible at all.

For dumb protocols to take advantage of this, the information that the
first part of a pack is actually identical to another pack has to be
stored somewhere visible.
If a client detects that it already has the first part of a pack
locally, it would be enough to fetch only the second part.

This is more or less the same as Pasky's solution, but using incremental
packs instead. I think that such incremental packing will not even take
much more space than fully repacking.

Josef


* Re: Balanced packing strategy
  2005-11-13 20:06     ` Josef Weidendorfer
@ 2005-11-13 23:13       ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2005-11-13 23:13 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, Josef Weidendorfer

Petr Baudis <pasky@suse.cz> writes:

> This has the property that the second half of a given pack is also
> covered by packs one level finer in granularity. This is a relatively
> high overhead (it can be balanced by only keeping the last third or
> whatever), but it is designed to reduce the overhead of fetching packs
> over dumb transport.

I have a feeling that you would be better off instead doing the
repacking on the server side to prepare multiple packs, each of
which has all the objects needed to catch up people who were
up-to-date at various points in the past, arranged so that you
would need only one pack fetch plus individual objects near the tip.

This obviously needs smarter client-side support.

Suppose we are somewhere after releasing v1.8 and inching
towards v1.9:

In your proposal, the object ranges each pack contains would
look like this:

 v1.0..v1.5 --------
 v1.5..v1.6        -----
 v1.6..v1.7            -------
 v1.7..v1.8                  -----
 individual objects               ....

That is, there are slight overlaps, but you would need to fetch
multiple packs if you are far behind.

Instead, you could do this:

 v1.0..v1.8 ----------------------
 v1.5..v1.8         --------------
 v1.6..v1.8             ----------
 v1.7..v1.8                   ----
 individual objects               ....

Everybody starts from the tip, fetching individual objects, and
when the last repack boundary (the time we released v1.8) is
reached, the dumb protocol downloader faces a choice.  The
indices are fairly small, so you fetch all of them and see how
many objects you are lacking from each pack.  If you were
up-to-date a very long time ago, say at v1.2, you would obviously
need to fetch the longest pack.  If you were up-to-date
recently, say after v1.6 was released, you need to fetch a smaller
pack.
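A minimal sketch of that client-side choice, assuming object names are modeled as small integer ids and each downloaded index has already been parsed into a list of ids. struct idx, MAX_OBJ and choose_pack are hypothetical, and the model ignores the self-containedness details discussed below; it only picks the smallest pack that by itself supplies every indexed object the client lacks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJ 64 /* hypothetical: object names modeled as ids below 64 */

/* Hypothetical parsed .idx: the object ids one pack contains
 * (no duplicates within a pack). */
struct idx {
	const int *objs;
	size_t nr;
};

/* Return the index of the smallest pack that alone contains every indexed
 * object the client lacks, or -1 if no single pack does. */
int choose_pack(const struct idx *packs, size_t nr_packs, const bool *have)
{
	bool needed[MAX_OBJ] = { false };
	size_t total = 0;
	int best = -1;

	/* Everything listed in some index that we do not have yet. */
	for (size_t p = 0; p < nr_packs; p++)
		for (size_t i = 0; i < packs[p].nr; i++) {
			int o = packs[p].objs[i];

			assert(o >= 0 && o < MAX_OBJ);
			if (!have[o] && !needed[o]) {
				needed[o] = true;
				total++;
			}
		}
	/* A pack qualifies if it contains all of them; prefer the smallest. */
	for (size_t p = 0; p < nr_packs; p++) {
		size_t hits = 0;

		for (size_t i = 0; i < packs[p].nr; i++)
			if (needed[packs[p].objs[i]])
				hits++;
		if (hits == total &&
		    (best < 0 || packs[p].nr < packs[best].nr))
			best = (int)p;
	}
	return best;
}
```

With packs modeled as v1.0..v1.8, v1.5..v1.8, v1.6..v1.8 and v1.7..v1.8, a client that was current just after v1.6 would pick the v1.6..v1.8 pack, while one stuck at v1.2 would need the full v1.0..v1.8 pack.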

Given the self-containedness requirement, any path that is
touched once in a period needs at least one full copy of it in
each pack (all other revisions could be deltified), and I
suspect in practice the oldest pack (v1.0..v1.5 pack in your
scheme) would not save much space by not having v1.5..v1.8
history.  We could tweak things further to do something like
this:

 v1.0..v1.8 ------------------
 v1.5..v1.8         ----------
 v1.6..v1.8             ------
 v1.7..v1.8                   ----
 individual objects               ....

to also account for the fact that the recent ones cover a shorter
time range, in which not many paths are touched.


end of thread, other threads:[~2005-11-13 23:13 UTC | newest]

Thread overview: 18+ messages
2005-11-12 13:04 Remove unneeded packs Marcel Holtmann
2005-11-12 13:13 ` Andreas Ericsson
2005-11-12 13:30   ` Marcel Holtmann
2005-11-12 22:02     ` Lukas Sandström
2005-11-12 22:13       ` Marcel Holtmann
2005-11-13  2:38       ` Junio C Hamano
2005-11-13 10:58         ` Lukas Sandström
2005-11-13 12:00           ` Sergey Vlasov
2005-11-13 12:07             ` Lukas Sandström
2005-11-13 12:20               ` Sergey Vlasov
2005-11-13 12:31                 ` Lukas Sandström
2005-11-12 13:40 ` Craig Schlenter
2005-11-12 13:59   ` Balanced packing strategy Petr Baudis
2005-11-12 15:14     ` Craig Schlenter
2005-11-13  2:34       ` Junio C Hamano
2005-11-13 11:00         ` Petr Baudis
2005-11-13 20:06     ` Josef Weidendorfer
2005-11-13 23:13       ` Junio C Hamano
