* Remove unneeded packs
From: Marcel Holtmann @ 2005-11-12 13:04 UTC
To: git

Hi guys,

every time Linus re-creates the pack for his linux-2.6 tree, I end up
with another pack. I use HTTP as transport and thus the new pack will be
downloaded (which is almost 100 MB), but that is fine. However it seems
that the old (previous) pack will never be deleted. For the no longer
needed object files I can use git-prune-packed, but the old pack I have
to identify and delete by myself. Is there an easy and nice way to get rid
of old unneeded packs? Can't git-prune-packed also do this job?

Regards

Marcel
* Re: Remove unneeded packs
From: Andreas Ericsson @ 2005-11-12 13:13 UTC
To: git

Marcel Holtmann wrote:
> Hi guys,
>
> every time Linus re-creates the pack for his linux-2.6 tree, I end up
> with another pack. I use HTTP as transport and thus the new pack will be
> downloaded (which is almost 100 MB), but that is fine. However it seems
> that the old (previous) pack will never be deleted. For the no longer
> needed object files I can use git-prune-packed, but the old pack I have
> to identify and delete by myself. Is there an easy and nice way to get rid
> of old unneeded packs? Can't git-prune-packed also do this job?
>

A patchset was posted to the list 2005-11-09 by Lukas Sandström, adding
"git-pack-intersect" which was subsequently renamed to the more
appropriate "git-pack-redundant".

If I remember the commit messages and understand your question correctly
it does what you want.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
* Re: Remove unneeded packs
From: Marcel Holtmann @ 2005-11-12 13:30 UTC
To: Andreas Ericsson; +Cc: git

Hi Andreas,

> > every time Linus re-creates the pack for his linux-2.6 tree, I end up
> > with another pack. I use HTTP as transport and thus the new pack will be
> > downloaded (which is almost 100 MB), but that is fine. However it seems
> > that the old (previous) pack will never be deleted. For the no longer
> > needed object files I can use git-prune-packed, but the old pack I have
> > to identify and delete by myself. Is there an easy and nice way to get rid
> > of old unneeded packs? Can't git-prune-packed also do this job?
> >
>
> A patchset was posted to the list 2005-11-09 by Lukas Sandström, adding
> "git-pack-intersect" which was subsequently renamed to the more
> appropriate "git-pack-redundant".
>
> If I remember the commit messages and understand your question correctly
> it does what you want.

you are right. It is exactly what I was looking for. I just saw it some
minutes ago, when I pulled the latest git tree. However to make an old
GCC 2.95 happy, the attached patch is needed.

I am not sure if it is fully working. It deletes a lot of old packs, but
in case of the linux-2.6 tree it leaves one additional behind.

.git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.idx
.git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.pack
.git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.pack
.git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.idx

The 4d76... is the current pack, but the b3c6... is an old one that is
not needed anymore.
Regards

Marcel

[-- Attachment #2: patch --]

diff --git a/pack-redundant.c b/pack-redundant.c
index 1f8c577..4ed974e 100644
--- a/pack-redundant.c
+++ b/pack-redundant.c
@@ -358,11 +358,11 @@ size_t sizeof_union(struct packed_git *p
 size_t get_pack_redundancy(struct pack_list *pl)
 {
 	struct pack_list *subset;
+	size_t ret = 0;
 
 	if (pl == NULL)
 		return 0;
 
-	size_t ret = 0;
 	while ((subset = pl->next)) {
 		while(subset) {
 			ret += sizeof_union(pl->pack, subset->pack);
* Re: Remove unneeded packs
From: Lukas Sandström @ 2005-11-12 22:02 UTC
To: git; +Cc: Marcel Holtmann

Marcel Holtmann wrote:
> you are right. It is exactly what I was looking for. I just saw it some
> minutes ago, when I pulled the latest git tree. However to make an old
> GCC 2.95 happy, the attached patch is needed.
>
> I am not sure if it is fully working. It deletes a lot of old packs, but
> in case of the linux-2.6 tree it leaves one additional behind.
>
> .git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.idx
> .git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.pack
> .git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.pack
> .git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.idx
>
> The 4d76... is the current pack, but the b3c6... is an old one that is
> not needed anymore.
>
> Regards
>
> Marcel

This is most likely because the pack b3c6... contains unreachable objects.
git-pack-redundant only makes sure that all objects present in packfiles
still are present in packfiles after the redundant packs have been removed.

Thus, unreachable objects will also be considered as required.

Note that I haven't checked if this is the cause in this particular case,
but I have the same packfiles (I use the HTTP transport too).

I'm thinking of the possibility of passing a list of objects to be ignored
on stdin to git-pack-redundant. This would hopefully solve this problem.

/Lukas Sandström
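The behaviour Lukas describes can be illustrated with a toy model. The C sketch below is not the code in pack-redundant.c. It assumes each pack is just an array of small integer object ids standing in for SHA-1 names, and the helpers `pack_is_redundant()` and `mark_redundant()` are hypothetical. It does reproduce the property in question. A pack is dropped only if every object in it is still available from another pack that is kept. Reachability is never consulted, so an unreachable object that exists only in an old pack keeps that pack alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a packfile: the object ids it contains. Real packs are
 * keyed by SHA-1; small integers are an assumption for illustration. */
struct pack {
	const int *objs;
	size_t nr;
	bool removed;
};

static bool pack_has(const struct pack *p, int obj)
{
	for (size_t i = 0; i < p->nr; i++)
		if (p->objs[i] == obj)
			return true;
	return false;
}

/* Hypothetical helper: pack idx is redundant if each of its objects is
 * also present in at least one other pack that has not been removed.
 * Whether an object is reachable is never checked. */
static bool pack_is_redundant(const struct pack *packs, size_t npacks,
			      size_t idx)
{
	assert(idx < npacks);
	for (size_t i = 0; i < packs[idx].nr; i++) {
		bool covered = false;
		for (size_t j = 0; j < npacks && !covered; j++)
			if (j != idx && !packs[j].removed &&
			    pack_has(&packs[j], packs[idx].objs[i]))
				covered = true;
		if (!covered)
			return false;
	}
	return true;
}

/* Greedy pass: drop packs one at a time while every object stays covered
 * by the packs that remain. Returns how many packs were marked removed. */
static size_t mark_redundant(struct pack *packs, size_t npacks)
{
	size_t removed = 0;
	for (size_t i = 0; i < npacks; i++) {
		if (pack_is_redundant(packs, npacks, i)) {
			packs[i].removed = true;
			removed++;
		}
	}
	return removed;
}
```

Consider an old pack holding {1, 2, 5}, where 5 is unreachable, an older pack holding {1, 2}, and the current pack holding {1, 2, 3, 4}. Only the older pack is marked. The old pack survives because nothing else contains object 5, which is the situation Lukas suspects for b3c6.... The greedy pass depends on iteration order and makes no attempt to find the smallest set of packs to keep.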
* Re: Remove unneeded packs
From: Marcel Holtmann @ 2005-11-12 22:13 UTC
To: Lukas Sandström; +Cc: git

Hi Lukas,

> > you are right. It is exactly what I was looking for. I just saw it some
> > minutes ago, when I pulled the latest git tree. However to make an old
> > GCC 2.95 happy, the attached patch is needed.
> >
> > I am not sure if it is fully working. It deletes a lot of old packs, but
> > in case of the linux-2.6 tree it leaves one additional behind.
> >
> > .git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.idx
> > .git/objects/pack/pack-4d7682fb8230fef33eb518fa8e53885ec675795e.pack
> > .git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.pack
> > .git/objects/pack/pack-b3c6fbdfa36a326815de6358885c7a570a986b1b.idx
> >
> > The 4d76... is the current pack, but the b3c6... is an old one that is
> > not needed anymore.
>
> This is most likely because the pack b3c6... contains unreachable objects.
> git-pack-redundant only makes sure that all objects present in packfiles
> still are present in packfiles after the redundant packs have been removed.
>
> Thus, unreachable objects will also be considered as required.
>
> Note that I haven't checked if this is the cause in this particular case,
> but I have the same packfiles (I use the HTTP transport too).

maybe these packs are from a previous bad update. The cloned repository I
found it in is actually quite old. When I checked it with some others it
seems that it works perfectly.

> I'm thinking of the possibility of passing a list of objects to be ignored
> on stdin to git-pack-redundant. This would hopefully solve this problem.

Sounds good, but I don't even know which objects are involved in this case
and stop it from being marked as redundant.

Regards

Marcel
* Re: Remove unneeded packs
From: Junio C Hamano @ 2005-11-13 2:38 UTC
To: Lukas Sandström; +Cc: git

Lukas Sandström <lukass@etek.chalmers.se> writes:

> This is most likely because the pack b3c6... contains unreachable objects.
> git-pack-redundant only makes sure that all objects present in packfiles
> still are present in packfiles after the redundant packs have been removed.
> ...
> I'm thinking of the possibility of passing a list of objects to be ignored
> on stdin to git-pack-redundant. This would hopefully solve this problem.

But once you go down that path, wouldn't doing 'repack -a -d'
start looking simpler and more attractive, I wonder?
* Re: Remove unneeded packs
From: Lukas Sandström @ 2005-11-13 10:58 UTC
To: git; +Cc: Junio C Hamano, Lukas Sandström

Junio C Hamano wrote:
> Lukas Sandström <lukass@etek.chalmers.se> writes:
>>This is most likely because the pack b3c6... contains unreachable objects.
>>git-pack-redundant only makes sure that all objects present in packfiles
>>still are present in packfiles after the redundant packs have been removed.
>>...
>>I'm thinking of the possibility of passing a list of objects to be ignored
>>on stdin to git-pack-redundant. This would hopefully solve this problem.
>
>
> But once you go down that path, wouldn't doing 'repack -a -d'
> start looking simpler and more attractive, I wonder?

It depends on how expensive git-fsck-objects --full --unreachable is
versus a full repack. However, if I read the source correctly,
git-fsck-objects doesn't currently test the reachability of packed objects.
This would have to change, and I'm not certain of how to do that properly.

Note that the following patch is required if git-repack -a -d is to work as
expected (remove all packs except the new one).

Btw, I'm sending this patch in utf8, let's see if it works...

----
Subject: [PATCH] Make sure all old packfiles are removed when doing a full repack

This is necessary because unreachable objects in packfiles make
git-pack-redundant flag them as non-redundant.
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>

---

 git-repack.sh |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
08df1f641bd3f98a607a8413d647667adc18a633
diff --git a/git-repack.sh b/git-repack.sh
index f347207..293bb50 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -32,6 +32,8 @@ case ",$all_into_one," in
 	rev_list=
 	rev_parse='--all'
 	pack_objects=
+	existing=`cd "$PACKDIR" && \
+	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
 	;;
 esac
 if [ "$local" ]; then
@@ -60,7 +62,19 @@ mv .tmp-pack-$name.pack "$PACKDIR/pack-$
 mv .tmp-pack-$name.idx "$PACKDIR/pack-$name.idx" ||
 exit
 
-if test "$remove_redandant" = t
+if test "$all_into_one" = t
+then
+	sync
+	( cd "$PACKDIR" &&
+	  for e in $existing
+	  do
+		case "$e" in
+		./pack-$name.pack | ./pack-$name.idx) ;;
+		*)	rm -f $e ;;
+		esac
+	  done
+	)
+else if test "$remove_redandant" = t
 then
 	sync
 	redundant=$(git-pack-redundant --all)
---
0.99.9.GIT
* Re: Remove unneeded packs
From: Sergey Vlasov @ 2005-11-13 12:00 UTC
To: Lukas Sandström; +Cc: git, Junio C Hamano

On Sun, 13 Nov 2005 11:58:11 +0100 Lukas Sandström wrote:

> Subject: [PATCH] Make sure all old packfiles are removed when doing a full repack
>
> This is necessary because unreachable objects in packfiles make
> git-pack-redundant flag them as non-redundant.
>
> Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
>
> ---
>
>  git-repack.sh |   16 +++++++++++++++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
>
> applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
> 08df1f641bd3f98a607a8413d647667adc18a633
> diff --git a/git-repack.sh b/git-repack.sh
> index f347207..293bb50 100755
> --- a/git-repack.sh
> +++ b/git-repack.sh
> @@ -32,6 +32,8 @@ case ",$all_into_one," in
>  	rev_list=
>  	rev_parse='--all'
>  	pack_objects=
> +	existing=`cd "$PACKDIR" && \
> +	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
>  	;;
>  esac
>  if [ "$local" ]; then
> @@ -60,7 +62,19 @@ mv .tmp-pack-$name.pack "$PACKDIR/pack-$
>  mv .tmp-pack-$name.idx "$PACKDIR/pack-$name.idx" ||
>  exit
>
> -if test "$remove_redandant" = t
> +if test "$all_into_one" = t

This should be

	if test "$all_into_one$remove_redandant" = tt

(otherwise "git repack -a" becomes the same as "git repack -a -d").

> +then
> +	sync
> +	( cd "$PACKDIR" &&
> +	  for e in $existing
> +	  do
> +		case "$e" in
> +		./pack-$name.pack | ./pack-$name.idx) ;;
> +		*)	rm -f $e ;;
> +		esac
> +	  done
> +	)
> +else if test "$remove_redandant" = t
>  then
>  	sync
>  	redundant=$(git-pack-redundant --all)
> ---
> 0.99.9.GIT
* Re: Remove unneeded packs
From: Lukas Sandström @ 2005-11-13 12:07 UTC
To: git; +Cc: Sergey Vlasov, Junio C Hamano

Sergey Vlasov wrote:
> On Sun, 13 Nov 2005 11:58:11 +0100 Lukas Sandström wrote:
>
>
>>Subject: [PATCH] Make sure all old packfiles are removed when doing a full repack
>>
>>This is necessary because unreachable objects in packfiles make
>>git-pack-redundant flag them as non-redundant.
>>
>>Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
>>
>>---
>>
>> git-repack.sh |   16 +++++++++++++++-
>> 1 files changed, 15 insertions(+), 1 deletions(-)
>>
>>applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
>>08df1f641bd3f98a607a8413d647667adc18a633
>>diff --git a/git-repack.sh b/git-repack.sh
>>index f347207..293bb50 100755
>>--- a/git-repack.sh
>>+++ b/git-repack.sh
>>@@ -32,6 +32,8 @@ case ",$all_into_one," in
>> 	rev_list=
>> 	rev_parse='--all'
>> 	pack_objects=
>>+	existing=`cd "$PACKDIR" && \
>>+	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
>> 	;;
>> esac
>> if [ "$local" ]; then
>>@@ -60,7 +62,19 @@ mv .tmp-pack-$name.pack "$PACKDIR/pack-$
>> mv .tmp-pack-$name.idx "$PACKDIR/pack-$name.idx" ||
>> exit
>>
>>-if test "$remove_redandant" = t
>>+if test "$all_into_one" = t
>
>
> This should be
>
> if test "$all_into_one$remove_redandant" = tt
>
> (otherwise "git repack -a" becomes the same as "git repack -a -d").
>
>

This was the behaviour before git-pack-redundant, I just restored it.
Someone else gets to decide if git repack -a implies "remove all old packs".
* Re: Remove unneeded packs
From: Sergey Vlasov @ 2005-11-13 12:20 UTC
To: Lukas Sandström; +Cc: git, Junio C Hamano

On Sun, Nov 13, 2005 at 01:07:50PM +0100, Lukas Sandström wrote:
> Sergey Vlasov wrote:
> > On Sun, 13 Nov 2005 11:58:11 +0100 Lukas Sandström wrote:
> >>-if test "$remove_redandant" = t
> >>+if test "$all_into_one" = t
> >
> >
> > This should be
> >
> > if test "$all_into_one$remove_redandant" = tt
> >
> > (otherwise "git repack -a" becomes the same as "git repack -a -d").
> >
> >
>
> This was the behaviour before git-pack-redundant, I just restored it.

But the old code was:

if test "$remove_redandant" = t
then
	# We know $existing are all redandant only when
	# all-into-one is used.
	if test "$all_into_one" != '' && test "$existing" != ''
	then
		sync
		( cd "$PACKDIR" &&
		  for e in $existing
		  do
			case "$e" in
			./pack-$name.pack | ./pack-$name.idx) ;;
			*)	rm -f $e ;;
			esac
		  done
		)
	fi
fi

So without the -d option nothing was removed, even with -a.

(And test "$existing" != '' might also be needed for some shells which
are confused by the empty list in the for statement.)

> Someone else gets to decide if git repack -a implies "remove all old packs".

If there is a separate -d option for this, just using -a probably
should not remove anything.
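Sergey's point is about when deletion happens at all. The C sketch below models the two decisions git-repack.sh makes in this thread. The first is whether to delete anything, given the -a flag (`all_into_one` in the script) and the -d flag (`remove_redandant` in the script). The second is which files the `case` statement spares. The enum and function names are hypothetical; only the shell script itself is real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for the three outcomes discussed in the thread. */
enum removal {
	REMOVE_NOTHING,            /* no -d: keep every old pack */
	REMOVE_ALL_OLD,            /* -a -d: every old pack is stale */
	REMOVE_VIA_PACK_REDUNDANT  /* -d alone: ask git-pack-redundant */
};

/* Sergey's rule: deletion requires -d; -a only decides how much goes. */
static enum removal removal_mode(bool all_into_one, bool remove_redundant)
{
	if (!remove_redundant)
		return REMOVE_NOTHING;
	if (all_into_one)
		return REMOVE_ALL_OLD;
	return REMOVE_VIA_PACK_REDUNDANT;
}

/* Mirrors the shell `case` in the patch: keep ./pack-<name>.pack and
 * ./pack-<name>.idx, delete any other file listed in $existing. */
static bool should_delete(const char *path, const char *new_name)
{
	char keep_pack[256], keep_idx[256];

	assert(new_name != NULL);
	snprintf(keep_pack, sizeof(keep_pack), "./pack-%s.pack", new_name);
	snprintf(keep_idx, sizeof(keep_idx), "./pack-%s.idx", new_name);
	return strcmp(path, keep_pack) != 0 && strcmp(path, keep_idx) != 0;
}
```

Under this model `removal_mode(true, false)` yields `REMOVE_NOTHING`, which is the behaviour Sergey argues plain `git repack -a` should keep. Only `-a -d` deletes every old pack except the freshly written ./pack-<name>.pack and ./pack-<name>.idx.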
* Re: Remove unneeded packs
From: Lukas Sandström @ 2005-11-13 12:31 UTC
To: Sergey Vlasov; +Cc: git, Junio C Hamano

Sergey Vlasov wrote:
> On Sun, Nov 13, 2005 at 01:07:50PM +0100, Lukas Sandström wrote:
>
>>Sergey Vlasov wrote:
>>
>>>On Sun, 13 Nov 2005 11:58:11 +0100 Lukas Sandström wrote:
>
>
>>>>-if test "$remove_redandant" = t
>>>>+if test "$all_into_one" = t
>>>
>>>
>>>This should be
>>>
>>>if test "$all_into_one$remove_redandant" = tt
>>>
>>>(otherwise "git repack -a" becomes the same as "git repack -a -d").
>>>
>>>
>>
>>This was the behaviour before git-pack-redundant, I just restored it.
>
>
> But the old code was:
>
> if test "$remove_redandant" = t
> then
> 	# We know $existing are all redandant only when
> 	# all-into-one is used.
> 	if test "$all_into_one" != '' && test "$existing" != ''
> 	then
> 		sync
> 		( cd "$PACKDIR" &&
> 		  for e in $existing
> 		  do
> 			case "$e" in
> 			./pack-$name.pack | ./pack-$name.idx) ;;
> 			*)	rm -f $e ;;
> 			esac
> 		  done
> 		)
> 	fi
> fi
>
> So without the -d option nothing was removed, even with -a.
>

True. I forgot to look at the context around the changed lines...
Btw, remove_redundant is misspelled.

> (And test "$existing" != '' might also be needed for some shells which
> are confused by the empty list in the for statement.)
>
>
>>Someone else gets to decide if git repack -a implies "remove all old packs".
>
>
> If there is a separate -d option for this, just using -a probably
> should not remove anything.

True, but you will have trouble removing stale packfiles if they contain
unreachable objects unless you remove them when you create the -a pack.

Anyway, ignore the patch above.
* Re: Remove unneeded packs
From: Craig Schlenter @ 2005-11-12 13:40 UTC
To: git

On 12 Nov 2005, at 3:04 PM, Marcel Holtmann wrote:

> every time Linus re-creates the pack for his linux-2.6 tree, I end up
> with another pack. I use HTTP as transport and thus the new pack will be
> downloaded (which is almost 100 MB), but that is fine.
> [snip]

The 100MB situation is not cool for those of us on a tight bandwidth
budget or slow links. Can anyone tell me if the native git protocol is
any better at this stuff please?

Thanks,

--Craig
* Balanced packing strategy
From: Petr Baudis @ 2005-11-12 13:59 UTC
To: Craig Schlenter; +Cc: git

Dear diary, on Sat, Nov 12, 2005 at 02:40:50PM CET, I got a letter
where Craig Schlenter <craig@codefountain.com> said that...
> On 12 Nov 2005, at 3:04 PM, Marcel Holtmann wrote:
>
> >every time Linus re-creates the pack for his linux-2.6 tree, I end up
> >with another pack. I use HTTP as transport and thus the new pack will
> >be downloaded (which is almost 100 MB), but that is fine.
> >[snip]
>
> The 100MB situation is not cool for those of us on a tight bandwidth
> budget or slow links. Can anyone tell me if the native git protocol is
> any better at this stuff please?

Yes, the native GIT protocol transfers only the objects you need.

But the 100MB situation is still bad. FWIW, this is my proposal I sent
about a month ago to some packs-related discussion at the kernel.org
mailing list (ok, I updated it a little):

The repacking should be done in such a way as to minimize the overhead for
the dumb transport users. Ideal for this is some structure like (at the
end of October):

	year2003.pack
	year2004.pack
	halfyear2004-2.pack
	halfyear2005-1.pack
	month4.pack
	month5.pack
	month6.pack
	month7.pack
	month8.pack
	month9.pack
	week37.pack
	week38.pack
	week39.pack
	week40.pack
	week41.pack
	week42.pack
	week43.pack
	<individual objects for weeks 43, 44>

This has the property that the second half of a given pack is covered by
objects with precision lower by one. This is a relatively high overhead
(this can be balanced by only keeping the last third or whatever), but
it is designed to reduce the overhead of fetching packs over dumb
transport. E.g. if it's almost the end of July and you last fetched at the
start of June, you will not have to get the whole halfyear2005-1 pack, but
will be able to catch up by just fetching the month6 pack, and then a few
week-packs.

For the autopacker (which should ideally be run by some cronjob), this
means packing a new week each week and getting rid of a week worth of
objects, packing a new month each month and getting rid of a month worth
of objects, etc.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which it doesn't.
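Here is one way a dumb-transport client could pick packs from Pasky's layout. This is a sketch under assumptions that are not part of the proposal:

- Time is counted in days.
- Each pack covers the objects of a half-open day range [start, end).
- Download cost is taken to be proportional to that range.
- The client greedily takes the shortest pack containing its current position.

The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a pack holds the objects created in days
 * [start, end). Cost is assumed to be proportional to end - start. */
struct span_pack {
	int start, end;
};

/* Starting from the day of the last fetch, repeatedly take the shortest
 * pack that contains the current position, then jump to its end.
 * Returns the number of packs chosen and stores the assumed cost in
 * *cost. Anything after the last chosen pack has to be fetched as
 * individual objects. */
static size_t plan_fetch(const struct span_pack *packs, size_t n,
			 int last_fetch, int now, int *cost)
{
	int pos = last_fetch;
	size_t chosen = 0;

	assert(last_fetch <= now);
	*cost = 0;
	while (pos < now) {
		const struct span_pack *best = NULL;
		for (size_t i = 0; i < n; i++) {
			const struct span_pack *p = &packs[i];
			if (p->start <= pos && pos < p->end &&
			    (!best || p->end - p->start < best->end - best->start))
				best = p;
		}
		if (!best)
			break;  /* not packed yet: fall back to loose objects */
		*cost += best->end - best->start;
		pos = best->end;
		chosen++;
	}
	return chosen;
}
```

In the sample data, halfyear2005-1 is days 0..181, month6 is days 151..181 and month7 is days 181..212. A client that last fetched on day 160 picks month6 and then month7 (cost 61) instead of starting with the half-year pack. A greedy rule like this is not guaranteed to minimise the total download for every possible layout.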
* Re: Balanced packing strategy
From: Craig Schlenter @ 2005-11-12 15:14 UTC
To: Petr Baudis; +Cc: git

On 12 Nov 2005, at 3:59 PM, Petr Baudis wrote:

> Dear diary, on Sat, Nov 12, 2005 at 02:40:50PM CET, I got a letter
> where Craig Schlenter <craig@codefountain.com> said that...
>> The 100MB situation is not cool for those of us on a tight bandwidth
>> budget or slow links. Can anyone tell me if the native git protocol is
>> any better at this stuff please?
>
> Yes, the native GIT protocol transfers only the objects you need.

Ah, magic, thanks!

> But the 100MB situation is still bad. FWIW, this is my proposal I sent
> about a month ago to some packs-related discussion at the kernel.org
> mailing list (ok, I updated it a little):

It would be nice if there was some meaningful automatic packing that
didn't hurt "non-git-aware protocol" users.

Does the pack index file contain enough information to enable a client
to send http byte range requests to grab individual objects from a pack?
It does seem to store object offsets but maybe I'm missing something ...

Thank you,

--Craig
* Re: Balanced packing strategy
From: Junio C Hamano @ 2005-11-13 2:34 UTC
To: Craig Schlenter; +Cc: git

Craig Schlenter <craig@codefountain.com> writes:

> Does the pack index file contain enough information to enable a client
> to send http byte range requests to grab individual objects from a pack?
> It does seem to store object offsets...

Yes, it is certainly doable; there is enough information.  I am
not sure if it is worth the complexity, though.

Many objects are stored deltified, so your byte range requests
would return a delta and its base object name.  After you read what
was returned and find out the base object name, you would need
to get it, which can be another delta against its base object.
This would turn untangling a delta chain into a serialized
sequence of requests.
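Junio's concern can be put in numbers. As he describes, a deltified entry carries the name of its base object, so the client learns which object to request next only after the previous range request has returned. The C sketch below counts those serialized round trips. For simplicity it assumes the delta relationships are already known as an array: `base[i]` is the entry that `i` is deltified against, or -1 for a whole object. A real client would only discover them one fetch at a time. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of sequential byte-range requests needed to reconstruct obj:
 * one per entry in its delta chain, stopping early at any base that is
 * already available locally (have[i] == true). */
static size_t rounds_for(const int *base, const bool *have, size_t n,
			 int obj)
{
	size_t rounds = 0;

	while (obj >= 0) {
		assert((size_t)obj < n);
		if (have[obj])
			break;           /* base already local: chain ends */
		rounds++;                /* one request for this entry */
		assert(rounds <= n);     /* a valid delta chain cannot loop */
		obj = base[obj];         /* base name known only after fetch */
	}
	return rounds;
}

/* Requests for unrelated wanted objects can be issued side by side, but
 * each chain is strictly sequential, so the number of serialized round
 * trips is the depth of the longest chain that is still missing. */
static size_t serial_rounds(const int *base, const bool *have, size_t n,
			    const int *want, size_t nwant)
{
	size_t worst = 0;

	for (size_t i = 0; i < nwant; i++) {
		size_t r = rounds_for(base, have, n, want[i]);
		if (r > worst)
			worst = r;
	}
	return worst;
}
```

With a chain 3 -> 2 -> 1 -> 0, fetching object 3 from scratch costs four sequential requests. If the client already has object 1, it costs two.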
* Re: Balanced packing strategy
From: Petr Baudis @ 2005-11-13 11:00 UTC
To: Junio C Hamano; +Cc: Craig Schlenter, git

Dear diary, on Sun, Nov 13, 2005 at 03:34:02AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Craig Schlenter <craig@codefountain.com> writes:
>
> > Does the pack index file contain enough information to enable a client
> > to send http byte range requests to grab individual objects from a pack?
> > It does seem to store object offsets...
>
> Yes, it is certainly doable; there is enough information.  I am
> not sure if it is worth the complexity, though.

I think we need either the balanced packing or this.

> Many objects are stored deltified, so your byte range requests
> would return a delta and its base object name.  After you read what
> was returned and find out the base object name, you would need
> to get it, which can be another delta against its base object.
> This would turn untangling a delta chain into a serialized
> sequence of requests.

Sort the objects topologically, then get everything from the old heads
on. Obviously, this will not work so well when we get multiple heads in
a single pack, but either don't do that (would it actually be so bad if
we created one pack per head?), or:

  (i) objects are topologically sorted
  (ii) objects introduced by a commit/tree are right after the commit or
       tree in the pack file
  (iii) the index file contains a parents list for each commit

This way, you can possibly run through the gaps, or if the gap is big
enough, restart the request. You will still miss objects introduced by
commits in different branches, but in the case of trees you can slurp the
trees at once again, and pick the individual objects otherwise; while
doing this second pass, you can apply the gaps strategy again.
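Pasky's "run through the gaps, or if the gap is big enough, restart the request" rule amounts to coalescing byte ranges. The sketch below makes two assumptions.

- The client has already worked out the [start, end) byte range of every pack entry it wants. The .idx file records only start offsets, so here an entry's end is taken to be the next entry's start in pack order.
- `max_gap` is a hypothetical tuning knob. Gaps up to that size are downloaded and discarded; larger gaps start a new request.

```c
#include <assert.h>
#include <stddef.h>

/* Byte range [start, end) inside a pack file. */
struct range {
	unsigned long start, end;
};

/* need[] holds the ranges of wanted pack entries, sorted by start.
 * Neighbouring ranges separated by at most max_gap unwanted bytes are
 * merged into one request ("run through the gap"); a larger gap starts
 * a new request. Writes at most n ranges to out[], returns the count. */
static size_t coalesce(const struct range *need, size_t n,
		       unsigned long max_gap, struct range *out)
{
	size_t nout = 0;

	for (size_t i = 0; i < n; i++) {
		assert(i == 0 || need[i].start >= need[i - 1].start);
		if (nout > 0 && need[i].start <= out[nout - 1].end + max_gap) {
			/* close enough: extend the current request */
			if (need[i].end > out[nout - 1].end)
				out[nout - 1].end = need[i].end;
		} else {
			/* gap too large (or first range): new request */
			out[nout++] = need[i];
		}
	}
	return nout;
}
```

Each resulting range maps onto one HTTP `Range: bytes=first-last` request, where `last` is `end - 1` because HTTP byte ranges are inclusive.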
* Re: Balanced packing strategy
From: Josef Weidendorfer @ 2005-11-13 20:06 UTC
To: git; +Cc: Petr Baudis, Craig Schlenter

On Saturday 12 November 2005 14:59, Petr Baudis wrote:
> The repacking should be done in such a way as to minimize the overhead for
> the dumb transport users. Ideal for this is some structure like (at the
> end of October):
>
>	year2003.pack
>	year2004.pack
>	...
>	week42.pack
>	week43.pack
>	<individual objects for weeks 43, 44>

I am not sure if it is really beneficial, as packs have the requirement
to be self-contained, so you get a lot of objects undeltified which could
be deltified in a better scheme (as e.g. in the git native protocol).

AFAICS, the git native protocol (which is nothing more than a pack itself
for each transfer) even has this problem, too: if you are updating every
day via git native, the sum of transferred bytes in a month will be a
multiple of one git transfer for all the month's changes.

To keep the pack self-containment property, but work better with dumb
transfers, we could introduce incremental packs: instead of fully
repacking, create a new pack by only appending new objects at the end of
the pack. Thus, most objects will be appended in deltified form, making
the incremental addition quite small. The outcome would be a totally new
pack. Unfortunately, I do not know the pack format in detail, and hope
that this is possible at all.

For dumb protocols to take advantage of this, the information that the
first part of a pack is actually the same as another pack has to be
stored somewhere visible. If a client detects that it already has the
first part of a pack locally, it would be enough to fetch only the second
part.
This is more or less the same as Pasky's solution, but by using
incremental packs instead. I think that such incremental packing will not
even take much more space than fully repacking.

Josef
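Below is a minimal sketch of the client side of Josef's idea. It assumes the server publishes a hypothetical record next to each incremental pack. The record names the pack it extends and gives the number of leading bytes the two share. None of this metadata exists in git.

The sketch also ignores a real obstacle. A pack starts with a 12-byte header that records the number of objects, and it ends with a SHA-1 checksum over everything before it. Appending objects changes both, so a literal byte prefix would not survive as-is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata a server could publish next to an incremental
 * pack: the older pack it extends and how many leading bytes they share. */
struct incr_info {
	const char *base_pack;       /* NULL if this pack extends nothing */
	unsigned long shared_prefix;
	unsigned long total_len;
};

static bool have_pack(const char *const *local, size_t nlocal,
		      const char *name)
{
	for (size_t i = 0; i < nlocal; i++)
		if (strcmp(local[i], name) == 0)
			return true;
	return false;
}

/* Byte offset at which the download has to start: after the shared
 * prefix if the base pack is already local, otherwise from byte 0. */
static unsigned long fetch_from(const struct incr_info *info,
				const char *const *local, size_t nlocal)
{
	assert(info->shared_prefix <= info->total_len);
	if (info->base_pack && have_pack(local, nlocal, info->base_pack))
		return info->shared_prefix;
	return 0;
}
```

If the base pack is present locally, the download can begin at `shared_prefix`, for example with an HTTP range request. Otherwise the whole pack has to be fetched.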
* Re: Balanced packing strategy
From: Junio C Hamano @ 2005-11-13 23:13 UTC
To: Petr Baudis; +Cc: git, Josef Weidendorfer

Petr Baudis <pasky@suse.cz> writes:

> This has the property that the second half of a given pack is covered by
> objects with precision lower by one. This is a relatively high overhead
> (this can be balanced by only keeping the last third or whatever), but
> it is designed to reduce the overhead of fetching packs over dumb
> transport.

I have a feeling that you would be better off if you instead do the
repacking on the server side to prepare multiple packs, each of which
has all the objects necessary to bring up to date the people who were
up-to-date at various points in the past, and arrange it so that you
would need only one pack fetch plus individual objects near the tip.
This obviously needs smarter client-side support.

Suppose we are somewhere after releasing v1.8 and inching towards
v1.9.  In your proposal, the object ranges each pack contains would
look like this:

    v1.0..v1.5         --------
    v1.5..v1.6                 -----
    v1.6..v1.7                      -------
    v1.7..v1.8                             -----
    individual objects                          ....

That is, there are slight overlaps but you would need multiple packs
if you are really behind.  Instead, you could do this:

    v1.0..v1.8         ----------------------
    v1.5..v1.8                 --------------
    v1.6..v1.8                     ----------
    v1.7..v1.8                           ----
    individual objects                       ....

Everybody starts from the tip, fetching individual objects, and when
the last repack boundary (the time we released 1.8) is reached, the
dumb protocol downloader now faces a choice.  The indices are fairly
small, so you fetch all of them and see how many objects you are
lacking from each pack.  If you were up-to-date a very long time ago,
say at v1.2, you would obviously need to fetch the longest pack.  If
you were up-to-date recently, say after v1.6 was released, you need
to fetch a smaller pack.
Given the self-containedness requirement, any path that is touched
once in a period needs at least one full copy of it in each pack (all
other revisions could be deltified), and I suspect in practice the
oldest pack (the v1.0..v1.5 pack in your scheme) would not save much
space by not having the v1.5..v1.8 history.  We could tweak things
further to do something like this:

    v1.0..v1.8             ------------------
    v1.5..v1.8                     ----------
    v1.6..v1.8                         ------
    v1.7..v1.8                           ----
    individual objects                       ....

to also account for the fact that the recent ones cover a shorter time
range and not many paths are touched.
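Junio's client-side choice can be written down directly if the packs are nested, as in his second diagram, where each v1.x..v1.8 pack contains the objects of every later one. In that case the largest missing count over all fetched indices is exactly what the client lacks up to the last repack. The cheapest sufficient pack is the smallest one whose missing count reaches that number.

The sketch below assumes that nesting, which may not hold for the tweaked third layout, and uses small integers for object names. `struct pack_idx` and `choose_pack()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What a client learns from a downloaded .idx: the ids of the objects in
 * the pack (small integers stand in for SHA-1 names here). */
struct pack_idx {
	const int *objs;
	size_t nr;
};

static size_t count_missing(const struct pack_idx *p, const bool *have)
{
	size_t missing = 0;

	for (size_t i = 0; i < p->nr; i++)
		if (!have[p->objs[i]])
			missing++;
	return missing;
}

/* Assuming nested packs (v1.7..v1.8 inside v1.6..v1.8 inside ...), the
 * largest missing count is everything the client lacks up to the last
 * repack. Return the index of the smallest pack that supplies all of
 * it, or -1 if nothing needs to be fetched. */
static int choose_pack(const struct pack_idx *packs, size_t npacks,
		       const bool *have)
{
	size_t need = 0;
	int best = -1;

	assert(have != NULL);
	for (size_t i = 0; i < npacks; i++) {
		size_t m = count_missing(&packs[i], have);
		if (m > need)
			need = m;
	}
	if (need == 0)
		return -1;
	for (size_t i = 0; i < npacks; i++)
		if (count_missing(&packs[i], have) == need &&
		    (best < 0 || packs[i].nr < packs[best].nr))
			best = (int)i;
	return best;
}
```

Take packs {0..9} (v1.0..v1.8), {5..9} (v1.6..v1.8) and {8, 9} (v1.7..v1.8). A client that already has objects 0..6 gets the middle pack, and one that only has 0..2 needs the full pack. A fully up-to-date client gets -1 and fetches nothing.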