* [PATCH] Split packs from git-repack should have descending timestamps
@ 2007-05-24 22:33 Dana How
2007-05-25 0:46 ` Shawn O. Pearce
0 siblings, 1 reply; 5+ messages in thread
From: Dana How @ 2007-05-24 22:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, danahow
If git-repack produces multiple split packs because
--max-pack-size was in effect, the first pack written
should have the latest timestamp because:
(1) sha1_file.c:rearrange_packed_git() puts more recent
pack files at the beginning of the search list; and
(2) the most recent objects are written out first
while packing.
This is based on next rather than master to avoid merge
conflicts with changes already in git-repack.sh due to
the --max-pack-size patchset.
Signed-off-by: Dana L. How <danahow@gmail.com>
---
git-repack.sh | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/git-repack.sh b/git-repack.sh
index 4ea6e5b..953de4a 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -68,6 +68,7 @@ names=$(git-pack-objects --non-empty --all --reflog $args </dev/null "$PACKTMP")
if [ -z "$names" ]; then
echo Nothing new to pack.
fi
+restamp=
for name in $names ; do
chmod a-w "$PACKTMP-$name.pack"
chmod a-w "$PACKTMP-$name.idx"
@@ -94,8 +95,12 @@ for name in $names ; do
exit 1
}
rm -f "$PACKDIR/old-pack-$name.pack" "$PACKDIR/old-pack-$name.idx"
+ restamp="$PACKDIR/pack-$name.pack $restamp"
done
+# for split packs, the first created should have most recent timestamp
+for file in $restamp ; do touch $file ; sleep 2; done &
+
if test "$remove_redundant" = t
then
# We know $existing are all redundant.
--
1.5.2.762.gd8c6-dirty
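Point (1) of the commit message can be observed from the shell. The sketch below is not git code: `packs_newest_first` is a hypothetical helper that lists the packs in a directory by mtime, newest first, which is the order the message attributes to rearrange_packed_git(). It assumes a POSIX `ls`.

```sh
#!/bin/sh
# Hypothetical helper, not part of git: print the packs in directory $1
# ordered by modification time, most recent first.
packs_newest_first () {
	# ls -t sorts by mtime, newest first; errors (no packs) are silenced.
	ls -t "$1"/pack-*.pack 2>/dev/null
}
```

Under this ordering, the pack holding the most recent objects is found first only if it also carries the latest mtime, which is what the patch tries to arrange for split packs.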

* Re: [PATCH] Split packs from git-repack should have descending timestamps
2007-05-24 22:33 [PATCH] Split packs from git-repack should have descending timestamps Dana How
@ 2007-05-25 0:46 ` Shawn O. Pearce
2007-05-25 1:04 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: Shawn O. Pearce @ 2007-05-25 0:46 UTC (permalink / raw)
To: Dana How; +Cc: Junio C Hamano, Git Mailing List
Dana How <danahow@gmail.com> wrote:
>
> If git-repack produces multiple split packs because
> --max-pack-size was in effect, the first pack written
> should have the latest timestamp because:
> (1) sha1_file.c:rearrange_packed_git() puts more recent
> pack files at the beginning of the search list; and
> (2) the most recent objects are written out first
> while packing.
>
> This is based on next rather than master to avoid merge
> conflicts with changes already in git-repack.sh due to
> the --max-pack-size patchset.

Ack. Given our mtime based sorting routine, even without your
recent patch to improve it, I think we definitely want this type
of behavior built into git-repack.sh. Good follow-on to your
--max-pack-size series.

--
Shawn.

* Re: [PATCH] Split packs from git-repack should have descending timestamps
2007-05-25 0:46 ` Shawn O. Pearce
@ 2007-05-25 1:04 ` Junio C Hamano
2007-05-25 2:33 ` Dana How
0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2007-05-25 1:04 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Dana How, Git Mailing List
"Shawn O. Pearce" <spearce@spearce.org> writes:
> Dana How <danahow@gmail.com> wrote:
>>
>> If git-repack produces multiple split packs because
>> --max-pack-size was in effect, the first pack written
>> should have the latest timestamp because:
>> (1) sha1_file.c:rearrange_packed_git() puts more recent
>> pack files at the beginning of the search list; and
>> (2) the most recent objects are written out first
>> while packing.
>>
>> This is based on next rather than master to avoid merge
>> conflicts with changes already in git-repack.sh due to
>> the --max-pack-size patchset.
>
> Ack. Given our mtime based sorting routine, even without your
> recent patch to improve it, I think we definitely want this type
> of behavior built into git-repack.sh. Good follow-on to your
> --max-pack-size series.

Gee, I do not want to touch this, unless we can do something
about that sleep 2, even if you have & at the end (actually,
especially because you have that -- it makes me worried).

At the minimum, I think you do not have to restamp at all if the
result is a single pack (i.e. the usual case), like so:

    case "$restamp" in
    ?*' '?*)
        # we have more than one.
        # for split packs, the first created should have most recent timestamp
        for file in $restamp ; do touch $file; sleep 2; done &
        ;;
    esac

Come to think of it, can't you do this "re-touching" business at
the end of pack-objects without sleeping? You could keep track
of the names of the packs you produced, and if you have produced
5, like so:

    1
    2
    3
    4
    5

you would swap timestamp of #1 and #5, #2 and #4 using stat()
and utime(), and you are done. Each of these huge packs would
take more than one second to write out, but if that is not
the case, you could even start with timestamp of #5, subtract 1
and stamp #4, subtract 1 and stamp #3, ... You may end up using
timestamp from the past, but that would not be a problem.

And I am really hoping that the other "use object density in
reordering" patch would make this irrelevant. You would have
commits and then the rest in the normal input object stream, and
recency ordering done by git-pack-objects should keep commits
together early in the resulting split pack, and earlier parts
that have the commits would be hopefully denser.
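The timestamp reversal suggested above can be sketched in shell, assuming `touch -r` (POSIX) stands in for the stat()/utime() pair and that `mktemp` is available. `reverse_mtimes` is a hypothetical name; the suggestion itself targets git-pack-objects in C, and nothing here is taken from that code.

```sh
#!/bin/sh
# Hypothetical helper, not git code. Arguments are pack files in the
# order they were written; afterwards the first file carries the mtime
# the last one had, the second carries the second-to-last, and so on.
reverse_mtimes () {
	# A single pack (the usual case) needs no restamping.
	test $# -gt 1 || return 0

	refdir=$(mktemp -d) || return 1
	status=0
	n=0

	# Record each file's mtime on a numbered reference file.
	for f in "$@"
	do
		n=$((n + 1))
		touch -r "$f" "$refdir/$n" || status=1
	done

	# Hand the recorded mtimes back in reverse order.
	if test "$status" = 0
	then
		for f in "$@"
		do
			touch -r "$refdir/$n" "$f" || status=1
			n=$((n - 1))
		done
	fi

	rm -rf "$refdir"
	return "$status"
}
```

No background process and no sleep are involved. If every pack was written within the same second, all mtimes are equal and the reversal changes nothing; that is the case the "subtract 1" variant is meant to cover.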

* Re: [PATCH] Split packs from git-repack should have descending timestamps
2007-05-25 1:04 ` Junio C Hamano
@ 2007-05-25 2:33 ` Dana How
2007-05-25 3:18 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: Dana How @ 2007-05-25 2:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn O. Pearce, Git Mailing List, danahow
On 5/24/07, Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > Dana How <danahow@gmail.com> wrote:
> >>
> >> If git-repack produces multiple split packs because
> >> --max-pack-size was in effect, the first pack written
> >> should have the latest timestamp because:
> >> (1) sha1_file.c:rearrange_packed_git() puts more recent
> >> pack files at the beginning of the search list; and
> >> (2) the most recent objects are written out first
> >> while packing.
> >
> > Ack. Given our mtime based sorting routine, even without your
> > recent patch to improve it, I think we definitely want this type
> > of behavior built into git-repack.sh. Good follow-on to your
> > --max-pack-size series.
>
> Gee, I do not want to touch this, unless we can do something
> about that sleep 2, even if you have & at the end (actually,
> especially because you have that -- it makes me worried).
>
> At the minimum, I think you do not have to restamp at all if the
> result is a single pack (i.e. the usual case), like so:
>
>     case "$restamp" in
>     ?*' '?*)
>         # we have more than one.
>         # for split packs, the first created should have most recent timestamp
>         for file in $restamp ; do touch $file; sleep 2; done &
>         ;;
>     esac
>
> Come to think of it, can't you do this "re-touching" business at
> the end of pack-objects without sleeping? You could keep track
> of the names of the packs you produced, and if you have produced
> 5, like so:
>
>     1
>     2
>     3
>     4
>     5
>
> you would swap timestamp of #1 and #5, #2 and #4 using stat()
> and utime(), and you are done. Each of these huge packs would
> take more than one second to write out, but if that is not
> the case, you could even start with timestamp of #5, subtract 1
> and stamp #4, subtract 1 and stamp #3, ... You may end up using
> timestamp from the past, but that would not be a problem.

OK, this triggered the following argument which convinces me:
git-pack-objects really should guarantee the correct timestamp
order, otherwise some other caller will have to repeat the stuff
I tried to put in git-repack.sh. So I will resubmit following
Junio's suggestions. This won't be for a few days.

Also, if there are rules on allowable bash constructs
(POSIX only, no &, etc), perhaps they should go in
SubmittingPatches near the new C99 comments?

> And I am really hoping that the other "use object density in
> reordering" patch would make this irrelevant. You would have
> commits and then the rest in the normal input object stream, and
> recency ordering done by git-pack-objects should keep commits
> together early in the resulting split pack, and earlier parts
> that have the commits would be hopefully denser.

I understand your point, but for a "normal" yet extremely
large repository this may not be the case. The "object density"
patch is designed so that the density component of the sort
key is extremely weak -- I think the timestamp is very revealing,
and should be followed in the absence of large variations
in object density.

Correcting the timestamps makes sure that the timestamp order
corresponds sensibly to recency order when packs are split.
A sequence of user commands producing packfiles results in
sensible and usable timestamps; I'd just like to make sure
this is also true when packs are split.

Anyway, I'm not going to submit anything more about timestamps
or object density until I see reactions to both patches, since
they interact.

--
Dana L. How danahow@gmail.com +1 650 804 5991 cell

* Re: [PATCH] Split packs from git-repack should have descending timestamps
2007-05-25 2:33 ` Dana How
@ 2007-05-25 3:18 ` Junio C Hamano
0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2007-05-25 3:18 UTC (permalink / raw)
To: Dana How; +Cc: Shawn O. Pearce, Git Mailing List
"Dana How" <danahow@gmail.com> writes:
> Also, if there are rules on allowable bash constructs
> (POSIX only, no &, etc), perhaps they should go in
> SubmittingPatches near the new C99 comments?

No bash arrays, no "function" noisewords, limiting <funky> in
${word<funky>word} constructs to POSIX (that means +,-,#,##,%,%%
but no regexps), prefer "test" over "[" (the last one is just for
readability).

But the reason I barfed on "&" is not about the syntax nor
portability. I was afraid of somebody else manipulating things
long after the parent "git-repack" returns (but still the stamper
sleeping and waiting to restamp the next one) and getting confused.
In this particular case, the restamping is only about the
performance so it is not _too_ bad, but in general I really do
not like leftover processes still doing something in the
background when the user thinks everything is done.

> I understand your point, but for a "normal" yet extremely
> large repository this may not be the case. The "object density"
> patch is designed so that the density component of the sort
> key is extremely weak -- I think the timestamp is very revealing,
> and should be followed in the absence of large variations
> in object density.

I still think "a pack that has ONLY megablobs and mark it with
.keep" is a much simpler approach, and there is no question that
density would work extremely well with that kind of arrangement.
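The shell conventions listed in this last message can be illustrated with a short sketch. The pack path, the MAX_PACK_SIZE variable and the `is_pack` helper are made up for the example and do not come from git-repack.sh.

```sh
#!/bin/sh
# Illustration only: names and paths below are hypothetical.
pack=/repo/.git/objects/pack/pack-1234abcd.pack

# POSIX parameter expansions only (no regexps, no bash arrays).
base=${pack##*/}           # drop longest prefix ending in "/"  -> pack-1234abcd.pack
name=${base%.pack}         # drop shortest ".pack" suffix       -> pack-1234abcd
hash=${name#pack-}         # drop shortest "pack-" prefix       -> 1234abcd
limit=${MAX_PACK_SIZE-0}   # fall back to 0 when the variable is unset

# Plain function definition without the "function" noiseword,
# and "test" rather than "[".
is_pack () {
	test "${1%.pack}" != "$1"
}
```

Sticking to these forms keeps the script usable with any POSIX /bin/sh rather than only with bash.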