From: Nicolas Pitre <nico@cam.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: git@vger.kernel.org
Subject: [PATCH] fix scalability problems with git-deltafy-script
Date: Mon, 20 Jun 2005 00:37:35 -0400 (EDT) [thread overview]
Message-ID: <Pine.LNX.4.63.0506200036210.1667@localhost.localdomain> (raw)
Current version would spin forever and exhaust memory while
attempting to sort all files from all revisions at once, until it
dies before even doing any real work. This is especially noticeable
when used on a big repository like the imported bkcvs repo for the
Linux kernel.
This patch allows for batching the sort to put a bound on needed
resources and making progress early, as well as including some small
cleanups.
Signed-off-by: Nicolas Pitre <nico@cam.org>
diff --git a/git-deltafy-script b/git-deltafy-script
--- a/git-deltafy-script
+++ b/git-deltafy-script
@@ -1,6 +1,6 @@
#!/bin/bash
-# Example script to deltafy an entire GIT repository based on the commit list.
+# Example script to deltify an entire GIT repository based on the commit list.
# The most recent version of a file is the reference and previous versions
# are made delta against the best earlier version available. And so on for
# successive versions going back in time. This way the increasing delta
@@ -25,37 +25,51 @@
set -e
-depth=
-[ "$1" == "-d" ] && depth="--max-depth=$2" && shift 2
+max_depth=
+[ "$1" == "-d" ] && max_depth="--max-depth=$2" && shift 2
+
+overlap=30
+max_behind="--max-behind=$overlap"
function process_list() {
if [ "$list" ]; then
echo "Processing $curr_file"
- echo "$head $list" | xargs git-mkdelta $depth --max-behind=30 -v
+ echo "$list" | xargs git-mkdelta $max_depth $max_behind -v
fi
}
+rev_list=""
curr_file=""
git-rev-list HEAD |
-git-diff-tree -r -t --stdin |
-awk '/^:/ { if ($5 == "M" || $5 == "N") print $4, $6;
- if ($5 == "M") print $3, $6 }' |
-LC_ALL=C sort -s -k 2 | uniq |
-while read sha1 file; do
- if [ "$file" == "$curr_file" ]; then
- list="$list $sha1"
- else
- process_list
- curr_file="$file"
- list=""
- head="$sha1"
- fi
+while true; do
+ # Let's batch revisions into groups of 1000 to give it a chance to
+ # scale with repositories containing long revision lists. We also
+ # overlap with the previous batch the size of mkdelta's look behind
+ # value in order to account for the processing discontinuity.
+ rev_list="$(echo -e -n "$rev_list" | tail --lines=$overlap)"
+ for i in $(seq 1000); do
+ read rev || break
+ rev_list="$rev_list$rev\n"
+ done
+ echo -e -n "$rev_list" |
+ git-diff-tree -r -t --stdin |
+ awk '/^:/ { if ($5 == "M") printf "%s %s\n%s %s\n", $4, $6, $3, $6 }' |
+ LC_ALL=C sort -s -k 2 | uniq |
+ while read sha1 file; do
+ if [ "$file" == "$curr_file" ]; then
+ list="$list $sha1"
+ else
+ process_list
+ curr_file="$file"
+ list="$sha1"
+ fi
+ done
+ [ "$rev" ] || break
done
process_list
curr_file="root directory"
-head=""
list="$(
git-rev-list HEAD |
while read commit; do
next reply other threads:[~2005-06-20 4:32 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-06-20 4:37 Nicolas Pitre [this message]
-- strict thread matches above, loose matches on Subject: below --
2005-06-16 2:59 [PATCH] fix scalability problems with git-deltafy-script Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.63.0506200036210.1667@localhost.localdomain \
--to=nico@cam.org \
--cc=git@vger.kernel.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).