From: Nicolas Pitre <nico@cam.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Some git performance measurements..
Date: Thu, 29 Nov 2007 12:25:39 -0500 (EST)
Message-ID: <alpine.LFD.0.99999.0711291208060.9605@xanadu.home>
In-Reply-To: <alpine.LFD.0.9999.0711282022470.8458@woody.linux-foundation.org>
On Wed, 28 Nov 2007, Linus Torvalds wrote:
> On Wed, 28 Nov 2007, Nicolas Pitre wrote:
>
> > But for a checkout that should actually correspond to a nice linear
> > access.
>
> For the initial check-out, yes. But the thing I timed was just a plain
> "git checkout", which won't actually do any of the blobs if they already
> exist checked-out (which I obviously had), which explains the non-dense
> patterns.
>
> The reason I care about "git checkout" (which is totally uninteresting in
> itself) is that it is a trivial use-case that fairly closely approximates
> two common cases that are *not* uninteresting: switching branches with
> most files unaffected and a fast-forward merge (both of which are the
> "two-way merge" special case).
[...]
> So it's actually fairly common to have "git checkout"-like behaviour with
> no blobs needing to be updated, and the "initial checkout" is in fact
> likely a less usual case. I wonder if we should make the pack-file have
> all the object types in separate regions (we already do that for commits,
> since "git rev-list" kind of operations are dense in the commit).
>
> Making the tree objects dense (the same way the commit objects are) might
> also conceivably speed up "git blame" and path history simplification,
> since those also tend to be "dense" in the tree history but don't actually
> look at the blobs themselves until they change.
Well, see below for a patch that actually splits the pack data into
regions of objects of the same type. With it, "git checkout" on the
kernel tree did improve for me, although not spectacularly.
Current Git warm cache: 0.532s
Current Git cold cache: 17.4s
Patched Git warm cache: 0.521s
Patched Git cold cache: 14.2s
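To put a number on "not spectacularly", here is a quick sketch of the
arithmetic using only the four timings above. The function names are
made up for illustration; this works out to roughly 2% for the warm
cache and roughly 18% for the cold cache.

```c
#include <assert.h>
#include <stdio.h>

/* Relative improvement, in percent, going from `before` to `after` seconds. */
double percent_faster(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Print the improvement for the warm and cold cache timings quoted above. */
void print_improvements(void)
{
	printf("warm cache: %.1f%% faster\n", percent_faster(0.532, 0.521));
	printf("cold cache: %.1f%% faster\n", percent_faster(17.4, 14.2));
}
```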
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4f44658..b655efd 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -585,22 +585,43 @@ static off_t write_one(struct sha1file *f,
 	return offset + size;
 }
 
+static int sort_by_type(const void *_a, const void *_b)
+{
+	const struct object_entry *a = *(struct object_entry **)_a;
+	const struct object_entry *b = *(struct object_entry **)_b;
+
+	/*
+	 * Preserve recency order for objects of the same type and reused deltas.
+	 */
+	if(a->type == OBJ_REF_DELTA || a->type == OBJ_OFS_DELTA ||
+	   b->type == OBJ_REF_DELTA || b->type == OBJ_OFS_DELTA ||
+	   a->type == b->type)
+		return (a < b) ? -1 : 1;
+	return a->type - b->type;
+}
+
 /* forward declaration for write_pack_file */
 static int adjust_perm(const char *path, mode_t mode);
 
 static void write_pack_file(void)
 {
-	uint32_t i = 0, j;
+	uint32_t i, j;
 	struct sha1file *f;
 	off_t offset, offset_one, last_obj_offset = 0;
 	struct pack_header hdr;
 	int do_progress = progress >> pack_to_stdout;
 	uint32_t nr_remaining = nr_result;
+	struct object_entry **sorted_by_type;
 
 	if (do_progress)
 		progress_state = start_progress("Writing objects", nr_result);
 	written_list = xmalloc(nr_objects * sizeof(*written_list));
+	sorted_by_type = xmalloc(nr_objects * sizeof(*sorted_by_type));
+	for (i = 0; i < nr_objects; i++)
+		sorted_by_type[i] = objects + i;
+	qsort(sorted_by_type, nr_objects, sizeof(*sorted_by_type), sort_by_type);
+	i = 0;
 
 	do {
 		unsigned char sha1[20];
 		char *pack_tmp_name = NULL;
@@ -625,7 +646,7 @@ static void write_pack_file(void)
 		nr_written = 0;
 		for (; i < nr_objects; i++) {
 			last_obj_offset = offset;
-			offset_one = write_one(f, objects + i, offset);
+			offset_one = write_one(f, sorted_by_type[i], offset);
 			if (!offset_one)
 				break;
 			offset = offset_one;
@@ -681,6 +702,7 @@ static void write_pack_file(void)
 		nr_remaining -= nr_written;
 	} while (nr_remaining && i < nr_objects);
 
+	free(sorted_by_type);
 	free(written_list);
 	stop_progress(&progress_state);
 	if (written != nr_result)
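
For anyone wanting to play with the grouping step outside of git, here
is a minimal stand-alone sketch of the same idea: build an array of
pointers into the object list, then qsort() it by type while keeping
the original (recency) order within each type. Everything named toy_*
is hypothetical and not git's real definitions. It is also simplified
compared to the patch: it does not special-case reused deltas, and it
orders strictly by type first and array position second.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not git's real definitions. */
enum toy_type { TOY_COMMIT = 1, TOY_TREE = 2, TOY_BLOB = 3 };

struct toy_entry {
	enum toy_type type;
	int id;		/* position in the original (recency) order */
};

/*
 * Order by type first; within a type, fall back to the position in the
 * original array so that recency order is preserved. Both pointers point
 * into the same array, so comparing them is well defined.
 */
static int toy_sort_by_type(const void *_a, const void *_b)
{
	const struct toy_entry *a = *(const struct toy_entry *const *)_a;
	const struct toy_entry *b = *(const struct toy_entry *const *)_b;

	if (a->type != b->type)
		return (a->type < b->type) ? -1 : 1;
	if (a == b)
		return 0;
	return (a < b) ? -1 : 1;
}

/*
 * Return a newly allocated array of pointers into entries[0..nr-1],
 * sorted so that entries of the same type are contiguous.
 * Returns NULL if allocation fails. The caller frees the result.
 */
struct toy_entry **toy_group_by_type(struct toy_entry *entries, size_t nr)
{
	struct toy_entry **sorted;
	size_t i;

	assert(nr > 0);
	sorted = malloc(nr * sizeof(*sorted));
	if (!sorted)
		return NULL;
	for (i = 0; i < nr; i++)
		sorted[i] = entries + i;
	qsort(sorted, nr, sizeof(*sorted), toy_sort_by_type);
	return sorted;
}
```

Sorting an array of pointers rather than the entries themselves leaves
the original array untouched, so anything else that indexes it keeps
working, and the writer only has to walk the sorted pointers instead.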