* Some git performance measurements..
@ 2007-11-29  2:49 Linus Torvalds
  2007-11-29  3:14 ` Linus Torvalds
  2007-11-29  5:17 ` Junio C Hamano
  0 siblings, 2 replies; 28+ messages in thread

From: Linus Torvalds @ 2007-11-29 2:49 UTC (permalink / raw)
To: Git Mailing List

So, today, I finally started looking a bit at one of the only remaining
performance issues that I'm aware of: git behaviour under cold-cache, and
particularly with a slow laptop harddisk, isn't as nice as I wish it were.

Sadly, one big reason for the performance downsides is actually hard to
measure: since we mmap() all the really critical data structures (the
pack-file index and data), it doesn't show up really well in any of the
otherwise really useful performance tools (eg strace and ltrace), because
the bulk of the time is actually spent not in libraries or in system
calls, but simply on regular instructions that take a page fault. Not a
lot that I see we can do about that, unless we can make the pack-files
even denser.

But one very interesting thing I did notice: some loads open the
".gitignore" files *way* too much, even in cases where we really don't
care. And when the caches are cold, that's actually very expensive, even
if - and perhaps _especially_ when - the file doesn't exist at all (ie
some filesystems that don't use hashes will look through the whole
directory before they see that it's not there).

An example of totally unnecessary .gitignore files is what a plain "git
checkout" with no arguments ends up doing:

	git read-tree -m -u --exclude-per-directory=.gitignore HEAD HEAD

which is *really* quite expensive, and a lot of the cost is trying to open
a .gitignore file in each subdirectory, even though they are never even
used.
Just to give a feel for *how* expensive that stupid .gitignore thing is,
here's a pretty telling comparison of using --exclude-per-directory and
not using it.

With the totally pointless --exclude-per-directory (aka "git checkout"):

	[torvalds@woody linux]$ time git read-tree -m -u --exclude-per-directory=.gitignore HEAD HEAD
	real    0m13.475s
	user    0m0.108s
	sys     0m0.228s

Without:

	[torvalds@woody linux]$ time git read-tree -m -u HEAD HEAD
	real    0m5.923s
	user    0m0.100s
	sys     0m0.044s

Now, I'm not all that happy about that latter six-second time either, but
both of the above numbers were done with completely cold caches (ie after
having done an "echo 3 > /proc/sys/vm/drop_caches" as root). With hot
caches, both of the numbers are under a tenth of a second (in fact, they
are very close: 0.092s and 0.096s respectively), but the cold-cache case
really shows just how horrible it is to (try to) open many files.

Doing an open (or an lstat) on individual files is a totally synchronous
operation, with no room for read-ahead etc, so even if your disk in
*theory* gets 80MB/s off the platter, when you do an open() or lstat(),
you're basically doing three or four small data-dependent IO operations,
and as a result even a fast disk will take almost a hundredth of a second
per open/lstat operation.

Less than a hundredth of a second may not sound like much, but when we
have 1700+ directories in the kernel trees, doing that for each possible
.gitignore file is really, really expensive!

(Doing an "lstat()" of each file is much cheaper in comparison, because at
least you'll get several directory entries and probably a few related
inodes with each IO. But opening just _one_ file per directory, like the
.gitignore code does, really kills your IO throughput.)

Now, timings like these are why I'm looking forward to SSDs. They may have
the same throughput as a disk, but they can do thousands of dependent
IOPS, and help latency-bound cases like this by an order of magnitude.
But when we're doing those .gitignore file reads totally unnecessarily,
that just hurts..

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread
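The arithmetic behind the per-open cost above can be sketched in a few lines. This is a back-of-the-envelope model with assumed numbers (IOs per cold open, milliseconds per IO), not git code or measured data:

```c
/*
 * Back-of-the-envelope model of the cold-cache cost described above.
 * Every number fed into it is an illustrative assumption, not a
 * measurement: a cold open()/lstat() needs a few small, data-dependent
 * IO operations (path lookup, directory block, inode), each paying a
 * disk seek with no chance of read-ahead.
 */
static double gitignore_probe_seconds(int directories, int ios_per_open,
				      double ms_per_io)
{
	double ms_per_open = ios_per_open * ms_per_io;
	return directories * ms_per_open / 1000.0;
}

/*
 * With ~1700 directories (roughly a kernel tree), an assumed 4 dependent
 * IOs per cold open and an assumed ~2 ms per IO, the .gitignore probes
 * alone cost on the order of ten seconds -- the same ballpark as the
 * 13.5s vs 5.9s gap in the timings above.
 */
```

The point of the model is only that a per-open latency just below a hundredth of a second, multiplied across every directory, dominates the cold-cache run.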
* Re: Some git performance measurements..
  2007-11-29  2:49 Some git performance measurements.. Linus Torvalds
@ 2007-11-29  3:14 ` Linus Torvalds
  2007-11-29  3:59   ` Nicolas Pitre
  2007-12-01 11:36   ` Joachim B Haga
  2007-11-29  5:17 ` Junio C Hamano
  1 sibling, 2 replies; 28+ messages in thread

From: Linus Torvalds @ 2007-11-29 3:14 UTC (permalink / raw)
To: Git Mailing List

On Wed, 28 Nov 2007, Linus Torvalds wrote:
>
> Sadly, one big reason for performance downsides is actually hard to
> measure: since we mmap() all the really critical data structures (the
> pack-file index and data), it doesn't show up really well in any of the
> otherwise really useful performance tools (eg strace and ltrace),
> because the bulk of the time is actually spent not in libraries or in
> system calls, but simply on regular instructions that take a page fault.

Side note: while the ".gitignore" issue actually overshadowed everything
else for things like "git checkout" and "git merge", I do have some traces
for page faults too. They're kind of interesting, but actually show that
we do reasonably well in this area - certainly much better than if we
tried to keep things in multiple independent files.

The pack-files (both index and data) are accessed somewhat randomly, but
there is still enough locality that doing read-ahead and clustering really
does help. So while "git checkout" actually ends up reading a lot of git
"tree" objects (as many tree objects as there are .gitignore files that we
try to open unnecessarily!), we actually spend a *lot* less time on the
pack-file operations than we do on trying to do the futile ".gitignore"
opens in every directory.

If somebody cares, I do have the page fault traces on those things for the
same "git checkout" that had the horrid .gitignore behaviour. What's
interesting is:

 - the pack-file really does get accessed in nice increasing patterns for
   the top-of-tree.
   So the fact that we sort objects "topologically" does actually seem to
   work pretty well (yeah, it was obvious, but it's nice to have a real
   page fault trace that shows the effect!). The accesses seemed to be
   *totally* ordered. There were obviously gaps (ie they weren't *dense*,
   since we only looked at one version of the trees in "git checkout"),
   but at the same time, it certainly wasn't random or causing any
   horrible seeking back-and-forth.

 - we're *reasonably* dense in pack-file accesses. Not wonderful, but not
   horrible. That in turn means that read-ahead kicks in and works ok.

   The pattern for the pack-file that I caught looks like this (the
   format is "pageindex: cycles-to-fill-in-kernel"):

	6810: 15202545
	6811: 353
	6812: 267
	6813: 476
	6814: 559
	6826: 1086510
	6878: 13948057
	6894: 9756446
	6896: 300
	6899: 307
	6903: 319
	6907: 293
	6910: 666120
	6912: 401
	6913: 330
	6916: 401
	6918: 303
	6919: 330
	6931: 784373
	6943: 405
	6944: 187
	6945: 315
	...

   and here you can see that read-ahead works fine about 60% of the time,
   with most pages taking just 300-500 CPU cycles (no actual IO, of
   course!) to find/lookup, but then when there are discontinuities we
   have the cost of the real IO, and the cost for those obviously jumps
   into the tens of millions of cycles (ie we're talking several
   milliseconds).
 - the index accesses are much more "random": the initial 256-way fan-out
   followed by the binary search causes the access patterns to look very
   different:

	0: 28367707
	136: 18867574
	140: 221280
	141: 745890
	142: 284427
	143: 338
	381: 9787459
	377: 394
	375: 255
	376: 248
	3344: 29885989
	3347: 334
	3346: 255
	3684: 7251911
	1055: 12954064
	1052: 386
	1050: 251
	1049: 240
	1947: 10501455
	1944: 382
	1946: 262

   where it doesn't even read-ahead at all in the beginning (because it
   looks entirely random), but the kernel eventually *does* actually go
   into read-ahead mode pretty soon, simply because once it gets into the
   binary search thing, the data entries are close enough to be in
   adjacent pages, and it all looks ok.

   As a result, by the end of the run, all the index file pages have been
   cached, and we're getting uniformly low numbers (ie in the hundreds of
   cycles, not millions):

	...
	490: 810
	489: 352
	488: 334
	2254: 907
	91: 484
	3580: 776
	962: 806
	522: 761
	2494: 514
	805: 653
	177: 495
	176: 375
	2861: 439
	649: 660
	648: 518
	...

   so the pack index file access patterns are much less predictable, but
   since the pack index is so dense and relatively small, that doesn't end
   up being a problem in the long run, and only in the beginning do we pay
   a highish cost for having to read it into memory.

In other words: we do ok. I think we could do better (and an SSD would
certainly help, since we're never going to do long and 100% contiguous
IO), but I was expecting to see *big* trouble, and it really wasn't that
horrid.

That said, I think there's something subtly wrong in our pack-file
sorting, and it should be more contiguous when we just do tree object
accesses on the top commit. I was really hoping that all the top-level
trees would be written entirely together, but I wonder if the "write out
deltas first" thing causes us to have those big gaps in between.

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread
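The lookup pattern behind those index numbers can be sketched as follows. This is a simplified in-memory model of a 256-way fan-out table followed by a binary search (with 4-byte stand-in "hashes"), not git's actual pack-index reader:

```c
#include <stdint.h>

/*
 * Sketch of the access pattern discussed above: a 256-entry table of
 * cumulative counts, indexed by the first byte of the hash, narrows the
 * search to one bucket; a binary search then runs inside that bucket.
 * The first probe lands on an effectively random page; later probes
 * converge onto adjacent pages, which is why the kernel's read-ahead
 * eventually kicks in.
 */
static int find_entry(const uint32_t *fanout,  /* 256 cumulative counts */
		      const uint32_t *hashes,  /* sorted hash table */
		      uint32_t target)
{
	/* Fan-out step: one "random" access picks the bucket bounds. */
	unsigned first_byte = target >> 24;
	uint32_t lo = first_byte ? fanout[first_byte - 1] : 0;
	uint32_t hi = fanout[first_byte];

	/* Binary search within the bucket. */
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		if (hashes[mid] == target)
			return (int)mid;
		if (hashes[mid] < target)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;	/* not in the pack */
}
```

The real index stores 20-byte SHA-1s plus offsets, but the two-phase access shape (one unpredictable fan-out touch, then increasingly local probes) is the same.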
* Re: Some git performance measurements..
  2007-11-29  3:14 ` Linus Torvalds
@ 2007-11-29  3:59   ` Nicolas Pitre
  2007-11-29  4:32     ` Linus Torvalds
  2007-12-01 11:36   ` Joachim B Haga
  1 sibling, 1 reply; 28+ messages in thread

From: Nicolas Pitre @ 2007-11-29 3:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List

On Wed, 28 Nov 2007, Linus Torvalds wrote:

> - the index accesses are much more "random": the initial 256-way fan-out
>   followed by the binary search causes the access patterns to look very
>   different:
>
>	0: 28367707
>	136: 18867574
>	140: 221280
>	141: 745890
>	142: 284427
>	143: 338
>	381: 9787459
>	377: 394
>	375: 255
>	376: 248
>	3344: 29885989
>	3347: 334
>	3346: 255
>	3684: 7251911
>	1055: 12954064
>	1052: 386
>	1050: 251
>	1049: 240
>	1947: 10501455
>	1944: 382
>	1946: 262
>
>   where it doesn't even read-ahead at all in the beginning (because it
>   looks entirely random), but the kernel eventually *does* actually go
>   into read-ahead mode pretty soon simply because once it gets into the
>   binary search thing, the data entries are close enough to be in
>   adjacent pages, and it all looks ok.

Did you try with version 2 of the pack index? Because it should have
somewhat better locality, as the object SHA1s and their offsets are split
into separate tables.

> That said, I think there's something subtly wrong in our pack-file
> sorting, and it should be more contiguous when we just do tree object
> accesses on the top commit. I was really hoping that all the top-level
> trees should be written entirely together, but I wonder if the "write
> out deltas first" thing causes us to have those big gaps in between.

Tree objects aren't all together. Related blob objects are interlaced with
those tree objects. But for a checkout that should actually correspond to
a nice linear access. And deltas aren't written first, but rather their
base objects.
And because deltas are based on newer objects, in theory the top commit
shouldn't have any delta at all, and the second commit should have all the
base objects for its deltas already written out as part of the first
commit. At least that's what a perfect data set would produce.

Last time I checked, about 20% of the deltas happened to be in the other
direction, i.e. the deltified object was younger than its base object,
most probably because the new version of the file shrunk instead of
growing, which is against the assumption in the delta search object sort.
But again, because the base object is needed to resolve the delta, it will
be read anyway.

Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-29  3:59 ` Nicolas Pitre
@ 2007-11-29  4:32   ` Linus Torvalds
  2007-11-29 17:25     ` Nicolas Pitre
  0 siblings, 1 reply; 28+ messages in thread

From: Linus Torvalds @ 2007-11-29 4:32 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Git Mailing List

On Wed, 28 Nov 2007, Nicolas Pitre wrote:
>
> Tree objects aren't all together. Related blob objects are interlaced
> with those tree objects.

Yeah, I noticed that a few minutes after saying this.

> But for a checkout that should actually correspond to a nice linear
> access.

For the initial check-out, yes. But the thing I timed was just a plain
"git checkout", which won't actually do any of the blobs if they already
exist checked-out (which I obviously had), which explains the non-dense
patterns.

The reason I care about "git checkout" (which is totally uninteresting in
itself) is that it is a trivial use-case that fairly closely approximates
two common cases that are *not* uninteresting: switching branches with
most files unaffected, and a fast-forward merge (both of which are the
"two-way merge" special case). I also suspect it is pretty close to a real
three-way merge (again, with just a few files changed).

IOW, there are a lot of these "tree operations" that actually leave 99% of
the tree totally unchanged, at least in the kernel. Even a fairly big
merge tends to change just a few hundred files. And when there are 23,000
files in the tree, a few hundred files is a fairly small percentage!

So it's actually fairly common to have "git checkout"-like behaviour with
no blobs needing to be updated, and the "initial checkout" is in fact
likely a less usual case. I wonder if we should make the pack-file have
all the object types in separate regions (we already do that for commits,
since "git rev-list" kinds of operations are dense in the commits).
Making the tree objects dense (the same way the commit objects are) might
also conceivably speed up "git blame" and path history simplification,
since those also tend to be "dense" in the tree history but don't actually
look at the blobs themselves until they change.

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-29  4:32 ` Linus Torvalds
@ 2007-11-29 17:25   ` Nicolas Pitre
  2007-11-29 17:48     ` Linus Torvalds
  2007-11-30  0:54     ` Jakub Narebski
  0 siblings, 2 replies; 28+ messages in thread

From: Nicolas Pitre @ 2007-11-29 17:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List

On Wed, 28 Nov 2007, Linus Torvalds wrote:

> On Wed, 28 Nov 2007, Nicolas Pitre wrote:
> >
> > But for a checkout that should actually correspond to a nice linear
> > access.
>
> For the initial check-out, yes. But the thing I timed was just a plain
> "git checkout", which won't actually do any of the blobs if they already
> exist checked-out (which I obviously had), which explains the non-dense
> patterns.
>
> The reason I care about "git checkout" (which is totally uninteresting
> in itself) is that it is a trivial use-case that fairly closely
> approximates two common cases that are *not* uninteresting: switching
> branches with most files unaffected and a fast-forward merge (both of
> which are the "two-way merge" special case).
[...]
> So it's actually fairly common to have "git checkout"-like behaviour
> with no blobs needing to be updated, and the "initial checkout" is in
> fact likely a less usual case. I wonder if we should make the pack-file
> have all the object types in separate regions (we already do that for
> commits, since "git rev-list" kind of operations are dense in the
> commit).
>
> Making the tree objects dense (the same way the commit objects are)
> might also conceivably speed up "git blame" and path history
> simplification, since those also tend to be "dense" in the tree history
> but don't actually look at the blobs themselves until they change.

Well, see below for the patch that actually splits the pack data into
objects of the same type. Doing that "git checkout" on the kernel tree did
improve things for me, although not spectacularly.
	Current Git warm cache:  0.532s
	Current Git cold cache:  17.4s
	Patched Git warm cache:  0.521s
	Patched Git cold cache:  14.2s

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4f44658..b655efd 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -585,22 +585,43 @@ static off_t write_one(struct sha1file *f,
 	return offset + size;
 }
 
+static int sort_by_type(const void *_a, const void *_b)
+{
+	const struct object_entry *a = *(struct object_entry **)_a;
+	const struct object_entry *b = *(struct object_entry **)_b;
+
+	/*
+	 * Preserve recency order for objects of the same type and
+	 * reused deltas.
+	 */
+	if (a->type == OBJ_REF_DELTA || a->type == OBJ_OFS_DELTA ||
+	    b->type == OBJ_REF_DELTA || b->type == OBJ_OFS_DELTA ||
+	    a->type == b->type)
+		return (a < b) ? -1 : 1;
+	return a->type - b->type;
+}
+
 /* forward declaration for write_pack_file */
 static int adjust_perm(const char *path, mode_t mode);
 
 static void write_pack_file(void)
 {
-	uint32_t i = 0, j;
+	uint32_t i, j;
 	struct sha1file *f;
 	off_t offset, offset_one, last_obj_offset = 0;
 	struct pack_header hdr;
 	int do_progress = progress >> pack_to_stdout;
 	uint32_t nr_remaining = nr_result;
+	struct object_entry **sorted_by_type;
 
 	if (do_progress)
 		progress_state = start_progress("Writing objects", nr_result);
 	written_list = xmalloc(nr_objects * sizeof(*written_list));
+	sorted_by_type = xmalloc(nr_objects * sizeof(*sorted_by_type));
+	for (i = 0; i < nr_objects; i++)
+		sorted_by_type[i] = objects + i;
+	qsort(sorted_by_type, nr_objects, sizeof(*sorted_by_type),
+	      sort_by_type);
+	i = 0;
 
 	do {
 		unsigned char sha1[20];
 		char *pack_tmp_name = NULL;
@@ -625,7 +646,7 @@ static void write_pack_file(void)
 		nr_written = 0;
 		for (; i < nr_objects; i++) {
 			last_obj_offset = offset;
-			offset_one = write_one(f, objects + i, offset);
+			offset_one = write_one(f, sorted_by_type[i], offset);
 			if (!offset_one)
 				break;
 			offset = offset_one;
@@ -681,6 +702,7 @@ static void write_pack_file(void)
 		nr_remaining -= nr_written;
 	} while (nr_remaining && i < nr_objects);
+	free(sorted_by_type);
 	free(written_list);
 	stop_progress(&progress_state);
 	if (written != nr_result)

^ permalink raw reply related	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-29 17:25 ` Nicolas Pitre
@ 2007-11-29 17:48   ` Linus Torvalds
  2007-11-29 18:52     ` Nicolas Pitre
  2007-11-30  5:00     ` Junio C Hamano
  1 sibling, 2 replies; 28+ messages in thread

From: Linus Torvalds @ 2007-11-29 17:48 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Git Mailing List

On Thu, 29 Nov 2007, Nicolas Pitre wrote:
>
> Well, see below for the patch that actually split the pack data into
> objects of the same type. Doing that "git checkout" on the kernel tree
> did improve things for me although not spectacularly.

Umm. See my earlier numbers. For "git checkout" with cold cache, the
*bulk* of the time is actually the ".gitignore" file lookups, so if you
see a three-second improvement out of 17s, it may not look spectacular,
but considering that probably 10s of those 17s were something *else* going
on, I suspect that if you really did just a plain "git checkout", you
actually *do* have a spectacular improvement of roughly 7s -> 4s!

Try with

	time git read-tree -m -u HEAD HEAD > /dev/null

instead. But if that is what you already did, then yeah, the performance
improvement for cold-cache wasn't as big as I was hoping for.

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-29 17:48 ` Linus Torvalds
@ 2007-11-29 18:52   ` Nicolas Pitre
  2007-11-30  5:00   ` Junio C Hamano
  1 sibling, 0 replies; 28+ messages in thread

From: Nicolas Pitre @ 2007-11-29 18:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List

On Thu, 29 Nov 2007, Linus Torvalds wrote:
>
> On Thu, 29 Nov 2007, Nicolas Pitre wrote:
> >
> > Well, see below for the patch that actually split the pack data into
> > objects of the same type. Doing that "git checkout" on the kernel tree
> > did improve things for me although not spectacularly.
>
> Umm. See my earlier numbers. For "git checkout" with cold cache, the
> *bulk* of the time is actually the ".gitignore" file lookups, so if you
> see a three-second improvement out of 17s, it may not look spectacular,
> but considering that probably 10s of those 17s were something *else*
> going on, I suspect that if you really did just a plain "git checkout",
> you actually *do* have a spectacular improvement of roughly 7s -> 4s!
>
> Try with
>
>	time git read-tree -m -u HEAD HEAD > /dev/null
>
> instead.

Oh! OK then.

	Current, cold cache: 5.248s
	Current, warm cache: 0.185s
	Patched, cold cache: 3.337s
	Patched, warm cache: 0.183s

So yes, the improvement is more significant then, although the cold cache
timings vary quite a lot between successive tries.

Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-29 17:48 ` Linus Torvalds
  2007-11-29 18:52   ` Nicolas Pitre
@ 2007-11-30  5:00   ` Junio C Hamano
  2007-11-30  6:03     ` Linus Torvalds
  1 sibling, 1 reply; 28+ messages in thread

From: Junio C Hamano @ 2007-11-30 5:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Umm. See my earlier numbers. For "git checkout" with cold cache, the
> *bulk* of the time is actually the ".gitignore" file lookups, so if you
> see a three-second improvement out of 17s, it may not look spectacular,
> but considering that probably 10s of those 17s were something *else*
> going on, I suspect that if you really did just a plain "git checkout",
> you actually *do* have a spectacular improvement of roughly 7s -> 4s!

I am hoping that "probably 10s of those 17s" can actually be measured with
the patch I sent out last night. Has anybody taken a look at it?

Partitioning the pack data by object type shifts the tradeoffs from the
current "the data in the same tree are mostly together, except commits are
treated differently because rev walks are done quite often" layout.

Because we do not ever look at blob objects while pruning the history
(unless the -Spickaxe option is used, I think), the partitioned layout
would optimize ancestry walking even more than the current packfile
layout. On the other hand, any operation that wants to look at the
contents is penalized. A two-tree diff that inspects the contents (e.g.
fuzzy renames and pickaxe) needs to read from the tree section to find
which blob to compare with which other blob, and then needs to seek to the
blob section to actually read the contents, while the current layout tends
to group both trees and blobs that belong to the same tree together.
It is natural that blame is penalized by the new layout, mostly because it
needs to grab two blobs to compare from each parent-child pair, but also
because it needs to find two-tree diffs for each parent-child pair it
traverses whenever it needs to follow across renames (that is, when it
sees there is no corresponding path in the parent). I would expect to see
a similar slowdown from grep, which wants to inspect blobs that are in the
same tree.

When I do archaeology, I think I often run blame first to see which change
made the block of text into its current shape, and then run a path-limited
"git log -p" either starting or ending at that revision. In that workflow,
the initial blame may get slower with the new layout, but I suspect it
would help by speeding up the latter "git log -p" step.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-30  5:00 ` Junio C Hamano
@ 2007-11-30  6:03   ` Linus Torvalds
  0 siblings, 0 replies; 28+ messages in thread

From: Linus Torvalds @ 2007-11-30 6:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nicolas Pitre, Git Mailing List

On Thu, 29 Nov 2007, Junio C Hamano wrote:
>
> I am hoping that "probably 10s of those 17s" can actually be measured
> with the patch I sent out last night. Has anybody taken a look at it?

Sorry, I missed it. But I just did timings. Your patch helps

	git read-tree -m -u --exclude-per-directory=.gitignore HEAD HEAD

timings enormously, and it's now down to 3s for me (which is the same
speed as it is without any per-directory excludes). That's a big
improvement from the ~10s I see without your patch (I've repacked my tree;
I have to admit that I don't even know if it's the new or the old order,
but I can state that 7s for me was just those .gitignore files).

Sadly, the full "git checkout" itself is not actually improved, due to the
"git update-index --refresh" there, which will end up populating the whole
directory cache anyway. I wonder why I didn't see that as the expensive
operation when I timed "git checkout". Probably because I narrowed down on
the "git read-tree" as the operation that actually accesses the pack-file
and the object directory, while the "git update-index" never touches the
actual objects.

Anyway, I think your patch is great. It just doesn't help the full case of
a "git checkout", only the read-tree portion of it ;(

As to partitioning the data according to types:

> When I do archaeology, I think I often run blame first to see which
> change made the block of text into the current shape first, and then
> run a path limited "git log -p" either starting or ending at that
> revision. In that workflow, the initial blame may get slower with the
> new layout, but I suspect it would help by speeding up the latter
> "git log -p" step.

I really cannot convince myself one way or the other.
I have a suspicion that sometimes it helps to have objects (regardless of
type) close to each other, and sometimes it helps to have the trees packed
densely. A lot of operations *do* work on both blobs and trees (a *raw*
diff doesn't, but those are fairly rare), so this is not at all clear-cut
like the commit case.

So sorting the commits together is a no-brainer, since a lot of really
important ops only look at them. But blobs and trees? The numbers
certainly go both ways, and I suspect we are probably better off not
messing with the sort order unless we have some unambiguous real results.

Oh, well. I was hoping that I'd have a number of cases that showed good
improvements, with perhaps the bulk of it not showing much difference at
all. But while I saw the good improvements, the very first try at "git
blame" also showed noticeably worse numbers, so I think we should consider
it an interesting idea, but probably shelve it.

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-29 17:25 ` Nicolas Pitre
  2007-11-29 17:48   ` Linus Torvalds
@ 2007-11-30  0:54   ` Jakub Narebski
  2007-11-30  2:21     ` Linus Torvalds
  1 sibling, 1 reply; 28+ messages in thread

From: Jakub Narebski @ 2007-11-30 0:54 UTC (permalink / raw)
To: git

<published and sent>
[Cc: git@vger.kernel.org, Nicolas Pitre <nico@cam.org>,
 Linus Torvalds <torvalds@linux-foundation.org>]

Nicolas Pitre wrote:

> Well, see below for the patch that actually split the pack data into
> objects of the same type. Doing that "git checkout" on the kernel tree
> did improve things for me although not spectacularly.

> +static int sort_by_type(const void *_a, const void *_b)
> +{
> +	const struct object_entry *a = *(struct object_entry **)_a;
> +	const struct object_entry *b = *(struct object_entry **)_b;
> +
> +	/*
> +	 * Preserve recency order for objects of the same type and
> +	 * reused deltas.
> +	 */
> +	if (a->type == OBJ_REF_DELTA || a->type == OBJ_OFS_DELTA ||
> +	    b->type == OBJ_REF_DELTA || b->type == OBJ_OFS_DELTA ||
> +	    a->type == b->type)
> +		return (a < b) ? -1 : 1;
> +	return a->type - b->type;
> +}

> +	qsort(sorted_by_type, nr_objects, sizeof(*sorted_by_type),
> +	      sort_by_type);

Isn't there a better way to do this sorting? What is needed here is a
(stable) _bucket_ sort / _pigeonhole_ sort (or counting sort), which is
O(n); quicksort is perhaps simpler to use, but I'm not sure if it is
faster in this situation.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 28+ messages in thread
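The stable pigeonhole/counting sort suggested above can be sketched in a few lines. This uses a simplified entry struct as a stand-in for git's real `object_entry`: one counting pass, one placement pass, O(n), and the original (recency) order is preserved within each type:

```c
#include <stddef.h>

/*
 * Sketch of a stable pigeonhole (counting) sort by object type.
 * The enum and struct are simplified stand-ins, not git's types.
 */
enum { TYPE_COMMIT, TYPE_TAG, TYPE_TREE, TYPE_BLOB, NR_TYPES };

struct entry { int type; int id; };

static void bucket_sort_by_type(struct entry **out, struct entry *in,
				size_t n)
{
	size_t count[NR_TYPES] = { 0 }, start[NR_TYPES], i;

	/* Counting pass: how many objects of each type. */
	for (i = 0; i < n; i++)
		count[in[i].type]++;

	/* Each type's bucket starts where the previous one ends. */
	start[0] = 0;
	for (i = 1; i < NR_TYPES; i++)
		start[i] = start[i - 1] + count[i - 1];

	/*
	 * Placement pass: walking the input in order and appending to
	 * each bucket keeps the sort stable, so recency order survives
	 * within every type.
	 */
	for (i = 0; i < n; i++)
		out[start[in[i].type]++] = &in[i];
}
```

Two linear passes replace the O(n log n) qsort, and unlike qsort (whose stability the C standard does not guarantee), the within-type ordering needs no pointer-comparison trick to preserve.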
* Re: Some git performance measurements..
  2007-11-30  0:54 ` Jakub Narebski
@ 2007-11-30  2:21   ` Linus Torvalds
  2007-11-30  2:39     ` Jakub Narebski
                        ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread

From: Linus Torvalds @ 2007-11-30 2:21 UTC (permalink / raw)
To: Jakub Narebski, Nicolas Pitre; +Cc: Git Mailing List

On Fri, 30 Nov 2007, Jakub Narebski wrote:
>
> Isn't there a better way to do this sorting? What is needed here is
> (stable) _bucket_ sort / _pigeonhole_ sort (or counting sort), which
> is O(n); quicksort is perhaps simpler to use, but I'm not sure if
> faster in this situation.

Actually, I doubt you need to do any sorting at all: what would be easiest
would be to simply change "traverse_commit_list()" to use different lists
for different object types, and just output them in type order (semi-sane
order choice: commits first, then tags, then trees, and finally blobs).

Ta-daa! All done! Magic! No sorting required, because all the objects got
output in the right order without any extra sort phase!

Something like the appended (untested! lots of! exclamation marks!
Really!)
		Linus

---
 list-objects.c |   36 ++++++++++++++++++++++--------------
 1 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 4ef58e7..a046f37 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -49,7 +49,6 @@ static void process_blob(struct rev_info *revs,
  */
 static void process_gitlink(struct rev_info *revs,
 			    const unsigned char *sha1,
-			    struct object_array *p,
 			    struct name_path *path,
 			    const char *name)
 {
@@ -58,7 +57,8 @@ static void process_gitlink(struct rev_info *revs,
 
 static void process_tree(struct rev_info *revs,
 			 struct tree *tree,
-			 struct object_array *p,
+			 struct object_array *trees,
+			 struct object_array *blobs,
 			 struct name_path *path,
 			 const char *name)
 {
@@ -75,7 +75,7 @@ static void process_tree(struct rev_info *revs,
 		die("bad tree object %s", sha1_to_hex(obj->sha1));
 	obj->flags |= SEEN;
 	name = xstrdup(name);
-	add_object(obj, p, path, name);
+	add_object(obj, trees, path, name);
 	me.up = path;
 	me.elem = name;
 	me.elem_len = strlen(name);
@@ -86,14 +86,14 @@ static void process_tree(struct rev_info *revs,
 		if (S_ISDIR(entry.mode))
 			process_tree(revs,
 				     lookup_tree(entry.sha1),
-				     p, &me, entry.path);
+				     trees, blobs, &me, entry.path);
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(revs, entry.sha1,
-					p, &me, entry.path);
+					&me, entry.path);
 		else
 			process_blob(revs,
 				     lookup_blob(entry.sha1),
-				     p, &me, entry.path);
+				     blobs, &me, entry.path);
 	}
 	free(tree->buffer);
 	tree->buffer = NULL;
@@ -138,10 +138,12 @@ void traverse_commit_list(struct rev_info *revs,
 {
 	int i;
 	struct commit *commit;
-	struct object_array objects = { 0, 0, NULL };
+	struct object_array tags = { 0, 0, NULL };
+	struct object_array trees = { 0, 0, NULL };
+	struct object_array blobs = { 0, 0, NULL };
 
 	while ((commit = get_revision(revs)) != NULL) {
-		process_tree(revs, commit->tree, &objects, NULL, "");
+		process_tree(revs, commit->tree, &trees, &blobs, NULL, "");
 		show_commit(commit);
 	}
 	for (i = 0; i < revs->pending.nr; i++) {
@@ -152,25 +154,31 @@ void traverse_commit_list(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			add_object_array(obj, name, &objects);
+			add_object_array(obj, name, &tags);
 			continue;
 		}
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, &objects,
+			process_tree(revs, (struct tree *)obj, &trees, &blobs,
 				     NULL, name);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, &objects,
+			process_blob(revs, (struct blob *)obj, &blobs,
 				     NULL, name);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    sha1_to_hex(obj->sha1), name);
 	}
-	for (i = 0; i < objects.nr; i++)
-		show_object(&objects.objects[i]);
-	free(objects.objects);
+	for (i = 0; i < tags.nr; i++)
+		show_object(&tags.objects[i]);
+	for (i = 0; i < trees.nr; i++)
+		show_object(&trees.objects[i]);
+	for (i = 0; i < blobs.nr; i++)
+		show_object(&blobs.objects[i]);
+	free(tags.objects);
+	free(trees.objects);
+	free(blobs.objects);
 	if (revs->pending.nr) {
 		free(revs->pending.objects);
 		revs->pending.nr = 0;

^ permalink raw reply related	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-30  2:21 ` Linus Torvalds
@ 2007-11-30  2:39   ` Jakub Narebski
  2007-11-30  2:40   ` Nicolas Pitre
  2007-11-30  2:54   ` Linus Torvalds
  2 siblings, 0 replies; 28+ messages in thread

From: Jakub Narebski @ 2007-11-30 2:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Git Mailing List

On Fri, 30 Nov 2007, Linus Torvalds wrote:
>
> On Fri, 30 Nov 2007, Jakub Narebski wrote:
>>
>> Isn't there a better way to do this sorting? What is needed here is
>> (stable) _bucket_ sort / _pigeonhole_ sort (or counting sort), which
>> is O(n); quicksort is perhaps simpler to use, but I'm not sure if
>> faster in this situation.
>
> Actually, I doubt you need to do any sorting at all: what would be
> easiest would be to simply change "traverse_commit_list()" to use
> different lists for different object types, and just output them in
> type order (semi-sane order choice: commits first, then tags, then
> trees, and finally blobs).
>
> Ta-daa! All done! Magic! No sorting required, because all the objects
> got output in the right order without any extra sort phase!

Actually this algorithm has the fancy name "pigeonhole sort", and is a
special case of bucket sort. Well, sort of, as there is no final sorted
list, only output in "sorted" order.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-30 2:21 ` Linus Torvalds
  2007-11-30 2:39   ` Jakub Narebski
@ 2007-11-30 2:40   ` Nicolas Pitre
  2007-11-30 6:11     ` Steffen Prohaska
  2007-11-30 2:54   ` Linus Torvalds
  2 siblings, 1 reply; 28+ messages in thread
From: Nicolas Pitre @ 2007-11-30 2:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, Git Mailing List

On Thu, 29 Nov 2007, Linus Torvalds wrote:

> On Fri, 30 Nov 2007, Jakub Narebski wrote:
> >
> > Isn't there a better way to do this sorting? What is needed here is
> > (stable) _bucket_ sort / _pigeonhole_ sort (or counting sort), which
> > is O(n); quicksort is perhaps simpler to use, but I'm not sure if
> > faster in this situation.

That particular sort takes under a second here with the Linux repo.
Pretty insignificant compared to the time required to repack.

> Actually, I doubt you need to do any sorting at all: what would be easiest
> would be to simply change "traverse_commit_list()" to use different lists
> for different object types, and just output them in type order (semi-sane
> order choice: commits first, then tags, then trees, and finally blobs).

Yes!  That's what I thought initially, but since list-objects.c is
completely unknown territory to me, I sorted them in pack-object.c
instead, out of pure laziness.

Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-30 2:40 ` Nicolas Pitre
@ 2007-11-30 6:11   ` Steffen Prohaska
  2007-12-07 13:35     ` Mike Ralphson
  0 siblings, 1 reply; 28+ messages in thread
From: Steffen Prohaska @ 2007-11-30 6:11 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Jakub Narebski, Git Mailing List

On Nov 30, 2007, at 3:40 AM, Nicolas Pitre wrote:

> On Thu, 29 Nov 2007, Linus Torvalds wrote:
>
>> On Fri, 30 Nov 2007, Jakub Narebski wrote:
>>>
>>> Isn't there a better way to do this sorting? What is needed here is
>>> (stable) _bucket_ sort / _pigeonhole_ sort (or counting sort), which
>>> is O(n); quicksort is perhaps simpler to use, but I'm not sure if
>>> faster in this situation.
>
> That particular sort takes under a second here with the Linux repo.
> Pretty insignificant compared to the time required to repack.

Brian Downing measured horrid performance of qsort on Windows 2000 [1];
qsort seems to show worst-case behaviour.  This resulted in a patch
replacing Windows' qsort implementation for the mingw port [2].

[1] http://thread.gmane.org/gmane.comp.version-control.msysgit/1084
[2] http://thread.gmane.org/gmane.comp.version-control.msysgit/1086

Avoiding qsort would be even better.  I'm not sure, though, if the
particular qsort call that triggered the current discussion is the
very same qsort call that Brian was hit by.  I'm only claiming that in
general avoiding qsort on Windows is a good idea.

Steffen

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-30 6:11 ` Steffen Prohaska
@ 2007-12-07 13:35   ` Mike Ralphson
  2007-12-07 13:49     ` Johannes Schindelin
  0 siblings, 1 reply; 28+ messages in thread
From: Mike Ralphson @ 2007-12-07 13:35 UTC (permalink / raw)
  To: Steffen Prohaska, Junio C Hamano
  Cc: Nicolas Pitre, Linus Torvalds, Jakub Narebski, Git Mailing List

On Nov 30, 2007 6:11 AM, Steffen Prohaska <prohaska@zib.de> wrote:
> Brian Downing measured horrid performance of qsort on Windows
> 2000 [1]. qsort seems to show worst case behaviour.
>
> This resulted in a patch replacing Window's qsort implementation
> for the mingw port [2].
>
> [1] http://thread.gmane.org/gmane.comp.version-control.msysgit/1084
> [2] http://thread.gmane.org/gmane.comp.version-control.msysgit/1086
>
> Avoiding qsort would even be better. I'm not sure, though,
> if the particular qsort call that triggered the current
> discussion, is the very same qsort call that Brian was hit by.
> I'm only claiming that in general avoiding qsort on Windows
> is a good idea.

This is a vote for pulling this into mainline git.  AIX (at least 5.3)
also has the horrible worst-case performance of the libc qsort on
sorted or near-sorted lists (such as those provided by some
filesystems, or a directory which has been rsynced).  Some versions of
Solaris have the same problem [1].

I benchmarked 3 alternative qsorts; qsortG [2] was the fastest on my
system but has funky licensing, the NetBSD qsort was middle-range and
the glibc one the slowest of the three (but that could be due to it
being tuned for a "Sun 4/260").  All of them show over 100x speed
improvements on a git-status of my main repo (104s -> ~0.7s).

I like the idea of a BROKEN_QSORT make variable.  I was trying to come
up with a patch which would discover the problem in a
performance/regression test and suggest the setting, but have had
insufficient free time so far.

Cheers, Mike

[1] http://bugs.opensolaris.org/view_bug.do?bug_id=1258570
[2] http://www.mccaughan.org.uk/g/software.html#qsort

^ permalink raw reply	[flat|nested] 28+ messages in thread
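[Editorial note: the pathology Mike and Steffen describe is the textbook quicksort worst case: a first-element (or similarly naive) pivot degenerates to O(n^2) comparisons on already-sorted input. The affected libc internals are not shown in the thread, so this is only an illustrative counter, not the AIX or Solaris code:]

```c
#include <assert.h>

static long cmp_count;

/* Naive quicksort with a first-element pivot.  On sorted input every
 * partition is maximally unbalanced, so the comparison count grows as
 * n*(n-1)/2 instead of the expected n*log(n). */
static void naive_qsort(int *a, int lo, int hi)
{
    if (lo >= hi)
        return;
    int pivot = a[lo], i = lo;
    for (int j = lo + 1; j <= hi; j++) {
        cmp_count++;
        if (a[j] < pivot) {
            int tmp = a[++i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }
    int tmp = a[lo];
    a[lo] = a[i];
    a[i] = tmp;
    naive_qsort(a, lo, i - 1);
    naive_qsort(a, i + 1, hi);
}

/* Count comparisons for an already-sorted array of n <= 1024 ints. */
static long count_for_sorted(int n)
{
    int a[1024];
    for (int i = 0; i < n; i++)
        a[i] = i;
    cmp_count = 0;
    naive_qsort(a, 0, n - 1);
    return cmp_count;
}
```

A directory listing that comes back already sorted from the filesystem, as Mike notes, is exactly this input; a median-of-three pivot or a mergesort avoids the blowup.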
* Re: Some git performance measurements..
  2007-12-07 13:35 ` Mike Ralphson
@ 2007-12-07 13:49   ` Johannes Schindelin
  2007-12-07 16:07     ` Linus Torvalds
  2007-12-07 16:09     ` Mike Ralphson
  0 siblings, 2 replies; 28+ messages in thread
From: Johannes Schindelin @ 2007-12-07 13:49 UTC (permalink / raw)
  To: Mike Ralphson
  Cc: Steffen Prohaska, Junio C Hamano, Nicolas Pitre, Linus Torvalds,
      Jakub Narebski, Git Mailing List

Hi,

On Fri, 7 Dec 2007, Mike Ralphson wrote:

> I benchmarked 3 alternative qsorts, qsortG [2] was the fastest on my
> system but has funky licensing, the NetBSD qsort was middle-range and
> the glibc one the slowest of the three (but that could be due to it
> being tuned for a "Sun 4/260"). All of them show over 100x speed
> improvements on a git-status of my main repo (104s -> ~0.7s)

How is "You may use it in anything you like;" funky licensing?  It is
effectively public domain.

BTW if you need a starting point (easing on your time constraints):
http://repo.or.cz/w/git/mingw/4msysgit.git?a=commitdiff;h=bba554dd0114dc436cfdd3f17edc836bbaf3d95f

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-12-07 13:49 ` Johannes Schindelin
@ 2007-12-07 16:07   ` Linus Torvalds
  2007-12-07 16:09   ` Mike Ralphson
  1 sibling, 0 replies; 28+ messages in thread
From: Linus Torvalds @ 2007-12-07 16:07 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Mike Ralphson, Steffen Prohaska, Junio C Hamano, Nicolas Pitre,
      Jakub Narebski, Git Mailing List

On Fri, 7 Dec 2007, Johannes Schindelin wrote:
>
> How is "You may use it in anything you like;" funky licensing?  It is
> effectively public domain.

No, it has a long list of requirements that may not be onerous, but
they aren't compatible with GPL (ie they require that you make changes
in certain ways).

That said, if somebody wants to use that qsort, the thing to do is to
ask Gareth for permission, maybe he just says "sure".

For example, for git, you might as well remove the whole unaligned
case, and quite frankly, that #ifdef DEBUG_QSORT is some of the ugliest
I've ever seen and should be cleaned up (why didn't he just use a
"dbg_printf()" macro like everybody else?  Even if it requires double
parentheses for old-style C portability, it's cleaner than what is
there now).

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-12-07 13:49 ` Johannes Schindelin
  2007-12-07 16:07   ` Linus Torvalds
@ 2007-12-07 16:09   ` Mike Ralphson
  2007-12-07 18:37     ` Johannes Schindelin
  1 sibling, 1 reply; 28+ messages in thread
From: Mike Ralphson @ 2007-12-07 16:09 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Steffen Prohaska, Junio C Hamano, Nicolas Pitre, Linus Torvalds,
      Jakub Narebski, Git Mailing List

On Dec 7, 2007 1:49 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Fri, 7 Dec 2007, Mike Ralphson wrote:
>
> > I benchmarked 3 alternative qsorts, qsortG [2] was the fastest on my
> > system but has funky licensing, the NetBSD qsort was middle-range and
> > the glibc one the slowest of the three (but that could be due to it
> > being tuned for a "Sun 4/260"). All of them show over 100x speed
> > improvements on a git-status of my main repo (104s -> ~0.7s)
>
> How is "You may use it in anything you like;" funky licensing? It is
> effectively public domain.

I did ask what the git licensing policy was (GPL2 or GPL2-compatible)
but got no response.  The author's wishes state:

 * This code may be reproduced freely provided
 *   - this file is retained unaltered apart from minor
 *     changes for portability and efficiency
 *   - no changes are made to this comment
 *   - any changes that *are* made are clearly flagged
 *   - the _ID string below is altered by inserting, after
 *     the date, the string " altered" followed at your option
 *     by other material. (Exceptions: you may change the name
 *     of the exported routine without changing the ID string.
 *     You may change the values of the macros TRUNC_* and
 *     PIVOT_THRESHOLD without changing the ID string, provided
 *     they remain constants with TRUNC_nonaligned, TRUNC_aligned
 *     and TRUNC_words/WORD_BYTES between 8 and 24, and
 *     PIVOT_THRESHOLD between 32 and 200.)

and they should be respected.  "retained unaltered apart from" sounds a
little bit more restrictive than we might like.  I haven't pinged him
about relicensing though.  [Edit, I see Linus has just made those
points.]

> BTW if you need a starting point (easing on your time constraints):
> http://repo.or.cz/w/git/mingw/4msysgit.git?a=commitdiff;h=bba554dd0114dc436cfdd3f17edc836bbaf3d95f

Many thanks, the gmane links I bookmarked are just complaining about
wefts producing no weaves or somesuch, so that's helpful.  I'll try the
mergesort and report back.

Cheers, Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-12-07 16:09 ` Mike Ralphson
@ 2007-12-07 18:37   ` Johannes Schindelin
  2007-12-07 19:15     ` Mike Ralphson
  0 siblings, 1 reply; 28+ messages in thread
From: Johannes Schindelin @ 2007-12-07 18:37 UTC (permalink / raw)
  To: Mike Ralphson
  Cc: Steffen Prohaska, Junio C Hamano, Nicolas Pitre, Linus Torvalds,
      Jakub Narebski, Git Mailing List

Hi,

On Fri, 7 Dec 2007, Mike Ralphson wrote:

> On Dec 7, 2007 1:49 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > On Fri, 7 Dec 2007, Mike Ralphson wrote:
> >
> > > I benchmarked 3 alternative qsorts, qsortG [2] was the fastest on my
> > > system but has funky licensing, the NetBSD qsort was middle-range
> > > and the glibc one the slowest of the three (but that could be due to
> > > it being tuned for a "Sun 4/260"). All of them show over 100x speed
> > > improvements on a git-status of my main repo (104s -> ~0.7s)
> >
> > How is "You may use it in anything you like;" funky licensing? It is
> > effectively public domain.
>
> I did ask what the git licensing policy was (GPL2 or GPL2-compatible)
> but got no response. The author's wishes state:
>
> * This code may be reproduced freely provided
> [long list]

Okay, sorry, I did not bother reading further when I read "You may use
it in anything you like;".

But if the author did not respond, it might be a better idea to just
reimplement it.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-12-07 18:37 ` Johannes Schindelin
@ 2007-12-07 19:15   ` Mike Ralphson
  2007-12-08 11:05     ` Johannes Schindelin
  0 siblings, 1 reply; 28+ messages in thread
From: Mike Ralphson @ 2007-12-07 19:15 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Steffen Prohaska, Junio C Hamano, Nicolas Pitre, Linus Torvalds,
      Jakub Narebski, Git Mailing List

On Dec 7, 2007 6:37 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Fri, 7 Dec 2007, Mike Ralphson wrote:
>
> > On Dec 7, 2007 1:49 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > On Fri, 7 Dec 2007, Mike Ralphson wrote:
> > >
> > > > I benchmarked 3 alternative qsorts, qsortG [2] was the fastest on my
> > > > system but has funky licensing, the NetBSD qsort was middle-range
> > > > and the glibc one the slowest of the three (but that could be due to
> > > > it being tuned for a "Sun 4/260"). All of them show over 100x speed
> > > > improvements on a git-status of my main repo (104s -> ~0.7s)
>
> Okay, sorry, I did not bother reading further when I read "You may use it
> in anything you like;".
>
> But if the author did not respond, it might be a better idea to just
> reimplement it.

I've just tried the mergesort implementation as used in msysgit and
that performs faster for me.  It's simpler, and compatibly licensed.
It looks good.

Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-12-07 19:15 ` Mike Ralphson
@ 2007-12-08 11:05   ` Johannes Schindelin
  2007-12-08 23:04     ` Brian Downing
  0 siblings, 1 reply; 28+ messages in thread
From: Johannes Schindelin @ 2007-12-08 11:05 UTC (permalink / raw)
  To: Mike Ralphson
  Cc: Steffen Prohaska, Junio C Hamano, Nicolas Pitre, Linus Torvalds,
      Jakub Narebski, Git Mailing List

Hi,

On Fri, 7 Dec 2007, Mike Ralphson wrote:

> On Dec 7, 2007 6:37 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > On Fri, 7 Dec 2007, Mike Ralphson wrote:
> >
> > > On Dec 7, 2007 1:49 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > > On Fri, 7 Dec 2007, Mike Ralphson wrote:
> > > >
> > > > > I benchmarked 3 alternative qsorts, qsortG [2] was the fastest
> > > > > on my system but has funky licensing, the NetBSD qsort was
> > > > > middle-range and the glibc one the slowest of the three (but
> > > > > that could be due to it being tuned for a "Sun 4/260"). All of
> > > > > them show over 100x speed improvements on a git-status of my
> > > > > main repo (104s -> ~0.7s)
> >
> > Okay, sorry, I did not bother reading further when I read "You may use
> > it in anything you like;".
> >
> > But if the author did not respond, it might be a better idea to just
> > reimplement it.
>
> I've just tried the mergesort implementation as used in msysgit and that
> performs faster for me. It's simpler, and compatibly licensed. It looks
> good.

Now I'm confused.  You said you tested qsortG, NetBSD qsort and glibc,
with glibc performing the slowest.  Now, 4msysgit's implementation is
based on glibc (Thanks Brian!), so I wonder if you could redo the
performance tests and say if qsortG still is substantially faster than
4msysgit's qsort?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-12-08 11:05 ` Johannes Schindelin
@ 2007-12-08 23:04   ` Brian Downing
  0 siblings, 0 replies; 28+ messages in thread
From: Brian Downing @ 2007-12-08 23:04 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Mike Ralphson, Steffen Prohaska, Junio C Hamano, Nicolas Pitre,
      Linus Torvalds, Jakub Narebski, Git Mailing List

On Sat, Dec 08, 2007 at 11:05:35AM +0000, Johannes Schindelin wrote:
> On Fri, 7 Dec 2007, Mike Ralphson wrote:
> > I've just tried the mergesort implementation as used in msysgit and that
> > performs faster for me. It's simpler, and compatibly licensed. It looks
> > good.
>
> Now I'm confused. You said you tested qsortG, NetBSD qsort and glibc,
> with glibc performing the slowest. Now, 4msysgit's implementation is
> based on glibc (Thanks Brian!), so I wonder if you could redo the
> performance tests and say if qsortG still is substantially faster than
> 4msysgit's qsort?

This is just me guessing, but when he said:

> I benchmarked 3 alternative qsorts, qsortG [2] was the fastest on my
> system but has funky licensing, the NetBSD qsort was middle-range and
> the glibc one the slowest of the three (but that could be due to it
> being tuned for a "Sun 4/260"). All of them show over 100x speed
> improvements on a git-status of my main repo (104s -> ~0.7s)

It's possible he tried glibc's actual quicksort implementation, rather
than their "qsort."  Their qsort basically has the following behavior:

    if size < 1024
        mergesort with temporary array on stack
    if allocating size bytes would likely cause swapping
        quicksort in place
    else
        mergesort with temporary array in heap

I removed the "quicksort in place" possibility, as it would have added
another sort algorithm and I had no way to easily determine whether
"allocating size bytes would likely cause swapping."

-bcd

^ permalink raw reply	[flat|nested] 28+ messages in thread
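[Editorial note: the mergesort path Brian describes (merge into a temporary array, copy back) has roughly the shape below. This is a generic sketch over ints, not the glibc or 4msysgit code, whose real versions handle arbitrary element sizes through a comparison callback:]

```c
#include <assert.h>
#include <string.h>

/* Top-down mergesort of a[lo..hi) using one scratch buffer of the same
 * size.  The worst case stays O(n log n), which is the whole point of
 * preferring it over a quicksort with pathological inputs. */
static void msort(int *a, int *tmp, int lo, int hi)
{
    if (hi - lo < 2)
        return;
    int mid = lo + (hi - lo) / 2;
    msort(a, tmp, lo, mid);
    msort(a, tmp, mid, hi);

    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)   /* take the left run on ties: stable */
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}
```

The stack-vs-heap distinction in Brian's pseudocode only changes where `tmp` lives; the merge itself is the same either way.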
* Re: Some git performance measurements..
  2007-11-30 2:21 ` Linus Torvalds
  2007-11-30 2:39   ` Jakub Narebski
  2007-11-30 2:40   ` Nicolas Pitre
@ 2007-11-30 2:54   ` Linus Torvalds
  2007-12-05 1:04     ` Federico Mena Quintero
  2 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2007-11-30 2:54 UTC (permalink / raw)
  To: Jakub Narebski, Nicolas Pitre; +Cc: Git Mailing List

On Thu, 29 Nov 2007, Linus Torvalds wrote:
>
> Something like the appended (untested!

Ok, tested now.  It does seem to work.

The page fault trace for the pack-file shows that we now get basically
perfect IO patterns for my "git checkout" testcase, and while I'm not
sure that's necessarily a test-case that really deserves this kind of
attention, it's certainly potentially interesting.

To check the performance impact of this, though, you'd need to pack the
same repository two different ways - with this kind of sorting change
and without - and then test different cold-cache timings for things
like "git blame" etc that might care.  The timing of the commands
itself could be done with either a pre-change or post-change version of
git, it's only the resulting order in the pack-file that matters.

My very unscientific tests say that "git read-tree" is sped up by the
change (from 5.2s to 3.3s, so it's quite noticeable), but "git blame"
slows down (from 8.7s to 12.9s, so that's quite noticeable too).  But
as Jakub pointed out, the cold-cache numbers do fluctuate a lot, and
while they were reasonably stable over runs, the "git blame" numbers in
particular probably depend a fair amount on whether the file is
commonly changed or not.

Anybody interested in trying to do something more scientific?

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-30 2:54 ` Linus Torvalds
@ 2007-12-05 1:04   ` Federico Mena Quintero
  0 siblings, 0 replies; 28+ messages in thread
From: Federico Mena Quintero @ 2007-12-05 1:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, Nicolas Pitre, Git Mailing List

On Thu, 2007-11-29 at 18:54 -0800, Linus Torvalds wrote:

> Jakub pointed out, the cold-cache numbers do fluctuate a lot, and while

You may want to try iogrind:

	http://live.gnome.org/iogrind

It's a valgrind skin to record I/O operations (including "implicit"
ones like touching mmap()ed pages), plus a graphical tool to visualize
the logs, similar to kcachegrind.

  Federico

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-29 3:14 ` Linus Torvalds
  2007-11-29 3:59   ` Nicolas Pitre
@ 2007-12-01 11:36   ` Joachim B Haga
  2007-12-01 17:19     ` Linus Torvalds
  1 sibling, 1 reply; 28+ messages in thread
From: Joachim B Haga @ 2007-12-01 11:36 UTC (permalink / raw)
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> The pack-files (both index and data) are accessed somewhat randomly, but
> there is still enough locality that doing read-ahead and clustering really
> does help.

They are dense enough that slurping them in whole is 20% faster, at
least here.  And much less noisy!  These are both cache-cold tests.

$ time git read-tree -m -u HEAD HEAD

real	0m9.255s
user	0m0.832s
sys	0m0.196s

$ time (cat .git/objects/pack/* .git/index >/dev/null; git read-tree -m -u HEAD HEAD)

real	0m7.141s
user	0m0.936s
sys	0m1.912s

Now, I don't know how useful this is since git doesn't know if the data
are cached.  Is it perhaps possible to give a hint to the readahead
logic that it should try to read as far as possible?

-j.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-12-01 11:36 ` Joachim B Haga
@ 2007-12-01 17:19   ` Linus Torvalds
  0 siblings, 0 replies; 28+ messages in thread
From: Linus Torvalds @ 2007-12-01 17:19 UTC (permalink / raw)
  To: Joachim B Haga; +Cc: git

On Sat, 1 Dec 2007, Joachim B Haga wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> > The pack-files (both index and data) are accessed somewhat randomly, but
> > there is still enough locality that doing read-ahead and clustering really
> > does help.
>
> They are dense enough that slurping them in whole is 20% faster, at
> least here. And much less noisy! These are both cache-cold tests.

With BK, I used to have a "readahead" script to do something close to
this.

The problem with that approach is that it works wonderfully well for
people who (a) have tons of memory and (b) really only care about the
source tree and almost nothing else, but it doesn't work that well at
all for others.

So yes, for me, forcing a page-in of all the data is actually worth it.
I commonly do something like

	git grep quieuiueriueirue &

on my main machine when I reboot it for testing - just to bring the
working tree into cache, so that subsequent "git diff" and "git grep"
operations will be faster.

> $ time git read-tree -m -u HEAD HEAD
>
> real	0m9.255s
> user	0m0.832s
> sys	0m0.196s
>
> $ time (cat .git/objects/pack/* .git/index >/dev/null; git read-tree -m -u HEAD HEAD)
>
> real	0m7.141s
> user	0m0.936s
> sys	0m1.912s
>
> Now, I don't know how useful this is since git doesn't know if the
> data are cached. Is it perhaps possible to give a hint to the
> readahead logic that it should try to read as far as possible?

You have a much faster disk drive than I do on that slow laptop that I
wanted to optimize for.  I get

	[torvalds@hp linux]$ time git read-tree -m -u HEAD HEAD

	real	0m12.849s
	user	0m0.232s
	sys	0m0.124s

for the cold-cache case, but then for populating the whole thing:

	time cat .git/objects/pack/* .git/index >/dev/null

	real	0m31.350s
	user	0m0.040s
	sys	0m0.468s

whoops.  Can you say "pitiful"?

(In contrast, my desktop does the same in seven seconds - laptop disks
really are *much* slower than a reasonable desktop one).

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Some git performance measurements..
  2007-11-29 2:49 Some git performance measurements Linus Torvalds
  2007-11-29 3:14 ` Linus Torvalds
@ 2007-11-29 5:17 ` Junio C Hamano
  2007-11-29 10:17   ` [PATCH] per-directory-exclude: lazily read .gitignore files Junio C Hamano
  1 sibling, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2007-11-29 5:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Less than a hundredth of a second may not sound much, but when we have
> 1700+ directories in the kernel trees, doing that for each possible
> .gitignore file is really really expensive!

The only thing that wants to use excluded() in the unpack_trees()
codepath is the code to allow overwriting an existing, untracked file
in verify_absent().  When we find an untracked file at the same path
as we are just trying to check out a file, we allow overwriting it
only if that file is "ignored".

But the way the unpack_trees_rec() and the gitignore handling in dir.c
are structured currently means we do push/pop exclude-per-directory
stack as we enter and leave a new subdirectory.  We do not do this
lazily on demand.

The newer gitattributes subsystem maintains a similar per-directory
data structure but this is purely done on-demand; until somebody asks
"what are the attrs for this path", we do not read .gitattributes
file.  We should be able to restructure exclude-per-directory code in
a similar way.

^ permalink raw reply	[flat|nested] 28+ messages in thread
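[Editorial note: the reuse Junio describes (discard cached per-directory data the path is outside of, keep what covers the common leading directories) reduces to a prefix computation over a stack of directory names. A toy illustration with hypothetical names, far simpler than the real dir.c bookkeeping:]

```c
#include <assert.h>
#include <string.h>

/* 'stack' holds the directories whose .gitignore data is cached from
 * the previous query, outermost first (e.g. "", "arch/", "arch/x86/").
 * Return how many entries can be reused for a path inside 'newdir':
 * the leading run whose names are prefixes of it.  Deeper entries
 * would be popped, and only the levels between the common part and
 * 'newdir' read from disk. */
static int reusable_levels(const char *stack[], int depth,
                           const char *newdir)
{
    int keep = 0;
    while (keep < depth &&
           !strncmp(stack[keep], newdir, strlen(stack[keep])))
        keep++;
    return keep;
}
```

With the locality the thread mentions (consecutive queries usually come from the same or a nearby directory), most queries reuse the whole stack and open no .gitignore file at all.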
* [PATCH] per-directory-exclude: lazily read .gitignore files
  2007-11-29 5:17 ` Junio C Hamano
@ 2007-11-29 10:17   ` Junio C Hamano
  0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2007-11-29 10:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Operations that walk directories or trees, which potentially need to
consult the .gitignore files, used to always try to open the
.gitignore file every time they entered a new directory, even when
they ended up not needing to call excluded() function to see if a path
in the directory is ignored.

This changes the directory walking API to remove the need to call
these two functions.  Instead, the directory walk data structure
caches the data used by excluded() function the last time, and lazily
reuses it as much as possible.  Among the data the last check used,
the ones from deeper directories that the path we are checking is
outside are discarded, data from the common leading directories are
reused, and then the directories between the common directory and the
directory the path being checked is in are checked for .gitignore
file.  This is very similar to the way gitattributes are handled.

This API change also fixes "ls-files -c -i", which called excluded()
without setting up the gitignore data via the old push/pop functions.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> Less than a hundredth of a second may not sound much, but when we have
>> 1700+ directories in the kernel trees, doing that for each possible
>> .gitignore file is really really expensive!
>
> The only thing that wants to use excluded() in the unpack_trees()
> codepath is the code to allow overwriting an existing, untracked file in
> verify_absent().  When we find an untracked file at the same path as we
> are just trying to check out a file, we allow overwriting it only if
> that file is "ignored".
> ...
> The newer gitattributes subsystem maintains a similar per-directory data
> structure but this is purely done on-demand; until somebody asks "what
> are the attrs for this path", we do not read .gitattributes file.  We
> should be able to restructure exclude-per-directory code in a similar
> way.

I haven't seriously benched this, other than the same read-tree test
you mentioned.  With the good locality the directory walking (both by
read_directory() aka ls-files and unpack_trees() aka read-tree) tends
to have, checks for paths from the same directory come next to each
other, and that seems to help.

 dir.c          |  103 +++++++++++++++++++++++++++++---------------
 dir.h          |   32 ++++++++++-------
 unpack-trees.c |    6 ---
 3 files changed, 72 insertions(+), 69 deletions(-)

diff --git a/dir.c b/dir.c
index 7c97483..d448902 100644
--- a/dir.c
+++ b/dir.c
@@ -151,6 +151,7 @@ void add_exclude(const char *string, const char *base,
 
 static int add_excludes_from_file_1(const char *fname,
 				    const char *base, int baselen,
+				    char **buf_p,
 				    struct exclude_list *which)
 {
 	struct stat st;
@@ -171,6 +172,8 @@ static int add_excludes_from_file_1(const char *fname,
 		goto err;
 	close(fd);
 
+	if (buf_p)
+		*buf_p = buf;
 	buf[size++] = '\n';
 	entry = buf;
 	for (i = 0; i < size; i++) {
@@ -192,31 +195,63 @@ static int add_excludes_from_file_1(const char *fname,
 
 void add_excludes_from_file(struct dir_struct *dir, const char *fname)
 {
-	if (add_excludes_from_file_1(fname, "", 0,
+	if (add_excludes_from_file_1(fname, "", 0, NULL,
 				     &dir->exclude_list[EXC_FILE]) < 0)
 		die("cannot use %s as an exclude file", fname);
 }
 
-int push_exclude_per_directory(struct dir_struct *dir, const char *base, int baselen)
+static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 {
-	char exclude_file[PATH_MAX];
-	struct exclude_list *el = &dir->exclude_list[EXC_DIRS];
-	int current_nr = el->nr;
-
-	if (dir->exclude_per_dir) {
-		memcpy(exclude_file, base, baselen);
-		strcpy(exclude_file + baselen, dir->exclude_per_dir);
-		add_excludes_from_file_1(exclude_file, base, baselen, el);
+	struct exclude_list *el;
+	struct exclude_stack *stk = NULL;
+	int current;
+
+	if ((!dir->exclude_per_dir) ||
+	    (baselen + strlen(dir->exclude_per_dir) >= PATH_MAX))
+		return; /* too long a path -- ignore */
+
+	/* Pop the ones that are not the prefix of the path being checked. */
+	el = &dir->exclude_list[EXC_DIRS];
+	while ((stk = dir->exclude_stack) != NULL) {
+		if (stk->baselen <= baselen &&
+		    !strncmp(dir->basebuf, base, stk->baselen))
+			break;
+		dir->exclude_stack = stk->prev;
+		while (stk->exclude_ix < el->nr)
+			free(el->excludes[--el->nr]);
+		free(stk->filebuf);
+		free(stk);
 	}
-	return current_nr;
-}
 
-void pop_exclude_per_directory(struct dir_struct *dir, int stk)
-{
-	struct exclude_list *el = &dir->exclude_list[EXC_DIRS];
+	/* Read from the parent directories and push them down. */
+	current = stk ? stk->baselen : -1;
+	while (current < baselen) {
+		struct exclude_stack *stk = xcalloc(1, sizeof(*stk));
+		const char *cp;
 
-	while (stk < el->nr)
-		free(el->excludes[--el->nr]);
+		if (current < 0) {
+			cp = base;
+			current = 0;
+		}
+		else {
+			cp = strchr(base + current + 1, '/');
+			if (!cp)
+				die("oops in prep_exclude");
+			cp++;
+		}
+		stk->prev = dir->exclude_stack;
+		stk->baselen = cp - base;
+		stk->exclude_ix = el->nr;
+		memcpy(dir->basebuf + current, base + current,
+		       stk->baselen - current);
+		strcpy(dir->basebuf + stk->baselen, dir->exclude_per_dir);
+		add_excludes_from_file_1(dir->basebuf,
+					 dir->basebuf, stk->baselen,
+					 &stk->filebuf, el);
+		dir->exclude_stack = stk;
+		current = stk->baselen;
+	}
+	dir->basebuf[baselen] = '\0';
 }
 
 /* Scan the list and let the last match determines the fate.
@@ -283,6 +318,7 @@ int excluded(struct dir_struct *dir, const char *pathname)
 	const char *basename = strrchr(pathname, '/');
 	basename = (basename) ? basename+1 : pathname;
 
+	prep_exclude(dir, pathname, basename-pathname);
 	for (st = EXC_CMDL; st <= EXC_FILE; st++) {
 		switch (excluded_1(pathname, pathlen, basename,
 				   &dir->exclude_list[st])) {
 		case 0:
@@ -500,13 +536,10 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 	int contents = 0;
 
 	if (fdir) {
-		int exclude_stk;
 		struct dirent *de;
 		char fullname[PATH_MAX + 1];
 		memcpy(fullname, base, baselen);
 
-		exclude_stk = push_exclude_per_directory(dir, base, baselen);
-
 		while ((de = readdir(fdir)) != NULL) {
 			int len, dtype;
 			int exclude;
@@ -580,8 +613,6 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 		}
 exit_early:
 		closedir(fdir);
-
-		pop_exclude_per_directory(dir, exclude_stk);
 	}
 
 	return contents;
@@ -650,37 +681,9 @@ static void free_simplify(struct path_simplify *simplify)
 int read_directory(struct dir_struct *dir, const char *path, const char *base, int baselen, const char **pathspec)
 {
 	struct path_simplify *simplify = create_simplify(pathspec);
-	char *pp = NULL;
-
-	/*
-	 * Make sure to do the per-directory exclude for all the
-	 * directories leading up to our base.
-	 */
-	if (baselen) {
-		if (dir->exclude_per_dir) {
-			char *p;
-			pp = xmalloc(baselen+1);
-			memcpy(pp, base, baselen+1);
-			p = pp;
-			while (1) {
-				char save = *p;
-				*p = 0;
-				push_exclude_per_directory(dir, pp, p-pp);
-				*p++ = save;
-				if (!save)
-					break;
-				p = strchr(p, '/');
-				if (p)
-					p++;
-				else
-					p = pp + baselen;
-			}
-		}
-	}
 
 	read_directory_recursive(dir, path, base, baselen, 0, simplify);
 	free_simplify(simplify);
-	free(pp);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
 	qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name);
 	return dir->nr;
diff --git a/dir.h b/dir.h
index 82009dc..d8814dc 100644
--- a/dir.h
+++ b/dir.h
@@ -1,17 +1,6 @@
 #ifndef DIR_H
 #define DIR_H
 
-/*
- * We maintain three exclude pattern lists:
- * EXC_CMDL lists patterns explicitly given on the command line.
- * EXC_DIRS lists patterns obtained from per-directory ignore files.
- * EXC_FILE lists patterns from fallback ignore files.
- */
-#define EXC_CMDL 0
-#define EXC_DIRS 1
-#define EXC_FILE 2
-
-
 struct dir_entry {
 	unsigned int len;
 	char name[FLEX_ARRAY]; /* more */
@@ -34,6 +23,13 @@ struct exclude_list {
 	} **excludes;
 };
 
+struct exclude_stack {
+	struct exclude_stack *prev;
+	char *filebuf;
+	int baselen;
+	int exclude_ix;
+};
+
 struct dir_struct {
 	int nr, alloc;
 	int ignored_nr, ignored_alloc;
@@ -48,6 +44,18 @@ struct dir_struct {
 	/* Exclude info */
 	const char *exclude_per_dir;
 	struct exclude_list exclude_list[3];
+	/*
+	 * We maintain three exclude pattern lists:
+	 * EXC_CMDL lists patterns explicitly given on the command line.
+	 * EXC_DIRS lists patterns obtained from per-directory ignore files.
+	 * EXC_FILE lists patterns from fallback ignore files.
+	 */
+#define EXC_CMDL 0
+#define EXC_DIRS 1
+#define EXC_FILE 2
+
+	struct exclude_stack *exclude_stack;
+	char basebuf[PATH_MAX];
 };
 
 extern int common_prefix(const char **pathspec);
@@ -58,8 +66,6 @@ extern int common_prefix(const char **pathspec);
 extern int match_pathspec(const char **pathspec, const char *name, int namelen, int prefix, char *seen);
 extern int read_directory(struct dir_struct *, const char *path, const char *base, int baselen, const char **pathspec);
 
-extern int push_exclude_per_directory(struct dir_struct *, const char *, int);
-extern void pop_exclude_per_directory(struct dir_struct *, int);
 extern int excluded(struct dir_struct *, const char *);
 extern void add_excludes_from_file(struct dir_struct *, const char *fname);
 
diff --git a/unpack-trees.c b/unpack-trees.c
index aea16ad..e9eb795 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -71,12 +71,8 @@ static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 	int remove;
 	int baselen = strlen(base);
 	int src_size = len + 1;
-	int i_stk = i_stk;
 	int retval = 0;
 
-	if (o->dir)
-		i_stk = push_exclude_per_directory(o->dir, base, strlen(base));
-
 	do {
 		int i;
 		const char *first;
@@ -255,8 +251,6 @@ static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 	}
 
 leave_directory:
-	if (o->dir)
-		pop_exclude_per_directory(o->dir, i_stk);
 	return retval;
 }
-- 
1.5.3.6.2064.g2e22f

^ permalink raw reply related	[flat|nested] 28+ messages in thread
end of thread, other threads:[~2007-12-08 23:04 UTC | newest]

Thread overview: 28+ messages
2007-11-29 2:49 Some git performance measurements Linus Torvalds
2007-11-29 3:14 ` Linus Torvalds
2007-11-29 3:59 ` Nicolas Pitre
2007-11-29 4:32 ` Linus Torvalds
2007-11-29 17:25 ` Nicolas Pitre
2007-11-29 17:48 ` Linus Torvalds
2007-11-29 18:52 ` Nicolas Pitre
2007-11-30 5:00 ` Junio C Hamano
2007-11-30 6:03 ` Linus Torvalds
2007-11-30 0:54 ` Jakub Narebski
2007-11-30 2:21 ` Linus Torvalds
2007-11-30 2:39 ` Jakub Narebski
2007-11-30 2:40 ` Nicolas Pitre
2007-11-30 6:11 ` Steffen Prohaska
2007-12-07 13:35 ` Mike Ralphson
2007-12-07 13:49 ` Johannes Schindelin
2007-12-07 16:07 ` Linus Torvalds
2007-12-07 16:09 ` Mike Ralphson
2007-12-07 18:37 ` Johannes Schindelin
2007-12-07 19:15 ` Mike Ralphson
2007-12-08 11:05 ` Johannes Schindelin
2007-12-08 23:04 ` Brian Downing
2007-11-30 2:54 ` Linus Torvalds
2007-12-05 1:04 ` Federico Mena Quintero
2007-12-01 11:36 ` Joachim B Haga
2007-12-01 17:19 ` Linus Torvalds
2007-11-29 5:17 ` Junio C Hamano
2007-11-29 10:17 ` [PATCH] per-directory-exclude: lazily read .gitignore files Junio C Hamano