* [PATCH v1] WIP unpack-trees: avoid duplicate ODB lookups during checkout
From: git @ 2017-04-06 20:37 UTC
To: git; +Cc: gitster, peff, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>

Avoid duplicate ODB lookups for trees during traverse_trees_recursive().

I'm marking this WIP so we can talk about the TODO in the commit message.

Jeff Hostetler (1):
  unpack-trees: avoid duplicate ODB lookups during checkout

 tree-walk.c    |  8 ++++++++
 tree-walk.h    |  1 +
 unpack-trees.c | 13 +++++++++----
 3 files changed, 18 insertions(+), 4 deletions(-)

--
2.9.3
* [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
From: git @ 2017-04-06 20:37 UTC
To: git; +Cc: gitster, peff, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>

Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.

In operations such as read-tree, checkout, and merge when
the differences between the commits are relatively small,
there will likely be many directories that have the same
SHA-1. In these cases we can avoid hitting the ODB multiple
times for the same SHA-1.

TODO This change is a first attempt to test that by comparing
TODO the hashes of name[i] and name[i-1] and simply copying
TODO the tree-descriptor data. I was thinking of the n=2
TODO case here. We may want to extend this to the n=3 case.

================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.

(avg) before: 22.699
(avg) after:  21.955
================

================
Using the p0004-read-tree test (posted earlier this week)
with 1M files on Linux:

before:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)      11.99(8.12+3.32)
0004.6: switch commit aliases (1003037)   11.95(8.20+3.14)

after:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)      11.17(7.84+2.76)
0004.6: switch commit aliases (1003037)   11.13(7.82+2.72)
================

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 tree-walk.c    |  8 ++++++++
 tree-walk.h    |  1 +
 unpack-trees.c | 13 +++++++++----
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index ff77605..3b82f0e 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -92,6 +92,14 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
 	return buf;
 }

+void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src)
+{
+	void *buf = xmalloc(src->size);
+	memcpy(buf, src->buffer, src->size);
+	init_tree_desc(dest, buf, src->size);
+	return buf;
+}
+
 static void entry_clear(struct name_entry *a)
 {
 	memset(a, 0, sizeof(*a));
diff --git a/tree-walk.h b/tree-walk.h
index 68bb78b..ca4032b 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -43,6 +43,7 @@ int tree_entry(struct tree_desc *, struct name_entry *);
 int tree_entry_gently(struct tree_desc *, struct name_entry *);

 void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
+void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src);

 struct traverse_info;
 typedef int (*traverse_callback_t)(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *);
diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..50aacad 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -554,10 +554,15 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	newinfo.df_conflicts |= df_conflicts;

 	for (i = 0; i < n; i++, dirmask >>= 1) {
-		const unsigned char *sha1 = NULL;
-		if (dirmask & 1)
-			sha1 = names[i].oid->hash;
-		buf[i] = fill_tree_descriptor(t+i, sha1);
+		if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid &&
+		    !hashcmp(names[i].oid->hash, names[i-1].oid->hash)) {
+			buf[i] = copy_tree_descriptor(&t[i], &t[i-1]);
+		} else {
+			const unsigned char *sha1 = NULL;
+			if (dirmask & 1)
+				sha1 = names[i].oid->hash;
+			buf[i] = fill_tree_descriptor(t+i, sha1);
+		}
 	}

 	bottom = switch_cache_bottom(&newinfo);
--
2.9.3
* Re: [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
From: Stefan Beller @ 2017-04-06 22:48 UTC
To: Jeff Hostetler
Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, Jeff Hostetler

On Thu, Apr 6, 2017 at 1:37 PM, <git@jeffhostetler.com> wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach traverse_trees_recursive() to not do redundant ODB
> lookups when both directories refer to the same OID.

And the reason for this is that omitting the second lookup saves time,
i.e. a lookup in the ODB of a sufficiently large repo is slow.

My kneejerk line of thinking:
* Yes, it sounds good to reduce the number of ODB accesses.
* But if we consider ODB lookups to be slow and we perform a structured
  access, how about a cache in front of the ODB?
* We already have that! (sort of...) 9a414486d9 (lookup_object:
  prioritize recently found objects, 2013-05-01)
* Instead of improving the caching, maybe change the size of the
  problem: we could keep the objects of different types in different
  hash tables.

object.c has its own hash table, I presume for historical and
performance reasons; this would be split up into multiple hash tables.

Additionally to "object *lookup_object(*sha1)", we'd have a function

    "object *lookup_object(*sha1, enum object_type hint)"

which looks into the correct hash table.

If you were to call just lookup_object with no hint, then you'd look
into all the different tables (I guess there is a preferable order in
which to look; I haven't thought about that).

> In operations such as read-tree, checkout, and merge when
> the differences between the commits are relatively small,
> there will likely be many directories that have the same
> SHA-1. In these cases we can avoid hitting the ODB multiple
> times for the same SHA-1.

This would partially explain why there was such a good performance
boost in the referenced commit above, as we implicitly look up the
same object multiple times.

Peff is really into getting this part faster, c.f.
https://public-inbox.org/git/20160914235547.h3n2otje2hec6u7k@sigill.intra.peff.net/

> TODO This change is a first attempt to test that by comparing
> TODO the hashes of name[i] and name[i-1] and simply copying
> TODO the tree-descriptor data. I was thinking of the n=2
> TODO case here. We may want to extend this to the n=3 case.
>
> ================
> On the Windows repo (500K trees, 3.1M files, 450MB index),
> this reduced the overall time by 0.75 seconds when cycling
> between 2 commits with a single file difference.
>
> (avg) before: 22.699
> (avg) after:  21.955
> ================

So it shaves off 4% of the time needed. It doesn't sound like a
breakthrough, but I guess these small patches add up. :)

> 	for (i = 0; i < n; i++, dirmask >>= 1) {
> -		const unsigned char *sha1 = NULL;
> -		if (dirmask & 1)
> -			sha1 = names[i].oid->hash;
> -		buf[i] = fill_tree_descriptor(t+i, sha1);
> +		if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid &&
> +		    !hashcmp(names[i].oid->hash, names[i-1].oid->hash)) {

Why do we need to check for dirmask & 1 here? This ought to be covered
by the hashcmp already, IIUC. So maybe we can pull the

	if (dirmask & 1)
		sha1 = names[i].oid->hash;

out of the else when dropping that dirmask check?

Thanks,
Stefan
* Re: [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
From: Jeff King @ 2017-04-07 5:19 UTC
To: Stefan Beller
Cc: Jeff Hostetler, git@vger.kernel.org, Junio C Hamano, Jeff Hostetler

On Thu, Apr 06, 2017 at 03:48:07PM -0700, Stefan Beller wrote:

> On Thu, Apr 6, 2017 at 1:37 PM, <git@jeffhostetler.com> wrote:
> > From: Jeff Hostetler <jeffhost@microsoft.com>
> >
> > Teach traverse_trees_recursive() to not do redundant ODB
> > lookups when both directories refer to the same OID.
>
> And the reason for this is that omitting the second lookup
> saves time, i.e. a lookup in the ODB of a sufficiently large
> repo is slow.
>
> My kneejerk line of thinking:
> * yes, it sounds good to reduce the number of ODB accesses.
> * But if we consider ODB lookups to be slow and we perform
>   a structured access, how about a cache in front of the ODB?
> * We already have that! (sort of..) 9a414486d9 (lookup_object:
>   prioritize recently found objects, 2013-05-01)
> * Instead of improving the caching, maybe change the
>   size of the problem: We could keep the objects of different types
>   in different hash-tables.

I don't think this is using lookup_object() at all, though. It goes
straight to fill_tree_descriptor(), which will read the object contents
from disk. So our time is going to:

  1. Finding the object in the odb (ideally a binary search in a single
     pack index, but less optimal when there are many packs).

  2. Reconstructing the object. This means zlib-inflating from disk, but
     it may also mean delta reconstruction.

I think there _are_ some caches at play here, though, when you look up
the same tree back to back. The pack-mru means that we'll always look in
the correct pack first. And in theory the delta base cache means that
we'll already have the whole thing reconstructed in memory (though I
have often been confused by exactly when we put items into that cache,
so I might be wrong).

So in theory, this is not all that different than the "just allocate and
copy the bytes" optimization that's happening here (though I'm not
surprised that doing it at a higher level can produce some speedup).

I think the more interesting optimization is "just use the same buffer
without bothering to copy", which is hard for the low-level code to do
(since it doesn't know what lifetime the caller is expecting).

> object.c has its own hash table, I presume for historical and
> performance reasons, this would be split up to multiple hash
> tables.

So I don't think lookup_object() is really relevant here. But I'm also
not sure that multiple hash tables would really buy us much. In theory
hash tables are O(1), so multiple smaller tables doesn't help (and might
hurt, since now we have four O(1) lookups to do). Of course that's a
minor fiction because of collisions, but we keep the load factor on the
object.c table pretty low (and that's why things like quadratic probing
and cuckoo hashing never showed great improvement).

-Peff
* Re: [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
From: Jeff Hostetler @ 2017-04-07 13:51 UTC
To: Jeff King, Stefan Beller
Cc: git@vger.kernel.org, Junio C Hamano, Jeff Hostetler

On 4/7/2017 1:19 AM, Jeff King wrote:
> On Thu, Apr 06, 2017 at 03:48:07PM -0700, Stefan Beller wrote:
>
>> On Thu, Apr 6, 2017 at 1:37 PM, <git@jeffhostetler.com> wrote:
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> Teach traverse_trees_recursive() to not do redundant ODB
>>> lookups when both directories refer to the same OID.
>>
>> And the reason for this is that omitting the second lookup
>> saves time, i.e. a lookup in the ODB of a sufficiently large
>> repo is slow.
>>
>> My kneejerk line of thinking:
>> * yes, it sounds good to reduce the number of ODB accesses.
>> * But if we consider ODB lookups to be slow and we perform
>>   a structured access, how about a cache in front of the ODB?
>> * We already have that! (sort of..) 9a414486d9 (lookup_object:
>>   prioritize recently found objects, 2013-05-01)
>> * Instead of improving the caching, maybe change the
>>   size of the problem: We could keep the objects of different types
>>   in different hash-tables.
>
> I don't think this is using lookup_object() at all, though. It goes
> straight to fill_tree_descriptor(), which will read the object contents
> from disk. So our time is going to:
>
>   1. Finding the object in the odb (ideally a binary search in a single
>      pack index, but less optimal when there are many packs).
>
>   2. Reconstructing the object. This means zlib-inflating from disk, but
>      it may also mean delta reconstruction.
>
> I think there _are_ some caches at play here, though, when you look up
> the same tree back to back. The pack-mru means that we'll always look in
> the correct pack first. And in theory the delta base cache means that
> we'll already have the whole thing reconstructed in memory (though I
> have often been confused by exactly when we put items into that cache,
> so I might be wrong).

This change shaved 500K calls to fill_tree_descriptor() (on the Windows
source tree on a "checkout -b" to the same commit). I was surprised it
didn't give more of a speed up, so some of that caching may be in play
here, but it's hard to tell. Also, on my repo I have 100GB+ of
packfiles, so that may be messing with things a bit.

> So in theory, this is not all that different than the "just allocate and
> copy the bytes" optimization that's happening here (though I'm not
> surprised that doing it at a higher level can produce some speedup).
>
> I think the more interesting optimization is "just use the same buffer
> without bothering to copy", which is hard for the low-level code to do
> (since it doesn't know what lifetime the caller is expecting).

My first draft did just borrow the buffer right there in traverse_ and
it seemed to work just fine. I was being cautious and copied it properly
in the lower layer. Since the bufs are freed at the bottom it felt safe,
but I didn't want to overlook anything. I'll switch it back.

>> object.c has its own hash table, I presume for historical and
>> performance reasons, this would be split up to multiple hash
>> tables.
>
> So I don't think lookup_object() is really relevant here. But I'm also
> not sure that multiple hash tables would really buy us much. In theory
> hash tables are O(1), so multiple smaller tables doesn't help (and might
> hurt, since now we have four O(1) lookups to do). Of course that's a
> minor fiction because of collisions, but we keep the load factor on the
> object.c table pretty low (and that's why things like quadratic probing
> and cuckoo hashing never showed great improvement).

I do wonder now about the initial hash table size and any limits on it,
but that is a question for another day.

Thanks
Jeff
* Re: [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
From: Jeff Hostetler @ 2017-04-07 17:35 UTC
To: Stefan Beller
Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, Jeff Hostetler

On 4/6/2017 6:48 PM, Stefan Beller wrote:
> On Thu, Apr 6, 2017 at 1:37 PM, <git@jeffhostetler.com> wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach traverse_trees_recursive() to not do redundant ODB
>> lookups when both directories refer to the same OID.
>
> And the reason for this is that omitting the second lookup
> saves time, i.e. a lookup in the ODB of a sufficiently large
> repo is slow.
>
> My kneejerk line of thinking:
> * yes, it sounds good to reduce the number of ODB accesses.
> * But if we consider ODB lookups to be slow and we perform
>   a structured access, how about a cache in front of the ODB?
> * We already have that! (sort of..) 9a414486d9 (lookup_object:
>   prioritize recently found objects, 2013-05-01)
> * Instead of improving the caching, maybe change the
>   size of the problem: We could keep the objects of different types
>   in different hash-tables.
>
> object.c has its own hash table, I presume for historical and
> performance reasons, this would be split up to multiple hash
> tables.
>
> Additionally to "object *lookup_object(*sha1)", we'd have
> a function
>
> "object *lookup_object(*sha1, enum object_type hint)"
> which looks into the correct the hash table.
>
> If you were to call just lookup_object with no hint, then you'd
> look into all the different tables (I guess there is a preferrable
> order in which to look, haven't thought about that).
>
>> In operations such as read-tree, checkout, and merge when
>> the differences between the commits are relatively small,
>> there will likely be many directories that have the same
>> SHA-1. In these cases we can avoid hitting the ODB multiple
>> times for the same SHA-1.
>
> This would explain partially why there was such a good
> performance boost in the referenced commit above as we
> implicitly lookup the same object multiple times.
>
> Peff is really into getting this part faster, c.f.
> https://public-inbox.org/git/20160914235547.h3n2otje2hec6u7k@sigill.intra.peff.net/

That looks interesting, but I question the probabilities for my use
case here. When walking the trees and files in a single commit, I have
no expectation that I'll see the same tree OID twice, so the cache is
not really useful and may just add overhead. However, in a checkout or
merge there is a high expectation of visiting the same tree OID -- and
most of the time they are peers -- since commits tend to only change
isolated parts of the tree. (I'm not going to worry about the case
where someone moves an entire sub-tree to somewhere else in the tree
and violates my peer assertion.)

I did notice that we do 2 independent passes during checkout. First to
compare the old and new commits. Then to compare the new with the
worktree. So we touch each tree object 3 times. My patch helps the
first, but does nothing for the second. Hopefully the cache is helping
it (but I have not measured that).

>> TODO This change is a first attempt to test that by comparing
>> TODO the hashes of name[i] and name[i-1] and simply copying
>> TODO the tree-descriptor data. I was thinking of the n=2
>> TODO case here. We may want to extend this to the n=3 case.
>>
>> ================
>> On the Windows repo (500K trees, 3.1M files, 450MB index),
>> this reduced the overall time by 0.75 seconds when cycling
>> between 2 commits with a single file difference.
>>
>> (avg) before: 22.699
>> (avg) after:  21.955
>> ================
>
> So it shaves off 4% of the time needed. it doesn't sound like a
> break through, but I guess these small patches add up. :)

Agreed, but on the Windows source repo, it can take 30 seconds to do a
"checkout -b" (without changing the actual HEAD commit). That's just
for the housekeeping of ensuring you get a clean worktree. If I can
knock off 5% here with minimal impact and without changing any file
formats, I'll take it. And if I can just repeat that n times... :-)

>> 	for (i = 0; i < n; i++, dirmask >>= 1) {
>> -		const unsigned char *sha1 = NULL;
>> -		if (dirmask & 1)
>> -			sha1 = names[i].oid->hash;
>> -		buf[i] = fill_tree_descriptor(t+i, sha1);
>> +		if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid &&
>> +		    !hashcmp(names[i].oid->hash, names[i-1].oid->hash)) {
>
> Why do we need to check for dirmask & 1 here?
> This ought to be covered by the hashcmp already IIUC.
> So maybe we can pull out the
> 	if (dirmask & 1)
> 		sha1 = names[i].oid->hash;
> out of the else when dropping that dirmask check?

I was wondering about the test in the else clause. Just now I put a
quick assert in there and it went off, so I'm not going to change this.

Thanks
Jeff
* Re: [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
From: René Scharfe @ 2017-04-07 0:32 UTC
To: git, git; +Cc: gitster, peff, Jeff Hostetler

On 06.04.2017 at 22:37, git@jeffhostetler.com wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach traverse_trees_recursive() to not do redundant ODB
> lookups when both directories refer to the same OID.
>
> In operations such as read-tree, checkout, and merge when
> the differences between the commits are relatively small,
> there will likely be many directories that have the same
> SHA-1. In these cases we can avoid hitting the ODB multiple
> times for the same SHA-1.
>
> TODO This change is a first attempt to test that by comparing
> TODO the hashes of name[i] and name[i-1] and simply copying
> TODO the tree-descriptor data. I was thinking of the n=2
> TODO case here. We may want to extend this to the n=3 case.
>
> ================
> On the Windows repo (500K trees, 3.1M files, 450MB index),
> this reduced the overall time by 0.75 seconds when cycling
> between 2 commits with a single file difference.
>
> (avg) before: 22.699
> (avg) after:  21.955
> ================
>
> ================
> Using the p0004-read-tree test (posted earlier this week)
> with 1M files on Linux:
>
> before:
> $ ./p0004-read-tree.sh
> 0004.5: switch work1 work2 (1003037)      11.99(8.12+3.32)
> 0004.6: switch commit aliases (1003037)   11.95(8.20+3.14)
>
> after:
> $ ./p0004-read-tree.sh
> 0004.5: switch work1 work2 (1003037)      11.17(7.84+2.76)
> 0004.6: switch commit aliases (1003037)   11.13(7.82+2.72)
> ================
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  tree-walk.c    |  8 ++++++++
>  tree-walk.h    |  1 +
>  unpack-trees.c | 13 +++++++++----
>  3 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/tree-walk.c b/tree-walk.c
> index ff77605..3b82f0e 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -92,6 +92,14 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
>  	return buf;
>  }
>
> +void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src)
> +{
> +	void *buf = xmalloc(src->size);
> +	memcpy(buf, src->buffer, src->size);
> +	init_tree_desc(dest, buf, src->size);
> +	return buf;
> +}
> +
>  static void entry_clear(struct name_entry *a)
>  {
>  	memset(a, 0, sizeof(*a));
> diff --git a/tree-walk.h b/tree-walk.h
> index 68bb78b..ca4032b 100644
> --- a/tree-walk.h
> +++ b/tree-walk.h
> @@ -43,6 +43,7 @@ int tree_entry(struct tree_desc *, struct name_entry *);
>  int tree_entry_gently(struct tree_desc *, struct name_entry *);
>
>  void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
> +void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src);
>
>  struct traverse_info;
>  typedef int (*traverse_callback_t)(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *);
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3a8ee19..50aacad 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -554,10 +554,15 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>  	newinfo.df_conflicts |= df_conflicts;
>
>  	for (i = 0; i < n; i++, dirmask >>= 1) {
> -		const unsigned char *sha1 = NULL;
> -		if (dirmask & 1)
> -			sha1 = names[i].oid->hash;
> -		buf[i] = fill_tree_descriptor(t+i, sha1);
> +		if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid &&

Can .oid even be NULL? (I didn't check, but it's dereferenced in the
sha1 assignment below, so I guess the answer is no and these two checks
are not needed.)

> +		    !hashcmp(names[i].oid->hash, names[i-1].oid->hash)) {

Calling oidcmp would be shorter.

> +			buf[i] = copy_tree_descriptor(&t[i], &t[i-1]);

buf keeps track of the allocations that need to be freed at the end of
the function. I assume these buffers are read-only. Can you use an
alias here instead of a duplicate by calling init_tree_desc with the
predecessor's buffer and setting buf[i] to NULL? Or even just copying
t[i - 1] to t[i] with an assignment? That would be shorter and probably
also quicker.

> +		} else {
> +			const unsigned char *sha1 = NULL;
> +			if (dirmask & 1)
> +				sha1 = names[i].oid->hash;
> +			buf[i] = fill_tree_descriptor(t+i, sha1);
> +		}
>  	}
>
>  	bottom = switch_cache_bottom(&newinfo);
* Re: [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
From: Jeff Hostetler @ 2017-04-07 13:57 UTC
To: René Scharfe, git
Cc: gitster, peff, Jeff Hostetler

On 4/6/2017 8:32 PM, René Scharfe wrote:
> On 06.04.2017 at 22:37, git@jeffhostetler.com wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach traverse_trees_recursive() to not do redundant ODB
>> lookups when both directories refer to the same OID.
>>
>> In operations such as read-tree, checkout, and merge when
>> the differences between the commits are relatively small,
>> there will likely be many directories that have the same
>> SHA-1. In these cases we can avoid hitting the ODB multiple
>> times for the same SHA-1.
>>
>> TODO This change is a first attempt to test that by comparing
>> TODO the hashes of name[i] and name[i-1] and simply copying
>> TODO the tree-descriptor data. I was thinking of the n=2
>> TODO case here. We may want to extend this to the n=3 case.
>>
>> ================
>> On the Windows repo (500K trees, 3.1M files, 450MB index),
>> this reduced the overall time by 0.75 seconds when cycling
>> between 2 commits with a single file difference.
>>
>> (avg) before: 22.699
>> (avg) after:  21.955
>> ================
>>
>> ================
>> Using the p0004-read-tree test (posted earlier this week)
>> with 1M files on Linux:
>>
>> before:
>> $ ./p0004-read-tree.sh
>> 0004.5: switch work1 work2 (1003037)      11.99(8.12+3.32)
>> 0004.6: switch commit aliases (1003037)   11.95(8.20+3.14)
>>
>> after:
>> $ ./p0004-read-tree.sh
>> 0004.5: switch work1 work2 (1003037)      11.17(7.84+2.76)
>> 0004.6: switch commit aliases (1003037)   11.13(7.82+2.72)
>> ================
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> ---
>>  tree-walk.c    |  8 ++++++++
>>  tree-walk.h    |  1 +
>>  unpack-trees.c | 13 +++++++++----
>>  3 files changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/tree-walk.c b/tree-walk.c
>> index ff77605..3b82f0e 100644
>> --- a/tree-walk.c
>> +++ b/tree-walk.c
>> @@ -92,6 +92,14 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
>>  	return buf;
>>  }
>>
>> +void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src)
>> +{
>> +	void *buf = xmalloc(src->size);
>> +	memcpy(buf, src->buffer, src->size);
>> +	init_tree_desc(dest, buf, src->size);
>> +	return buf;
>> +}
>> +
>>  static void entry_clear(struct name_entry *a)
>>  {
>>  	memset(a, 0, sizeof(*a));
>> diff --git a/tree-walk.h b/tree-walk.h
>> index 68bb78b..ca4032b 100644
>> --- a/tree-walk.h
>> +++ b/tree-walk.h
>> @@ -43,6 +43,7 @@ int tree_entry(struct tree_desc *, struct name_entry *);
>>  int tree_entry_gently(struct tree_desc *, struct name_entry *);
>>
>>  void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
>> +void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src);
>>
>>  struct traverse_info;
>>  typedef int (*traverse_callback_t)(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *);
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 3a8ee19..50aacad 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -554,10 +554,15 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>>  	newinfo.df_conflicts |= df_conflicts;
>>
>>  	for (i = 0; i < n; i++, dirmask >>= 1) {
>> -		const unsigned char *sha1 = NULL;
>> -		if (dirmask & 1)
>> -			sha1 = names[i].oid->hash;
>> -		buf[i] = fill_tree_descriptor(t+i, sha1);
>> +		if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid &&
>
> Can .oid even be NULL? (I didn't check, but it's dereferenced in the
> sha1 assignment below, so I guess the answer is no and these two checks
> are not needed.)

Yes. I think the (dirmask & 1) is hiding it for name[i]. I put both in
the code above, but it also worked fine just testing both oid's (and
without dirmask).

>> +		    !hashcmp(names[i].oid->hash, names[i-1].oid->hash)) {
>
> Calling oidcmp would be shorter.

Right.

>> +			buf[i] = copy_tree_descriptor(&t[i], &t[i-1]);
>
> buf keeps track of the allocations that need to be freed at the end of
> the function. I assume these buffers are read-only. Can you use an
> alias here instead of a duplicate by calling init_tree_desc with the
> predecessor's buffer and setting buf[i] to NULL? Or even just copying
> t[i - 1] to t[i] with an assignment? That would be shorter and probably
> also quicker.

Yes, my first draft did that. Just being cautious, but I'll switch it
back since that seems to be the consensus.

>> +		} else {
>> +			const unsigned char *sha1 = NULL;
>> +			if (dirmask & 1)
>> +				sha1 = names[i].oid->hash;
>> +			buf[i] = fill_tree_descriptor(t+i, sha1);
>> +		}
>>  	}
>>
>>  	bottom = switch_cache_bottom(&newinfo);

Thanks
Jeff