* [PATCH] name-hash: don't add sparse directories in threaded lazy init @ 2025-05-21 11:40 Alex Mironov via GitGitGadget 2025-05-21 17:17 ` Derrick Stolee 2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget 0 siblings, 2 replies; 13+ messages in thread From: Alex Mironov via GitGitGadget @ 2025-05-21 11:40 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Junio C Hamano, Alex Mironov, Alex Mironov From: Alex Mironov <alexandrfox@gmail.com> Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a make sure to avoid placing sparse directories into the name_hash hashtable whenever multithreaded initialization is performed. Sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table as they must never be queried. Signed-off-by: Alex Mironov <alexandrfox@gmail.com> --- name-hash: don't add sparse directories in threaded lazy init Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1 Pull-Request: https://github.com/git/git/pull/1970 name-hash.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/name-hash.c b/name-hash.c index d66de1cdfd5..03123a8779a 100644 --- a/name-hash.c +++ b/name-hash.c @@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data) for (k = 0; k < d->istate->cache_nr; k++) { struct cache_entry *ce_k = d->istate->cache[k]; ce_k->ce_flags |= CE_HASHED; + if (S_ISSPARSEDIR(ce_k->ce_mode)) { + continue; + } hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); hashmap_add(&d->istate->name_hash, &ce_k->ent); } base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 -- gitgitgadget ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 11:40 [PATCH] name-hash: don't add sparse directories in threaded lazy init Alex Mironov via GitGitGadget @ 2025-05-21 17:17 ` Derrick Stolee 2025-05-21 18:32 ` Junio C Hamano 2025-05-21 20:07 ` Alex Mironov 2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget 1 sibling, 2 replies; 13+ messages in thread From: Derrick Stolee @ 2025-05-21 17:17 UTC (permalink / raw) To: Alex Mironov via GitGitGadget, git; +Cc: Junio C Hamano, Alex Mironov On 5/21/2025 7:40 AM, Alex Mironov via GitGitGadget wrote: > From: Alex Mironov <alexandrfox@gmail.com> > > Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a nit: we typically use the "reference" style to refer to other commits, use 'git log -1 --pretty=reference <oid>' to get output like this: 5f116695864 (name-hash: don't add directories to name_hash, 2021-04-12) > make sure to avoid placing sparse directories into the name_hash > hashtable whenever multithreaded initialization is performed. > > Sparse directory entries represent a directory that is outside the > sparse-checkout definition. These are not paths to blobs, so should not > be added to the name_hash table as they must never be queried. > > Signed-off-by: Alex Mironov <alexandrfox@gmail.com> > --- > name-hash: don't add sparse directories in threaded lazy init > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1 > Pull-Request: https://github.com/git/git/pull/1970 > > name-hash.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/name-hash.c b/name-hash.c > index d66de1cdfd5..03123a8779a 100644 > --- a/name-hash.c > +++ b/name-hash.c > @@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data) > for (k = 0; k < d->istate->cache_nr; k++) { > struct cache_entry *ce_k = d->istate->cache[k]; > ce_k->ce_flags |= CE_HASHED; > + if (S_ISSPARSEDIR(ce_k->ce_mode)) { > + continue; > + } nit: for one-line blocks, we usually skip the braces. But I think that it might be better to reverse the logic to get something like: if (!S_ISSPARSEDIR(ce_k->ce_mode) { hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); hashmap_add(&d->istate->name_hash, &ce_k->ent); } This seems to be a performance-only fix, and it might be interesting to see if there is any impact on p2000-sparse-operations.sh. Those tests don't focus on many sparse-directory entries, so that may not demonstrate any meaningful difference. Thanks, -Stolee ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 17:17 ` Derrick Stolee @ 2025-05-21 18:32 ` Junio C Hamano 2025-05-21 20:07 ` Alex Mironov 1 sibling, 0 replies; 13+ messages in thread From: Junio C Hamano @ 2025-05-21 18:32 UTC (permalink / raw) To: Derrick Stolee; +Cc: Alex Mironov via GitGitGadget, git, Alex Mironov Derrick Stolee <stolee@gmail.com> writes: > This seems to be a performance-only fix, ... Hmph, then the proposed log message is phrased in a slightly misleading way. I took that "should not" below ... > Sparse directory entries represent a directory that is outside the > sparse-checkout definition. These are not paths to blobs, so should not > be added to the name_hash table as they must never be queried. ... to be saying that we would find something we should not find when querying the name_hash table if we didn't omit them, and the change is done for correctness. Thanks for a quick review. Very much appreciated. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 17:17 ` Derrick Stolee 2025-05-21 18:32 ` Junio C Hamano @ 2025-05-21 20:07 ` Alex Mironov 1 sibling, 0 replies; 13+ messages in thread From: Alex Mironov @ 2025-05-21 20:07 UTC (permalink / raw) To: Derrick Stolee; +Cc: Alex Mironov via GitGitGadget, git, Junio C Hamano Hey Derrick, Appreciate your quick response! I addressed the nits, and will submit the new patch shortly. > This seems to be a performance-only fix Indeed, and it's more impactful for the specific setup we have inside our org. In short, we have a custom implementation of the sparse checkout logic that makes sure to leave files outside of sparse cones present on disk. Because of this, git with sparse index enabled frequently resorted to a full index expansion despite 'sparse.expectFilesOutsideOfPatterns=true' config set which, I assume, was only made working with the full index. This problem specifically was tricky to catch since multithreaded lazy load logic applies only after a certain limit of objects present in the index. I hope I will be able to send more patches soon to fix-up bits and pieces of the 'sparse.expectFilesOutsideOfPatterns' flag with respect to other expansions. I'm planning to guard those under the same configuration option, let me know if you think it'd be better to introduce a new option. On Wed, May 21, 2025 at 7:17 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 5/21/2025 7:40 AM, Alex Mironov via GitGitGadget wrote: > > From: Alex Mironov <alexandrfox@gmail.com> > > > > Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a > > nit: we typically use the "reference" style to refer to other > commits, use 'git log -1 --pretty=reference <oid>' to get output > like this: > > 5f116695864 (name-hash: don't add directories to name_hash, 2021-04-12) > > > make sure to avoid placing sparse directories into the name_hash > > hashtable whenever multithreaded initialization is performed. > > > > Sparse directory entries represent a directory that is outside the > > sparse-checkout definition. These are not paths to blobs, so should not > > be added to the name_hash table as they must never be queried. > > > > Signed-off-by: Alex Mironov <alexandrfox@gmail.com> > > --- > > name-hash: don't add sparse directories in threaded lazy init > > > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1 > > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1 > > Pull-Request: https://github.com/git/git/pull/1970 > > > > name-hash.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/name-hash.c b/name-hash.c > > index d66de1cdfd5..03123a8779a 100644 > > --- a/name-hash.c > > +++ b/name-hash.c > > @@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data) > > for (k = 0; k < d->istate->cache_nr; k++) { > > struct cache_entry *ce_k = d->istate->cache[k]; > > ce_k->ce_flags |= CE_HASHED; > > + if (S_ISSPARSEDIR(ce_k->ce_mode)) { > > + continue; > > + } > > nit: for one-line blocks, we usually skip the braces. But I think > that it might be better to reverse the logic to get something like: > > if (!S_ISSPARSEDIR(ce_k->ce_mode) { > hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); > hashmap_add(&d->istate->name_hash, &ce_k->ent); > } > > This seems to be a performance-only fix, and it might be interesting > to see if there is any impact on p2000-sparse-operations.sh. Those > tests don't focus on many sparse-directory entries, so that may not > demonstrate any meaningful difference. > > Thanks, > -Stolee > -- Best, Alex Mironov ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 11:40 [PATCH] name-hash: don't add sparse directories in threaded lazy init Alex Mironov via GitGitGadget 2025-05-21 17:17 ` Derrick Stolee @ 2025-05-21 20:16 ` Alex Mironov via GitGitGadget 2025-05-21 20:32 ` Junio C Hamano 2025-05-21 21:29 ` [PATCH v3] " Alex Mironov via GitGitGadget 1 sibling, 2 replies; 13+ messages in thread From: Alex Mironov via GitGitGadget @ 2025-05-21 20:16 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Junio C Hamano, Alex Mironov, Alex Mironov From: Alex Mironov <alexandrfox@gmail.com> Ensure that logic added in 5f11669586 (name-hash: don't add directories to name_hash, 2021-04-12) also applies in multithreaded hashtable init path. Sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table as they must never be queried. Signed-off-by: Alex Mironov <alexandrfox@gmail.com> --- name-hash: don't add sparse directories in threaded lazy init Changes since v1: * addressed feedback (code-style) Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v2 Pull-Request: https://github.com/git/git/pull/1970 Range-diff vs v1: 1: d12ebc612c2 ! 1: fb378147c73 name-hash: don't add sparse directories in threaded lazy init @@ Metadata ## Commit message ## name-hash: don't add sparse directories in threaded lazy init - Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a - make sure to avoid placing sparse directories into the name_hash - hashtable whenever multithreaded initialization is performed. + Ensure that logic added in 5f11669586 (name-hash: don't add directories + to name_hash, 2021-04-12) also applies in multithreaded hashtable init + path. Sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not @@ name-hash.c: static void *lazy_name_thread_proc(void *_data) for (k = 0; k < d->istate->cache_nr; k++) { struct cache_entry *ce_k = d->istate->cache[k]; ce_k->ce_flags |= CE_HASHED; -+ if (S_ISSPARSEDIR(ce_k->ce_mode)) { -+ continue; +- hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); +- hashmap_add(&d->istate->name_hash, &ce_k->ent); ++ if (!S_ISSPARSEDIR(ce_k->ce_mode)) { ++ hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); ++ hashmap_add(&d->istate->name_hash, &ce_k->ent); + } - hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); - hashmap_add(&d->istate->name_hash, &ce_k->ent); } + + return NULL; name-hash.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/name-hash.c b/name-hash.c index d66de1cdfd5..b91e2762678 100644 --- a/name-hash.c +++ b/name-hash.c @@ -492,8 +492,10 @@ static void *lazy_name_thread_proc(void *_data) for (k = 0; k < d->istate->cache_nr; k++) { struct cache_entry *ce_k = d->istate->cache[k]; ce_k->ce_flags |= CE_HASHED; - hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); - hashmap_add(&d->istate->name_hash, &ce_k->ent); + if (!S_ISSPARSEDIR(ce_k->ce_mode)) { + hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); + hashmap_add(&d->istate->name_hash, &ce_k->ent); + } } return NULL; base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 -- gitgitgadget ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget @ 2025-05-21 20:32 ` Junio C Hamano 2025-05-21 20:37 ` Alex Mironov 2025-05-21 21:29 ` [PATCH v3] " Alex Mironov via GitGitGadget 1 sibling, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2025-05-21 20:32 UTC (permalink / raw) To: Alex Mironov via GitGitGadget; +Cc: git, Derrick Stolee, Alex Mironov "Alex Mironov via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Alex Mironov <alexandrfox@gmail.com> > > Ensure that logic added in 5f11669586 (name-hash: don't add directories > to name_hash, 2021-04-12) also applies in multithreaded hashtable init > path. > > Sparse directory entries represent a directory that is outside the > sparse-checkout definition. These are not paths to blobs, so should not > be added to the name_hash table as they must never be queried. The second paragraph sounds as if this is a correctness fix, i.e. "should not be added" hints that we would see a wrong result returned from the hashmap if you added these entries to the name_hash. If this is a performance-only fix, that should be more clearly stated. Also, can we have performance numbers in the proposed log message as well, or is the improvement too small to measure? Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 20:32 ` Junio C Hamano @ 2025-05-21 20:37 ` Alex Mironov 2025-05-21 21:12 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Alex Mironov @ 2025-05-21 20:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Alex Mironov via GitGitGadget, git, Derrick Stolee Hey Junio, With respect to messaging I more or less copy-pasted Derricks message from the original commit for non-threaded init: please check the referenced commit. Let me know if another wording is needed/preferred. > Also, can we have performance numbers in the proposed log message as well, or is the improvement too small to measure? In their current form benchmarks do not show major improvements outside of special casing I mentioned in my previous message (existence of files/objects outside of sparse cones in special VFS setups). So, for the general audience this indeed can be treated as a correctness fix. On Wed, May 21, 2025 at 10:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Alex Mironov via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Alex Mironov <alexandrfox@gmail.com> > > > > Ensure that logic added in 5f11669586 (name-hash: don't add directories > > to name_hash, 2021-04-12) also applies in multithreaded hashtable init > > path. > > > > Sparse directory entries represent a directory that is outside the > > sparse-checkout definition. These are not paths to blobs, so should not > > be added to the name_hash table as they must never be queried. > > The second paragraph sounds as if this is a correctness fix, > i.e. "should not be added" hints that we would see a wrong result > returned from the hashmap if you added these entries to the > name_hash. If this is a performance-only fix, that should be more > clearly stated. Also, can we have performance numbers in the > proposed log message as well, or is the improvement too small to > measure? > > Thanks. -- Best, Alex Mironov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 20:37 ` Alex Mironov @ 2025-05-21 21:12 ` Junio C Hamano 2025-05-21 21:23 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2025-05-21 21:12 UTC (permalink / raw) To: Alex Mironov; +Cc: Alex Mironov via GitGitGadget, git, Derrick Stolee Alex Mironov <alexandrfox@gmail.com> writes: > Hey Junio, > > With respect to messaging I more or less copy-pasted Derricks message > from the original commit for non-threaded init: please check the > referenced commit. Let me know if another wording is needed/preferred. I know what you did. Copying and pasting others fuzzy words into your commit log message does not make your commit log message clear. I already said the given message is less clear than desired, so do I still have to let you know??? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 21:12 ` Junio C Hamano @ 2025-05-21 21:23 ` Junio C Hamano 2025-05-21 21:40 ` Alex Mironov 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2025-05-21 21:23 UTC (permalink / raw) To: Alex Mironov; +Cc: Alex Mironov via GitGitGadget, git, Derrick Stolee Junio C Hamano <gitster@pobox.com> writes: > Alex Mironov <alexandrfox@gmail.com> writes: > >> Hey Junio, >> >> With respect to messaging I more or less copy-pasted Derricks message >> from the original commit for non-threaded init: please check the >> referenced commit. Let me know if another wording is needed/preferred. > > I know what you did. Copying and pasting others fuzzy words into > your commit log message does not make your commit log message clear. > > I already said the given message is less clear than desired, so do I > still have to let you know??? Actually after re-reading what Derrick wrote in that commit, I notice that you didn't even copy-pasted his message in full. Here is the message in 5f116695 (name-hash: don't add directories to name_hash, 2021-04-12): name-hash: don't add directories to name_hash Sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table. Instead, they should be added to the directory hashtable when 'ignore_case' is true. Add a condition to avoid placing sparse directories into the name_hash hashtable. This avoids filling the table with extra entries that will never be queried. Notice that the second paragraph here makes it clear that how extra entries would not contribute to or hurt the correctness? You failed to copy-paste that crucial bit, which ended up making your version of the explanation much less clear why the change would not affect correctness than it could have been. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 21:23 ` Junio C Hamano @ 2025-05-21 21:40 ` Alex Mironov 0 siblings, 0 replies; 13+ messages in thread From: Alex Mironov @ 2025-05-21 21:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: Alex Mironov via GitGitGadget, git, Derrick Stolee Dear Junio, Thank you for the feedback, I adjusted the commit message in full which hopefully makes this patch clearer. At the same time I don't quite understand the need for the perceived hostility - this is not some "fuzzy" words, but the messaging from the original author of the sparse feature. I certainly understand your desire to uphold the standards of contributions to git, especially from the new author, but I must say I feel quite alienated by your reply. Nonetheless, the adjustment is submitted now and I hope that further contributions remain welcomed. On Wed, May 21, 2025 at 11:23 PM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > Alex Mironov <alexandrfox@gmail.com> writes: > > > >> Hey Junio, > >> > >> With respect to messaging I more or less copy-pasted Derricks message > >> from the original commit for non-threaded init: please check the > >> referenced commit. Let me know if another wording is needed/preferred. > > > > I know what you did. Copying and pasting others fuzzy words into > > your commit log message does not make your commit log message clear. > > > > I already said the given message is less clear than desired, so do I > > still have to let you know??? > > Actually after re-reading what Derrick wrote in that commit, I > notice that you didn't even copy-pasted his message in full. Here > is the message in 5f116695 (name-hash: don't add directories to > name_hash, 2021-04-12): > > name-hash: don't add directories to name_hash > > Sparse directory entries represent a directory that is outside the > sparse-checkout definition. These are not paths to blobs, so should not > be added to the name_hash table. Instead, they should be added to the > directory hashtable when 'ignore_case' is true. > > Add a condition to avoid placing sparse directories into the name_hash > hashtable. This avoids filling the table with extra entries that will > never be queried. > > Notice that the second paragraph here makes it clear that how extra > entries would not contribute to or hurt the correctness? You failed > to copy-paste that crucial bit, which ended up making your version > of the explanation much less clear why the change would not affect > correctness than it could have been. -- Best, Alex Mironov ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget 2025-05-21 20:32 ` Junio C Hamano @ 2025-05-21 21:29 ` Alex Mironov via GitGitGadget 2025-05-21 21:47 ` Junio C Hamano 1 sibling, 1 reply; 13+ messages in thread From: Alex Mironov via GitGitGadget @ 2025-05-21 21:29 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Junio C Hamano, Alex Mironov, Alex Mironov From: Alex Mironov <alexandrfox@gmail.com> Ensure that logic added in 5f11669586 (name-hash: don't add directories to name_hash, 2021-04-12) also applies in multithreaded hashtable init path. As per the original single-threaded change above: sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table. Instead, they should be added to the directory hashtable when 'ignore_case' is true. Add a condition to avoid placing sparse directories into the name_hash hashtable. This avoids filling the table with extra entries that will never be queried. Signed-off-by: Alex Mironov <alexandrfox@gmail.com> --- name-hash: don't add sparse directories in threaded lazy init Changes since v2: * addressed commit message structure Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v3 Pull-Request: https://github.com/git/git/pull/1970 Range-diff vs v2: 1: fb378147c73 ! 1: b91ccb640c5 name-hash: don't add sparse directories in threaded lazy init @@ Commit message to name_hash, 2021-04-12) also applies in multithreaded hashtable init path. - Sparse directory entries represent a directory that is outside the - sparse-checkout definition. These are not paths to blobs, so should not - be added to the name_hash table as they must never be queried. + As per the original single-threaded change above: sparse directory entries + represent a directory that is outside the sparse-checkout definition. + These are not paths to blobs, so should not be added to the name_hash + table. Instead, they should be added to the directory hashtable when + 'ignore_case' is true. + + Add a condition to avoid placing sparse directories into the name_hash + hashtable. This avoids filling the table with extra entries that will + never be queried. Signed-off-by: Alex Mironov <alexandrfox@gmail.com> name-hash.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/name-hash.c b/name-hash.c index d66de1cdfd5..b91e2762678 100644 --- a/name-hash.c +++ b/name-hash.c @@ -492,8 +492,10 @@ static void *lazy_name_thread_proc(void *_data) for (k = 0; k < d->istate->cache_nr; k++) { struct cache_entry *ce_k = d->istate->cache[k]; ce_k->ce_flags |= CE_HASHED; - hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); - hashmap_add(&d->istate->name_hash, &ce_k->ent); + if (!S_ISSPARSEDIR(ce_k->ce_mode)) { + hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); + hashmap_add(&d->istate->name_hash, &ce_k->ent); + } } return NULL; base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 -- gitgitgadget ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 21:29 ` [PATCH v3] " Alex Mironov via GitGitGadget @ 2025-05-21 21:47 ` Junio C Hamano 2025-05-22 1:48 ` Derrick Stolee 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2025-05-21 21:47 UTC (permalink / raw) To: Alex Mironov via GitGitGadget; +Cc: git, Derrick Stolee, Alex Mironov "Alex Mironov via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Alex Mironov <alexandrfox@gmail.com> > > Ensure that logic added in 5f11669586 (name-hash: don't add directories > to name_hash, 2021-04-12) also applies in multithreaded hashtable init > path. > > As per the original single-threaded change above: sparse directory entries > represent a directory that is outside the sparse-checkout definition. > These are not paths to blobs, so should not be added to the name_hash > table. Instead, they should be added to the directory hashtable when > 'ignore_case' is true. > > Add a condition to avoid placing sparse directories into the name_hash > hashtable. This avoids filling the table with extra entries that will > never be queried. > > Signed-off-by: Alex Mironov <alexandrfox@gmail.com> > --- Sounds quite sensible and the above reads better. > diff --git a/name-hash.c b/name-hash.c > index d66de1cdfd5..b91e2762678 100644 > --- a/name-hash.c > +++ b/name-hash.c > @@ -492,8 +492,10 @@ static void *lazy_name_thread_proc(void *_data) > for (k = 0; k < d->istate->cache_nr; k++) { > struct cache_entry *ce_k = d->istate->cache[k]; > ce_k->ce_flags |= CE_HASHED; > - hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); > - hashmap_add(&d->istate->name_hash, &ce_k->ent); > + if (!S_ISSPARSEDIR(ce_k->ce_mode)) { > + hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); > + hashmap_add(&d->istate->name_hash, &ce_k->ent); > + } > } This unfortunately gives us deeper nesting than your previous round, but the conditional matches the original commit and easier to see what is going on by comparing the uni- and multi-threaded variants. Nicely done. Will queue. Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3] name-hash: don't add sparse directories in threaded lazy init 2025-05-21 21:47 ` Junio C Hamano @ 2025-05-22 1:48 ` Derrick Stolee 0 siblings, 0 replies; 13+ messages in thread From: Derrick Stolee @ 2025-05-22 1:48 UTC (permalink / raw) To: Junio C Hamano, Alex Mironov via GitGitGadget; +Cc: git, Alex Mironov On 5/21/2025 5:47 PM, Junio C Hamano wrote: > "Alex Mironov via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Alex Mironov <alexandrfox@gmail.com> >> >> Ensure that logic added in 5f11669586 (name-hash: don't add directories >> to name_hash, 2021-04-12) also applies in multithreaded hashtable init >> path. >> >> As per the original single-threaded change above: sparse directory entries >> represent a directory that is outside the sparse-checkout definition. >> These are not paths to blobs, so should not be added to the name_hash >> table. Instead, they should be added to the directory hashtable when >> 'ignore_case' is true. >> >> Add a condition to avoid placing sparse directories into the name_hash >> hashtable. This avoids filling the table with extra entries that will >> never be queried. >> >> Signed-off-by: Alex Mironov <alexandrfox@gmail.com> >> --- > > Sounds quite sensible and the above reads better. > >> diff --git a/name-hash.c b/name-hash.c >> index d66de1cdfd5..b91e2762678 100644 >> --- a/name-hash.c >> +++ b/name-hash.c >> @@ -492,8 +492,10 @@ static void *lazy_name_thread_proc(void *_data) >> for (k = 0; k < d->istate->cache_nr; k++) { >> struct cache_entry *ce_k = d->istate->cache[k]; >> ce_k->ce_flags |= CE_HASHED; >> - hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); >> - hashmap_add(&d->istate->name_hash, &ce_k->ent); >> + if (!S_ISSPARSEDIR(ce_k->ce_mode)) { >> + hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); >> + hashmap_add(&d->istate->name_hash, &ce_k->ent); >> + } >> } > > > This unfortunately gives us deeper nesting than your previous round, > but the conditional matches the original commit and easier to see > what is going on by comparing the uni- and multi-threaded variants. > > Nicely done. Will queue. I agree that this version is ready to go. Thanks for shepherding the v2 discussion. -Stolee ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-05-22 1:48 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-21 11:40 [PATCH] name-hash: don't add sparse directories in threaded lazy init Alex Mironov via GitGitGadget 2025-05-21 17:17 ` Derrick Stolee 2025-05-21 18:32 ` Junio C Hamano 2025-05-21 20:07 ` Alex Mironov 2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget 2025-05-21 20:32 ` Junio C Hamano 2025-05-21 20:37 ` Alex Mironov 2025-05-21 21:12 ` Junio C Hamano 2025-05-21 21:23 ` Junio C Hamano 2025-05-21 21:40 ` Alex Mironov 2025-05-21 21:29 ` [PATCH v3] " Alex Mironov via GitGitGadget 2025-05-21 21:47 ` Junio C Hamano 2025-05-22 1:48 ` Derrick Stolee
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).