git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] name-hash: don't add sparse directories in threaded lazy init
@ 2025-05-21 11:40 Alex Mironov via GitGitGadget
  2025-05-21 17:17 ` Derrick Stolee
  2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget
  0 siblings, 2 replies; 13+ messages in thread
From: Alex Mironov via GitGitGadget @ 2025-05-21 11:40 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano, Alex Mironov, Alex Mironov

From: Alex Mironov <alexandrfox@gmail.com>

Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a
make sure to avoid placing sparse directories into the name_hash
hashtable whenever multithreaded initialization is performed.

Sparse directory entries represent a directory that is outside the
sparse-checkout definition. These are not paths to blobs, so should not
be added to the name_hash table as they must never be queried.

Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
---
    name-hash: don't add sparse directories in threaded lazy init

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1
Pull-Request: https://github.com/git/git/pull/1970

 name-hash.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index d66de1cdfd5..03123a8779a 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data)
 	for (k = 0; k < d->istate->cache_nr; k++) {
 		struct cache_entry *ce_k = d->istate->cache[k];
 		ce_k->ce_flags |= CE_HASHED;
+		if (S_ISSPARSEDIR(ce_k->ce_mode)) {
+			continue;
+		}
 		hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
 		hashmap_add(&d->istate->name_hash, &ce_k->ent);
 	}

base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 11:40 [PATCH] name-hash: don't add sparse directories in threaded lazy init Alex Mironov via GitGitGadget
@ 2025-05-21 17:17 ` Derrick Stolee
  2025-05-21 18:32   ` Junio C Hamano
  2025-05-21 20:07   ` Alex Mironov
  2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget
  1 sibling, 2 replies; 13+ messages in thread
From: Derrick Stolee @ 2025-05-21 17:17 UTC (permalink / raw)
  To: Alex Mironov via GitGitGadget, git; +Cc: Junio C Hamano, Alex Mironov

On 5/21/2025 7:40 AM, Alex Mironov via GitGitGadget wrote:
> From: Alex Mironov <alexandrfox@gmail.com>
> 
> Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a

nit: we typically use the "reference" style to refer to other
commits, use 'git log -1 --pretty=reference <oid>' to get output
like this:

  5f116695864 (name-hash: don't add directories to name_hash, 2021-04-12)

> make sure to avoid placing sparse directories into the name_hash
> hashtable whenever multithreaded initialization is performed.
> 
> Sparse directory entries represent a directory that is outside the
> sparse-checkout definition. These are not paths to blobs, so should not
> be added to the name_hash table as they must never be queried.
> 
> Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
> ---
>     name-hash: don't add sparse directories in threaded lazy init
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1
> Pull-Request: https://github.com/git/git/pull/1970
> 
>  name-hash.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/name-hash.c b/name-hash.c
> index d66de1cdfd5..03123a8779a 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data)
>  	for (k = 0; k < d->istate->cache_nr; k++) {
>  		struct cache_entry *ce_k = d->istate->cache[k];
>  		ce_k->ce_flags |= CE_HASHED;
> +		if (S_ISSPARSEDIR(ce_k->ce_mode)) {
> +			continue;
> +		}

nit: for one-line blocks, we usually skip the braces. But I think
that it might be better to reverse the logic to get something like:

	if (!S_ISSPARSEDIR(ce_k->ce_mode) {
  		hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
 		hashmap_add(&d->istate->name_hash, &ce_k->ent);
	}

This seems to be a performance-only fix, and it might be interesting
to see if there is any impact on p2000-sparse-operations.sh. Those
tests don't focus on many sparse-directory entries, so that may not
demonstrate any meaningful difference.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 17:17 ` Derrick Stolee
@ 2025-05-21 18:32   ` Junio C Hamano
  2025-05-21 20:07   ` Alex Mironov
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2025-05-21 18:32 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Alex Mironov via GitGitGadget, git, Alex Mironov

Derrick Stolee <stolee@gmail.com> writes:

> This seems to be a performance-only fix, ...

Hmph, then the proposed log message is phrased in a slightly
misleading way.  I took that "should not" below ...

> Sparse directory entries represent a directory that is outside the
> sparse-checkout definition. These are not paths to blobs, so should not
> be added to the name_hash table as they must never be queried.

... to be saying that we would find something we should not find
when querying the name_hash table if we didn't omit them, and the
change is done for correctness.

Thanks for a quick review.  Very much appreciated.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 17:17 ` Derrick Stolee
  2025-05-21 18:32   ` Junio C Hamano
@ 2025-05-21 20:07   ` Alex Mironov
  1 sibling, 0 replies; 13+ messages in thread
From: Alex Mironov @ 2025-05-21 20:07 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Alex Mironov via GitGitGadget, git, Junio C Hamano

Hey Derrick,

Appreciate your quick response! I addressed the nits, and will submit
the new patch shortly.

> This seems to be a performance-only fix

Indeed, and it's more impactful for the specific setup we have inside
our org. In short, we have a custom implementation of the sparse
checkout logic that makes sure to leave files outside of sparse cones
present on disk. Because of this, git with sparse index enabled
frequently resorted to a full index expansion despite
'sparse.expectFilesOutsideOfPatterns=true' config set which, I assume,
was only made working with the full index. This problem specifically
was tricky to catch since multithreaded lazy load logic applies only
after a certain limit of objects present in the index.

I hope I will be able to send more patches soon to fix-up bits and
pieces of the 'sparse.expectFilesOutsideOfPatterns' flag with respect
to other expansions. I'm planning to guard those under the same
configuration option, let me know if you think it'd be better to
introduce a new option.


On Wed, May 21, 2025 at 7:17 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/21/2025 7:40 AM, Alex Mironov via GitGitGadget wrote:
> > From: Alex Mironov <alexandrfox@gmail.com>
> >
> > Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a
>
> nit: we typically use the "reference" style to refer to other
> commits, use 'git log -1 --pretty=reference <oid>' to get output
> like this:
>
>   5f116695864 (name-hash: don't add directories to name_hash, 2021-04-12)
>
> > make sure to avoid placing sparse directories into the name_hash
> > hashtable whenever multithreaded initialization is performed.
> >
> > Sparse directory entries represent a directory that is outside the
> > sparse-checkout definition. These are not paths to blobs, so should not
> > be added to the name_hash table as they must never be queried.
> >
> > Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
> > ---
> >     name-hash: don't add sparse directories in threaded lazy init
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1
> > Pull-Request: https://github.com/git/git/pull/1970
> >
> >  name-hash.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/name-hash.c b/name-hash.c
> > index d66de1cdfd5..03123a8779a 100644
> > --- a/name-hash.c
> > +++ b/name-hash.c
> > @@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data)
> >       for (k = 0; k < d->istate->cache_nr; k++) {
> >               struct cache_entry *ce_k = d->istate->cache[k];
> >               ce_k->ce_flags |= CE_HASHED;
> > +             if (S_ISSPARSEDIR(ce_k->ce_mode)) {
> > +                     continue;
> > +             }
>
> nit: for one-line blocks, we usually skip the braces. But I think
> that it might be better to reverse the logic to get something like:
>
>         if (!S_ISSPARSEDIR(ce_k->ce_mode) {
>                 hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
>                 hashmap_add(&d->istate->name_hash, &ce_k->ent);
>         }
>
> This seems to be a performance-only fix, and it might be interesting
> to see if there is any impact on p2000-sparse-operations.sh. Those
> tests don't focus on many sparse-directory entries, so that may not
> demonstrate any meaningful difference.
>
> Thanks,
> -Stolee
>


-- 
Best,
Alex Mironov

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 11:40 [PATCH] name-hash: don't add sparse directories in threaded lazy init Alex Mironov via GitGitGadget
  2025-05-21 17:17 ` Derrick Stolee
@ 2025-05-21 20:16 ` Alex Mironov via GitGitGadget
  2025-05-21 20:32   ` Junio C Hamano
  2025-05-21 21:29   ` [PATCH v3] " Alex Mironov via GitGitGadget
  1 sibling, 2 replies; 13+ messages in thread
From: Alex Mironov via GitGitGadget @ 2025-05-21 20:16 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano, Alex Mironov, Alex Mironov

From: Alex Mironov <alexandrfox@gmail.com>

Ensure that logic added in 5f11669586 (name-hash: don't add directories
to name_hash, 2021-04-12) also applies in multithreaded hashtable init
path.

Sparse directory entries represent a directory that is outside the
sparse-checkout definition. These are not paths to blobs, so should not
be added to the name_hash table as they must never be queried.

Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
---
    name-hash: don't add sparse directories in threaded lazy init
    
    Changes since v1:
    
     * addressed feedback (code-style)

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v2
Pull-Request: https://github.com/git/git/pull/1970

Range-diff vs v1:

 1:  d12ebc612c2 ! 1:  fb378147c73 name-hash: don't add sparse directories in threaded lazy init
     @@ Metadata
       ## Commit message ##
          name-hash: don't add sparse directories in threaded lazy init
      
     -    Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a
     -    make sure to avoid placing sparse directories into the name_hash
     -    hashtable whenever multithreaded initialization is performed.
     +    Ensure that logic added in 5f11669586 (name-hash: don't add directories
     +    to name_hash, 2021-04-12) also applies in multithreaded hashtable init
     +    path.
      
          Sparse directory entries represent a directory that is outside the
          sparse-checkout definition. These are not paths to blobs, so should not
     @@ name-hash.c: static void *lazy_name_thread_proc(void *_data)
       	for (k = 0; k < d->istate->cache_nr; k++) {
       		struct cache_entry *ce_k = d->istate->cache[k];
       		ce_k->ce_flags |= CE_HASHED;
     -+		if (S_ISSPARSEDIR(ce_k->ce_mode)) {
     -+			continue;
     +-		hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
     +-		hashmap_add(&d->istate->name_hash, &ce_k->ent);
     ++		if (!S_ISSPARSEDIR(ce_k->ce_mode)) {
     ++			hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
     ++			hashmap_add(&d->istate->name_hash, &ce_k->ent);
      +		}
     - 		hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
     - 		hashmap_add(&d->istate->name_hash, &ce_k->ent);
       	}
     + 
     + 	return NULL;


 name-hash.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index d66de1cdfd5..b91e2762678 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -492,8 +492,10 @@ static void *lazy_name_thread_proc(void *_data)
 	for (k = 0; k < d->istate->cache_nr; k++) {
 		struct cache_entry *ce_k = d->istate->cache[k];
 		ce_k->ce_flags |= CE_HASHED;
-		hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
-		hashmap_add(&d->istate->name_hash, &ce_k->ent);
+		if (!S_ISSPARSEDIR(ce_k->ce_mode)) {
+			hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
+			hashmap_add(&d->istate->name_hash, &ce_k->ent);
+		}
 	}
 
 	return NULL;

base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget
@ 2025-05-21 20:32   ` Junio C Hamano
  2025-05-21 20:37     ` Alex Mironov
  2025-05-21 21:29   ` [PATCH v3] " Alex Mironov via GitGitGadget
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2025-05-21 20:32 UTC (permalink / raw)
  To: Alex Mironov via GitGitGadget; +Cc: git, Derrick Stolee, Alex Mironov

"Alex Mironov via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Alex Mironov <alexandrfox@gmail.com>
>
> Ensure that logic added in 5f11669586 (name-hash: don't add directories
> to name_hash, 2021-04-12) also applies in multithreaded hashtable init
> path.
>
> Sparse directory entries represent a directory that is outside the
> sparse-checkout definition. These are not paths to blobs, so should not
> be added to the name_hash table as they must never be queried.

The second paragraph sounds as if this is a correctness fix,
i.e. "should not be added" hints that we would see a wrong result
returned from the hashmap if you added these entries to the
name_hash.  If this is a performance-only fix, that should be more
clearly stated.  Also, can we have performance numbers in the
proposed log message as well, or is the improvement too small to
measure?

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 20:32   ` Junio C Hamano
@ 2025-05-21 20:37     ` Alex Mironov
  2025-05-21 21:12       ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Mironov @ 2025-05-21 20:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alex Mironov via GitGitGadget, git, Derrick Stolee

Hey Junio,

With respect to messaging I more or less copy-pasted Derricks message
from the original commit for non-threaded init: please check the
referenced commit. Let me know if another wording is needed/preferred.

>  Also, can we have performance numbers in the
proposed log message as well, or is the improvement too small to
measure?

In their current form benchmarks do not show major improvements
outside of special casing I mentioned in my previous message
(existence of files/objects outside of sparse cones in special VFS
setups). So, for the general audience this indeed can be treated as a
correctness fix.

On Wed, May 21, 2025 at 10:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Alex Mironov via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Alex Mironov <alexandrfox@gmail.com>
> >
> > Ensure that logic added in 5f11669586 (name-hash: don't add directories
> > to name_hash, 2021-04-12) also applies in multithreaded hashtable init
> > path.
> >
> > Sparse directory entries represent a directory that is outside the
> > sparse-checkout definition. These are not paths to blobs, so should not
> > be added to the name_hash table as they must never be queried.
>
> The second paragraph sounds as if this is a correctness fix,
> i.e. "should not be added" hints that we would see a wrong result
> returned from the hashmap if you added these entries to the
> name_hash.  If this is a performance-only fix, that should be more
> clearly stated.  Also, can we have performance numbers in the
> proposed log message as well, or is the improvement too small to
> measure?
>
> Thanks.



-- 
Best,
Alex Mironov

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 20:37     ` Alex Mironov
@ 2025-05-21 21:12       ` Junio C Hamano
  2025-05-21 21:23         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2025-05-21 21:12 UTC (permalink / raw)
  To: Alex Mironov; +Cc: Alex Mironov via GitGitGadget, git, Derrick Stolee

Alex Mironov <alexandrfox@gmail.com> writes:

> Hey Junio,
>
> With respect to messaging I more or less copy-pasted Derricks message
> from the original commit for non-threaded init: please check the
> referenced commit. Let me know if another wording is needed/preferred.

I know what you did.  Copying and pasting others fuzzy words into
your commit log message does not make your commit log message clear.

I already said the given message is less clear than desired, so do I
still have to let you know???

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 21:12       ` Junio C Hamano
@ 2025-05-21 21:23         ` Junio C Hamano
  2025-05-21 21:40           ` Alex Mironov
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2025-05-21 21:23 UTC (permalink / raw)
  To: Alex Mironov; +Cc: Alex Mironov via GitGitGadget, git, Derrick Stolee

Junio C Hamano <gitster@pobox.com> writes:

> Alex Mironov <alexandrfox@gmail.com> writes:
>
>> Hey Junio,
>>
>> With respect to messaging I more or less copy-pasted Derricks message
>> from the original commit for non-threaded init: please check the
>> referenced commit. Let me know if another wording is needed/preferred.
>
> I know what you did.  Copying and pasting others fuzzy words into
> your commit log message does not make your commit log message clear.
>
> I already said the given message is less clear than desired, so do I
> still have to let you know???

Actually after re-reading what Derrick wrote in that commit, I
notice that you didn't even copy-pasted his message in full.  Here
is the message in 5f116695 (name-hash: don't add directories to
name_hash, 2021-04-12):

    name-hash: don't add directories to name_hash
    
    Sparse directory entries represent a directory that is outside the
    sparse-checkout definition. These are not paths to blobs, so should not
    be added to the name_hash table. Instead, they should be added to the
    directory hashtable when 'ignore_case' is true.
    
    Add a condition to avoid placing sparse directories into the name_hash
    hashtable. This avoids filling the table with extra entries that will
    never be queried.

Notice that the second paragraph here makes it clear that how extra
entries would not contribute to or hurt the correctness?  You failed
to copy-paste that crucial bit, which ended up making your version
of the explanation much less clear why the change would not affect
correctness than it could have been.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget
  2025-05-21 20:32   ` Junio C Hamano
@ 2025-05-21 21:29   ` Alex Mironov via GitGitGadget
  2025-05-21 21:47     ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Alex Mironov via GitGitGadget @ 2025-05-21 21:29 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano, Alex Mironov, Alex Mironov

From: Alex Mironov <alexandrfox@gmail.com>

Ensure that logic added in 5f11669586 (name-hash: don't add directories
to name_hash, 2021-04-12) also applies in multithreaded hashtable init
path.

As per the original single-threaded change above: sparse directory entries
represent a directory that is outside the sparse-checkout definition.
These are not paths to blobs, so should not be added to the name_hash
table. Instead, they should be added to the directory hashtable when
'ignore_case' is true.

Add a condition to avoid placing sparse directories into the name_hash
hashtable. This avoids filling the table with extra entries that will
never be queried.

Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
---
    name-hash: don't add sparse directories in threaded lazy init
    
    Changes since v2:
    
     * addressed commit message structure

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v3
Pull-Request: https://github.com/git/git/pull/1970

Range-diff vs v2:

 1:  fb378147c73 ! 1:  b91ccb640c5 name-hash: don't add sparse directories in threaded lazy init
     @@ Commit message
          to name_hash, 2021-04-12) also applies in multithreaded hashtable init
          path.
      
     -    Sparse directory entries represent a directory that is outside the
     -    sparse-checkout definition. These are not paths to blobs, so should not
     -    be added to the name_hash table as they must never be queried.
     +    As per the original single-threaded change above: sparse directory entries
     +    represent a directory that is outside the sparse-checkout definition.
     +    These are not paths to blobs, so should not be added to the name_hash
     +    table. Instead, they should be added to the directory hashtable when
     +    'ignore_case' is true.
     +
     +    Add a condition to avoid placing sparse directories into the name_hash
     +    hashtable. This avoids filling the table with extra entries that will
     +    never be queried.
      
          Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
      


 name-hash.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index d66de1cdfd5..b91e2762678 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -492,8 +492,10 @@ static void *lazy_name_thread_proc(void *_data)
 	for (k = 0; k < d->istate->cache_nr; k++) {
 		struct cache_entry *ce_k = d->istate->cache[k];
 		ce_k->ce_flags |= CE_HASHED;
-		hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
-		hashmap_add(&d->istate->name_hash, &ce_k->ent);
+		if (!S_ISSPARSEDIR(ce_k->ce_mode)) {
+			hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
+			hashmap_add(&d->istate->name_hash, &ce_k->ent);
+		}
 	}
 
 	return NULL;

base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 21:23         ` Junio C Hamano
@ 2025-05-21 21:40           ` Alex Mironov
  0 siblings, 0 replies; 13+ messages in thread
From: Alex Mironov @ 2025-05-21 21:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alex Mironov via GitGitGadget, git, Derrick Stolee

Dear Junio,

Thank you for the feedback, I adjusted the commit message in full
which hopefully makes this patch clearer. At the same time I don't
quite understand the need for the perceived hostility - this is not
some "fuzzy" words, but the messaging from the original author of the
sparse feature. I certainly understand your desire to uphold the
standards of contributions to git, especially from the new author, but
I must say I feel quite alienated by your reply.

Nonetheless, the adjustment is submitted now and I hope that further
contributions remain welcomed.

On Wed, May 21, 2025 at 11:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Alex Mironov <alexandrfox@gmail.com> writes:
> >
> >> Hey Junio,
> >>
> >> With respect to messaging I more or less copy-pasted Derricks message
> >> from the original commit for non-threaded init: please check the
> >> referenced commit. Let me know if another wording is needed/preferred.
> >
> > I know what you did.  Copying and pasting others fuzzy words into
> > your commit log message does not make your commit log message clear.
> >
> > I already said the given message is less clear than desired, so do I
> > still have to let you know???
>
> Actually after re-reading what Derrick wrote in that commit, I
> notice that you didn't even copy-pasted his message in full.  Here
> is the message in 5f116695 (name-hash: don't add directories to
> name_hash, 2021-04-12):
>
>     name-hash: don't add directories to name_hash
>
>     Sparse directory entries represent a directory that is outside the
>     sparse-checkout definition. These are not paths to blobs, so should not
>     be added to the name_hash table. Instead, they should be added to the
>     directory hashtable when 'ignore_case' is true.
>
>     Add a condition to avoid placing sparse directories into the name_hash
>     hashtable. This avoids filling the table with extra entries that will
>     never be queried.
>
> Notice that the second paragraph here makes it clear that how extra
> entries would not contribute to or hurt the correctness?  You failed
> to copy-paste that crucial bit, which ended up making your version
> of the explanation much less clear why the change would not affect
> correctness than it could have been.



--
Best,
Alex Mironov

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 21:29   ` [PATCH v3] " Alex Mironov via GitGitGadget
@ 2025-05-21 21:47     ` Junio C Hamano
  2025-05-22  1:48       ` Derrick Stolee
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2025-05-21 21:47 UTC (permalink / raw)
  To: Alex Mironov via GitGitGadget; +Cc: git, Derrick Stolee, Alex Mironov

"Alex Mironov via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Alex Mironov <alexandrfox@gmail.com>
>
> Ensure that logic added in 5f11669586 (name-hash: don't add directories
> to name_hash, 2021-04-12) also applies in multithreaded hashtable init
> path.
>
> As per the original single-threaded change above: sparse directory entries
> represent a directory that is outside the sparse-checkout definition.
> These are not paths to blobs, so should not be added to the name_hash
> table. Instead, they should be added to the directory hashtable when
> 'ignore_case' is true.
>
> Add a condition to avoid placing sparse directories into the name_hash
> hashtable. This avoids filling the table with extra entries that will
> never be queried.
>
> Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
> ---

Sounds quite sensible and the above reads better.

> diff --git a/name-hash.c b/name-hash.c
> index d66de1cdfd5..b91e2762678 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -492,8 +492,10 @@ static void *lazy_name_thread_proc(void *_data)
>  	for (k = 0; k < d->istate->cache_nr; k++) {
>  		struct cache_entry *ce_k = d->istate->cache[k];
>  		ce_k->ce_flags |= CE_HASHED;
> -		hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
> -		hashmap_add(&d->istate->name_hash, &ce_k->ent);
> +		if (!S_ISSPARSEDIR(ce_k->ce_mode)) {
> +			hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
> +			hashmap_add(&d->istate->name_hash, &ce_k->ent);
> +		}
>  	}


This unfortunately gives us deeper nesting than your previous round,
but the conditional matches the original commit and easier to see
what is going on by comparing the uni- and multi-threaded variants.

Nicely done.  Will queue.

Thanks.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] name-hash: don't add sparse directories in threaded lazy init
  2025-05-21 21:47     ` Junio C Hamano
@ 2025-05-22  1:48       ` Derrick Stolee
  0 siblings, 0 replies; 13+ messages in thread
From: Derrick Stolee @ 2025-05-22  1:48 UTC (permalink / raw)
  To: Junio C Hamano, Alex Mironov via GitGitGadget; +Cc: git, Alex Mironov

On 5/21/2025 5:47 PM, Junio C Hamano wrote:
> "Alex Mironov via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Alex Mironov <alexandrfox@gmail.com>
>>
>> Ensure that logic added in 5f11669586 (name-hash: don't add directories
>> to name_hash, 2021-04-12) also applies in multithreaded hashtable init
>> path.
>>
>> As per the original single-threaded change above: sparse directory entries
>> represent a directory that is outside the sparse-checkout definition.
>> These are not paths to blobs, so should not be added to the name_hash
>> table. Instead, they should be added to the directory hashtable when
>> 'ignore_case' is true.
>>
>> Add a condition to avoid placing sparse directories into the name_hash
>> hashtable. This avoids filling the table with extra entries that will
>> never be queried.
>>
>> Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
>> ---
> 
> Sounds quite sensible and the above reads better.
> 
>> diff --git a/name-hash.c b/name-hash.c
>> index d66de1cdfd5..b91e2762678 100644
>> --- a/name-hash.c
>> +++ b/name-hash.c
>> @@ -492,8 +492,10 @@ static void *lazy_name_thread_proc(void *_data)
>>  	for (k = 0; k < d->istate->cache_nr; k++) {
>>  		struct cache_entry *ce_k = d->istate->cache[k];
>>  		ce_k->ce_flags |= CE_HASHED;
>> -		hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
>> -		hashmap_add(&d->istate->name_hash, &ce_k->ent);
>> +		if (!S_ISSPARSEDIR(ce_k->ce_mode)) {
>> +			hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
>> +			hashmap_add(&d->istate->name_hash, &ce_k->ent);
>> +		}
>>  	}
> 
> 
> This unfortunately gives us deeper nesting than your previous round,
> but the conditional matches the original commit and easier to see
> what is going on by comparing the uni- and multi-threaded variants.
> 
> Nicely done.  Will queue.
I agree that this version is ready to go. Thanks for shepherding
the v2 discussion.

-Stolee

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-05-22  1:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-21 11:40 [PATCH] name-hash: don't add sparse directories in threaded lazy init Alex Mironov via GitGitGadget
2025-05-21 17:17 ` Derrick Stolee
2025-05-21 18:32   ` Junio C Hamano
2025-05-21 20:07   ` Alex Mironov
2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget
2025-05-21 20:32   ` Junio C Hamano
2025-05-21 20:37     ` Alex Mironov
2025-05-21 21:12       ` Junio C Hamano
2025-05-21 21:23         ` Junio C Hamano
2025-05-21 21:40           ` Alex Mironov
2025-05-21 21:29   ` [PATCH v3] " Alex Mironov via GitGitGadget
2025-05-21 21:47     ` Junio C Hamano
2025-05-22  1:48       ` Derrick Stolee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).