From: Alex Mironov <alexandrfox@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Alex Mironov via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] name-hash: don't add sparse directories in threaded lazy init
Date: Wed, 21 May 2025 22:07:04 +0200 [thread overview]
Message-ID: <CAC97EbwMKB5MUD==-6dHuj18OGa--p_UzWw+CsbGTM0iQ8Mw0A@mail.gmail.com> (raw)
In-Reply-To: <9c26d844-6ac5-449b-a5ff-a842ed6ba8b9@gmail.com>
Hey Derrick,
Appreciate your quick response! I addressed the nits, and will submit
the new patch shortly.
> This seems to be a performance-only fix
Indeed, and it's more impactful for the specific setup we have inside
our org. In short, we have a custom implementation of the sparse
checkout logic that makes sure to leave files outside of sparse cones
present on disk. Because of this, git with sparse index enabled
frequently resorted to a full index expansion despite
'sparse.expectFilesOutsideOfPatterns=true' config set which, I assume,
was only made working with the full index. This problem specifically
was tricky to catch since multithreaded lazy load logic applies only
after a certain limit of objects present in the index.
I hope I will be able to send more patches soon to fix-up bits and
pieces of the 'sparse.expectFilesOutsideOfPatterns' flag with respect
to other expansions. I'm planning to guard those under the same
configuration option, let me know if you think it'd be better to
introduce a new option.
On Wed, May 21, 2025 at 7:17 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/21/2025 7:40 AM, Alex Mironov via GitGitGadget wrote:
> > From: Alex Mironov <alexandrfox@gmail.com>
> >
> > Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a
>
> nit: we typically use the "reference" style to refer to other
> commits, use 'git log -1 --pretty=reference <oid>' to get output
> like this:
>
> 5f116695864 (name-hash: don't add directories to name_hash, 2021-04-12)
>
> > make sure to avoid placing sparse directories into the name_hash
> > hashtable whenever multithreaded initialization is performed.
> >
> > Sparse directory entries represent a directory that is outside the
> > sparse-checkout definition. These are not paths to blobs, so should not
> > be added to the name_hash table as they must never be queried.
> >
> > Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
> > ---
> > name-hash: don't add sparse directories in threaded lazy init
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1
> > Pull-Request: https://github.com/git/git/pull/1970
> >
> > name-hash.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/name-hash.c b/name-hash.c
> > index d66de1cdfd5..03123a8779a 100644
> > --- a/name-hash.c
> > +++ b/name-hash.c
> > @@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data)
> > for (k = 0; k < d->istate->cache_nr; k++) {
> > struct cache_entry *ce_k = d->istate->cache[k];
> > ce_k->ce_flags |= CE_HASHED;
> > + if (S_ISSPARSEDIR(ce_k->ce_mode)) {
> > + continue;
> > + }
>
> nit: for one-line blocks, we usually skip the braces. But I think
> that it might be better to reverse the logic to get something like:
>
> if (!S_ISSPARSEDIR(ce_k->ce_mode) {
> hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
> hashmap_add(&d->istate->name_hash, &ce_k->ent);
> }
>
> This seems to be a performance-only fix, and it might be interesting
> to see if there is any impact on p2000-sparse-operations.sh. Those
> tests don't focus on many sparse-directory entries, so that may not
> demonstrate any meaningful difference.
>
> Thanks,
> -Stolee
>
--
Best,
Alex Mironov
next prev parent reply other threads:[~2025-05-21 20:07 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-21 11:40 [PATCH] name-hash: don't add sparse directories in threaded lazy init Alex Mironov via GitGitGadget
2025-05-21 17:17 ` Derrick Stolee
2025-05-21 18:32 ` Junio C Hamano
2025-05-21 20:07 ` Alex Mironov [this message]
2025-05-21 20:16 ` [PATCH v2] " Alex Mironov via GitGitGadget
2025-05-21 20:32 ` Junio C Hamano
2025-05-21 20:37 ` Alex Mironov
2025-05-21 21:12 ` Junio C Hamano
2025-05-21 21:23 ` Junio C Hamano
2025-05-21 21:40 ` Alex Mironov
2025-05-21 21:29 ` [PATCH v3] " Alex Mironov via GitGitGadget
2025-05-21 21:47 ` Junio C Hamano
2025-05-22 1:48 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAC97EbwMKB5MUD==-6dHuj18OGa--p_UzWw+CsbGTM0iQ8Mw0A@mail.gmail.com' \
--to=alexandrfox@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).