From: Derrick Stolee <derrickstolee@github.com>
To: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>, git@vger.kernel.org
Cc: vdye@github.com
Subject: Re: [PATCH v1 2/2] builtin/grep.c: integrate with sparse index
Date: Wed, 24 Aug 2022 20:39:31 -0400
Message-ID: <243dab1e-990e-d8da-3a75-fe1beab18db2@github.com>
In-Reply-To: <666dc1a3-5f18-c487-6290-44b0646f5724@gmail.com>

On 8/24/22 5:06 PM, Shaoxuan Yuan wrote:
> On 8/17/2022 10:23 PM, Derrick Stolee wrote:
>> On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
>>> Turn on sparse index and remove ensure_full_index().

>>> -    /* TODO: audit for interaction with sparse-index. */
>>> -    ensure_full_index(repo->index);
>>> +    if (grep_sparse)
> 
> A side note: this condition should be `grep_sparse && cached`.
> 
>>> +        ensure_full_index(repo->index);
>>> +
>> As mentioned before, this approach is the simplest way to make the case
>> without --sparse faster, but the case _with_ --sparse will still be slow.
>> The way to fix this would be to modify this portion of the loop:
> 
> I'm not sure. If --sparse here means we want to expand the index, it
> is expected to be slow (ensure_full_index is slow), isn't it?
> 
>>     if (S_ISREG(ce->ce_mode) &&
>>         match_pathspec(repo->index, pathspec, name.buf, name.len, 0, NULL,
>>                S_ISDIR(ce->ce_mode) ||
>>                S_ISGITLINK(ce->ce_mode))) {
>>
>> by adding an initial case
>>
>>     if (S_ISSPARSEDIR(ce->ce_mode)) {
>>         hit |= grep_tree(opt, &ce->oid, name.buf, 0, name.buf);
>>     } else if (S_ISREG(ce->ce_mode) &&
>>            match_pathspec(repo->index, pathspec, name.buf, name.len, 0, NULL,
>>                   S_ISDIR(ce->ce_mode) ||
>>                   S_ISGITLINK(ce->ce_mode))) {
>>
>> and appropriately implement "grep_tree()" to walk the tree at ce->oid to
>> find all matching files within, then call grep_oid() for each of those
>> paths.
> 
> Tree walking is faster, yes. So, for this approach to be faster, I
> think you are suggesting we should not expand the index, even when
> --sparse is given? Instead, we just rely on the tree walking logic,
> right?

Yes. Tree walking accounts for a sizeable portion of the cost of
expanding the index, but walking the trees directly also avoids
constructing the new index. In addition, we can use the t1092 tests to
show that we get the expected behavior without resorting to
ensure_full_index(), which demonstrates that we are doing the "most
correct" thing.

Walking trees also gives us a way to speed things up when the search
is limited by a pathspec, since the existing tree-walking logic may
use the pathspec to narrow the scope of the tree search automatically.
Expanding the index means "walk all the trees, then scan all the
files", even when there might be better things to do instead.
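
To illustrate the pruning idea, here is a minimal standalone sketch. It
is a toy model and not Git code: "struct node", "dir_may_match()",
"grep_tree_sketch()" and the "visited" counter are all hypothetical
names, the "pathspec" is assumed to be a plain directory prefix, and
the "grep" is just strstr(). The point is only that a tree walk can
skip a whole subtree once its path can no longer match, whereas
expanding everything first cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of a tree; none of this is Git's real API. */
struct node {
	const char *name;            /* single path component */
	const char *content;         /* file content, or NULL for a directory */
	const struct node *children; /* entries below a directory */
	size_t nr_children;
};

/* Counts how many files had their content scanned. */
size_t visited;

/*
 * Assumption: the pathspec is a directory prefix such as "docs/".
 * A directory can contain matches if its path is a prefix of the
 * pathspec or the pathspec is a prefix of its path.
 */
static int dir_may_match(const char *path, const char *prefix)
{
	size_t plen = strlen(path), flen = strlen(prefix);
	size_t n = plen < flen ? plen : flen;

	return strncmp(path, prefix, n) == 0;
}

/* Walk the tree under "dir", scanning only files below "prefix". */
int grep_tree_sketch(const struct node *dir, const char *base,
		     const char *prefix, const char *pattern)
{
	int hit = 0;
	char path[256];
	size_t i;

	for (i = 0; i < dir->nr_children; i++) {
		const struct node *n = &dir->children[i];

		if (n->content) {
			snprintf(path, sizeof(path), "%s%s", base, n->name);
			if (strncmp(path, prefix, strlen(prefix)))
				continue; /* file is outside the pathspec */
			visited++;
			hit |= strstr(n->content, pattern) != NULL;
		} else {
			snprintf(path, sizeof(path), "%s%s/", base, n->name);
			if (!dir_may_match(path, prefix))
				continue; /* prune the whole subtree */
			hit |= grep_tree_sketch(n, path, prefix, pattern);
		}
	}
	return hit;
}

/* Sample data: docs/api/index.txt, docs/intro.txt, src/main.c, src/util.c */
static const struct node api_files[] = {
	{ "index.txt", "TODO: describe the API", NULL, 0 },
};
static const struct node docs_children[] = {
	{ "api", NULL, api_files, 1 },
	{ "intro.txt", "welcome", NULL, 0 },
};
static const struct node src_children[] = {
	{ "main.c", "/* TODO: audit */", NULL, 0 },
	{ "util.c", "int util;", NULL, 0 },
};
static const struct node root_children[] = {
	{ "docs", NULL, docs_children, 2 },
	{ "src", NULL, src_children, 2 },
};
const struct node root = { "", NULL, root_children, 2 };
```

With this sample data, calling grep_tree_sketch(&root, "", "docs/",
"TODO") never descends into "src/", while an empty prefix scans every
file. A real implementation would have to honor full pathspec
semantics rather than a simple prefix comparison.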

Thanks,
-Stolee
