All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>, git@vger.kernel.org
Cc: vdye@github.com
Subject: Re: [PATCH v1 1/2] builtin/grep.c: add --sparse option
Date: Thu, 25 Aug 2022 02:20:27 +0800	[thread overview]
Message-ID: <74ff1a97-fa98-7280-9d84-35dafaf3cb3d@gmail.com> (raw)
In-Reply-To: <80f24382-1188-d450-d1e2-42f68c08e60b@github.com>

Hi reviewrs,

I came back from busying with relocation :)

On 8/17/2022 10:12 PM, Derrick Stolee wrote:
 > On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
 >> Add a --sparse option to `git-grep`. This option is mainly used to:
 >>
 >> If searching in the index (using --cached):
 >>
 >> With --sparse, proceed the action when the current cache_entry is
 >
 > This phrasing is awkward. It might be better to reframe to describe the
 > _why_ before the _what_
 >
 >   When the '--cached' option is used with the 'git grep' command, the
 >   search is limited to the blobs found in the index, not in the worktree.
 >   If the user has enabled sparse-checkout, this might present more 
results
 >   than they would like, since the files outside of the 
sparse-checkout are
 >   unlikely to be important to them.
 >
 >   Change the default behavior of 'git grep' to focus on the files within
 >   the sparse-checkout definition. To enable the previous behavior, add a
 >   '--sparse' option to 'git grep' that triggers the old behavior that
 >   inspects paths outside of the sparse-checkout definition when paired
 >   with the '--cached' option.

Good suggestion!

 > Or something like that. The documentation updates will also help clarify
 > what happens when '--cached' is not included. I assume '--sparse' is
 > ignored, but perhaps it _could_ allow looking at the cached files outside
 > the sparse-checkout definition, this could make the simpler invocation of
 > 'git grep --sparse <pattern>' be the way that users can search after 
their
 > attempt to search the worktree failed.

This simpler version was in my earlier local branch, but later I
decided not to go with it. I found the difference between these two
approaches, is that "--cached --sparse" is more correct in terms of
how Git actually works (because sparsity is a concept in the index);
and "--sparse" is more comfortable for the end user.

I found the former one better here, because it is more self-explanatory,
and thus more info for the user, i.e. "you are now looking at the
index, and Git will also consider files outside of sparse definition."

To be honest, I don't know which one is "better", but I think I'll
keep the current implementation unless something more convincing shows
up later.

 >> marked with SKIP_WORKTREE bit (the default is to skip this kind of
 >> entry). Before this patch, --cached itself can realize this action.
 >> Adding --sparse here grants the user finer control over sparse
 >> entries. If the user only wants to peak into the index without
 >
 > s/peak/peek/
 >
 >> caring about sparse entries, --cached should suffice; if the user
 >> wants to peak into the index _and_ cares about sparse entries,
 >> combining --sparse with --cached can address this need.
 >>
 >> Suggested-by: Victoria Dye <vdye@github.com>
 >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
 >> ---
 >>  builtin/grep.c                  | 10 +++++++++-
 >>  t/t7817-grep-sparse-checkout.sh | 12 ++++++------
 >>  2 files changed, 15 insertions(+), 7 deletions(-)
 >
 > You mentioned in Slack that you missed the documentation of the --sparse
 > option. Just pointing it out here so we don't forget.

Will do.

 >>
 >> diff --git a/builtin/grep.c b/builtin/grep.c
 >> index e6bcdf860c..61402e8084 100644
 >> --- a/builtin/grep.c
 >> +++ b/builtin/grep.c
 >> @@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
 >>
 >>  static int skip_first_line;
 >>
 >> +static int grep_sparse = 0;
 >> +
 >
 > I initially thought it might be good to not define an additional global,
 > but there are many defined in this file outside of the context and they
 > are spread out with extra whitespace like this.
 >
 >>  static void add_work(struct grep_opt *opt, struct grep_source *gs)
 >>  {
 >>      if (opt->binary != GREP_BINARY_TEXT)
 >> @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
 >>      for (nr = 0; nr < repo->index->cache_nr; nr++) {
 >>          const struct cache_entry *ce = repo->index->cache[nr];
 >>
 >> -        if (!cached && ce_skip_worktree(ce))
 >
 > This logic would skip files marked with SKIP_WORKTREE _unless_ --cached
 > was provided.
 >
 >> +        /*
 >> +         * If ce is a SKIP_WORKTREE entry, look into it when both
 >> +         * --sparse and --cached are given.
 >> +         */
 >> +        if (!(grep_sparse && cached) && ce_skip_worktree(ce))
 >>              continue;
 >
 > The logic of this if statement is backwards from the comment because a
 > true statement means "skip the entry" _not_ "look into it".
 >
 >     /*
 >      * Skip entries with SKIP_WORKTREE unless both --sparse and
 >      * --cached are given.
 >      */

Got it.

 > But again, we might want to consider this alternative:
 >
 >     /*
 >      * Skip entries with SKIP_WORKTREE unless --sparse is given.
 >      */
 >     if (!grep_sparse && ce_skip_worktree(ce))
 >         continue;
 >
 > This will require further changes below, specifically this bit:
 >
 >             /*
 >              * If CE_VALID is on, we assume worktree file and its
 >              * cache entry are identical, even if worktree file has
 >              * been modified, so use cache version instead
 >              */
 >             if (cached || (ce->ce_flags & CE_VALID)) {
 >                 if (ce_stage(ce) || ce_intent_to_add(ce))
 >                     continue;
 >                 hit |= grep_oid(opt, &ce->oid, name.buf,
 >                          0, name.buf);
 >             } else {
 >
 > We need to activate this grep_oid() call also when ce_skip_worktree(c) is
 > true. That is, if we want 'git grep --sparse' to extend the search beyond
 > the worktree and into the sparse entries.
 >
 >>
 >>          strbuf_setlen(&name, name_base_len);
 >> @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const 
char *prefix)
 >>                 PARSE_OPT_NOCOMPLETE),
 >>          OPT_INTEGER('m', "max-count", &opt.max_count,
 >>              N_("maximum number of results per file")),
 >> +        OPT_BOOL(0, "sparse", &grep_sparse,
 >> +             N_("search sparse contents and expand sparse index")),
 >
 > This "and expand sparse index" is an internal implementation detail, 
not a
 > heplful item for the help text. Instead, perhaps:
 >
 >     "search the contents of files outside the sparse-checkout definition"

Sounds good!

 > (Also, while the sparse index is being expanded right now, I would expect
 > to not expand the sparse index by the end of the series.)
 >
 >> -test_expect_success 'grep --cached searches entries with the 
SKIP_WORKTREE bit' '
 >> +test_expect_success 'grep --cached and --sparse searches entries 
with the SKIP_WORKTREE bit' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >>      dir/c:text
 >>      EOF
 >> -    git grep --cached "text" >actual &&
 >> +    git grep --cached --sparse "text" >actual &&
 >>      test_cmp expect actual
 >>  '
 >
 > Please add a test that demonstrates the change of behavior when only 
--cached
 > is provided, not --sparse.

Sure!

 > (If you take my suggestion to allow 'git grep --sparse' to do something
 > different, then also add a test for that case.)
 >
 >>
 >> @@ -143,7 +143,7 @@ test_expect_success 'grep --recurse-submodules 
honors sparse checkout in submodu
 >>      test_cmp expect actual
 >>  '
 >>
 >> -test_expect_success 'grep --recurse-submodules --cached searches 
entries with the SKIP_WORKTREE bit' '
 >> +test_expect_success 'grep --recurse-submodules --cached and 
--sparse searches entries with the SKIP_WORKTREE bit' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >> @@ -152,7 +152,7 @@ test_expect_success 'grep --recurse-submodules 
--cached searches entries with th
 >>      sub/B/b:text
 >>      sub2/a:text
 >>      EOF
 >> -    git grep --recurse-submodules --cached "text" >actual &&
 >> +    git grep --recurse-submodules --cached --sparse "text" >actual &&
 >>      test_cmp expect actual
 >>  '
 >> @@ -166,7 +166,7 @@ test_expect_success 'working tree grep does not 
search the index with CE_VALID a
 >>      test_cmp expect actual
 >>  '
 >>
 >> -test_expect_success 'grep --cached searches index entries with both 
CE_VALID and SKIP_WORKTREE' '
 >> +test_expect_success 'grep --cached and --sparse searches index 
entries with both CE_VALID and SKIP_WORKTREE' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >> @@ -174,7 +174,7 @@ test_expect_success 'grep --cached searches 
index entries with both CE_VALID and
 >>      EOF
 >>      test_when_finished "git update-index --no-assume-unchanged b" &&
 >>      git update-index --assume-unchanged b &&
 >> -    git grep --cached text >actual &&
 >> +    git grep --cached --sparse text >actual &&
 >>      test_cmp expect actual
 >>  '
 >
 > Same with these two tests. Add additional commands that show the 
change of
 > behavior when only using '--cached'.

--
Thanks,
Shaoxuan


  parent reply	other threads:[~2022-08-24 18:20 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  7:56 [PATCH v1 0/2] grep: integrate with sparse index Shaoxuan Yuan
2022-08-17  7:56 ` [PATCH v1 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-17 14:12   ` Derrick Stolee
2022-08-17 17:13     ` Junio C Hamano
2022-08-17 17:34       ` Victoria Dye
2022-08-17 17:43         ` Derrick Stolee
2022-08-17 18:47           ` Junio C Hamano
2022-08-17 17:37     ` Elijah Newren
2022-08-24 18:20     ` Shaoxuan Yuan [this message]
2022-08-24 19:08       ` Derrick Stolee
2022-08-17  7:56 ` [PATCH v1 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-17 14:23   ` Derrick Stolee
2022-08-24 21:06     ` Shaoxuan Yuan
2022-08-25  0:39       ` Derrick Stolee
2022-08-17 13:46 ` [PATCH v1 0/2] grep: " Derrick Stolee
2022-08-29 23:28 ` [PATCH v2 " Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-30 13:45     ` Derrick Stolee
2022-09-01  4:57 ` [PATCH v3 0/3] grep: " Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-01 17:03     ` Derrick Stolee
2022-09-01 18:31       ` Shaoxuan Yuan
2022-09-01 17:17     ` Junio C Hamano
2022-09-01 17:27       ` Junio C Hamano
2022-09-01 22:49         ` Shaoxuan Yuan
2022-09-01 22:36       ` Shaoxuan Yuan
2022-09-02  3:28     ` Victoria Dye
2022-09-02 18:47       ` Shaoxuan Yuan
2022-09-03  0:36 ` [PATCH v4 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-03  4:39     ` Junio C Hamano
2022-09-08  0:24       ` Shaoxuan Yuan
2022-09-08  0:18 ` [PATCH v5 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-10  1:07     ` Victoria Dye
2022-09-14  6:08     ` Elijah Newren
2022-09-15  2:57       ` Junio C Hamano
2022-09-18  2:14         ` Elijah Newren
2022-09-18 19:52           ` Victoria Dye
2022-09-19  1:23             ` Junio C Hamano
2022-09-19  4:27             ` Shaoxuan Yuan
2022-09-19 11:03             ` Ævar Arnfjörð Bjarmason
2022-09-20  7:13             ` Elijah Newren
2022-09-17  3:34       ` Shaoxuan Yuan
2022-09-18  4:24         ` Elijah Newren
2022-09-19  4:13           ` Shaoxuan Yuan
2022-09-17  3:45       ` Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-08 17:59     ` Junio C Hamano
2022-09-08 20:46       ` Derrick Stolee
2022-09-08 20:56         ` Junio C Hamano
2022-09-08 21:06           ` Shaoxuan Yuan
2022-09-09 12:49           ` Derrick Stolee
2022-09-13 17:23         ` Junio C Hamano
2022-09-10  2:04     ` Victoria Dye
2022-09-23  4:18 ` [PATCH v6 0/1] grep: integrate with sparse index Shaoxuan Yuan
2022-09-23  4:18   ` [PATCH v6 1/1] builtin/grep.c: " Shaoxuan Yuan
2022-09-23 16:40     ` Junio C Hamano
2022-09-23 16:58     ` Junio C Hamano
2022-09-26 17:28       ` Junio C Hamano
2022-09-23 14:13   ` [PATCH v6 0/1] grep: " Derrick Stolee
2022-09-23 16:01   ` Victoria Dye
2022-09-23 17:08     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=74ff1a97-fa98-7280-9d84-35dafaf3cb3d@gmail.com \
    --to=shaoxuan.yuan02@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.