All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>, git@vger.kernel.org
Cc: vdye@github.com
Subject: Re: [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse
Date: Thu, 1 Sep 2022 13:03:12 -0400	[thread overview]
Message-ID: <e74b326d-ce4a-31c3-5424-e35858cdb569@github.com> (raw)
In-Reply-To: <20220901045736.523371-4-shaoxuan.yuan02@gmail.com>

On 9/1/2022 12:57 AM, Shaoxuan Yuan wrote: 
> Test                                                                          HEAD~   HEAD
> ---------------------------------------------------------------------------------------------------
> 2000.78: git grep --cached --sparse bogus -- f2/f1/f1/builtin/* (full-v3)     0.11    0.09 (≈)
> 2000.79: git grep --cached --sparse bogus -- f2/f1/f1/builtin/* (full-v4)     0.08    0.09 (≈)
> 2000.80: git grep --cached --sparse bogus -- f2/f1/f1/builtin/* (sparse-v3)   0.44    0.04 (-90.9%)
> 2000.81: git grep --cached --sparse bogus -- f2/f1/f1/builtin/* (sparse-v4)   0.46    0.04 (-91.3%)
> 
> - Command used for testing:
> 
> 	git grep --cached --sparse bogus -- f2/f1/f1/builtin/*

It's good to list this command after the table. It allows you to shrink
the table by using "...":

Test                                HEAD~   HEAD
---------------------------------------------------------
2000.78: git grep ... (full-v3)     0.11    0.09 (≈)
2000.79: git grep ... (full-v4)     0.08    0.09 (≈)
2000.80: git grep ... (sparse-v3)   0.44    0.04 (-90.9%)
2000.81: git grep ... (sparse-v4)   0.46    0.04 (-91.3%)

This saves horizontal space without losing clarity. The test numbers help,
too.

>  		strbuf_setlen(&name, name_base_len);
>  		strbuf_addstr(&name, ce->name);
> +		if (S_ISSPARSEDIR(ce->ce_mode)) {
> +			enum object_type type;
> +			struct tree_desc tree;
> +			void *data;
> +			unsigned long size;
> +			struct strbuf base = STRBUF_INIT;
> +
> +			strbuf_addstr(&base, ce->name);
> +
> +			data = read_object_file(&ce->oid, &type, &size);
> +			init_tree_desc(&tree, data, size);
>  
> -		if (S_ISREG(ce->ce_mode) &&
> +			/*
> +			 * sneak in the ce_mode using check_attr parameter
> +			 */
> +			hit |= grep_tree(opt, pathspec, &tree, &base,
> +					 base.len, ce->ce_mode);
> +			strbuf_release(&base);
> +			free(data);
> +		} else if (S_ISREG(ce->ce_mode) &&

I think this is a good setup for transitioning from the index scan
to the tree-walking grep_tree() method. Below, I recommend calling
the method slightly differently, though.

>  		    match_pathspec(repo->index, pathspec, name.buf, name.len, 0, NULL,
>  				   S_ISDIR(ce->ce_mode) ||
>  				   S_ISGITLINK(ce->ce_mode))) {
> @@ -598,7 +613,14 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
>  		int te_len = tree_entry_len(&entry);
>  
>  		if (match != all_entries_interesting) {
> -			strbuf_addstr(&name, base->buf + tn_len);
> +			if (S_ISSPARSEDIR(check_attr)) {
> +				// object is a sparse directory entry
> +				strbuf_addbuf(&name, base);
> +			} else {
> +				// object is a commit or a root tree
> +				strbuf_addstr(&name, base->buf + tn_len);
> +			}
> +

I think this is abusing the check_attr too much, since this will also
trigger a different if branch further down the method.

These lines are the same if tn_len is zero, so will it suffice to pass
0 for that length? You are passing base.len when you call it, so maybe
that should be zero?

When I apply this change, all tests pass, so if there _is_ something
different between the two implementations, then it isn't covered by
tests:

diff --git a/builtin/grep.c b/builtin/grep.c
index 8c0edccd8e..fc4adf876a 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -549,8 +549,7 @@ static int grep_cache(struct grep_opt *opt,
 			/*
 			 * sneak in the ce_mode using check_attr parameter
 			 */
-			hit |= grep_tree(opt, pathspec, &tree, &base,
-					 base.len, ce->ce_mode);
+			hit |= grep_tree(opt, pathspec, &tree, &base, 0, 0);
 			strbuf_release(&base);
 			free(data);
 		} else if (S_ISREG(ce->ce_mode) &&
@@ -613,13 +612,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			if (S_ISSPARSEDIR(check_attr)) {
-				// object is a sparse directory entry
-				strbuf_addbuf(&name, base);
-			} else {
-				// object is a commit or a root tree
-				strbuf_addstr(&name, base->buf + tn_len);
-			}
+			strbuf_addstr(&name, base->buf + tn_len);
 
 			match = tree_entry_interesting(repo->index,
 						       &entry, &name,

> +test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/builtin/*"

We can't use this path in general, because we don't always run the test
using the Git repository as the test repo (see GIT_PERF_[LARGE_]REPO
variables in t/perf/README).

We _can_ however use the structure that we have implied in our construction,
which is to use a path that we know exists and is still outside of the
sparse-checkout cone. Truncating to "f2/f1/f1/*" is sufficient for this.

Modifying the test and running them on my machine, I get:

Test                               HEAD~1            HEAD
----------------------------------------------------------------------------
2000.78: git grep ... (full-v3)    0.19(0.72+0.18)   0.18(0.84+0.13) -5.3%  
2000.79: git grep ... (full-v4)    0.17(0.83+0.16)   0.19(0.84+0.14) +11.8% 
2000.80: git grep ... (sparse-v3)  0.35(1.02+0.13)   0.15(0.85+0.15) -57.1% 
2000.81: git grep ... (sparse-v4)  0.37(1.06+0.12)   0.15(0.89+0.15) -59.5%

So, it's still expensive to do the blob search over a wider pathspec than
the test as you designed it, but this will work for other repo, such as the
Linux kernel:

Test                                HEAD~1             HEAD
------------------------------------------------------------------------------
2000.78: git grep ... (full-v3)     3.16(19.37+2.55)   2.56(15.24+1.76) -19.0%
2000.79: git grep ... (full-v4)     2.97(17.84+2.00)   2.59(15.51+1.89) -12.8%
2000.80: git grep ... (sparse-v3)   8.39(24.74+2.34)   2.13(16.03+1.72) -74.6%
2000.81: git grep ... (sparse-v4)   8.39(24.73+2.40)   2.16(16.14+1.90) -74.3%

Thanks,
-Stolee

  reply	other threads:[~2022-09-01 17:03 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  7:56 [PATCH v1 0/2] grep: integrate with sparse index Shaoxuan Yuan
2022-08-17  7:56 ` [PATCH v1 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-17 14:12   ` Derrick Stolee
2022-08-17 17:13     ` Junio C Hamano
2022-08-17 17:34       ` Victoria Dye
2022-08-17 17:43         ` Derrick Stolee
2022-08-17 18:47           ` Junio C Hamano
2022-08-17 17:37     ` Elijah Newren
2022-08-24 18:20     ` Shaoxuan Yuan
2022-08-24 19:08       ` Derrick Stolee
2022-08-17  7:56 ` [PATCH v1 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-17 14:23   ` Derrick Stolee
2022-08-24 21:06     ` Shaoxuan Yuan
2022-08-25  0:39       ` Derrick Stolee
2022-08-17 13:46 ` [PATCH v1 0/2] grep: " Derrick Stolee
2022-08-29 23:28 ` [PATCH v2 " Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-30 13:45     ` Derrick Stolee
2022-09-01  4:57 ` [PATCH v3 0/3] grep: " Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-01 17:03     ` Derrick Stolee [this message]
2022-09-01 18:31       ` Shaoxuan Yuan
2022-09-01 17:17     ` Junio C Hamano
2022-09-01 17:27       ` Junio C Hamano
2022-09-01 22:49         ` Shaoxuan Yuan
2022-09-01 22:36       ` Shaoxuan Yuan
2022-09-02  3:28     ` Victoria Dye
2022-09-02 18:47       ` Shaoxuan Yuan
2022-09-03  0:36 ` [PATCH v4 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-03  4:39     ` Junio C Hamano
2022-09-08  0:24       ` Shaoxuan Yuan
2022-09-08  0:18 ` [PATCH v5 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-10  1:07     ` Victoria Dye
2022-09-14  6:08     ` Elijah Newren
2022-09-15  2:57       ` Junio C Hamano
2022-09-18  2:14         ` Elijah Newren
2022-09-18 19:52           ` Victoria Dye
2022-09-19  1:23             ` Junio C Hamano
2022-09-19  4:27             ` Shaoxuan Yuan
2022-09-19 11:03             ` Ævar Arnfjörð Bjarmason
2022-09-20  7:13             ` Elijah Newren
2022-09-17  3:34       ` Shaoxuan Yuan
2022-09-18  4:24         ` Elijah Newren
2022-09-19  4:13           ` Shaoxuan Yuan
2022-09-17  3:45       ` Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-08 17:59     ` Junio C Hamano
2022-09-08 20:46       ` Derrick Stolee
2022-09-08 20:56         ` Junio C Hamano
2022-09-08 21:06           ` Shaoxuan Yuan
2022-09-09 12:49           ` Derrick Stolee
2022-09-13 17:23         ` Junio C Hamano
2022-09-10  2:04     ` Victoria Dye
2022-09-23  4:18 ` [PATCH v6 0/1] grep: integrate with sparse index Shaoxuan Yuan
2022-09-23  4:18   ` [PATCH v6 1/1] builtin/grep.c: " Shaoxuan Yuan
2022-09-23 16:40     ` Junio C Hamano
2022-09-23 16:58     ` Junio C Hamano
2022-09-26 17:28       ` Junio C Hamano
2022-09-23 14:13   ` [PATCH v6 0/1] grep: " Derrick Stolee
2022-09-23 16:01   ` Victoria Dye
2022-09-23 17:08     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e74b326d-ce4a-31c3-5424-e35858cdb569@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.