public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: amisha <amishhhaaaa@gmail.com>, git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Elijah Newren <newren@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH v5 1/2] sparse-checkout: optimize string_list construction
Date: Mon, 19 Jan 2026 12:04:45 -0500	[thread overview]
Message-ID: <36b50d7d-b9f4-4ff3-b00e-9c98ad690749@gmail.com> (raw)
In-Reply-To: <20260119123339.48435-1-amishhhaaaa@gmail.com>

On 1/19/2026 7:33 AM, amisha wrote:
> From: Amisha Chhajed <amishhhaaaa@gmail.com>
> 
> Improve O(n^2) complexity to O(n log n) while building a sorted
> 'string_list' by constructing it unsorted then sorting it
> followed by removing duplicates.
> 
> Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
> ---
>  builtin/sparse-checkout.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 15d51e60a8..7dfb276bf0 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
>  
>  		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
>  			/* pe->pattern starts with "/", skip it */
> -			string_list_insert(&sl, pe->pattern + 1);
> +			string_list_append(&sl, pe->pattern + 1);
>  		}
>  
>  		string_list_sort(&sl);
> +		string_list_remove_duplicates(&sl, 0);

Shouldn't this line be added in the other uses of string_list_append()?

>  
>  		for (i = 0; i < sl.nr; i++) {
>  			quote_c_style(sl.items[i].string, NULL, stdout, 0);
> @@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>  		if (!hashmap_contains_parent(&pl->recursive_hashmap,
>  					     pe->pattern,
>  					     &parent_pattern))
> -			string_list_insert(&sl, pe->pattern);
> +			string_list_append(&sl, pe->pattern);
>  	}
> 
>  	string_list_sort(&sl);
Actually, there is a string_list_remove_duplicates() just
outside of the context of this diff. 

> @@ -311,7 +312,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>  		if (!hashmap_contains_parent(&pl->recursive_hashmap,
>  					     pe->pattern,
>  					     &parent_pattern))
> -			string_list_insert(&sl, pe->pattern);
> +			string_list_append(&sl, pe->pattern);
>  	}
>  
>  	strbuf_release(&parent_pattern);

Same here.

This diff looks good, but I do believe it would be good to include your
duplicate test here instead of in a second patch.

Also, the way your second patch appeared as a trailer of your first patch,
so it didn't appear properly as a thread in my email client. Here it is
for review:

> From b20a99f0773bab063a31eea6fead730e18200ca7 Mon Sep 17 00:00:00 2001
> From: Amisha Chhajed <amishhhaaaa@gmail.com>
> Date: Mon, 19 Jan 2026 00:20:47 +0530
> Subject: [PATCH v5 2/2] t1091: Add tests for deduplication of cone-mode sparse
>  patterns

nit: this title is a little long and has incorrect capitalization. Take
note for later, since I expect this diff to be squashed into the previous
patch.

> +test_expect_success 'sparse-checkout deduplicates repeated cone patterns' '
> +    rm -f repo/.git/info/sparse-checkout &&
> +    git -C repo sparse-checkout init --cone &&
> +    git -C repo sparse-checkout add --stdin <<-\EOF &&
> +	/foo/
> +	/bar/
> +	/foo/
> +	EOF

This slashes are redundant for cone-mode patterns. I recommend a more
interesting case, such as

	foo/bar/baz
	a/b/c
	foo/bar/baz
	a/b

This should remove the duplicates foo/bar/baz during the run, but also it
should notice that a/b/c is contained within the recursive set defined by
a/b.

The resulting sparse-checkout file should have lines such as

	/*
	!/*/
	/a/
	!/a/*/
	/a/b
	/foo
	!/foo/*/
	/foo/bar
	!/foo/bar/*/
	/foo/bar/baz

The order might be different, but this is what I recall from how nested
directories work in cone mode.

> +    cat >expect <<-\EOF &&
> +	/*
> +	!/*/
> +	/bar/
> +	/foo/
> +	EOF
> +    test_cmp expect repo/.git/info/sparse-checkout
> +'

> +test_expect_success 'sparse-checkout list deduplicates repeated cone patterns' '
> +    rm -f repo/.git/info/sparse-checkout &&
> +    git -C repo sparse-checkout init --cone &&
> +    git -C repo sparse-checkout add --stdin <<-\EOF &&
> +	/foo/
> +	/bar/
> +	/foo/
> +	EOF
> +    git -C repo sparse-checkout list >actual &&
> +    cat >expect <<-\EOF &&
> +	bar
> +	foo
> +	EOF
> +    test_cmp expect actual
> +'

This does lead to an interesting case where the 'list' command was only
interacting with the sparse-checkout file, which wouldn't have duplicates
if it was modified by the user in cone mode.

Keep in mind that you're not actually testing the 'list' command, because
the 'add' command already deduplicated. You'll need to modify the
sparse-checkout file itself to get an interesting test of the 'list'
command.

When not in cone mode, we should not be removing duplicates because the
order of the patterns matters and we should not be reordering them. I'm
not sure if that's relevant but it's something to keep in mind while you're
testing, since the command will revert to non-cone mode if the
sparse-checkout file doesn't match the cone mode pattern expectations.

Thanks,
-Stolee

  reply	other threads:[~2026-01-19 17:04 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-14 19:28 [PATCH] sparse-checkout: optimize string_list construction amisha
2026-01-14 21:35 ` Jeff King
2026-01-18  2:39   ` Derrick Stolee
2026-01-15 12:56 ` [PATCH v2] " amisha
2026-01-15 13:09 ` [PATCH v3] " amisha
2026-01-15 13:15   ` Amisha Chhajed
2026-01-15 20:09     ` Jeff King
2026-01-16 17:03       ` Amisha Chhajed
2026-01-15 22:26     ` René Scharfe
2026-01-16  8:30       ` Amisha Chhajed
2026-01-16 17:17         ` Junio C Hamano
2026-01-18  2:46           ` Derrick Stolee
2026-01-18 13:09             ` Amisha Chhajed
2026-01-15 13:55   ` Junio C Hamano
2026-01-16 16:50 ` [PATCH] " amisha
2026-01-16 19:11   ` Junio C Hamano
2026-01-18 13:07     ` Amisha Chhajed
2026-01-19  5:32     ` Jeff King
2026-01-19 19:06       ` Junio C Hamano
2026-01-19 12:33 ` [PATCH v5 1/2] " amisha
2026-01-19 17:04   ` Derrick Stolee [this message]
2026-01-19 18:33     ` Pushkar Singh
2026-01-20 15:47     ` Amisha Chhajed
2026-01-20 15:38 ` [PATCH v6] sparse-checkout: optimize string_list construction and add tests to verify deduplication amisha
2026-01-20 20:37   ` Derrick Stolee
2026-01-21 13:00   ` [PATCH v7] " Amisha Chhajed
2026-01-21 16:28     ` Derrick Stolee
2026-01-21 16:51       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36b50d7d-b9f4-4ff3-b00e-9c98ad690749@gmail.com \
    --to=stolee@gmail.com \
    --cc=amishhhaaaa@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox