public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sparse-checkout: optimize string_list construction
@ 2026-01-14 19:28 amisha
  2026-01-14 21:35 ` Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: amisha @ 2026-01-14 19:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee, Elijah Newren, Jeff King, amisha

Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list' by constructing it unsorted and sorting it afterwards.

Signed-off-by: amisha <amishhhaaaa@gmail.com>
---
Note for reviewers:
I identified this as a strong candidate for optimization because we are 
pulling entries from a hashmap. Since hashmaps inherently guarantee 
uniqueness of keys, using string_list_append() is safe here.

 builtin/sparse-checkout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 15d51e60a8..0a44808ed2 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -91,7 +91,7 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
 
 		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
 			/* pe->pattern starts with "/", skip it */
-			string_list_insert(&sl, pe->pattern + 1);
+			string_list_append(&sl, pe->pattern + 1);
 		}
 
 		string_list_sort(&sl);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] sparse-checkout: optimize string_list construction
  2026-01-14 19:28 [PATCH] sparse-checkout: optimize string_list construction amisha
@ 2026-01-14 21:35 ` Jeff King
  2026-01-18  2:39   ` Derrick Stolee
  2026-01-15 12:56 ` [PATCH v2] " amisha
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2026-01-14 21:35 UTC (permalink / raw)
  To: amisha; +Cc: git, Junio C Hamano, Derrick Stolee, Elijah Newren

On Thu, Jan 15, 2026 at 12:58:03AM +0530, amisha wrote:

> Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list' by constructing it unsorted and sorting it afterwards.
> 
> Signed-off-by: amisha <amishhhaaaa@gmail.com>

Thanks, I think the patch is an obvious improvement. In general, please
wrap your lines to something more reasonable (usually 70 or so is
common). And make sure your sign-off identity matches the DCO section of
Documentation/SubmittingPatches, in particular this part:

  Please use a known identity in the `Signed-off-by` trailer, since we cannot
  accept anonymous contributions. It is common, but not required, to use some form
  of your real name. We realize that some contributors are not comfortable doing
  so or prefer to contribute under a pseudonym or preferred name and we can accept
  your patch either way, as long as the name and email you use are distinctive,
  identifying, and not misleading.
  
  The goal of this policy is to allow us to have sufficient information to contact
  you if questions arise about your contribution.

I think what you have is probably sufficient, but if you are not opposed
to giving more identity information, we do usually prefer more full
names.

> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 15d51e60a8..0a44808ed2 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -91,7 +91,7 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
>  
>  		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
>  			/* pe->pattern starts with "/", skip it */
> -			string_list_insert(&sl, pe->pattern + 1);
> +			string_list_append(&sl, pe->pattern + 1);
>  		}
>  
>  		string_list_sort(&sl);

Since we already sort here, I was quite curious how this came about.  It
looks like the _insert() call and the _sort() were both added together
in de11951b03 (sparse-checkout: list directories in cone mode,
2019-12-30).

I'd guess it was just a typo/brain-o to mix up append and insert.

Doesn't the same issue exist in write_cone_to_file(), too (in two
separate spots)?

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2] sparse-checkout: optimize string_list construction
  2026-01-14 19:28 [PATCH] sparse-checkout: optimize string_list construction amisha
  2026-01-14 21:35 ` Jeff King
@ 2026-01-15 12:56 ` amisha
  2026-01-15 13:09 ` [PATCH v3] " amisha
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: amisha @ 2026-01-15 12:56 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, derrickstolee, amisha

Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list' by constructing it unsorted and sorting it afterwards.

Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
---
 builtin/sparse-checkout.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 15d51e60a8..edabe7cbd9 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -91,7 +91,7 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
 
 		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
 			/* pe->pattern starts with "/", skip it */
-			string_list_insert(&sl, pe->pattern + 1);
+			string_list_append(&sl, pe->pattern + 1);
 		}
 
 		string_list_sort(&sl);
@@ -289,11 +289,10 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	string_list_sort(&sl);
-	string_list_remove_duplicates(&sl, 0);
 
 	fprintf(fp, "/*\n!/*/\n");
 
@@ -311,13 +310,12 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	strbuf_release(&parent_pattern);
 
 	string_list_sort(&sl);
-	string_list_remove_duplicates(&sl, 0);
 
 	for (i = 0; i < sl.nr; i++) {
 		char *pattern = escaped_pattern(sl.items[i].string);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-14 19:28 [PATCH] sparse-checkout: optimize string_list construction amisha
  2026-01-14 21:35 ` Jeff King
  2026-01-15 12:56 ` [PATCH v2] " amisha
@ 2026-01-15 13:09 ` amisha
  2026-01-15 13:15   ` Amisha Chhajed
  2026-01-15 13:55   ` Junio C Hamano
  2026-01-16 16:50 ` [PATCH] " amisha
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: amisha @ 2026-01-15 13:09 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, newren, peff, amishhhaaaa

Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list'
by constructing it unsorted and sorting it afterwards.

Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
---
 builtin/sparse-checkout.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 15d51e60a8..edabe7cbd9 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -91,7 +91,7 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
 
 		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
 			/* pe->pattern starts with "/", skip it */
-			string_list_insert(&sl, pe->pattern + 1);
+			string_list_append(&sl, pe->pattern + 1);
 		}
 
 		string_list_sort(&sl);
@@ -289,11 +289,10 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	string_list_sort(&sl);
-	string_list_remove_duplicates(&sl, 0);
 
 	fprintf(fp, "/*\n!/*/\n");
 
@@ -311,13 +310,12 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	strbuf_release(&parent_pattern);
 
 	string_list_sort(&sl);
-	string_list_remove_duplicates(&sl, 0);
 
 	for (i = 0; i < sl.nr; i++) {
 		char *pattern = escaped_pattern(sl.items[i].string);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-15 13:09 ` [PATCH v3] " amisha
@ 2026-01-15 13:15   ` Amisha Chhajed
  2026-01-15 20:09     ` Jeff King
  2026-01-15 22:26     ` René Scharfe
  2026-01-15 13:55   ` Junio C Hamano
  1 sibling, 2 replies; 28+ messages in thread
From: Amisha Chhajed @ 2026-01-15 13:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, newren, peff

Made the changes for other 2 places as well!

I was also very curious about the presence of
string_list_remove_duplicates in the original code, from my
understanding string_list_insert already removed duplicates and
string_list_remove_duplicates was still present with it.


On Thu, 15 Jan 2026 at 18:39, amisha <amishhhaaaa@gmail.com> wrote:
>
> Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list'
> by constructing it unsorted and sorting it afterwards.
>
> Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
> ---
>  builtin/sparse-checkout.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 15d51e60a8..edabe7cbd9 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -91,7 +91,7 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
>
>                 hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
>                         /* pe->pattern starts with "/", skip it */
> -                       string_list_insert(&sl, pe->pattern + 1);
> +                       string_list_append(&sl, pe->pattern + 1);
>                 }
>
>                 string_list_sort(&sl);
> @@ -289,11 +289,10 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>                 if (!hashmap_contains_parent(&pl->recursive_hashmap,
>                                              pe->pattern,
>                                              &parent_pattern))
> -                       string_list_insert(&sl, pe->pattern);
> +                       string_list_append(&sl, pe->pattern);
>         }
>
>         string_list_sort(&sl);
> -       string_list_remove_duplicates(&sl, 0);
>
>         fprintf(fp, "/*\n!/*/\n");
>
> @@ -311,13 +310,12 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>                 if (!hashmap_contains_parent(&pl->recursive_hashmap,
>                                              pe->pattern,
>                                              &parent_pattern))
> -                       string_list_insert(&sl, pe->pattern);
> +                       string_list_append(&sl, pe->pattern);
>         }
>
>         strbuf_release(&parent_pattern);
>
>         string_list_sort(&sl);
> -       string_list_remove_duplicates(&sl, 0);
>
>         for (i = 0; i < sl.nr; i++) {
>                 char *pattern = escaped_pattern(sl.items[i].string);
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-15 13:09 ` [PATCH v3] " amisha
  2026-01-15 13:15   ` Amisha Chhajed
@ 2026-01-15 13:55   ` Junio C Hamano
  1 sibling, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2026-01-15 13:55 UTC (permalink / raw)
  To: amisha; +Cc: git, stolee, newren, peff

amisha <amishhhaaaa@gmail.com> writes:

> Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list'
> by constructing it unsorted and sorting it afterwards.
>
> Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
> ---

Because your e-mail client claims that the messages is from "amisha
<amishhhaaaa@gmail.com>" in its "From:" header line, you'd need to
insert an extra "in-body header" line, which is separate by a blank
line from the rest of the message body, to override it as the first
line in the message, making the body of the message begin like this.

    From: Amisha Chhajed <amishhhaaaa@gmail.com>

    Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list'
    by constructing it unsorted and sorting it afterwards.

    Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>

I've tweaked the message I retrieved from the mailing list before
applying, so no need to resend this message, but in your future
contributions please keep this in mind.

Thanks.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-15 13:15   ` Amisha Chhajed
@ 2026-01-15 20:09     ` Jeff King
  2026-01-16 17:03       ` Amisha Chhajed
  2026-01-15 22:26     ` René Scharfe
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff King @ 2026-01-15 20:09 UTC (permalink / raw)
  To: Amisha Chhajed; +Cc: git, gitster, stolee, newren

On Thu, Jan 15, 2026 at 06:45:35PM +0530, Amisha Chhajed wrote:

> I was also very curious about the presence of
> string_list_remove_duplicates in the original code, from my
> understanding string_list_insert already removed duplicates and
> string_list_remove_duplicates was still present with it.

Yes, I don't think you could have duplicates when inserting with
string_list_insert(). Of course your patch removes that, which means
we're falling back on the notion that the hashmap cannot have
duplicates, either.

I think our hashmap _does_ allow duplicate entries, though. The
insertion code in insert_recursive_pattern() avoids duplicates in
parent_hashmap, but adds its arguments directly to recursive_hashmap.

So I think you could get duplicates with something like:

  git init
  git sparse-checkout set --cone
  git sparse-checkout add --stdin <<\EOF
  foo
  bar
  foo
  EOF

Before your patch, that produces this .git/info/sparse-checkout file:

  /*
  !/*/
  /bar/
  /foo/

and after we get:

  /*
  !/*/
  /bar/
  /foo/
  /foo/

So I think we do want to retain the duplicate suppression. Switching
from insert() to append() is still good, as long as we keep the
remove_duplicates() lines.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-15 13:15   ` Amisha Chhajed
  2026-01-15 20:09     ` Jeff King
@ 2026-01-15 22:26     ` René Scharfe
  2026-01-16  8:30       ` Amisha Chhajed
  1 sibling, 1 reply; 28+ messages in thread
From: René Scharfe @ 2026-01-15 22:26 UTC (permalink / raw)
  To: Amisha Chhajed, git; +Cc: gitster, stolee, newren, peff

On 1/15/26 2:15 PM, Amisha Chhajed wrote:
> Made the changes for other 2 places as well!
> 
> I was also very curious about the presence of
> string_list_remove_duplicates in the original code, from my
> understanding string_list_insert already removed duplicates and
> string_list_remove_duplicates was still present with it.

So the string_list_remove_duplicates() calls were unnecessary with
string_list_insert(), but why is it safe to remove them now that you use
string_list_append() instead, which doesn't check for duplicates?

> 
> On Thu, 15 Jan 2026 at 18:39, amisha <amishhhaaaa@gmail.com> wrote:
>>
>> Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list'
>> by constructing it unsorted and sorting it afterwards.
>>
>> Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
>> ---
>>  builtin/sparse-checkout.c | 8 +++-----
>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
>> index 15d51e60a8..edabe7cbd9 100644
>> --- a/builtin/sparse-checkout.c
>> +++ b/builtin/sparse-checkout.c
>> @@ -91,7 +91,7 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
>>
>>                 hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
>>                         /* pe->pattern starts with "/", skip it */
>> -                       string_list_insert(&sl, pe->pattern + 1);
>> +                       string_list_append(&sl, pe->pattern + 1);
>>                 }
>>
>>                 string_list_sort(&sl);
>> @@ -289,11 +289,10 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>>                 if (!hashmap_contains_parent(&pl->recursive_hashmap,
>>                                              pe->pattern,
>>                                              &parent_pattern))
>> -                       string_list_insert(&sl, pe->pattern);
>> +                       string_list_append(&sl, pe->pattern);
>>         }
>>
>>         string_list_sort(&sl);
>> -       string_list_remove_duplicates(&sl, 0);
>>
>>         fprintf(fp, "/*\n!/*/\n");
>>
>> @@ -311,13 +310,12 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>>                 if (!hashmap_contains_parent(&pl->recursive_hashmap,
>>                                              pe->pattern,
>>                                              &parent_pattern))
>> -                       string_list_insert(&sl, pe->pattern);
>> +                       string_list_append(&sl, pe->pattern);
>>         }
>>
>>         strbuf_release(&parent_pattern);
>>
>>         string_list_sort(&sl);
>> -       string_list_remove_duplicates(&sl, 0);
>>
>>         for (i = 0; i < sl.nr; i++) {
>>                 char *pattern = escaped_pattern(sl.items[i].string);
>> --
>> 2.51.0
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-15 22:26     ` René Scharfe
@ 2026-01-16  8:30       ` Amisha Chhajed
  2026-01-16 17:17         ` Junio C Hamano
  0 siblings, 1 reply; 28+ messages in thread
From: Amisha Chhajed @ 2026-01-16  8:30 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, gitster, stolee, newren, peff

It was assumed to be safe under the notion that our entries are not
duplicate but as already pointed out, our entries are not unique so we
need one of those two ways either insert or remove_duplicates, this
can be a trivial question but i wonder how are the tests passing by
removing these lines, i was actually researching about it.

On Fri, 16 Jan 2026 at 03:56, René Scharfe <l.s.r@web.de> wrote:
>
> On 1/15/26 2:15 PM, Amisha Chhajed wrote:
> > Made the changes for other 2 places as well!
> >
> > I was also very curious about the presence of
> > string_list_remove_duplicates in the original code, from my
> > understanding string_list_insert already removed duplicates and
> > string_list_remove_duplicates was still present with it.
>
> So the string_list_remove_duplicates() calls were unnecessary with
> string_list_insert(), but why is it safe to remove them now that you use
> string_list_append() instead, which doesn't check for duplicates?
>
> >
> > On Thu, 15 Jan 2026 at 18:39, amisha <amishhhaaaa@gmail.com> wrote:
> >>
> >> Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list'
> >> by constructing it unsorted and sorting it afterwards.
> >>
> >> Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
> >> ---
> >>  builtin/sparse-checkout.c | 8 +++-----
> >>  1 file changed, 3 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> >> index 15d51e60a8..edabe7cbd9 100644
> >> --- a/builtin/sparse-checkout.c
> >> +++ b/builtin/sparse-checkout.c
> >> @@ -91,7 +91,7 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
> >>
> >>                 hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
> >>                         /* pe->pattern starts with "/", skip it */
> >> -                       string_list_insert(&sl, pe->pattern + 1);
> >> +                       string_list_append(&sl, pe->pattern + 1);
> >>                 }
> >>
> >>                 string_list_sort(&sl);
> >> @@ -289,11 +289,10 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
> >>                 if (!hashmap_contains_parent(&pl->recursive_hashmap,
> >>                                              pe->pattern,
> >>                                              &parent_pattern))
> >> -                       string_list_insert(&sl, pe->pattern);
> >> +                       string_list_append(&sl, pe->pattern);
> >>         }
> >>
> >>         string_list_sort(&sl);
> >> -       string_list_remove_duplicates(&sl, 0);
> >>
> >>         fprintf(fp, "/*\n!/*/\n");
> >>
> >> @@ -311,13 +310,12 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
> >>                 if (!hashmap_contains_parent(&pl->recursive_hashmap,
> >>                                              pe->pattern,
> >>                                              &parent_pattern))
> >> -                       string_list_insert(&sl, pe->pattern);
> >> +                       string_list_append(&sl, pe->pattern);
> >>         }
> >>
> >>         strbuf_release(&parent_pattern);
> >>
> >>         string_list_sort(&sl);
> >> -       string_list_remove_duplicates(&sl, 0);
> >>
> >>         for (i = 0; i < sl.nr; i++) {
> >>                 char *pattern = escaped_pattern(sl.items[i].string);
> >> --
> >> 2.51.0
> >>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] sparse-checkout: optimize string_list construction
  2026-01-14 19:28 [PATCH] sparse-checkout: optimize string_list construction amisha
                   ` (2 preceding siblings ...)
  2026-01-15 13:09 ` [PATCH v3] " amisha
@ 2026-01-16 16:50 ` amisha
  2026-01-16 19:11   ` Junio C Hamano
  2026-01-19 12:33 ` [PATCH v5 1/2] " amisha
  2026-01-20 15:38 ` [PATCH v6] sparse-checkout: optimize string_list construction and add tests to verify deduplication amisha
  5 siblings, 1 reply; 28+ messages in thread
From: amisha @ 2026-01-16 16:50 UTC (permalink / raw)
  To: amishhhaaaa; +Cc: git, gitster, stolee, newren, peff

From: Amisha Chhajed <amishhhaaaa@gmail.com>

Improve O(n^2) complexity to O(n log n) while building a sorted
'string_list' by constructing it unsorted then sorting it
followed by removing duplicates.

Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
---
 builtin/sparse-checkout.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 15d51e60a8..7dfb276bf0 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
 
 		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
 			/* pe->pattern starts with "/", skip it */
-			string_list_insert(&sl, pe->pattern + 1);
+			string_list_append(&sl, pe->pattern + 1);
 		}
 
 		string_list_sort(&sl);
+		string_list_remove_duplicates(&sl, 0);
 
 		for (i = 0; i < sl.nr; i++) {
 			quote_c_style(sl.items[i].string, NULL, stdout, 0);
@@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	string_list_sort(&sl);
@@ -311,7 +312,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	strbuf_release(&parent_pattern);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-15 20:09     ` Jeff King
@ 2026-01-16 17:03       ` Amisha Chhajed
  0 siblings, 0 replies; 28+ messages in thread
From: Amisha Chhajed @ 2026-01-16 17:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, stolee, newren

I was able to reproduce this, are we open to a patch adding a test
that checks if duplicate entries are present in stdin the result
should not have it? because the tests were passing even after removing
all duplicates checks, and non duplicates enforcement is a part of the
method's behaviour, if I am understanding correctly.

On Fri, 16 Jan 2026 at 01:39, Jeff King <peff@peff.net> wrote:
>
> On Thu, Jan 15, 2026 at 06:45:35PM +0530, Amisha Chhajed wrote:
>
> > I was also very curious about the presence of
> > string_list_remove_duplicates in the original code, from my
> > understanding string_list_insert already removed duplicates and
> > string_list_remove_duplicates was still present with it.
>
> Yes, I don't think you could have duplicates when inserting with
> string_list_insert(). Of course your patch removes that, which means
> we're falling back on the notion that the hashmap cannot have
> duplicates, either.
>
> I think our hashmap _does_ allow duplicate entries, though. The
> insertion code in insert_recursive_pattern() avoids duplicates in
> parent_hashmap, but adds its arguments directly to recursive_hashmap.
>
> So I think you could get duplicates with something like:
>
>   git init
>   git sparse-checkout set --cone
>   git sparse-checkout add --stdin <<\EOF
>   foo
>   bar
>   foo
>   EOF
>
> Before your patch, that produces this .git/info/sparse-checkout file:
>
>   /*
>   !/*/
>   /bar/
>   /foo/
>
> and after we get:
>
>   /*
>   !/*/
>   /bar/
>   /foo/
>   /foo/
>
> So I think we do want to retain the duplicate suppression. Switching
> from insert() to append() is still good, as long as we keep the
> remove_duplicates() lines.
>
> -Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-16  8:30       ` Amisha Chhajed
@ 2026-01-16 17:17         ` Junio C Hamano
  2026-01-18  2:46           ` Derrick Stolee
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2026-01-16 17:17 UTC (permalink / raw)
  To: Amisha Chhajed; +Cc: René Scharfe, git, stolee, newren, peff

Amisha Chhajed <amishhhaaaa@gmail.com> writes:

> It was assumed to be safe under the notion that our entries are not
> duplicate but as already pointed out, our entries are not unique so we
> need one of those two ways either insert or remove_duplicates, this
> can be a trivial question but i wonder how are the tests passing by
> removing these lines, i was actually researching about it.

... suspense.  And the result of the research was???

If the answer was simply "we lack test coverage", it may make sense
to add a test taken from Peff's earlier response to increase test
coverage, perhaps?

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] sparse-checkout: optimize string_list construction
  2026-01-16 16:50 ` [PATCH] " amisha
@ 2026-01-16 19:11   ` Junio C Hamano
  2026-01-18 13:07     ` Amisha Chhajed
  2026-01-19  5:32     ` Jeff King
  0 siblings, 2 replies; 28+ messages in thread
From: Junio C Hamano @ 2026-01-16 19:11 UTC (permalink / raw)
  To: amisha; +Cc: git, stolee, newren, peff

amisha <amishhhaaaa@gmail.com> writes:

> Subject: Re: [PATCH] sparse-checkout: optimize string_list construction

It would have been nice to see [PATCH v2] or whatever that signals
that there is an earlier iteration.

> From: Amisha Chhajed <amishhhaaaa@gmail.com>
>
> Improve O(n^2) complexity to O(n log n) while building a sorted
> 'string_list' by constructing it unsorted then sorting it
> followed by removing duplicates.

By the way, do we have t/perf/ that substanticates the performance
claim here (in other words, how much improvement are we expecting in
practice)?

Also, have you found out why the previous round that did not remove
duplicates saw no failed tests?  Perhaps it is a good idea to add
some test that would notice if we failed to add calls to
remove_duplicates in this patch?

This is an unrelated tangent, a possible #leftoverbits material, but
should not be part of this patch (or even in the same series as this
patch).  I notice that string_list_remove_duplicates() almost always
immediately follow a call to string_list_sort() of the same
instance, which makes me wonder if we would be better off if we had
a variant of string_list_sort(), and call it string_list_sort_u().

Thanks.

> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 15d51e60a8..7dfb276bf0 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
>  
>  		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
>  			/* pe->pattern starts with "/", skip it */
> -			string_list_insert(&sl, pe->pattern + 1);
> +			string_list_append(&sl, pe->pattern + 1);
>  		}
>  
>  		string_list_sort(&sl);
> +		string_list_remove_duplicates(&sl, 0);
>  
>  		for (i = 0; i < sl.nr; i++) {
>  			quote_c_style(sl.items[i].string, NULL, stdout, 0);
> @@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>  		if (!hashmap_contains_parent(&pl->recursive_hashmap,
>  					     pe->pattern,
>  					     &parent_pattern))
> -			string_list_insert(&sl, pe->pattern);
> +			string_list_append(&sl, pe->pattern);
>  	}
>  
>  	string_list_sort(&sl);
> @@ -311,7 +312,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>  		if (!hashmap_contains_parent(&pl->recursive_hashmap,
>  					     pe->pattern,
>  					     &parent_pattern))
> -			string_list_insert(&sl, pe->pattern);
> +			string_list_append(&sl, pe->pattern);
>  	}
>  
>  	strbuf_release(&parent_pattern);

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] sparse-checkout: optimize string_list construction
  2026-01-14 21:35 ` Jeff King
@ 2026-01-18  2:39   ` Derrick Stolee
  0 siblings, 0 replies; 28+ messages in thread
From: Derrick Stolee @ 2026-01-18  2:39 UTC (permalink / raw)
  To: Jeff King, amisha; +Cc: git, Junio C Hamano, Elijah Newren

On 1/14/26 4:35 PM, Jeff King wrote:
> On Thu, Jan 15, 2026 at 12:58:03AM +0530, amisha wrote:
> 
>> Improve O(n^2) complexity to O(n log n) while building a sorted 'string_list' by constructing it unsorted and sorting it afterwards.
...
>>   		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
>>   			/* pe->pattern starts with "/", skip it */
>> -			string_list_insert(&sl, pe->pattern + 1);
>> +			string_list_append(&sl, pe->pattern + 1);
>>   		}
>>   
>>   		string_list_sort(&sl);
> 
> Since we already sort here, I was quite curious how this came about.  It
> looks like the _insert() call and the _sort() were both added together
> in de11951b03 (sparse-checkout: list directories in cone mode,
> 2019-12-30).
> 
> I'd guess it was just a typo/brain-o to mix up append and insert.

This is exactly the case.

> Doesn't the same issue exist in write_cone_to_file(), too (in two
> separate spots)?

It would make sense that such a pattern could reappear in other areas
in this file. I see that you have caught a few more in v3.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-16 17:17         ` Junio C Hamano
@ 2026-01-18  2:46           ` Derrick Stolee
  2026-01-18 13:09             ` Amisha Chhajed
  0 siblings, 1 reply; 28+ messages in thread
From: Derrick Stolee @ 2026-01-18  2:46 UTC (permalink / raw)
  To: Junio C Hamano, Amisha Chhajed; +Cc: René Scharfe, git, newren, peff

On 1/16/26 12:17 PM, Junio C Hamano wrote:
> Amisha Chhajed <amishhhaaaa@gmail.com> writes:
> 
>> It was assumed to be safe under the notion that our entries are not
>> duplicate but as already pointed out, our entries are not unique so we
>> need one of those two ways either insert or remove_duplicates, this
>> can be a trivial question but i wonder how are the tests passing by
>> removing these lines, i was actually researching about it.
> 
> ... suspense.  And the result of the research was???
> 
> If the answer was simply "we lack test coverage", it may make sense
> to add a test taken from Peff's earlier response to increase test
> coverage, perhaps?

In addition to adding more tests to t/t1091-sparse-checkout-builtin.sh
to cover these duplicate cases. To demonstrate your quadratic perf
improvement, a test in t/perf/p2000-sparse-operations.sh or similar
would be good to add.

I expect that the test you would add doesn't matter too much about
the data shape, but would look very different from most tests in
p2000. You can make use of the constructed repo's directory structure
that has nesting directories with name f1, f2, f3, or f4.

Here's something to get you started that I haven't tested myself:

test_perf 'duplicate sparse directories' '
	(
		cd full-v4 &&
		
		for i in $(test_seq 1000)
		do
			printf "f1/f2/f3/f4\n"
		done >in &&
		git sparse-checkout set --stdin <in
	)
'

That should test the logic with 1000 identical directories, which
should be enough to have the quadratic growth show up.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] sparse-checkout: optimize string_list construction
  2026-01-16 19:11   ` Junio C Hamano
@ 2026-01-18 13:07     ` Amisha Chhajed
  2026-01-19  5:32     ` Jeff King
  1 sibling, 0 replies; 28+ messages in thread
From: Amisha Chhajed @ 2026-01-18 13:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, stolee, newren, peff

On Sat, 17 Jan 2026 at 00:41, Junio C Hamano <gitster@pobox.com> wrote:
>
> amisha <amishhhaaaa@gmail.com> writes:
>
> > Subject: Re: [PATCH] sparse-checkout: optimize string_list construction
>
> It would have been nice to see [PATCH v2] or whatever that signals
> that there is an earlier iteration.
>
> > From: Amisha Chhajed <amishhhaaaa@gmail.com>
> >
> > Improve O(n^2) complexity to O(n log n) while building a sorted
> > 'string_list' by constructing it unsorted then sorting it
> > followed by removing duplicates.
>
> By the way, do we have t/perf/ that substanticates the performance
> claim here (in other words, how much improvement are we expecting in
> practice)?

After running different perf tests i was not able to find any
substantial improvement in the results output before and after this
patch, after going through some perf tests i came to conclude that for
the results of this commit to shine we need a perf test that tests it
with many duplicates, thanks to inputs by Derrick for further
confirming this and giving me a starting point.

> Also, have you found out why the previous round that did not remove
> duplicates saw no failed tests?  Perhaps it is a good idea to add
> some test that would notice if we failed to add calls to
> remove_duplicates in this patch?

Yes!, actually we don't have a test that covers the line which removes
duplicates. I wrote a test locally which fails if duplicates are found
in the output with duplicates in input, very similar to what Jeff
wrote for reproducing. I will create a patch sh

> This is an unrelated tangent, a possible #leftoverbits material, but
> should not be part of this patch (or even in the same series as this
> patch).  I notice that string_list_remove_duplicates() almost always
> immediately follow a call to string_list_sort() of the same
> instance, which makes me wonder if we would be better off if we had
> a variant of string_list_sort(), and call it string_list_sort_u()

 After running command git grep -n -e "string_list_sort" -e
"string_list_remove_duplicates" -- clone.c fast-export.c fetch.c
help.c pack-objects.c sparse-checkout.c
from builtin/
i got the output

clone.c:1139:           string_list_sort(&option_recurse_submodules);

clone.c:1140:
string_list_remove_duplicates(&option_recurse_submodules, 0);

fast-export.c:1121:     string_list_sort(&extra_refs);

fast-export.c:1122:     string_list_remove_duplicates(&extra_refs, 0);

fetch.c:1370:           string_list_sort(&refnames);

fetch.c:2587:   string_list_remove_duplicates(&list, 0);

help.c:159:     string_list_sort(&keys);

help.c:199:     string_list_remove_duplicates(&keys_uniq, 0);

pack-objects.c:3852:    string_list_sort(&include_packs);

pack-objects.c:3853:    string_list_remove_duplicates(&include_packs, 0);

pack-objects.c:3854:    string_list_sort(&exclude_packs);

pack-objects.c:3855:    string_list_remove_duplicates(&exclude_packs, 0);

pack-objects.c:3899:     * string_list_item's ->util pointer, which
string_list_sort() does not

pack-objects.c:4141:    string_list_sort(&discard_packs);

pack-objects.c:4142:    string_list_sort(&fresh_packs);

sparse-checkout.c:97:           string_list_sort(&sl);

sparse-checkout.c:98:           string_list_remove_duplicates(&sl, 0);

sparse-checkout.c:296:  string_list_sort(&sl);

sparse-checkout.c:297:  string_list_remove_duplicates(&sl, 0);

sparse-checkout.c:320:  string_list_sort(&sl);

sparse-checkout.c:321:  string_list_remove_duplicates(&sl, 0);


There are many places where string_list_rmeove_duplicates is directly
next to string_list_sort, so it is a very common pattern

Thank you.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] sparse-checkout: optimize string_list construction
  2026-01-18  2:46           ` Derrick Stolee
@ 2026-01-18 13:09             ` Amisha Chhajed
  0 siblings, 0 replies; 28+ messages in thread
From: Amisha Chhajed @ 2026-01-18 13:09 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, René Scharfe, git, newren, peff

On Sun, 18 Jan 2026 at 08:16, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 1/16/26 12:17 PM, Junio C Hamano wrote:
> > Amisha Chhajed <amishhhaaaa@gmail.com> writes:
> >
> >> It was assumed to be safe under the notion that our entries are not
> >> duplicate but as already pointed out, our entries are not unique so we
> >> need one of those two ways either insert or remove_duplicates, this
> >> can be a trivial question but i wonder how are the tests passing by
> >> removing these lines, i was actually researching about it.
> >
> > ... suspense.  And the result of the research was???
> >
> > If the answer was simply "we lack test coverage", it may make sense
> > to add a test taken from Peff's earlier response to increase test
> > coverage, perhaps?
>
> In addition to adding more tests to t/t1091-sparse-checkout-builtin.sh
> to cover these duplicate cases. To demonstrate your quadratic perf
> improvement, a test in t/perf/p2000-sparse-operations.sh or similar
> would be good to add.
>
> I expect that the test you would add doesn't matter too much about
> the data shape, but would look very different from most tests in
> p2000. You can make use of the constructed repo's directory structure
> that has nesting directories with name f1, f2, f3, or f4.
>
> Here's something to get you started that I haven't tested myself:
>
> test_perf 'duplicate sparse directories' '
>         (
>                 cd full-v4 &&
>
>                 for i in $(test_seq 1000)
>                 do
>                         printf "f1/f2/f3/f4\n"
>                 done >in &&
>                 git sparse-checkout set --stdin <in
>         )
> '
>
> That should test the logic with 1000 identical directories, which
> should be enough to have the quadratic growth show up.
>
> Thanks,
> -Stolee


Thank you so much for pointing me in the right direction.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] sparse-checkout: optimize string_list construction
  2026-01-16 19:11   ` Junio C Hamano
  2026-01-18 13:07     ` Amisha Chhajed
@ 2026-01-19  5:32     ` Jeff King
  2026-01-19 19:06       ` Junio C Hamano
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff King @ 2026-01-19  5:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: amisha, git, stolee, newren

On Fri, Jan 16, 2026 at 11:11:16AM -0800, Junio C Hamano wrote:

> > Improve O(n^2) complexity to O(n log n) while building a sorted
> > 'string_list' by constructing it unsorted then sorting it
> > followed by removing duplicates.
> 
> By the way, do we have t/perf/ that substanticates the performance
> claim here (in other words, how much improvement are we expecting in
> practice)?

IMHO it is not that big a deal to demonstrate the perf improvement in
the test suite.

Probably you could feed a very long list of unique names to "git
sparse-checkout add --stdin" to trigger it. But a list long enough to
cause annoying quadratic behavior is getting far enough from the real
world that I'm not sure it is worth adding to the (already expensive)
perf suite.

And swapping append+sort for sorted insertion is a common and simple
improvement.  We probably don't need to prove its performance at all,
but if we do, a one-off hyperfine output in the commit message would be
enough.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v5 1/2] sparse-checkout: optimize string_list construction
  2026-01-14 19:28 [PATCH] sparse-checkout: optimize string_list construction amisha
                   ` (3 preceding siblings ...)
  2026-01-16 16:50 ` [PATCH] " amisha
@ 2026-01-19 12:33 ` amisha
  2026-01-19 17:04   ` Derrick Stolee
  2026-01-20 15:38 ` [PATCH v6] sparse-checkout: optimize string_list construction and add tests to verify deduplication amisha
  5 siblings, 1 reply; 28+ messages in thread
From: amisha @ 2026-01-19 12:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Elijah Newren, Jeff King,
	Amisha Chhajed

From: Amisha Chhajed <amishhhaaaa@gmail.com>

Improve O(n^2) complexity to O(n log n) while building a sorted
'string_list' by constructing it unsorted then sorting it
followed by removing duplicates.

Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
---
 builtin/sparse-checkout.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 15d51e60a8..7dfb276bf0 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
 
 		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
 			/* pe->pattern starts with "/", skip it */
-			string_list_insert(&sl, pe->pattern + 1);
+			string_list_append(&sl, pe->pattern + 1);
 		}
 
 		string_list_sort(&sl);
+		string_list_remove_duplicates(&sl, 0);
 
 		for (i = 0; i < sl.nr; i++) {
 			quote_c_style(sl.items[i].string, NULL, stdout, 0);
@@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	string_list_sort(&sl);
@@ -311,7 +312,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	strbuf_release(&parent_pattern);
-- 
2.51.0


From b20a99f0773bab063a31eea6fead730e18200ca7 Mon Sep 17 00:00:00 2001
From: Amisha Chhajed <amishhhaaaa@gmail.com>
Date: Mon, 19 Jan 2026 00:20:47 +0530
Subject: [PATCH v5 2/2] t1091: Add tests for deduplication of cone-mode sparse
 patterns

Sparse-checkout deduplicates repeated cone-mode patterns,
but this behaviour was previously untested.

Add tests that verify that sparse-checkout file contain each cone
pattern only once and sparse-checkout list reports each pattern
only once.

Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
---
 t/t1091-sparse-checkout-builtin.sh | 33 ++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index b2da4feaef..858801fed3 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -817,6 +817,39 @@ test_expect_success 'cone mode clears ignored subdirectories' '
 	test_cmp expect out
 '
 
+test_expect_success 'sparse-checkout deduplicates repeated cone patterns' '
+    rm -f repo/.git/info/sparse-checkout &&
+    git -C repo sparse-checkout init --cone &&
+    git -C repo sparse-checkout add --stdin <<-\EOF &&
+	/foo/
+	/bar/
+	/foo/
+	EOF
+    cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/bar/
+	/foo/
+	EOF
+    test_cmp expect repo/.git/info/sparse-checkout
+'
+
+test_expect_success 'sparse-checkout list deduplicates repeated cone patterns' '
+    rm -f repo/.git/info/sparse-checkout &&
+    git -C repo sparse-checkout init --cone &&
+    git -C repo sparse-checkout add --stdin <<-\EOF &&
+	/foo/
+	/bar/
+	/foo/
+	EOF
+    git -C repo sparse-checkout list >actual &&
+    cat >expect <<-\EOF &&
+	bar
+	foo
+	EOF
+    test_cmp expect actual
+'
+
 test_expect_success 'malformed cone-mode patterns' '
 	git -C repo sparse-checkout init --cone &&
 	mkdir -p repo/foo/bar &&
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 1/2] sparse-checkout: optimize string_list construction
  2026-01-19 12:33 ` [PATCH v5 1/2] " amisha
@ 2026-01-19 17:04   ` Derrick Stolee
  2026-01-19 18:33     ` Pushkar Singh
  2026-01-20 15:47     ` Amisha Chhajed
  0 siblings, 2 replies; 28+ messages in thread
From: Derrick Stolee @ 2026-01-19 17:04 UTC (permalink / raw)
  To: amisha, git; +Cc: Junio C Hamano, Elijah Newren, Jeff King

On 1/19/2026 7:33 AM, amisha wrote:
> From: Amisha Chhajed <amishhhaaaa@gmail.com>
> 
> Improve O(n^2) complexity to O(n log n) while building a sorted
> 'string_list' by constructing it unsorted then sorting it
> followed by removing duplicates.
> 
> Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
> ---
>  builtin/sparse-checkout.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 15d51e60a8..7dfb276bf0 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
>  
>  		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
>  			/* pe->pattern starts with "/", skip it */
> -			string_list_insert(&sl, pe->pattern + 1);
> +			string_list_append(&sl, pe->pattern + 1);
>  		}
>  
>  		string_list_sort(&sl);
> +		string_list_remove_duplicates(&sl, 0);

Shouldn't this line be added in the other uses of string_list_append()?

>  
>  		for (i = 0; i < sl.nr; i++) {
>  			quote_c_style(sl.items[i].string, NULL, stdout, 0);
> @@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>  		if (!hashmap_contains_parent(&pl->recursive_hashmap,
>  					     pe->pattern,
>  					     &parent_pattern))
> -			string_list_insert(&sl, pe->pattern);
> +			string_list_append(&sl, pe->pattern);
>  	}
> 
>  	string_list_sort(&sl);
Actually, there is a string_list_remove_duplicates() just
outside of the context of this diff. 

> @@ -311,7 +312,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
>  		if (!hashmap_contains_parent(&pl->recursive_hashmap,
>  					     pe->pattern,
>  					     &parent_pattern))
> -			string_list_insert(&sl, pe->pattern);
> +			string_list_append(&sl, pe->pattern);
>  	}
>  
>  	strbuf_release(&parent_pattern);

Same here.

This diff looks good, but I do believe it would be good to include your
duplicate test here instead of in a second patch.

Also, the way your second patch appeared as a trailer of your first patch,
so it didn't appear properly as a thread in my email client. Here it is
for review:

> From b20a99f0773bab063a31eea6fead730e18200ca7 Mon Sep 17 00:00:00 2001
> From: Amisha Chhajed <amishhhaaaa@gmail.com>
> Date: Mon, 19 Jan 2026 00:20:47 +0530
> Subject: [PATCH v5 2/2] t1091: Add tests for deduplication of cone-mode sparse
>  patterns

nit: this title is a little long and has incorrect capitalization. Take
note for later, since I expect this diff to be squashed into the previous
patch.

> +test_expect_success 'sparse-checkout deduplicates repeated cone patterns' '
> +    rm -f repo/.git/info/sparse-checkout &&
> +    git -C repo sparse-checkout init --cone &&
> +    git -C repo sparse-checkout add --stdin <<-\EOF &&
> +	/foo/
> +	/bar/
> +	/foo/
> +	EOF

This slashes are redundant for cone-mode patterns. I recommend a more
interesting case, such as

	foo/bar/baz
	a/b/c
	foo/bar/baz
	a/b

This should remove the duplicates foo/bar/baz during the run, but also it
should notice that a/b/c is contained within the recursive set defined by
a/b.

The resulting sparse-checkout file should have lines such as

	/*
	!/*/
	/a/
	!/a/*/
	/a/b
	/foo
	!/foo/*/
	/foo/bar
	!/foo/bar/*/
	/foo/bar/baz

The order might be different, but this is what I recall from how nested
directories work in cone mode.

> +    cat >expect <<-\EOF &&
> +	/*
> +	!/*/
> +	/bar/
> +	/foo/
> +	EOF
> +    test_cmp expect repo/.git/info/sparse-checkout
> +'

> +test_expect_success 'sparse-checkout list deduplicates repeated cone patterns' '
> +    rm -f repo/.git/info/sparse-checkout &&
> +    git -C repo sparse-checkout init --cone &&
> +    git -C repo sparse-checkout add --stdin <<-\EOF &&
> +	/foo/
> +	/bar/
> +	/foo/
> +	EOF
> +    git -C repo sparse-checkout list >actual &&
> +    cat >expect <<-\EOF &&
> +	bar
> +	foo
> +	EOF
> +    test_cmp expect actual
> +'

This does lead to an interesting case where the 'list' command was only
interacting with the sparse-checkout file, which wouldn't have duplicates
if it was modified by the user in cone mode.

Keep in mind that you're not actually testing the 'list' command, because
the 'add' command already deduplicated. You'll need to modify the
sparse-checkout file itself to get an interesting test of the 'list'
command.

When not in cone mode, we should not be removing duplicates because the
order of the patterns matters and we should not be reordering them. I'm
not sure if that's relevant but it's something to keep in mind while you're
testing, since the command will revert to non-cone mode if the
sparse-checkout file doesn't match the cone mode pattern expectations.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 1/2] sparse-checkout: optimize string_list construction
  2026-01-19 17:04   ` Derrick Stolee
@ 2026-01-19 18:33     ` Pushkar Singh
  2026-01-20 15:47     ` Amisha Chhajed
  1 sibling, 0 replies; 28+ messages in thread
From: Pushkar Singh @ 2026-01-19 18:33 UTC (permalink / raw)
  To: Derrick Stolee, git; +Cc: amisha, Junio C Hamano, Elijah Newren, Jeff King

Hi Amisha, Derrick,

Thanks for the detailed discussion here. Reading through the thread helped
clarify the intended behavior around deduplication and testing.

While investigating "git sparse-checkout list" in cone mode independently,
I noticed a related but slightly different case than repeated identical
patterns: semantically equivalent but syntactically different paths, for
example:

  folder1
  folder1/
  ./folder1

These collapse to a single canonical entry in the output of
"sparse-checkout list", even though they are distinct strings on input.

The tests in v5 cover duplicate identical patterns well, but they do not
appear to cover this normalization aspect. It may be worth adding a test
that exercises "list" behavior with such normalized-path variants, possibly
by modifying the sparse-checkout file directly so that "add" does not
perform deduplication first, as Derrick mentioned.

For context, I sent a small test-only patch exploring this behavior here:
https://lore.kernel.org/git/edbde063-2c39-4812-9970-247b67f678c7@gmail.com/T/#m68a4fd645e10cd8e82ac5e4080c48b12f8f6348a

Happy to help draft or review an additional test if that would be useful.

Thanks,
Pushkar

On Mon, Jan 19, 2026 at 10:39 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 1/19/2026 7:33 AM, amisha wrote:
> > From: Amisha Chhajed <amishhhaaaa@gmail.com>
> >
> > Improve O(n^2) complexity to O(n log n) while building a sorted
> > 'string_list' by constructing it unsorted then sorting it
> > followed by removing duplicates.
> >
> > Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
> > ---
> >  builtin/sparse-checkout.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> > index 15d51e60a8..7dfb276bf0 100644
> > --- a/builtin/sparse-checkout.c
> > +++ b/builtin/sparse-checkout.c
> > @@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
> >
> >               hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
> >                       /* pe->pattern starts with "/", skip it */
> > -                     string_list_insert(&sl, pe->pattern + 1);
> > +                     string_list_append(&sl, pe->pattern + 1);
> >               }
> >
> >               string_list_sort(&sl);
> > +             string_list_remove_duplicates(&sl, 0);
>
> Shouldn't this line be added in the other uses of string_list_append()?
>
> >
> >               for (i = 0; i < sl.nr; i++) {
> >                       quote_c_style(sl.items[i].string, NULL, stdout, 0);
> > @@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
> >               if (!hashmap_contains_parent(&pl->recursive_hashmap,
> >                                            pe->pattern,
> >                                            &parent_pattern))
> > -                     string_list_insert(&sl, pe->pattern);
> > +                     string_list_append(&sl, pe->pattern);
> >       }
> >
> >       string_list_sort(&sl);
> Actually, there is a string_list_remove_duplicates() just
> outside of the context of this diff.
>
> > @@ -311,7 +312,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
> >               if (!hashmap_contains_parent(&pl->recursive_hashmap,
> >                                            pe->pattern,
> >                                            &parent_pattern))
> > -                     string_list_insert(&sl, pe->pattern);
> > +                     string_list_append(&sl, pe->pattern);
> >       }
> >
> >       strbuf_release(&parent_pattern);
>
> Same here.
>
> This diff looks good, but I do believe it would be good to include your
> duplicate test here instead of in a second patch.
>
> Also, the way your second patch appeared as a trailer of your first patch,
> so it didn't appear properly as a thread in my email client. Here it is
> for review:
>
> > From b20a99f0773bab063a31eea6fead730e18200ca7 Mon Sep 17 00:00:00 2001
> > From: Amisha Chhajed <amishhhaaaa@gmail.com>
> > Date: Mon, 19 Jan 2026 00:20:47 +0530
> > Subject: [PATCH v5 2/2] t1091: Add tests for deduplication of cone-mode sparse
> >  patterns
>
> nit: this title is a little long and has incorrect capitalization. Take
> note for later, since I expect this diff to be squashed into the previous
> patch.
>
> > +test_expect_success 'sparse-checkout deduplicates repeated cone patterns' '
> > +    rm -f repo/.git/info/sparse-checkout &&
> > +    git -C repo sparse-checkout init --cone &&
> > +    git -C repo sparse-checkout add --stdin <<-\EOF &&
> > +     /foo/
> > +     /bar/
> > +     /foo/
> > +     EOF
>
> This slashes are redundant for cone-mode patterns. I recommend a more
> interesting case, such as
>
>         foo/bar/baz
>         a/b/c
>         foo/bar/baz
>         a/b
>
> This should remove the duplicates foo/bar/baz during the run, but also it
> should notice that a/b/c is contained within the recursive set defined by
> a/b.
>
> The resulting sparse-checkout file should have lines such as
>
>         /*
>         !/*/
>         /a/
>         !/a/*/
>         /a/b
>         /foo
>         !/foo/*/
>         /foo/bar
>         !/foo/bar/*/
>         /foo/bar/baz
>
> The order might be different, but this is what I recall from how nested
> directories work in cone mode.
>
> > +    cat >expect <<-\EOF &&
> > +     /*
> > +     !/*/
> > +     /bar/
> > +     /foo/
> > +     EOF
> > +    test_cmp expect repo/.git/info/sparse-checkout
> > +'
>
> > +test_expect_success 'sparse-checkout list deduplicates repeated cone patterns' '
> > +    rm -f repo/.git/info/sparse-checkout &&
> > +    git -C repo sparse-checkout init --cone &&
> > +    git -C repo sparse-checkout add --stdin <<-\EOF &&
> > +     /foo/
> > +     /bar/
> > +     /foo/
> > +     EOF
> > +    git -C repo sparse-checkout list >actual &&
> > +    cat >expect <<-\EOF &&
> > +     bar
> > +     foo
> > +     EOF
> > +    test_cmp expect actual
> > +'
>
> This does lead to an interesting case where the 'list' command was only
> interacting with the sparse-checkout file, which wouldn't have duplicates
> if it was modified by the user in cone mode.
>
> Keep in mind that you're not actually testing the 'list' command, because
> the 'add' command already deduplicated. You'll need to modify the
> sparse-checkout file itself to get an interesting test of the 'list'
> command.
>
> When not in cone mode, we should not be removing duplicates because the
> order of the patterns matters and we should not be reordering them. I'm
> not sure if that's relevant but it's something to keep in mind while you're
> testing, since the command will revert to non-cone mode if the
> sparse-checkout file doesn't match the cone mode pattern expectations.
>
> Thanks,
> -Stolee
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] sparse-checkout: optimize string_list construction
  2026-01-19  5:32     ` Jeff King
@ 2026-01-19 19:06       ` Junio C Hamano
  0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2026-01-19 19:06 UTC (permalink / raw)
  To: Jeff King; +Cc: amisha, git, stolee, newren

Jeff King <peff@peff.net> writes:

> On Fri, Jan 16, 2026 at 11:11:16AM -0800, Junio C Hamano wrote:
>
>> > Improve O(n^2) complexity to O(n log n) while building a sorted
>> > 'string_list' by constructing it unsorted then sorting it
>> > followed by removing duplicates.
>> 
>> By the way, do we have t/perf/ that substanticates the performance
>> claim here (in other words, how much improvement are we expecting in
>> practice)?
>
> IMHO it is not that big a deal to demonstrate the perf improvement in
> the test suite.
> ... a one-off hyperfine output in the commit message would be
> enough.

Thanks, I agree with this conclusion; I didn't expect a huge
difference from this change unless N is meaningfully large anyway.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v6] sparse-checkout: optimize string_list construction and add tests to verify deduplication.
  2026-01-14 19:28 [PATCH] sparse-checkout: optimize string_list construction amisha
                   ` (4 preceding siblings ...)
  2026-01-19 12:33 ` [PATCH v5 1/2] " amisha
@ 2026-01-20 15:38 ` amisha
  2026-01-20 20:37   ` Derrick Stolee
  2026-01-21 13:00   ` [PATCH v7] " Amisha Chhajed
  5 siblings, 2 replies; 28+ messages in thread
From: amisha @ 2026-01-20 15:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee, Elijah Newren, Jeff King, amisha

From: Amisha Chhajed <amishhhaaaa@gmail.com>

Improve O(n^2) complexity to O(n log n) while building a sorted
'string_list' by constructing it unsorted then sorting it
followed by removing duplicates.

sparse-checkout deduplicates repeated cone-mode patterns,
but this behaviour was previously untested, add tests that
verify that sparse-checkout file contain each cone
pattern only once and sparse-checkout list reports each pattern
only once.

Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
---
 builtin/sparse-checkout.c          |  7 +++--
 t/t1091-sparse-checkout-builtin.sh | 48 ++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 15d51e60a8..7dfb276bf0 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
 
 		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
 			/* pe->pattern starts with "/", skip it */
-			string_list_insert(&sl, pe->pattern + 1);
+			string_list_append(&sl, pe->pattern + 1);
 		}
 
 		string_list_sort(&sl);
+		string_list_remove_duplicates(&sl, 0);
 
 		for (i = 0; i < sl.nr; i++) {
 			quote_c_style(sl.items[i].string, NULL, stdout, 0);
@@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	string_list_sort(&sl);
@@ -311,7 +312,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	strbuf_release(&parent_pattern);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index b2da4feaef..ede5c6b3f3 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -817,6 +817,54 @@ test_expect_success 'cone mode clears ignored subdirectories' '
 	test_cmp expect out
 '
 
+test_expect_success 'sparse-checkout deduplicates repeated cone patterns' '
+    rm -f repo/.git/info/sparse-checkout &&
+    git -C repo sparse-checkout init --cone &&
+    git -C repo sparse-checkout add --stdin <<-\EOF &&
+	foo/bar/baz
+	a/b/c
+	foo/bar/baz
+	a/b
+	EOF
+    cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/a/
+	!/a/*/
+	/foo/
+	!/foo/*/
+	/foo/bar/
+	!/foo/bar/*/
+	/a/b/
+	/foo/bar/baz/
+	EOF
+    test_cmp expect repo/.git/info/sparse-checkout
+'
+
+test_expect_success 'sparse-checkout list deduplicates repeated cone patterns' '
+    rm -f repo/.git/info/sparse-checkout &&
+    git -C repo sparse-checkout init --cone &&
+    cat <<-\EOF >repo/.git/info/sparse-checkout &&
+	/*
+	!/*/
+	/a/
+	!/a/*/
+	/foo/
+	!/foo/*/
+	/foo/bar/
+	!/foo/bar/*/
+	/a/b/
+	/foo/bar/baz/
+	/foo/bar/baz/
+	EOF
+    git -C repo sparse-checkout list >actual &&
+    cat <<-\EOF >expect &&
+	a/b
+	foo/bar/baz
+	EOF
+    test_cmp expect actual
+'
+
 test_expect_success 'malformed cone-mode patterns' '
 	git -C repo sparse-checkout init --cone &&
 	mkdir -p repo/foo/bar &&
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 1/2] sparse-checkout: optimize string_list construction
  2026-01-19 17:04   ` Derrick Stolee
  2026-01-19 18:33     ` Pushkar Singh
@ 2026-01-20 15:47     ` Amisha Chhajed
  1 sibling, 0 replies; 28+ messages in thread
From: Amisha Chhajed @ 2026-01-20 15:47 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Elijah Newren, Jeff King

On Mon, 19 Jan 2026 at 22:34, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 1/19/2026 7:33 AM, amisha wrote:
> > From: Amisha Chhajed <amishhhaaaa@gmail.com>
> >
> > Improve O(n^2) complexity to O(n log n) while building a sorted
> > 'string_list' by constructing it unsorted then sorting it
> > followed by removing duplicates.
> >
> > Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
> > ---
> >  builtin/sparse-checkout.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> > index 15d51e60a8..7dfb276bf0 100644
> > --- a/builtin/sparse-checkout.c
> > +++ b/builtin/sparse-checkout.c
> > @@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
> >
> >               hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
> >                       /* pe->pattern starts with "/", skip it */
> > -                     string_list_insert(&sl, pe->pattern + 1);
> > +                     string_list_append(&sl, pe->pattern + 1);
> >               }
> >
> >               string_list_sort(&sl);
> > +             string_list_remove_duplicates(&sl, 0);
>
> Shouldn't this line be added in the other uses of string_list_append()?
>
> >
> >               for (i = 0; i < sl.nr; i++) {
> >                       quote_c_style(sl.items[i].string, NULL, stdout, 0);
> > @@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
> >               if (!hashmap_contains_parent(&pl->recursive_hashmap,
> >                                            pe->pattern,
> >                                            &parent_pattern))
> > -                     string_list_insert(&sl, pe->pattern);
> > +                     string_list_append(&sl, pe->pattern);
> >       }
> >
> >       string_list_sort(&sl);
> Actually, there is a string_list_remove_duplicates() just
> outside of the context of this diff.

I was wondering if the string_list_remove_duplicates here is redundant
as if we refer to the code that adds entries in the parent hashmap,
refer:

from git/dir.c
if (hashmap_get_entry(&pl->parent_hashmap, translated, ent, NULL)) {
/* we already included this at the parent level */
warning(_("your sparse-checkout file may have issues: pattern '%s' is
repeated"),
given->pattern);
goto clear_hashmaps;
}

It does not add duplicates to it, and we are only iterating on
parent_hashmap in this loop, the tests i have added in v6 do fail on
removal of other string_list_remove duplicates lines in this file,
however here i tried testing but i was not able to find a covering
case for this line.

> Keep in mind that you're not actually testing the 'list' command, because
> the 'add' command already deduplicated. You'll need to modify the
> sparse-checkout file itself to get an interesting test of the 'list'
> command.
>
> When not in cone mode, we should not be removing duplicates because the
> order of the patterns matters and we should not be reordering them. I'm
> not sure if that's relevant but it's something to keep in mind while you're
> testing, since the command will revert to non-cone mode if the
> sparse-checkout file doesn't match the cone mode pattern expectations.
>
> Thanks,
> -Stolee

Thank you it was a bit tricky to make the test fail directly on list
command, i was able to do it in v6 after this guidance.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] sparse-checkout: optimize string_list construction and add tests to verify deduplication.
  2026-01-20 15:38 ` [PATCH v6] sparse-checkout: optimize string_list construction and add tests to verify deduplication amisha
@ 2026-01-20 20:37   ` Derrick Stolee
  2026-01-21 13:00   ` [PATCH v7] " Amisha Chhajed
  1 sibling, 0 replies; 28+ messages in thread
From: Derrick Stolee @ 2026-01-20 20:37 UTC (permalink / raw)
  To: amisha, git; +Cc: Junio C Hamano, Elijah Newren, Jeff King

On 1/20/2026 10:38 AM, amisha wrote:
> From: Amisha Chhajed <amishhhaaaa@gmail.com>

Code is the same as last time, which is good.

> +test_expect_success 'sparse-checkout deduplicates repeated cone patterns' '
> +    rm -f repo/.git/info/sparse-checkout &&
> +    git -C repo sparse-checkout init --cone &&
> +    git -C repo sparse-checkout add --stdin <<-\EOF &&
> +	foo/bar/baz
> +	a/b/c
> +	foo/bar/baz
> +	a/b
> +	EOF
> +    cat >expect <<-\EOF &&
> +	/*
> +	!/*/
> +	/a/
> +	!/a/*/
> +	/foo/
> +	!/foo/*/
> +	/foo/bar/
> +	!/foo/bar/*/
> +	/a/b/
> +	/foo/bar/baz/
> +	EOF
> +    test_cmp expect repo/.git/info/sparse-checkout
> +'
> +
> +test_expect_success 'sparse-checkout list deduplicates repeated cone patterns' '
> +    rm -f repo/.git/info/sparse-checkout &&
> +    git -C repo sparse-checkout init --cone &&
> +    cat <<-\EOF >repo/.git/info/sparse-checkout &&
> +	/*
> +	!/*/
> +	/a/
> +	!/a/*/
> +	/foo/
> +	!/foo/*/
> +	/foo/bar/
> +	!/foo/bar/*/
> +	/a/b/
> +	/foo/bar/baz/
> +	/foo/bar/baz/
> +	EOF
> +    git -C repo sparse-checkout list >actual &&
> +    cat <<-\EOF >expect &&
> +	a/b
> +	foo/bar/baz
> +	EOF
> +    test_cmp expect actual
> +'
> +

These tests have the right structure, but there's a problem: it appears
that you've used four spaces for the first level of indent and then
use 8-width tabs for the next level. You can see that it disagrees with
the last line of the previous test in the diff context. This should be
fixed, and likely "git rebase --whitespace=fix" is how you landed on
the current use of tab characters.

I think the content between the EOFs shouldn't be indented more than
the 'cat' it's a part of, but I could be incorrect there.

Outside of the whitespace issues, I think this test looks good.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v7] sparse-checkout: optimize string_list construction and add tests to verify deduplication.
  2026-01-20 15:38 ` [PATCH v6] sparse-checkout: optimize string_list construction and add tests to verify deduplication amisha
  2026-01-20 20:37   ` Derrick Stolee
@ 2026-01-21 13:00   ` Amisha Chhajed
  2026-01-21 16:28     ` Derrick Stolee
  1 sibling, 1 reply; 28+ messages in thread
From: Amisha Chhajed @ 2026-01-21 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Elijah Newren, Jeff King,
	Amisha Chhajed

Improve O(n^2) complexity to O(n log n) while building a sorted
'string_list' by constructing it unsorted then sorting it
followed by removing duplicates.

sparse-checkout deduplicates repeated cone-mode patterns,
but this behaviour was previously untested, add tests that
verify that sparse-checkout file contain each cone
pattern only once and sparse-checkout list reports each pattern
only once.

Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
---
 builtin/sparse-checkout.c          |  7 +++--
 t/t1091-sparse-checkout-builtin.sh | 48 ++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 15d51e60a8..7dfb276bf0 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -91,10 +91,11 @@ static int sparse_checkout_list(int argc, const char **argv, const char *prefix,
 
 		hashmap_for_each_entry(&pl.recursive_hashmap, &iter, pe, ent) {
 			/* pe->pattern starts with "/", skip it */
-			string_list_insert(&sl, pe->pattern + 1);
+			string_list_append(&sl, pe->pattern + 1);
 		}
 
 		string_list_sort(&sl);
+		string_list_remove_duplicates(&sl, 0);
 
 		for (i = 0; i < sl.nr; i++) {
 			quote_c_style(sl.items[i].string, NULL, stdout, 0);
@@ -289,7 +290,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	string_list_sort(&sl);
@@ -311,7 +312,7 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 		if (!hashmap_contains_parent(&pl->recursive_hashmap,
 					     pe->pattern,
 					     &parent_pattern))
-			string_list_insert(&sl, pe->pattern);
+			string_list_append(&sl, pe->pattern);
 	}
 
 	strbuf_release(&parent_pattern);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index b2da4feaef..cd0aed9975 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -817,6 +817,54 @@ test_expect_success 'cone mode clears ignored subdirectories' '
 	test_cmp expect out
 '
 
+test_expect_success 'sparse-checkout deduplicates repeated cone patterns' '
+	rm -f repo/.git/info/sparse-checkout &&
+	git -C repo sparse-checkout init --cone &&
+	git -C repo sparse-checkout add --stdin <<-\EOF &&
+	foo/bar/baz
+	a/b/c
+	foo/bar/baz
+	a/b
+	EOF
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/a/
+	!/a/*/
+	/foo/
+	!/foo/*/
+	/foo/bar/
+	!/foo/bar/*/
+	/a/b/
+	/foo/bar/baz/
+	EOF
+	test_cmp expect repo/.git/info/sparse-checkout
+'
+
+test_expect_success 'sparse-checkout list deduplicates repeated cone patterns' '
+	rm -f repo/.git/info/sparse-checkout &&
+	git -C repo sparse-checkout init --cone &&
+	cat <<-\EOF >repo/.git/info/sparse-checkout &&
+	/*
+	!/*/
+	/a/
+	!/a/*/
+	/foo/
+	!/foo/*/
+	/foo/bar/
+	!/foo/bar/*/
+	/a/b/
+	/foo/bar/baz/
+	/foo/bar/baz/
+	EOF
+	git -C repo sparse-checkout list >actual &&
+	cat <<-\EOF >expect &&
+	a/b
+	foo/bar/baz
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'malformed cone-mode patterns' '
 	git -C repo sparse-checkout init --cone &&
 	mkdir -p repo/foo/bar &&
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v7] sparse-checkout: optimize string_list construction and add tests to verify deduplication.
  2026-01-21 13:00   ` [PATCH v7] " Amisha Chhajed
@ 2026-01-21 16:28     ` Derrick Stolee
  2026-01-21 16:51       ` Junio C Hamano
  0 siblings, 1 reply; 28+ messages in thread
From: Derrick Stolee @ 2026-01-21 16:28 UTC (permalink / raw)
  To: Amisha Chhajed, git; +Cc: Junio C Hamano, Elijah Newren, Jeff King

On 1/21/2026 8:00 AM, Amisha Chhajed wrote:
> Improve O(n^2) complexity to O(n log n) while building a sorted
> 'string_list' by constructing it unsorted then sorting it
> followed by removing duplicates.

Thanks for iterating on this. I think v7 is good to go.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v7] sparse-checkout: optimize string_list construction and add tests to verify deduplication.
  2026-01-21 16:28     ` Derrick Stolee
@ 2026-01-21 16:51       ` Junio C Hamano
  0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2026-01-21 16:51 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Amisha Chhajed, git, Elijah Newren, Jeff King

Derrick Stolee <stolee@gmail.com> writes:

> On 1/21/2026 8:00 AM, Amisha Chhajed wrote:
>> Improve O(n^2) complexity to O(n log n) while building a sorted
>> 'string_list' by constructing it unsorted then sorting it
>> followed by removing duplicates.
>
> Thanks for iterating on this. I think v7 is good to go.
>
> Thanks,
> -Stolee

Thanks, both.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2026-01-21 16:51 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-14 19:28 [PATCH] sparse-checkout: optimize string_list construction amisha
2026-01-14 21:35 ` Jeff King
2026-01-18  2:39   ` Derrick Stolee
2026-01-15 12:56 ` [PATCH v2] " amisha
2026-01-15 13:09 ` [PATCH v3] " amisha
2026-01-15 13:15   ` Amisha Chhajed
2026-01-15 20:09     ` Jeff King
2026-01-16 17:03       ` Amisha Chhajed
2026-01-15 22:26     ` René Scharfe
2026-01-16  8:30       ` Amisha Chhajed
2026-01-16 17:17         ` Junio C Hamano
2026-01-18  2:46           ` Derrick Stolee
2026-01-18 13:09             ` Amisha Chhajed
2026-01-15 13:55   ` Junio C Hamano
2026-01-16 16:50 ` [PATCH] " amisha
2026-01-16 19:11   ` Junio C Hamano
2026-01-18 13:07     ` Amisha Chhajed
2026-01-19  5:32     ` Jeff King
2026-01-19 19:06       ` Junio C Hamano
2026-01-19 12:33 ` [PATCH v5 1/2] " amisha
2026-01-19 17:04   ` Derrick Stolee
2026-01-19 18:33     ` Pushkar Singh
2026-01-20 15:47     ` Amisha Chhajed
2026-01-20 15:38 ` [PATCH v6] sparse-checkout: optimize string_list construction and add tests to verify deduplication amisha
2026-01-20 20:37   ` Derrick Stolee
2026-01-21 13:00   ` [PATCH v7] " Amisha Chhajed
2026-01-21 16:28     ` Derrick Stolee
2026-01-21 16:51       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox