[GSoC][PATCH] merge-recursive: optimize string

public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed

* [GSoC][PATCH] merge-recursive: optimize string_list construction
@ 2025-02-11 19:43 Meet Soni
  2025-02-11 20:59 ` Elijah Newren
  2025-02-13  9:00 ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
  0 siblings, 2 replies; 17+ messages in thread
From: Meet Soni @ 2025-02-11 19:43 UTC (permalink / raw)
  To: git; +Cc: Meet Soni, Junio C Hamano, Derrick Stolee, Elijah Newren,
	Jeff King

Avoid O(n^2) complexity when building a sorted `string_list` by
constructing it unsorted and sorting it afterward, reducing the
complexity to O(n log n).

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
---
 merge-recursive.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 5dfaf32b2c..c43b79e6ef 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2757,24 +2757,18 @@ static int process_renames(struct merge_options *opt,
 	struct string_list b_by_dst = STRING_LIST_INIT_NODUP;
 	const struct rename *sre;
 
-	/*
-	 * FIXME: As string-list.h notes, it's O(n^2) to build a sorted
-	 * string_list one-by-one, but O(n log n) to build it unsorted and
-	 * then sort it.  Note that as we build the list, we do not need to
-	 * check if the existing destination path is already in the list,
-	 * because the structure of diffcore_rename guarantees we won't
-	 * have duplicates.
-	 */
 	for (i = 0; i < a_renames->nr; i++) {
 		sre = a_renames->items[i].util;
-		string_list_insert(&a_by_dst, sre->pair->two->path)->util
+		string_list_append(&a_by_dst, sre->pair->two->path)->util
 			= (void *)sre;
 	}
 	for (i = 0; i < b_renames->nr; i++) {
 		sre = b_renames->items[i].util;
-		string_list_insert(&b_by_dst, sre->pair->two->path)->util
+		string_list_append(&b_by_dst, sre->pair->two->path)->util
 			= (void *)sre;
 	}
+	string_list_sort(&a_by_dst);
+	string_list_sort(&b_by_dst);
 
 	for (i = 0, j = 0; i < a_renames->nr || j < b_renames->nr;) {
 		struct string_list *renames1, *renames2Dst;

base-commit: 9520f7d9985d8879bddd157309928fc0679c8e92
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [GSoC][PATCH] merge-recursive: optimize string_list construction
  2025-02-11 19:43 [GSoC][PATCH] merge-recursive: optimize string_list construction Meet Soni
@ 2025-02-11 20:59 ` Elijah Newren
  2025-02-13  9:00 ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
  1 sibling, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2025-02-11 20:59 UTC (permalink / raw)
  To: Meet Soni; +Cc: git, Junio C Hamano, Derrick Stolee, Jeff King

On Tue, Feb 11, 2025 at 11:43 AM Meet Soni <meetsoni3017@gmail.com> wrote:
>
> Avoid O(n^2) complexity when building a sorted `string_list` by
> constructing it unsorted and sorting it afterward, reducing the
> complexity to O(n log n).

I'm tempted to say merge-recursive.[ch] is nearly dead and planned for
removal, so there's not much value in messing with it, but...it's not
dead yet, so I guess this is worthwhile.

> Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
> ---
>  merge-recursive.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 5dfaf32b2c..c43b79e6ef 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -2757,24 +2757,18 @@ static int process_renames(struct merge_options *opt,
>         struct string_list b_by_dst = STRING_LIST_INIT_NODUP;
>         const struct rename *sre;
>
> -       /*
> -        * FIXME: As string-list.h notes, it's O(n^2) to build a sorted
> -        * string_list one-by-one, but O(n log n) to build it unsorted and
> -        * then sort it.  Note that as we build the list, we do not need to
> -        * check if the existing destination path is already in the list,
> -        * because the structure of diffcore_rename guarantees we won't
> -        * have duplicates.
> -        */
>         for (i = 0; i < a_renames->nr; i++) {
>                 sre = a_renames->items[i].util;
> -               string_list_insert(&a_by_dst, sre->pair->two->path)->util
> +               string_list_append(&a_by_dst, sre->pair->two->path)->util
>                         = (void *)sre;
>         }
>         for (i = 0; i < b_renames->nr; i++) {
>                 sre = b_renames->items[i].util;
> -               string_list_insert(&b_by_dst, sre->pair->two->path)->util
> +               string_list_append(&b_by_dst, sre->pair->two->path)->util
>                         = (void *)sre;
>         }
> +       string_list_sort(&a_by_dst);
> +       string_list_sort(&b_by_dst);

If the original source had duplicates, this would change behavior (the
insert function checks for duplicates while append does not).
Granted, the comment above the block points out why there aren't
duplicates, but will that be obvious to future readers now that you've
removed the whole comment?

Also, are the sources already sorted?  If so, we can avoid the manual
sort calls at the end, and drop this from O(n log n) to O(n).  Digging
through the code...it appears these are setup in get_renames() and are
sorted but by pair->one->path rather than pair->two->path, so we do
need the sorts here.

Of course, get_renames() itself utilizes string_list_insert() rather
than string_list_append. and there are a number of other
string_list_insert calls in the code (though some of the others might
be hard to restructure) -- perhaps the first line of your commit
message should have a "in process_renames" qualifier, since it's only
addressing one case?

Anyway, other than perhaps tweaking the first line of the commit
message, and not removing the whole comment, the patch looks good to
me.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 0/2] merge-recursive: optimize time complexity
  2025-02-11 19:43 [GSoC][PATCH] merge-recursive: optimize string_list construction Meet Soni
  2025-02-11 20:59 ` Elijah Newren
@ 2025-02-13  9:00 ` Meet Soni
  2025-02-13  9:00   ` [RFC PATCH 1/2] merge-recursive: optimize time complexity for process_renames Meet Soni
                     ` (4 more replies)
  1 sibling, 5 replies; 17+ messages in thread
From: Meet Soni @ 2025-02-13  9:00 UTC (permalink / raw)
  To: git; +Cc: Meet Soni

changes in this version:
    - Updated comment and commit message as per review.
    - Added another commit implementing optimization logic.
    - added an RFC tag since, if the changes in 2nd commit are
      appropriate, we can apply similar logic in other places as
      well.

Meet Soni (2):
  merge-recursive: optimize time complexity for process_renames
  merge-recursive: optimize time complexity for get_unmerged

 merge-recursive.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

Range-diff:
1:  ec96e4010e ! 1:  c7dca6e971 merge-recursive: optimize string_list construction
    @@ Metadata
     Author: Meet Soni <meetsoni3017@gmail.com>
     
      ## Commit message ##
    -    merge-recursive: optimize string_list construction
    +    merge-recursive: optimize time complexity for process_renames
     
    -    Avoid O(n^2) complexity when building a sorted `string_list` by
    -    constructing it unsorted and sorting it afterward, reducing the
    -    complexity to O(n log n).
    +    Avoid O(n^2) complexity in `process_renames()` when building a sorted
    +    `string_list` by constructing it unsorted and sorting it afterward,
    +    reducing the complexity to O(n log n).
     
         Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
     
      ## merge-recursive.c ##
     @@ merge-recursive.c: static int process_renames(struct merge_options *opt,
    - 	struct string_list b_by_dst = STRING_LIST_INIT_NODUP;
      	const struct rename *sre;
      
    --	/*
    + 	/*
     -	 * FIXME: As string-list.h notes, it's O(n^2) to build a sorted
     -	 * string_list one-by-one, but O(n log n) to build it unsorted and
     -	 * then sort it.  Note that as we build the list, we do not need to
     -	 * check if the existing destination path is already in the list,
     -	 * because the structure of diffcore_rename guarantees we won't
     -	 * have duplicates.
    --	 */
    ++	 * Note that as we build the list, we do not need to check if the
    ++	 * existing destination path is already in the list, because the
    ++	 * structure of diffcore_rename guarantees we won't have duplicates.
    + 	 */
      	for (i = 0; i < a_renames->nr; i++) {
      		sre = a_renames->items[i].util;
     -		string_list_insert(&a_by_dst, sre->pair->two->path)->util
-:  ---------- > 2:  78a007be7d merge-recursive: optimize time complexity for get_unmerged
-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 1/2] merge-recursive: optimize time complexity for process_renames
  2025-02-13  9:00 ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
@ 2025-02-13  9:00   ` Meet Soni
  2025-02-13 17:06     ` Elijah Newren
  2025-02-13  9:00   ` [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged Meet Soni
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Meet Soni @ 2025-02-13  9:00 UTC (permalink / raw)
  To: git; +Cc: Meet Soni, Elijah Newren, Derrick Stolee, Junio C Hamano

Avoid O(n^2) complexity in `process_renames()` when building a sorted
`string_list` by constructing it unsorted and sorting it afterward,
reducing the complexity to O(n log n).

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
---
 merge-recursive.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 5dfaf32b2c..884ccf99a5 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2758,23 +2758,22 @@ static int process_renames(struct merge_options *opt,
 	const struct rename *sre;
 
 	/*
-	 * FIXME: As string-list.h notes, it's O(n^2) to build a sorted
-	 * string_list one-by-one, but O(n log n) to build it unsorted and
-	 * then sort it.  Note that as we build the list, we do not need to
-	 * check if the existing destination path is already in the list,
-	 * because the structure of diffcore_rename guarantees we won't
-	 * have duplicates.
+	 * Note that as we build the list, we do not need to check if the
+	 * existing destination path is already in the list, because the
+	 * structure of diffcore_rename guarantees we won't have duplicates.
 	 */
 	for (i = 0; i < a_renames->nr; i++) {
 		sre = a_renames->items[i].util;
-		string_list_insert(&a_by_dst, sre->pair->two->path)->util
+		string_list_append(&a_by_dst, sre->pair->two->path)->util
 			= (void *)sre;
 	}
 	for (i = 0; i < b_renames->nr; i++) {
 		sre = b_renames->items[i].util;
-		string_list_insert(&b_by_dst, sre->pair->two->path)->util
+		string_list_append(&b_by_dst, sre->pair->two->path)->util
 			= (void *)sre;
 	}
+	string_list_sort(&a_by_dst);
+	string_list_sort(&b_by_dst);
 
 	for (i = 0, j = 0; i < a_renames->nr || j < b_renames->nr;) {
 		struct string_list *renames1, *renames2Dst;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-13  9:00 ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
  2025-02-13  9:00   ` [RFC PATCH 1/2] merge-recursive: optimize time complexity for process_renames Meet Soni
@ 2025-02-13  9:00   ` Meet Soni
  2025-02-13 17:11     ` Elijah Newren
  2025-02-13 11:02   ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Meet Soni @ 2025-02-13  9:00 UTC (permalink / raw)
  To: git; +Cc: Meet Soni, Junio C Hamano

Previously, `get_unmerged()` used `string_list_insert()`, which has an
O(n^2) complexity due to shifting elements on each insertion. It also
called `string_list_lookup()` before insertion, which performs a binary
search in O(log n). This combination made insertion costly, especially
for large index states, as each new entry required both a search and
potentially shifting many elements.

Replace `string_list_insert()` with `string_list_append()` to achieve
O(n) insertion. After all entries are added, sort the list in O(n log n)
and remove duplicates in O(n), reducing the overall complexity to
O(n log n). This improves performance significantly for large datasets
while maintaining correctness.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
---
 merge-recursive.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 884ccf99a5..6165993429 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -547,15 +547,15 @@ static struct string_list *get_unmerged(struct index_state *istate)
 		if (!ce_stage(ce))
 			continue;

-		item = string_list_lookup(unmerged, ce->name);
-		if (!item) {
-			item = string_list_insert(unmerged, ce->name);
-			item->util = xcalloc(1, sizeof(struct stage_data));
-		}
+		item = string_list_append(unmerged, ce->name);
+		item->util = xcalloc(1, sizeof(struct stage_data));
+
 		e = item->util;
 		e->stages[ce_stage(ce)].mode = ce->ce_mode;
 		oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid);
 	}
+	string_list_sort(unmerged);
+	string_list_remove_duplicates(unmerged, 1);

 	return unmerged;
 }
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/2] merge-recursive: optimize time complexity
  2025-02-13  9:00 ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
  2025-02-13  9:00   ` [RFC PATCH 1/2] merge-recursive: optimize time complexity for process_renames Meet Soni
  2025-02-13  9:00   ` [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged Meet Soni
@ 2025-02-13 11:02   ` Meet Soni
  2025-02-13 18:30   ` Elijah Newren
  2025-02-14  4:41   ` [GSoC][PATCH v2] merge-recursive: optimize time complexity for process_renames Meet Soni
  4 siblings, 0 replies; 17+ messages in thread
From: Meet Soni @ 2025-02-13 11:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Elijah Newren, Derrick Stolee, Jeff King

Hi everyone.

Adding the missing CCs who were unintentionally dropped in my original email.
Please refer to the previous message for the patch details.
Link to the patch:
https://lore.kernel.org/git/20250213090040.16133-1-meetsoni3017@gmail.com/

Thanks,
Meet

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/2] merge-recursive: optimize time complexity for process_renames
  2025-02-13  9:00   ` [RFC PATCH 1/2] merge-recursive: optimize time complexity for process_renames Meet Soni
@ 2025-02-13 17:06     ` Elijah Newren
  0 siblings, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2025-02-13 17:06 UTC (permalink / raw)
  To: Meet Soni; +Cc: git, Derrick Stolee, Junio C Hamano

On Thu, Feb 13, 2025 at 1:00 AM Meet Soni <meetsoni3017@gmail.com> wrote:
>
> Avoid O(n^2) complexity in `process_renames()` when building a sorted
> `string_list` by constructing it unsorted and sorting it afterward,
> reducing the complexity to O(n log n).
>
> Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
> ---
>  merge-recursive.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 5dfaf32b2c..884ccf99a5 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -2758,23 +2758,22 @@ static int process_renames(struct merge_options *opt,
>         const struct rename *sre;
>
>         /*
> -        * FIXME: As string-list.h notes, it's O(n^2) to build a sorted
> -        * string_list one-by-one, but O(n log n) to build it unsorted and
> -        * then sort it.  Note that as we build the list, we do not need to
> -        * check if the existing destination path is already in the list,
> -        * because the structure of diffcore_rename guarantees we won't
> -        * have duplicates.
> +        * Note that as we build the list, we do not need to check if the
> +        * existing destination path is already in the list, because the
> +        * structure of diffcore_rename guarantees we won't have duplicates.
>          */
>         for (i = 0; i < a_renames->nr; i++) {
>                 sre = a_renames->items[i].util;
> -               string_list_insert(&a_by_dst, sre->pair->two->path)->util
> +               string_list_append(&a_by_dst, sre->pair->two->path)->util
>                         = (void *)sre;
>         }
>         for (i = 0; i < b_renames->nr; i++) {
>                 sre = b_renames->items[i].util;
> -               string_list_insert(&b_by_dst, sre->pair->two->path)->util
> +               string_list_append(&b_by_dst, sre->pair->two->path)->util
>                         = (void *)sre;
>         }
> +       string_list_sort(&a_by_dst);
> +       string_list_sort(&b_by_dst);
>
>         for (i = 0, j = 0; i < a_renames->nr || j < b_renames->nr;) {
>                 struct string_list *renames1, *renames2Dst;
> --
> 2.34.1

This version looks good to me.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-13  9:00   ` [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged Meet Soni
@ 2025-02-13 17:11     ` Elijah Newren
  2025-02-13 18:30       ` Junio C Hamano
  2025-02-14  4:28       ` Meet Soni
  0 siblings, 2 replies; 17+ messages in thread
From: Elijah Newren @ 2025-02-13 17:11 UTC (permalink / raw)
  To: Meet Soni; +Cc: git, Junio C Hamano

On Thu, Feb 13, 2025 at 1:01 AM Meet Soni <meetsoni3017@gmail.com> wrote:
>
> Previously, `get_unmerged()` used `string_list_insert()`, which has an
> O(n^2) complexity due to shifting elements on each insertion. It also
> called `string_list_lookup()` before insertion, which performs a binary
> search in O(log n).

Okay.

> This combination made insertion costly, especially
> for large index states, as each new entry required both a search and
> potentially shifting many elements.

Why does the combination make it costly?  O(log n) + O(n^2) is still
O(n^2), so I don't see why it matters to mention the combination.
Could you clarify?

Also, does it actually make it costly, or do you only suspect that it
does?  O(n^2) worst case sometimes behaves O(n) or O(n log n) in some
cases.  Since your commit message says "made insertion costly" instead
of "might make insertion costly", I think that would suggest you have
some performance numbers to back this up on some interesting real
world repository.  Do you?  Can you share them?

> Replace `string_list_insert()` with `string_list_append()` to achieve
> O(n) insertion. After all entries are added, sort the list in O(n log n)
> and remove duplicates in O(n), reducing the overall complexity to
> O(n log n).

Okay.

> This improves performance significantly for large datasets

That's a big claim; it may be true, but without evidence I don't
believe it for three reasons : (1) n here is the number of conflicts,
not the number of files in the repo or the number of lines being
merged.  Thus, n is typically small.  (2) Other O(n^2) behavior in
merge-recursive likely drowns this particular codepath out, so any
gains here just aren't going to be noticed, (3) After looking at the
code and knowing the specialized structure of the index, I think that
while string_list_insert() for n items in general is going to be
O(n^2), it will likely functionally be O(n log n) for this particular
code path, meaning you haven't actually improved the performance.

> while maintaining correctness.

More on that below.

> Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
> ---
>  merge-recursive.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 884ccf99a5..6165993429 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -547,15 +547,15 @@ static struct string_list *get_unmerged(struct index_state *istate)
>                 if (!ce_stage(ce))
>                         continue;
>
> -               item = string_list_lookup(unmerged, ce->name);
> -               if (!item) {
> -                       item = string_list_insert(unmerged, ce->name);
> -                       item->util = xcalloc(1, sizeof(struct stage_data));
> -               }
> +               item = string_list_append(unmerged, ce->name);
> +               item->util = xcalloc(1, sizeof(struct stage_data));
> +
>                 e = item->util;
>                 e->stages[ce_stage(ce)].mode = ce->ce_mode;
>                 oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid);

Did you run any tests?  I'm not sure you maintained correctness here.

>         }
> +       string_list_sort(unmerged);
> +       string_list_remove_duplicates(unmerged, 1);
>
>         return unmerged;
>  }
> --
> 2.34.1

(As a side note, due to the specialized structure of the input, I
suspect this code could be modified to run in O(n), i.e. we could skip
the string_list_lookup and the string_list_sort and the
string_list_remove_duplicates...  But, it'd make the code trickier, so
it'd need to be carefully commented, the change would need to be
justified, and it'd need to be carefully tested.  Even if we weren't
planning to delete this entire file, I suspect it's not possible to
find a case justifying such a change without optimizing several other
things in merge-recursive first, but optimizing those things probably
results in a significant rewrite...which we've already done with
merge-ort.)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-13 17:11     ` Elijah Newren
@ 2025-02-13 18:30       ` Junio C Hamano
  2025-02-13 18:45         ` Elijah Newren
  2025-02-14  4:28       ` Meet Soni
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2025-02-13 18:30 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Meet Soni, git

Elijah Newren <newren@gmail.com> writes:

> (As a side note, due to the specialized structure of the input, I
> suspect this code could be modified to run in O(n), i.e. we could skip
> the string_list_lookup and the string_list_sort and the
> string_list_remove_duplicates...

Are you talking about the input being already sorted so we can just
walk the multiple input and merge them into a single stream?  In the
cost analysis you did earlier in the message I am responding to,
being able to go down to O(n) sounds really like a great thing ;-)

> But, it'd make the code trickier, so
> it'd need to be carefully commented, the change would need to be
> justified, and it'd need to be carefully tested.  

... and measured.

> Even if we weren't
> planning to delete this entire file, I suspect it's not possible to
> find a case justifying such a change without optimizing several other
> things in merge-recursive first, but optimizing those things probably
> results in a significant rewrite...which we've already done with
> merge-ort.)

Sounds like unless the performance issues are shared between the
two, it may not be worth to spend too much brain cycles only on the
"recursive" one?

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/2] merge-recursive: optimize time complexity
  2025-02-13  9:00 ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
                     ` (2 preceding siblings ...)
  2025-02-13 11:02   ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
@ 2025-02-13 18:30   ` Elijah Newren
  2025-02-14  4:41   ` [GSoC][PATCH v2] merge-recursive: optimize time complexity for process_renames Meet Soni
  4 siblings, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2025-02-13 18:30 UTC (permalink / raw)
  To: Meet Soni; +Cc: git

Hi,

On Thu, Feb 13, 2025 at 1:01 AM Meet Soni <meetsoni3017@gmail.com> wrote:
>
> changes in this version:
>     - Updated comment and commit message as per review.
>     - Added another commit implementing optimization logic.
>     - added an RFC tag since, if the changes in 2nd commit are
>       appropriate, we can apply similar logic in other places as
>       well.

The 1st patch looks good.  The 2nd appears to have some problems, as
per comments I left on it -- it might be easier to drop the second
patch and just apply the first.  I don't think merge-recursive is
worth putting much effort into (there's value in providing feedback on
patches by new contributors, because new contributors are valuable,
but there's really not much value in tweaking this particular file),
so I'd advise against adding more patches to this series that
transform more of merge-recursive.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-13 18:30       ` Junio C Hamano
@ 2025-02-13 18:45         ` Elijah Newren
  0 siblings, 0 replies; 17+ messages in thread
From: Elijah Newren @ 2025-02-13 18:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Meet Soni, git

On Thu, Feb 13, 2025 at 10:30 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > (As a side note, due to the specialized structure of the input, I
> > suspect this code could be modified to run in O(n), i.e. we could skip
> > the string_list_lookup and the string_list_sort and the
> > string_list_remove_duplicates...
>
> Are you talking about the input being already sorted so we can just
> walk the multiple input and merge them into a single stream?  In the

I'm not sure what you mean by "merge them into a single stream".  I
think you have the right idea that we are creating a string list of
information about unmerged entries, and since we're taking information
from the index which is already sorted, we can just either modify the
last entry in the list if it matches or append a new entry to it; no
need to walk, insert, or binary search the list at all.

> cost analysis you did earlier in the message I am responding to,
> being able to go down to O(n) sounds really like a great thing ;-)

Note first that we aren't going from O(n^2) -> O(n), we're only going
from O(n log n) -> O(n).  That's still great, but:

  * n is typically pretty small (number of unmerged files)
  * there's things in merge-recursive that are O(m^2), where typically
m >> n (number of files in repo, or number of lines in big files in
the repo)
  * merge-recursive is used by almost no one
  * we are planning to delete merge-recursive

So, although O(n) is great....

> > But, it'd make the code trickier, so
> > it'd need to be carefully commented, the change would need to be
> > justified, and it'd need to be carefully tested.
>
> ... and measured.

+1

> > Even if we weren't
> > planning to delete this entire file, I suspect it's not possible to
> > find a case justifying such a change without optimizing several other
> > things in merge-recursive first, but optimizing those things probably
> > results in a significant rewrite...which we've already done with
> > merge-ort.)
>
> Sounds like unless the performance issues are shared between the
> two, it may not be worth to spend too much brain cycles only on the
> "recursive" one?

...yep, exactly, and this is not a performance issue shared with the
ort backend; it's unique to the recursive one.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-13 17:11     ` Elijah Newren
  2025-02-13 18:30       ` Junio C Hamano
@ 2025-02-14  4:28       ` Meet Soni
  2025-02-14  6:04         ` Elijah Newren
  1 sibling, 1 reply; 17+ messages in thread
From: Meet Soni @ 2025-02-14  4:28 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Junio C Hamano

On Thu, 13 Feb 2025 at 22:41, Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, Feb 13, 2025 at 1:01 AM Meet Soni <meetsoni3017@gmail.com> wrote:
> >
> > Previously, `get_unmerged()` used `string_list_insert()`, which has an
> > O(n^2) complexity due to shifting elements on each insertion. It also
> > called `string_list_lookup()` before insertion, which performs a binary
> > search in O(log n).
>
> Okay.
>
> > This combination made insertion costly, especially
> > for large index states, as each new entry required both a search and
> > potentially shifting many elements.
>
> Why does the combination make it costly?  O(log n) + O(n^2) is still
> O(n^2), so I don't see why it matters to mention the combination.
> Could you clarify?
>
> Also, does it actually make it costly, or do you only suspect that it
> does?  O(n^2) worst case sometimes behaves O(n) or O(n log n) in some
> cases.  Since your commit message says "made insertion costly" instead
> of "might make insertion costly", I think that would suggest you have
> some performance numbers to back this up on some interesting real
> world repository.  Do you?  Can you share them?
>
Sorry, I should've specified, this patch is purely theoretical, I was
aiming for a trial
and error kind of approach.

> > Replace `string_list_insert()` with `string_list_append()` to achieve
> > O(n) insertion. After all entries are added, sort the list in O(n log n)
> > and remove duplicates in O(n), reducing the overall complexity to
> > O(n log n).
>
> Okay.
>
> > This improves performance significantly for large datasets
>
> That's a big claim; it may be true, but without evidence I don't
> believe it for three reasons : (1) n here is the number of conflicts,
> not the number of files in the repo or the number of lines being
> merged.  Thus, n is typically small.  (2) Other O(n^2) behavior in
> merge-recursive likely drowns this particular codepath out, so any
> gains here just aren't going to be noticed, (3) After looking at the
> code and knowing the specialized structure of the index, I think that
> while string_list_insert() for n items in general is going to be
> O(n^2), it will likely functionally be O(n log n) for this particular
> code path, meaning you haven't actually improved the performance.
>
> > while maintaining correctness.
>
> More on that below.
>
>
> > Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
> > ---
> >  merge-recursive.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/merge-recursive.c b/merge-recursive.c
> > index 884ccf99a5..6165993429 100644
> > --- a/merge-recursive.c
> > +++ b/merge-recursive.c
> > @@ -547,15 +547,15 @@ static struct string_list *get_unmerged(struct index_state *istate)
> >                 if (!ce_stage(ce))
> >                         continue;
> >
> > -               item = string_list_lookup(unmerged, ce->name);
> > -               if (!item) {
> > -                       item = string_list_insert(unmerged, ce->name);
> > -                       item->util = xcalloc(1, sizeof(struct stage_data));
> > -               }
> > +               item = string_list_append(unmerged, ce->name);
> > +               item->util = xcalloc(1, sizeof(struct stage_data));
> > +
> >                 e = item->util;
> >                 e->stages[ce_stage(ce)].mode = ce->ce_mode;
> >                 oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid);
>
> Did you run any tests?  I'm not sure you maintained correctness here.

I didn't run any tests -- I wanted to, but I wasn’t sure how to do it
for this change. Since you suggested dropping this patch from the
series, I’ll do that. But for similar changes in the future, how should I go
about testing them?
>
> >         }
> > +       string_list_sort(unmerged);
> > +       string_list_remove_duplicates(unmerged, 1);
> >
> >         return unmerged;
> >  }
> > --
> > 2.34.1
>
> (As a side note, due to the specialized structure of the input, I
> suspect this code could be modified to run in O(n), i.e. we could skip
> the string_list_lookup and the string_list_sort and the
> string_list_remove_duplicates...  But, it'd make the code trickier, so
> it'd need to be carefully commented, the change would need to be
> justified, and it'd need to be carefully tested.  Even if we weren't
> planning to delete this entire file, I suspect it's not possible to
> find a case justifying such a change without optimizing several other
> things in merge-recursive first, but optimizing those things probably
> results in a significant rewrite...which we've already done with
> merge-ort.)
Makes sense.

Thankyou for reviewing,
Meet

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [GSoC][PATCH v2] merge-recursive: optimize time complexity for process_renames
  2025-02-13  9:00 ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
                     ` (3 preceding siblings ...)
  2025-02-13 18:30   ` Elijah Newren
@ 2025-02-14  4:41   ` Meet Soni
  4 siblings, 0 replies; 17+ messages in thread
From: Meet Soni @ 2025-02-14  4:41 UTC (permalink / raw)
  To: git; +Cc: peff, Meet Soni, Elijah Newren, Junio C Hamano, Derrick Stolee

Avoid O(n^2) complexity in `process_renames()` when building a sorted
`string_list` by constructing it unsorted and sorting it afterward,
reducing the complexity to O(n log n).

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
---
Range-diff against v1:
1:  c7dca6e971 = 1:  c7dca6e971 merge-recursive: optimize time complexity for process_renames
2:  78a007be7d < -:  ---------- merge-recursive: optimize time complexity for get_unmerged

 merge-recursive.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 5dfaf32b2c..884ccf99a5 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2758,23 +2758,22 @@ static int process_renames(struct merge_options *opt,
 	const struct rename *sre;
 
 	/*
-	 * FIXME: As string-list.h notes, it's O(n^2) to build a sorted
-	 * string_list one-by-one, but O(n log n) to build it unsorted and
-	 * then sort it.  Note that as we build the list, we do not need to
-	 * check if the existing destination path is already in the list,
-	 * because the structure of diffcore_rename guarantees we won't
-	 * have duplicates.
+	 * Note that as we build the list, we do not need to check if the
+	 * existing destination path is already in the list, because the
+	 * structure of diffcore_rename guarantees we won't have duplicates.
 	 */
 	for (i = 0; i < a_renames->nr; i++) {
 		sre = a_renames->items[i].util;
-		string_list_insert(&a_by_dst, sre->pair->two->path)->util
+		string_list_append(&a_by_dst, sre->pair->two->path)->util
 			= (void *)sre;
 	}
 	for (i = 0; i < b_renames->nr; i++) {
 		sre = b_renames->items[i].util;
-		string_list_insert(&b_by_dst, sre->pair->two->path)->util
+		string_list_append(&b_by_dst, sre->pair->two->path)->util
 			= (void *)sre;
 	}
+	string_list_sort(&a_by_dst);
+	string_list_sort(&b_by_dst);
 
 	for (i = 0, j = 0; i < a_renames->nr || j < b_renames->nr;) {
 		struct string_list *renames1, *renames2Dst;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-14  4:28       ` Meet Soni
@ 2025-02-14  6:04         ` Elijah Newren
  2025-02-14  8:24           ` Meet Soni
  0 siblings, 1 reply; 17+ messages in thread
From: Elijah Newren @ 2025-02-14  6:04 UTC (permalink / raw)
  To: Meet Soni; +Cc: git, Junio C Hamano

On Thu, Feb 13, 2025 at 8:28 PM Meet Soni <meetsoni3017@gmail.com> wrote:
>
> On Thu, 13 Feb 2025 at 22:41, Elijah Newren <newren@gmail.com> wrote:
> >
> > On Thu, Feb 13, 2025 at 1:01 AM Meet Soni <meetsoni3017@gmail.com> wrote:
...
> > > diff --git a/merge-recursive.c b/merge-recursive.c
> > > index 884ccf99a5..6165993429 100644
> > > --- a/merge-recursive.c
> > > +++ b/merge-recursive.c
> > > @@ -547,15 +547,15 @@ static struct string_list *get_unmerged(struct index_state *istate)
> > >                 if (!ce_stage(ce))
> > >                         continue;
> > >
> > > -               item = string_list_lookup(unmerged, ce->name);
> > > -               if (!item) {
> > > -                       item = string_list_insert(unmerged, ce->name);
> > > -                       item->util = xcalloc(1, sizeof(struct stage_data));
> > > -               }
> > > +               item = string_list_append(unmerged, ce->name);
> > > +               item->util = xcalloc(1, sizeof(struct stage_data));
> > > +
> > >                 e = item->util;
> > >                 e->stages[ce_stage(ce)].mode = ce->ce_mode;
> > >                 oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid);
> >
> > Did you run any tests?  I'm not sure you maintained correctness here.
>
> I didn't run any tests -- I wanted to, but I wasn’t sure how to do it
> for this change. Since you suggested dropping this patch from the
> series, I’ll do that. But for similar changes in the future, how should I go
> about testing them?

As per Documentation/CodingGuidelines: "After any code change, make
sure that the entire test suite passes."  You can do that by running:
    cd t && make
(You probably want to also run that before making any changes, just to
verify that they all pass for you.  Then, if any test fails after you
make changes, you know it's because of your changes rather than
because you missed something in building or setting up the tests.)


And although it doesn't matter since we're dropping this patch, the
issue I noticed was that if there were, say, three unmerged entries
with the same path, the original code would create one entry in the
string list and modify it 3 times (each with a different ce_stage(ce).
Your modification would create three different entries (each with only
information from one stage) and drop two of them, meaning we no longer
have a single string_list_item that contains information from all 3
unmerged entries for the same path.  I'm pretty sure running the
existing tests would catch that kind of bug, which is what raised the
question.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-14  6:04         ` Elijah Newren
@ 2025-02-14  8:24           ` Meet Soni
  2025-02-14 19:00             ` Elijah Newren
  0 siblings, 1 reply; 17+ messages in thread
From: Meet Soni @ 2025-02-14  8:24 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Junio C Hamano

On Fri, 14 Feb 2025 at 11:35, Elijah Newren <newren@gmail.com> wrote:
>
> > > Did you run any tests?  I'm not sure you maintained correctness here.
> >
> > I didn't run any tests -- I wanted to, but I wasn’t sure how to do it
> > for this change. Since you suggested dropping this patch from the
> > series, I’ll do that. But for similar changes in the future, how should I go
> > about testing them?
>
> As per Documentation/CodingGuidelines: "After any code change, make
> sure that the entire test suite passes."  You can do that by running:
>     cd t && make
> (You probably want to also run that before making any changes, just to
> verify that they all pass for you.  Then, if any test fails after you
> make changes, you know it's because of your changes rather than
> because you missed something in building or setting up the tests.)
>
>
> And although it doesn't matter since we're dropping this patch, the
> issue I noticed was that if there were, say, three unmerged entries
> with the same path, the original code would create one entry in the
> string list and modify it 3 times (each with a different ce_stage(ce).
> Your modification would create three different entries (each with only
> information from one stage) and drop two of them, meaning we no longer
> have a single string_list_item that contains information from all 3
> unmerged entries for the same path.  I'm pretty sure running the
> existing tests would catch that kind of bug, which is what raised the
> question.

That's the thing -- I did run make in the t/ directory, and it passed. I was
just wondering if there's any other way to test this in isolation, in case
I want to verify such changes more directly in the future.

Thanks for the clarification!
Meet

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-14  8:24           ` Meet Soni
@ 2025-02-14 19:00             ` Elijah Newren
  2025-02-15  8:42               ` Meet Soni
  0 siblings, 1 reply; 17+ messages in thread
From: Elijah Newren @ 2025-02-14 19:00 UTC (permalink / raw)
  To: Meet Soni; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2025 at 12:24 AM Meet Soni <meetsoni3017@gmail.com> wrote:
>
> On Fri, 14 Feb 2025 at 11:35, Elijah Newren <newren@gmail.com> wrote:
> >
> > > > Did you run any tests?  I'm not sure you maintained correctness here.
> > >
> > > I didn't run any tests -- I wanted to, but I wasn’t sure how to do it
> > > for this change. Since you suggested dropping this patch from the
> > > series, I’ll do that. But for similar changes in the future, how should I go
> > > about testing them?
> >
> > As per Documentation/CodingGuidelines: "After any code change, make
> > sure that the entire test suite passes."  You can do that by running:
> >     cd t && make
> > (You probably want to also run that before making any changes, just to
> > verify that they all pass for you.  Then, if any test fails after you
> > make changes, you know it's because of your changes rather than
> > because you missed something in building or setting up the tests.)
> >
> >
> > And although it doesn't matter since we're dropping this patch, the
> > issue I noticed was that if there were, say, three unmerged entries
> > with the same path, the original code would create one entry in the
> > string list and modify it 3 times (each with a different ce_stage(ce).
> > Your modification would create three different entries (each with only
> > information from one stage) and drop two of them, meaning we no longer
> > have a single string_list_item that contains information from all 3
> > unmerged entries for the same path.  I'm pretty sure running the
> > existing tests would catch that kind of bug, which is what raised the
> > question.
>
> That's the thing -- I did run make in the t/ directory, and it passed. I was
> just wondering if there's any other way to test this in isolation, in case
> I want to verify such changes more directly in the future.

Really?  Did you rebuild the code, after making your changes?  You may
have been running with a pre-changes version of the code.

I just applied your changes and ran the tests.  I see it fail as soon
as it gets to t1004.

$ cd t && make test
[... lots of output snipped ...]
*** t1004-read-tree-m-u-wf.sh ***
ok 1 - two-way setup
ok 2 - two-way not clobbering
ok 3 - two-way with incorrect --exclude-per-directory (1)
ok 4 - two-way with incorrect --exclude-per-directory (2)
ok 5 - two-way clobbering a ignored file
ok 6 - three-way not complaining on an untracked path in both
ok 7 - three-way not clobbering a working tree file
ok 8 - three-way not complaining on an untracked file
ok 9 - 3-way not overwriting local changes (setup)
ok 10 - 3-way not overwriting local changes (our side)
ok 11 - 3-way not overwriting local changes (their side)
ok 12 - funny symlink in work tree
ok 13 - funny symlink in work tree, un-unlink-able
ok 14 - D/F setup
ok 15 - D/F
ok 16 - D/F resolve
not ok 17 - D/F recursive
#
#
#        git reset --hard &&
#        git checkout side-b &&
#        git merge-recursive branch-point -- side-b side-a
#
#
# failed 1 among 17 test(s)
1..17
make[1]: *** [Makefile:77: t1004-read-tree-m-u-wf.sh] Error 1
make[1]: Leaving directory '/home/newren/floss/git/t'
make: *** [Makefile:63: test] Error 2


...and if go to the toplevel directory and run under prove so I can
see all the failures (and run the test suites in parallel), I see:

$ cd .. && make DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS='--timer
--state failed,slow,save --jobs 12' test
[... lots of output snipped ...]
Test Summary Report
-------------------
t3424-rebase-empty.sh                            (Wstat: 256 Tests: 20
Failed: 18)
  Failed tests:  3-20
  Non-zero exit status: 1
t3436-rebase-more-options.sh                     (Wstat: 256 Tests: 19
Failed: 17)
  Failed tests:  2-18
  Non-zero exit status: 1
t4151-am-abort.sh                                (Wstat: 256 Tests: 20
Failed: 12)
  Failed tests:  5-9, 12-16, 19-20
  Non-zero exit status: 1
t3407-rebase-abort.sh                            (Wstat: 256 Tests: 17
Failed: 8)
  Failed tests:  2-9
  Non-zero exit status: 1
t3428-rebase-signoff.sh                          (Wstat: 256 Tests: 7 Failed: 5)
  Failed tests:  2, 4-7
  Non-zero exit status: 1
t6409-merge-subtree.sh                           (Wstat: 256 Tests: 12
Failed: 5)
  Failed tests:  2-6
  Non-zero exit status: 1
t7102-reset.sh                                   (Wstat: 256 Tests: 38
Failed: 7)
  Failed tests:  14-20
  Non-zero exit status: 1
t6432-merge-recursive-space-options.sh           (Wstat: 256 Tests: 11
Failed: 4)
  Failed tests:  2, 7-8, 11
  Non-zero exit status: 1
t6430-merge-recursive.sh                         (Wstat: 256 Tests: 37
Failed: 15)
  Failed tests:  10-11, 13-20, 22-24, 28-29
  Non-zero exit status: 1
t3406-rebase-message.sh                          (Wstat: 256 Tests: 32
Failed: 8)
  Failed tests:  22, 24-27, 29-31
  Non-zero exit status: 1
t4200-rerere.sh                                  (Wstat: 256 Tests: 36
Failed: 5)
  Failed tests:  24-28
  Non-zero exit status: 1
t7201-co.sh                                      (Wstat: 256 Tests: 46
Failed: 5)
  Failed tests:  5-9
  Non-zero exit status: 1
t3418-rebase-continue.sh                         (Wstat: 256 Tests: 29
Failed: 7)
  Failed tests:  4, 6, 10-12, 26-27
  Non-zero exit status: 1
t3403-rebase-skip.sh                             (Wstat: 256 Tests: 20
Failed: 3)
  Failed tests:  2, 4, 9
  Non-zero exit status: 1
t4253-am-keep-cr-dos.sh                          (Wstat: 256 Tests: 7 Failed: 2)
  Failed tests:  6-7
  Non-zero exit status: 1
t9903-bash-prompt.sh                             (Wstat: 256 Tests: 67
Failed: 39)
  Failed tests:  16-31, 33-35, 37, 40-44, 46-52, 55-58, 60
                62, 67
  Non-zero exit status: 1
t3503-cherry-pick-root.sh                        (Wstat: 256 Tests: 6 Failed: 2)
  Failed tests:  5-6
  Non-zero exit status: 1
t3401-rebase-and-am-rename.sh                    (Wstat: 256 Tests: 10
Failed: 2)
  Failed tests:  4, 10
  Non-zero exit status: 1
t2407-worktree-heads.sh                          (Wstat: 256 Tests: 12
Failed: 2)
  Failed tests:  4-5
  Non-zero exit status: 1
t5407-post-rewrite-hook.sh                       (Wstat: 256 Tests: 17
Failed: 3)
  Failed tests:  4-6
  Non-zero exit status: 1
t2500-untracked-overwriting.sh                   (Wstat: 256 Tests: 10
Failed: 2)
  Failed tests:  9-10
  Non-zero exit status: 1
t4153-am-resume-override-opts.sh                 (Wstat: 256 Tests: 6 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t1015-read-index-unmerged.sh                     (Wstat: 256 Tests: 6 Failed: 1)
  Failed test:  6
  Non-zero exit status: 1
t3509-cherry-pick-merge-df.sh                    (Wstat: 256 Tests: 9 Failed: 1)
  Failed test:  9
  Non-zero exit status: 1
t2023-checkout-m.sh                              (Wstat: 256 Tests: 5 Failed: 1)
  Failed test:  5
  Non-zero exit status: 1
t7615-diff-algo-with-mergy-operations.sh         (Wstat: 256 Tests: 7 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
t6427-diff3-conflict-markers.sh                  (Wstat: 256 Tests: 9 Failed: 1)
  Failed test:  8
  Non-zero exit status: 1
t1004-read-tree-m-u-wf.sh                        (Wstat: 256 Tests: 17
Failed: 1)
  Failed test:  17
  Non-zero exit status: 1
t3420-rebase-autostash.sh                        (Wstat: 256 Tests: 52
Failed: 10)
  Failed tests:  11-17, 21-23
  Non-zero exit status: 1
t4150-am.sh                                      (Wstat: 256 Tests: 87
Failed: 33)
  Failed tests:  34-40, 42-46, 48, 50-54, 57-62, 64-65, 67-71
                75, 87
  Non-zero exit status: 1
t7512-status-help.sh                             (Wstat: 256 Tests: 46
Failed: 3)
  Failed tests:  5-6, 29
  Non-zero exit status: 1
t3400-rebase.sh                                  (Wstat: 256 Tests: 39
Failed: 1)
  Failed test:  30
  Non-zero exit status: 1
t3404-rebase-interactive.sh                      (Wstat: 256 Tests:
131 Failed: 1)
  Failed test:  80
  Non-zero exit status: 1
Files=1031, Tests=30662, 70 wallclock secs ( 8.33 usr  2.13 sys +
248.60 cusr 516.60 csys = 775.66 CPU)
Result: FAIL
make[1]: *** [Makefile:73: prove] Error 1
make[1]: Leaving directory '/home/newren/floss/git/t'
make: *** [Makefile:3237: test] Error 2

I suspect this is a case where it was testing a version of git that
you built before making the changes.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged
  2025-02-14 19:00             ` Elijah Newren
@ 2025-02-15  8:42               ` Meet Soni
  0 siblings, 0 replies; 17+ messages in thread
From: Meet Soni @ 2025-02-15  8:42 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Junio C Hamano

On Sat, 15 Feb 2025 at 00:30, Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, Feb 14, 2025 at 12:24 AM Meet Soni <meetsoni3017@gmail.com> wrote:
> > That's the thing -- I did run make in the t/ directory, and it passed. I was
> > just wondering if there's any other way to test this in isolation, in case
> > I want to verify such changes more directly in the future.
>
> Really?  Did you rebuild the code, after making your changes?  You may
> have been running with a pre-changes version of the code.
>
> I just applied your changes and ran the tests.  I see it fail as soon
> as it gets to t1004.
>
> $ cd t && make test
> [... lots of output snipped ...]
> *** t1004-read-tree-m-u-wf.sh ***
> ok 1 - two-way setup
> ok 2 - two-way not clobbering
> ok 3 - two-way with incorrect --exclude-per-directory (1)
> ok 4 - two-way with incorrect --exclude-per-directory (2)
> ok 5 - two-way clobbering a ignored file
> ok 6 - three-way not complaining on an untracked path in both
> ok 7 - three-way not clobbering a working tree file
> ok 8 - three-way not complaining on an untracked file
> ok 9 - 3-way not overwriting local changes (setup)
> ok 10 - 3-way not overwriting local changes (our side)
> ok 11 - 3-way not overwriting local changes (their side)
> ok 12 - funny symlink in work tree
> ok 13 - funny symlink in work tree, un-unlink-able
> ok 14 - D/F setup
> ok 15 - D/F
> ok 16 - D/F resolve
> not ok 17 - D/F recursive
> #
> #
> #        git reset --hard &&
> #        git checkout side-b &&
> #        git merge-recursive branch-point -- side-b side-a
> #
> #
> # failed 1 among 17 test(s)
> 1..17
> make[1]: *** [Makefile:77: t1004-read-tree-m-u-wf.sh] Error 1
> make[1]: Leaving directory '/home/newren/floss/git/t'
> make: *** [Makefile:63: test] Error 2
>
>
> ...and if go to the toplevel directory and run under prove so I can
> see all the failures (and run the test suites in parallel), I see:
>
> $ cd .. && make DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS='--timer
> --state failed,slow,save --jobs 12' test
> [... lots of output snipped ...]
> Test Summary Report
> -------------------
> t3424-rebase-empty.sh                            (Wstat: 256 Tests: 20
> Failed: 18)
>   Failed tests:  3-20
>   Non-zero exit status: 1
> t3436-rebase-more-options.sh                     (Wstat: 256 Tests: 19
> Failed: 17)
>   Failed tests:  2-18
>   Non-zero exit status: 1
> t4151-am-abort.sh                                (Wstat: 256 Tests: 20
> Failed: 12)
>   Failed tests:  5-9, 12-16, 19-20
>   Non-zero exit status: 1
> t3407-rebase-abort.sh                            (Wstat: 256 Tests: 17
> Failed: 8)
>   Failed tests:  2-9
>   Non-zero exit status: 1
> t3428-rebase-signoff.sh                          (Wstat: 256 Tests: 7 Failed: 5)
>   Failed tests:  2, 4-7
>   Non-zero exit status: 1
> t6409-merge-subtree.sh                           (Wstat: 256 Tests: 12
> Failed: 5)
>   Failed tests:  2-6
>   Non-zero exit status: 1
> t7102-reset.sh                                   (Wstat: 256 Tests: 38
> Failed: 7)
>   Failed tests:  14-20
>   Non-zero exit status: 1
> t6432-merge-recursive-space-options.sh           (Wstat: 256 Tests: 11
> Failed: 4)
>   Failed tests:  2, 7-8, 11
>   Non-zero exit status: 1
> t6430-merge-recursive.sh                         (Wstat: 256 Tests: 37
> Failed: 15)
>   Failed tests:  10-11, 13-20, 22-24, 28-29
>   Non-zero exit status: 1
> t3406-rebase-message.sh                          (Wstat: 256 Tests: 32
> Failed: 8)
>   Failed tests:  22, 24-27, 29-31
>   Non-zero exit status: 1
> t4200-rerere.sh                                  (Wstat: 256 Tests: 36
> Failed: 5)
>   Failed tests:  24-28
>   Non-zero exit status: 1
> t7201-co.sh                                      (Wstat: 256 Tests: 46
> Failed: 5)
>   Failed tests:  5-9
>   Non-zero exit status: 1
> t3418-rebase-continue.sh                         (Wstat: 256 Tests: 29
> Failed: 7)
>   Failed tests:  4, 6, 10-12, 26-27
>   Non-zero exit status: 1
> t3403-rebase-skip.sh                             (Wstat: 256 Tests: 20
> Failed: 3)
>   Failed tests:  2, 4, 9
>   Non-zero exit status: 1
> t4253-am-keep-cr-dos.sh                          (Wstat: 256 Tests: 7 Failed: 2)
>   Failed tests:  6-7
>   Non-zero exit status: 1
> t9903-bash-prompt.sh                             (Wstat: 256 Tests: 67
> Failed: 39)
>   Failed tests:  16-31, 33-35, 37, 40-44, 46-52, 55-58, 60
>                 62, 67
>   Non-zero exit status: 1
> t3503-cherry-pick-root.sh                        (Wstat: 256 Tests: 6 Failed: 2)
>   Failed tests:  5-6
>   Non-zero exit status: 1
> t3401-rebase-and-am-rename.sh                    (Wstat: 256 Tests: 10
> Failed: 2)
>   Failed tests:  4, 10
>   Non-zero exit status: 1
> t2407-worktree-heads.sh                          (Wstat: 256 Tests: 12
> Failed: 2)
>   Failed tests:  4-5
>   Non-zero exit status: 1
> t5407-post-rewrite-hook.sh                       (Wstat: 256 Tests: 17
> Failed: 3)
>   Failed tests:  4-6
>   Non-zero exit status: 1
> t2500-untracked-overwriting.sh                   (Wstat: 256 Tests: 10
> Failed: 2)
>   Failed tests:  9-10
>   Non-zero exit status: 1
> t4153-am-resume-override-opts.sh                 (Wstat: 256 Tests: 6 Failed: 1)
>   Failed test:  3
>   Non-zero exit status: 1
> t1015-read-index-unmerged.sh                     (Wstat: 256 Tests: 6 Failed: 1)
>   Failed test:  6
>   Non-zero exit status: 1
> t3509-cherry-pick-merge-df.sh                    (Wstat: 256 Tests: 9 Failed: 1)
>   Failed test:  9
>   Non-zero exit status: 1
> t2023-checkout-m.sh                              (Wstat: 256 Tests: 5 Failed: 1)
>   Failed test:  5
>   Non-zero exit status: 1
> t7615-diff-algo-with-mergy-operations.sh         (Wstat: 256 Tests: 7 Failed: 1)
>   Failed test:  2
>   Non-zero exit status: 1
> t6427-diff3-conflict-markers.sh                  (Wstat: 256 Tests: 9 Failed: 1)
>   Failed test:  8
>   Non-zero exit status: 1
> t1004-read-tree-m-u-wf.sh                        (Wstat: 256 Tests: 17
> Failed: 1)
>   Failed test:  17
>   Non-zero exit status: 1
> t3420-rebase-autostash.sh                        (Wstat: 256 Tests: 52
> Failed: 10)
>   Failed tests:  11-17, 21-23
>   Non-zero exit status: 1
> t4150-am.sh                                      (Wstat: 256 Tests: 87
> Failed: 33)
>   Failed tests:  34-40, 42-46, 48, 50-54, 57-62, 64-65, 67-71
>                 75, 87
>   Non-zero exit status: 1
> t7512-status-help.sh                             (Wstat: 256 Tests: 46
> Failed: 3)
>   Failed tests:  5-6, 29
>   Non-zero exit status: 1
> t3400-rebase.sh                                  (Wstat: 256 Tests: 39
> Failed: 1)
>   Failed test:  30
>   Non-zero exit status: 1
> t3404-rebase-interactive.sh                      (Wstat: 256 Tests:
> 131 Failed: 1)
>   Failed test:  80
>   Non-zero exit status: 1
> Files=1031, Tests=30662, 70 wallclock secs ( 8.33 usr  2.13 sys +
> 248.60 cusr 516.60 csys = 775.66 CPU)
> Result: FAIL
> make[1]: *** [Makefile:73: prove] Error 1
> make[1]: Leaving directory '/home/newren/floss/git/t'
> make: *** [Makefile:3237: test] Error 2
>
> I suspect this is a case where it was testing a version of git that
> you built before making the changes.

Thanks! You're right. I ran the tests before running make. After
running make and testing again, it failed.

Meet

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-02-15  8:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-11 19:43 [GSoC][PATCH] merge-recursive: optimize string_list construction Meet Soni
2025-02-11 20:59 ` Elijah Newren
2025-02-13  9:00 ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
2025-02-13  9:00   ` [RFC PATCH 1/2] merge-recursive: optimize time complexity for process_renames Meet Soni
2025-02-13 17:06     ` Elijah Newren
2025-02-13  9:00   ` [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged Meet Soni
2025-02-13 17:11     ` Elijah Newren
2025-02-13 18:30       ` Junio C Hamano
2025-02-13 18:45         ` Elijah Newren
2025-02-14  4:28       ` Meet Soni
2025-02-14  6:04         ` Elijah Newren
2025-02-14  8:24           ` Meet Soni
2025-02-14 19:00             ` Elijah Newren
2025-02-15  8:42               ` Meet Soni
2025-02-13 11:02   ` [RFC PATCH 0/2] merge-recursive: optimize time complexity Meet Soni
2025-02-13 18:30   ` Elijah Newren
2025-02-14  4:41   ` [GSoC][PATCH v2] merge-recursive: optimize time complexity for process_renames Meet Soni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox