From: Junio C Hamano <gitster@pobox.com>
To: "Paulo Casaretto via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Paulo Casaretto <pcasaretto@gmail.com>,
	pcasaretto <paulo.casaretto@shopify.com>
Subject: Re: [PATCH] range-diff: add configurable memory limit for cost matrix
Date: Tue, 26 Aug 2025 12:18:41 -0700
Message-ID: <xmqqzfblj3hq.fsf@gitster.g>
In-Reply-To: <pull.1958.git.1756228693233.gitgitgadget@gmail.com> (Paulo Casaretto via GitGitGadget's message of "Tue, 26 Aug 2025 17:18:13 +0000")

"Paulo Casaretto via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: pcasaretto <paulo.casaretto@shopify.com>

<administrivia>

It is usual to see a less human-readable name embedded in the commit
object than in the mail header when a mail comes from GitGitGadget
(GGG).

Just in case you want to be known to this community as "Paulo
Casaretto", not "pcasaretto", I thought I'd point out that you
may want to redo the commit.  I do not mind what name you like to
use, as long as it is identifiable and the From: identity matches
the identity you add your Signed-off-by: with.

</administrivia>

> Acked-by: Johannes Schindelin johannes.schindelin@gmx.de

It is unusual to lack <> around the e-mail address here.

> Signed-off-by: pcasaretto <paulo.casaretto@shopify.com>
> ---
>     range-diff: add configurable memory limit for cost matrix

> +static int parse_max_memory(const struct option *opt, const char *arg, int unset)
> +{
> +	size_t *max_memory = opt->value;
> +	uintmax_t val;
> +
> +	if (unset) {
> +		return 0;
> +	}

No unnecessary {braces} around a single statement, please.

> +	if (!git_parse_unsigned(arg, &val, SIZE_MAX))
> +		return error(_("invalid max-memory value: %s"), arg);
> +
> +	*max_memory = (size_t)val;
> +	return 0;
> +}
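
To illustrate the brace comment above, here is a minimal standalone
sketch of the same callback shape with the single-statement braces
dropped.  It is not the code from the patch: parse_size() is a
hypothetical stand-in for git_parse_unsigned() (assumed here to accept
an optional k/m/g suffix), and the callback takes the size_t pointer
directly instead of a struct option, so that it compiles outside the
git tree.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for git_parse_unsigned(): parse a non-negative
 * decimal number with an optional k/m/g suffix and reject anything
 * larger than max.  Returns 1 on success and 0 on failure.
 */
static int parse_size(const char *arg, uintmax_t *out, uintmax_t max)
{
	char *end;
	uintmax_t val, factor = 1;

	/* Require a leading digit; this also rejects "-1" and "". */
	if (!arg || *arg < '0' || *arg > '9')
		return 0;
	errno = 0;
	val = strtoumax(arg, &end, 10);
	if (errno == ERANGE)
		return 0;
	if (*end == 'k' || *end == 'K')
		factor = 1024;
	else if (*end == 'm' || *end == 'M')
		factor = 1024 * 1024;
	else if (*end == 'g' || *end == 'G')
		factor = 1024 * 1024 * 1024;
	if (factor > 1)
		end++;
	if (*end)
		return 0;	/* trailing garbage */
	if (val > max / factor)
		return 0;	/* scaled value would exceed max */
	*out = val * factor;
	return 1;
}

/* Same shape as the quoted callback, without braces around "return 0". */
static int parse_max_memory(size_t *max_memory, const char *arg, int unset)
{
	uintmax_t val;

	if (unset)
		return 0;
	if (!parse_size(arg, &val, SIZE_MAX)) {
		fprintf(stderr, "invalid max-memory value: %s\n", arg);
		return -1;
	}
	*max_memory = (size_t)val;
	return 0;
}
```

The braces that remain are the ones around the two-statement error
path, which only exists in this sketch because error() is not
available outside git.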

> @@ -33,17 +51,21 @@ int cmd_range_diff(int argc,
>  		OPT_INTEGER(0, "creation-factor",
>  			    &range_diff_opts.creation_factor,
>  			    N_("percentage by which creation is weighted")),
> +		OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> +				  N_("style"), N_("passed to 'git log'"), 0),
> +		OPT_BOOL(0, "left-only", &left_only,
> +			 N_("only emit output related to the first range")),
> +		OPT_CALLBACK(0, "max-memory", &range_diff_opts.max_memory,
> +			     N_("size"),
> +			     N_("maximum memory for cost matrix (default 4G)"),
> +			     parse_max_memory),
>  		OPT_BOOL(0, "no-dual-color", &simple_color,
>  			    N_("use simple diff colors")),
>  		OPT_PASSTHRU_ARGV(0, "notes", &other_arg,
>  				  N_("notes"), N_("passed to 'git log'"),
>  				  PARSE_OPT_OPTARG),
> -		OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> -				  N_("style"), N_("passed to 'git log'"), 0),
>  		OPT_PASSTHRU_ARGV(0, "remerge-diff", &diff_merges_arg, NULL,
>  				  N_("passed to 'git log'"), PARSE_OPT_NOARG),
> -		OPT_BOOL(0, "left-only", &left_only,
> -			 N_("only emit output related to the first range")),
>  		OPT_BOOL(0, "right-only", &right_only,
>  			 N_("only emit output related to the second range")),
>  		OPT_END()

This seems to mix unrelated changes.  Please don't.

Or if the reordering of options does have a reason to exist in _this_
commit, please justify it in your proposed log message.  Even if
there were a good reason for reordering existing options, I strongly
suspect that the change would want to be done in a separate,
preparatory-clean-up commit (i.e., making this topic a two-patch
series), because it has nothing to do with preventing inefficient
cost matrix computation from consuming too much memory, which _is_
the theme of this commit.

> diff --git a/range-diff.c b/range-diff.c
> index 8a2dcbee322..6e9b6b115e5 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -21,6 +21,7 @@
>  #include "apply.h"
>  #include "revision.h"
>  
> +

Unrelated, unexplained, and unnecessary change snuck in?  Please
proof-read the patch yourself before sending.

> @@ -287,8 +288,8 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
>  }
>  
>  static int diffsize_consume(void *data,
> -			     char *line UNUSED,
> -			     unsigned long len UNUSED)
> +			    char *line UNUSED,
> +			    unsigned long len UNUSED)

What is this change about???

>  static void get_correspondences(struct string_list *a, struct string_list *b,
> -				int creation_factor)
> +				int creation_factor, size_t max_memory)
>  {
>  	int n = a->nr + b->nr;
>  	int *cost, c, *a2b, *b2a;
>  	int i, j;
> -
> -	ALLOC_ARRAY(cost, st_mult(n, n));
> +	size_t cost_size = st_mult(n, n);
> +	size_t cost_bytes = st_mult(sizeof(int), cost_size);
> +	if (cost_bytes >= max_memory) {
> +		struct strbuf cost_str = STRBUF_INIT;
> +		struct strbuf max_str = STRBUF_INIT;
> +		strbuf_humanise_bytes(&cost_str, cost_bytes);
> +		strbuf_humanise_bytes(&max_str, max_memory);
> +		die(_("range-diff: unable to compute the range-diff, since it "
> +		      "exceeds the maximum memory for the cost matrix: %s "
> +		      "(%"PRIuMAX" bytes) needed, %s (%"PRIuMAX" bytes) available"),
> +		    cost_str.buf, (uintmax_t)cost_bytes, max_str.buf, (uintmax_t)max_memory);
> +	}
> +	ALLOC_ARRAY(cost, cost_size);

Nicely done.
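
For a sense of scale, assuming a 4-byte int and reading the "4G"
default as 4 GiB: n = a->nr + b->nr = 32768 gives 2^30 matrix entries,
i.e. exactly 2^32 bytes, which the ">=" comparison above rejects.  A
rough standalone sketch of the same check follows.  The helper names
are hypothetical, and where st_mult() dies on overflow this version
reports it through the return value instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the number of bytes needed for an
 * n x n matrix of int.  Returns 0 on success and -1 if either
 * multiplication would overflow size_t.
 */
static int cost_matrix_bytes(size_t n, size_t *bytes)
{
	if (n && n > SIZE_MAX / n)
		return -1;
	if (n * n > SIZE_MAX / sizeof(int))
		return -1;
	*bytes = n * n * sizeof(int);
	return 0;
}

/*
 * Returns 1 when the matrix may be allocated under max_memory and 0
 * otherwise.  Like the quoted hunk, a matrix whose size equals the
 * limit is refused.
 */
static int cost_matrix_fits(size_t n, size_t max_memory)
{
	size_t bytes;

	if (cost_matrix_bytes(n, &bytes) < 0)
		return 0;
	return bytes < max_memory;
}
```

Doing the overflow check before the comparison matters: a wrapped
product could otherwise look small enough to pass the limit.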

> @@ -351,7 +363,8 @@ static void get_correspondences(struct string_list *a, struct string_list *b,
>  		}
>  
>  		c = a_util->matching < 0 ?
> -			a_util->diffsize * creation_factor / 100 : COST_MAX;
> +			    a_util->diffsize * creation_factor / 100 :
> +			    COST_MAX;
>  		for (j = b->nr; j < n; j++)
>  			cost[i + n * j] = c;
>  	}

There seem to be other unrelated indentation-only changes mixed
into the changes to this file, not just this one.

As a style fix,

		c = a_util->matching < 0
		  ? a_util->diffsize * creation_factor / 100
		  : COST_MAX;

would be easier to follow and read, but please do not do such a
cosmetic clean-up in the same patch.  Do them in a separate
preliminary clean-up patch before the "real work".

> @@ -591,7 +605,8 @@ int show_range_diff(const char *range1, const char *range2,
>  	if (!res) {
>  		find_exact_matches(&branch1, &branch2);
>  		get_correspondences(&branch1, &branch2,
> -				    range_diff_opts->creation_factor);
> +				    range_diff_opts->creation_factor,
> +				    range_diff_opts->max_memory);
>  		output(&branch1, &branch2, range_diff_opts);
>  	}

OK.
