From: Junio C Hamano <gitster@pobox.com>
To: "Paulo Casaretto via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Paulo Casaretto <pcasaretto@gmail.com>,
pcasaretto <paulo.casaretto@shopify.com>
Subject: Re: [PATCH] range-diff: add configurable memory limit for cost matrix
Date: Tue, 26 Aug 2025 12:18:41 -0700 [thread overview]
Message-ID: <xmqqzfblj3hq.fsf@gitster.g> (raw)
In-Reply-To: <pull.1958.git.1756228693233.gitgitgadget@gmail.com> (Paulo Casaretto via GitGitGadget's message of "Tue, 26 Aug 2025 17:18:13 +0000")
"Paulo Casaretto via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: pcasaretto <paulo.casaretto@shopify.com>
<administrivia>
It is usual to see a less human readable name embedded in the commit
object than the mail header when a mail comes from GGG.
Just in case you want to be known to this community as "Paulo
Casaretto", not "pcasaretto", I thought I'd point it out that you
may want to redo the commit. I do not mind what name you like to
use, as long as it is identifiable, and From: identity matches the
identity you add your Signed-off-by: with.
</administrivia>
> Acked-by: Johannes Schindelin johannes.schindelin@gmx.de
It is unusual to lack <> around e-mail address here.
> Signed-off-by: pcasaretto <paulo.casaretto@shopify.com>
> ---
> range-diff: add configurable memory limit for cost matrix
> +static int parse_max_memory(const struct option *opt, const char *arg, int unset)
> +{
> + size_t *max_memory = opt->value;
> + uintmax_t val;
> +
> + if (unset) {
> + return 0;
> + }
No unnecessary {braces} around a single statement, please.
> + if (!git_parse_unsigned(arg, &val, SIZE_MAX))
> + return error(_("invalid max-memory value: %s"), arg);
> +
> + *max_memory = (size_t)val;
> + return 0;
> +}
> @@ -33,17 +51,21 @@ int cmd_range_diff(int argc,
> OPT_INTEGER(0, "creation-factor",
> &range_diff_opts.creation_factor,
> N_("percentage by which creation is weighted")),
> + OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> + N_("style"), N_("passed to 'git log'"), 0),
> + OPT_BOOL(0, "left-only", &left_only,
> + N_("only emit output related to the first range")),
> + OPT_CALLBACK(0, "max-memory", &range_diff_opts.max_memory,
> + N_("size"),
> + N_("maximum memory for cost matrix (default 4G)"),
> + parse_max_memory),
> OPT_BOOL(0, "no-dual-color", &simple_color,
> N_("use simple diff colors")),
> OPT_PASSTHRU_ARGV(0, "notes", &other_arg,
> N_("notes"), N_("passed to 'git log'"),
> PARSE_OPT_OPTARG),
> - OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> - N_("style"), N_("passed to 'git log'"), 0),
> OPT_PASSTHRU_ARGV(0, "remerge-diff", &diff_merges_arg, NULL,
> N_("passed to 'git log'"), PARSE_OPT_NOARG),
> - OPT_BOOL(0, "left-only", &left_only,
> - N_("only emit output related to the first range")),
> OPT_BOOL(0, "right-only", &right_only,
> N_("only emit output related to the second range")),
> OPT_END()
This seems to mix unrelated changes. Please don't.
Or if the reordering of options do have a reason to exist in _this_
commit, please justify it in your proposed log message. Even if
there were a good reason for reordering existing options, I strongly
suspect that the change would want to be done in a separate,
preparatory-clean-up commit (i.e., making this topic a two-patch
series), because it has nothing to do with preventing inefficient
cost matrix computation from consuming too much memory, which _is_
the theme of this commit.
> diff --git a/range-diff.c b/range-diff.c
> index 8a2dcbee322..6e9b6b115e5 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -21,6 +21,7 @@
> #include "apply.h"
> #include "revision.h"
>
> +
Unrelated, unexplained, and unnecessary change snuck in? Please
proof-read the patch yourself before sending.
> @@ -287,8 +288,8 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
> }
>
> static int diffsize_consume(void *data,
> - char *line UNUSED,
> - unsigned long len UNUSED)
> + char *line UNUSED,
> + unsigned long len UNUSED)
What is this change about???
> static void get_correspondences(struct string_list *a, struct string_list *b,
> - int creation_factor)
> + int creation_factor, size_t max_memory)
> {
> int n = a->nr + b->nr;
> int *cost, c, *a2b, *b2a;
> int i, j;
> -
> - ALLOC_ARRAY(cost, st_mult(n, n));
> + size_t cost_size = st_mult(n, n);
> + size_t cost_bytes = st_mult(sizeof(int), cost_size);
> + if (cost_bytes >= max_memory) {
> + struct strbuf cost_str = STRBUF_INIT;
> + struct strbuf max_str = STRBUF_INIT;
> + strbuf_humanise_bytes(&cost_str, cost_bytes);
> + strbuf_humanise_bytes(&max_str, max_memory);
> + die(_("range-diff: unable to compute the range-diff, since it "
> + "exceeds the maximum memory for the cost matrix: %s "
> + "(%"PRIuMAX" bytes) needed, %s (%"PRIuMAX" bytes) available"),
> + cost_str.buf, (uintmax_t)cost_bytes, max_str.buf, (uintmax_t)max_memory);
> + }
> + ALLOC_ARRAY(cost, cost_size);
Nicely done.
> @@ -351,7 +363,8 @@ static void get_correspondences(struct string_list *a, struct string_list *b,
> }
>
> c = a_util->matching < 0 ?
> - a_util->diffsize * creation_factor / 100 : COST_MAX;
> + a_util->diffsize * creation_factor / 100 :
> + COST_MAX;
> for (j = b->nr; j < n; j++)
> cost[i + n * j] = c;
> }
There seem to be other unrelated changes indentation-only changes
mixed in to the changes to this file, not just this one.
As a style fix,
c = a_util->matching < 0
? a_util->diffsize * creation_factor / 100
: COST_MAX;
would be easier to follow and read, but please do not do such a
cosmetic clean-up in the same patch. Do them in a separate
preliminary clean-up patch before the "real work".
> @@ -591,7 +605,8 @@ int show_range_diff(const char *range1, const char *range2,
> if (!res) {
> find_exact_matches(&branch1, &branch2);
> get_correspondences(&branch1, &branch2,
> - range_diff_opts->creation_factor);
> + range_diff_opts->creation_factor,
> + range_diff_opts->max_memory);
> output(&branch1, &branch2, range_diff_opts);
> }
OK.
next prev parent reply other threads:[~2025-08-26 19:18 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-26 17:18 [PATCH] range-diff: add configurable memory limit for cost matrix Paulo Casaretto via GitGitGadget
2025-08-26 19:18 ` Junio C Hamano [this message]
2025-08-28 8:38 ` [PATCH v2 0/2] " Paulo Casaretto via GitGitGadget
2025-08-28 8:38 ` [PATCH v2 1/2] range-diff: reorder options lexicographically pcasaretto via GitGitGadget
2025-08-28 15:21 ` Junio C Hamano
2025-08-28 17:12 ` Elijah Newren
2025-08-29 10:56 ` Paulo L F Casaretto
2025-08-29 15:15 ` Junio C Hamano
2025-08-28 8:38 ` [PATCH v2 2/2] range-diff: add configurable memory limit for cost matrix pcasaretto via GitGitGadget
2025-08-28 17:04 ` Elijah Newren
2025-08-28 21:22 ` Junio C Hamano
2025-08-28 21:34 ` Elijah Newren
2025-08-28 21:45 ` Junio C Hamano
2025-08-29 11:00 ` [PATCH v3] " Paulo Casaretto via GitGitGadget
2025-08-29 15:21 ` Elijah Newren
2025-08-29 16:33 ` Junio C Hamano
2025-08-29 15:40 ` Junio C Hamano
2025-08-29 16:02 ` [PATCH v4] " Paulo Casaretto via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqzfblj3hq.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=paulo.casaretto@shopify.com \
--cc=pcasaretto@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).