From: Junio C Hamano <gitster@pobox.com>
To: "Paulo Casaretto via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Paulo Casaretto <pcasaretto@gmail.com>,
pcasaretto <paulo.casaretto@shopify.com>
Subject: Re: [PATCH] range-diff: add configurable memory limit for cost matrix
Date: Tue, 26 Aug 2025 12:18:41 -0700 [thread overview]
Message-ID: <xmqqzfblj3hq.fsf@gitster.g> (raw)
In-Reply-To: <pull.1958.git.1756228693233.gitgitgadget@gmail.com> (Paulo Casaretto via GitGitGadget's message of "Tue, 26 Aug 2025 17:18:13 +0000")
"Paulo Casaretto via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: pcasaretto <paulo.casaretto@shopify.com>
<administrivia>
It is usual to see a less human readable name embedded in the commit
object than the mail header when a mail comes from GGG.
Just in case you want to be known to this community as "Paulo
Casaretto", not "pcasaretto", I thought I'd point it out that you
may want to redo the commit. I do not mind what name you like to
use, as long as it is identifiable, and From: identity matches the
identity you add your Signed-off-by: with.
</administrivia>
> Acked-by: Johannes Schindelin johannes.schindelin@gmx.de
It is unusual to lack <> around e-mail address here.
> Signed-off-by: pcasaretto <paulo.casaretto@shopify.com>
> ---
> range-diff: add configurable memory limit for cost matrix
> +static int parse_max_memory(const struct option *opt, const char *arg, int unset)
> +{
> + size_t *max_memory = opt->value;
> + uintmax_t val;
> +
> + if (unset) {
> + return 0;
> + }
No unnecessary {braces} around a single statement, please.
> + if (!git_parse_unsigned(arg, &val, SIZE_MAX))
> + return error(_("invalid max-memory value: %s"), arg);
> +
> + *max_memory = (size_t)val;
> + return 0;
> +}
> @@ -33,17 +51,21 @@ int cmd_range_diff(int argc,
> OPT_INTEGER(0, "creation-factor",
> &range_diff_opts.creation_factor,
> N_("percentage by which creation is weighted")),
> + OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> + N_("style"), N_("passed to 'git log'"), 0),
> + OPT_BOOL(0, "left-only", &left_only,
> + N_("only emit output related to the first range")),
> + OPT_CALLBACK(0, "max-memory", &range_diff_opts.max_memory,
> + N_("size"),
> + N_("maximum memory for cost matrix (default 4G)"),
> + parse_max_memory),
> OPT_BOOL(0, "no-dual-color", &simple_color,
> N_("use simple diff colors")),
> OPT_PASSTHRU_ARGV(0, "notes", &other_arg,
> N_("notes"), N_("passed to 'git log'"),
> PARSE_OPT_OPTARG),
> - OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> - N_("style"), N_("passed to 'git log'"), 0),
> OPT_PASSTHRU_ARGV(0, "remerge-diff", &diff_merges_arg, NULL,
> N_("passed to 'git log'"), PARSE_OPT_NOARG),
> - OPT_BOOL(0, "left-only", &left_only,
> - N_("only emit output related to the first range")),
> OPT_BOOL(0, "right-only", &right_only,
> N_("only emit output related to the second range")),
> OPT_END()
This seems to mix unrelated changes. Please don't.
Or if the reordering of options do have a reason to exist in _this_
commit, please justify it in your proposed log message. Even if
there were a good reason for reordering existing options, I strongly
suspect that the change would want to be done in a separate,
preparatory-clean-up commit (i.e., making this topic a two-patch
series), because it has nothing to do with preventing inefficient
cost matrix computation from consuming too much memory, which _is_
the theme of this commit.
> diff --git a/range-diff.c b/range-diff.c
> index 8a2dcbee322..6e9b6b115e5 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -21,6 +21,7 @@
> #include "apply.h"
> #include "revision.h"
>
> +
Unrelated, unexplained, and unnecessary change snuck in? Please
proof-read the patch yourself before sending.
> @@ -287,8 +288,8 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
> }
>
> static int diffsize_consume(void *data,
> - char *line UNUSED,
> - unsigned long len UNUSED)
> + char *line UNUSED,
> + unsigned long len UNUSED)
What is this change about???
> static void get_correspondences(struct string_list *a, struct string_list *b,
> - int creation_factor)
> + int creation_factor, size_t max_memory)
> {
> int n = a->nr + b->nr;
> int *cost, c, *a2b, *b2a;
> int i, j;
> -
> - ALLOC_ARRAY(cost, st_mult(n, n));
> + size_t cost_size = st_mult(n, n);
> + size_t cost_bytes = st_mult(sizeof(int), cost_size);
> + if (cost_bytes >= max_memory) {
> + struct strbuf cost_str = STRBUF_INIT;
> + struct strbuf max_str = STRBUF_INIT;
> + strbuf_humanise_bytes(&cost_str, cost_bytes);
> + strbuf_humanise_bytes(&max_str, max_memory);
> + die(_("range-diff: unable to compute the range-diff, since it "
> + "exceeds the maximum memory for the cost matrix: %s "
> + "(%"PRIuMAX" bytes) needed, %s (%"PRIuMAX" bytes) available"),
> + cost_str.buf, (uintmax_t)cost_bytes, max_str.buf, (uintmax_t)max_memory);
> + }
> + ALLOC_ARRAY(cost, cost_size);
Nicely done.
> @@ -351,7 +363,8 @@ static void get_correspondences(struct string_list *a, struct string_list *b,
> }
>
> c = a_util->matching < 0 ?
> - a_util->diffsize * creation_factor / 100 : COST_MAX;
> + a_util->diffsize * creation_factor / 100 :
> + COST_MAX;
> for (j = b->nr; j < n; j++)
> cost[i + n * j] = c;
> }
There seem to be other unrelated changes indentation-only changes
mixed in to the changes to this file, not just this one.
As a style fix,
c = a_util->matching < 0
? a_util->diffsize * creation_factor / 100
: COST_MAX;
would be easier to follow and read, but please do not do such a
cosmetic clean-up in the same patch. Do them in a separate
preliminary clean-up patch before the "real work".
> @@ -591,7 +605,8 @@ int show_range_diff(const char *range1, const char *range2,
> if (!res) {
> find_exact_matches(&branch1, &branch2);
> get_correspondences(&branch1, &branch2,
> - range_diff_opts->creation_factor);
> + range_diff_opts->creation_factor,
> + range_diff_opts->max_memory);
> output(&branch1, &branch2, range_diff_opts);
> }
OK.
next prev parent reply other threads:[~2025-08-26 19:18 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-26 17:18 [PATCH] range-diff: add configurable memory limit for cost matrix Paulo Casaretto via GitGitGadget
2025-08-26 19:18 ` Junio C Hamano [this message]
2025-08-28 8:38 ` [PATCH v2 0/2] " Paulo Casaretto via GitGitGadget
2025-08-28 8:38 ` [PATCH v2 1/2] range-diff: reorder options lexicographically pcasaretto via GitGitGadget
2025-08-28 15:21 ` Junio C Hamano
2025-08-28 17:12 ` Elijah Newren
2025-08-29 10:56 ` Paulo L F Casaretto
2025-08-29 15:15 ` Junio C Hamano
2025-08-28 8:38 ` [PATCH v2 2/2] range-diff: add configurable memory limit for cost matrix pcasaretto via GitGitGadget
2025-08-28 17:04 ` Elijah Newren
2025-08-28 21:22 ` Junio C Hamano
2025-08-28 21:34 ` Elijah Newren
2025-08-28 21:45 ` Junio C Hamano
2025-08-29 11:00 ` [PATCH v3] " Paulo Casaretto via GitGitGadget
2025-08-29 15:21 ` Elijah Newren
2025-08-29 16:33 ` Junio C Hamano
2025-08-29 15:40 ` Junio C Hamano
2025-08-29 16:02 ` [PATCH v4] " Paulo Casaretto via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqzfblj3hq.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=paulo.casaretto@shopify.com \
--cc=pcasaretto@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.