From: Junio C Hamano <gitster@pobox.com>
To: "Paulo Casaretto via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
Paulo Casaretto <pcasaretto@gmail.com>
Subject: Re: [PATCH v3] range-diff: add configurable memory limit for cost matrix
Date: Fri, 29 Aug 2025 08:40:08 -0700 [thread overview]
Message-ID: <xmqqecsup25j.fsf@gitster.g> (raw)
In-Reply-To: <pull.1958.v3.git.1756465231183.gitgitgadget@gmail.com> (Paulo Casaretto via GitGitGadget's message of "Fri, 29 Aug 2025 11:00:31 +0000")
"Paulo Casaretto via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Paulo Casaretto <pcasaretto@gmail.com>
>
> When comparing large commit ranges (e.g., 250,000+ commits), range-diff
> attempts to allocate an n×n cost matrix that can exhaust available
> memory. For example, with 256,784 commits (n = 513,568), the matrix
> would require approximately 256GB of memory (513,568² × 4 bytes),
> causing either immediate segmentation faults due to integer overflow or
> system hangs.
>
> Add a memory limit check in get_correspondences() before allocating the
> cost matrix. This check uses the total size in bytes (n² × sizeof(int))
> and compares it against a configurable maximum, preventing both
> excessive memory usage and integer overflow issues.
>
> The limit is configurable via a new --max-memory option that accepts
> human-readable sizes (e.g., "1G", "500M"). The default is 4GB for 64 bit
> systems and 2GB for 32 bit systems. This allows comparing ranges of
> approximately 32,000 (16,000) commits - generous for real-world use cases
> while preventing impractical operations.
>
> When the limit is exceeded, range-diff now displays a clear error
> message showing both the requested memory size and the maximum allowed,
> formatted in human-readable units for better user experience.
>
> Example usage:
> git range-diff --max-memory=1G branch1...branch2
> git range-diff --max-memory=500M base..topic1 base..topic2
>
> This approach was chosen over alternatives:
> - Pre-counting commits: Would require spawning additional git processes
> and reading all commits twice
> - Limiting by commit count: Less precise than actual memory usage
> - Streaming approach: Would require significant refactoring of the
> current algorithm
>
> This issue was previously discussed in:
> https://lore.kernel.org/git/RFC-cover-v2-0.5-00000000000-20211210T122901Z-avarab@gmail.com/
>
> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: Paulo Casaretto <pcasaretto@gmail.com>
> ---
Looks good, especially without the reordering existing entries in
the options list. The authorship information above looks much
better, too.
> @@ -40,6 +57,10 @@ int cmd_range_diff(int argc,
> PARSE_OPT_OPTARG),
> OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> N_("style"), N_("passed to 'git log'"), 0),
> + OPT_CALLBACK(0, "max-memory", &range_diff_opts.max_memory,
> + N_("size"),
> + N_("maximum memory for cost matrix (default 4G)"),
> + parse_max_memory),
> OPT_PASSTHRU_ARGV(0, "remerge-diff", &diff_merges_arg, NULL,
> N_("passed to 'git log'"), PARSE_OPT_NOARG),
> OPT_BOOL(0, "left-only", &left_only,
Among existing options (an excerpt from "git range-diff h")
--[no-]creation-factor <n>
percentage by which creation is weighted
This controls how correspondence between commits on old and new
branches are computed.
--no-dual-color use simple diff colors
--dual-color opposite of --no-dual-color
These control how the findings are shown, by painting the lines
in distinct colors.
--[no-]notes[=<notes>]
passed to 'git log'
--[no-]diff-merges <style>
passed to 'git log'
--[no-]remerge-diff passed to 'git log'
These control what text are used to represent each commit and
participate in comparison and display.
--[no-]left-only only emit output related to the first range
--[no-]right-only only emit output related to the second range
These again control how the findings are shown, by omitting some
commits from the output.
So there is no perfectly logical place to place the new option, but
between diff-merges and remerge-diff somewhat feels a bit odder
choice than other possible places.
Will queue as is. If some users find the location in the "-h"
output too odd and disturbing, they can later send in a reordering
patch on top, but I would think the chosen location is good enough.
As #leftoverbits we might want to
* Group range-diff specific options with OPT_GROUP()
* Instead of having to match the full NxN matrix, perhaps reduce
the matrix by keeping the most promising M (which is much smaller
than N) for each N, or something?
but that (especially the latter) is totally outside the scope of
this patch.
Thanks.
next prev parent reply other threads:[~2025-08-29 15:40 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-26 17:18 [PATCH] range-diff: add configurable memory limit for cost matrix Paulo Casaretto via GitGitGadget
2025-08-26 19:18 ` Junio C Hamano
2025-08-28 8:38 ` [PATCH v2 0/2] " Paulo Casaretto via GitGitGadget
2025-08-28 8:38 ` [PATCH v2 1/2] range-diff: reorder options lexicographically pcasaretto via GitGitGadget
2025-08-28 15:21 ` Junio C Hamano
2025-08-28 17:12 ` Elijah Newren
2025-08-29 10:56 ` Paulo L F Casaretto
2025-08-29 15:15 ` Junio C Hamano
2025-08-28 8:38 ` [PATCH v2 2/2] range-diff: add configurable memory limit for cost matrix pcasaretto via GitGitGadget
2025-08-28 17:04 ` Elijah Newren
2025-08-28 21:22 ` Junio C Hamano
2025-08-28 21:34 ` Elijah Newren
2025-08-28 21:45 ` Junio C Hamano
2025-08-29 11:00 ` [PATCH v3] " Paulo Casaretto via GitGitGadget
2025-08-29 15:21 ` Elijah Newren
2025-08-29 16:33 ` Junio C Hamano
2025-08-29 15:40 ` Junio C Hamano [this message]
2025-08-29 16:02 ` [PATCH v4] " Paulo Casaretto via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqecsup25j.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=newren@gmail.com \
--cc=pcasaretto@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).