From: Patrick Steinhardt <ps@pks.im>
To: "René Scharfe" <l.s.r@web.de>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH] commit: avoid parent list buildup in clear_commit_marks_many()
Date: Wed, 12 Feb 2025 08:05:34 +0100 [thread overview]
Message-ID: <Z6xIPowXnL-awm6g@pks.im> (raw)
In-Reply-To: <16a7b572-0a3d-4707-9034-0dac69ea99ac@web.de>
On Sun, Feb 09, 2025 at 11:41:15AM +0100, René Scharfe wrote:
> clear_commit_marks_1() clears the marks of the first parent and its
> first parent and so on, and saves the higher numbered parents in a list
> for later. There is no benefit in keeping that list growing with each
> handled commit. Clear it after each run to reduce peak memory usage.
Okay. So the issue here is that `clear_commit_marks_1()` only processes
the first-parent chain of commits, and any other commits will be added
to the `struct commit_list` backlog. Consequently, merge-heavy histories
are very likely to build up quite a backlog of non-first-parent commits.
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> commit.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index 540660359d..6efdb03997 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -780,14 +780,14 @@ static void clear_commit_marks_1(struct commit_list **plist,
>
> void clear_commit_marks_many(size_t nr, struct commit **commit, unsigned int mark)
> {
> - struct commit_list *list = NULL;
> -
> for (size_t i = 0; i < nr; i++) {
> + struct commit_list *list = NULL;
> +
> clear_commit_marks_1(&list, *commit, mark);
> + while (list)
> + clear_commit_marks_1(&list, pop_commit(&list), mark);
> commit++;
> }
> - while (list)
> - clear_commit_marks_1(&list, pop_commit(&list), mark);
> }
And this is fixed by immediately processing all commits that we
currently have in the list. As `clear_commit_marks_1()` only processes
immediate children of the handed-in commit we know that it will have
processed the first parent, and `list` will contain remaining commits,
if any.
We also end up adding grandchildren to the list, so this change
essentially converts the algorithm from depth-first to breadth-first. I
bet we can construct cases where this will perform _worse_ than the
current algorithm, e.g. when you have branch thickets where every commit
is a merge: But I assume that for the most common cases this might be an
improvement indeed.
The question to me is: does this actually matter in the real world? It
would be nice to maybe get some numbers that demonstrate the improvement
in a repository.
Patrick
next prev parent reply other threads:[~2025-02-12 7:05 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-09 10:41 [PATCH] commit: avoid parent list buildup in clear_commit_marks_many() René Scharfe
2025-02-12 7:05 ` Patrick Steinhardt [this message]
2025-02-13 21:38 ` René Scharfe
2025-02-17 6:33 ` Patrick Steinhardt
2025-02-23 8:25 ` René Scharfe
2025-02-23 8:26 ` [PATCH resend] " René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6xIPowXnL-awm6g@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=l.s.r@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).