git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: "René Scharfe" <l.s.r@web.de>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH] commit: avoid parent list buildup in clear_commit_marks_many()
Date: Wed, 12 Feb 2025 08:05:34 +0100	[thread overview]
Message-ID: <Z6xIPowXnL-awm6g@pks.im> (raw)
In-Reply-To: <16a7b572-0a3d-4707-9034-0dac69ea99ac@web.de>

On Sun, Feb 09, 2025 at 11:41:15AM +0100, René Scharfe wrote:
> clear_commit_marks_1() clears the marks of the first parent and its
> first parent and so on, and saves the higher numbered parents in a list
> for later.  There is no benefit in keeping that list growing with each
> handled commit.  Clear it after each run to reduce peak memory usage.

Okay. So the issue here is that `clear_commit_marks_1()` only processes
the first-parent chain of commits, and any other commits will be added
to the `struct commit_list` backlog. Consequently, merge-heavy histories
are very likely to build up quite a backlog of non-first-parent commits.

> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  commit.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/commit.c b/commit.c
> index 540660359d..6efdb03997 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -780,14 +780,14 @@ static void clear_commit_marks_1(struct commit_list **plist,
> 
>  void clear_commit_marks_many(size_t nr, struct commit **commit, unsigned int mark)
>  {
> -	struct commit_list *list = NULL;
> -
>  	for (size_t i = 0; i < nr; i++) {
> +		struct commit_list *list = NULL;
> +
>  		clear_commit_marks_1(&list, *commit, mark);
> +		while (list)
> +			clear_commit_marks_1(&list, pop_commit(&list), mark);
>  		commit++;
>  	}
> -	while (list)
> -		clear_commit_marks_1(&list, pop_commit(&list), mark);
>  }

And this is fixed by immediately processing all commits that we
currently have in the list. As `clear_commit_marks_1()` only processes
immediate children of the handed-in commit we know that it will have
processed the first parent, and `list` will contain remaining commits,
if any.

We also end up adding grandchildren to the list, so this change
essentially converts the algorithm from depth-first to breadth-first. I
bet we can construct cases where this will perform _worse_ than the
current algorithm, e.g. when you have branch thickets where every commit
is a merge: But I assume that for the most common cases this might be an
improvement indeed.

The question to me is: does this actually matter in the real world? It
would be nice to maybe get some numbers that demonstrate the improvement
in a repository.

Patrick

  reply	other threads:[~2025-02-12  7:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-09 10:41 [PATCH] commit: avoid parent list buildup in clear_commit_marks_many() René Scharfe
2025-02-12  7:05 ` Patrick Steinhardt [this message]
2025-02-13 21:38   ` René Scharfe
2025-02-17  6:33     ` Patrick Steinhardt
2025-02-23  8:25       ` René Scharfe
2025-02-23  8:26 ` [PATCH resend] " René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6xIPowXnL-awm6g@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).