From: Yuri Karnilaev <karnilaev@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: Unpredictable peak memory usage when using `git log` command
Date: Sat, 31 Aug 2024 13:24:35 +0300
Message-ID: <13D8CFB5-4073-42CC-BFDD-ADFE8FFE985C@gmail.com>
In-Reply-To: <20240830210607.GB1038751@coredump.intra.peff.net>
Thanks, Peff!
I will try the recommendations you mentioned for reducing memory consumption in my task.
Have a nice day,
Yuri
> On 31. Aug 2024, at 0.06, Jeff King <peff@peff.net> wrote:
>
> On Fri, Aug 30, 2024 at 03:20:15PM +0300, Yuri Karnilaev wrote:
>
>> 2. Processing commits in batches:
>> ```
>> /usr/bin/time -l -h -p git log --ignore-missing --pretty=format:%H%x02%P%x02%aN%x02%aE%x02%at%x00 -n 1000 --skip=1000000 --numstat > 1.txt
>> ```
>> [...]
>> Operating System: Mac OS 14.6.1 (23G93)
>> Git Version: 2.39.3 (Apple Git-146)
>
> I sent a patch which I think should make things better for you, but I
> wanted to mention two things in a more general way:
>
> 1. You should really consider building a commit-graph file with "git
> commit-graph write --reachable". That will reduce the memory usage
> for this case, but also improve the CPU quite a bit (we won't have
> to open those million skipped commits to chase their parent
> pointers).
>
> I haven't kept up with the defaults for writing graph files. I
> thought gc.writeCommitGraph defaults to "true" these days, though
> that wouldn't help in a freshly cloned repository (arguably we
> should write the commit graph on clone?).
>
> 2. Using "--skip" still has to traverse all of those intermediate
> commits. So it's effectively quadratic in the number of commits
> overall (you end up skipping the first 1000 over and over).
>
> It's been a while since I've had to "paginate" segments of history
> like this, but a better solution is along the lines of:
>
> - use "-n 1000" to get 1000 commits in each chunk
>
> - use "--boundary" to report the commits that were queued to be
> traversed next but weren't shown
>
> - in invocations after the first one, start the traversal at
> those boundary commits, rather than HEAD
>
> You'll probably need to add "%m" to your format to show the
> boundaries (or alternatively, you can do the commit selection with
> rev-list, and then output the result to "log --no-walk --stdin" to
> do the pretty-printing).
>
> -Peff
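
For reference, the "--boundary" pagination described in point 2 could be sketched roughly as below. This is only an illustration under assumptions, not code from the thread: the function name paginate_history and its arguments are hypothetical, and it uses rev-list for commit selection, as the quoted message suggests as an alternative to adding "%m" to the format. Each pass asks for one chunk, then restarts from the boundary commits plus any starting points that were not reached in that pass.

```shell
# Hypothetical helper (not part of git): print the commit hashes reachable
# from a starting revision, fetched in chunks instead of with --skip.
#   $1 = starting revision (default HEAD), $2 = chunk size (default 1000)
paginate_history() {
    start=${1:-HEAD}
    chunk=${2:-1000}
    [ "$chunk" -ge 1 ] || return 1
    tips=$(git rev-parse --verify "$start^{commit}") || return 1
    while [ -n "$tips" ]; do
        # $tips is deliberately unquoted: one revision argument per hash.
        # Boundary commits are printed by rev-list with a leading "-".
        out=$(git rev-list -n "$chunk" --boundary $tips) || return 1
        shown=$(printf '%s\n' "$out" | grep -v '^-')
        boundary=$(printf '%s\n' "$out" | sed -n 's/^-//p')
        printf '%s\n' "$shown"
        # Next starting points: boundary commits plus any tips that were
        # not shown in this pass, minus everything that was shown.
        tips=$(printf '%s\n' $tips $boundary | sort -u |
               grep -v -x -F -e "$shown" || true)
    done
}
```

The hashes from each pass could then be fed to "log --no-walk --stdin" with the desired --pretty format and --numstat for the pretty-printing. In histories with skewed commit dates a commit might in principle be printed in more than one pass, so a consumer that needs strict uniqueness should deduplicate. Running "git commit-graph write --reachable" beforehand, as suggested in point 1, should make each pass cheaper.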
Thread overview: 6+ messages
2024-08-30 12:20 Unpredictable peak memory usage when using `git log` command Yuri Karnilaev
2024-08-30 20:53 ` [PATCH] revision: free commit buffers for skipped commits Jeff King
2024-08-30 21:27 ` Junio C Hamano
2024-08-30 21:06 ` Unpredictable peak memory usage when using `git log` command Jeff King
2024-08-31 10:24 ` Yuri Karnilaev [this message]
2024-09-02 13:08 ` Patrick Steinhardt