Re: Unpredictable peak memory usage when using `git log` command

From: Yuri Karnilaev <karnilaev@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: Unpredictable peak memory usage when using `git log` command
Date: Sat, 31 Aug 2024 13:24:35 +0300
Message-ID: <13D8CFB5-4073-42CC-BFDD-ADFE8FFE985C@gmail.com>
In-Reply-To: <20240830210607.GB1038751@coredump.intra.peff.net>

Thanks, Peff!

I will try the recommendations you mentioned for reducing memory consumption in my task.

Have a nice day,
Yuri

> On 31. Aug 2024, at 0.06, Jeff King <peff@peff.net> wrote:
> 
> On Fri, Aug 30, 2024 at 03:20:15PM +0300, Yuri Karnilaev wrote:
> 
>> 2. Processing commits in batches:
>> ```
>> /usr/bin/time -l -h -p git log --ignore-missing --pretty=format:%H%x02%P%x02%aN%x02%aE%x02%at%x00 -n 1000 --skip=1000000 --numstat > 1.txt
>> ```
>> [...]
>> Operating System: Mac OS 14.6.1 (23G93)
>> Git Version: 2.39.3 (Apple Git-146)
> 
> I sent a patch which I think should make things better for you, but I
> wanted to mention two things in a more general way:
> 
>  1. You should really consider building a commit-graph file with "git
>     commit-graph write --reachable". That will reduce the memory usage
>     for this case, but also improve the CPU quite a bit (we won't have
>     to open those million skipped commits to chase their parent
>     pointers).
> 
>     I haven't kept up with the defaults for writing graph files. I
>     thought gc.writeCommitGraph defaults to "true" these days, though
>     that wouldn't help in a freshly cloned repository (arguably we
>     should write the commit graph on clone?).
> 
>  2. Using "--skip" still has to traverse all of those intermediate
>     commits. So it's effectively quadratic in the number of commits
>     overall (you end up skipping the first 1000 over and over).
> 
>     It's been a while since I've had to "paginate" segments of history
>     like this, but a better solution is along the lines of:
> 
>       - use "-n 1000" to get 1000 commits in each chunk
> 
>       - use "--boundary" to report the commits that were queued to be
>         traversed next but weren't shown
> 
>       - in invocations after the first one, start the traversal at
>         those boundary commits, rather than HEAD
> 
>     You'll probably need to add "%m" to your format to show the
>     boundaries (or alternatively, you can do the commit selection with
>     rev-list, and then output the result to "log --no-walk --stdin" to
>     do the pretty-printing).
> 
> -Peff
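
The chunked traversal described in point 2 could be sketched roughly as
below, using the rev-list variant for commit selection. This is only an
illustration under some assumptions: `paginate_history` and `CHUNK_SIZE`
are hypothetical names, not part of git, and carrying over starting
points that were not shown in a chunk is an extra precaution of this
sketch, not something stated in the thread. Commits with equal or skewed
timestamps can show up in more than one chunk, so deduplicate the output
if that matters for your task.

```shell
#!/bin/sh
# Sketch: page through history in fixed-size chunks without --skip.
# paginate_history and CHUNK_SIZE are hypothetical names for this example.

CHUNK_SIZE=${CHUNK_SIZE:-1000}

paginate_history() {
    # $1: revision to start from (defaults to HEAD)
    dir=$(mktemp -d) || return 1
    git rev-parse --verify "${1:-HEAD}^{commit}" >"$dir/starts" ||
        { rm -rf "$dir"; return 1; }

    while [ -s "$dir/starts" ]; do
        # One chunk: up to CHUNK_SIZE commits, plus the boundary commits,
        # which rev-list prints with a leading "-".
        git rev-list -n "$CHUNK_SIZE" --boundary --stdin \
            <"$dir/starts" >"$dir/out" || break

        # Commits shown in this chunk; stop if there are none.
        grep -v '^-' "$dir/out" >"$dir/shown" || break
        cat "$dir/shown"

        # Starting points for the next chunk: the reported boundary
        # commits, plus any previous starting point that this chunk did
        # not get around to showing (assumption: keep them to be safe).
        {
            sed -n 's/^-//p' "$dir/out"
            grep -vxFf "$dir/shown" "$dir/starts"
        } | sort -u >"$dir/next"
        mv "$dir/next" "$dir/starts"
    done
    rm -rf "$dir"
}
```

Calling `paginate_history HEAD` prints commit ids chunk by chunk. To get
the original output, the ids of each chunk could be fed to
"git log --no-walk --stdin" together with the --pretty=format and
--numstat options from the report, for example by moving that call
inside the loop. Running "git commit-graph write --reachable" once
beforehand, as suggested in point 1, should make each chunk cheaper.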
