git.vger.kernel.org archive mirror
* Unpredictable peak memory usage when using `git log` command
@ 2024-08-30 12:20 Yuri Karnilaev
  2024-08-30 20:53 ` [PATCH] revision: free commit buffers for skipped commits Jeff King
  2024-08-30 21:06 ` Unpredictable peak memory usage when using `git log` command Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: Yuri Karnilaev @ 2024-08-30 12:20 UTC
  To: git

Hello,

I encountered an issue when using the `git log` command to retrieve commits in large repositories. My task is to iterate over all commits and output them in a specific format. However, my computer has limited memory, so I am looking for a way to reduce the memory consumption of this operation.

I tested two different commands on the `torvalds/linux` repository as an example of a large repository and noticed a significant difference in peak memory usage:

1. Processing all commits in one go:
```
/usr/bin/time -l -h -p git log --ignore-missing --pretty=format:%H%x02%P%x02%aN%x02%aE%x02%at%x00 --numstat > 1.txt
```
Result:
```
real 594,01
user 562,22
sys 12,43
          7407976448  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              187437  page reclaims
              274228  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                1031  voluntary context switches
              287056  involuntary context switches
       5455479398547  instructions retired
       1828253079874  cycles elapsed
           135616064  peak memory footprint
```

2. Processing commits in batches:
```
/usr/bin/time -l -h -p git log --ignore-missing --pretty=format:%H%x02%P%x02%aN%x02%aE%x02%at%x00 -n 1000 --skip=1000000 --numstat > 1.txt
```
Result:
```
real 9,83
user 7,48
sys 0,40
          2390540288  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               93487  page reclaims
               52995  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                 634  voluntary context switches
               14183  involuntary context switches
         50173495540  instructions retired
         24906960156  cycles elapsed
          1470935680  peak memory footprint
```

As you can see from the results, the peak memory usage (the "peak memory footprint" line) when processing commits in batches is roughly 10 times higher than when processing all commits in one go: about 1.47 GB versus about 136 MB.
Can you please explain why this happens? Is there a way to work around it? Or could this be fixed in a future Git version?
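
For reference, the ratio can be checked directly from the two "peak memory footprint" values in the reports above. This is only a quick arithmetic sketch: the variable names are made up for illustration, and `LC_ALL=C` is set on the assumption that `awk` might otherwise print a comma as the decimal separator in some locales.

```shell
# Peak memory footprint in bytes, copied from the two `time -l` reports above.
# Variable names are illustrative only.
full_run=135616064     # 1. all commits in one go
batch_run=1470935680   # 2. -n 1000 --skip=1000000

# Divide in awk (floating point); LC_ALL=C forces "." as the decimal separator.
ratio=$(LC_ALL=C awk -v a="$batch_run" -v b="$full_run" 'BEGIN { printf "%.1f", a / b }')

echo "batch / full peak memory footprint: ${ratio}x"
```

Note that this compares only the "peak memory footprint" lines; the "maximum resident set size" lines are a separate measurement and are not used here.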

Operating System: macOS 14.6.1 (23G93)
Git Version: 2.39.3 (Apple Git-146)

Best regards,
Yuri


Thread overview: 6+ messages
2024-08-30 12:20 Unpredictable peak memory usage when using `git log` command Yuri Karnilaev
2024-08-30 20:53 ` [PATCH] revision: free commit buffers for skipped commits Jeff King
2024-08-30 21:27   ` Junio C Hamano
2024-08-30 21:06 ` Unpredictable peak memory usage when using `git log` command Jeff King
2024-08-31 10:24   ` Yuri Karnilaev
2024-09-02 13:08   ` Patrick Steinhardt
