From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: [PATCH 0/2] diffcore-break optimizations
Date: Mon, 16 Nov 2009 10:53:32 -0500
Message-ID: <20091116155331.GA30719@coredump.intra.peff.net>
On one of my more ridiculously gigantic repositories, I recently tried
to make a commit, and git ran out of memory partway through. The
repository has about 3 gigabytes of data, and I made a small-ish change
to every file. Pathological, yes, but I think we can do better than
chugging for 5 minutes and dying.
The culprit turned out to be memory usage in diffcore-break, which is on
by default for "git status" (and for the "git commit" template message).
It wants to have every changed blob in memory at once, which is just
silly.
The patches are:
  [1/2]: diffcore-break: free filespec data as we go
    This addresses the memory consumption issue. If you have enough
    memory, it doesn't actually yield a speed improvement, but nor does it
    show any slowdown for practical workloads.
    There is a theoretical slowdown when doing -B -M, because the rename
    phase has to re-fetch the blobs from the object store. However, I
    wasn't able to measure any slowdown for real-world cases (like "git
    log --summary -M -B >/dev/null" on git.git).
    I did manage to produce the slowdown on a pathological case: ten
    20-megabyte files, each copied with a slight modification to another
    file, and then replaced with totally different contents (so each one
    will be broken and then trigger an inexact rename). That diff went
    from 16s to 17s.
    But I improved that and more with the next optimization.
  [2/2]: diffcore-break: save cnt_data for other phases
    We already do this in rename detection, and since they use the same
    data format, there is little reason not to do so. My pathological case
    above went from 17s down to 12s. I wasn't able to detect any speedup
    or slowdown for sane cases.
So I doubt anybody will even notice this, but I think since we can
address pathological cases, we might as well (and as you will see, the
code change is quite small).
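For readers who want the shape of the two changes without reading the C,
here is a toy Python sketch. It is not git's code: FileSpec, count_chunks,
similarity, break_pass and the 0.5 threshold are all hypothetical names and
values made up for illustration, and the line-based counting only loosely
imitates what git does. The point is just that the break pass drops each
blob as soon as its pair has been scored, while the small per-file count
table is kept so a later phase does not have to reload the blob.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileSpec:
    """Toy stand-in for one side of a diff pair (hypothetical, not git's struct)."""
    load: Callable[[], bytes]         # fetches the blob, e.g. from an object store
    data: Optional[bytes] = None      # full blob contents, possibly large
    cnt_data: Optional[dict] = None   # cached chunk counts, small

    def populate(self) -> None:
        if self.data is None:
            self.data = self.load()

    def free_data(self) -> None:
        # Drop the blob but keep cnt_data for later phases.
        self.data = None


def count_chunks(spec: FileSpec) -> dict:
    """Return per-line byte counts, computing and caching them on first use."""
    if spec.cnt_data is None:
        spec.populate()
        counts: dict = {}
        for line in spec.data.splitlines(keepends=True):
            counts[line] = counts.get(line, 0) + 1
        spec.cnt_data = counts
    return spec.cnt_data


def similarity(src: FileSpec, dst: FileSpec) -> float:
    """Fraction of bytes the two files share, judged only from chunk counts."""
    a, b = count_chunks(src), count_chunks(dst)
    common = sum(min(n, b.get(chunk, 0)) * len(chunk) for chunk, n in a.items())
    size_a = sum(n * len(chunk) for chunk, n in a.items())
    size_b = sum(n * len(chunk) for chunk, n in b.items())
    return common / max(size_a, size_b, 1)


def break_pass(pairs, threshold: float = 0.5) -> list:
    """Return indexes of pairs that look like total rewrites."""
    broken = []
    for i, (src, dst) in enumerate(pairs):
        if similarity(src, dst) < threshold:
            broken.append(i)
        # Free as we go: at most one pair's blobs are held at a time.
        src.free_data()
        dst.free_data()
    return broken
```

With this structure, peak memory in the break pass is bounded by the
largest pair rather than by the sum of all changed blobs, and a later
similarity() call on the same FileSpec objects reuses cnt_data instead of
calling load() again.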
All of that being said, I was able to do my commit, but I still had to
wait five minutes for it to chug through 3G of data. :) I am tempted to
add a "quick" mode to git-commit, but perhaps such a ridiculous case is
rare enough not to worry about. I worked around it by writing my commit
message separately and using "git commit -F".
-Peff