From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 0/5] diff_filespec cleanups and optimizations
Date: Thu, 16 Jan 2014 20:18:45 -0500
Message-ID: <20140117011844.GA6870@sigill.intra.peff.net>
I recently came across a repository with a commit containing 100 million
paths in its tree. Cleverly, the whole repo fits into a 1.5K packfile
(can you guess how it was done?). Not so cleverly, running "diff-tree
--root" on that commit uses a large amount of memory. :)
I do not think it is worth optimizing for such a pathological
repository. But I was curious how much memory it would want (it OOM'd on my
64-bit 16G machine). The answer is roughly:
  100,000,000 * (
      8 bytes per pointer to diff_filepair in the diff_queue
   + 32 bytes per diff_filepair struct
   +  2 * (
         96 bytes per diff_filespec struct
       + 12 bytes per filename (in this case)
      )
  )
which is about 25G. Plus malloc overhead. So obviously this example is
unreasonable. A more reasonable large case is something like WebKit at
~150K files, doing a diff against the empty tree. That's only 37M.
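
As a rough sketch of the arithmetic above (the function and constant
names are illustrative and do not come from git's source; only the byte
counts are taken from the message), the estimate can be written as:

```c
#include <stdint.h>

/* Sizes quoted in the message for a 64-bit build before the series. */
enum {
	QUEUE_POINTER_BYTES = 8,  /* pointer to diff_filepair in the diff_queue */
	FILEPAIR_BYTES = 32       /* struct diff_filepair */
};

/*
 * Bytes needed per queued path. Each filepair carries two filespecs,
 * and each filespec has its own copy of the filename.
 */
static uint64_t bytes_per_path(uint64_t filespec_bytes, uint64_t name_bytes)
{
	return QUEUE_POINTER_BYTES + FILEPAIR_BYTES
		+ 2 * (filespec_bytes + name_bytes);
}

/* Total estimate for a diff queue, ignoring malloc overhead. */
static uint64_t estimate_bytes(uint64_t paths, uint64_t filespec_bytes,
			       uint64_t name_bytes)
{
	return paths * bytes_per_path(filespec_bytes, name_bytes);
}
```

With a 96-byte filespec and 12-byte names, bytes_per_path() comes to 256,
so estimate_bytes(100000000, 96, 12) is 25.6 billion bytes and
estimate_bytes(150000, 96, 12) is about 38 million bytes, in line with
the figures quoted above.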
But while looking at it, I noticed a bunch of cleanups for
diff_filespec. With the patches below, sizeof(struct diff_filespec) on
my 64-bit machine goes from 96 bytes down to 80. Compiling with "-m32"
goes from 64 bytes down to 52.
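
To illustrate the general kind of saving involved, here is a minimal
sketch of how grouping small members and narrowing a tri-state flag to a
2-bit field can shrink a struct. The two layouts below are hypothetical
and are not git's actual struct diff_filespec; they only demonstrate the
padding effect.

```c
#include <stddef.h>

/*
 * Hypothetical layouts for illustration only; these are NOT git's
 * actual struct diff_filespec definitions.
 */

/* Small fields interleaved with pointers, so padding follows each one. */
struct spec_loose {
	char *path;
	unsigned short mode;
	void *data;
	int is_binary;            /* tri-state: -1 unknown, 0 text, 1 binary */
	unsigned long size;
	unsigned should_free : 1;
};

/* Same information: small members grouped, tri-state squeezed into 2 bits. */
struct spec_tight {
	char *path;
	void *data;
	unsigned long size;
	unsigned short mode;
	unsigned should_free : 1;
	signed int is_binary : 2; /* still holds -1, 0 and 1 */
};

/* Bytes saved per struct on the current compiler and ABI. */
static size_t bytes_saved(void)
{
	return sizeof(struct spec_loose) - sizeof(struct spec_tight);
}
```

The exact sizes depend on the compiler and ABI, so it is worth printing
sizeof() for both structs on the platform you care about rather than
assuming any particular number.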
The first few patches have cleanup value aside from the struct size
improvement. The last two are pure optimization. I doubt the
optimization is noticeable for any real-life cases, so I don't mind if
they get dropped. But they're quite trivial and obvious.
[1/5]: diff_filespec: reorder dirty_submodule macro definitions
[2/5]: diff_filespec: drop funcname_pattern_ident field
[3/5]: diff_filespec: drop xfrm_flags field
[4/5]: diff_filespec: reorder is_binary field
[5/5]: diff_filespec: use only 2 bits for is_binary flag
-Peff