git.vger.kernel.org archive mirror
* [PATCH 0/5] diff_filespec cleanups and optimizations
@ 2014-01-17  1:18 Jeff King
  2014-01-17  1:19 ` [PATCH 1/5] diff_filespec: reorder dirty_submodule macro definitions Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Jeff King @ 2014-01-17  1:18 UTC
  To: git

I recently came across a repository with a commit containing 100 million
paths in its tree. Cleverly, the whole repo fits into a 1.5K packfile
(can you guess how it was done?). Not so cleverly, running "diff-tree
--root" on that commit uses a large amount of memory. :)

I do not think it is worth optimizing for such a pathological
repository. But I was curious how much memory it would want (it OOM'd on
my 64-bit 16G machine). The answer is roughly:

   100,000,000 * (
        8 bytes per pointer to diff_filepair in the diff_queue
     + 32 bytes per diff_filepair struct
     +  2 * (
            96 bytes per diff_filespec struct
          + 12 bytes per filename (in this case)
        )
   )

which is about 25G (256 bytes per path), plus malloc overhead. So
obviously this example is unreasonable. A more reasonable large case is
something like WebKit at ~150K files, doing a diff against the empty
tree. That's only 37M.
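
The arithmetic above can be checked with a small C sketch. The function
name and its parameters are hypothetical (this is not Git code); the
byte counts are the 64-bit figures quoted in the message, and malloc
overhead is ignored.

```c
#include <stdint.h>

/* Per-entry sizes quoted in the message for a 64-bit build. */
enum {
	QUEUE_POINTER_BYTES = 8,	/* pointer to diff_filepair in the diff_queue */
	FILEPAIR_BYTES = 32		/* one diff_filepair struct */
};

/*
 * Hypothetical helper: estimate the bytes needed to queue nr_paths
 * filepairs. Each pair carries two filespecs, and each filespec is
 * assumed to hold its own copy of the filename.
 */
uint64_t estimate_diff_queue_bytes(uint64_t nr_paths,
				   uint64_t filespec_bytes,
				   uint64_t name_bytes)
{
	uint64_t per_path = QUEUE_POINTER_BYTES + FILEPAIR_BYTES
			  + 2 * (filespec_bytes + name_bytes);
	return nr_paths * per_path;
}
```

With 100,000,000 paths, 96-byte filespecs and 12-byte names this yields
25,600,000,000 bytes; with 150,000 paths it yields 38,400,000 bytes,
which is the "37M" figure when counted in binary megabytes.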

But while looking at it, I noticed a bunch of cleanups for
diff_filespec.  With the patches below, sizeof(struct diff_filespec) on
my 64-bit machine goes from 96 bytes down to 80. Compiling with "-m32"
goes from 64 bytes down to 52.
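
To illustrate the kind of saving involved, here is a minimal sketch of
how reordering fields and narrowing a flag to a bitfield can shrink a
struct. The two structs and their field names are invented for this
example and are not the real diff_filespec layout; exact sizes depend on
the compiler and ABI.

```c
#include <stddef.h>

/* Hypothetical "before": narrow fields interleaved with wide ones. */
struct spec_loose {
	void *data;
	unsigned should_free : 1;	/* padding follows on most ABIs */
	unsigned long size;
	int is_binary;			/* -1 unknown, 0 text, 1 binary */
};

/* Hypothetical "after": wide fields first, flags share one unit. */
struct spec_tight {
	void *data;
	unsigned long size;
	unsigned should_free : 1;
	signed int is_binary : 2;	/* still holds -1, 0 and 1 */
};

size_t spec_loose_size(void)
{
	return sizeof(struct spec_loose);
}

size_t spec_tight_size(void)
{
	return sizeof(struct spec_tight);
}

/* Store a tri-state value in the 2-bit field and read it back. */
int is_binary_roundtrip(int value)
{
	struct spec_tight spec = { 0 };
	spec.is_binary = value;
	return spec.is_binary;
}
```

On a typical x86-64 Linux build the two sizes come out as 32 and 24
bytes, though the C standard does not guarantee any particular layout.
The signed 2-bit field is what lets the tri-state -1/0/1 value survive
the narrowing.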

The first few patches have cleanup value aside from the struct size
improvement. The last two are pure optimization. I doubt the
optimization is noticeable for any real-life cases, so I don't mind if
they get dropped. But they're quite trivial and obvious.

  [1/5]: diff_filespec: reorder dirty_submodule macro definitions
  [2/5]: diff_filespec: drop funcname_pattern_ident field
  [3/5]: diff_filespec: drop xfrm_flags field
  [4/5]: diff_filespec: reorder is_binary field
  [5/5]: diff_filespec: use only 2 bits for is_binary flag

-Peff

Thread overview: 10+ messages
2014-01-17  1:18 [PATCH 0/5] diff_filespec cleanups and optimizations Jeff King
2014-01-17  1:19 ` [PATCH 1/5] diff_filespec: reorder dirty_submodule macro definitions Jeff King
2014-01-17 18:46   ` Junio C Hamano
2014-01-17 19:47     ` Jeff King
2014-01-17 23:50       ` Junio C Hamano
2014-01-17  1:20 ` [PATCH 2/5] diff_filespec: drop funcname_pattern_ident field Jeff King
2014-01-17  1:21 ` [PATCH 3/5] diff_filespec: drop xfrm_flags field Jeff King
2014-01-17  1:22 ` [PATCH 4/5] diff_filespec: reorder is_binary field Jeff King
2014-01-17  1:25 ` [PATCH 5/5] diff_filespec: use only 2 bits for is_binary flag Jeff King
2014-01-17 18:49 ` [PATCH 0/5] diff_filespec cleanups and optimizations Junio C Hamano
