From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: "Vicent Martí" <tanoku@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Colby Ranger" <cranger@google.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Date: Thu, 27 Jun 2013 13:17:33 -0400
Message-ID: <20130627171733.GA17601@sigill.intra.peff.net>
In-Reply-To: <CAJo=hJvOq=CATrDeYAwi+jgkPpqjywWhuKeC1TVYeCXr6NVM6w@mail.gmail.com>

On Thu, Jun 27, 2013 at 09:07:38AM -0700, Shawn O. Pearce wrote:

> > And the pack-order versus idx-order for the bitmaps is still up in the
> > air. Do we have numbers on the on-disk sizes of the resulting EWAHs?
> 
> I did not see any presented in this thread, and I am very interested
> in this aspect of the series. The path hash cache should be taking
> about 9.9M of disk space, but I recall reading the bitmap file is 8M.
> I don't understand.

I don't know where the 8M number came from, or if it was on the kernel
repo. My bitmap-enabled pack of linux-2.6 (about 3.2M objects) using
Vicent's patches looks like:

  $ du -sh *
  42M     pack-9ea76831aec6c49c5ff42509a2a2ce97da13c5ad.bitmap
  87M     pack-9ea76831aec6c49c5ff42509a2a2ce97da13c5ad.idx
  630M    pack-9ea76831aec6c49c5ff42509a2a2ce97da13c5ad.pack

Packing the same repo with "jgit debug-gc" (jgit 3.0.0) yields:

  $ du -sh *
  3.0M    pack-2478783825733a1f1012f0087a0b5a92aa7437d8.bitmap
  82M     pack-2478783825733a1f1012f0087a0b5a92aa7437d8.idx
  585M    pack-2478783825733a1f1012f0087a0b5a92aa7437d8.pack
  4.8M    pack-f61fb76112372288923be7a0464476892dfebe3e.idx
  97M     pack-f61fb76112372288923be7a0464476892dfebe3e.pack

If we assume that 12M of that is name-hash, the remaining 30M is still
an order of magnitude larger than jgit's 3.0M. For reference, jgit
created 327 bitmaps (according to its progress eye candy), and Vicent's
patches generated 385. So that explains some of the increase, but the
per-bitmap size is still much larger.
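
To make the per-bitmap comparison concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the rounded `du -sh` figures and bitmap counts quoted above, takes the 12M name-hash share as the same assumption made in the text, and assumes the jgit file carries no name-hash data. The variable and function names are made up for illustration.

```python
# Sizes in MB as reported (and rounded) by `du -sh` in the message above.
CGIT_BITMAP_MB = 42.0   # .bitmap written with Vicent's patches
JGIT_BITMAP_MB = 3.0    # .bitmap written by "jgit debug-gc"
NAME_HASH_MB = 12.0     # assumed name-hash share of the C git file
CGIT_BITMAPS = 385      # bitmaps generated by Vicent's patches
JGIT_BITMAPS = 327      # bitmaps reported by jgit


def per_bitmap_kb(total_mb, count):
    """Average size of one bitmap in KB, taking 1 MB = 1024 KB."""
    return total_mb * 1024 / count


cgit_kb = per_bitmap_kb(CGIT_BITMAP_MB - NAME_HASH_MB, CGIT_BITMAPS)
jgit_kb = per_bitmap_kb(JGIT_BITMAP_MB, JGIT_BITMAPS)
count_ratio = CGIT_BITMAPS / JGIT_BITMAPS
size_ratio = cgit_kb / jgit_kb

print(f"C git: {cgit_kb:.1f} KB per bitmap")
print(f"jgit:  {jgit_kb:.1f} KB per bitmap")
print(f"bitmap count ratio: {count_ratio:.2f}x, per-bitmap size ratio: {size_ratio:.1f}x")
```

Under those assumptions the extra bitmaps account for well under a factor of two, while the average bitmap is several times larger, which is the gap that still needs explaining.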

> The path hash cache may still be required, Colby and I have been
> debating the merits of having the data available for delta compression
> vs. the increase in memory required to hold it.

I guess this is not an option for JGit, but for C git, an mmap-able
name-hash file means we can just fault in the pages mentioning objects
we actually need it for. And its use can be completely optional; in
fact, it doesn't even need to be inside the .bitmap file (though I
cannot think of a reason it would be useful outside of having bitmaps).
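
As an illustration of the mmap idea only (not the layout used by the patches), here is a minimal Python sketch. It assumes a hypothetical standalone file holding one big-endian 32-bit name-hash per object, indexed by object position; the file name, layout, and helper functions are all invented for the example. Mapping the file lets the OS page in just the regions that are actually read.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: one big-endian uint32 name-hash per object.
ENTRY = struct.Struct(">I")


def write_name_hashes(path, hashes):
    """Write the hashes back to back, in object order."""
    with open(path, "wb") as f:
        for h in hashes:
            f.write(ENTRY.pack(h))


def lookup(mm, index):
    """Read the hash for one object straight out of the mapping."""
    return ENTRY.unpack_from(mm, index * ENTRY.size)[0]


# Build a throwaway file with made-up hash values.
path = os.path.join(tempfile.mkdtemp(), "example.name-hash")
hashes = [(i * 2654435761) & 0xFFFFFFFF for i in range(100_000)]
write_name_hashes(path, hashes)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        # Only the pages covering these three offsets need to be touched.
        sample = [lookup(mm, i) for i in (0, 4242, 99_999)]
    finally:
        mm.close()

print(sample == [hashes[0], hashes[4242], hashes[99_999]])
```

Because the lookup is just an offset calculation into the mapping, a reader that never asks for name-hashes never pays for them, which is what makes the data optional.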

-Peff
