git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "Vicent Martí" <vicent@github.com>
Subject: [PATCH v3 09/21] documentation: add documentation for the bitmap format
Date: Thu, 14 Nov 2013 07:44:02 -0500	[thread overview]
Message-ID: <20131114124402.GI10757@sigill.intra.peff.net> (raw)
In-Reply-To: <20131114124157.GA23784@sigill.intra.peff.net>

From: Vicent Marti <tanoku@gmail.com>

This is the technical documentation for the JGit-compatible Bitmap v1
on-disk format.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/bitmap-format.txt | 131 ++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100644 Documentation/technical/bitmap-format.txt

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
new file mode 100644
index 0000000..7a86bd7
--- /dev/null
+++ b/Documentation/technical/bitmap-format.txt
@@ -0,0 +1,131 @@
+GIT bitmap v1 format
+====================
+
+	- A header appears at the beginning:
+
+		4-byte signature: {'B', 'I', 'T', 'M'}
+
+		2-byte version number (network byte order)
+			The current implementation only supports version 1
+			of the bitmap index (the same one as JGit).
+
+		2-byte flags (network byte order)
+
+			The following flags are supported:
+
+			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+			This flag must always be present. It implies that the bitmap
+			index has been generated for a packfile with full closure
+			(i.e. where every single object in the packfile can find
+			 its parent links inside the same packfile). This is a
+			requirement for the bitmap index format, also present in JGit,
+			that greatly reduces the complexity of the implementation.
+
+		4-byte entry count (network byte order)
+
+			The total count of entries (bitmapped commits) in this bitmap index.
+
+		20-byte checksum
+
+			The SHA1 checksum of the pack this bitmap index belongs to.
+
+	- 4 EWAH bitmaps that act as type indexes
+
+		Type indexes are serialized after the hash cache in the shape
+		of four EWAH bitmaps stored consecutively (see Appendix A for
+		the serialization format of an EWAH bitmap).
+
+		There is a bitmap for each Git object type, stored in the following
+		order:
+
+			- Commits
+			- Trees
+			- Blobs
+			- Tags
+
+		In each bitmap, the `n`th bit is set to true if the `n`th object
+		in the packfile is of that type.
+
+		The obvious consequence is that the OR of all 4 bitmaps will result
+		in a full set (all bits set), and the AND of all 4 bitmaps will
+		result in an empty bitmap (no bits set).
+
+	- N entries with compressed bitmaps, one for each indexed commit
+
+		Where `N` is the total amount of entries in this bitmap index.
+		Each entry contains the following:
+
+		- 4-byte object position (network byte order)
+			The position **in the index for the packfile** where the
+			bitmap for this commit is found.
+
+		- 1-byte XOR-offset
+			The xor offset used to compress this bitmap. For an entry
+			in position `x`, a XOR offset of `y` means that the actual
+			bitmap representing this commit is composed by XORing the
+			bitmap for this entry with the bitmap in entry `x-y` (i.e.
+			the bitmap `y` entries before this one).
+
+			Note that this compression can be recursive. In order to
+			XOR this entry with a previous one, the previous entry needs
+			to be decompressed first, and so on.
+
+			The hard-limit for this offset is 160 (an entry can only be
+			xor'ed against one of the 160 entries preceding it). This
+			number is always positive, and hence entries are always xor'ed
+			with **previous** bitmaps, not bitmaps that will come afterwards
+			in the index.
+
+		- 1-byte flags for this bitmap
+			At the moment the only available flag is `0x1`, which hints
+			that this bitmap can be re-used when rebuilding bitmap indexes
+			for the repository.
+
+		- The compressed bitmap itself, see Appendix A.
+
+== Appendix A: Serialization format for an EWAH bitmap
+
+Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
+library, making them backwards compatible with the JGit
+implementation:
+
+	- 4-byte number of bits of the resulting UNCOMPRESSED bitmap
+
+	- 4-byte number of words of the COMPRESSED bitmap, when stored
+
+	- N x 8-byte words, as specified by the previous field
+
+		This is the actual content of the compressed bitmap.
+
+	- 4-byte position of the current RLW for the compressed
+		bitmap
+
+All words are stored in network byte order for their corresponding
+sizes.
+
+The compressed bitmap is stored in a form of run-length encoding, as
+follows.  It consists of a concatenation of an arbitrary number of
+chunks.  Each chunk consists of one or more 64-bit words
+
+     H  L_1  L_2  L_3 .... L_M
+
+H is called RLW (run length word).  It consists of (from lower to higher
+order bits):
+
+     - 1 bit: the repeated bit B
+
+     - 32 bits: repetition count K (unsigned)
+
+     - 31 bits: literal word count M (unsigned)
+
+The bitstream represented by the above chunk is then:
+
+     - K repetitions of B
+
+     - The bits stored in `L_1` through `L_M`.  Within a word, bits at
+       lower order come earlier in the stream than those at higher
+       order.
+
+The next word after `L_M` (if any) must again be a RLW, for the next
+chunk.  For efficient appending to the bitstream, the EWAH stores a
+pointer to the last RLW in the stream.
-- 
1.8.5.rc0.443.g2df7f3f

  parent reply	other threads:[~2013-11-14 12:44 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-14 12:41 [PATCH v3 0/21] pack bitmaps Jeff King
2013-11-14 12:42 ` [PATCH v3 01/21] sha1write: make buffer const-correct Jeff King
2013-11-14 12:42 ` [PATCH v3 02/21] revindex: Export new APIs Jeff King
2013-11-14 12:42 ` [PATCH v3 03/21] pack-objects: Refactor the packing list Jeff King
2013-11-14 12:43 ` [PATCH v3 04/21] pack-objects: factor out name_hash Jeff King
2013-11-14 12:43 ` [PATCH v3 05/21] revision: allow setting custom limiter function Jeff King
2013-11-14 12:43 ` [PATCH v3 06/21] sha1_file: export `git_open_noatime` Jeff King
2013-11-14 12:43 ` [PATCH v3 07/21] compat: add endianness helpers Jeff King
2013-11-14 12:43 ` [PATCH v3 08/21] ewah: compressed bitmap implementation Jeff King
2013-11-14 12:44 ` Jeff King [this message]
2013-11-14 12:44 ` [PATCH v3 10/21] pack-bitmap: add support for bitmap indexes Jeff King
2013-11-24 21:36   ` Thomas Rast
2013-11-25 15:04     ` [PATCH] Document khash Thomas Rast
2013-11-28 10:35       ` Jeff King
2013-11-27  9:08     ` [PATCH v3 10/21] pack-bitmap: add support for bitmap indexes Karsten Blees
2013-11-28 10:38       ` Jeff King
2013-12-03 14:40         ` Karsten Blees
2013-12-03 18:21           ` Jeff King
2013-12-07 20:52             ` Karsten Blees
2013-11-29 21:21   ` Thomas Rast
2013-12-02 16:12     ` Jeff King
2013-12-02 20:36       ` Junio C Hamano
2013-12-02 20:47         ` Jeff King
2013-12-02 21:43           ` Junio C Hamano
2013-11-14 12:45 ` [PATCH v3 11/21] pack-objects: use bitmaps when packing objects Jeff King
2013-12-07 15:47   ` Thomas Rast
2013-12-21 13:15     ` Jeff King
2013-11-14 12:45 ` [PATCH v3 12/21] rev-list: add bitmap mode to speed up object lists Jeff King
2013-12-07 16:05   ` Thomas Rast
2013-11-14 12:45 ` [PATCH v3 13/21] pack-objects: implement bitmap writing Jeff King
2013-12-07 16:32   ` Thomas Rast
2013-12-21 13:17     ` Jeff King
2013-11-14 12:45 ` [PATCH v3 14/21] repack: stop using magic number for ARRAY_SIZE(exts) Jeff King
2013-12-07 16:34   ` Thomas Rast
2013-11-14 12:46 ` [PATCH v3 15/21] repack: turn exts array into array-of-struct Jeff King
2013-12-07 16:34   ` Thomas Rast
2013-11-14 12:46 ` [PATCH v3 16/21] repack: handle optional files created by pack-objects Jeff King
2013-12-07 16:35   ` Thomas Rast
2013-11-14 12:46 ` [PATCH v3 17/21] repack: consider bitmaps when performing repacks Jeff King
2013-12-07 16:37   ` Thomas Rast
2013-11-14 12:46 ` [PATCH v3 18/21] count-objects: recognize .bitmap in garbage-checking Jeff King
2013-12-07 16:38   ` Thomas Rast
2013-11-14 12:46 ` [PATCH v3 19/21] t: add basic bitmap functionality tests Jeff King
2013-12-07 16:43   ` Thomas Rast
2013-12-21 13:22     ` Jeff King
2013-11-14 12:48 ` [PATCH v3 20/21] t/perf: add tests for pack bitmaps Jeff King
2013-12-07 16:51   ` Thomas Rast
2013-12-21 13:40     ` Jeff King
2013-11-14 12:48 ` [PATCH v3 21/21] pack-bitmap: implement optional name_hash cache Jeff King
2013-12-07 16:59   ` Thomas Rast
2013-11-14 19:19 ` [PATCH v3 0/21] pack bitmaps Ramsay Jones
2013-11-14 21:33   ` Jeff King
2013-11-14 23:09     ` Ramsay Jones
2013-11-18 21:16       ` Ramsay Jones
2013-11-16 10:28     ` Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131114124402.GI10757@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=vicent@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).