git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH v4 09/23] documentation: add documentation for the bitmap format
Date: Sat, 21 Dec 2013 08:59:58 -0500
Message-ID: <20131221135957.GI21145@sigill.intra.peff.net>
In-Reply-To: <20131221135651.GA20818@sigill.intra.peff.net>

From: Vicent Marti <tanoku@gmail.com>

This is the technical documentation for the JGit-compatible Bitmap v1
on-disk format.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/bitmap-format.txt | 131 ++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100644 Documentation/technical/bitmap-format.txt

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
new file mode 100644
index 0000000..7a86bd7
--- /dev/null
+++ b/Documentation/technical/bitmap-format.txt
@@ -0,0 +1,131 @@
+GIT bitmap v1 format
+====================
+
+	- A header appears at the beginning:
+
+		4-byte signature: {'B', 'I', 'T', 'M'}
+
+		2-byte version number (network byte order)
+			The current implementation only supports version 1
+			of the bitmap index (the same one as JGit).
+
+		2-byte flags (network byte order)
+
+			The following flags are supported:
+
+			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+			This flag must always be present. It implies that the bitmap
+			index has been generated for a packfile with full closure
+			(i.e. where every single object in the packfile can find
+			 its parent links inside the same packfile). This is a
+			requirement for the bitmap index format, also present in JGit,
+			that greatly reduces the complexity of the implementation.
+
+		4-byte entry count (network byte order)
+
+			The total count of entries (bitmapped commits) in this bitmap index.
+
+		20-byte checksum
+
+			The SHA1 checksum of the pack this bitmap index belongs to.
+
+	- 4 EWAH bitmaps that act as type indexes
+
+		Type indexes are serialized immediately after the header, in the
+		shape of four EWAH bitmaps stored consecutively (see Appendix A
+		for the serialization format of an EWAH bitmap).
+
+		There is a bitmap for each Git object type, stored in the following
+		order:
+
+			- Commits
+			- Trees
+			- Blobs
+			- Tags
+
+		In each bitmap, the `n`th bit is set to true if the `n`th object
+		in the packfile is of that type.
+
+		Since every object has exactly one type, the four bitmaps are
+		disjoint: the OR of all 4 bitmaps is a full set (all bits set),
+		and the AND of any two of them is an empty bitmap (no bits set).
+
+	- N entries with compressed bitmaps, one for each indexed commit
+
+		Where `N` is the total number of entries in this bitmap index.
+		Each entry contains the following:
+
+		- 4-byte object position (network byte order)
+			The position of this entry's commit **in the index for the
+			packfile** (i.e. in the .idx file, sorted by object name).
+
+		- 1-byte XOR-offset
+			The XOR offset used to compress this bitmap. For an entry
+			in position `x`, an XOR offset of `y` means that the actual
+			bitmap for this commit is obtained by XORing the stored bitmap
+			of this entry with the actual bitmap of entry `x-y` (i.e. the
+			bitmap `y` entries before this one). An offset of 0 means no XOR.
+
+			Note that this compression can be recursive. In order to
+			XOR this entry with a previous one, the previous entry needs
+			to be decompressed first, and so on.
+
+			The hard limit for this offset is 160 (an entry can only be
+			xor'ed against one of the 160 entries preceding it). The
+			offset is unsigned, so entries are always xor'ed with
+			**previous** bitmaps, never with bitmaps that come afterwards
+			in the index.
+
+		- 1-byte flags for this bitmap
+			At the moment the only available flag is `0x1`, which hints
+			that this bitmap can be re-used when rebuilding bitmap indexes
+			for the repository.
+
+		- The compressed bitmap itself, see Appendix A.
+
+== Appendix A: Serialization format for an EWAH bitmap
+
+EWAH bitmaps are serialized in the same format as the JavaEWAH
+library, which makes them compatible with the JGit
+implementation:
+
+	- 4-byte number of bits of the resulting UNCOMPRESSED bitmap
+
+	- 4-byte number of words of the COMPRESSED bitmap, when stored
+
+	- N x 8-byte words, as specified by the previous field
+
+		This is the actual content of the compressed bitmap.
+
+	- 4-byte position (in words) of the current RLW, i.e. the last
+		run-length word of the compressed bitmap
+
+All fields are stored in network byte order, each at its own
+size (32 bits for counts and positions, 64 bits for words).
+
+The compressed bitmap is stored in a form of run-length encoding, as
+follows.  It consists of a concatenation of an arbitrary number of
+chunks.  Each chunk consists of one or more 64-bit words:
+
+     H  L_1  L_2  L_3 .... L_M
+
+H is called the RLW (run-length word).  It consists of (from lower to higher
+order bits):
+
+     - 1 bit: the repeated bit B
+
+     - 32 bits: repetition count K (unsigned)
+
+     - 31 bits: literal word count M (unsigned)
+
+The bitstream represented by the above chunk is then:
+
+     - K words filled with B (i.e. 64 * K copies of the bit B)
+
+     - The bits stored in `L_1` through `L_M`.  Within a word, bits at
+       lower order come earlier in the stream than those at higher
+       order.
+
+The next word after `L_M` (if any) must again be an RLW, for the next
+chunk.  For efficient appending to the bitstream, the EWAH stores a
+pointer to the last RLW in the stream.
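The header layout described in the document can be checked with a short parser. The following is a minimal Python sketch, not git's own C reader. The names `parse_bitmap_header` and `BitmapHeader` are hypothetical and were chosen for illustration. The sketch assumes the header is exactly 4 + 2 + 2 + 4 + 20 = 32 bytes, with no padding between fields.

```python
import struct
from typing import NamedTuple

# Only flag defined in this version of the format; the text says it is required.
BITMAP_OPT_FULL_DAG = 0x1

# signature, version, flags, entry count, pack SHA-1; ">" = big-endian, no padding
_HEADER = struct.Struct(">4sHHI20s")


class BitmapHeader(NamedTuple):
    version: int
    flags: int
    entry_count: int
    pack_checksum: bytes


def parse_bitmap_header(data: bytes) -> BitmapHeader:
    """Parse and validate the 32-byte header at the start of a .bitmap file.

    Hypothetical helper; a sketch of the documented layout only.
    """
    if len(data) < _HEADER.size:
        raise ValueError("truncated bitmap header")
    sig, version, flags, count, checksum = _HEADER.unpack_from(data, 0)
    if sig != b"BITM":
        raise ValueError("bad signature %r" % (sig,))
    if version != 1:
        raise ValueError("unsupported bitmap version %d" % version)
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG must be set")
    return BitmapHeader(version, flags, count, checksum)
```

Every header field has a fixed size and uses network byte order, so one `struct` format string covers the whole header.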
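Once the four type-index bitmaps have been decompressed (see Appendix A), you can look up an object's type by testing bit `n` in each bitmap in the documented order (commits, trees, blobs, tags). This hypothetical Python sketch assumes each decompressed bitmap is held as an `int` whose bit `n` stands for the `n`th object in the packfile. `object_type` and `check_type_bitmaps` are illustrative helpers, not git functions.

```python
# On-disk order of the four type-index bitmaps.
TYPE_NAMES = ("commit", "tree", "blob", "tag")


def object_type(type_bitmaps, n):
    """Return the type of the n-th object in the packfile.

    type_bitmaps: the four decompressed bitmaps as ints (bit n = object n),
    in TYPE_NAMES order. Hypothetical helper for illustration.
    """
    for name, bitmap in zip(TYPE_NAMES, type_bitmaps):
        if (bitmap >> n) & 1:
            return name
    raise ValueError("object %d has no type bit set" % n)


def check_type_bitmaps(type_bitmaps, num_objects):
    """Return True if each of the num_objects objects has exactly one type bit."""
    union = 0
    for bitmap in type_bitmaps:
        if union & bitmap:
            return False  # some object is marked with two types
        union |= bitmap
    return union == (1 << num_objects) - 1
```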
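The sketch below shows how to read a bitmap entry and undo its XOR compression. It is a hypothetical Python illustration with two helpers:

- `read_entry_prefix` parses the fixed 6-byte prefix (commit position, XOR offset, flags) that comes before each compressed bitmap.
- `resolve_xor_bitmaps` assumes every entry's bitmap has already been EWAH-decompressed into an `int`.

XOR offsets only point backwards. Processing entries in on-disk order therefore guarantees that the bitmap being XORed against has already been resolved, which also handles recursive chains. For simplicity the sketch keeps every resolved bitmap, although only the previous 160 can ever be referenced.

```python
import struct

MAX_XOR_OFFSET = 160  # hard limit stated in the format description

# 4-byte commit position in the .idx, 1-byte XOR offset, 1-byte flags (big-endian)
_ENTRY_PREFIX = struct.Struct(">IBB")


def read_entry_prefix(data, offset):
    """Return (index_pos, xor_offset, flags, offset_of_compressed_bitmap)."""
    index_pos, xor_offset, flags = _ENTRY_PREFIX.unpack_from(data, offset)
    return index_pos, xor_offset, flags, offset + _ENTRY_PREFIX.size


def resolve_xor_bitmaps(entries):
    """Undo XOR compression (hypothetical helper).

    entries: list of (stored_bitmap, xor_offset) pairs in on-disk order, where
    stored_bitmap is the already EWAH-decompressed bitmap as an int.
    Returns the actual bitmap of every entry, in the same order.
    """
    resolved = []
    for i, (stored, xor_offset) in enumerate(entries):
        if xor_offset > MAX_XOR_OFFSET or xor_offset > i:
            raise ValueError("entry %d: invalid XOR offset %d" % (i, xor_offset))
        if xor_offset == 0:
            resolved.append(stored)  # stored as-is
        else:
            # Offsets only point backwards, so entry i - xor_offset is already
            # resolved; this also handles chains of XORed entries.
            resolved.append(stored ^ resolved[i - xor_offset])
    return resolved
```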
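The EWAH serialization layout in Appendix A maps directly onto fixed-size big-endian reads. Below is a hypothetical Python sketch; `read_ewah` is an illustrative name, not a git API. It returns the raw compressed words together with the offset just past the bitmap, so that consecutive bitmaps (such as the four type indexes) can be read back to back.

```python
import struct


def read_ewah(data, offset=0):
    """Parse one serialized EWAH bitmap starting at `offset`.

    Returns (bit_size, words, rlw_pos, next_offset), where `words` are the
    compressed 64-bit words and next_offset points just past the bitmap.
    Hypothetical helper; raises struct.error if the buffer is too short.
    """
    bit_size, word_count = struct.unpack_from(">II", data, offset)
    offset += 8
    words = list(struct.unpack_from(">%dQ" % word_count, data, offset))
    offset += 8 * word_count
    (rlw_pos,) = struct.unpack_from(">I", data, offset)
    return bit_size, words, rlw_pos, offset + 4
```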
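The chunk structure can be decoded with a few shifts and masks. Below is a hypothetical Python illustration; `decode_rlw` and `ewah_to_int` are not git APIs. It assumes, as in the EWAH encoding used by JavaEWAH, that the repetition count K counts whole 64-bit words filled with B. Under that assumption, a chunk with count K contributes 64 * K bits before its literal words. The result is an `int` whose bit `n` is bit `n` of the stream, truncated to the uncompressed bit count.

```python
RUNNING_BITS = 32  # width of the repetition count K
LITERAL_BITS = 31  # width of the literal word count M


def decode_rlw(word):
    """Split a 64-bit run-length word into (B, K, M)."""
    bit = word & 1
    run_len = (word >> 1) & ((1 << RUNNING_BITS) - 1)
    literal_count = (word >> (1 + RUNNING_BITS)) & ((1 << LITERAL_BITS) - 1)
    return bit, run_len, literal_count


def ewah_to_int(words, bit_size):
    """Expand compressed EWAH words into an int whose bit n is stream bit n."""
    result = 0
    pos = 0  # number of stream bits produced so far
    i = 0
    while i < len(words):
        bit, run_len, literal_count = decode_rlw(words[i])  # words[i] is an RLW
        i += 1
        if bit:
            # K clean words of all ones; all-zero runs only advance pos
            result |= ((1 << (64 * run_len)) - 1) << pos
        pos += 64 * run_len
        if i + literal_count > len(words):
            raise ValueError("literal words run past the end of the bitmap")
        for literal in words[i:i + literal_count]:
            result |= literal << pos  # lower-order bits come first in the stream
            pos += 64
        i += literal_count
    return result & ((1 << bit_size) - 1)
```

Building a single Python `int` keeps the sketch short. A production reader would more likely iterate over the stream word by word.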
-- 
1.8.5.1.399.g900e7cd
