All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Rast <trast@inf.ethz.ch>
To: Vicent Marti <tanoku@gmail.com>
Cc: <git@vger.kernel.org>
Subject: Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Date: Tue, 25 Jun 2013 08:58:54 -0700	[thread overview]
Message-ID: <87y59xlvt7.fsf@linux-k42r.v.cablecom.net> (raw)
In-Reply-To: 1372116193-32762-10-git-send-email-tanoku@gmail.com

Vicent Marti <tanoku@gmail.com> writes:

> This is the technical documentation and design rationale for the new
> Bitmap v2 on-disk format.

Hrmpf, that's what I get for reading the series in order...

> +			The folowing flags are supported:
                              ^^

typos marked by ^

> +		By storing all the hashes in a cache together with the bitmapsin
                                                                             ^^

> +		The obvious consequence is that the XOR of all 4 bitmaps will result
> +		in a full set (all bits sets), and the AND of all 4 bitmaps will
                                           ^

> +		- 1-byte XOR-offset
> +			The xor offset used to compress this bitmap. For an entry
> +			in position `x`, a XOR offset of `y` means that the actual
> +			bitmap representing for this commit is composed by XORing the
> +			bitmap for this entry with the bitmap in entry `x-y` (i.e.
> +			the bitmap `y` entries before this one).
> +
> +			Note that this compression can be recursive. In order to
> +			XOR this entry with a previous one, the previous entry needs
> +			to be decompressed first, and so on.
> +
> +			The hard-limit for this offset is 160 (an entry can only be
> +			xor'ed against one of the 160 entries preceding it). This
> +			number is always positivea, and hence entries are always xor'ed
                                                 ^

> +			with **previous** bitmaps, not bitmaps that will come afterwards
> +			in the index.

Clever.  Why 160 though?

> +		- 2 bytes of RESERVED data (used right now for better packing).

What do they mean?

> +  With an index at the end of the file, we can load only this index in memory,
> +  allowing for very efficient access to all the available bitmaps lazily (we
> +  have their offsets in the mmaped file).

Is there anything preventing you from mmap()ing the index also?

> +== Appendix A: Serialization format for an EWAH bitmap
> +
> +Ewah bitmaps are serialized in the protocol as the JAVAEWAH
> +library, making them backwards compatible with the JGit
> +implementation:
> +
> +	- 4-byte number of bits of the resulting UNCOMPRESSED bitmap
> +
> +	- 4-byte number of words of the COMPRESSED bitmap, when stored
> +
> +	- N x 8-byte words, as specified by the previous field
> +
> +		This is the actual content of the compressed bitmap.
> +
> +	- 4-byte position of the current RLW for the compressed
> +		bitmap
> +
> +Note that the byte order for this serialization is not defined by
> +default. The byte order for all the content in a serialized EWAH
> +bitmap can be known by the byte order flags in the header of the
> +bitmap index file.

Please document the RLW format here.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

  parent reply	other threads:[~2013-06-25 21:33 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-24 23:22 [PATCH 00/16] Speed up Counting Objects with bitmap data Vicent Marti
2013-06-24 23:22 ` [PATCH 01/16] list-objects: mark tree as unparsed when we free its buffer Vicent Marti
2013-06-24 23:22 ` [PATCH 02/16] sha1_file: refactor into `find_pack_object_pos` Vicent Marti
2013-06-25 13:59   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 03/16] pack-objects: use a faster hash table Vicent Marti
2013-06-25 14:03   ` Thomas Rast
2013-06-26  2:14     ` Jeff King
2013-06-26  4:47       ` Jeff King
2013-06-25 17:58   ` Ramkumar Ramachandra
2013-06-25 22:48   ` Junio C Hamano
2013-06-25 23:09     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 04/16] pack-objects: make `pack_name_hash` global Vicent Marti
2013-06-24 23:23 ` [PATCH 05/16] revision: allow setting custom limiter function Vicent Marti
2013-06-24 23:23 ` [PATCH 06/16] sha1_file: export `git_open_noatime` Vicent Marti
2013-06-24 23:23 ` [PATCH 07/16] compat: add endinanness helpers Vicent Marti
2013-06-25 13:08   ` Peter Krefting
2013-06-25 13:25     ` Vicent Martí
2013-06-27  5:56       ` Peter Krefting
2013-06-24 23:23 ` [PATCH 08/16] ewah: compressed bitmap implementation Vicent Marti
2013-06-25  1:10   ` Junio C Hamano
2013-06-25 22:51     ` Junio C Hamano
2013-06-25 15:38   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 09/16] documentation: add documentation for the bitmap format Vicent Marti
2013-06-25  5:42   ` Shawn Pearce
2013-06-25 19:33     ` Vicent Martí
2013-06-25 21:17       ` Junio C Hamano
2013-06-25 22:08         ` Vicent Martí
2013-06-27  1:11           ` Shawn Pearce
2013-06-27  2:36             ` Vicent Martí
2013-06-27  2:45               ` Jeff King
2013-06-27 16:07                 ` Shawn Pearce
2013-06-27 17:17                   ` Jeff King
2013-07-01 18:47                   ` Colby Ranger
2013-07-01 19:13                     ` Shawn Pearce
2013-07-07  9:46                     ` Jeff King
2013-07-07 17:27                       ` Shawn Pearce
2013-06-26  5:11       ` Jeff King
2013-06-26 18:41         ` Colby Ranger
2013-06-26 22:33           ` Colby Ranger
2013-06-27  0:53             ` Colby Ranger
2013-06-27  1:32               ` Shawn Pearce
2013-06-27  1:29         ` Shawn Pearce
2013-06-25 15:58   ` Thomas Rast [this message]
2013-06-25 22:30     ` Vicent Martí
2013-06-26 23:12       ` Thomas Rast
2013-06-26 23:19         ` Thomas Rast
2013-06-24 23:23 ` [PATCH 10/16] pack-objects: use bitmaps when packing objects Vicent Marti
2013-06-25 12:48   ` Ramkumar Ramachandra
2013-06-25 15:58   ` Thomas Rast
2013-06-25 23:06   ` Junio C Hamano
2013-06-25 23:14     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 11/16] rev-list: add bitmap mode to speed up lists Vicent Marti
2013-06-25 16:22   ` Thomas Rast
2013-06-26  1:45     ` Vicent Martí
2013-06-26 23:13       ` Thomas Rast
2013-06-26  5:22     ` Jeff King
2013-06-24 23:23 ` [PATCH 12/16] pack-objects: implement bitmap writing Vicent Marti
2013-06-24 23:23 ` [PATCH 13/16] repack: consider bitmaps when performing repacks Vicent Marti
2013-06-25 23:00   ` Junio C Hamano
2013-06-25 23:16     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 14/16] sha1_file: implement `nth_packed_object_info` Vicent Marti
2013-06-24 23:23 ` [PATCH 15/16] write-bitmap: implement new git command to write bitmaps Vicent Marti
2013-06-24 23:23 ` [PATCH 16/16] rev-list: Optimize --count using bitmaps too Vicent Marti
2013-06-25 16:05 ` [PATCH 00/16] Speed up Counting Objects with bitmap data Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y59xlvt7.fsf@linux-k42r.v.cablecom.net \
    --to=trast@inf.ethz.ch \
    --cc=git@vger.kernel.org \
    --cc=tanoku@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.