From: Thomas Rast <trast@inf.ethz.ch>
To: "Vicent Martí" <tanoku@gmail.com>
Cc: git <git@vger.kernel.org>, Colby Ranger <cranger@google.com>
Subject: Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Date: Wed, 26 Jun 2013 16:12:45 -0700 [thread overview]
Message-ID: <87vc50lb4i.fsf@linux-k42r.v.cablecom.net> (raw)
In-Reply-To: <CAFFjANQ_PoTT5bUrZ_0oARz=oZysJdMC1MAsHR2MCZVubfSbsw@mail.gmail.com> ("Vicent \=\?utf-8\?Q\?Mart\=C3\=AD\=22's\?\= message of "Wed, 26 Jun 2013 00:30:06 +0200")
Vicent Martí <tanoku@gmail.com> writes:
> On Tue, Jun 25, 2013 at 5:58 PM, Thomas Rast <trast@inf.ethz.ch> wrote:
>>
>> Please document the RLW format here.
>
> Har har. I was going to comment on your review of the Ewah patchset,
> but might as well do it here: the only thing I know about Ewah bitmaps
> is that they work. And I know this because I did extensive fuzz
> testing of my C port. Unfortunately, the original Java code I ported
> from has 0 comments, so any documentation here would have to be
> reverse-engineered.
I think the below would be a reasonable documentation, to be appended
after your description of the EWAH format. Maybe Colby can correct me
if I got anything wrong. You can basically read this off from the
implementation of ewah_each_bit() and the helper functions it uses.
-- 8< --
The compressed bitmap is stored in a form of run-length encoding, as
follows. It consists of a concatenation of an arbitrary number of
chunks. Each chunk consists of one or more 64-bit words
H L_1 L_2 L_3 .... L_M
H is called RLW (run length word). It consists of (from lower to higher
order bits):
- 1 bit: the repeated bit B
- 32 bits: repetition count K (unsigned)
- 31 bits: literal word count M (unsigned)
The bitstream represented by the above chunk is then:
- K repetitions of B
- The bits stored in `L_1` through `L_M`. Within a word, bits at
lower order come earlier in the stream than those at higher
order.
The next word after `L_M` (if any) must again be a RLW, for the next
chunk. For efficient appending to the bitstream, the EWAH stores a
format to the last RLW in the stream.
--
Thomas Rast
trast@{inf,student}.ethz.ch
next prev parent reply other threads:[~2013-06-26 23:12 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-24 23:22 [PATCH 00/16] Speed up Counting Objects with bitmap data Vicent Marti
2013-06-24 23:22 ` [PATCH 01/16] list-objects: mark tree as unparsed when we free its buffer Vicent Marti
2013-06-24 23:22 ` [PATCH 02/16] sha1_file: refactor into `find_pack_object_pos` Vicent Marti
2013-06-25 13:59 ` Thomas Rast
2013-06-24 23:23 ` [PATCH 03/16] pack-objects: use a faster hash table Vicent Marti
2013-06-25 14:03 ` Thomas Rast
2013-06-26 2:14 ` Jeff King
2013-06-26 4:47 ` Jeff King
2013-06-25 17:58 ` Ramkumar Ramachandra
2013-06-25 22:48 ` Junio C Hamano
2013-06-25 23:09 ` Vicent Martí
2013-06-24 23:23 ` [PATCH 04/16] pack-objects: make `pack_name_hash` global Vicent Marti
2013-06-24 23:23 ` [PATCH 05/16] revision: allow setting custom limiter function Vicent Marti
2013-06-24 23:23 ` [PATCH 06/16] sha1_file: export `git_open_noatime` Vicent Marti
2013-06-24 23:23 ` [PATCH 07/16] compat: add endinanness helpers Vicent Marti
2013-06-25 13:08 ` Peter Krefting
2013-06-25 13:25 ` Vicent Martí
2013-06-27 5:56 ` Peter Krefting
2013-06-24 23:23 ` [PATCH 08/16] ewah: compressed bitmap implementation Vicent Marti
2013-06-25 1:10 ` Junio C Hamano
2013-06-25 22:51 ` Junio C Hamano
2013-06-25 15:38 ` Thomas Rast
2013-06-24 23:23 ` [PATCH 09/16] documentation: add documentation for the bitmap format Vicent Marti
2013-06-25 5:42 ` Shawn Pearce
2013-06-25 19:33 ` Vicent Martí
2013-06-25 21:17 ` Junio C Hamano
2013-06-25 22:08 ` Vicent Martí
2013-06-27 1:11 ` Shawn Pearce
2013-06-27 2:36 ` Vicent Martí
2013-06-27 2:45 ` Jeff King
2013-06-27 16:07 ` Shawn Pearce
2013-06-27 17:17 ` Jeff King
2013-07-01 18:47 ` Colby Ranger
2013-07-01 19:13 ` Shawn Pearce
2013-07-07 9:46 ` Jeff King
2013-07-07 17:27 ` Shawn Pearce
2013-06-26 5:11 ` Jeff King
2013-06-26 18:41 ` Colby Ranger
2013-06-26 22:33 ` Colby Ranger
2013-06-27 0:53 ` Colby Ranger
2013-06-27 1:32 ` Shawn Pearce
2013-06-27 1:29 ` Shawn Pearce
2013-06-25 15:58 ` Thomas Rast
2013-06-25 22:30 ` Vicent Martí
2013-06-26 23:12 ` Thomas Rast [this message]
2013-06-26 23:19 ` Thomas Rast
2013-06-24 23:23 ` [PATCH 10/16] pack-objects: use bitmaps when packing objects Vicent Marti
2013-06-25 12:48 ` Ramkumar Ramachandra
2013-06-25 15:58 ` Thomas Rast
2013-06-25 23:06 ` Junio C Hamano
2013-06-25 23:14 ` Vicent Martí
2013-06-24 23:23 ` [PATCH 11/16] rev-list: add bitmap mode to speed up lists Vicent Marti
2013-06-25 16:22 ` Thomas Rast
2013-06-26 1:45 ` Vicent Martí
2013-06-26 23:13 ` Thomas Rast
2013-06-26 5:22 ` Jeff King
2013-06-24 23:23 ` [PATCH 12/16] pack-objects: implement bitmap writing Vicent Marti
2013-06-24 23:23 ` [PATCH 13/16] repack: consider bitmaps when performing repacks Vicent Marti
2013-06-25 23:00 ` Junio C Hamano
2013-06-25 23:16 ` Vicent Martí
2013-06-24 23:23 ` [PATCH 14/16] sha1_file: implement `nth_packed_object_info` Vicent Marti
2013-06-24 23:23 ` [PATCH 15/16] write-bitmap: implement new git command to write bitmaps Vicent Marti
2013-06-24 23:23 ` [PATCH 16/16] rev-list: Optimize --count using bitmaps too Vicent Marti
2013-06-25 16:05 ` [PATCH 00/16] Speed up Counting Objects with bitmap data Thomas Rast
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87vc50lb4i.fsf@linux-k42r.v.cablecom.net \
--to=trast@inf.ethz.ch \
--cc=cranger@google.com \
--cc=git@vger.kernel.org \
--cc=tanoku@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.