From: Junio C Hamano <gitster@pobox.com>
To: "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
Kaartic Sivaram <kaartic.sivaraam@gmail.com>,
Derrick Stolee <derrickstolee@github.com>,
Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Subject: Re: [PATCH 3/5] roaring: teach Git to write roaring bitmaps
Date: Thu, 29 Sep 2022 23:20:58 -0700 [thread overview]
Message-ID: <xmqqczbdl6wl.fsf@gitster.g> (raw)
In-Reply-To: 4364224f9bddc8f1e40875ebc540b28225317176.1663609659.git.gitgitgadget@gmail.com
"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Roaring bitmaps are said to be more efficient (most of the time) than
> ewah bitmaps. So Git might gain some optimization if it support roaring
> bitmaps. As Roaring library has all the changes it needed to implement
> roaring bitmaps in Git, Git can learn to write roaring bitmaps. However,
> all the changes are backward-compatible.
>
> Teach Git to write roaring bitmaps.
That is way underexplained. At least cover what the plans are, so
that readers do not have to ask these questions:
* When is the choice of bitmap type is made? Is it fixed at
repository initialization time and once chosen other kinds cannot
be used?
* Is the bitmap file self describing? How does a reader know
between ewah and roaring codepaths to use to read a given bitmap
file? Is there enough room for extending the set of bitmap
formats, or we cannot add other formats easily?
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
> Makefile | 1 +
> bitmap.c | 225 +++++++++++++++++++++++++++
> bitmap.h | 33 ++++
> builtin/diff.c | 10 +-
> ewah/bitmap.c | 61 +++++---
> ewah/ewok.h | 37 ++---
> pack-bitmap-write.c | 326 ++++++++++++++++++++++++++++++----------
> pack-bitmap.c | 114 +++++++-------
> pack-bitmap.h | 22 ++-
> t/t5310-pack-bitmaps.sh | 17 +++
> 10 files changed, 664 insertions(+), 182 deletions(-)
> create mode 100644 bitmap.c
> create mode 100644 bitmap.h
>
> diff --git a/Makefile b/Makefile
> index e9537951105..9ca19b3ca8d 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -900,6 +900,7 @@ LIB_OBJS += archive.o
> LIB_OBJS += attr.o
> LIB_OBJS += base85.o
> LIB_OBJS += bisect.o
> +LIB_OBJS += bitmap.o
> LIB_OBJS += blame.o
> LIB_OBJS += blob.o
> LIB_OBJS += bloom.o
> diff --git a/bitmap.c b/bitmap.c
> new file mode 100644
> index 00000000000..7d547eb9f53
> --- /dev/null
> +++ b/bitmap.c
> @@ -0,0 +1,225 @@
> +#include "bitmap.h"
> +#include "cache.h"
> +
> +static enum bitmap_type bitmap_type = INIT_BITMAP_TYPE;
"INIT" is a strange name for "UNINITIALIZED". Especially ...
> +void *roaring_or_ewah_bitmap_init(void)
> +{
> + switch (bitmap_type)
> + {
(Style)
> + case EWAH:
> + return ewah_new();
> + case ROARING:
> + return roaring_bitmap_create();
> + default:
... here, you use it to mean exactly that.
> + error(_("bitmap type not initialized\n"));
> + return NULL;
Do you really need the global variable that holds the bitmap type?
Wouldn't it be easier to write code that needs to deal with both
types (e.g. in a repository with existing ewah bitmap, you want to
do a repack and index the result using the roaring bitmap) if you
passed the type through the callchain as a parameter?
It may be that the codepath that reads from an existing bitmap file
says "ah, the file given to us seems to be in format X (either EWAH
or ROARING or perhaps something else), so let's call bitmap_init(X)
to obtain the in-core data structure to deal with that file". When
that happens, you may probably need to have two cases in the default:
arm of this switch statement, i.e. one to diagnose a BUG() to pass
an uninitialized bitmap type to the codepath, and the other to
diagnose a runtime error() to have read a bitmap file whose format
this version of Git does not understand.
> +void *roaring_or_raw_bitmap_copy(void *bitmap)
> +{
> + switch (bitmap_type)
> + {
> + case EWAH:
> ...
> +int roaring_or_ewah_bitmap_set(void *bitmap, uint32_t i)
> +{
> + switch (bitmap_type) {
> + case EWAH:
> +...
> +void roaring_or_raw_bitmap_set(void *bitmap, uint32_t i)
> +{
> + switch (bitmap_type)
> + {
> + case EWAH:
> +...
> +void roaring_or_raw_bitmap_unset(void *bitmap, uint32_t i)
> +{
> + switch (bitmap_type)
> + {
> + case EWAH:
> +...
These repetitive patterns makes me wonder if void *bitmap
is a good type to be passing around. Shouldn't it be a struct with
its first member being a bitmap_type, and another member being what
these functions are passing to the underlying bitmap format specific
functions as "bitmap"? E.g.
void bitmap_unset(struct bitmap *bm, uint32_t i)
{
switch (bm->type) {
case EWAH:
ewah_bitmap_remove(bm->u.ewah, i);
break;
...
> \ No newline at end of file
Careful.
> diff --git a/bitmap.h b/bitmap.h
> new file mode 100644
> index 00000000000..d75400922cc
> --- /dev/null
> +++ b/bitmap.h
> @@ -0,0 +1,33 @@
> +#ifndef __BITMAP_H__
> +#define __BITMAP_H__
> +
> +
> +#include "git-compat-util.h"
> +#include "ewah/ewok.h"
> +#include "roaring/roaring.h"
> +
> +enum bitmap_type {
> + INIT_BITMAP_TYPE = 0,
"UNINITIALIZED_BITMAP_TYPE", probably.
> +void *roaring_or_ewah_bitmap_init(void);
I would strongly suggest reconsider these names. What if you later
want to add the third variant? roaring_or_ewah_or_xyzzy_bitmap_init()?
Instead just use the most generic name, like "bitmap_init", perhaps
something along the lines of ...
struct bitmap {
enum bitmap_type type;
union {
struct ewah_bitmap *ewah;
struct roaring_bitmap *roaring;
} u;
};
struct bitmap *bitmap_new(enum bitmap_type type)
{
struct bitmap *bm = xmalloc(sizeof(*bm));
bm->type = type;
switch (bm->type) {
case EWAH:
bm->u.ewah = ewah_new();
break;
case ROARING:
bm->u.roaring = roaring_bitmap_create();
break;
default:
die(_("unknown bitmap type %d"), (int)type);
}
return bm;
}
I dunno.
next prev parent reply other threads:[~2022-09-30 6:21 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-19 17:47 [PATCH 0/5] [RFC] introduce Roaring bitmaps to Git Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 1/5] reachability-bitmaps: add CRoaring library " Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 2/5] roaring.[ch]: apply Git specific changes to the roaring API Abhradeep Chakraborty via GitGitGadget
2022-09-19 18:33 ` Derrick Stolee
2022-09-19 22:02 ` Junio C Hamano
2022-09-20 12:19 ` Derrick Stolee
2022-09-20 15:09 ` Abhradeep Chakraborty
2022-09-21 16:58 ` Junio C Hamano
2022-09-20 14:46 ` Abhradeep Chakraborty
2022-09-19 17:47 ` [PATCH 3/5] roaring: teach Git to write roaring bitmaps Abhradeep Chakraborty via GitGitGadget
2022-09-30 6:20 ` Junio C Hamano [this message]
2022-09-30 16:23 ` Abhradeep Chakraborty
2022-10-30 6:35 ` Abhradeep Chakraborty
2022-10-30 19:46 ` Derrick Stolee
2022-10-31 14:30 ` Abhradeep Chakraborty
2022-10-31 16:06 ` Junio C Hamano
2022-10-31 17:51 ` C99 -> C11 or C17? (was: [PATCH 3/5] roaring: teach Git to write roaring bitmaps) Ævar Arnfjörð Bjarmason
2022-10-31 20:55 ` rsbecker
2022-11-01 6:58 ` [PATCH 3/5] roaring: teach Git to write roaring bitmaps Abhradeep Chakraborty
2022-09-19 17:47 ` [PATCH 4/5] roaring: introduce a new config option for " Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 5/5] roaring: teach Git to read " Abhradeep Chakraborty via GitGitGadget
2022-09-19 18:18 ` [PATCH 0/5] [RFC] introduce Roaring bitmaps to Git Derrick Stolee
2022-09-20 14:05 ` Abhradeep Chakraborty
2022-09-20 21:59 ` Taylor Blau
2022-09-21 15:27 ` Abhradeep Chakraborty
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqczbdl6wl.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=chakrabortyabhradeep79@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=kaartic.sivaraam@gmail.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.