From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 4/8] add functions for memory-efficient bitmaps
Date: Wed, 25 Jun 2014 19:40:01 -0400 [thread overview]
Message-ID: <20140625234000.GD23146@sigill.intra.peff.net> (raw)
In-Reply-To: <20140625233429.GA20457@sigill.intra.peff.net>
We already have a nice-to-use bitmap implementation in
ewah/bitmap.c. It pretends to be infinitely long when asking
for a bit (and just returns 0 for bits that haven't been
allocated or set), and dynamically resizes as appropriate
when you set bits.
The cost to this is that each bitmap must store its own
pointer and length, using up to 16 bytes per bitmap on top
of the actual bit storage. This is a lot of storage (not to
mention an extra level of pointer indirection) if you are
going to store one bitmap per commit in a traversal.
These functions provide an alternative bitmap implementation
that can be used when you have a large number of fixed-size
bitmaps. See the documentation in the header file for
details and examples.
Signed-off-by: Jeff King <peff@peff.net>
---
Makefile | 1 +
bitset.h | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 114 insertions(+)
create mode 100644 bitset.h
diff --git a/Makefile b/Makefile
index 07ea105..8158fd2 100644
--- a/Makefile
+++ b/Makefile
@@ -633,6 +633,7 @@ LIB_H += archive.h
LIB_H += argv-array.h
LIB_H += attr.h
LIB_H += bisect.h
+LIB_H += bitset.h
LIB_H += blob.h
LIB_H += branch.h
LIB_H += builtin.h
diff --git a/bitset.h b/bitset.h
new file mode 100644
index 0000000..5fbc956
--- /dev/null
+++ b/bitset.h
@@ -0,0 +1,113 @@
+#ifndef BITSET_H
+#define BITSET_H
+
+/*
+ * This header file provides functions for operating on an array of unsigned
+ * characters as a bitmap. There is zero per-bitset storage overhead beyond the
+ * actual number of stored bits (modulo some padding). This is efficient, but
+ * makes the API harder to use. In particular, each bitset does not know how
+ * long it is. It is the caller's responsibility to:
+ *
+ * 1. Never ask to get or set a bit outside of the allocated range.
+ *
+ * 2. Provide the allocated range to any functions which operate
+ * on the whole bitset (e.g., bitset_or).
+ *
+ * 3. Always feed bitsets of the same size to functions which require it
+ * (e.g., bitset_or).
+ *
+ * It is mostly intended to be used with commit-slabs to store N bits per
+ * commit. Here's an example:
+ *
+ * define_commit_slab(bit_slab, unsigned char);
+ *
+ * ... assume we want to store nr bits per commit ...
+ * struct bit_slab bits;
+ * init_bit_slab_with_stride(&bits, bitset_sizeof(nr));
+ *
+ * ... set a bit (make sure 10 < nr!) ...
+ * bitset_set(bit_slab_at(&bits, commit), 10);
+ *
+ * ... or get a bit ...
+ * if (bitset_get(bit_slab_at(&bits, commit), 5))
+ *
+ * ... propagate bits to a parent commit ...
+ * bitset_or(bit_slab_at(&bits, parent),
+ * bit_slab_at(&bits, commit),
+ * nr);
+ */
+
+/*
+ * Return the number of unsigned chars required to store num_bits bits.
+ *
+ * This is mostly used internally by the bitset functions, but you may need it
+ * when allocating the bitset. Example:
+ *
+ * bits = xcalloc(1, bitset_sizeof(nr));
+ */
+static inline int bitset_sizeof(int num_bits)
+{
+ return (num_bits + CHAR_BIT - 1) / CHAR_BIT;
+}
+
+/*
+ * Set the bit at position "n". "n" is counted from zero, and must be
+ * smaller than the num_bits given to bitset_sizeof when allocating the bitset.
+ */
+static inline void bitset_set(unsigned char *bits, int n)
+{
+ bits[n / CHAR_BIT] |= 1 << (n % CHAR_BIT);
+}
+
+/*
+ * Return the bit at position "n" (see bitset_set for a description of "n").
+ */
+static inline int bitset_get(unsigned char *bits, int n)
+{
+ return !!(bits[n / CHAR_BIT] & (1 << (n % CHAR_BIT)));
+}
+
+/*
+ * Return true iff the bitsets contain the same bits. Each bitset should be the
+ * same size, and should have been allocated using bitset_sizeof(max).
+ *
+ * Note that it is not safe to check partial equality by providing a smaller
+ * "max" (we assume any bits beyond "max" up to the next CHAR_BIT boundary are
+ * zeroed padding).
+ */
+static inline int bitset_equal(unsigned char *a, unsigned char *b, int max)
+{
+ int i;
+ for (i = bitset_sizeof(max); i > 0; i--)
+ if (*a++ != *b++)
+ return 0;
+ return 1;
+}
+
+/*
+ * Bitwise-or the bitsets in "dst" and "src", and store the result in "dst".
+ *
+ * See bitset_equal for the definition of "max".
+ */
+static inline void bitset_or(unsigned char *dst, const unsigned char *src, int max)
+{
+ int i;
+ for (i = bitset_sizeof(max); i > 0; i--)
+ *dst++ |= *src++;
+}
+
+/*
+ * Returns true iff the bitset contains all zeroes.
+ *
+ * See bitset_equal for the definition of "max".
+ */
+static inline int bitset_empty(const unsigned char *bits, int max)
+{
+ int i;
+ for (i = bitset_sizeof(max); i > 0; i--, bits++)
+ if (*bits)
+ return 0;
+ return 1;
+}
+
+#endif /* BITSET_H */
--
2.0.0.566.gfe3e6b2
next prev parent reply other threads:[~2014-06-25 23:40 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-25 23:34 [PATCH 0/8] use merge-base for tag --contains Jeff King
2014-06-25 23:35 ` [PATCH 1/8] tag: allow --sort with -n Jeff King
2014-06-25 23:35 ` [PATCH 2/8] tag: factor out decision to stream tags Jeff King
2014-06-25 23:39 ` [PATCH 3/8] paint_down_to_common: use prio_queue Jeff King
2014-07-01 16:23 ` Junio C Hamano
2014-07-01 17:10 ` Jeff King
2014-06-25 23:40 ` Jeff King [this message]
2014-06-26 3:15 ` [PATCH 4/8] add functions for memory-efficient bitmaps Torsten Bögershausen
2014-06-26 15:51 ` Jeff King
2014-06-29 7:41 ` Eric Sunshine
2014-06-30 17:07 ` Jeff King
2014-07-01 16:57 ` Junio C Hamano
2014-07-01 17:18 ` Jeff King
2014-06-25 23:42 ` [PATCH 5/8] string-list: add pos to iterator callback Jeff King
2014-07-01 17:45 ` Junio C Hamano
2014-07-01 19:00 ` Jeff King
2014-06-25 23:47 ` [PATCH 6/8] commit: provide a fast multi-tip contains function Jeff King
2014-06-26 18:55 ` Junio C Hamano
2014-06-26 19:19 ` Junio C Hamano
2014-06-26 19:26 ` Junio C Hamano
2014-07-01 18:16 ` Junio C Hamano
2014-07-01 19:14 ` Junio C Hamano
2014-06-25 23:49 ` [PATCH 7/8] tag: use commit_contains Jeff King
2014-06-25 23:53 ` [PATCH 8/8] perf: add tests for tag --contains Jeff King
2014-06-26 0:01 ` Jeff King
2014-06-26 0:04 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140625234000.GD23146@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).