* [PATCH 0/3] Using commit slab to replace indegree
@ 2013-04-14 6:04 Junio C Hamano
2013-04-14 6:04 ` [PATCH 1/3] commit: allow associating auxiliary info on-demand Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Junio C Hamano @ 2013-04-14 6:04 UTC (permalink / raw)
To: git; +Cc: Jeff King
The first one is Jeff's "here is what I have now" technology
demonstration.
The second attempts to iron out one kink in it, and then the third
one introduces a macro to allow other code to replicate exactly the
same code structure to support their uses.
Jeff King (1):
commit: allow associating auxiliary info on-demand
Junio C Hamano (2):
commit-slab: avoid large realloc
commit-slab: introduce a macro to define a slab for new type
commit-slab.h | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
commit.c | 32 ++++++++++++++------
commit.h | 2 +-
3 files changed, 119 insertions(+), 10 deletions(-)
create mode 100644 commit-slab.h
--
1.8.2.1-514-gf369d36
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/3] commit: allow associating auxiliary info on-demand
2013-04-14 6:04 [PATCH 0/3] Using commit slab to replace indegree Junio C Hamano
@ 2013-04-14 6:04 ` Junio C Hamano
2013-04-14 15:12 ` Jeff King
2013-04-14 6:04 ` [PATCH 2/3] commit-slab: avoid large realloc Junio C Hamano
2013-04-14 6:04 ` [PATCH 3/3] commit-slab: introduce a macro to define a slab for new type Junio C Hamano
2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2013-04-14 6:04 UTC (permalink / raw)
To: git; +Cc: Jeff King
From: Jeff King <peff@peff.net>
The "indegree" field in the commit object is only used while sorting
a list of commits in topological order, and wastes memory otherwise.
We would prefer to shrink the size of individual commit objects, of
which we may have to hold thousands in-core. We could eject the
"indegree" field from the commit object and represent it as a dynamic
table based on the decoration infrastructure, but the decoration is
meant for sparse annotation and is not a good match.
Instead, let's try a different approach.
- Assign an integer (commit->index) to each commit we keep in-core
(reuse the space of "indegree" field for it);
- When running the topological sort, allocate an array of integers
in bulk (called "slab"), use the commit->index as an index into
this array, and store the "indegree" information there.
This does _not_ reduce the memory footprint of a commit object, but
the commit->index can be used as the index to dynamically associate
commits with other kinds of information as needed.
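The scheme can be sketched outside of git as a minimal, self-contained
illustration (the struct names and growth policy here are stand-ins for
exposition, not the actual git code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for "struct commit"; only the per-commit index matters here. */
struct commit { unsigned index; };

/* One growable array of ints, indexed by commit->index. */
struct int_slab {
	int *buf;
	size_t alloc;
};

static int *slab_at(struct int_slab *s, const struct commit *c)
{
	if (s->alloc <= c->index) {
		size_t new_alloc = s->alloc ? 2 * s->alloc : 64;
		if (new_alloc <= c->index)
			new_alloc = c->index + 1;
		s->buf = realloc(s->buf, new_alloc * sizeof(*s->buf));
		/* newly grown entries start out zeroed */
		memset(s->buf + s->alloc, 0,
		       (new_alloc - s->alloc) * sizeof(*s->buf));
		s->alloc = new_alloc;
	}
	return s->buf + c->index;
}
```

Because indices are assigned densely at lookup_commit() time, the slab
stays compact even when many commits are annotated.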
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
commit.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
commit.h | 2 +-
2 files changed, 51 insertions(+), 10 deletions(-)
diff --git a/commit.c b/commit.c
index 1a41757..9365e3b 100644
--- a/commit.c
+++ b/commit.c
@@ -14,6 +14,7 @@ static struct commit_extra_header *read_commit_extra_header_lines(const char *bu
int save_commit_buffer = 1;
const char *commit_type = "commit";
+static int commit_count;
static struct commit *check_commit(struct object *obj,
const unsigned char *sha1,
@@ -58,8 +59,11 @@ struct commit *lookup_commit_or_die(const unsigned char *sha1, const char *ref_n
struct commit *lookup_commit(const unsigned char *sha1)
{
struct object *obj = lookup_object(sha1);
- if (!obj)
- return create_object(sha1, OBJ_COMMIT, alloc_commit_node());
+ if (!obj) {
+ struct commit *c = alloc_commit_node();
+ c->index = commit_count++;
+ return create_object(sha1, OBJ_COMMIT, c);
+ }
if (!obj->type)
obj->type = OBJ_COMMIT;
return check_commit(obj, sha1, 0);
@@ -497,6 +501,36 @@ struct commit *pop_commit(struct commit_list **stack)
return item;
}
+struct commit_slab {
+ int *buf;
+ int alloc;
+};
+
+static void slab_init(struct commit_slab *s)
+{
+ memset(s, 0, sizeof(*s));
+}
+
+static void slab_clear(struct commit_slab *s)
+{
+ free(s->buf);
+ slab_init(s);
+}
+
+static inline int *slab_at(struct commit_slab *s, const struct commit *c)
+{
+ if (s->alloc <= c->index) {
+ int new_alloc = alloc_nr(s->alloc);
+ if (new_alloc <= c->index)
+ new_alloc = c->index + 1;
+
+ s->buf = xrealloc(s->buf, new_alloc * sizeof(*s->buf));
+ memset(s->buf + s->alloc, 0, (new_alloc - s->alloc) * sizeof(*s->buf));
+ s->alloc = new_alloc;
+ }
+ return s->buf + c->index;
+}
+
/*
* Performs an in-place topological sort on the list supplied.
*/
@@ -505,15 +539,18 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
struct commit_list *next, *orig = *list;
struct commit_list *work, **insert;
struct commit_list **pptr;
+ struct commit_slab indegree;
if (!orig)
return;
*list = NULL;
+ slab_init(&indegree);
+
/* Mark them and clear the indegree */
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- commit->indegree = 1;
+ *slab_at(&indegree, commit) = 1;
}
/* update the indegree */
@@ -521,9 +558,10 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
struct commit_list * parents = next->item->parents;
while (parents) {
struct commit *parent = parents->item;
+ int *pi = slab_at(&indegree, parent);
- if (parent->indegree)
- parent->indegree++;
+ if (*pi)
+ (*pi)++;
parents = parents->next;
}
}
@@ -540,7 +578,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- if (commit->indegree == 1)
+ if (*slab_at(&indegree, commit) == 1)
insert = &commit_list_insert(commit, insert)->next;
}
@@ -561,8 +599,9 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
commit = work_item->item;
for (parents = commit->parents; parents ; parents = parents->next) {
struct commit *parent = parents->item;
+ int *pi = slab_at(&indegree, parent);
- if (!parent->indegree)
+ if (!*pi)
continue;
/*
@@ -570,7 +609,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
* when all their children have been emitted thereby
* guaranteeing topological order.
*/
- if (--parent->indegree == 1) {
+ if (--(*pi) == 1) {
if (!lifo)
commit_list_insert_by_date(parent, &work);
else
@@ -581,10 +620,12 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
* work_item is a commit all of whose children
* have already been emitted. we can emit it now.
*/
- commit->indegree = 0;
+ *slab_at(&indegree, commit) = 0;
*pptr = work_item;
pptr = &work_item->next;
}
+
+ slab_clear(&indegree);
}
/* merge-base stuff */
diff --git a/commit.h b/commit.h
index 252c7f8..70e749d 100644
--- a/commit.h
+++ b/commit.h
@@ -14,7 +14,7 @@ struct commit_list {
struct commit {
struct object object;
void *util;
- unsigned int indegree;
+ unsigned int index;
unsigned long date;
struct commit_list *parents;
struct tree *tree;
--
1.8.2.1-514-gf369d36
* [PATCH 2/3] commit-slab: avoid large realloc
2013-04-14 6:04 [PATCH 0/3] Using commit slab to replace indegree Junio C Hamano
2013-04-14 6:04 ` [PATCH 1/3] commit: allow associating auxiliary info on-demand Junio C Hamano
@ 2013-04-14 6:04 ` Junio C Hamano
2013-04-14 15:28 ` Jeff King
2013-04-14 6:04 ` [PATCH 3/3] commit-slab: introduce a macro to define a slab for new type Junio C Hamano
2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2013-04-14 6:04 UTC (permalink / raw)
To: git; +Cc: Jeff King
Instead of using a single "slab" and reallocating it as we find that
we need to deal with commits with larger values of commit->index, make
a "slab" an array of many "slab_piece"s. Each access may need two
levels of indirection, but we only need to reallocate the first-level
array of pointers when we have to grow the table this way.
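The two-level index arithmetic at the heart of this change can be
sketched in isolation (piece size shrunk to 4 slots for illustration;
the patch sizes pieces at roughly 512kB):

```c
#include <assert.h>
#include <stdlib.h>

struct piece_slab {
	size_t piece_size;   /* slots per piece */
	size_t piece_count;  /* length of the first-level pointer array */
	int **piece;         /* lazily allocated pieces */
};

static int *piece_slab_at(struct piece_slab *s, size_t index)
{
	size_t nth_piece = index / s->piece_size;
	size_t nth_slot = index % s->piece_size;

	if (s->piece_count <= nth_piece) {
		size_t i;
		/* grow only the small pointer array; pieces never move */
		s->piece = realloc(s->piece,
				   (nth_piece + 1) * sizeof(*s->piece));
		for (i = s->piece_count; i <= nth_piece; i++)
			s->piece[i] = NULL;
		s->piece_count = nth_piece + 1;
	}
	if (!s->piece[nth_piece])
		s->piece[nth_piece] = calloc(s->piece_size,
					     sizeof(**s->piece));
	return &s->piece[nth_piece][nth_slot];
}
```

Since existing pieces are never reallocated, a pointer returned by the
accessor stays valid until the slab is cleared.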
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
commit.c | 62 ++++++++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 42 insertions(+), 20 deletions(-)
diff --git a/commit.c b/commit.c
index 9365e3b..4c05b39 100644
--- a/commit.c
+++ b/commit.c
@@ -501,34 +501,56 @@ struct commit *pop_commit(struct commit_list **stack)
return item;
}
+struct commit_slab_piece {
+ int buf;
+};
+
struct commit_slab {
- int *buf;
- int alloc;
+ int piece_size;
+ int piece_count;
+ struct commit_slab_piece **piece;
};
static void slab_init(struct commit_slab *s)
{
- memset(s, 0, sizeof(*s));
+ /* allocate ~512kB at once, allowing for malloc overhead */
+ int size = (512*1024-32) / sizeof(struct commit_slab_piece);
+
+ s->piece_size = size;
+ s->piece_count = 0;
+ s->piece = NULL;
}
static void slab_clear(struct commit_slab *s)
{
- free(s->buf);
- slab_init(s);
+ int i;
+
+ for (i = 0; i < s->piece_count; i++)
+ free(s->piece[i]);
+ s->piece_count = 0;
+ free(s->piece);
+ s->piece = NULL;
}
-static inline int *slab_at(struct commit_slab *s, const struct commit *c)
+static inline struct commit_slab_piece *slab_at(struct commit_slab *s,
+ const struct commit *c)
{
- if (s->alloc <= c->index) {
- int new_alloc = alloc_nr(s->alloc);
- if (new_alloc <= c->index)
- new_alloc = c->index + 1;
-
- s->buf = xrealloc(s->buf, new_alloc * sizeof(*s->buf));
- memset(s->buf + s->alloc, 0, new_alloc - s->alloc);
- s->alloc = new_alloc;
+ int nth_piece, nth_slot;
+
+ nth_piece = c->index / s->piece_size;
+ nth_slot = c->index % s->piece_size;
+
+ if (s->piece_count <= nth_piece) {
+ int i;
+
+ s->piece = xrealloc(s->piece, (nth_piece + 1) * sizeof(s->piece));
+ for (i = s->piece_count; i <= nth_piece; i++)
+ s->piece[i] = NULL;
+ s->piece_count = nth_piece + 1;
}
- return s->buf + c->index;
+ if (!s->piece[nth_piece])
+ s->piece[nth_piece] = xcalloc(s->piece_size, sizeof(**s->piece));
+ return &s->piece[nth_piece][nth_slot];
}
/*
@@ -550,7 +572,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
/* Mark them and clear the indegree */
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- *slab_at(&indegree, commit) = 1;
+ slab_at(&indegree, commit)->buf = 1;
}
/* update the indegree */
@@ -558,7 +580,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
struct commit_list * parents = next->item->parents;
while (parents) {
struct commit *parent = parents->item;
- int *pi = slab_at(&indegree, parent);
+ int *pi = &slab_at(&indegree, parent)->buf;
if (*pi)
(*pi)++;
@@ -578,7 +600,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- if (*slab_at(&indegree, commit) == 1)
+ if (slab_at(&indegree, commit)->buf == 1)
insert = &commit_list_insert(commit, insert)->next;
}
@@ -599,7 +621,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
commit = work_item->item;
for (parents = commit->parents; parents ; parents = parents->next) {
struct commit *parent = parents->item;
- int *pi = slab_at(&indegree, parent);
+ int *pi = &slab_at(&indegree, parent)->buf;
if (!*pi)
continue;
@@ -620,7 +642,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
* work_item is a commit all of whose children
* have already been emitted. we can emit it now.
*/
- *slab_at(&indegree, commit) = 0;
+ slab_at(&indegree, commit)->buf = 0;
*pptr = work_item;
pptr = &work_item->next;
}
--
1.8.2.1-514-gf369d36
* [PATCH 3/3] commit-slab: introduce a macro to define a slab for new type
2013-04-14 6:04 [PATCH 0/3] Using commit slab to replace indegree Junio C Hamano
2013-04-14 6:04 ` [PATCH 1/3] commit: allow associating auxiliary info on-demand Junio C Hamano
2013-04-14 6:04 ` [PATCH 2/3] commit-slab: avoid large realloc Junio C Hamano
@ 2013-04-14 6:04 ` Junio C Hamano
2013-04-14 18:41 ` Jeff King
2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2013-04-14 6:04 UTC (permalink / raw)
To: git; +Cc: Jeff King
Introduce a header file with a macro that defines the struct type,
initializer, accessor and cleanup functions needed to manage a commit
slab. Update the "indegree" topological sort facility to use it.
To associate 32 flag bits with each commit,
you can write:
define_commit_slab(flag32, uint32_t);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that,
uint32_t *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32_t for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code at compile time how wide the data
associated with each commit should be can use the "stride" variant.
Suppose you want to give one bit per existing ref and paint commits
down to find which refs are descendants of each commit. You find
out only at runtime that you have 320 refs.
The code can declare a commit slab "struct flagbits"
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs that returns say 320 ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
commit-slab.h | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
commit.c | 69 +++++++------------------------------------
2 files changed, 105 insertions(+), 59 deletions(-)
create mode 100644 commit-slab.h
diff --git a/commit-slab.h b/commit-slab.h
new file mode 100644
index 0000000..8030885
--- /dev/null
+++ b/commit-slab.h
@@ -0,0 +1,95 @@
+#ifndef COMMIT_SLAB_H
+#define COMMIT_SLAB_H
+
+/*
+ * define_commit_slab(slabname, elemtype) creates boilerplate code to define
+ * a new struct (struct slabname) that is used to associate a piece of data
+ * of elemtype to commits, and a few functions to use that struct.
+ *
+ * After including this header file, using:
+ *
+ * define_commit_slab(indegree, int);
+ *
+ * will let you call the following functions:
+ *
+ * - int *indegree_at(struct indegree *, struct commit *);
+ *
+ * This function locates the data associated with the given commit in
+ * the indegree slab, and returns the pointer to it.
+ *
+ * - void init_indegree_with_stride(struct indegree *, unsigned stride);
+ *
+ * Initializes the indegree slab that associates an array of integers
+ * with each commit. 'stride' specifies how big each array is.
+ */
+
+/* allocate ~512kB at once, allowing for malloc overhead */
+#ifndef COMMIT_SLAB_SIZE
+#define COMMIT_SLAB_SIZE (512*1024-32)
+#endif
+
+#define define_commit_slab(slabname, elemtype) \
+ \
+struct slabname { \
+ unsigned slab_size; \
+ unsigned stride; \
+ unsigned slab_count; \
+ elemtype **slab; \
+}; \
+static int stat_ ##slabname## realloc; \
+ \
+static void init_ ##slabname## _with_stride(struct slabname *s, \
+ unsigned stride) \
+{ \
+ unsigned int elem_size; \
+ if (!stride) \
+ stride = 1; \
+ s->stride = stride; \
+ elem_size = sizeof(elemtype) * stride; \
+ s->slab_size = COMMIT_SLAB_SIZE / elem_size; \
+ s->slab_count = 0; \
+ s->slab = NULL; \
+} \
+ \
+static void init_ ##slabname(struct slabname *s) \
+{ \
+ init_ ##slabname## _with_stride(s, 1); \
+} \
+ \
+static void clear_ ##slabname(struct slabname *s) \
+{ \
+ int i; \
+ for (i = 0; i < s->slab_count; i++) \
+ free(s->slab[i]); \
+ s->slab_count = 0; \
+ free(s->slab); \
+ s->slab = NULL; \
+} \
+ \
+static elemtype *slabname## _at(struct slabname *s, \
+ const struct commit *c) \
+{ \
+ int nth_slab, nth_slot, ix; \
+ \
+ ix = c->index * s->stride; \
+ nth_slab = ix / s->slab_size; \
+ nth_slot = ix % s->slab_size; \
+ \
+ if (s->slab_count <= nth_slab) { \
+ int i; \
+ s->slab = xrealloc(s->slab, \
+ (nth_slab + 1) * sizeof(s->slab)); \
+ stat_ ##slabname## realloc++; \
+ for (i = s->slab_count; i <= nth_slab; i++) \
+ s->slab[i] = NULL; \
+ s->slab_count = nth_slab + 1; \
+ } \
+ if (!s->slab[nth_slab]) \
+ s->slab[nth_slab] = xcalloc(s->slab_size, \
+ sizeof(**s->slab)); \
+ return &s->slab[nth_slab][nth_slot]; \
+} \
+ \
+static int stat_ ##slabname## realloc
+
+#endif /* COMMIT_SLAB_H */
diff --git a/commit.c b/commit.c
index 4c05b39..66a6c00 100644
--- a/commit.c
+++ b/commit.c
@@ -8,6 +8,7 @@
#include "notes.h"
#include "gpg-interface.h"
#include "mergesort.h"
+#include "commit-slab.h"
static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
@@ -501,57 +502,7 @@ struct commit *pop_commit(struct commit_list **stack)
return item;
}
-struct commit_slab_piece {
- int buf;
-};
-
-struct commit_slab {
- int piece_size;
- int piece_count;
- struct commit_slab_piece **piece;
-};
-
-static void slab_init(struct commit_slab *s)
-{
- /* allocate ~512kB at once, allowing for malloc overhead */
- int size = (512*1024-32) / sizeof(struct commit_slab_piece);
-
- s->piece_size = size;
- s->piece_count = 0;
- s->piece = NULL;
-}
-
-static void slab_clear(struct commit_slab *s)
-{
- int i;
-
- for (i = 0; i < s->piece_count; i++)
- free(s->piece[i]);
- s->piece_count = 0;
- free(s->piece);
- s->piece = NULL;
-}
-
-static inline struct commit_slab_piece *slab_at(struct commit_slab *s,
- const struct commit *c)
-{
- int nth_piece, nth_slot;
-
- nth_piece = c->index / s->piece_size;
- nth_slot = c->index % s->piece_size;
-
- if (s->piece_count <= nth_piece) {
- int i;
-
- s->piece = xrealloc(s->piece, (nth_piece + 1) * sizeof(s->piece));
- for (i = s->piece_count; i <= nth_piece; i++)
- s->piece[i] = NULL;
- s->piece_count = nth_piece + 1;
- }
- if (!s->piece[nth_piece])
- s->piece[nth_piece] = xcalloc(s->piece_size, sizeof(**s->piece));
- return &s->piece[nth_piece][nth_slot];
-}
+define_commit_slab(indegree_slab, int);
/*
* Performs an in-place topological sort on the list supplied.
@@ -561,18 +512,18 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
struct commit_list *next, *orig = *list;
struct commit_list *work, **insert;
struct commit_list **pptr;
- struct commit_slab indegree;
+ struct indegree_slab indegree;
if (!orig)
return;
*list = NULL;
- slab_init(&indegree);
+ init_indegree_slab(&indegree);
/* Mark them and clear the indegree */
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- slab_at(&indegree, commit)->buf = 1;
+ *(indegree_slab_at(&indegree, commit)) = 1;
}
/* update the indegree */
@@ -580,7 +531,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
struct commit_list * parents = next->item->parents;
while (parents) {
struct commit *parent = parents->item;
- int *pi = &slab_at(&indegree, parent)->buf;
+ int *pi = indegree_slab_at(&indegree, parent);
if (*pi)
(*pi)++;
@@ -600,7 +551,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- if (slab_at(&indegree, commit)->buf == 1)
+ if (*(indegree_slab_at(&indegree, commit)) == 1)
insert = &commit_list_insert(commit, insert)->next;
}
@@ -621,7 +572,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
commit = work_item->item;
for (parents = commit->parents; parents ; parents = parents->next) {
struct commit *parent = parents->item;
- int *pi = &slab_at(&indegree, parent)->buf;
+ int *pi = indegree_slab_at(&indegree, parent);
if (!*pi)
continue;
@@ -642,12 +593,12 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
* work_item is a commit all of whose children
* have already been emitted. we can emit it now.
*/
- slab_at(&indegree, commit)->buf = 0;
+ *(indegree_slab_at(&indegree, commit)) = 0;
*pptr = work_item;
pptr = &work_item->next;
}
- slab_clear(&indegree);
+ clear_indegree_slab(&indegree);
}
/* merge-base stuff */
--
1.8.2.1-514-gf369d36
* Re: [PATCH 1/3] commit: allow associating auxiliary info on-demand
2013-04-14 6:04 ` [PATCH 1/3] commit: allow associating auxiliary info on-demand Junio C Hamano
@ 2013-04-14 15:12 ` Jeff King
2013-04-14 19:01 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2013-04-14 15:12 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sat, Apr 13, 2013 at 11:04:47PM -0700, Junio C Hamano wrote:
> From: Jeff King <peff@peff.net>
>
> The "indegree" field in the commit object is only used while sorting
> a list of commits in topological order, and wastes memory otherwise.
>
> We would prefer to shrink the size of individual commit objects, of
> which we may have to hold thousands in-core. We could eject the
> "indegree" field from the commit object and represent it as a dynamic
> table based on the decoration infrastructure, but the decoration is
> meant for sparse annotation and is not a good match.
>
> Instead, let's try a different approach.
>
> - Assign an integer (commit->index) to each commit we keep in-core
> (reuse the space of "indegree" field for it);
>
> - When running the topological sort, allocate an array of integers
> in bulk (called "slab"), use the commit->index as an index into
> this array, and store the "indegree" information there.
>
> This does _not_ reduce the memory footprint of a commit object, but
> the commit->index can be used as the index to dynamically associate
> commits with other kinds of information as needed.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thanks, this writeup makes sense to me. Obviously,
Signed-off-by: Jeff King <peff@peff.net>
I had planned to clean it up into a more general form, as you did in
patch 3, before submitting it, but doing it like this with the cleanup
on top is fine with me (and it makes the attribution of the work much
clearer).
The rest of this email ended up long and rambly, and is kind of a dump
of the pending thoughts I had on the slab approach. Don't take it as "do
not do this patch", but rather "here are some areas to explore on top".
One thing you did gloss over a bit is "the decoration is meant for
sparse annotation". I do wonder what the performance would look like for
implementing --topo-order fully via "struct decoration". Given that our
hash function is a direct memory access of object->sha1, and that with a
reasonable load factor in the table, most lookups should not hit any
collision, it is not too expensive to do a lookup. I think the biggest
cost would be that we have to do a full hashcmp() for each access to
actually _check_ the collision. So it probably is noticeably slower due
to that, but I didn't actually measure it.
Assuming the slab technique is faster, I suspect there are some "struct
decoration" cases that should be switched. Unfortunately the most
obvious one to me would be the mark storage in fast-export, but that is
used for object types other than commit. And I don't think we want to
move the slab index into "struct object", for two reasons:
1. It bloats the size of blob and tree objects, which we have more of
than commits (especially in something like "rev-list --objects
--all").
2. Your commit indices become less dense, so you have to use a bigger
slab, which is wasteful and will have worse memory cache performance.
We could maybe mitigate that by keeping a per-type index counter,
but that complicates the slab lookup (it would have to dereference
based on type).
I also note that we still have commit->util, which serves a similar
purpose to the slab index, and is often used when we are attaching data
to a lot of commits (and in fact, --topo-order could probably just be
implemented via commit->util).
Comparing a slab index versus a util pointer, I think the differences
should be:
1. with slab, you have an extra level of indirection ("x + c->index"
rather than "c->util"), which might be measurably slower.
2. with slab, you do not have to worry about conflicting with other
users of "struct commit" in the same program
3. with util, you can squeeze values smaller than a pointer into
"struct commit"; technically you can do this with the slab index,
too, but that would negate (2) above. However, I don't think we
even use "util" that way anywhere currently.
4. util pointers can point multiple commits to the same chunk of
memory. A slab index can do that, too, but you have an extra level
of indirection (you store a slab of pointers to items, some of
which are shared).
5. with a util pointer, we typically point to an malloc'd chunk of
memory. With a slab, you avoid per-commit allocation in favor of
allocating a big chunk of slab at once. However, you should be able
to do the same thing with a util pointer (i.e., allocate from a
pool and just point into it).
So I think the util pointer is strictly better, except for point (2). And
the main cost is point (1), the extra indirection. So if we can measure
(1) and decide it isn't a big cost, the maintainability improvement for
(2) is probably worth it, and we can rip out the util pointer entirely,
saving us some extra space in "struct commit".
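The two access patterns being compared in points (1) and (2) could be
sketched like this (hypothetical helper names; the field layout is a
simplified stand-in for git's "struct commit"):

```c
#include <assert.h>

struct commit { void *util; unsigned index; };

/* Point (1): slab access is base pointer + per-commit index,
 * one extra indirection compared to a direct field read. */
static int *slab_lookup(int *slab, const struct commit *c)
{
	return slab + c->index;
}

/* Point (2): util access is direct, but the single field must be
 * shared by every user of "struct commit" in the same program. */
static int *util_lookup(struct commit *c)
{
	return (int *)c->util;
}
```

Multiple slabs can coexist for one commit, which is exactly what the
single util pointer cannot offer without coordination.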
-Peff
* Re: [PATCH 2/3] commit-slab: avoid large realloc
2013-04-14 6:04 ` [PATCH 2/3] commit-slab: avoid large realloc Junio C Hamano
@ 2013-04-14 15:28 ` Jeff King
2013-04-14 18:51 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2013-04-14 15:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sat, Apr 13, 2013 at 11:04:48PM -0700, Junio C Hamano wrote:
> Instead of using a single "slab" and reallocating it as we find that
> we need to deal with commits with larger values of commit->index, make
> a "slab" an array of many "slab_piece"s. Each access may need two
> levels of indirection, but we only need to reallocate the first-level
> array of pointers when we have to grow the table this way.
I don't know if shrinking the size of the realloc is all that big a
deal. We are doubling, so the allocation cost is already amortized
constant time.
Whereas the cost of the extra level of indirection is paid for every
lookup. So for some workloads, I can imagine this actually being slower
(I haven't yet measured it for --topo-order, though).
An interesting side effect of this patch is that the pointers returned
by slab_at() are stable until slab_clear(). It doesn't matter for
--topo-order, but if, for example, you had a recursive function, you
would not have to worry about invalidation in sub-functions. I.e.,
before this patch, this would be wrong:
static struct commit_slab generation_cache;
int commit_generation(struct commit *c)
{
int *gp = slab_at(&generation_cache, c);
if (!*gp) {
int max = 0;
struct commit_list *p;
parse_commit(c);
for (p = c->parents; p; p = p->next) {
int g = commit_generation(p->item);
if (max < g)
max = g;
}
*gp = max + 1;
}
return *gp;
}
because the recursive calls might invalidate the "gp" pointer (you would
have to call slab_at repeatedly instead). Whereas with your patch, it's
fine.
> struct commit_slab {
> - int *buf;
> - int alloc;
> + int piece_size;
> + int piece_count;
> + struct commit_slab_piece **piece;
> };
Is there a reason to make piece_size a struct member? It must be
constant after any pieces are allocated. Is the intent that callers
could do:
slab_init(&s);
/* I know ahead of time we are going to need N of these. */
s.piece_size = n;
? I think that is OK (though they do not need slab_* in that case at
all), but we should probably have a warning in the struct that
piece_size should never be touched except in that circumstance.
-Peff
* Re: [PATCH 3/3] commit-slab: introduce a macro to define a slab for new type
2013-04-14 6:04 ` [PATCH 3/3] commit-slab: introduce a macro to define a slab for new type Junio C Hamano
@ 2013-04-14 18:41 ` Jeff King
2013-04-15 1:32 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2013-04-14 18:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sat, Apr 13, 2013 at 11:04:49PM -0700, Junio C Hamano wrote:
> Suppose you want to give one bit per existing ref and paint commits
> down to find which refs are descendants of each commit. You find
> that you have 320 refs only at runtime.
>
> The code can declare a commit slab "struct flagbits"
>
> define_commit_slab(flagbits, unsigned char);
> struct flagbits flags;
>
> and initialize it by:
>
> nrefs = ... count number of refs that returns say 320 ...
> init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
>
> so that
>
> unsigned char *fp = flagbits_at(&flags, commit);
>
> will return a pointer pointing at an array of 40 "unsigned char"s
> associated with the commit.
Thanks, I was thinking originally that we would want to break it down
into "unsigned long" or something, but there is probably no real
performance advantage to doing that over bytes.
I'd probably further wrap it with a flagbit_set and flagbit_tst to wrap
the "figure out which byte, then which bit of that byte" logic, but that
would be a wrapper around flagbits_at, anyway. It can come later.
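Such wrappers do not exist in the patch; a rough sketch of what they
might look like, operating on the per-commit byte array that a
flagbits_at() style accessor would return:

```c
#include <assert.h>
#include <string.h>

/* "fp" points at (nrefs + 7) / 8 bytes of per-commit flag storage;
 * "nth" selects one ref's bit within that array. */
static void flagbit_set(unsigned char *fp, unsigned nth)
{
	fp[nth / 8] |= 1u << (nth % 8);
}

static int flagbit_tst(const unsigned char *fp, unsigned nth)
{
	return (fp[nth / 8] >> (nth % 8)) & 1;
}
```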
> +static elemtype *slabname## _at(struct slabname *s, \
> + const struct commit *c) \
> +{ \
> + int nth_slab, nth_slot, ix; \
> + \
> + ix = c->index * s->stride; \
> + nth_slab = ix / s->slab_size; \
> + nth_slot = ix % s->slab_size; \
> + \
> + if (s->slab_count <= nth_slab) { \
> + int i; \
> + s->slab = xrealloc(s->slab, \
> + (nth_slab + 1) * sizeof(s->slab)); \
> + stat_ ##slabname## realloc++; \
> + for (i = s->slab_count; i <= nth_slab; i++) \
> + s->slab[i] = NULL; \
> + s->slab_count = nth_slab + 1; \
> + } \
> + if (!s->slab[nth_slab]) \
> + s->slab[nth_slab] = xcalloc(s->slab_size, \
> + sizeof(**s->slab)); \
> + return &s->slab[nth_slab][nth_slot]; \
> +} \
We'd probably want the hot path of this (returning the actual pointer)
to be inline, but not necessarily the parts about growing, which should
trigger a lot less. It may make sense to split the conditional bodies
out into a sub-function. And do we want to mark it with "inline"?
-Peff
* Re: [PATCH 2/3] commit-slab: avoid large realloc
2013-04-14 15:28 ` Jeff King
@ 2013-04-14 18:51 ` Junio C Hamano
2013-04-14 19:19 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2013-04-14 18:51 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> On Sat, Apr 13, 2013 at 11:04:48PM -0700, Junio C Hamano wrote:
>
>> Instead of using a single "slab" and reallocating it as we find that
>> we need to deal with commits with larger values of commit->index, make
>> a "slab" an array of many "slab_piece"s. Each access may need two
>> levels of indirection, but we only need to reallocate the first-level
>> array of pointers when we have to grow the table this way.
>
> I don't know if shrinking the size of the realloc is all that big a
> deal. We are doubling, so the allocation cost is already amortized
> constant time.
I was more disturbed about copying the actual bytes. One of the
envisioned uses of the mechanism is to give us an unbounded number of
flag bits to paint the history, and this could later be used to
store larger structures per commit.
Also, the "you can now take a pointer" property you illustrated (but
which I snipped from here) is a good point.
>> struct commit_slab {
>> - int *buf;
>> - int alloc;
>> + int piece_size;
>> + int piece_count;
>> + struct commit_slab_piece **piece;
>> };
>
> Is there a reason to make piece_size a struct member? It must be
> constant after any pieces are allocated. Is the intent that callers
> could do:
>
> slab_init(&s);
> /* I know ahead of time we are going to need N of these. */
> s.piece_size = n;
The piece_size (later slab_size) is to hold the number of commits
each slice (i.e. the piece of memory s->slab[nth_slab] points at)
handles.
* Re: [PATCH 1/3] commit: allow associating auxiliary info on-demand
2013-04-14 15:12 ` Jeff King
@ 2013-04-14 19:01 ` Junio C Hamano
0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2013-04-14 19:01 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> Thanks, this writeup makes sense to me. Obviously,
>
> Signed-off-by: Jeff King <peff@peff.net>
>
> Comparing a slab index versus a util pointer, I think the differences
> should be:
> ...
> So I think the util pointer is strictly better, except for point (2). And
> the main cost is point (1), the extra indirection. So if we can measure
> (1) and decide it isn't a big cost, the maintainability improvement for
> (2) is probably worth it, and we can rip out the util pointer entirely,
> saving us some extra space in "struct commit".
I am glad you brought up "util". In an earlier round of the indegree
thing, I think it was discussed that casting the indegree to
(void *) and storing it there was a possibility. Because sorting
is so generic a thing that everybody may want it in their codepath,
and some callers may already have something they want to keep in
util, we ended up adding a separate field so as not to interfere
with the callers' use of the util field.
In the simpler days, a single util field used with careful
coordination among various codepaths may have been manageable, but I
tend to think it will become more and more cumbersome to extend the
system correctly (the same can be said for the in-object flags). We
should encourage the users of the util field to migrate to either
slab or decorate in the longer term.
A criterion to choose between the two is probably the density. The
slab mechanism is for annotations that almost everybody involved in
the processing gets, while decorate is more suited for sparse ones.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] commit-slab: avoid large realloc
2013-04-14 18:51 ` Junio C Hamano
@ 2013-04-14 19:19 ` Jeff King
2013-04-15 1:29 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2013-04-14 19:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, Apr 14, 2013 at 11:51:40AM -0700, Junio C Hamano wrote:
> > I don't know if shrinking the size of the realloc is all that big a
> > deal. We are doubling, so the allocation cost is already amortized
> > constant time.
>
> I was more disturbed about copying the actual bytes. One of the
> envisioned uses of the mechanism is to give us an unbounded number
> of flag bits to paint the history, and it could also later be used
> to store larger structures per commit.
Right; your solution does lower the constant factor. I just don't know
that it matters all that much, since the allocations are so few (i.e.,
log2(N) to get to N items). And you are hurting the "hot path" of every
access of the flags to do it (by introducing the extra division and
indirection). Probably the extra work on lookup doesn't matter on modern
processors due to pipelining. I'd like to measure a few cases to be
sure, but I probably won't get to it today.
> Also, the "you can now take a pointer" benefit you illustrated (but
> I snipped from here) is a good point.
Yeah, assuming the timings are equal, I'd take your solution on that
principle alone.
> >> struct commit_slab {
> >> - int *buf;
> >> - int alloc;
> >> + int piece_size;
> >> + int piece_count;
> >> + struct commit_slab_piece **piece;
> >> };
> >
> > Is there a reason to make piece_size a struct member? It must be
> > constant after any pieces are allocated. Is the intent that callers
> > could do:
> >
> > slab_init(&s);
> > /* I know ahead of time we are going to need N of these. */
> > s.piece_size = n;
>
> The piece_size (later slab_size) is to hold the number of commits
> each slice (i.e. the piece of memory s->slab[nth_slab] points at)
> handles.
Yes, but isn't that a constant:
(512*1024-32) / sizeof(struct commit_slab_piece)
Leaving it as such lets the compiler optimize better, and is safe from
anybody changing it at runtime. But I think the answer to my question is
"yes, that would be the best thing for patch 2, but patch 3 will allow a
run-time stride parameter anyway". Correct?
-Peff
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] commit-slab: avoid large realloc
2013-04-14 19:19 ` Jeff King
@ 2013-04-15 1:29 ` Junio C Hamano
0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2013-04-15 1:29 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> Yes, but isn't that a constant:
>
> (512*1024-32) / sizeof(struct commit_slab_piece)
>
> Leaving it as such lets the compiler optimize better, and is safe from
> anybody changing it at runtime. But I think the answer to my question is
> "yes, that would be the best thing for patch 2, but patch 3 will allow a
> run-time stride parameter anyway". Correct?
Yup.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] commit-slab: introduce a macro to define a slab for new type
2013-04-14 18:41 ` Jeff King
@ 2013-04-15 1:32 ` Junio C Hamano
0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2013-04-15 1:32 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> On Sat, Apr 13, 2013 at 11:04:49PM -0700, Junio C Hamano wrote:
>
>> Suppose you want to give one bit per existing ref and paint commits
>> down to find which refs are descendants of each commit. You find
>> that you have 320 refs only at runtime.
>>
>> The code can declare a commit slab "struct flagbits"
>>
>> define_commit_slab(flagbits, unsigned char);
>> struct flagbits flags;
>>
>> and initialize it by:
>>
>> nrefs = ... count number of refs that returns say 320 ...
>> init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
>>
>> so that
>>
>> unsigned char *fp = flagbits_at(&flags, commit);
>>
>> will return a pointer pointing at an array of 40 "unsigned char"s
>> associated with the commit.
>
> Thanks, I was thinking originally that we would want to break it down
> into "unsigned long" or something, but there is probably no real
> performance advantage to doing that over bytes.
The 320 came from writing "an array of 5 unsigned long long" in the
first draft ;-)
> I'd probably further wrap it with a flagbit_set and flagbit_tst to wrap
> the "figure out which byte, then which bit of that byte" logic, but that
> would be a wrapper around flagbits_at, anyway. It can come later.
Exactly. At that point, it is not about "what you could use commit
slab for" but is about "how you would implement unbounded set of
flag bits".
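A minimal sketch of the flagbit_set/flagbit_tst wrappers Peff suggests, assuming flagbits_at() has already returned the per-commit byte array (simulated here with a plain buffer); the names come from his message but the bodies are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrappers over the byte array flagbits_at() would
 * return: figure out which byte, then which bit of that byte. */
static void flagbit_set(unsigned char *fp, int nth)
{
	fp[nth / 8] |= 1u << (nth % 8);
}

static int flagbit_tst(const unsigned char *fp, int nth)
{
	return (fp[nth / 8] >> (nth % 8)) & 1;
}
```

For the 320-ref example in the thread, each commit's array is (320 + 7) / 8 = 40 bytes, and bits 0 through 319 address one ref each.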
> We'd probably want the hot path of this (returning the actual pointer)
> to be inline, but not necessarily the parts about growing,...
Yeah, this was just a technology demonstration, like your original.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-04-15 1:32 UTC | newest]
Thread overview: 12+ messages
2013-04-14 6:04 [PATCH 0/3] Using commit slab to replace indegree Junio C Hamano
2013-04-14 6:04 ` [PATCH 1/3] commit: allow associating auxiliary info on-demand Junio C Hamano
2013-04-14 15:12 ` Jeff King
2013-04-14 19:01 ` Junio C Hamano
2013-04-14 6:04 ` [PATCH 2/3] commit-slab: avoid large realloc Junio C Hamano
2013-04-14 15:28 ` Jeff King
2013-04-14 18:51 ` Junio C Hamano
2013-04-14 19:19 ` Jeff King
2013-04-15 1:29 ` Junio C Hamano
2013-04-14 6:04 ` [PATCH 3/3] commit-slab: introduce a macro to define a slab for new type Junio C Hamano
2013-04-14 18:41 ` Jeff King
2013-04-15 1:32 ` Junio C Hamano