* [PATCH 2/2] decorate: add "clear_decoration()"
@ 2013-04-08 6:14 Junio C Hamano
2013-04-08 21:09 ` Jeff King
0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2013-04-08 6:14 UTC (permalink / raw)
To: git
So far, all the users of the decoration API used decoration that
only grows and discarded at the end of the program execution.
Introduce for_each_decoration() that lets the caller iterate over
all defined decorations and use it to implement clear_decoration()
function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Documentation/technical/api-decorate.txt | 22 ++++++++++++++++++++++
decorate.c | 19 +++++++++++++++++++
decorate.h | 4 ++++
3 files changed, 45 insertions(+)
diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
index 0cc2b64..600366d 100644
--- a/Documentation/technical/api-decorate.txt
+++ b/Documentation/technical/api-decorate.txt
@@ -35,3 +35,25 @@ the `obj`. You cannot tell if `obj` does not appear in the hashtable
at all, or if `obj` has decoration whose value is NULL, so if you want
to use the decoration API for "Did I see this object?" hashtable,
use decoration value that is _not_ NULL.
+
+Iterating
+---------
+
+`for_each_decoration(struct decoration *deco, for_each_decoration_fn fn)`
+iterates over all the entries in the hashtable, and calls `fn` on each
+entry. The `fn` callback function takes a single `struct object_decoration`
+as its parameter, that has `base` field that points at the `obj`
+given to an earlier call to `add_decoration` and `decoration` field
+that remembers the `decoration`.
+
+Clearing
+--------
+
+`clear_decoration(struct decoration *deco, for_each_decoration_fn fn)`,
+when `fn` is not NULL, iterates over all the entries and calls the
+callback function `fn` using `for_each_decoration`, and then frees
+the memory used for the hashtable but not the `struct decoration` itself.
+
+The callback function can be used to release the resource used by
+the `decoration` the earlier `add_decoration` registered to the
+hashtable.
diff --git a/decorate.c b/decorate.c
index 2f8a63e..3e15358 100644
--- a/decorate.c
+++ b/decorate.c
@@ -86,3 +86,22 @@ void *lookup_decoration(struct decoration *n, const struct object *obj)
j = 0;
}
}
+
+void for_each_decoration(struct decoration *n, for_each_decoration_fn fn)
+{
+ int i;
+
+ for (i = 0; i < n->size; i++) {
+ struct object_decoration *entry = &n->hash[i];
+ if (!entry->base)
+ continue;
+ fn(entry);
+ }
+}
+
+void clear_decoration(struct decoration *n, for_each_decoration_fn fn)
+{
+ if (fn)
+ for_each_decoration(n, fn);
+ free(n->hash);
+}
diff --git a/decorate.h b/decorate.h
index e732804..4a0d37e 100644
--- a/decorate.h
+++ b/decorate.h
@@ -15,4 +15,8 @@ struct decoration {
extern void *add_decoration(struct decoration *n, const struct object *obj, void *decoration);
extern void *lookup_decoration(struct decoration *n, const struct object *obj);
+typedef void (*for_each_decoration_fn)(struct object_decoration *);
+extern void for_each_decoration(struct decoration *, for_each_decoration_fn);
+extern void clear_decoration(struct decoration *, for_each_decoration_fn);
+
#endif
--
1.8.2.1-450-gd047976
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] decorate: add "clear_decoration()"
2013-04-08 6:14 [PATCH 2/2] decorate: add "clear_decoration()" Junio C Hamano
@ 2013-04-08 21:09 ` Jeff King
2013-04-08 21:22 ` Junio C Hamano
2013-04-08 23:01 ` [PATCH 0/3] Using a bit more decoration Junio C Hamano
0 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2013-04-08 21:09 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, Apr 07, 2013 at 11:14:17PM -0700, Junio C Hamano wrote:
> So far, all the users of the decoration API used decoration that
> only grows and discarded at the end of the program execution.
>
> Introduce for_each_decoration() that lets the caller iterate over
> all defined decorations and use it to implement clear_decoration()
> function.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both this and the first patch look good to me. I had documented
api-decorate long ago as part of the cache-metadata-on-disk series (it
was built on generic maps, which I also refactored decorate on top of).
But I didn't see anything in my version that was missing from yours (in
fact, it was much sparser, because it just referred to the generic map
api, which we never merged).
Out of curiosity: where is the patch 3 that presumably led you to
wanting these?
-Peff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] decorate: add "clear_decoration()"
2013-04-08 21:09 ` Jeff King
@ 2013-04-08 21:22 ` Junio C Hamano
2013-04-08 23:01 ` [PATCH 0/3] Using a bit more decoration Junio C Hamano
1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2013-04-08 21:22 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> On Sun, Apr 07, 2013 at 11:14:17PM -0700, Junio C Hamano wrote:
>
>> So far, all the users of the decoration API used decoration that
>> only grows and discarded at the end of the program execution.
>>
>> Introduce for_each_decoration() that lets the caller iterate over
>> all defined decorations and use it to implement clear_decoration()
>> function.
>>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>
> Both this and the first patch look good to me. I had documented
> api-decorate long ago as part of the cache-metadata-on-disk series (it
> was built on generic maps, which I also refactored decorate on top of).
> But I didn't see anything in my version that was missing from yours (in
> fact, it was much sparser, because it just referred to the generic map
> api, which we never merged).
>
> Out of curiosity: where is the patch 3 that presumably led you to
> wanting these?
It is nowhere near ready to be published, as I am not finding any
time for coding today X-<.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/3] Using a bit more decoration
2013-04-08 21:09 ` Jeff King
2013-04-08 21:22 ` Junio C Hamano
@ 2013-04-08 23:01 ` Junio C Hamano
2013-04-08 23:01 ` [PATCH 1/3] commit: shrink "indegree" field Junio C Hamano
` (3 more replies)
1 sibling, 4 replies; 15+ messages in thread
From: Junio C Hamano @ 2013-04-08 23:01 UTC (permalink / raw)
To: git; +Cc: Jeff King
This comes on top of the two "decorate" patches that introduced the
clear_decoration() function.
I am not sure if pre-parsing and sharing the encoding values in-core
would affect performance very much, but now we have 2 bytes (or 6
bytes, if you are on 64-bit archs) free to use in "struct commit"
for later use (e.g. extra object flags that are relevant only for
commit objects, without having to widen "struct object").
Junio C Hamano (3):
commit: shrink "indegree" field
commit: rename parse_commit_date()
commit: add get_commit_encoding()
commit.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
commit.h | 5 ++-
pretty.c | 33 +--------------
sequencer.c | 23 +---------
4 files changed, 122 insertions(+), 77 deletions(-)
--
1.8.2.1-450-gd047976
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] commit: shrink "indegree" field
2013-04-08 23:01 ` [PATCH 0/3] Using a bit more decoration Junio C Hamano
@ 2013-04-08 23:01 ` Junio C Hamano
2013-04-09 3:52 ` Jeff King
2013-04-08 23:01 ` [PATCH 2/3] commit: rename parse_commit_date() Junio C Hamano
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2013-04-08 23:01 UTC (permalink / raw)
To: git; +Cc: Jeff King
The "indegree" field in the commit object is used only while a
commit list is sorted using the topo_order sorting. The idea is to
initially store the number of direct parents and the commit itself
there, and start emitting the commits with indegree 1 (i.e. leaves
that do not have any commit as the parent), and every time a commit
is emitted, decrement its own count (to drop it to 0 to clear the
field) and the count in its parents (so eventually a commit whose
children are all emitted will see its count dropped to 1, becoming
eligible to be emitted) by one.
We used to allocate a full integer for this, but in real life, a
commit with thousands of direct children are rare. Shrink it to an
unsigned char to represent a commit with 254 children or less
without any extra overhead, and spilling the overflow to a separate
decoration hash.
This does not save any bytes from the commit object structure due to
alignment, but would give us 3 or 7 bytes to store other information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
commit.c | 46 +++++++++++++++++++++++++++++++++++++++-------
commit.h | 2 +-
2 files changed, 40 insertions(+), 8 deletions(-)
diff --git a/commit.c b/commit.c
index 516a4ff..9d7e81b 100644
--- a/commit.c
+++ b/commit.c
@@ -506,6 +506,30 @@ struct commit *pop_commit(struct commit_list **stack)
return item;
}
+#define QUICK_INDEGREE_LIMIT 255
+
+static inline void set_indegree(struct decoration *indegree,
+ struct commit *commit, intptr_t value)
+{
+ if (QUICK_INDEGREE_LIMIT <= value) {
+ commit->indegree = QUICK_INDEGREE_LIMIT;
+ add_decoration(indegree, &commit->object, (void *)value);
+ } else {
+ commit->indegree = value;
+ }
+}
+
+static inline intptr_t get_indegree(struct decoration *indegree,
+ struct commit *commit)
+{
+ if (commit->indegree < QUICK_INDEGREE_LIMIT)
+ return commit->indegree;
+ else {
+ void *count = lookup_decoration(indegree, &commit->object);
+ return (int) count;
+ }
+}
+
/*
* Performs an in-place topological sort on the list supplied.
*/
@@ -514,15 +538,18 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
struct commit_list *next, *orig = *list;
struct commit_list *work, **insert;
struct commit_list **pptr;
+ struct decoration id_overflow;
+ intptr_t count;
if (!orig)
return;
*list = NULL;
/* Mark them and clear the indegree */
+ memset(&id_overflow, 0, sizeof(id_overflow));
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- commit->indegree = 1;
+ set_indegree(&id_overflow, commit, 1);
}
/* update the indegree */
@@ -531,8 +558,9 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
while (parents) {
struct commit *parent = parents->item;
- if (parent->indegree)
- parent->indegree++;
+ count = get_indegree(&id_overflow, parent);
+ if (count)
+ set_indegree(&id_overflow, parent, count + 1);
parents = parents->next;
}
}
@@ -549,7 +577,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- if (commit->indegree == 1)
+ if (get_indegree(&id_overflow, commit) == 1)
insert = &commit_list_insert(commit, insert)->next;
}
@@ -571,7 +599,8 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
for (parents = commit->parents; parents ; parents = parents->next) {
struct commit *parent = parents->item;
- if (!parent->indegree)
+ count = get_indegree(&id_overflow, parent);
+ if (!count)
continue;
/*
@@ -579,7 +608,9 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
* when all their children have been emitted thereby
* guaranteeing topological order.
*/
- if (--parent->indegree == 1) {
+ count--;
+ set_indegree(&id_overflow, parent, count);
+ if (count == 1) {
if (!lifo)
commit_list_insert_by_date(parent, &work);
else
@@ -590,10 +621,11 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
* work_item is a commit all of whose children
* have already been emitted. we can emit it now.
*/
- commit->indegree = 0;
+ set_indegree(&id_overflow, commit, 0);
*pptr = work_item;
pptr = &work_item->next;
}
+ clear_decoration(&id_overflow, NULL);
}
/* merge-base stuff */
diff --git a/commit.h b/commit.h
index 87b4b6c..771eeae 100644
--- a/commit.h
+++ b/commit.h
@@ -15,7 +15,7 @@ struct commit_list {
struct commit {
struct object object;
void *util;
- unsigned int indegree;
+ unsigned int indegree:8; /* see QUICK_INDEGREE_LIMIT in commit.c */
unsigned long date;
struct commit_list *parents;
struct tree *tree;
--
1.8.2.1-450-gd047976
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] commit: rename parse_commit_date()
2013-04-08 23:01 ` [PATCH 0/3] Using a bit more decoration Junio C Hamano
2013-04-08 23:01 ` [PATCH 1/3] commit: shrink "indegree" field Junio C Hamano
@ 2013-04-08 23:01 ` Junio C Hamano
2013-04-08 23:01 ` [PATCH 3/3] commit: add get_commit_encoding() Junio C Hamano
2013-04-09 3:51 ` [PATCH 0/3] Using a bit more decoration Jeff King
3 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2013-04-08 23:01 UTC (permalink / raw)
To: git; +Cc: Jeff King
This function does a lot more than parsing the committer date out of
a commit object buffer. After its sole caller parses one "tree",
and 0 or more "parent", it makes sure the next one is "author" (and
skips it), makes sure "committer" follows (and skips the committer
identity), and parses the date field. Each of these fields must be
on its own line (no header folding is allowed).
Rename it to parse_commit_standard_header(), and change the function
signature to accept "struct commit *" to be updated as a parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
commit.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/commit.c b/commit.c
index 9d7e81b..50a9827 100644
--- a/commit.c
+++ b/commit.c
@@ -78,31 +78,33 @@ struct commit *lookup_commit_reference_by_name(const char *name)
return commit;
}
-static unsigned long parse_commit_date(const char *buf, const char *tail)
+static void parse_commit_standard_headers(const char *buf, const char *tail,
+ struct commit *item)
{
const char *dateptr;
+ item->date = 0;
if (buf + 6 >= tail)
- return 0;
+ return;
if (memcmp(buf, "author", 6))
- return 0;
+ return;
while (buf < tail && *buf++ != '\n')
- /* nada */;
+ ; /* skip to the end of the line */
if (buf + 9 >= tail)
- return 0;
+ return;
if (memcmp(buf, "committer", 9))
- return 0;
+ return;
while (buf < tail && *buf++ != '>')
- /* nada */;
+ ; /* skip to the end of the e-mail */
if (buf >= tail)
- return 0;
+ return;
dateptr = buf;
while (buf < tail && *buf++ != '\n')
- /* nada */;
+ ; /* skip to the end of the line */
if (buf >= tail)
- return 0;
+ return;
/* dateptr < buf && buf[-1] == '\n', so strtoul will stop at buf-1 */
- return strtoul(dateptr, NULL, 10);
+ item->date = strtoul(dateptr, NULL, 10);
}
static struct commit_graft **commit_graft;
@@ -262,6 +264,11 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
if (item->object.parsed)
return 0;
item->object.parsed = 1;
+
+ /*
+ * tree, 0-or-more parents, author and committer are required
+ * and must appear in this order; no line folding is allowed.
+ */
tail += size;
if (tail <= bufptr + 46 || memcmp(bufptr, "tree ", 5) || bufptr[45] != '\n')
return error("bogus commit object %s", sha1_to_hex(item->object.sha1));
@@ -301,8 +308,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
pptr = &commit_list_insert(new_parent, pptr)->next;
}
}
- item->date = parse_commit_date(bufptr, tail);
-
+ parse_commit_standard_headers(bufptr, tail, item);
return 0;
}
--
1.8.2.1-450-gd047976
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] commit: add get_commit_encoding()
2013-04-08 23:01 ` [PATCH 0/3] Using a bit more decoration Junio C Hamano
2013-04-08 23:01 ` [PATCH 1/3] commit: shrink "indegree" field Junio C Hamano
2013-04-08 23:01 ` [PATCH 2/3] commit: rename parse_commit_date() Junio C Hamano
@ 2013-04-08 23:01 ` Junio C Hamano
2013-04-09 3:51 ` [PATCH 0/3] Using a bit more decoration Jeff King
3 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2013-04-08 23:01 UTC (permalink / raw)
To: git; +Cc: Jeff King
This replaces two duplicated and slightly different helper functions
in pretty.c and sequencer.c that parses the commit object to find
the encoding header, by having the header parsed when other standard
commit header fields are parsed. A commit object now has a small
integer that is used to index into a static table that holds the
first 200+ values for encoding headers, and in a weird project that
uses more than that many encoding, the encoding values are parsed
into a separate decoration hash (that is not expected to be used in
real life).
Incidentally, this fixes a small leak in sequencer(); the memory
allocated in its get_encoding() helper was never freed in the
caller.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
commit.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
commit.h | 3 +++
pretty.c | 33 ++------------------------------
sequencer.c | 23 +----------------------
4 files changed, 64 insertions(+), 57 deletions(-)
diff --git a/commit.c b/commit.c
index 50a9827..89c8d7c 100644
--- a/commit.c
+++ b/commit.c
@@ -78,10 +78,53 @@ struct commit *lookup_commit_reference_by_name(const char *name)
return commit;
}
+#define QUICK_ENCODING_LIMIT 254 /* offset 1, as 0 is for "no encoding" */
+
+static int quick_encoding_used;
+static struct {
+ const char *encoding;
+ size_t sz;
+} quick_encoding[QUICK_ENCODING_LIMIT];
+static struct decoration slow_encoding;
+
+static void set_commit_encoding(struct commit *item, const char *encoding, size_t sz)
+{
+ int i;
+
+ for (i = 0; i < quick_encoding_used; i++) {
+ if (sz == quick_encoding[i].sz &&
+ !memcmp(encoding, quick_encoding[i].encoding, sz)) {
+ item->encoding = i + 1;
+ return;
+ }
+ }
+ if (i < QUICK_ENCODING_LIMIT) {
+ quick_encoding[i].sz = sz;
+ quick_encoding[i].encoding = xmemdupz(encoding, sz);
+ item->encoding = i + 1;
+ quick_encoding_used++;
+ return;
+ }
+ item->encoding = QUICK_ENCODING_LIMIT;
+ add_decoration(&slow_encoding, &item->object, xmemdupz(encoding, sz));
+}
+
+const char *get_commit_encoding(const struct commit *item)
+{
+ int i;
+
+ if (!item->encoding)
+ return NULL;
+ i = item->encoding - 1; /* offset 1 */
+ if (i < QUICK_ENCODING_LIMIT)
+ return quick_encoding[i].encoding;
+ return lookup_decoration(&slow_encoding, &item->object);
+}
+
static void parse_commit_standard_headers(const char *buf, const char *tail,
struct commit *item)
{
- const char *dateptr;
+ const char *ptr;
item->date = 0;
if (buf + 6 >= tail)
@@ -98,13 +141,24 @@ static void parse_commit_standard_headers(const char *buf, const char *tail,
; /* skip to the end of the e-mail */
if (buf >= tail)
return;
- dateptr = buf;
+ ptr = buf;
+ while (buf < tail && *buf++ != '\n')
+ ; /* skip to the end of the line */
+ if (buf >= tail)
+ return;
+ /* ptr < buf && buf[-1] == '\n', so strtoul will stop at buf-1 */
+ item->date = strtoul(ptr, NULL, 10);
+
+ item->encoding = 0; /* no encoding header */
+ if (memcmp(buf, "encoding ", 9))
+ return;
+ ptr = buf + 9;
while (buf < tail && *buf++ != '\n')
; /* skip to the end of the line */
if (buf >= tail)
return;
- /* dateptr < buf && buf[-1] == '\n', so strtoul will stop at buf-1 */
- item->date = strtoul(dateptr, NULL, 10);
+ /* buf[-1] == '\n' that is the end of encoding */
+ set_commit_encoding(item, ptr, buf - ptr);
}
static struct commit_graft **commit_graft;
diff --git a/commit.h b/commit.h
index 771eeae..e7a70d3 100644
--- a/commit.h
+++ b/commit.h
@@ -16,12 +16,15 @@ struct commit {
struct object object;
void *util;
unsigned int indegree:8; /* see QUICK_INDEGREE_LIMIT in commit.c */
+ unsigned int encoding:8; /* see QUICK_ENCODING_LIMIT in commit.c */
unsigned long date;
struct commit_list *parents;
struct tree *tree;
char *buffer;
};
+extern const char *get_commit_encoding(const struct commit *);
+
extern int save_commit_buffer;
extern const char *commit_type;
diff --git a/pretty.c b/pretty.c
index d3a82d2..08c6ffc 100644
--- a/pretty.c
+++ b/pretty.c
@@ -534,34 +534,6 @@ static void add_merge_info(const struct pretty_print_context *pp,
strbuf_addch(sb, '\n');
}
-static char *get_header(const struct commit *commit, const char *msg,
- const char *key)
-{
- int key_len = strlen(key);
- const char *line = msg;
-
- while (line) {
- const char *eol = strchr(line, '\n'), *next;
-
- if (line == eol)
- return NULL;
- if (!eol) {
- warning("malformed commit (header is missing newline): %s",
- sha1_to_hex(commit->object.sha1));
- eol = line + strlen(line);
- next = NULL;
- } else
- next = eol + 1;
- if (eol - line > key_len &&
- !strncmp(line, key, key_len) &&
- line[key_len] == ' ') {
- return xmemdupz(line + key_len + 1, eol - line - key_len - 1);
- }
- line = next;
- }
- return NULL;
-}
-
static char *replace_encoding_header(char *buf, const char *encoding)
{
struct strbuf tmp = STRBUF_INIT;
@@ -598,7 +570,7 @@ char *logmsg_reencode(const struct commit *commit,
{
static const char *utf8 = "UTF-8";
const char *use_encoding;
- char *encoding;
+ const char *encoding;
char *msg = commit->buffer;
char *out;
@@ -617,7 +589,7 @@ char *logmsg_reencode(const struct commit *commit,
if (!output_encoding || !*output_encoding)
return msg;
- encoding = get_header(commit, msg, "encoding");
+ encoding = get_commit_encoding(commit);
use_encoding = encoding ? encoding : utf8;
if (same_encoding(use_encoding, output_encoding)) {
/*
@@ -658,7 +630,6 @@ char *logmsg_reencode(const struct commit *commit,
if (out)
out = replace_encoding_header(out, output_encoding);
- free(encoding);
/*
* If the re-encoding failed, out might be NULL here; in that
* case we just return the commit message verbatim.
diff --git a/sequencer.c b/sequencer.c
index baa0310..5d97519 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -116,8 +116,6 @@ static const char *action_name(const struct replay_opts *opts)
return opts->action == REPLAY_REVERT ? "revert" : "cherry-pick";
}
-static char *get_encoding(const char *message);
-
struct commit_message {
char *parent_label;
const char *label;
@@ -135,7 +133,7 @@ static int get_message(struct commit *commit, struct commit_message *out)
if (!commit->buffer)
return -1;
- encoding = get_encoding(commit->buffer);
+ encoding = get_commit_encoding(commit);
if (!encoding)
encoding = "UTF-8";
if (!git_commit_encoding)
@@ -173,25 +171,6 @@ static void free_message(struct commit_message *msg)
free(msg->reencoded_message);
}
-static char *get_encoding(const char *message)
-{
- const char *p = message, *eol;
-
- while (*p && *p != '\n') {
- for (eol = p + 1; *eol && *eol != '\n'; eol++)
- ; /* do nothing */
- if (!prefixcmp(p, "encoding ")) {
- char *result = xmalloc(eol - 8 - p);
- strlcpy(result, p + 9, eol - 8 - p);
- return result;
- }
- p = eol;
- if (*p == '\n')
- p++;
- }
- return NULL;
-}
-
static void write_cherry_pick_head(struct commit *commit, const char *pseudoref)
{
const char *filename;
--
1.8.2.1-450-gd047976
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Using a bit more decoration
2013-04-08 23:01 ` [PATCH 0/3] Using a bit more decoration Junio C Hamano
` (2 preceding siblings ...)
2013-04-08 23:01 ` [PATCH 3/3] commit: add get_commit_encoding() Junio C Hamano
@ 2013-04-09 3:51 ` Jeff King
2013-04-09 4:17 ` Junio C Hamano
2013-04-09 4:37 ` Junio C Hamano
3 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2013-04-09 3:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Mon, Apr 08, 2013 at 04:01:51PM -0700, Junio C Hamano wrote:
> This comes on top of the two "decorate" patches that introduced the
> clear_decoration() function.
> [...]
> Junio C Hamano (3):
> commit: shrink "indegree" field
> commit: rename parse_commit_date()
> commit: add get_commit_encoding()
Neat. Reading through, I didn't notice anything obviously wrong with any
of the patches (though there is one gcc warning, which I'll respond to
separately).
It does make me a little nervous to have code that almost never gets
exercised (i.e., when indegree is really high, or a large number of
encodings). It seems like a bug waiting to happen when somebody does hit
that condition. But the decoration lookups do look fairly simple and
easy to get right.
> I am not sure if pre-parsing and sharing the encoding values in-core
> would affect performance very much, but now we have 2 bytes (or 6
> bytes, if you are on 64-bit archs) free to use in "struct commit"
> for later use (e.g. extra object flags that are relevant only for
> commit objects, without having to widen "struct object").
I wasn't able to measure any speedup, but I'm not surprised; something
like "git log" already spends way more effort extracting and
pretty-printing the commit objects.
I was actually thinking of something related to this recently. It would
be nice to be able to allocate a slab of very efficient fixed-size
per-commit or per-object storage. That would eliminate the need to pay
the cost for "indegree" all the time; the topo-sort could allocate a
slab of indegree integers, then throw it away when the sort was done.
Similarly, a traversal would not have to worry about pollution of the
object flags, nor about using an arbitrarily large number of flags[1],
if it could allocate a separate flag structure.
Using a separate hash would be too slow. I'm thinking more like giving
each commit an integer "id" as it is parsed, and then using that id to
index into a temporary slab (you'd grow the slab as you see more commits
but that's OK, since we're always indexing it, not using a pointer). The
slab management would get amortized, so the main cost would be the extra
indirect memory access (i.e., hitting "flags[commit->id]" instead of
"commit->id"), and the extra memory used by the id field.
I dunno. Maybe those costs would make it too slow. I haven't actually
coded or measured anything yet.
-Peff
[1] The thing that made me think about this was making a version of
paint_down_to_common that could keep a separate color for each
source, rather than only PARENT1 and PARENT2. That would let us do
the "--contains" traversal in a breadth-first way, but calculate the
answer simultaneously for all sources (i.e., avoid the depth-first
"go to roots" problem that the current "tag --contains" has if you
do not use timestamp cutoffs).
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] commit: shrink "indegree" field
2013-04-08 23:01 ` [PATCH 1/3] commit: shrink "indegree" field Junio C Hamano
@ 2013-04-09 3:52 ` Jeff King
0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2013-04-09 3:52 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Mon, Apr 08, 2013 at 04:01:52PM -0700, Junio C Hamano wrote:
> +static inline intptr_t get_indegree(struct decoration *indegree,
> + struct commit *commit)
> +{
> + if (commit->indegree < QUICK_INDEGREE_LIMIT)
> + return commit->indegree;
> + else {
> + void *count = lookup_decoration(indegree, &commit->object);
> + return (int) count;
> + }
> +}
Should that cast be to intptr_t?
-Peff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Using a bit more decoration
2013-04-09 3:51 ` [PATCH 0/3] Using a bit more decoration Jeff King
@ 2013-04-09 4:17 ` Junio C Hamano
2013-04-09 4:39 ` Jeff King
2013-04-09 4:37 ` Junio C Hamano
1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2013-04-09 4:17 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> [1] The thing that made me think about this was making a version of
> paint_down_to_common that could keep a separate color for each
> source, rather than only PARENT1 and PARENT2. That would let us do
> the "--contains" traversal in a breadth-first way, but calculate the
> answer simultaneously for all sources (i.e., avoid the depth-first
> "go to roots" problem that the current "tag --contains" has if you
> do not use timestamp cutoffs).
Yes, show-branch operates independently of rev-list machinery, hence
can use full set of bits, but the full set of bits in a word is not
always sufficient and it can be helped greatly with such an unbound
set of bits machinery.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Using a bit more decoration
2013-04-09 3:51 ` [PATCH 0/3] Using a bit more decoration Jeff King
2013-04-09 4:17 ` Junio C Hamano
@ 2013-04-09 4:37 ` Junio C Hamano
1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2013-04-09 4:37 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> Neat. Reading through, I didn't notice anything obviously wrong with any
> of the patches (though there is one gcc warning, which I'll respond to
> separately).
>
> It does make me a little nervous to have code that almost never gets
> exercised (i.e., when indegree is really high, or a large number of
> encodings). It seems like a bug waiting to happen when somebody does hit
> that condition.
One round of work-in-progress code I had when you asked what I was
up to did have that off-by-one bug ;-) set_indegree() had to spill
into the hash when storing 255 (i.e. exactly the value of LIMIT) but
I was spilling strting from 256, so an entry with 255 children looked
into the hash, finding nothing and said "I am done" X-<.
I haven't bothered to try the "more than 256 encodings", but with
the likely off-by-one in mind, I think I was being careful enough.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Using a bit more decoration
2013-04-09 4:17 ` Junio C Hamano
@ 2013-04-09 4:39 ` Jeff King
2013-04-09 4:54 ` Junio C Hamano
0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2013-04-09 4:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Mon, Apr 08, 2013 at 09:17:42PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > [1] The thing that made me think about this was making a version of
> > paint_down_to_common that could keep a separate color for each
> > source, rather than only PARENT1 and PARENT2. That would let us do
> > the "--contains" traversal in a breadth-first way, but calculate the
> > answer simultaneously for all sources (i.e., avoid the depth-first
> > "go to roots" problem that the current "tag --contains" has if you
> > do not use timestamp cutoffs).
>
> Yes, show-branch operates independently of rev-list machinery, hence
> can use full set of bits, but the full set of bits in a word is not
> always sufficient and it can be helped greatly with such an unbound
> set of bits machinery.
The tricky part is how to store the index for each commit (or object). I
don't see a way around adding a new field to "struct commit" to do so.
It _might_ be worth it, because that index would be reusable for many
different operations.
I had hoped to do something clever and implicit with the commit pointer,
since we allocate from a pool. For example, if you have pointer X and
know that the pool starts at Y, you can get an index with simple
subtraction. But of course we don't know what the pool is for any given
allocator without searching the pools, which grow linearly with the
number of objects. So it ends up with the same algorithmic complexity as
just searching for the commit in a data structure (albeit with a smaller
constant, since we allocate big chunks of objects).
-Peff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Using a bit more decoration
2013-04-09 4:39 ` Jeff King
@ 2013-04-09 4:54 ` Junio C Hamano
2013-04-09 6:52 ` Jeff King
0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2013-04-09 4:54 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> The tricky part is how to store the index for each commit (or object). I
> don't see a way around adding a new field to "struct commit" to do so.
We could reuse the space "indegree" used to live at; that is a
one-word space already, and I really do not need "encoding". It was
no more than "while we are at it, we could add different things
here" demonstration.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Using a bit more decoration
2013-04-09 4:54 ` Junio C Hamano
@ 2013-04-09 6:52 ` Jeff King
2013-04-09 20:06 ` Junio C Hamano
0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2013-04-09 6:52 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Mon, Apr 08, 2013 at 09:54:50PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > The tricky part is how to store the index for each commit (or object). I
> > don't see a way around adding a new field to "struct commit" to do so.
>
> We could reuse the space "indegree" used to live at; that is a
> one-word space already, and I really do not need "encoding". It was
> no more than "while we are at it, we could add different things
> here" demonstration.
Here is what my proposal would look like, replacing indegree with the
"index" field. In my timings of "git rev-list --topo-order --all", I
couldn't measure any difference.
Now the next tricky step will actually be writing a paint_down_to_common
that uses it. :) The auto-growing slab implementation below assumes a
fixed width of one int per commit. I think we'd want something similar
that stores a set of N bits per commit, where N is fixed at the start of
the algorithm.
---
commit.c | 59 +++++++++++++++++++++++++++++------
commit.h | 2 +-
2 files changed, 51 insertions(+), 10 deletions(-)
diff --git a/commit.c b/commit.c
index 516a4ff..95f041a 100644
--- a/commit.c
+++ b/commit.c
@@ -14,6 +14,7 @@ const char *commit_type = "commit";
int save_commit_buffer = 1;
const char *commit_type = "commit";
+static int commit_count;
static struct commit *check_commit(struct object *obj,
const unsigned char *sha1,
@@ -58,8 +59,11 @@ struct commit *lookup_commit(const unsigned char *sha1)
struct commit *lookup_commit(const unsigned char *sha1)
{
struct object *obj = lookup_object(sha1);
- if (!obj)
- return create_object(sha1, OBJ_COMMIT, alloc_commit_node());
+ if (!obj) {
+ struct commit *c = alloc_commit_node();
+ c->index = commit_count++;
+ return create_object(sha1, OBJ_COMMIT, c);
+ }
if (!obj->type)
obj->type = OBJ_COMMIT;
return check_commit(obj, sha1, 0);
@@ -506,6 +510,36 @@ struct commit *pop_commit(struct commit_list **stack)
return item;
}
+struct commit_slab {
+ int *buf;
+ int alloc;
+};
+
+static void slab_init(struct commit_slab *s)
+{
+ memset(s, 0, sizeof(*s));
+}
+
+static void slab_clear(struct commit_slab *s)
+{
+ free(s->buf);
+ slab_init(s);
+}
+
+static inline int *slab_at(struct commit_slab *s, const struct commit *c)
+{
+ if (s->alloc <= c->index) {
+ int new_alloc = alloc_nr(s->alloc);
+ if (new_alloc <= c->index)
+ new_alloc = c->index + 1;
+
+ s->buf = xrealloc(s->buf, new_alloc * sizeof(*s->buf));
+ memset(s->buf + s->alloc, 0, new_alloc - s->alloc);
+ s->alloc = new_alloc;
+ }
+ return s->buf + c->index;
+}
+
/*
* Performs an in-place topological sort on the list supplied.
*/
@@ -514,15 +548,18 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
struct commit_list *next, *orig = *list;
struct commit_list *work, **insert;
struct commit_list **pptr;
+ struct commit_slab indegree;
if (!orig)
return;
*list = NULL;
+ slab_init(&indegree);
+
/* Mark them and clear the indegree */
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- commit->indegree = 1;
+ *slab_at(&indegree, commit) = 1;
}
/* update the indegree */
@@ -530,9 +567,10 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
struct commit_list * parents = next->item->parents;
while (parents) {
struct commit *parent = parents->item;
+ int *pi = slab_at(&indegree, parent);
- if (parent->indegree)
- parent->indegree++;
+ if (*pi)
+ (*pi)++;
parents = parents->next;
}
}
@@ -549,7 +587,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
for (next = orig; next; next = next->next) {
struct commit *commit = next->item;
- if (commit->indegree == 1)
+ if (*slab_at(&indegree, commit) == 1)
insert = &commit_list_insert(commit, insert)->next;
}
@@ -570,8 +608,9 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
commit = work_item->item;
for (parents = commit->parents; parents ; parents = parents->next) {
struct commit *parent = parents->item;
+ int *pi = slab_at(&indegree, parent);
- if (!parent->indegree)
+ if (!*pi)
continue;
/*
@@ -579,7 +618,7 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
* when all their children have been emitted thereby
* guaranteeing topological order.
*/
- if (--parent->indegree == 1) {
+ if (--(*pi) == 1) {
if (!lifo)
commit_list_insert_by_date(parent, &work);
else
@@ -590,10 +629,12 @@ void sort_in_topological_order(struct commit_list ** list, int lifo)
* work_item is a commit all of whose children
* have already been emitted. we can emit it now.
*/
- commit->indegree = 0;
+ *slab_at(&indegree, commit) = 0;
*pptr = work_item;
pptr = &work_item->next;
}
+
+ slab_clear(&indegree);
}
/* merge-base stuff */
diff --git a/commit.h b/commit.h
index 87b4b6c..8b1099f 100644
--- a/commit.h
+++ b/commit.h
@@ -15,7 +15,7 @@ struct commit {
struct commit {
struct object object;
void *util;
- unsigned int indegree;
+ unsigned int index;
unsigned long date;
struct commit_list *parents;
struct tree *tree;
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Using a bit more decoration
2013-04-09 6:52 ` Jeff King
@ 2013-04-09 20:06 ` Junio C Hamano
0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2013-04-09 20:06 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
Jeff King <peff@peff.net> writes:
> +static inline int *slab_at(struct commit_slab *s, const struct commit *c)
> +{
> + if (s->alloc <= c->index) {
> + int new_alloc = alloc_nr(s->alloc);
> + if (new_alloc <= c->index)
> + new_alloc = c->index + 1;
> +
> + s->buf = xrealloc(s->buf, new_alloc * sizeof(*s->buf));
> + memset(s->buf + s->alloc, 0, new_alloc - s->alloc);
> + s->alloc = new_alloc;
> + }
> + return s->buf + c->index;
> +}
This will hurt more as the number of objects we deal with grows (our
ALLOC_GROW() shares the same).
I wonder if it might be a good idea to do
struct commit_slab {
int piece_size;
int piece_count;
struct commit_slab_piece {
int *buf;
} *piece;
};
and then make the look-up logic like this:
int nth_piece, nth_slot;
nth_piece = c->index / s->piece_size;
nth_slot = c->index % s->piece_size;
if (s->piece_count <= nth_piece) {
/* xrealloc s->piece to grow, update s->piece_count */
}
if (!s->piece[nth_piece]) {
/* xcalloc s->piece[nth_piece] to allocate */
}
return s->piece[nth_piece]->buf + nth_slot;
Other than that, looks like a good technology demonstration.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-04-09 20:06 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-08 6:14 [PATCH 2/2] decorate: add "clear_decoration()" Junio C Hamano
2013-04-08 21:09 ` Jeff King
2013-04-08 21:22 ` Junio C Hamano
2013-04-08 23:01 ` [PATCH 0/3] Using a bit more decoration Junio C Hamano
2013-04-08 23:01 ` [PATCH 1/3] commit: shrink "indegree" field Junio C Hamano
2013-04-09 3:52 ` Jeff King
2013-04-08 23:01 ` [PATCH 2/3] commit: rename parse_commit_date() Junio C Hamano
2013-04-08 23:01 ` [PATCH 3/3] commit: add get_commit_encoding() Junio C Hamano
2013-04-09 3:51 ` [PATCH 0/3] Using a bit more decoration Jeff King
2013-04-09 4:17 ` Junio C Hamano
2013-04-09 4:39 ` Jeff King
2013-04-09 4:54 ` Junio C Hamano
2013-04-09 6:52 ` Jeff King
2013-04-09 20:06 ` Junio C Hamano
2013-04-09 4:37 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).