git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] fast-import: cache oe more often
@ 2011-09-19  1:27 Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 1/8] fast-import: cache oe in file_change_m Dmitry Ivankov
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

fast-import keeps a struct object_entry for each object written to
it's pack. This is to keep type, pack-coordinates and delta_depth.
struct object_entry is also used to cache this metadata for objects
that exist outside fast-import's pack ('old' objects).
struct object_entry has a small fixed size and thus it should be
reasonable to cache any 'old' object metadata retrieval to save the
disk i/o.

Also it is a step toward making fast-import identify objects via
struct object_entry rather than sha1. One pointer takes less than
20 bytes, it'll be later possible to have references to objects
that don't yet have sha1 computed (fast-import with threads future).

Dmitry Ivankov (8):
  fast-import: cache oe in file_change_m
  fast-import: cache oe in parse_new_tag
  fast-import: cache oe in note_change_n
  fast-import: extract common sha1_file access functions
  fast-import: tiny optimization in read_marks
  fast-import: cache oe in load_tree
  fast-import: cache oe in cat_blob
  fast-import: cache objects while dereferencing

 fast-import.c |  177 +++++++++++++++++++++++++++++++--------------------------
 1 files changed, 96 insertions(+), 81 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/8] fast-import: cache oe in file_change_m
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
@ 2011-09-19  1:27 ` Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 2/8] fast-import: cache oe in parse_new_tag Dmitry Ivankov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

file_change_m checks object type for objects specified by sha1. It does
so via sha1_object_info but doesn't cache this information in struct
object_entry.

Make this call to sha1_object_info cached in struct object_entry.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   22 ++++++++++++++--------
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 742e7da..42f9b17 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2297,15 +2297,21 @@ static void file_change_m(struct branch *b)
 	} else {
 		enum object_type expected = S_ISDIR(mode) ?
 						OBJ_TREE: OBJ_BLOB;
-		enum object_type type = oe ? oe->type :
-					sha1_object_info(sha1, NULL);
-		if (type < 0)
-			die("%s not found: %s",
-					S_ISDIR(mode) ?  "Tree" : "Blob",
-					command_buf.buf);
-		if (type != expected)
+		if (!oe)
+			oe = insert_object(sha1);
+		if (!oe->idx.offset) {
+			enum object_type type = sha1_object_info(oe->idx.sha1, NULL);
+			if (type < 0)
+				die("%s not found: %s",
+						S_ISDIR(mode) ?  "Tree" : "Blob",
+						command_buf.buf);
+			oe->type = type;
+			oe->pack_id = MAX_PACK_ID;
+			oe->idx.offset = 1; /* nonzero */
+		}
+		if (oe->type != expected)
 			die("Not a %s (actually a %s): %s",
-				typename(expected), typename(type),
+				typename(expected), typename(oe->type),
 				command_buf.buf);
 	}
 
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/8] fast-import: cache oe in parse_new_tag
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 1/8] fast-import: cache oe in file_change_m Dmitry Ivankov
@ 2011-09-19  1:27 ` Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 3/8] fast-import: cache oe in note_change_n Dmitry Ivankov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

parse_new_tag uses sha1_object_info to find out the type of an object
given by a sha1 expression. But doesn't cache it in struct object_entry.

Make this call to sha1_object_info cached in struct object_entry.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 42f9b17..2b049f7 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2732,11 +2732,14 @@ static void parse_new_tag(void)
 		type = oe->type;
 		hashcpy(sha1, oe->idx.sha1);
 	} else if (!get_sha1(from, sha1)) {
-		struct object_entry *oe = find_object(sha1);
-		if (!oe) {
+		struct object_entry *oe = insert_object(sha1);
+		if (!oe->idx.offset) {
 			type = sha1_object_info(sha1, NULL);
 			if (type < 0)
 				die("Not a valid object: %s", from);
+			oe->type = type;
+			oe->pack_id = MAX_PACK_ID;
+			oe->idx.offset = 1; /* nonzero */
 		} else
 			type = oe->type;
 	} else
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/8] fast-import: cache oe in note_change_n
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 1/8] fast-import: cache oe in file_change_m Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 2/8] fast-import: cache oe in parse_new_tag Dmitry Ivankov
@ 2011-09-19  1:27 ` Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 4/8] fast-import: extract common sha1_file access functions Dmitry Ivankov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

note_change_n checks the type of annotating data object to be Blob.
For objects given by sha1 it does so via sha1_object_info and does not
cache the result in struct object_entry.

Make this call to sha1_object_info cached in struct object_entry. Also
make note_change_n operate on oe rather than on sha1 - no functional
change, just a purification.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   30 +++++++++++++++++-------------
 1 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 2b049f7..47c1e69 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2394,9 +2394,9 @@ static void note_change_n(struct branch *b, unsigned char old_fanout)
 {
 	const char *p = command_buf.buf + 2;
 	static struct strbuf uq = STRBUF_INIT;
-	struct object_entry *oe = oe;
+	struct object_entry *oe = NULL;
 	struct branch *s;
-	unsigned char sha1[20], commit_sha1[20];
+	unsigned char commit_sha1[20];
 	char path[60];
 	uint16_t inline_data = 0;
 	unsigned char new_fanout;
@@ -2405,15 +2405,16 @@ static void note_change_n(struct branch *b, unsigned char old_fanout)
 	if (*p == ':') {
 		char *x;
 		oe = find_mark(strtoumax(p + 1, &x, 10));
-		hashcpy(sha1, oe->idx.sha1);
 		p = x;
 	} else if (!prefixcmp(p, "inline")) {
 		inline_data = 1;
 		p += 6;
 	} else {
+		unsigned char sha1[20];
 		if (get_sha1_hex(p, sha1))
 			die("Invalid SHA1: %s", command_buf.buf);
-		oe = find_object(sha1);
+		if (!is_null_sha1(sha1))
+			oe = insert_object(sha1);
 		p += 40;
 	}
 	if (*p++ != ' ')
@@ -2440,36 +2441,39 @@ static void note_change_n(struct branch *b, unsigned char old_fanout)
 		die("Invalid ref name or SHA1 expression: %s", p);
 
 	if (inline_data) {
+		unsigned char sha1[20];
 		if (p != uq.buf) {
 			strbuf_addstr(&uq, p);
 			p = uq.buf;
 		}
 		read_next_command();
 		parse_and_store_blob(&last_blob, sha1, 0);
+		oe = find_object(sha1);
 	} else if (oe) {
+		if (!oe->idx.offset) {
+			enum object_type type = sha1_object_info(oe->idx.sha1, NULL);
+			if (type < 0)
+				die("Blob not found: %s", command_buf.buf);
+			oe->type = type;
+			oe->pack_id = MAX_PACK_ID;
+			oe->idx.offset = 1; /* nonzero */
+		}
 		if (oe->type != OBJ_BLOB)
 			die("Not a blob (actually a %s): %s",
 				typename(oe->type), command_buf.buf);
-	} else if (!is_null_sha1(sha1)) {
-		enum object_type type = sha1_object_info(sha1, NULL);
-		if (type < 0)
-			die("Blob not found: %s", command_buf.buf);
-		if (type != OBJ_BLOB)
-			die("Not a blob (actually a %s): %s",
-			    typename(type), command_buf.buf);
 	}
 
 	construct_path_with_fanout(sha1_to_hex(commit_sha1), old_fanout, path);
 	if (tree_content_remove(&b->branch_tree, path, NULL))
 		b->num_notes--;
 
-	if (is_null_sha1(sha1))
+	if (!oe)
 		return; /* nothing to insert */
 
 	b->num_notes++;
 	new_fanout = convert_num_notes_to_fanout(b->num_notes);
 	construct_path_with_fanout(sha1_to_hex(commit_sha1), new_fanout, path);
-	tree_content_set(&b->branch_tree, path, sha1, S_IFREG | 0644, NULL);
+	tree_content_set(&b->branch_tree, path, oe->idx.sha1, S_IFREG | 0644, NULL);
 }
 
 static void file_change_deleteall(struct branch *b)
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/8] fast-import: extract common sha1_file access functions
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
                   ` (2 preceding siblings ...)
  2011-09-19  1:27 ` [PATCH 3/8] fast-import: cache oe in note_change_n Dmitry Ivankov
@ 2011-09-19  1:27 ` Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 5/8] fast-import: tiny optimization in read_marks Dmitry Ivankov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

fast-import asks sha1_object_info and find_sha1_pack to initialize
struct object_entry in several codepoints.

Extract common functions doing this initialization.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   82 +++++++++++++++++++++++---------------------------------
 1 files changed, 34 insertions(+), 48 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 47c1e69..3c2a067 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -589,6 +589,28 @@ static struct object_entry *insert_object(unsigned char *sha1)
 	return e;
 }
 
+static void resolve_sha1_object(struct object_entry *oe)
+{
+	enum object_type type = sha1_object_info(oe->idx.sha1, NULL);
+	if (type < 0)
+		die("object not found: %s", sha1_to_hex(oe->idx.sha1));
+	oe->type = type;
+	oe->pack_id = MAX_PACK_ID;
+	oe->idx.offset = 1; /* nonzero */
+}
+
+static int try_resolve_sha1_pack_object(struct object_entry *e,
+						enum object_type type)
+{
+	if (!find_sha1_pack(e->idx.sha1, packed_git))
+		return 0;
+
+	e->type = type;
+	e->pack_id = MAX_PACK_ID;
+	e->idx.offset = 1; /* just not zero! */
+	return 1;
+}
+
 static unsigned int hc_str(const char *s, size_t len)
 {
 	unsigned int r = 0;
@@ -1042,10 +1064,7 @@ static int store_object(
 	if (e->idx.offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
-	} else if (find_sha1_pack(sha1, packed_git)) {
-		e->type = type;
-		e->pack_id = MAX_PACK_ID;
-		e->idx.offset = 1; /* just not zero! */
+	} else if (try_resolve_sha1_pack_object(e, type)) {
 		duplicate_count_by_type[type]++;
 		return 1;
 	}
@@ -1252,10 +1271,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 		duplicate_count_by_type[OBJ_BLOB]++;
 		truncate_pack(offset, &pack_file_ctx);
 
-	} else if (find_sha1_pack(sha1, packed_git)) {
-		e->type = OBJ_BLOB;
-		e->pack_id = MAX_PACK_ID;
-		e->idx.offset = 1; /* just not zero! */
+	} else if (try_resolve_sha1_pack_object(e, OBJ_BLOB)) {
 		duplicate_count_by_type[OBJ_BLOB]++;
 		truncate_pack(offset, &pack_file_ctx);
 
@@ -1838,13 +1854,8 @@ static void read_marks(void)
 			die("corrupt mark line: %s", line);
 		e = find_object(sha1);
 		if (!e) {
-			enum object_type type = sha1_object_info(sha1, NULL);
-			if (type < 0)
-				die("object not found: %s", sha1_to_hex(sha1));
 			e = insert_object(sha1);
-			e->type = type;
-			e->pack_id = MAX_PACK_ID;
-			e->idx.offset = 1; /* just not zero! */
+			resolve_sha1_object(e);
 		}
 		insert_mark(mark, e);
 	}
@@ -2299,16 +2310,8 @@ static void file_change_m(struct branch *b)
 						OBJ_TREE: OBJ_BLOB;
 		if (!oe)
 			oe = insert_object(sha1);
-		if (!oe->idx.offset) {
-			enum object_type type = sha1_object_info(oe->idx.sha1, NULL);
-			if (type < 0)
-				die("%s not found: %s",
-						S_ISDIR(mode) ?  "Tree" : "Blob",
-						command_buf.buf);
-			oe->type = type;
-			oe->pack_id = MAX_PACK_ID;
-			oe->idx.offset = 1; /* nonzero */
-		}
+		if (!oe->idx.offset)
+			resolve_sha1_object(oe);
 		if (oe->type != expected)
 			die("Not a %s (actually a %s): %s",
 				typename(expected), typename(oe->type),
@@ -2450,14 +2453,8 @@ static void note_change_n(struct branch *b, unsigned char old_fanout)
 		parse_and_store_blob(&last_blob, sha1, 0);
 		oe = find_object(sha1);
 	} else if (oe) {
-		if (!oe->idx.offset) {
-			enum object_type type = sha1_object_info(oe->idx.sha1, NULL);
-			if (type < 0)
-				die("Blob not found: %s", command_buf.buf);
-			oe->type = type;
-			oe->pack_id = MAX_PACK_ID;
-			oe->idx.offset = 1; /* nonzero */
-		}
+		if (!oe->idx.offset)
+			resolve_sha1_object(oe);
 		if (oe->type != OBJ_BLOB)
 			die("Not a blob (actually a %s): %s",
 				typename(oe->type), command_buf.buf);
@@ -2737,15 +2734,10 @@ static void parse_new_tag(void)
 		hashcpy(sha1, oe->idx.sha1);
 	} else if (!get_sha1(from, sha1)) {
 		struct object_entry *oe = insert_object(sha1);
-		if (!oe->idx.offset) {
-			type = sha1_object_info(sha1, NULL);
-			if (type < 0)
-				die("Not a valid object: %s", from);
-			oe->type = type;
-			oe->pack_id = MAX_PACK_ID;
-			oe->idx.offset = 1; /* nonzero */
-		} else
-			type = oe->type;
+		if (!oe->idx.offset)
+			resolve_sha1_object(oe);
+
+		type = oe->type;
 	} else
 		die("Invalid ref name or SHA1 expression: %s", from);
 	read_next_command();
@@ -2892,14 +2884,8 @@ static struct object_entry *dereference(struct object_entry *oe,
 	unsigned long size;
 	char *buf = NULL;
 	if (!oe) {
-		enum object_type type = sha1_object_info(sha1, NULL);
-		if (type < 0)
-			die("object not found: %s", sha1_to_hex(sha1));
-		/* cache it! */
 		oe = insert_object(sha1);
-		oe->type = type;
-		oe->pack_id = MAX_PACK_ID;
-		oe->idx.offset = 1;
+		resolve_sha1_object(oe);
 	}
 	switch (oe->type) {
 	case OBJ_TREE:	/* easy case. */
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/8] fast-import: tiny optimization in read_marks
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
                   ` (3 preceding siblings ...)
  2011-09-19  1:27 ` [PATCH 4/8] fast-import: extract common sha1_file access functions Dmitry Ivankov
@ 2011-09-19  1:27 ` Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 6/8] fast-import: cache oe in load_tree Dmitry Ivankov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

read_marks calls find_object and then insert_object if nothing is found.

Reduce it to just insert_object and a check if it was found or inserted.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 3c2a067..dd3dcd5 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1852,11 +1852,9 @@ static void read_marks(void)
 		if (!mark || end == line + 1
 			|| *end != ' ' || get_sha1(end + 1, sha1))
 			die("corrupt mark line: %s", line);
-		e = find_object(sha1);
-		if (!e) {
-			e = insert_object(sha1);
+		e = insert_object(sha1);
+		if (!e->idx.offset)
 			resolve_sha1_object(e);
-		}
 		insert_mark(mark, e);
 	}
 	fclose(f);
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/8] fast-import: cache oe in load_tree
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
                   ` (4 preceding siblings ...)
  2011-09-19  1:27 ` [PATCH 5/8] fast-import: tiny optimization in read_marks Dmitry Ivankov
@ 2011-09-19  1:27 ` Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 7/8] fast-import: cache oe in cat_blob Dmitry Ivankov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

load_tree reads a tree object given it's sha1. If there was no
struct object_entry allocated for this sha1, load_tree doesn't
allocate it and thus doesn't cache it's struct object_entry.

Make this read_sha1_file cached in struct object_entry.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   20 +++++++++++++++++---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index dd3dcd5..1c0716b 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -599,6 +599,17 @@ static void resolve_sha1_object(struct object_entry *oe)
 	oe->idx.offset = 1; /* nonzero */
 }
 
+static void *resolve_sha1_object_read(struct object_entry *oe, enum object_type *type, unsigned long *size)
+{
+	void *ret = read_sha1_file(oe->idx.sha1, type, size);
+	if (!ret)
+		return ret;
+	oe->type = *type;
+	oe->pack_id = MAX_PACK_ID;
+	oe->idx.offset = 1; /* nonzero */
+	return ret;
+}
+
 static int try_resolve_sha1_pack_object(struct object_entry *e,
 						enum object_type type)
 {
@@ -1363,8 +1374,8 @@ static void load_tree(struct tree_entry *root)
 	if (is_null_sha1(sha1))
 		return;
 
-	myoe = find_object(sha1);
-	if (myoe && myoe->pack_id != MAX_PACK_ID) {
+	myoe = insert_object(sha1);
+	if (myoe->idx.offset && myoe->pack_id != MAX_PACK_ID) {
 		if (myoe->type != OBJ_TREE)
 			die("Not a tree: %s", sha1_to_hex(sha1));
 		t->delta_depth = myoe->depth;
@@ -1373,7 +1384,10 @@ static void load_tree(struct tree_entry *root)
 			die("Can't load tree %s", sha1_to_hex(sha1));
 	} else {
 		enum object_type type;
-		buf = read_sha1_file(sha1, &type, &size);
+		if (!myoe->idx.offset)
+			buf = resolve_sha1_object_read(myoe, &type, &size);
+		else
+			buf = read_sha1_file(sha1, &type, &size);
 		if (!buf || type != OBJ_TREE)
 			die("Can't load tree %s", sha1_to_hex(sha1));
 	}
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/8] fast-import: cache oe in cat_blob
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
                   ` (5 preceding siblings ...)
  2011-09-19  1:27 ` [PATCH 6/8] fast-import: cache oe in load_tree Dmitry Ivankov
@ 2011-09-19  1:27 ` Dmitry Ivankov
  2011-09-19  1:27 ` [PATCH 8/8] fast-import: cache objects while dereferencing Dmitry Ivankov
  2011-09-20  4:02 ` [PATCH 0/8] fast-import: cache oe more often Junio C Hamano
  8 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

cat_blob read_sha1_file's the blob object and doesn't cache
it in struct object_entry.

Make this call to read_sha1_file cached in struct object_entry.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   25 +++++++++++++------------
 1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 1c0716b..3c4c998 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2816,16 +2816,18 @@ static void cat_blob_write(const char *buf, unsigned long size)
 		die_errno("Write to frontend failed");
 }
 
-static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
+static void cat_blob(struct object_entry *oe)
 {
 	struct strbuf line = STRBUF_INIT;
 	unsigned long size;
 	enum object_type type = 0;
 	char *buf;
 
-	if (!oe || oe->pack_id == MAX_PACK_ID) {
-		buf = read_sha1_file(sha1, &type, &size);
-	} else {
+	if (!oe->idx.offset)
+		buf = resolve_sha1_object_read(oe, &type, &size);
+	else if (oe->pack_id == MAX_PACK_ID)
+		buf = read_sha1_file(oe->idx.sha1, &type, &size);
+	else {
 		type = oe->type;
 		buf = gfi_unpack_entry(oe, &size);
 	}
@@ -2835,25 +2837,25 @@ static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
 	 */
 	if (type <= 0) {
 		strbuf_reset(&line);
-		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
+		strbuf_addf(&line, "%s missing\n", sha1_to_hex(oe->idx.sha1));
 		cat_blob_write(line.buf, line.len);
 		strbuf_release(&line);
 		free(buf);
 		return;
 	}
 	if (!buf)
-		die("Can't read object %s", sha1_to_hex(sha1));
+		die("Can't read object %s", sha1_to_hex(oe->idx.sha1));
 	if (type != OBJ_BLOB)
 		die("Object %s is a %s but a blob was expected.",
-		    sha1_to_hex(sha1), typename(type));
+		    sha1_to_hex(oe->idx.sha1), typename(type));
 	strbuf_reset(&line);
-	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
+	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(oe->idx.sha1),
 						typename(type), size);
 	cat_blob_write(line.buf, line.len);
 	strbuf_release(&line);
 	cat_blob_write(buf, size);
 	cat_blob_write("\n", 1);
-	if (oe && oe->pack_id == pack_id) {
+	if (oe->pack_id == pack_id) {
 		last_blob.offset = oe->idx.offset;
 		strbuf_attach(&last_blob.data, buf, size, size);
 		last_blob.depth = oe->depth;
@@ -2878,16 +2880,15 @@ static void parse_cat_blob(void)
 			die("Unknown mark: %s", command_buf.buf);
 		if (*x)
 			die("Garbage after mark: %s", command_buf.buf);
-		hashcpy(sha1, oe->idx.sha1);
 	} else {
 		if (get_sha1_hex(p, sha1))
 			die("Invalid SHA1: %s", command_buf.buf);
 		if (p[40])
 			die("Garbage after SHA1: %s", command_buf.buf);
-		oe = find_object(sha1);
+		oe = insert_object(sha1);
 	}
 
-	cat_blob(oe, sha1);
+	cat_blob(oe);
 }
 
 static struct object_entry *dereference(struct object_entry *oe,
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 8/8] fast-import: cache objects while dereferencing
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
                   ` (6 preceding siblings ...)
  2011-09-19  1:27 ` [PATCH 7/8] fast-import: cache oe in cat_blob Dmitry Ivankov
@ 2011-09-19  1:27 ` Dmitry Ivankov
  2011-09-20  4:02 ` [PATCH 0/8] fast-import: cache oe more often Junio C Hamano
  8 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-19  1:27 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Sverre Rabbelier,
	Dmitry Ivankov

dereference() reads objects with read_sha1_file, and reads types
of objects with sha1_object_info. But doesn't cache the result in
struct object_entry.

Make these calls to read_sha1_file and sha1_object_info cached in
struct object_entry.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   31 +++++++++++++++++--------------
 1 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 3c4c998..43158c8 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2891,15 +2891,11 @@ static void parse_cat_blob(void)
 	cat_blob(oe);
 }
 
-static struct object_entry *dereference(struct object_entry *oe,
-					unsigned char sha1[20])
+static struct object_entry *dereference(struct object_entry *oe)
 {
+	unsigned char next_sha1[20];
 	unsigned long size;
 	char *buf = NULL;
-	if (!oe) {
-		oe = insert_object(sha1);
-		resolve_sha1_object(oe);
-	}
 	switch (oe->type) {
 	case OBJ_TREE:	/* easy case. */
 		return oe;
@@ -2914,26 +2910,31 @@ static struct object_entry *dereference(struct object_entry *oe,
 		buf = gfi_unpack_entry(oe, &size);
 	} else {
 		enum object_type unused;
-		buf = read_sha1_file(sha1, &unused, &size);
+		buf = read_sha1_file(oe->idx.sha1, &unused, &size);
 	}
 	if (!buf)
-		die("Can't load object %s", sha1_to_hex(sha1));
+		die("Can't load object %s", sha1_to_hex(oe->idx.sha1));
 
 	/* Peel one layer. */
 	switch (oe->type) {
 	case OBJ_TAG:
 		if (size < 40 + strlen("object ") ||
-		    get_sha1_hex(buf + strlen("object "), sha1))
+		    get_sha1_hex(buf + strlen("object "), next_sha1))
 			die("Invalid SHA1 in tag: %s", command_buf.buf);
 		break;
 	case OBJ_COMMIT:
 		if (size < 40 + strlen("tree ") ||
-		    get_sha1_hex(buf + strlen("tree "), sha1))
+		    get_sha1_hex(buf + strlen("tree "), next_sha1))
 			die("Invalid SHA1 in commit: %s", command_buf.buf);
 	}
 
 	free(buf);
-	return find_object(sha1);
+
+	oe = insert_object(next_sha1);
+	if (!oe->idx.offset)
+		resolve_sha1_object(oe);
+
+	return oe;
 }
 
 static struct object_entry *parse_treeish_dataref(const char **p)
@@ -2953,12 +2954,14 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 	} else {	/* <sha1> */
 		if (get_sha1_hex(*p, sha1))
 			die("Invalid SHA1: %s", command_buf.buf);
-		e = find_object(sha1);
+		e = insert_object(sha1);
+		if (!e->idx.offset)
+			resolve_sha1_object(e);
 		*p += 40;
 	}
 
-	while (!e || e->type != OBJ_TREE)
-		e = dereference(e, sha1);
+	while (e->type != OBJ_TREE)
+		e = dereference(e);
 	return e;
 }
 
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/8] fast-import: cache oe more often
  2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
                   ` (7 preceding siblings ...)
  2011-09-19  1:27 ` [PATCH 8/8] fast-import: cache objects while dereferencing Dmitry Ivankov
@ 2011-09-20  4:02 ` Junio C Hamano
  2011-09-20  4:26   ` Jonathan Nieder
  8 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2011-09-20  4:02 UTC (permalink / raw)
  To: Dmitry Ivankov
  Cc: git, Jonathan Nieder, Shawn O. Pearce, David Barr,
	Sverre Rabbelier

Dmitry Ivankov <divanorama@gmail.com> writes:

> fast-import keeps a struct object_entry for each object written to
> it's pack. This is to keep type, pack-coordinates and delta_depth.
> struct object_entry is also used to cache this metadata for objects
> that exist outside fast-import's pack ('old' objects).
> struct object_entry has a small fixed size and thus it should be
> reasonable to cache any 'old' object metadata retrieval to save the
> disk i/o.
>
> Also it is a step toward making fast-import identify objects via
> struct object_entry rather than sha1. One pointer takes less than
> 20 bytes, it'll be later possible to have references to objects
> that don't yet have sha1 computed (fast-import with threads future).

I gave the series a cursory look, and the patches all looked like a good
and straight forward rewrites.  Provided if it is indeed a good idea
overall to stuff more objects in-core, that is.

Hopefully people more involved in fast-import can review and ack after the
pre-release feature freeze.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/8] fast-import: cache oe more often
  2011-09-20  4:02 ` [PATCH 0/8] fast-import: cache oe more often Junio C Hamano
@ 2011-09-20  4:26   ` Jonathan Nieder
  2011-09-20  7:17     ` Dmitry Ivankov
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Nieder @ 2011-09-20  4:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Dmitry Ivankov, git, Shawn O. Pearce, David Barr,
	Sverre Rabbelier

Junio C Hamano wrote:

> I gave the series a cursory look, and the patches all looked like a good
> and straight forward rewrites.  Provided if it is indeed a good idea
> overall to stuff more objects in-core, that is.

Right, that's exactly the question I had.  When and why is it a good
idea to stuff more objects in-core (or when might it be a bad idea,
for that matter)?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/8] fast-import: cache oe more often
  2011-09-20  4:26   ` Jonathan Nieder
@ 2011-09-20  7:17     ` Dmitry Ivankov
  2011-09-20 14:39       ` Jonathan Nieder
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Ivankov @ 2011-09-20  7:17 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Junio C Hamano, git, Shawn O. Pearce, David Barr,
	Sverre Rabbelier

On Tue, Sep 20, 2011 at 10:26 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Junio C Hamano wrote:
>
>> I gave the series a cursory look, and the patches all looked like a good
>> and straight forward rewrites.  Provided if it is indeed a good idea
>> overall to stuff more objects in-core, that is.
>
> Right, that's exactly the question I had.  When and why is it a good
> idea to stuff more objects in-core (or when might it be a bad idea,
> for that matter)?

The next step would be to replace sha1 with struct object_entry* in fast-import.
So it'll be in struct tree_entry (twice, for each of versions[2]),
branch, tag, hash_list (used to store merge from lists), last_object.
Then some fields will be deleted as they can be accessed from
object_entry:
last_object->depth
last_object->offset
tree_content->delta_depth
branch,tag->pack_id

And it all even slightly decreased memory consumption (checked some
time ago, but think it's still true). Probably because of tree nodes
having NULL instead of null_sha1 and then ptr instead of sha1; and
maybe because for huge imports each object is from our pack so for a
sha1 there always is object_entry anyway.

In short, if there is nothing bad with this patchset, it'll be
absolutely natural one after switch to oe instead of sha1, but it's
put before to split the big series. And of course this part may have a
small speedup of it's own. If it's not too good to be accepted on it's
own, I'll just include it into future series depending on it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/8] fast-import: cache oe more often
  2011-09-20  7:17     ` Dmitry Ivankov
@ 2011-09-20 14:39       ` Jonathan Nieder
  0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Nieder @ 2011-09-20 14:39 UTC (permalink / raw)
  To: Dmitry Ivankov
  Cc: Junio C Hamano, git, Shawn O. Pearce, David Barr,
	Sverre Rabbelier

Dmitry Ivankov wrote:

> The next step would be to replace sha1 with struct object_entry* in fast-import.
> So it'll be in struct tree_entry (twice, for each of versions[2]),
> branch, tag, hash_list (used to store merge from lists), last_object.
> Then some fields will be deleted as they can be accessed from
> object_entry:
> last_object->depth
> last_object->offset
> tree_content->delta_depth
> branch,tag->pack_id
> 
> And it all even slightly decreased memory consumption (checked some
> time ago, but think it's still true).

Yes, that sounds interesting, so:

[...]
> In short, if there is nothing bad with this patchset, it'll be
> absolutely natural one after switch to oe instead of sha1, but it's
> put before to split the big series. And of course this part may have a
> small speedup of it's own. If it's not too good to be accepted on it's
> own, I'll just include it into future series depending on it.

It would be indeed be more natural to review a single series that
combines this preparation with the change it prepares for.  (And the
change descriptions should explain on their own why they are
individually justified or what project they are contributing towards.)

My question was actually about this last point you made in the
second-to-last sentence: have you measured the speedup produced by the
patches you already sent?  I didn't think carefully about it, but my
first thought was that it might slow things down as the internal hash
tables (which still seem to be fixed-size in mainline git) start to
fill up.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-09-20 14:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-19  1:27 [PATCH 0/8] fast-import: cache oe more often Dmitry Ivankov
2011-09-19  1:27 ` [PATCH 1/8] fast-import: cache oe in file_change_m Dmitry Ivankov
2011-09-19  1:27 ` [PATCH 2/8] fast-import: cache oe in parse_new_tag Dmitry Ivankov
2011-09-19  1:27 ` [PATCH 3/8] fast-import: cache oe in note_change_n Dmitry Ivankov
2011-09-19  1:27 ` [PATCH 4/8] fast-import: extract common sha1_file access functions Dmitry Ivankov
2011-09-19  1:27 ` [PATCH 5/8] fast-import: tiny optimization in read_marks Dmitry Ivankov
2011-09-19  1:27 ` [PATCH 6/8] fast-import: cache oe in load_tree Dmitry Ivankov
2011-09-19  1:27 ` [PATCH 7/8] fast-import: cache oe in cat_blob Dmitry Ivankov
2011-09-19  1:27 ` [PATCH 8/8] fast-import: cache objects while dereferencing Dmitry Ivankov
2011-09-20  4:02 ` [PATCH 0/8] fast-import: cache oe more often Junio C Hamano
2011-09-20  4:26   ` Jonathan Nieder
2011-09-20  7:17     ` Dmitry Ivankov
2011-09-20 14:39       ` Jonathan Nieder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).