Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2)

git.vger.kernel.org archive mirror

* Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2)
@ 2007-06-04  0:51 Johan Herland
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
                   ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-04  0:51 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

(In response to the earlier thread on the git 'notes' feature/idea: "[PATCH 
00/15] git-note: A mechanism for providing free-form after-the-fact 
annotations on commits")

Ok, this is getting too big for me to ship all in one patch series, so I'm 
gonna have to split it up. Right now I'm thinking three patch series, but 
as I'm not nearly finished yet, that may change.

The three patch series I'm talking about are:

1. Refactoring the git tag object. This lays down part of the groundwork 
needed to support the second incarnation of git 'notes'. This is all ready 
to go and should follow shortly after this mail.

2. Introducing soft references (softrefs). Softrefs is the general mechanism 
behind the "reverse mapping" needed to support 'notes', as discussed in the 
previous thread. I'm still working on this patch series, and it will 
hopefully be ready soon-ish.

3. Reimplementing git 'notes' on top of (1) and (2). This should be fairly 
easy (once (2) is done), as it's mostly a matter of building a simple 
porcelain on top of (1) and (2).

Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* [PATCH 0/6] Refactor the tag object
  2007-06-04  0:51 Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Johan Herland
@ 2007-06-04  0:51 ` Johan Herland
  2007-06-04  0:52   ` [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header Johan Herland
                     ` (7 more replies)
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
  2007-06-09 22:57 ` Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Steven Grimm
  2 siblings, 8 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-04  0:51 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

This patch series implements part of the ground work for the 'notes'
feature discussed earlier in the thread "[PATCH 00/15] git-note: A
mechanism for providing free-form after-the-fact annotations on commits".

The following patches refactor the tag object by:
1. Making the "tag" header optional
2. Introducing a new optional "keywords" header
3. Making the "tagger" header mandatory as far as possible
4. Doing better and more thorough verification of tag objects

Unfortunately the first patch in the series is bigger than I would have
liked, but I couldn't find an easy way to split it up.

Here's the shortlog:

Johan Herland (6):
      Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header
      git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string
      git-fsck: Do thorough verification of tag objects.
      Documentation/git-mktag: Document the changes in tag object structure
      git-mktag tests: Fix and expand the mktag tests according to the new tag object structure
      Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object

 Documentation/git-mktag.txt |   42 +++++--
 builtin-fsck.c              |   35 ++++++
 builtin-log.c               |    2 +-
 mktag.c                     |  148 +++++-------------------
 t/t3800-mktag.sh            |  204 ++++++++++++++++++++++++++++++---
 tag.c                       |  266 +++++++++++++++++++++++++++++++++++--------
 tag.h                       |    4 +-
 7 files changed, 507 insertions(+), 194 deletions(-)


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
@ 2007-06-04  0:52   ` Johan Herland
  2007-06-04  6:08     ` Matthias Lederhofer
  2007-06-04  0:53   ` [PATCH 2/6] git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string Johan Herland
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-04  0:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano


In order to support ref-less tag objects (aka. 'notes'), we want to make
some changes to the tag object structure. The new structure implemented
in this patch is backward-compatible in that all existing tag
objects (valid in the old implementation) will remain valid in the new
implementation. The following changes are made:

1. Make the "tag" header optional. The "tag" header contains the tag name,
   which is optional for 'notes'. The new semantics for the "tag" header
   are as follows: The tag header _must_ be given for signed tags (this
   is already enforced by git-tag.sh). When the tag header is not given,
   its value defaults to the empty string.

2. Introduce a new optional "keywords" header. The "keywords" header is a
   comma-separated list of free-form values. However, two values
   have special meaning: "tag" and "note". When the "keywords" header is
   missing, its default value is set to "tag" if a "tag" header is
   present; else the default "keywords" value is set to "note". The
   "keywords" header is meant to be used by porcelains for classifying
   different types of tag objects. This classification may then be used to
   filter tag objects in the presentation layer (e.g. by implementing
   extra filter options to --decorate, etc.)

3. Make the "tagger" header mandatory. This header has long been
   de facto mandatory, in that it is automatically generated
   by git-tag.sh. This patch verifies the existence of the header when
   creating tag objects. However, since there exist old tags without
   the "tagger" header, the verification is not done when parsing
   tag objects.

4. Consolidate the parsing and verification of tag objects. Currently
   parsing is done in tag.c:parse_tag_buffer(), and verification is done
   with a very similar piece of code in mktag.c:verify_tag(). This patch
   unifies the parsing and verification of tag objects in a new function
   parse_and_verify_tag_buffer() which is then called from the other
   places.

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |  148 ++++++++---------------------------
 tag.c   |  266 +++++++++++++++++++++++++++++++++++++++++++++++++++------------
 tag.h   |    4 +-
 3 files changed, 251 insertions(+), 167 deletions(-)

diff --git a/mktag.c b/mktag.c
index 9310111..cde6036 100644
--- a/mktag.c
+++ b/mktag.c
@@ -2,128 +2,42 @@
 #include "tag.h"
 
 /*
- * A signature file has a very simple fixed format: four lines
- * of "object <sha1>" + "type <typename>" + "tag <tagname>" +
+ * Tag object data has the following format: two mandatory lines of
+ * "object <sha1>" + "type <typename>", plus two optional lines of
+ * "tag <tagname>" + "keywords <keywords>", plus a mandatory line of
  * "tagger <committer>", followed by a blank line, a free-form tag
- * message and a signature block that git itself doesn't care about,
- * but that can be verified with gpg or similar.
+ * message and an optional signature block that git itself doesn't
+ * care about, but that can be verified with gpg or similar.
  *
- * The first three lines are guaranteed to be at least 63 bytes:
- * "object <sha1>\n" is 48 bytes, "type tag\n" at 9 bytes is the
- * shortest possible type-line, and "tag .\n" at 6 bytes is the
- * shortest single-character-tag line. 
+ * <sha1> represents the object pointed to by this tag, <typename> is
+ * the type of the object pointed to ("tag", "blob", "tree" or "commit"),
+ * <tagname> is the name of this tag object (and must correspond to the
+ * name of the corresponding ref (if any) in '.git/refs/'). <keywords> is
+ * a comma-separated list of keywords associated with this tag object, and
+ * <committer> holds the "name <email>" of the tag creator and timestamp
+ * of when the tag object was created (analogous to "committer" in commit
+ * objects).
  *
- * We also artificially limit the size of the full object to 8kB.
- * Just because I'm a lazy bastard, and if you can't fit a signature
- * in that size, you're doing something wrong.
- */
-
-/* Some random size */
-#define MAXSIZE (8192)
-
-/*
- * We refuse to tag something we can't verify. Just because.
+ * The first two lines are guaranteed to be at least 57 bytes:
+ * "object <sha1>\n" is 48 bytes, and "type tag\n" at 9 bytes is
+ * the shortest possible "type" line. The tagger line is at least
+ * "tagger \n" (8 bytes), and a blank line is also needed (1 byte).
+ * Therefore a tag object _must_ have >= 66 bytes.
+ *
+ * If "tag <tagname>" is omitted, <tagname> defaults to the empty string.
+ * If "keywords <keywords>" is omitted, <keywords> defaults to "tag" if
+ * a <tagname> was given, "note" otherwise.
  */
-static int verify_object(unsigned char *sha1, const char *expected_type)
-{
-	int ret = -1;
-	enum object_type type;
-	unsigned long size;
-	void *buffer = read_sha1_file(sha1, &type, &size);
-
-	if (buffer) {
-		if (type == type_from_string(expected_type))
-			ret = check_sha1_signature(sha1, buffer, size, expected_type);
-		free(buffer);
-	}
-	return ret;
-}
-
-#ifdef NO_C99_FORMAT
-#define PD_FMT "%d"
-#else
-#define PD_FMT "%td"
-#endif
-
-static int verify_tag(char *buffer, unsigned long size)
-{
-	int typelen;
-	char type[20];
-	unsigned char sha1[20];
-	const char *object, *type_line, *tag_line, *tagger_line;
-
-	if (size < 64)
-		return error("wanna fool me ? you obviously got the size wrong !");
-
-	buffer[size] = 0;
-
-	/* Verify object line */
-	object = buffer;
-	if (memcmp(object, "object ", 7))
-		return error("char%d: does not start with \"object \"", 0);
-
-	if (get_sha1_hex(object + 7, sha1))
-		return error("char%d: could not get SHA1 hash", 7);
-
-	/* Verify type line */
-	type_line = object + 48;
-	if (memcmp(type_line - 1, "\ntype ", 6))
-		return error("char%d: could not find \"\\ntype \"", 47);
-
-	/* Verify tag-line */
-	tag_line = strchr(type_line, '\n');
-	if (!tag_line)
-		return error("char" PD_FMT ": could not find next \"\\n\"", type_line - buffer);
-	tag_line++;
-	if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n')
-		return error("char" PD_FMT ": no \"tag \" found", tag_line - buffer);
-
-	/* Get the actual type */
-	typelen = tag_line - type_line - strlen("type \n");
-	if (typelen >= sizeof(type))
-		return error("char" PD_FMT ": type too long", type_line+5 - buffer);
-
-	memcpy(type, type_line+5, typelen);
-	type[typelen] = 0;
-
-	/* Verify that the object matches */
-	if (verify_object(sha1, type))
-		return error("char%d: could not verify object %s", 7, sha1_to_hex(sha1));
-
-	/* Verify the tag-name: we don't allow control characters or spaces in it */
-	tag_line += 4;
-	for (;;) {
-		unsigned char c = *tag_line++;
-		if (c == '\n')
-			break;
-		if (c > ' ')
-			continue;
-		return error("char" PD_FMT ": could not verify tag name", tag_line - buffer);
-	}
-
-	/* Verify the tagger line */
-	tagger_line = tag_line;
-
-	if (memcmp(tagger_line, "tagger", 6) || (tagger_line[6] == '\n'))
-		return error("char" PD_FMT ": could not find \"tagger\"", tagger_line - buffer);
-
-	/* TODO: check for committer info + blank line? */
-	/* Also, the minimum length is probably + "tagger .", or 63+8=71 */
-
-	/* The actual stuff afterwards we don't care about.. */
-	return 0;
-}
-
-#undef PD_FMT
 
 int main(int argc, char **argv)
 {
 	unsigned long size = 4096;
 	char *buffer = xmalloc(size);
+	struct tag result_tag = { { 0 } }; /* object.parsed must start cleared */
 	unsigned char result_sha1[20];
 
 	if (argc != 1)
-		usage("git-mktag < signaturefile");
+		usage("git-mktag < tag_data_file");
 
 	setup_git_directory();
 
@@ -132,16 +46,18 @@ int main(int argc, char **argv)
 		die("could not read from stdin");
 	}
 
-	/* Verify it for some basic sanity: it needs to start with
-	   "object <sha1>\ntype\ntagger " */
-	if (verify_tag(buffer, size) < 0)
-		die("invalid tag signature file");
+	/* Verify tag object data */
+	if (parse_and_verify_tag_buffer(&result_tag, buffer, size, 1)) {
+		free(buffer);
+		die("invalid tag data file");
+	}
 
-	if (write_sha1_file(buffer, size, tag_type, result_sha1) < 0)
+	if (write_sha1_file(buffer, size, tag_type, result_sha1) < 0) {
+		free(buffer);
 		die("unable to write tag file");
+	}
 
 	free(buffer);
-
 	printf("%s\n", sha1_to_hex(result_sha1));
 	return 0;
 }
diff --git a/tag.c b/tag.c
index bbacd59..9c95e0b 100644
--- a/tag.c
+++ b/tag.c
@@ -33,65 +33,231 @@ struct tag *lookup_tag(const unsigned char *sha1)
         return (struct tag *) obj;
 }
 
-int parse_tag_buffer(struct tag *item, void *data, unsigned long size)
+/*
+ * We refuse to tag something we can't verify. Just because.
+ */
+static int verify_object(unsigned char *sha1, const char *expected_type)
+{
+	int ret = -1;
+	enum object_type type;
+	unsigned long size;
+	void *buffer = read_sha1_file(sha1, &type, &size);
+
+	if (buffer) {
+		if (type == type_from_string(expected_type))
+			ret = check_sha1_signature(sha1, buffer, size, expected_type);
+		free(buffer);
+	}
+	return ret;
+}
+
+/*
+ * Perform parsing and verification of tag object data.
+ *
+ * The 'item' parameter may be set to NULL if only verification is desired.
+ */
+int parse_and_verify_tag_buffer(struct tag *item, const char *data, const unsigned long size, int thorough_verify)
 {
-	int typelen, taglen;
+#ifdef NO_C99_FORMAT
+#define PD_FMT "%d"
+#else
+#define PD_FMT "%td"
+#endif
+
 	unsigned char sha1[20];
-	const char *type_line, *tag_line, *sig_line;
 	char type[20];
+	const char   *type_line, *tag_line, *keywords_line, *tagger_line;
+	unsigned long type_len,   tag_len,   keywords_len,   tagger_len;
+	const char *header_end, *end = data + size;
 
-        if (item->object.parsed)
-                return 0;
-        item->object.parsed = 1;
-
-	if (size < 64)
-		return -1;
-	if (memcmp("object ", data, 7) || get_sha1_hex((char *) data + 7, sha1))
-		return -1;
-
-	type_line = (char *) data + 48;
-	if (memcmp("\ntype ", type_line-1, 6))
-		return -1;
-
-	tag_line = strchr(type_line, '\n');
-	if (!tag_line || memcmp("tag ", ++tag_line, 4))
-		return -1;
-
-	sig_line = strchr(tag_line, '\n');
-	if (!sig_line)
-		return -1;
-	sig_line++;
-
-	typelen = tag_line - type_line - strlen("type \n");
-	if (typelen >= 20)
-		return -1;
-	memcpy(type, type_line + 5, typelen);
-	type[typelen] = '\0';
-	taglen = sig_line - tag_line - strlen("tag \n");
-	item->tag = xmalloc(taglen + 1);
-	memcpy(item->tag, tag_line + 4, taglen);
-	item->tag[taglen] = '\0';
-
-	if (!strcmp(type, blob_type)) {
-		item->tagged = &lookup_blob(sha1)->object;
-	} else if (!strcmp(type, tree_type)) {
-		item->tagged = &lookup_tree(sha1)->object;
-	} else if (!strcmp(type, commit_type)) {
-		item->tagged = &lookup_commit(sha1)->object;
-	} else if (!strcmp(type, tag_type)) {
-		item->tagged = &lookup_tag(sha1)->object;
-	} else {
-		error("Unknown type %s", type);
-		item->tagged = NULL;
+	if (item) {
+		if (item->object.parsed)
+			return 0;
+		item->object.parsed = 1;
 	}
 
-	if (item->tagged && track_object_refs) {
-		struct object_refs *refs = alloc_object_refs(1);
-		refs->ref[0] = item->tagged;
-		set_object_refs(&item->object, refs);
+	if (size < 66)
+		return error("failed preliminary size check");
+
+	/* Verify mandatory object line */
+	if (memcmp(data, "object ", 7))
+		return error("char%d: does not start with \"object \"", 0);
+
+	if (get_sha1_hex(data + 7, sha1))
+		return error("char%d: could not get SHA1 hash", 7);
+
+	/* Verify mandatory type line */
+	type_line = data + 48;
+	if (memcmp(type_line - 1, "\ntype ", 6))
+		return error("char%d: could not find \"\\ntype \"", 47);
+
+	/* Verify optional tag line */
+	tag_line = memchr(type_line, '\n', end - type_line);
+	if (!tag_line++)
+		return error("char" PD_FMT ": could not find \"\\n\" after \"type\"", type_line - data);
+	if (end - tag_line < 4)
+		return error("char" PD_FMT ": premature end of data", tag_line - data);
+	if (memcmp("tag ", tag_line, 4))
+		keywords_line = tag_line; /* no tag name given */
+	else {                            /* tag name given */
+		keywords_line = memchr(tag_line, '\n', end - tag_line);
+		if (!keywords_line++)
+			return error("char" PD_FMT ": could not find \"\\n\" after \"tag\"", tag_line - data);
+	}
+
+	/* Verify optional keywords line */
+	if (end - keywords_line < 9)
+		return error("char" PD_FMT ": premature end of data", keywords_line - data);
+	if (memcmp("keywords ", keywords_line, 9))
+		tagger_line = keywords_line; /* no keywords given */
+	else {                               /* keywords given */
+		tagger_line = memchr(keywords_line, '\n', end - keywords_line);
+		if (!tagger_line++)
+			return error("char" PD_FMT ": could not find \"\\n\" after \"keywords\"", keywords_line - data);
+	}
+
+	if (thorough_verify) {
+		/*
+		 * Verify mandatory tagger line, but only when we're checking
+		 * thoroughly, i.e. on inserting a new tag, and on fsck.
+		 * There are existing tag objects without a tagger line (most
+		 * notably the "v0.99" tag in the main git repo), and we don't
+		 * want to fail parsing on these.
+		 */
+		if (end - tagger_line < 7)
+			return error("char" PD_FMT ": premature end of data", tagger_line - data);
+		if (memcmp("tagger ", tagger_line, 7))
+			return error("char" PD_FMT ": could not find \"tagger \"", tagger_line - data);
+		header_end = memchr(tagger_line, '\n', end - tagger_line);
+		if (!header_end++)
+			return error("char" PD_FMT ": could not find \"\\n\" after \"tagger\"", tagger_line - data);
+		if (end - header_end < 1)
+			return error("char" PD_FMT ": premature end of data", header_end - data);
+		if (*header_end != '\n') /* header must end with "\n\n" */
+			return error("char" PD_FMT ": could not find blank line after header section", header_end - data);
+	}
+	else {
+		/* Treat tagger line as optional */
+		if (end - tagger_line >= 7 && !memcmp("tagger ", tagger_line, 7)) {
+			/* Found tagger line */
+			header_end = memchr(tagger_line, '\n', end - tagger_line);
+			if (!header_end++)
+				return error("char" PD_FMT ": could not find \"\\n\" after \"tagger\"", tagger_line - data);
+		}
+		else /* No tagger line */
+			header_end = tagger_line;
 	}
 
+	if (end - header_end < 1)
+		return error("char" PD_FMT ": premature end of data", header_end - data);
+	if (*header_end != '\n') /* header must end with "\n\n" */
+		return error("char" PD_FMT ": could not find blank line after header section", header_end - data);
+
+	/* Calculate lengths of header fields */
+	type_len      = tag_line      == type_line ? 0 :     /* 0 if not given, > 0 if given */
+			(tag_line      - type_line)     - strlen("type \n");
+	tag_len       = keywords_line == tag_line ? 0 :      /* 0 if not given, > 0 if given */
+			(keywords_line - tag_line)      - strlen("tag \n");
+	keywords_len  = tagger_line   == keywords_line ? 0 : /* 0 if not given, > 0 if given */
+			(tagger_line   - keywords_line) - strlen("keywords \n");
+	tagger_len    = header_end    == tagger_line ? 0 :   /* 0 if not given, > 0 if given */
+			(header_end    - tagger_line)   - strlen("tagger \n");
+
+	/* Get the actual type */
+	if (type_len >= sizeof(type))
+		return error("char" PD_FMT ": type too long", (type_line + 5) - data);
+	memcpy(type, type_line + 5, type_len);
+	type[type_len] = '\0';
+
+	if (thorough_verify) {
+		/* Verify that the object matches */
+		if (verify_object(sha1, type))
+			return error("char%d: could not verify object %s", 7, sha1_to_hex(sha1));
+
+		/* Verify the tag name: we don't allow control characters or spaces in it */
+		if (tag_len > 0) { /* tag name was given */
+			const char *p = tag_line + 4; /* skip past "tag "; leave tag_line intact */
+			for (;;) {
+				unsigned char c = *p++;
+				if (c == '\n')
+					break;
+				if (c > ' ' && c != 0x7f)
+					continue;
+				return error("char" PD_FMT ": could not verify tag name", p - data);
+			}
+		}
+
+		/* Verify the keywords line: we don't allow control characters or spaces in it, or two subsequent commas */
+		if (keywords_len > 0) { /* keywords line was given */
+			const char *p = keywords_line + 9; /* skip past "keywords "; leave keywords_line intact */
+			for (;;) {
+				unsigned char c = *p++;
+				if (c == '\n')
+					break;
+				if (c == ',' && *p == ',')
+					return error("char" PD_FMT ": found empty keyword", p - data);
+				if (c > ' ' && c != 0x7f)
+					continue;
+				return error("char" PD_FMT ": could not verify keywords", p - data);
+			}
+		}
+
+		/* Verify the tagger line */
+		/* TODO: check for committer/tagger info */
+
+		/* The actual stuff afterwards we don't care about.. */
+	}
+
+	if (item) { /* Store parsed information into item */
+		if (tag_len > 0) { /* optional tag name was given */
+			item->tag = xmalloc(tag_len + 1);
+			memcpy(item->tag, tag_line + 4, tag_len);
+			item->tag[tag_len] = '\0';
+		}
+		else { /* optional tag name not given */
+			item->tag = xmalloc(1);
+			item->tag[0] = '\0';
+		}
+
+		if (keywords_len > 0) { /* optional keywords string was given */
+			item->keywords = xmalloc(keywords_len + 1);
+			memcpy(item->keywords, keywords_line + 9, keywords_len);
+			item->keywords[keywords_len] = '\0';
+		}
+		else { /* optional keywords string not given. Set default keywords */
+			/* if tag name is set, use "tag"; otherwise use "note" */
+			const char *default_kw = tag_len > 0 ? "tag" : "note";
+			item->keywords = xmalloc(strlen(default_kw) + 1);
+			memcpy(item->keywords, default_kw, strlen(default_kw) + 1);
+		}
+
+		if (!strcmp(type, blob_type)) {
+			item->tagged = &lookup_blob(sha1)->object;
+		} else if (!strcmp(type, tree_type)) {
+			item->tagged = &lookup_tree(sha1)->object;
+		} else if (!strcmp(type, commit_type)) {
+			item->tagged = &lookup_commit(sha1)->object;
+		} else if (!strcmp(type, tag_type)) {
+			item->tagged = &lookup_tag(sha1)->object;
+		} else {
+			error("Unknown type %s", type);
+			item->tagged = NULL;
+		}
+
+		if (item->tagged && track_object_refs) {
+			struct object_refs *refs = alloc_object_refs(1);
+			refs->ref[0] = item->tagged;
+			set_object_refs(&item->object, refs);
+		}
+	}
 	return 0;
+
+#undef PD_FMT
+}
+
+int parse_tag_buffer(struct tag *item, void *data, unsigned long size)
+{
+	return parse_and_verify_tag_buffer(item, (const char *) data, size, 0);
 }
 
 int parse_tag(struct tag *item)
diff --git a/tag.h b/tag.h
index 7a0cb00..6853594 100644
--- a/tag.h
+++ b/tag.h
@@ -8,11 +8,13 @@ extern const char *tag_type;
 struct tag {
 	struct object object;
 	struct object *tagged;
-	char *tag;
+	char *tag;       /* optional, may be empty ("") */
+	char *keywords;  /* optional, defaults to tag ? "tag" : "note" */
 	char *signature; /* not actually implemented */
 };
 
 extern struct tag *lookup_tag(const unsigned char *sha1);
+extern int parse_and_verify_tag_buffer(struct tag *item, const char *data, const unsigned long size, int thorough_verify);
 extern int parse_tag_buffer(struct tag *item, void *data, unsigned long size);
 extern int parse_tag(struct tag *item);
 extern struct object *deref_tag(struct object *, const char *, int);
-- 
1.5.2


* [PATCH 2/6] git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
  2007-06-04  0:52   ` [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header Johan Herland
@ 2007-06-04  0:53   ` Johan Herland
  2007-06-04  0:53   ` [PATCH 3/6] git-fsck: Do thorough verification of tag objects Johan Herland
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-04  0:53 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

This is a consequence of making the "tag" header in tag objects optional.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-log.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 3744712..1a0f111 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -181,7 +181,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			printf("%stag %s%s\n\n",
 					diff_get_color(rev.diffopt.color_diff,
 						DIFF_COMMIT),
-					t->tag,
+					*(t->tag) ? t->tag : name,
 					diff_get_color(rev.diffopt.color_diff,
 						DIFF_RESET));
 			ret = show_object(o->sha1, 1);
-- 
1.5.2


* [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
  2007-06-04  0:52   ` [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header Johan Herland
  2007-06-04  0:53   ` [PATCH 2/6] git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string Johan Herland
@ 2007-06-04  0:53   ` Johan Herland
  2007-06-04  5:56     ` Matthias Lederhofer
  2007-06-04  0:54   ` [PATCH 4/6] Documentation/git-mktag: Document the changes in tag object structure Johan Herland
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-04  0:53 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-fsck.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index cbbcaf0..a8914ae 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -344,6 +344,20 @@ static int fsck_commit(struct commit *commit)
 static int fsck_tag(struct tag *tag)
 {
 	struct object *tagged = tag->tagged;
+	enum object_type type;
+	unsigned long size;
+	char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
+	if (!data)
+		return error("Could not read tag %s", sha1_to_hex(tag->object.sha1));
+	if (type != OBJ_TAG) {
+		free(data);
+		return error("Internal error: Tag %s not a tag", sha1_to_hex(tag->object.sha1));
+	}
+	if (parse_and_verify_tag_buffer(0, data, size, 1)) { /* Thoroughly verify tag object */
+		free(data);
+		return error("Tag %s failed thorough tag object verification", sha1_to_hex(tag->object.sha1));
+	}
+	free(data);
 
 	if (!tagged) {
 		return objerror(&tag->object, "could not load tagged object");
-- 
1.5.2


* [PATCH 4/6] Documentation/git-mktag: Document the changes in tag object structure
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
                     ` (2 preceding siblings ...)
  2007-06-04  0:53   ` [PATCH 3/6] git-fsck: Do thorough verification of tag objects Johan Herland
@ 2007-06-04  0:54   ` Johan Herland
  2007-06-04  0:54   ` [PATCH 5/6] git-mktag tests: Fix and expand the mktag tests according to the new " Johan Herland
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-04  0:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The new structure of tag objects is documented.

Also, some much-needed cleanup is done; e.g. the paragraph on the 8kB
limit is removed, since this limit was removed ages ago.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-mktag.txt |   42 ++++++++++++++++++++++++++++++------------
 1 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt
index 2860a3d..411105d 100644
--- a/Documentation/git-mktag.txt
+++ b/Documentation/git-mktag.txt
@@ -8,40 +8,58 @@ git-mktag - Creates a tag object
 
 SYNOPSIS
 --------
-'git-mktag' < signature_file
+[verse]
+'git-mktag' < tag_data_file
+
 
 DESCRIPTION
 -----------
-Reads a tag contents on standard input and creates a tag object
+Reads tag object data on standard input and creates a tag object
 that can also be used to sign other objects.
 
 The output is the new tag's <object> identifier.
 
-Tag Format
+
+DISCUSSION
 ----------
-A tag signature file has a very simple fixed format: three lines of
+Tag object data has the following format:
 
+[verse]
   object <sha1>
   type <typename>
-  tag <tagname>
+  tag <tagname>               (optional)
+  keywords <keywords>         (optional)
+  tagger <committer>
+
+followed by a blank line and a free-form message and an optional signature
+that git itself doesn't care about, but that may be verified with gpg or
+similar.
 
-followed by some 'optional' free-form signature that git itself
-doesn't care about, but that can be verified with gpg or similar.
+In the above listing, `<sha1>` represents the object pointed to by this tag,
+`<typename>` is the type of the object pointed to ("tag", "blob", "tree" or
+"commit"), `<tagname>` is the name of this tag object (and must correspond
+to the name of the corresponding ref (if any) in `.git/refs/`). `<keywords>`
+is a comma-separated list of keywords associated with this tag object, and
+`<committer>` holds the "`name <email>`" of the tag creator and timestamp
+of when the tag object was created (analogous to "committer" in commit
+objects).
 
-The size of the full object is artificially limited to 8kB.  (Just
-because I'm a lazy bastard, and if you can't fit a signature in that
-size, you're doing something wrong)
+If "`tag <tagname>`" is omitted, <tagname> defaults to the empty string.
+If "`keywords <keywords>`" is omitted, <keywords> defaults to "`tag`" if
+a <tagname> was given, "`note`" otherwise.
 
 
 Author
 ------
 Written by Linus Torvalds <torvalds@osdl.org>
 
+
 Documentation
 --------------
-Documentation by David Greaves, Junio C Hamano and the git-list <git@vger.kernel.org>.
+Documentation by Johan Herland, David Greaves, Junio C Hamano and the
+git-list <git@vger.kernel.org>.
+
 
 GIT
 ---
 Part of the gitlink:git[7] suite
-
-- 
1.5.2


* [PATCH 5/6] git-mktag tests: Fix and expand the mktag tests according to the new tag object structure
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
                     ` (3 preceding siblings ...)
  2007-06-04  0:54   ` [PATCH 4/6] Documentation/git-mktag: Document the changes in tag object structure Johan Herland
@ 2007-06-04  0:54   ` Johan Herland
  2007-06-04  0:54   ` [PATCH 6/6] Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object Johan Herland
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-04  0:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The existing tests are updated to reflect the changes in the tag object.

Additionally some more tests are added to test the new "keywords" header,
and to test the more thorough verification routine.

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3800-mktag.sh |  204 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 190 insertions(+), 14 deletions(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 7c7e433..f6e3d10 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -34,7 +34,7 @@ too short for a tag
 EOF
 
 cat >expect.pat <<EOF
-^error: .*size wrong.*$
+^error: .* size .*$
 EOF
 
 check_verify_failure 'Tag object length check'
@@ -46,6 +46,8 @@ cat >tag.sig <<EOF
 xxxxxx 139e9b33986b1c2670fff52c5067603117b3e895
 type tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -61,6 +63,8 @@ cat >tag.sig <<EOF
 object zz9e9b33986b1c2670fff52c5067603117b3e895
 type tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -76,6 +80,8 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 xxxx tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -91,7 +97,7 @@ echo "object 779e9b33986b1c2670fff52c5067603117b3e895" >tag.sig
 printf "type tagsssssssssssssssssssssssssssssss" >>tag.sig
 
 cat >expect.pat <<EOF
-^error: char48: .*"[\]n"$
+^error: char48: .*"[\]n" after "type"$
 EOF
 
 check_verify_failure '"type" line eol check'
@@ -103,10 +109,12 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type tag
 xxx mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
-^error: char57: no "tag " found$
+^error: char57: .*$
 EOF
 
 check_verify_failure '"tag" line label check #1'
@@ -118,21 +126,27 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type taggggggggggggggggggggggggggggggg
 tag
+keywords foo
+tagger bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
-^error: char87: no "tag " found$
+^error: char87: .*$
 EOF
 
 check_verify_failure '"tag" line label check #2'
 
 ############################################################
-#  8. type line type-name length check
+#  8. type line type name length check
 
 cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type taggggggggggggggggggggggggggggggg
 tag mytag
+keywords foo
+tagger bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
@@ -148,6 +162,9 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type tagggg
 tag mytag
+keywords foo
+tagger bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
@@ -157,12 +174,15 @@ EOF
 check_verify_failure 'verify object (SHA1/type) check'
 
 ############################################################
-# 10. verify tag-name check
+# 10. verify tag name check
 
 cat >tag.sig <<EOF
 object $head
 type commit
 tag my	tag
+keywords foo
+tagger bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
@@ -172,56 +192,212 @@ EOF
 check_verify_failure 'verify tag-name check'
 
 ############################################################
-# 11. tagger line label check #1
+# 11. keywords line label check #1
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+xxxxxxxx foo
+tagger bar@baz.com
+
+EOF
+
+cat >expect.pat <<EOF
+^error: char70: .*$
+EOF
+
+check_verify_failure '"keywords" line label check #1'
+
+############################################################
+# 12. keywords line label check #2
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+keywords
+tagger bar@baz.com
+
+EOF
+
+cat >expect.pat <<EOF
+^error: char70: .*$
+EOF
+
+check_verify_failure '"keywords" line label check #2'
+
+############################################################
+# 13. keywords line check #1
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+keywords foo bar	baz
+tagger bar@baz.com
+
+EOF
+
+cat >expect.pat <<EOF
+^error: char83: .*$
+EOF
+
+check_verify_failure '"keywords" line check #1'
+
+############################################################
+# 14. keywords line check #2
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+keywords foo,bar	baz
+tagger bar@baz.com
+
+EOF
+
+cat >expect.pat <<EOF
+^error: char87: .*$
+EOF
+
+check_verify_failure '"keywords" line check #2'
+
+############################################################
+# 15. keywords line check #3
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+keywords foo,,bar
+tagger bar@baz.com
+
+EOF
+
+cat >expect.pat <<EOF
+^error: char83: .*$
+EOF
+
+check_verify_failure '"keywords" line check #3'
+
+############################################################
+# 16. tagger line label check #1
 
 cat >tag.sig <<EOF
 object $head
 type commit
 tag mytag
+
 EOF
 
 cat >expect.pat <<EOF
-^error: char70: could not find "tagger"$
+^error: char70: .*$
 EOF
 
 check_verify_failure '"tagger" line label check #1'
 
 ############################################################
-# 12. tagger line label check #2
+# 17. tagger line label check #2
 
 cat >tag.sig <<EOF
 object $head
 type commit
 tag mytag
-tagger
+xxxxxx bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
-^error: char70: could not find "tagger"$
+^error: char70: .*$
 EOF
 
 check_verify_failure '"tagger" line label check #2'
 
 ############################################################
-# 13. create valid tag
+# 18. tagger line label check #3
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+keywords foo
+tagger
+
+EOF
+
+cat >expect.pat <<EOF
+^error: char83: .*$
+EOF
+
+check_verify_failure '"tagger" line label check #3'
+
+############################################################
+# 19. create valid tag #1
 
 cat >tag.sig <<EOF
 object $head
 type commit
 tag mytag
 tagger another@example.com
+
 EOF
 
 test_expect_success \
-    'create valid tag' \
+    'create valid tag #1' \
     'git-mktag <tag.sig >.git/refs/tags/mytag 2>message'
 
 ############################################################
-# 14. check mytag
+# 20. check mytag
 
 test_expect_success \
     'check mytag' \
     'git-tag -l | grep mytag'
 
+############################################################
+# 21. create valid tag #2
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tagger another@example.com
+
+EOF
+
+test_expect_success \
+    'create valid tag #2' \
+    'git-mktag <tag.sig >.git/refs/tags/mytag 2>message'
+
+############################################################
+# 22. create valid tag #3
+
+cat >tag.sig <<EOF
+object $head
+type commit
+keywords foo,bar,baz,spam,spam,spam,spam,spam,spam,spam,spam
+tagger another@example.com
+
+EOF
+
+test_expect_success \
+    'create valid tag #3' \
+    'git-mktag <tag.sig >.git/refs/tags/mytag 2>message'
+
+############################################################
+# 23. create valid tag #4
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+keywords note
+tagger another@example.com
+
+EOF
+
+test_expect_success \
+    'create valid tag #4' \
+    'git-mktag <tag.sig >.git/refs/tags/mytag 2>message'
+
 
 test_done
-- 
1.5.2
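The new "keywords" tests (13 to 15 and 22) exercise a simple rule: no spaces or control characters, and no empty keyword between two consecutive commas. A minimal standalone sketch of that check follows; `check_keywords()` is a hypothetical name, and treating an empty value as invalid is an assumption mirroring test 12.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: validate a "keywords" header value of len bytes.
 * Returns -1 if acceptable, otherwise the 0-based offset of the first
 * offending character (an empty value is reported at offset 0).
 */
static long check_keywords(const char *kw, size_t len)
{
	size_t i;

	if (len == 0)
		return 0;                   /* "keywords" header without a value */
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)kw[i];
		if (c == ',' && i + 1 < len && kw[i + 1] == ',')
			return (long)i;     /* empty keyword between two commas */
		if (c <= ' ' || c == 0x7f)
			return (long)i;     /* space, tab or other control char */
	}
	return -1;
}
```

In these test objects the keywords value starts at byte 79 ("object <sha1>\n" is 48 bytes, then 12 for "type commit\n", 10 for "tag mytag\n" and 9 for "keywords "). Adding 79 to the offsets returned here (3, 7 and 3) gives char82, char86 and char82, the positions the test expectations use after the later bug-fix patch in this thread.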


* [PATCH 6/6] Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
                     ` (4 preceding siblings ...)
  2007-06-04  0:54   ` [PATCH 5/6] git-mktag tests: Fix and expand the mktag tests according to the new " Johan Herland
@ 2007-06-04  0:54   ` Johan Herland
  2007-06-04 20:32   ` [PATCH 0/6] Refactor the " Junio C Hamano
  2007-06-07 22:13   ` [PATCH] Fix bug in tag parsing when thorough verification was in effect Johan Herland
  7 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-04  0:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

On Monday 28 May 2007, Junio C Hamano wrote:
> However it would be a good
> idea to add logic to fsck to warn upon inconsistencies (perhaps
> by mistake) between refname and tag's true name.
>
> The check would say something like:
>
>   If an annotated (signed or unsigned) tag has a "tag"
>   line to give it the official $name, and if it is pointed
>   at by a ref, the refname must end with "/$name".
>   Otherwise we warn.
>
> Trivially, the above rule says that having v2.6.22 tag under
> refs/tags/v2.6.20 is a mistake we would want to be warned upon.

This patch adds the check described by Junio.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-fsck.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index a8914ae..379317e 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -515,6 +515,25 @@ static int fsck_handle_reflog(const char *logname, const unsigned char *sha1, in
 	return 0;
 }
 
+static void fsck_verify_ref_to_tag_object(const char *refname, struct object *obj)
+{
+	/* Verify that refname matches the name stored in obj's "tag" header */
+	struct tag *tagobj = (struct tag *) parse_object(obj->sha1);
+	size_t tagname_len = strlen(tagobj->tag);
+	size_t refname_len = strlen(refname);
+
+	if (!tagname_len) return; /* No tag name stored in tagobj. Nothing to do. */
+
+	if (tagname_len < refname_len &&
+	    !memcmp(tagobj->tag, refname + (refname_len - tagname_len), tagname_len) &&
+	    refname[(refname_len - tagname_len) - 1] == '/') {
+		/* OK: tag name is "$name", and refname ends with "/$name" */
+		return;
+	}
+	else
+		error("%s: Mismatch between tag ref and tag object's name %s", refname, tagobj->tag);
+}
+
 static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
 {
 	struct object *obj;
@@ -529,6 +548,8 @@ static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int f
 		/* We'll continue with the rest despite the error.. */
 		return 0;
 	}
+	if (obj->type == OBJ_TAG) /* ref to tag object */
+		fsck_verify_ref_to_tag_object(refname, obj);
 	default_refs++;
 	obj->used = 1;
 	mark_reachable(obj, REACHABLE);
-- 
1.5.2
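The rule Junio describes ("the refname must end with /$name") can be restated as a standalone predicate, independent of git's object parsing. This is a sketch with a hypothetical name; it takes the tag name directly rather than via parse_object(), and treating a NULL name like an omitted "tag" header is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if refname is consistent with the tag's
 * stored name (or there is no name to check), 0 on a mismatch.
 */
static int refname_matches_tag_name(const char *refname, const char *tag_name)
{
	size_t tag_len, ref_len;

	if (!tag_name || tag_name[0] == '\0')
		return 1;               /* unnamed tag or note: nothing to verify */
	tag_len = strlen(tag_name);
	ref_len = strlen(refname);
	if (tag_len >= ref_len)
		return 0;               /* no room for the '/' separator */
	return refname[ref_len - tag_len - 1] == '/' &&
	       memcmp(refname + ref_len - tag_len, tag_name, tag_len) == 0;
}
```

Checking the character before the suffix for '/' is what rejects "refs/tags/xmytag" for a tag named "mytag", which a plain suffix comparison would accept.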


* Re: [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-04  0:53   ` [PATCH 3/6] git-fsck: Do thorough verification of tag objects Johan Herland
@ 2007-06-04  5:56     ` Matthias Lederhofer
  2007-06-04  7:51       ` Johan Herland
  0 siblings, 1 reply; 52+ messages in thread
From: Matthias Lederhofer @ 2007-06-04  5:56 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

Johan Herland <johan@herland.net> wrote:
> diff --git a/builtin-fsck.c b/builtin-fsck.c
> index cbbcaf0..a8914ae 100644
> --- a/builtin-fsck.c
> +++ b/builtin-fsck.c
> @@ -344,6 +344,20 @@ static int fsck_commit(struct commit *commit)
>  static int fsck_tag(struct tag *tag)
>  {
>  	struct object *tagged = tag->tagged;
> +	enum object_type type;
> +	unsigned long size;
> +	char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
> +	if (!data)
> +		return error("Could not read tag %s", sha1_to_hex(tag->object.sha1));
> +	if (type != OBJ_TAG) {
> +		free(data);
> +		return error("Internal error: Tag %s not a tag", sha1_to_hex(tag->object.sha1));
> +	}
> +	if (parse_and_verify_tag_buffer(0, data, size, 1)) { /* Thoroughly verify tag object */
> +		free(data);
> +		return error("Tag %s failed thorough tag object verification", sha1_to_hex(tag->object.sha1));
> +	}
> +	free(data);
>  
>  	if (!tagged) {
>  		return objerror(&tag->object, "could not load tagged object");

The objerror() function prints the sha1 and object type, I think this
one should be used instead of error() here.


* Re: [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header
  2007-06-04  0:52   ` [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header Johan Herland
@ 2007-06-04  6:08     ` Matthias Lederhofer
  2007-06-04  7:30       ` Johan Herland
  0 siblings, 1 reply; 52+ messages in thread
From: Matthias Lederhofer @ 2007-06-04  6:08 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

Johan Herland <johan@herland.net> wrote:
> 1. Make the "tag" header optional. The "tag" header contains the tag name,
>    which is optional for 'notes'. The new semantics for the "tag" header
>    are as follows: The tag header _must_ be given for signed tags (this
>    is already enforced by git-tag.sh). When the tag header is not given,
>    its value defaults to the empty string.

Why must signed tags have a tag header?  Will notes optionally have a
tag header?


* Re: [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header
  2007-06-04  6:08     ` Matthias Lederhofer
@ 2007-06-04  7:30       ` Johan Herland
  0 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-04  7:30 UTC (permalink / raw)
  To: Matthias Lederhofer; +Cc: git

On Monday 04 June 2007, Matthias Lederhofer wrote:
> Johan Herland <johan@herland.net> wrote:
> > 1. Make the "tag" header optional. The "tag" header contains the tag name,
> >    which is optional for 'notes'. The new semantics for the "tag" header
> >    are as follows: The tag header _must_ be given for signed tags (this
> >    is already enforced by git-tag.sh). When the tag header is not given,
> >    its value defaults to the empty string.
> 
> Why must signed tags have a tag header?  Will notes optionally have a
> tag header?

The purpose of signing a tag is to cryptographically verify the thing 
pointed at by the tag. But you also want to protect the tag itself. In 
order to make it harder for someone to rename a signed tag (thereby opening 
the door to replacing it with a different - possibly signed - malicious 
tag), you want to include the tag name in the signed data. This allows us 
to verify that the tag ref (as stored in '.git/refs') is identical to the
tag name stored inside the signed object.

Yes, 'notes' will optionally have a "tag" header. When I originally designed  
notes, I didn't think anybody would want to name their notes, but Linus 
requested it, and there's no technical argument against it. Note that if 
you name your note, and put a ref to it (under '.git/refs'), there's 
technically no distinction between a tag object and a note object, except 
what you choose to put in the "keywords" header, of course.

Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-04  5:56     ` Matthias Lederhofer
@ 2007-06-04  7:51       ` Johan Herland
  2007-06-06  7:18         ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-04  7:51 UTC (permalink / raw)
  To: Matthias Lederhofer; +Cc: git

Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.

Signed-off-by: Johan Herland <johan@herland.net>
---

Matthias Lederhofer <matled@gmx.net> wrote:
> The objerror() function prints the sha1 and object type, I think this
> one should be used instead of error() here.

Of course, you're right. Like this, I hope:

 builtin-fsck.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index cbbcaf0..71a5fd5 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -344,6 +344,20 @@ static int fsck_commit(struct commit *commit)
 static int fsck_tag(struct tag *tag)
 {
 	struct object *tagged = tag->tagged;
+	enum object_type type;
+	unsigned long size;
+	char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
+	if (!data)
+		return objerror(&tag->object, "could not read tag");
+	if (type != OBJ_TAG) {
+		free(data);
+		return objerror(&tag->object, "not a tag (internal error)");
+	}
+	if (parse_and_verify_tag_buffer(0, data, size, 1)) { /* thoroughly verify tag object */
+		free(data);
+		return objerror(&tag->object, "failed thorough tag object verification");
+	}
+	free(data);
 
 	if (!tagged) {
 		return objerror(&tag->object, "could not load tagged object");
-- 
1.5.2


* Re: [PATCH 0/6] Refactor the tag object
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
                     ` (5 preceding siblings ...)
  2007-06-04  0:54   ` [PATCH 6/6] Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object Johan Herland
@ 2007-06-04 20:32   ` Junio C Hamano
  2007-06-07 22:13   ` [PATCH] Fix bug in tag parsing when thorough verification was in effect Johan Herland
  7 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2007-06-04 20:32 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

Johan, I gave only a cursory look at the series so far (that is,
I looked at your log message and had a very quick pass over the
code to see what the basic idea is and if it is sound, without
really reading/reviewing the code).  It all looks good.


* Re: [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-04  7:51       ` Johan Herland
@ 2007-06-06  7:18         ` Junio C Hamano
  2007-06-06  8:06           ` Johan Herland
  0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2007-06-06  7:18 UTC (permalink / raw)
  To: Johan Herland; +Cc: Matthias Lederhofer, git

Johan Herland <johan@herland.net> writes:

> Teach git-fsck to do the same kind of verification on tag objects that is
> already done by git-mktag.

The tagger field was introduced mid July 2005; any repository
with a tag object older than that would now get non-zero exit
from fsck.

This won't practically be a problem in newer repositories, but it
is somewhat annoying.  Perhaps do this only under the new -v
option to git-fsck, say "warning" not "error", and not exit with
non-zero because of this?


* [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-06  7:18         ` Junio C Hamano
@ 2007-06-06  8:06           ` Johan Herland
  2007-06-06  9:03             ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-06  8:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthias Lederhofer, git

Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.

Signed-off-by: Johan Herland <johan@herland.net>
---

On Wednesday 06 June 2007, Junio C Hamano wrote:
> The tagger field was introduced mid July 2005; any repository
> with a tag object older than that would now get non-zero exit
> from fsck.
> 
> This won't practically be a problem in newer repositories, but it
> is somewhat annoying.  Perhaps do this only under the new -v
> option to git-fsck, say "warning" not "error", and not exit with
> non-zero because of this?

Like this?

Or would you rather switch around the "verbose" and the
"parse_and_verify_tag_buffer()" (i.e. not even attempt the thorough
verification unless in verbose mode)?


Have fun!

...Johan

 builtin-fsck.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index bacae5d..fb9a8bb 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -359,11 +359,24 @@ static int fsck_commit(struct commit *commit)
 static int fsck_tag(struct tag *tag)
 {
 	struct object *tagged = tag->tagged;
+	enum object_type type;
+	unsigned long size;
+	char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
 
 	if (verbose)
 		fprintf(stderr, "Checking tag %s\n",
 			sha1_to_hex(tag->object.sha1));
 
+	if (!data)
+		return objerror(&tag->object, "could not read tag");
+	if (type != OBJ_TAG) {
+		free(data);
+		return objerror(&tag->object, "not a tag (internal error)");
+	}
+	if (parse_and_verify_tag_buffer(0, data, size, 1) && verbose)
+		objwarning(&tag->object, "failed thorough tag object verification");
+	free(data);
+
 	if (!tagged) {
 		return objerror(&tag->object, "could not load tagged object");
 	}
-- 
1.5.2



-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-06  8:06           ` Johan Herland
@ 2007-06-06  9:03             ` Junio C Hamano
  2007-06-06  9:21               ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2007-06-06  9:03 UTC (permalink / raw)
  To: Johan Herland; +Cc: Matthias Lederhofer, git

Johan Herland <johan@herland.net> writes:

>> This won't practically be a problem in newer repositories, but it
>> is somewhat annoying.  Perhaps do this only under the new -v
>> option to git-fsck, say "warning" not "error", and not exit with
>> non-zero because of this?
>
> Like this?
>
> Or would you rather switch around the "verbose" and the
> "parse_and_verify_tag_buffer()" (i.e. not even attempt the thorough
> verification unless in verbose mode)?

Actually I was thinking about doing something like this.

-	if (parse_and_verify_tag_buffer(0, data, size, 1) && verbose)
+	if (parse_and_verify_tag_buffer(0, data, size, verbose))


* Re: [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-06  9:03             ` Junio C Hamano
@ 2007-06-06  9:21               ` Junio C Hamano
  2007-06-06 10:26                 ` Johan Herland
  0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2007-06-06  9:21 UTC (permalink / raw)
  To: Johan Herland; +Cc: Matthias Lederhofer, git

Junio C Hamano <gitster@pobox.com> writes:

> Johan Herland <johan@herland.net> writes:
> ...
>> Or would you rather switch around the "verbose" and the
>> "parse_and_verify_tag_buffer()" (i.e. not even attempt the thorough
>> verification unless in verbose mode)?
>
> Actually I was thinking about doing something like this.
>
> -	if (parse_and_verify_tag_buffer(0, data, size, 1) && verbose)
> +	if (parse_and_verify_tag_buffer(0, data, size, verbose))

Well, after running fsck with --verbose, I take the whole
suggestion back.  I think it is a good idea to do the "thorough"
tag validation in general, and it should not be buried under the
verbose output, which is almost useless unless in a very narrow
special case that you are really trying to see which exact
object is corrupt.

So I think your original patch to signal error on thorough tag
validation failure is probably a good approach in general.
People need to know that in git.git fsck would return non-zero
because of v0.99 tag, but the people who get hit/annoyed by this
ought to be minority.  It may be the case that a major portion
of git users currently are the ones who futz with the git.git
repository, but there would be a serious problem if it continues
to be the case ;-)


* Re: [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-06  9:21               ` Junio C Hamano
@ 2007-06-06 10:26                 ` Johan Herland
  2007-06-06 10:35                   ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-06 10:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthias Lederhofer, git

On Wednesday 06 June 2007, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Johan Herland <johan@herland.net> writes:
> > ...
> >> Or would you rather switch around the "verbose" and the
> >> "parse_and_verify_tag_buffer()" (i.e. not even attempt the thorough
> >> verification unless in verbose mode)?
> >
> > Actually I was thinking about doing something like this.
> >
> > -	if (parse_and_verify_tag_buffer(0, data, size, 1) && verbose)
> > +	if (parse_and_verify_tag_buffer(0, data, size, verbose))
> 
> Well, after running fsck with --verbose, I take the whole
> suggestion back.  I think it is a good idea to do the "thorough"
> tag validation in general, and it should not be buried under the
> verbose output, which is almost useless unless in a very narrow
> special case that you are really trying to see which exact
> object is corrupt.

Take your pick among my patches or feel free to roll your own. :)

> So I think your original patch to signal error on thorough tag
> validation failure is probably a good approach in general.
> People need to know that in git.git fsck would return non-zero
> because of v0.99 tag, but the people who get hit/annoyed by this
> ought to be minority.  It may be the case that a major portion
> of git users currently are the ones who futz with the git.git
> repository, but there would be a serious problem if it continues
> to be the case ;-)

I also noticed that a number of the early tags in the kernel repo use the 
ancient format, and would thus fail fsck.

<stroke-of-madness>
Could we replace the v0.99 tag (and other ancient tags) with "correct" 
versions, and then encourage users who have already cloned to delete their 
v0.99 tag and re-pull? New clones would of course never see the old tag at 
all. This sure as hell sounds similar to inserting foot into mouth before 
shooting oneself in said foot, but it might still be worth considering...
</stroke-of-madness>


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCH 3/6] git-fsck: Do thorough verification of tag objects.
  2007-06-06 10:26                 ` Johan Herland
@ 2007-06-06 10:35                   ` Junio C Hamano
  0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2007-06-06 10:35 UTC (permalink / raw)
  To: Johan Herland; +Cc: Matthias Lederhofer, git

Johan Herland <johan@herland.net> writes:

> I also noticed that a number of the early tags in the kernel repo use the 
> ancient format, and would thus fail fsck.
>
> <stroke-of-madness>
> Could we replace the v0.99 tag (and other ancient tags) with "correct" 
> versions, and then encourage users who have already cloned to delete their 
> v0.99 tag and re-pull? New clones would of course never see the old tag at 
> all. This sure as hell sounds similar to inserting foot into mouth before 
> shooting oneself in said foot, but it might still be worth considering...
> </stroke-of-madness>

I actually think that is not too bad.  In the course of git
development, the kernel folks had to do the wholesale repository
conversion twice (once when the order of hashing and compression
was swapped, another when the flat tree was made hierarchical),
I think.  Compared to that, tags are not referred to by other
entities, so it's much easier to "convert" (iow, re-sign).


* [PATCH] Fix bug in tag parsing when thorough verification was in effect
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
                     ` (6 preceding siblings ...)
  2007-06-04 20:32   ` [PATCH 0/6] Refactor the " Junio C Hamano
@ 2007-06-07 22:13   ` Johan Herland
  7 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-07 22:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

The code that was enabled by passing a non-zero 'thorough_verify' argument
to parse_and_verify_tag_buffer() moved the 'tag_line' and 'keywords_line'
pointer variables forward in memory while checking for illegal chars.
These pointers were later used when setting the respective members on
the parsed tag object.

The fix refactors the verification loop so as to use offsets to the
'tag_line' and 'keywords_line' pointers, instead of moving the pointers
directly.

The patch also includes cleanup of the code associated with moving the
various '*_line' pointers past their initial header field identifier.
These operations are now done along with the calculation of their
corresponding '*_len' variables.

The patch also includes minor changes to the expected output in the
associated test cases.

The bug was discovered by inspection. Currently none of the callers of
parse_and_verify_tag_buffer() that use thorough_verify != 0 also use
the 'tag' and 'keywords' members of the parsed tag object.

Signed-off-by: Johan Herland <johan@herland.net>
---

This goes on top of the existing "Refactor the tag object" patch series.


Have fun!

...Johan

 t/t3800-mktag.sh |    8 ++++----
 tag.c            |   49 ++++++++++++++++++++++++++-----------------------
 2 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index f6e3d10..ac9008a 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -186,7 +186,7 @@ tagger bar@baz.com
 EOF
 
 cat >expect.pat <<EOF
-^error: char67: could not verify tag name$
+^error: char66: could not verify tag name$
 EOF
 
 check_verify_failure 'verify tag-name check'
@@ -240,7 +240,7 @@ tagger bar@baz.com
 EOF
 
 cat >expect.pat <<EOF
-^error: char83: .*$
+^error: char82: .*$
 EOF
 
 check_verify_failure '"keywords" line check #1'
@@ -258,7 +258,7 @@ tagger bar@baz.com
 EOF
 
 cat >expect.pat <<EOF
-^error: char87: .*$
+^error: char86: .*$
 EOF
 
 check_verify_failure '"keywords" line check #2'
@@ -276,7 +276,7 @@ tagger bar@baz.com
 EOF
 
 cat >expect.pat <<EOF
-^error: char83: .*$
+^error: char82: .*$
 EOF
 
 check_verify_failure '"keywords" line check #3'
diff --git a/tag.c b/tag.c
index 9c95e0b..e371179 100644
--- a/tag.c
+++ b/tag.c
@@ -153,52 +153,55 @@ int parse_and_verify_tag_buffer(struct tag *item, const char *data, const unsign
 	if (*header_end != '\n') /* header must end with "\n\n" */
 		return error("char" PD_FMT ": could not find blank line after header section", header_end - data);
 
-	/* Calculate lengths of header fields */
-	type_len      = tag_line      == type_line ? 0 :     /* 0 if not given, > 0 if given */
-			(tag_line      - type_line)     - strlen("type \n");
-	tag_len       = keywords_line == tag_line ? 0 :      /* 0 if not given, > 0 if given */
-			(keywords_line - tag_line)      - strlen("tag \n");
-	keywords_len  = tagger_line   == keywords_line ? 0 : /* 0 if not given, > 0 if given */
-			(tagger_line   - keywords_line) - strlen("keywords \n");
-	tagger_len    = header_end    == tagger_line ? 0 :   /* 0 if not given, > 0 if given */
-			(header_end    - tagger_line)   - strlen("tagger \n");
+	/*
+	 * Advance header field pointers past their initial identifier.
+	 * Calculate lengths of header fields (0 for fields that are not given).
+	 */
+	type_line     += strlen("type ");
+	type_len       =       tag_line >     type_line ? (     tag_line -     type_line) - 1 : 0;
+	tag_line      += strlen("tag ");
+	tag_len        =  keywords_line >      tag_line ? (keywords_line -      tag_line) - 1 : 0;
+	keywords_line += strlen("keywords ");
+	keywords_len   =    tagger_line > keywords_line ? (  tagger_line - keywords_line) - 1 : 0;
+	tagger_line   += strlen("tagger ");
+	tagger_len     =     header_end >   tagger_line ? (   header_end -   tagger_line) - 1 : 0;
 
 	/* Get the actual type */
 	if (type_len >= sizeof(type))
-		return error("char" PD_FMT ": type too long", (type_line + 5) - data);
-	memcpy(type, type_line + 5, type_len);
+		return error("char" PD_FMT ": type too long", (type_line) - data);
+	memcpy(type, type_line, type_len);
 	type[type_len] = '\0';
 
 	if (thorough_verify) {
+		unsigned long i;
+
 		/* Verify that the object matches */
 		if (verify_object(sha1, type))
 			return error("char%d: could not verify object %s", 7, sha1_to_hex(sha1));
 
 		/* Verify the tag name: we don't allow control characters or spaces in it */
 		if (tag_len > 0) { /* tag name was given */
-			tag_line += 4; /* skip past "tag " */
-			for (;;) {
-				unsigned char c = *tag_line++;
+			for (i = 0; i < tag_len; ++i) {
+				unsigned char c = tag_line[i];
 				if (c == '\n')
 					break;
 				if (c > ' ' && c != 0x7f)
 					continue;
-				return error("char" PD_FMT ": could not verify tag name", tag_line - data);
+				return error("char" PD_FMT ": could not verify tag name", tag_line + i - data);
 			}
 		}
 
 		/* Verify the keywords line: we don't allow control characters or spaces in it, or two subsequent commas */
 		if (keywords_len > 0) { /* keywords line was given */
-			keywords_line += 9; /* skip past "keywords " */
-			for (;;) {
-				unsigned char c = *keywords_line++;
+			for (i = 0; i < keywords_len; ++i) {
+				unsigned char c = keywords_line[i];
 				if (c == '\n')
 					break;
-				if (c == ',' && *keywords_line == ',')
-					return error("char" PD_FMT ": found empty keyword", keywords_line - data);
+				if (c == ',' && keywords_line[i + 1] == ',') /* consecutive commas */
+					return error("char" PD_FMT ": found empty keyword", keywords_line + i - data);
 				if (c > ' ' && c != 0x7f)
 					continue;
-				return error("char" PD_FMT ": could not verify keywords", keywords_line - data);
+				return error("char" PD_FMT ": could not verify keywords", keywords_line + i - data);
 			}
 		}
 
@@ -211,7 +214,7 @@ int parse_and_verify_tag_buffer(struct tag *item, const char *data, const unsign
 	if (item) { /* Store parsed information into item */
 		if (tag_len > 0) { /* optional tag name was given */
 			item->tag = xmalloc(tag_len + 1);
-			memcpy(item->tag, tag_line + 4, tag_len);
+			memcpy(item->tag, tag_line, tag_len);
 			item->tag[tag_len] = '\0';
 		}
 		else { /* optional tag name not given */
@@ -221,7 +224,7 @@ int parse_and_verify_tag_buffer(struct tag *item, const char *data, const unsign
 
 		if (keywords_len > 0) { /* optional keywords string was given */
 			item->keywords = xmalloc(keywords_len + 1);
-			memcpy(item->keywords, keywords_line + 9, keywords_len);
+			memcpy(item->keywords, keywords_line, keywords_len);
 			item->keywords[keywords_len] = '\0';
 		}
 		else { /* optional keywords string not given. Set default keywords */
-- 
1.5.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread
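
The refactored verification loops in the tag-object patch above bound each scan by tag_len / keywords_len instead of walking until a newline. The same rules (every byte must satisfy c > ' ' && c != 0x7f, and keywords may not contain two consecutive commas) can be sketched as a standalone function. The name check_header_value() and its return convention are hypothetical and not part of git:

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the checks in parse_and_verify_tag_buffer():
 * each byte of a tag name or keywords value must be printable and not a
 * space (c > ' ' && c != 0x7f). Keywords additionally must not contain two
 * consecutive commas, since that would mean an empty keyword.
 *
 * Returns the offset of the first offending byte, or -1 if all 'len'
 * bytes of 's' are acceptable.
 */
static long check_header_value(const char *s, size_t len, int is_keywords)
{
	size_t i;

	for (i = 0; i < len; ++i) {
		unsigned char c = (unsigned char)s[i];
		if (c == '\n')
			break; /* end of the header line */
		if (is_keywords && c == ',' && i + 1 < len && s[i + 1] == ',')
			return (long)i; /* empty keyword between two commas */
		if (c > ' ' && c != 0x7f)
			continue; /* acceptable character */
		return (long)i; /* control character or space */
	}
	return -1;
}
```

For example, check_header_value("a,,b", 4, 1) reports offset 1, the first comma of the empty keyword, which is the same position the patch passes to error().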

* [PATCH 0/7] Introduce soft references (softrefs)
  2007-06-04  0:51 Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Johan Herland
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
@ 2007-06-09 18:19 ` Johan Herland
  2007-06-09 18:21   ` [PATCH 1/7] Softrefs: Add softrefs header file with API documentation Johan Herland
                     ` (8 more replies)
  2007-06-09 22:57 ` Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Steven Grimm
  2 siblings, 9 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

This patch series introduces soft references (softrefs); a mechanism for
declaring reachability between arbitrary (but existing) git objects.
Softrefs are meant to provide the mechanism for "reverse mapping" that
we determined was needed for tag objects (especially 'notes'). The patch
series also teaches git-mktag to create softrefs for all tag objects.

See the Discussion section in the git-softref manual page (patch #4/7) or
the comments in the header file (patch #1/7) for more details on the
design of softrefs.

I've added some informal performance data at the bottom of this mail [1].

Note that this patch series is incomplete in that the following things
have yet to be implemented:

1. Clone/fetch/push of softrefs

2. Packing of softrefs

3. General integration of softrefs into parts of git where they might be
   useful

4. Find appropriate value for MAX_UNSORTED_ENTRIES

There are also some questions connected to the above list of todos:

1. Just how should softrefs affect reachability? Should softrefs be
   used/followed in _all_ reachability computations? If not, which?

2. How should softrefs propagate? I suggest they are pretty much always
   propagated under clone/fetch/push. (Note that the softrefs merge
   algorithm in softrefs.c removes duplicates and softrefs between
   non-existing objects, so pre-filtering of the softrefs to be
   cloned/fetched/pushed may not be necessary)

3. Where can softrefs be used to improve performance by replacing existing
   techniques?

4. How to best pack softrefs? Keeping them in the same pack as the objects
   they refer to seems to be a good idea, but more thought needs to be put
   into this before we can make an implementation.

5. How to find _all_ (even unreachable) tag objects in repo for
   'git-softref --rebuild-tags'?

6. Optimization. Pretty much nothing has been done so far. Performance
   seems to be acceptable for now. Probably needs more testing to
   determine bottlenecks.
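
Question 2 relies on the merge step in softrefs.c dropping duplicates and softrefs that involve non-existing objects. Stripped of file handling, that step is a sort followed by a two-way merge. The sketch below is a simplified illustration rather than the patch's code: it keeps object names as hex strings (which sort in the same order as the raw SHA-1s as long as they are all lowercase and of equal length), and the exists() callback is a hypothetical stand-in for has_sha1_file():

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory softref; ids are (lowercase) hex strings */
struct ref {
	char from[41];
	char to[41];
};

/* Order by 'from', then by 'to', like softrefs_entry_cmp() */
static int ref_cmp(const struct ref *a, const struct ref *b)
{
	int c = strcmp(a->from, b->from);
	return c ? c : strcmp(a->to, b->to);
}

static int ref_qsort_cmp(const void *a, const void *b)
{
	return ref_cmp(a, b);
}

/*
 * Sort 'unsorted' in place, then merge it with the already sorted 'sorted'
 * array into 'out' (capacity >= n_unsorted + n_sorted). Each distinct
 * softref is written once, and softrefs for which exists() returns 0 for
 * either end are dropped. Returns the number of entries written.
 */
static size_t merge_refs(struct ref *unsorted, size_t n_unsorted,
			 const struct ref *sorted, size_t n_sorted,
			 int (*exists)(const char *hex), struct ref *out)
{
	size_t i = 0, j = 0, n = 0;
	const struct ref *prev = NULL;

	qsort(unsorted, n_unsorted, sizeof(*unsorted), ref_qsort_cmp);
	while (i < n_unsorted || j < n_sorted) {
		const struct ref *cur;

		/* take the lower of the two heads */
		if (i < n_unsorted &&
		    (j >= n_sorted || ref_cmp(&unsorted[i], &sorted[j]) < 0))
			cur = &unsorted[i++];
		else
			cur = &sorted[j++];

		if (prev && !ref_cmp(prev, cur))
			continue; /* duplicate of the previously examined entry */
		prev = cur;

		if (!exists(cur->from) || !exists(cur->to))
			continue; /* involves a non-existing object */
		out[n++] = *cur;
	}
	return n;
}
```

Like the patch, the sketch compares each entry against the previously examined one, so a duplicate is skipped even when its first copy was filtered out for referring to a missing object.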

NOTE: After the 7 patches, I will send an _optional_ patch
that changes the softrefs entries from text format (82 bytes per entry)
to binary format (40 bytes per entry). The patch is optional, because
I want the list to decide if we want the (marginal) speedup and
simplified code provided by the patch, or if we want to keep the
readability/maintainability of the text format. Currently I'm in favour of
keeping the text format, but I'm far from sure.
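
For reference, the text entry format from softrefs.h is exactly 82 bytes (two 40-digit hex names, a space and a linefeed). The optional patch itself is not shown here; assuming its binary entries are simply two raw 20-byte SHA-1s, a conversion could look like the sketch below. With the 572740 softrefs from the measurements in [1], the db would shrink from 572740 * 82 = 46964680 bytes to 572740 * 40 = 22909600 bytes. The struct and function names are illustrative only:

```c
/* Text entry as stored in .git/softrefs.{unsorted,sorted}: 82 bytes */
struct text_entry {
	char from_hex[40];
	char space;             /* ' ' */
	char to_hex[40];
	char lf;                /* '\n' */
};

/* Assumed binary entry: two raw 20-byte SHA-1s, 40 bytes */
struct binary_entry {
	unsigned char from_sha1[20];
	unsigned char to_sha1[20];
};

/* Value of one hex digit, or -1 if 'c' is not a hex digit */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode 40 hex digits into 20 bytes; 0 on success, -1 on bad input */
static int hex_to_sha1(const char *hex, unsigned char *sha1)
{
	int i;

	for (i = 0; i < 20; i++) {
		int hi = hex_digit(hex[2 * i]);
		int lo = hex_digit(hex[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Convert one text entry to the binary layout, checking its separators */
static int text_to_binary(const struct text_entry *t, struct binary_entry *b)
{
	if (t->space != ' ' || t->lf != '\n')
		return -1;
	if (hex_to_sha1(t->from_hex, b->from_sha1) ||
	    hex_to_sha1(t->to_hex, b->to_sha1))
		return -1;
	return 0;
}
```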

Finally, here's the shortlog: (This patch series of course goes on top of
the previous "Refactor the tag object" patch series, although there aren't
really that many dependencies between them):

Johan Herland (7):
      Softrefs: Add softrefs header file with API documentation
      Softrefs: Add implementation of softrefs API
      Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs
      Softrefs: Add manual page documenting git-softref and softrefs subsystem in general
      Softrefs: Add testcases for basic softrefs behaviour
      Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin
      Teach git-mktag to register softrefs for all tag objects

 .gitignore                    |    1 +
 Documentation/cmd-list.perl   |    7 +-
 Documentation/git-softref.txt |  119 +++++++
 Makefile                      |    6 +-
 builtin-softref.c             |  167 ++++++++++
 builtin.h                     |    1 +
 git.c                         |    1 +
 mktag.c                       |   11 +-
 softrefs.c                    |  712 +++++++++++++++++++++++++++++++++++++++++
 softrefs.h                    |  188 +++++++++++
 t/t3050-softrefs.sh           |  314 ++++++++++++++++++
 11 files changed, 1521 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/git-softref.txt
 create mode 100644 builtin-softref.c
 create mode 100644 softrefs.c
 create mode 100644 softrefs.h
 create mode 100755 t/t3050-softrefs.sh

Have fun!

...Johan

[1] Informal performance measurements

I prepared a linux kernel repo (holding 57274 commits) with 10 tag objects,
and created softrefs from every commit to every tag object (572740 softrefs
in total). The resulting softrefs db was 46964680 bytes. The experiment was
done on a 32-bit Intel Pentium 4 (3 GHz w/HyperThreading) with 1 GB RAM:

========
Operations on unsorted softrefs:
(572740 (10 per commit) entries in random/unsorted order)
========

Listing all softrefs
(sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.44user 0.02system 0:00.47elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps

Listing HEAD's softrefs
(sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list HEAD > /dev/null
0.11user 0.01system 0:00.14elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11790minor)pagefaults 0swaps

Sorting softrefs
--------
$ /usr/bin/time git softref --merge-unsorted
2.73user 4.97system 0:07.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15833minor)pagefaults 0swaps

Sorting softrefs into existing sorted file
(throwing away duplicates)
--------
$ /usr/bin/time git softref --merge-unsorted
3.49user 5.12system 0:08.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27300minor)pagefaults 0swaps

========
Operations on sorted softrefs:
(572740 (10 per commit) entries in sorted order)
========

Listing all softrefs
(sequential reading of sorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.43user 0.02system 0:00.48elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps

Listing HEAD's softrefs
(256-fanout followed by binary search in sorted softrefs file)
--------
$ /usr/bin/time git softref --list HEAD > /dev/null
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+334minor)pagefaults 0swaps

Sorting softrefs
(no-op)
--------
$ /usr/bin/time git softref --merge-unsorted
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+312minor)pagefaults 0swaps
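
The near-instant lookup of HEAD's softrefs in the sorted file comes from the "256-fanout followed by binary search" strategy: the first byte of the SHA-1 picks a starting probe, binary search narrows from there, and the hit is walked back to the first entry with that <from-sha1>. A minimal sketch over an in-memory array of binary entries follows; struct entry and find_first() are hypothetical, and the patch itself searches the mmap()ed text file, decoding hex on every comparison:

```c
#include <string.h>

/* Hypothetical binary entry, sorted by 'from' */
struct entry {
	unsigned char from[20];
	unsigned char to[20];
};

/*
 * Return the index of the first entry whose 'from' equals 'key' in the
 * array 'e' of 'n' entries sorted by 'from', or -1 if there is none.
 * The first probe is seeded from the key's leading byte ("256-fanout"),
 * which works well when SHA-1 values are roughly uniformly distributed.
 */
static long find_first(const struct entry *e, unsigned long n,
		       const unsigned char *key)
{
	unsigned long left = 0, right = n, i;
	int cmp;

	if (!n)
		return -1;
	i = (key[0] * n) / 256; /* fanout guess; always < n */
	while ((cmp = memcmp(key, e[i].from, 20)) != 0) {
		if (cmp > 0)
			left = i + 1;   /* key sorts after e[i] */
		else
			right = i;      /* key sorts before e[i] */
		if (left >= right)
			return -1;      /* search window [left, right) is empty */
		i = left + (right - left) / 2;
	}
	/* e[i] matches, but earlier entries may share the same 'from' */
	while (i > 0 && !memcmp(key, e[i - 1].from, 20))
		--i;
	return (long)i;
}
```

Testing whether the [left, right) window is empty after narrowing it, rather than before, guarantees that the next probe index is always inside the array, even when the key sorts after the last entry.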

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 1/7] Softrefs: Add softrefs header file with API documentation
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
@ 2007-06-09 18:21   ` Johan Herland
  2007-06-10  6:58     ` Johannes Schindelin
  2007-06-10 14:27     ` Jakub Narebski
  2007-06-09 18:22   ` [PATCH 2/7] Softrefs: Add implementation of softrefs API Johan Herland
                     ` (7 subsequent siblings)
  8 siblings, 2 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

See patch for documentation.

Signed-off-by: Johan Herland <johan@herland.net>
---
 softrefs.h |  188 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 188 insertions(+), 0 deletions(-)
 create mode 100644 softrefs.h

diff --git a/softrefs.h b/softrefs.h
new file mode 100644
index 0000000..db0f8b9
--- /dev/null
+++ b/softrefs.h
@@ -0,0 +1,188 @@
+#ifndef SOFTREFS_H
+#define SOFTREFS_H
+
+/*
+ * Softrefs is a general mechanism for declaring a relationship between two
+ * existing arbitrary objects in the repo. Softrefs differ from the existing
+ * reachability relationship in that a softref may be created after _both_ of
+ * the involved objects have been added to the repo. In contrast, the regular
+ * reachability relationship depends on the reachable object's name being
+ * stored _inside_ the other object. A reachability relationship can therefore
+ * not be created at a later time without violating the immutability of git
+ * objects.
+ *
+ * Softrefs are defined as going _from_ one object _to_ another object. Once
+ * a softref between two objects has been created, the "to" object is
+ * considered reachable from the "from" object.
+ *
+ * Also, softrefs are stored in a way that makes it easy and quick to find all
+ * the "to" objects reachable from a given "from" object.
+ *
+ * The softrefs db consists of two files: .git/softrefs.unsorted and
+ * .git/softrefs.sorted. Both files use the same format; one softref per line
+ * of the form "<from-sha1> <to-sha1>\n". Each sha1 sum is 40 bytes long; this
+ * makes each entry exactly 82 bytes long (including the space between the sha1
+ * sums and the terminating linefeed).
+ *
+ * The entries in .git/softrefs.sorted are sorted on <from-sha1>, in order to
+ * make lookup fast.
+ *
+ * The entries in .git/softrefs.unsorted are _not_ sorted. This is to make
+ * insertion fast.
+ *
+ * When softrefs are created (by calling add_softref()/add_softrefs()), they
+ * are appended to .git/softrefs.unsorted. When .git/softrefs.unsorted reaches a
+ * certain number of entries (determined by MAX_UNSORTED_ENTRIES), all the
+ * entries in .git/softrefs.unsorted are merged into .git/softrefs.sorted.
+ *
+ * Soft references are used as a reverse mapping between tag objects and their
+ * corresponding tagged objects. For each tag object, a soft reference _to_
+ * the tag object _from_ the tagged object is created. Given an arbitrary
+ * object X in the database, softrefs allow for easy lookup of which tag
+ * objects point to object X.
+ */
+
+/*
+ * Simple list of softrefs
+ */
+struct softref_list {
+	struct softref_list *next;
+	unsigned char from_sha1[20];
+	unsigned char   to_sha1[20];
+};
+
+/* Callback function type; used as parameter to for_each_softref()
+ *
+ * The function takes the following arguments:
+ * - from_sha1 - The SHA1 of the 'from' object in the current softref
+ * - to_sha1   - The SHA1 of the 'to' object in the current softref
+ * - cb_data   - as passed to for_each_softref()
+ *
+ * Return non-zero to stop for_each_softref() from iterating any further.
+ */
+typedef int each_softref_fn(
+	const unsigned char *from_sha1,
+	const unsigned char *to_sha1,
+	void *cb_data);
+
+/*
+ * Invoke 'fn' with 'cb_data' for each object pointed to by 'from_sha1'
+ *
+ * If 'from_sha1' is NULL, 'fn' is invoked for _all_ softrefs in the db.
+ *
+ * If 'fn' returns non-zero for any given softref, iteration is stopped and the
+ * same return value is returned from this function. If other problems are
+ * encountered while iterating, -1 is returned. If all matching entries were
+ * iterated successfully, and 'fn' returned 0 for all of them, 0 is returned.
+ */
+extern int for_each_softref_with_from(
+	const unsigned char *from_sha1, each_softref_fn fn, void *cb_data);
+
+/*
+ * Invoke 'fn' with 'cb_data' for each softref stored in the db
+ *
+ * This function is identical to calling for_each_softref_with_from() with
+ * NULL as the first parameter.
+ */
+extern int for_each_softref(each_softref_fn fn, void *cb_data);
+
+/*
+ * Initialize/prepare the softrefs db for a lot of read-only access
+ *
+ * You may call this function before doing repeated calls to accessor functions
+ * such as:
+ * - for_each_softref_with_from()
+ * - for_each_softref()
+ * - lookup_softref()
+ * - has_softref()
+ *
+ * This function is purely optional, although it may improve performance when
+ * accessor functions are called repeatedly. The change in performance is
+ * caused by:
+ *  1. Merging unsorted softref entries into the sorted db file,
+ *  2. Doing open() and mmap() on the sorted db file (in order to avoid doing
+ *     this on each subsequent call to an accessor function).
+ *
+ * When done accessing the softrefs db, the caller _must_ call
+ * deinit_softrefs_access() to properly deinitialize internal structures.
+ */
+extern void init_softrefs_access();
+
+/*
+ * Deinitialize internal structures associated with init_softrefs_access()
+ *
+ * Call this function when finished accessing softrefs after a call to
+ * init_softrefs_access().
+ */
+extern void deinit_softrefs_access();
+
+/*
+ * Look up the given object id in the softrefs db
+ *
+ * Returns a list of all the matching softrefs, i.e. softrefs whose from_sha1
+ * is identical to the given. If the given from_sha1 is NULL, all softrefs are
+ * returned.
+ *
+ * The entire softref_list returned (i.e. all elements retrievable by
+ * following the next pointer) must be free()d by the caller.
+ *
+ * You should consider using one of the for_each_softref*() functions instead,
+ * as those might save you some memory.
+ */
+extern struct softref_list *lookup_softref(const unsigned char *from_sha1);
+
+/*
+ * Delete (i.e. free()) all elements in the given softref_list
+ */
+extern void delete_softref_list(struct softref_list *list);
+
+/*
+ * Return 1 if there exists a softref between 'from_sha1' and 'to_sha1'
+ *
+ * Otherwise, return 0.
+ */
+extern int has_softref(
+	const unsigned char *from_sha1, const unsigned char *to_sha1);
+
+/*
+ * Add all the softrefs in the given 'list' to the db.
+ *
+ * Returns the number of softrefs added, or -1 on failure to add any softrefs.
+ */
+extern int add_softrefs(const struct softref_list *list);
+
+/*
+ * Add a softref between 'from_sha1' and 'to_sha1'
+ *
+ * 'from_sha1' and 'to_sha1' are two 20-byte object ids.
+ * Returns 0 on success, 1 if the softref already exists, -1 on failure.
+ */
+extern int add_softref(
+	const unsigned char *from_sha1, const unsigned char *to_sha1);
+
+/*
+ * Merge softrefs found in the given unsorted softrefs file into the sorted db
+ *
+ * If 'unsorted_file' is NULL, the internal unsorted db file is merged.
+ *
+ * Note that this routine is automatically invoked by add_softrefs() and
+ * add_softref() to control the size of the unsorted db file.
+ *
+ * If 'unsorted_file' is NULL, the merging is only done if the number of
+ * softrefs in the unsorted db file exceeds a fixed threshold (see
+ * MAX_UNSORTED_ENTRIES). However, if 'force' is set, the merging will be done
+ * regardless. Passing anything other than NULL for 'unsorted_file'
+ * automatically turns on 'force'.
+ *
+ * Returns 0 on success; non-zero if problems were encountered.
+ */
+extern int merge_unsorted_softrefs(const char *unsorted_file, int force);
+
+/*
+ * Merge softrefs found in the given sorted softrefs file into the sorted db
+ *
+ * Returns 0 on success; non-zero if problems were encountered.
+ */
+extern int merge_sorted_softrefs(const char *sorted_file);
+
+#endif /* SOFTREFS_H */
-- 
1.5.2.1.144.gabc40

^ permalink raw reply related	[flat|nested] 52+ messages in thread
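
The callback contract documented in softrefs.h (a non-zero return stops the walk and is handed back to the caller) is what lets a has_softref()-style query stop at the first hit. Below is a self-contained sketch of that contract over a hypothetical in-memory array instead of the on-disk db; only the callback signature follows softrefs.h, and the other names are made up for illustration:

```c
#include <string.h>

/* Same shape as each_softref_fn in softrefs.h */
typedef int each_ref_fn(const unsigned char *from_sha1,
			const unsigned char *to_sha1, void *cb_data);

/* Hypothetical in-memory stand-in for the softrefs db */
struct mem_ref {
	unsigned char from[20];
	unsigned char to[20];
};

/*
 * Mirror of the for_each_softref_with_from() contract: call 'fn' for every
 * ref whose 'from' equals 'from_sha1' (every ref if 'from_sha1' is NULL),
 * stop and return fn's value as soon as it is non-zero, otherwise return 0.
 */
static int for_each_mem_ref(const struct mem_ref *refs, size_t n,
			    const unsigned char *from_sha1,
			    each_ref_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;
		if (from_sha1 && memcmp(from_sha1, refs[i].from, 20))
			continue; /* not the object we are asked about */
		ret = fn(refs[i].from, refs[i].to, cb_data);
		if (ret)
			return ret; /* callback asked us to stop */
	}
	return 0;
}

/* Like has_softref_helper(): return 1 (stop) once 'to' matches cb_data */
static int find_to(const unsigned char *from_sha1,
		   const unsigned char *to_sha1, void *cb_data)
{
	(void)from_sha1;
	return !memcmp(to_sha1, cb_data, 20);
}

/* Count visited refs; never stops the iteration */
static int count_refs(const unsigned char *from_sha1,
		      const unsigned char *to_sha1, void *cb_data)
{
	(void)from_sha1;
	(void)to_sha1;
	++*(int *)cb_data;
	return 0;
}
```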

* [PATCH 2/7] Softrefs: Add implementation of softrefs API
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
  2007-06-09 18:21   ` [PATCH 1/7] Softrefs: Add softrefs header file with API documentation Johan Herland
@ 2007-06-09 18:22   ` Johan Herland
  2007-06-09 18:22   ` [PATCH 3/7] Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs Johan Herland
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

This code tries to implement the softrefs API as straightforwardly as
possible. Virtually no optimization has been done, although I do have
a feeling the code has ok performance as is.

All functions that do not appear in the API docs have some comments
attached to them.

There are also a couple of things to be considered before inclusion:
- File locking. Currently no locking is performed on softrefs files
  before reading or writing entries.
- Packing. We need a plan for how softrefs should be included in packs,
  after which supporting code must be added to this implementation.
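
On the locking question: merge_unsorted_into_sorted() below already writes its output to "<sorted>.new" with O_CREAT|O_EXCL and then rename()s it over the sorted file, which is roughly the create-exclusively-then-rename pattern git's own lock files use (with a ".lock" suffix). A generic POSIX sketch of that pattern, using a hypothetical replace_file() helper, might look like this:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace 'path' with 'len' bytes from 'buf' without
 * ever exposing a half-written file. The data goes to "<path>.new" first,
 * which is then rename()d over 'path'. O_EXCL makes a second concurrent
 * writer fail instead of clobbering the first writer's temporary file.
 * A short write is treated as failure. Returns 0 on success, -1 on failure.
 */
static int replace_file(const char *path, const void *buf, size_t len)
{
	size_t n = strlen(path) + sizeof(".new"); /* sizeof counts the NUL */
	char *tmp = malloc(n);
	int fd, ret = -1;

	if (!tmp)
		return -1;
	snprintf(tmp, n, "%s.new", path);
	fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		goto out; /* e.g. another writer (or a stale file) holds tmp */
	if (write(fd, buf, len) != (ssize_t)len) {
		close(fd);
		unlink(tmp);
		goto out;
	}
	if (close(fd) || rename(tmp, path)) {
		unlink(tmp);
		goto out;
	}
	ret = 0;
out:
	free(tmp);
	return ret;
}
```

The exclusive create only serializes writers: readers that already mmap()ed the old file keep seeing its contents, and a stale "<path>.new" left behind by a crash blocks every later replace until it is removed.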

Signed-off-by: Johan Herland <johan@herland.net>
---
 softrefs.c |  712 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 712 insertions(+), 0 deletions(-)
 create mode 100644 softrefs.c

diff --git a/softrefs.c b/softrefs.c
new file mode 100644
index 0000000..c7308c8
--- /dev/null
+++ b/softrefs.c
@@ -0,0 +1,712 @@
+#include "cache.h"
+#include "softrefs.h"
+
+/* constants */
+static const char *       UNSORTED_FILENAME    = "softrefs.unsorted";
+static const char *       SORTED_FILENAME      = "softrefs.sorted";
+static const unsigned int MAX_UNSORTED_ENTRIES = 1000;
+
+
+/* softref entry as it appears in a softrefs file */
+struct softrefs_entry {
+	char from_sha1_hex[40];
+	char space;
+	char to_sha1_hex[40];
+	char lf;
+};
+
+/* simple encapsulation of a softrefs file */
+struct softrefs_file {
+	char *filename;
+	int fd;
+	struct softrefs_entry *data; /* mmap()ed softrefs_entry objects */
+	unsigned long data_len;      /* # of softrefs_entry objects in data */
+};
+
+/* Internal file opened/closed by (de)init_softrefs_access() */
+static struct softrefs_file *internal_file = 0;
+
+/*
+ * Open and mmap() the given filename. Assign the file descriptor, data
+ * pointer and data length to the given softrefs_file object.
+ * Return 0 on success, -1 on failure.
+ *
+ * Note that a non-existing file is not a failure per se, but is rather treated
+ * as an empty file, i.e. there will be no data in the file structure
+ * (data_len == 0), but 0/success will be returned.
+ *
+ * The caller must _always_ call close_softrefs_file() with the same
+ * softrefs_file argument after processing the file data, even if no file
+ * is actually opened and/or this function returns -1.
+ */
+static int open_softrefs_file(const char *filename, struct softrefs_file *file)
+{
+	struct stat st;
+
+	/* Default "failure" values */
+	file->filename = xstrdup(filename);
+	file->fd = -1;
+	file->data = MAP_FAILED;
+	file->data_len = 0;
+
+	/* FIXME: File locking!? */
+	if (access(file->filename, F_OK))
+		return 0;
+	file->fd = open(file->filename, O_RDONLY);
+	if (file->fd < 0)
+		return error("Failed to open softrefs file %s: %s",
+				file->filename, strerror(errno));
+	if (fstat(file->fd, &st))
+		return error("Failed to fstat softrefs file %s: %s",
+				file->filename, strerror(errno));
+	if (st.st_size == 0) /* Empty file. No need to call mmap() */
+		return 0;
+	if (st.st_size % sizeof(struct softrefs_entry))
+		return error("Refuse to mmap softrefs file %s: File does not have whole number of softref entries",
+				file->filename);
+
+	file->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, file->fd, 0);
+	if (file->data == MAP_FAILED)
+		return error("Failed to mmap softrefs file %s: %s",
+				file->filename, strerror(errno));
+
+	file->data_len = st.st_size / sizeof(struct softrefs_entry);
+
+	return 0;
+}
+
+/*
+ * Close the softrefs file identified by the given softrefs_file object.
+ * Return 0 on success, non-zero on failure.
+ */
+static int close_softrefs_file(const struct softrefs_file *file)
+{
+	int ret = 0;
+	if (file->data != MAP_FAILED &&
+	    munmap(file->data, file->data_len * sizeof(struct softrefs_entry)))
+	{
+		ret = error("Failed to munmap softrefs file %s: %s",
+				file->filename, strerror(errno));
+	}
+	if (file->fd != -1 && close(file->fd))
+		ret = error("Failed to close softrefs file %s: %s",
+				file->filename, strerror(errno));
+	free(file->filename);
+	return ret;
+}
+
+/*
+ * Write the given softrefs_entry to the given file descriptor, which must be
+ * open and writable.
+ *
+ * Returns 0 on success, non-zero on failure.
+ */
+static int write_entry(int fd, const struct softrefs_entry *entry)
+{
+	if (write(fd, (const void *) entry, sizeof(struct softrefs_entry))
+		!= sizeof(struct softrefs_entry))
+	{
+		return error("Failed to write entry '%.40s -> %.40s' to softrefs file descriptor %i: %s",
+				entry->from_sha1_hex, entry->to_sha1_hex,
+				fd, strerror(errno));
+	}
+	return 0;
+}
+
+/* See softrefs.h for documentation */
+void init_softrefs_access()
+{
+	if (internal_file) /* already initialized */
+		return;
+
+	/* Force merge into sorted, so that we only have one file to search */
+	if (merge_unsorted_softrefs(NULL, 1))
+		return; /* merge failed */
+
+	internal_file = xmalloc(sizeof(struct softrefs_file));
+	if (open_softrefs_file(git_path(SORTED_FILENAME), internal_file)) {
+		free(internal_file);
+		internal_file = 0;
+	}
+}
+
+/* See softrefs.h for documentation */
+void deinit_softrefs_access()
+{
+	if (!internal_file) /* already deinitialized */
+		return;
+	close_softrefs_file(internal_file);
+	internal_file = 0;
+}
+
+/* comparison between a SHA1 sum and a softrefs entry */
+static int sha1_to_entry_cmp(
+	const unsigned char *from_sha1, const struct softrefs_entry *entry)
+{
+	unsigned char sha1[20];
+	get_sha1_hex(entry->from_sha1_hex, sha1);
+	return hashcmp(from_sha1, sha1);
+}
+
+/* comparison between softrefs entries */
+static int softrefs_entry_cmp(
+		const struct softrefs_entry *a, const struct softrefs_entry *b)
+{
+	unsigned char sa[20], sb[20];
+	int ret;
+	get_sha1_hex(a->from_sha1_hex, sa);
+	get_sha1_hex(b->from_sha1_hex, sb);
+	ret = hashcmp(sa, sb);
+	if (!ret) {
+		get_sha1_hex(a->to_sha1_hex, sa);
+		get_sha1_hex(b->to_sha1_hex, sb);
+		ret = hashcmp(sa, sb);
+	}
+	return ret;
+}
+
+/* comparison between softrefs entries as invoked by qsort() */
+static int softrefs_entry_qsort_cmp(const void *a, const void *b)
+{
+	const struct softrefs_entry *na = *((const struct softrefs_entry **) a);
+	const struct softrefs_entry *nb = *((const struct softrefs_entry **) b);
+	return softrefs_entry_cmp(na, nb);
+}
+
+
+/*
+ * Sequentially process given 'file' starting at index 'i'
+ *
+ * For each entry matching 'from_sha1' (if NULL, match all entries), invoke
+ * callback function 'fn' with the from_sha1 and to_sha1 of the matching
+ * softref. Keep going until 'fn' returns non-zero, or end of file is reached.
+ *
+ * If the 'stop_at_first_non_match' flag is set, processing will stop when the
+ * first non-matching entry is encountered.
+ *
+ * Returns result of 'fn' if non-zero; otherwise 0 on success and -1 on failure.
+ */
+static int do_for_each_sequential(
+		const unsigned char *from_sha1,
+		each_softref_fn fn, void *cb_data,
+		struct softrefs_file *file,
+		unsigned long i,
+		int stop_at_first_non_match)
+{
+	unsigned char f_sha1[20], t_sha1[20]; /* Holds sha1 per entry */
+	int ret = 0;
+	for (; i < file->data_len; ++i) { /* Step through file, starting at i */
+		/* sanity check entry */
+		if (file->data[i].space != ' ' || file->data[i].lf != '\n') {
+			ret = error("Entry #%lu in softrefs file %s failed sanity check",
+					i, file->filename);
+			break;
+		}
+		/* retrieve SHA1 values */
+		if (get_sha1_hex(file->data[i].from_sha1_hex, f_sha1) ||
+		    get_sha1_hex(file->data[i].to_sha1_hex,   t_sha1)) {
+			ret = error("Failed to read SHA1 values from entry #%lu in softrefs file %s",
+					i, file->filename);
+			break;
+		}
+		/* Compare to lookup value */
+		if (!from_sha1 || !hashcmp(from_sha1, f_sha1)) {
+			if ((ret = fn(f_sha1, t_sha1, cb_data)))
+				break; /* bail out if callback returns != 0 */
+		}
+		else if (stop_at_first_non_match)
+			break;
+	}
+	return ret;
+}
+
+/* Invoke callback 'fn' for each matching entry in UNSORTED_FILENAME */
+static int do_for_each_unsorted(
+		const unsigned char *from_sha1,
+		each_softref_fn fn, void *cb_data)
+{
+	struct softrefs_file file;
+	int ret = 0;
+
+	if (internal_file)
+		/*
+		 * internal_file is open. Unsorted entries are merged just
+		 * before opening internal_file (in init_softrefs_access()).
+		 * Since internal_file is still open, no entries have been
+		 * added since last merge, meaning that there can be no
+		 * unsorted entries in the db, and thus no unsorted file.
+		 * Therefore return immediate success.
+		 */
+		return 0;
+
+	if (!(ret = open_softrefs_file(git_path(UNSORTED_FILENAME), &file)))
+		ret = do_for_each_sequential(from_sha1, fn, cb_data, &file, 0, 0);
+
+	close_softrefs_file(&file);
+	return ret;
+}
+
+/* Invoke callback 'fn' for each matching entry in SORTED_FILENAME */
+static int do_for_each_sorted(
+		const unsigned char *from_sha1,
+		each_softref_fn fn, void *cb_data)
+{
+	struct softrefs_file *file;
+	unsigned long i, left, right;
+	int cmp_result;
+	int ret = 0;
+
+	if (internal_file) /* use already open internal_file */
+		file = internal_file;
+	else { /* open file ourselves */
+		file = xmalloc(sizeof(struct softrefs_file));
+		if ((ret = open_softrefs_file(git_path(SORTED_FILENAME), file)))
+			goto done;
+	}
+	if (!file->data_len) /* no entries */
+		goto done;
+
+	if (!from_sha1) { /* match _all_ entries; do sequential walk instead */
+		ret = do_for_each_sequential(from_sha1, fn, cb_data, file, 0, 0);
+		goto done;
+	}
+
+	/* Calculate first index by 256-fanout */
+	left = 0;
+	right = file->data_len;
+	i = (from_sha1[0] * file->data_len) / 256;
+
+	/* Binary search */
+	while ((cmp_result = sha1_to_entry_cmp(from_sha1, &(file->data[i])))) {
+		if (cmp_result > 0) /* go right */
+			left = i + 1;
+		else /* go left */
+			right = i;
+		if (left >= right) /* search window empty; not found */
+			goto done;
+		i = (left + right) / 2;
+	}
+
+	/* i points to a matching entry, but not necessarily the first */
+	while (i >= 1 && sha1_to_entry_cmp(from_sha1, &(file->data[i - 1])) == 0)
+		--i;
+
+	/* i points to the first matching entry */
+	/* do sequential processing from i, stopping at first non-match */
+	ret = do_for_each_sequential(from_sha1, fn, cb_data, file, i, 1);
+
+done:
+	if (!internal_file) { /* only close if we opened ourselves */
+		close_softrefs_file(file);
+		free(file);
+	}
+	return ret;
+}
+
+/* See softrefs.h for documentation */
+int for_each_softref_with_from(
+		const unsigned char *from_sha1,
+		each_softref_fn fn, void *cb_data)
+{
+	int ret = do_for_each_unsorted(from_sha1, fn, cb_data);
+	if (ret)
+		return ret;
+	ret = do_for_each_sorted(from_sha1, fn, cb_data);
+	return ret;
+}
+
+/* See softrefs.h for documentation */
+int for_each_softref(each_softref_fn fn, void *cb_data)
+{
+	return for_each_softref_with_from(0, fn, cb_data);
+}
+
+static int lookup_softref_helper(
+		const unsigned char *from_sha1, const unsigned char *to_sha1,
+		void *cb_data)
+{
+	struct softref_list **prev = (struct softref_list **) cb_data;
+
+	struct softref_list *current = xmalloc(sizeof(struct softref_list));
+	current->next = *prev;
+	hashcpy(current->from_sha1, from_sha1);
+	hashcpy(current->to_sha1, to_sha1);
+	*prev = current;
+	return 0;
+}
+
+/* See softrefs.h for documentation */
+struct softref_list *lookup_softref(const unsigned char *from_sha1)
+{
+	struct softref_list *result = NULL;
+	struct softref_list **p = &result;
+	if (for_each_softref_with_from(
+			from_sha1, lookup_softref_helper, (void *) p))
+	{
+		delete_softref_list(result);
+		result = NULL;
+	}
+	return result;
+}
+
+/* See softrefs.h for documentation */
+void delete_softref_list(struct softref_list *list)
+{
+	while (list) {
+		struct softref_list *next = list->next;
+		free(list);
+		list = next;
+	}
+}
+
+static int has_softref_helper(
+		const unsigned char *from_sha1, const unsigned char *to_sha1,
+		void *cb_data)
+{
+	const unsigned char *needle = (const unsigned char *) cb_data;
+	if (!hashcmp(to_sha1, needle))
+		return 1; /* found */
+	return 0; /* keep going */
+}
+
+/* See softrefs.h for documentation */
+int has_softref(const unsigned char *from_sha1, const unsigned char *to_sha1)
+{
+	int ret = for_each_softref_with_from(
+			from_sha1, has_softref_helper, (void *) to_sha1);
+	return ret == 1 ? 1 : 0;
+}
+
+
+/*
+ * Merge the unsorted softref entries in unsorted_filename into sorted_filename
+ *
+ * Returns 0 on success; non-zero on failure.
+ *
+ * If sorted_filename does not exist, the entries in unsorted_filename will be
+ * sorted and stored into sorted_filename.
+ * If unsorted_filename does not exist, this function will do nothing and
+ * return 0.
+ */
+static int merge_unsorted_into_sorted(
+		const char *unsorted_filename, const char *sorted_filename)
+{
+	struct softrefs_file unsorted, sorted;
+	char *result_filename = 0;
+	int result_fd = -1;
+	int ret = 0;
+	unsigned long i, j;
+	/* array of pointers to softrefs_entries in unsorted file */
+	struct softrefs_entry **to_insert;
+	/* keep track of last processed entry, to remove duplicates */
+	struct softrefs_entry *prev = NULL;
+
+	/* Open input files */
+	deinit_softrefs_access();
+	open_softrefs_file(unsorted_filename, &unsorted);
+	if (!unsorted.data_len) { /* no unsorted entries; nothing to do */
+		close_softrefs_file(&unsorted);
+		return 0;
+	}
+	open_softrefs_file(sorted_filename, &sorted);
+
+	/* Sort the unsorted entries */
+	to_insert = xmalloc(sizeof(struct softrefs_entry *) * unsorted.data_len);
+	for (i = 0; i < unsorted.data_len; ++i)
+		to_insert[i] = &(unsorted.data[i]);
+	qsort(to_insert, unsorted.data_len, sizeof(struct softrefs_entry *),
+			softrefs_entry_qsort_cmp);
+
+	/* Create result file */
+	result_filename = xmalloc(strlen(sorted_filename) + 5); /* ".new" + NUL */
+	sprintf(result_filename, "%s.new", sorted_filename);
+	result_fd = open(result_filename, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0666);
+	if (result_fd < 0) {
+		ret = error("Failed to open merge result file %s: %s",
+				result_filename, strerror(errno));
+		goto done;
+	}
+
+	i = 0; /* index into to_insert (the sorted version of unsorted.data) */
+	j = 0; /* index into sorted.data */
+	while (!ret && (i < unsorted.data_len || j < sorted.data_len)) {
+		/* there are still entries in either list */
+		struct softrefs_entry *cur;
+		unsigned char from_sha1[20], to_sha1[20];
+		if (i < unsorted.data_len && j < sorted.data_len) {
+			/* there are still entries in _both_ lists */
+			/* choose "lowest" entry from either list */
+			if (softrefs_entry_cmp(to_insert[i], &(sorted.data[j])) < 0)
+				cur = to_insert[i++];
+			else
+				cur = &(sorted.data[j++]);
+		}
+		else if (i < unsorted.data_len) /* entries left in to_insert */
+			cur = to_insert[i++];
+		else /* entries left in sorted.data */
+			cur = &(sorted.data[j++]);
+
+		if (prev && !softrefs_entry_cmp(prev, cur))
+			continue; /* skip writing if prev == cur */
+		prev = cur;
+
+		/* skip writing if softref involves a non-existing object */
+		if (get_sha1_hex(cur->from_sha1_hex, from_sha1) ||
+			!has_sha1_file(from_sha1) ||
+		    get_sha1_hex(cur->to_sha1_hex,     to_sha1) ||
+			!has_sha1_file(  to_sha1))
+		{
+			continue;
+		}
+
+		ret = write_entry(result_fd, cur);
+	}
+
+done:
+	if (result_fd >= 0 && close(result_fd))
+		ret = error("Failed to close merge result file %s: %s",
+				result_filename, strerror(errno));
+	close_softrefs_file(&sorted);
+	close_softrefs_file(&unsorted);
+	if (ret) { /* Failure. Delete result_filename */
+		if (result_filename && unlink(result_filename))
+			error("Failed to remove merge result file %s: %s",
+					result_filename, strerror(errno));
+	}
+	else { /* Success. Replace sorted_filename with result_filename */
+		if (rename(result_filename, sorted_filename))
+			ret = error("Failed to replace sorted softrefs file %s: %s",
+					sorted_filename, strerror(errno));
+	}
+	return ret;
+}
+
+/*
+ * Merge the sorted softref entries in 'from_file' into 'to_file'
+ *
+ * Returns 0 on success; non-zero on failure.
+ *
+ * If to_file does not exist, from_file will be copied into to_file.
+ * If from_file does not exist, this function will do nothing and return 0.
+ */
+static int merge_sorted_into_sorted(const char *from_file, const char *to_file)
+{
+	struct softrefs_file file1, file2;
+	char *result_filename = 0;
+	int result_fd = -1;
+	int ret = 0;
+	unsigned long i, j;
+	/* keep track of last processed entry, to remove duplicates */
+	struct softrefs_entry *prev = NULL;
+
+	/* Open input files */
+	deinit_softrefs_access();
+	open_softrefs_file(from_file, &file1);
+	if (!file1.data_len) { /* no entries; nothing to do */
+		close_softrefs_file(&file1);
+		return 0;
+	}
+	open_softrefs_file(to_file, &file2);
+
+	/* Create result file */
+	result_filename = xmalloc(strlen(to_file) + 5); /* ".new" + NUL */
+	sprintf(result_filename, "%s.new", to_file);
+	result_fd = open(result_filename, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0666);
+	if (result_fd < 0) {
+		ret = error("Failed to open merge result file %s: %s",
+				result_filename, strerror(errno));
+		goto done;
+	}
+
+	i = 0; /* index into file1.data */
+	j = 0; /* index into file2.data */
+	while (!ret && (i < file1.data_len || j < file2.data_len)) {
+		/* there are still entries in either list */
+		struct softrefs_entry *cur;
+		unsigned char from_sha1[20], to_sha1[20];
+		if (i < file1.data_len && j < file2.data_len) {
+			/* there are still entries in _both_ lists */
+			/* choose "lowest" entry from either list */
+			if (softrefs_entry_cmp(&(file1.data[i]), &(file2.data[j])) < 0)
+				cur = &(file1.data[i++]);
+			else
+				cur = &(file2.data[j++]);
+		}
+		else if (i < file1.data_len) /* entries left in file1.data */
+			cur = &(file1.data[i++]);
+		else                         /* entries left in file2.data */
+			cur = &(file2.data[j++]);
+
+		if (prev && !softrefs_entry_cmp(prev, cur))
+			continue; /* skip writing if cur and prev are duplicates */
+		prev = cur;
+
+		/* skip writing if softref involves a non-existing object */
+		if (get_sha1_hex(cur->from_sha1_hex, from_sha1) ||
+			!has_sha1_file(from_sha1) ||
+		    get_sha1_hex(cur->to_sha1_hex,     to_sha1) ||
+			!has_sha1_file(  to_sha1))
+		{
+			continue;
+		}
+
+		ret = write_entry(result_fd, cur);
+	}
+
+done:
+	if (result_fd >= 0 && close(result_fd))
+		ret = error("Failed to close merge result file %s: %s",
+				result_filename, strerror(errno));
+	close_softrefs_file(&file2);
+	close_softrefs_file(&file1);
+	if (ret) { /* Failure. Delete result_filename */
+		if (result_filename && unlink(result_filename))
+			error("Failed to remove merge result file %s: %s",
+					result_filename, strerror(errno));
+	}
+	else { /* Success. Replace to_file with result_filename */
+		if (rename(result_filename, to_file))
+			ret = error("Failed to replace sorted softrefs file %s: %s",
+					to_file, strerror(errno));
+	}
+	return ret;
+}
+
+/* See softrefs.h for documentation */
+int add_softrefs(const struct softref_list *list)
+{
+	struct softrefs_entry entry;
+	int fd;
+	struct stat st;
+	int ret = 0;
+
+	/* Close internal softrefs file, if initialized. */
+	deinit_softrefs_access();
+
+	/* FIXME: File locking!? */
+	fd = open(git_path(UNSORTED_FILENAME), O_WRONLY|O_APPEND|O_CREAT, 0666);
+	if (fd < 0)
+		return error("Failed to open softrefs file %s: %s",
+				git_path(UNSORTED_FILENAME), strerror(errno));
+	if (fstat(fd, &st))
+		return error("Failed to fstat softrefs file %s: %s",
+				git_path(UNSORTED_FILENAME), strerror(errno));
+	if (st.st_size % sizeof(struct softrefs_entry))
+		return error("Refusing to edit softrefs file %s: file does not contain a whole number of softref entries",
+				git_path(UNSORTED_FILENAME));
+
+	/* File is open; start writing entries */
+	while (list) {
+		if (!hashcmp(list->from_sha1, list->to_sha1)) {
+			/* self-reference: from_sha1 == to_sha1 */
+			error("Cannot add self-reference (%s -> %s)",
+					sha1_to_hex(list->from_sha1),
+					sha1_to_hex(list->to_sha1));
+		}
+		else if (has_softref(list->from_sha1, list->to_sha1)) {
+			/* softref exists already */
+			/* nada */;
+		}
+		else {  /* softref is ok */
+			strcpy(entry.from_sha1_hex, sha1_to_hex(list->from_sha1));
+			strcpy(entry.to_sha1_hex, sha1_to_hex(list->to_sha1));
+			entry.space = ' ';
+			entry.lf = '\n';
+			if (write_entry(fd, &entry))
+				error("Failed to write entry to softrefs file %s: %s",
+						git_path(UNSORTED_FILENAME),
+						strerror(errno));
+			else /* write_entry() succeeded */
+				ret++;
+		}
+		list = list->next;
+	}
+
+	/* finished writing entries */
+	if (close(fd))
+		return error("Failed to close softrefs file %s: %s",
+				git_path(UNSORTED_FILENAME), strerror(errno));
+
+	merge_unsorted_softrefs(NULL, 0);
+	return ret;
+}
+
+/* See softrefs.h for documentation */
+int add_softref(const unsigned char *from_sha1, const unsigned char *to_sha1)
+{
+	struct softref_list l;
+	int ret;
+
+	if (!hashcmp(from_sha1, to_sha1))
+		return error("Cannot add self-reference (%s -> %s)",
+			sha1_to_hex(from_sha1), sha1_to_hex(to_sha1));
+
+	hashcpy(l.from_sha1, from_sha1);
+	hashcpy(l.to_sha1, to_sha1);
+	l.next = NULL;
+	ret = add_softrefs(&l);
+	switch (ret) {
+		case 0:  return 1;
+		case 1:  return 0;
+		default: return -1;
+	}
+}
+
+/* See softrefs.h for documentation */
+int merge_unsorted_softrefs(const char *unsorted_file, int force)
+{
+	struct stat st;
+	int num_entries;
+	int delete_file = 0; /* set to true to delete unsorted_file afterwards */
+	int ret = 0;
+
+	if (unsorted_file == NULL) { /* use UNSORTED_FILENAME */
+		unsorted_file = git_path(UNSORTED_FILENAME);
+		delete_file = 1;
+		if (access(unsorted_file, F_OK))
+			/* UNSORTED_FILENAME doesn't exist; nothing to do */
+			return 0;
+	}
+	else {
+		force = 1; /* no threshold on merging external file */
+		if (access(unsorted_file, F_OK))
+			/* external unsorted file doesn't exist; failure */
+			return error("Failed to access softrefs file %s: %s",
+					unsorted_file, strerror(errno));
+	}
+
+	if (stat(unsorted_file, &st))
+		return error("Failed to stat() softrefs file %s: %s",
+				unsorted_file, strerror(errno));
+	if (st.st_size % sizeof(struct softrefs_entry))
+		return error("Corrupt softrefs file %s: Aborting",
+				unsorted_file);
+	if (st.st_size == 0) /* file is empty; nothing to do */
+		return 0;
+	num_entries = st.st_size / sizeof(struct softrefs_entry);
+	if (force || num_entries > MAX_UNSORTED_ENTRIES) { /* do it */
+		ret = merge_unsorted_into_sorted(
+				unsorted_file, git_path(SORTED_FILENAME));
+		if (!ret && delete_file && unlink(unsorted_file))
+			error("Failed to remove unsorted softrefs file %s: %s",
+					unsorted_file, strerror(errno));
+	}
+	return ret;
+}
+
+/* See softrefs.h for documentation */
+int merge_sorted_softrefs(const char *sorted_file)
+{
+	struct stat st;
+	if (access(sorted_file, F_OK)) /* external file doesn't exist; FAIL */
+		return error("Failed to access softrefs file %s: %s",
+				sorted_file, strerror(errno));
+	if (stat(sorted_file, &st))
+		return error("Failed to stat() softrefs file %s: %s",
+				sorted_file, strerror(errno));
+	if (st.st_size % sizeof(struct softrefs_entry))
+		return error("Corrupt softrefs file %s: Aborting", sorted_file);
+	if (st.st_size == 0) /* file is empty; nothing to do */
+		return 0;
+	return merge_sorted_into_sorted(sorted_file, git_path(SORTED_FILENAME));
+}
-- 
1.5.2.1.144.gabc40

^ permalink raw reply related	[flat|nested] 52+ messages in thread
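The deduplicating two-way merge in merge_unsorted_into_sorted() and merge_sorted_into_sorted() above can be isolated in a minimal, self-contained sketch. It runs over in-memory arrays instead of files. The struct mirrors the 82-byte "<from-sha1> <to-sha1>\n" line format. All names here (struct entry, entry_cmp(), fill_entry(), merge_dedup()) are hypothetical. The sketch assumes that entries are ordered by from-sha1 and then by to-sha1; this stands in for softrefs_entry_cmp(), whose definition is not shown in this excerpt. The check that both objects exist in the object database is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEX_LEN 40

/* Hypothetical in-memory mirror of one softrefs line (82 bytes). */
struct entry {
	char from_hex[HEX_LEN];
	char space;
	char to_hex[HEX_LEN];
	char lf;
};

/* Assumed ordering: by from-sha1 first, then by to-sha1. */
static int entry_cmp(const struct entry *a, const struct entry *b)
{
	int c = memcmp(a->from_hex, b->from_hex, HEX_LEN);
	return c ? c : memcmp(a->to_hex, b->to_hex, HEX_LEN);
}

/* Build an entry whose hex fields are all 'f' / all 't' (example data only). */
static void fill_entry(struct entry *e, char f, char t)
{
	memset(e->from_hex, f, HEX_LEN);
	e->space = ' ';
	memset(e->to_hex, t, HEX_LEN);
	e->lf = '\n';
}

/*
 * Merge two sorted arrays into 'out' (room for na + nb entries),
 * dropping duplicates. Returns the number of entries written.
 */
static size_t merge_dedup(const struct entry *a, size_t na,
			  const struct entry *b, size_t nb,
			  struct entry *out)
{
	size_t i = 0, j = 0, n = 0;
	const struct entry *prev = NULL;

	assert(out != NULL);
	while (i < na || j < nb) {
		const struct entry *cur;
		if (i < na && j < nb) /* pick the "lowest" head entry */
			cur = (entry_cmp(&a[i], &b[j]) < 0) ? &a[i++] : &b[j++];
		else if (i < na)
			cur = &a[i++];
		else
			cur = &b[j++];

		if (prev && !entry_cmp(prev, cur))
			continue; /* same as the last entry written */
		prev = cur;
		out[n++] = *cur;
	}
	return n;
}
```

Because both inputs are sorted, duplicates are always adjacent in the merged stream. Comparing each entry against the last one written is therefore enough to drop them, which is the same reasoning the patch's `prev` pointer relies on.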

* [PATCH 3/7] Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
  2007-06-09 18:21   ` [PATCH 1/7] Softrefs: Add softrefs header file with API documentation Johan Herland
  2007-06-09 18:22   ` [PATCH 2/7] Softrefs: Add implementation of softrefs API Johan Herland
@ 2007-06-09 18:22   ` Johan Herland
  2007-06-09 18:23   ` [PATCH 4/7] Softrefs: Add manual page documenting git-softref and softrefs subsystem in general Johan Herland
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

git-softref is meant to be used from shell scripts that need to interact
with the softrefs database. The builtin command provides most of the
functionality present in the softrefs C API.

Documentation to follow in a subsequent patch.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-softref.c |  167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 167 insertions(+), 0 deletions(-)
 create mode 100644 builtin-softref.c

diff --git a/builtin-softref.c b/builtin-softref.c
new file mode 100644
index 0000000..f95db4e
--- /dev/null
+++ b/builtin-softref.c
@@ -0,0 +1,167 @@
+/*
+ * git softref builtin command
+ *
+ * Add, list and administer soft references (softrefs)
+ *
+ * Copyright (c) 2007 Johan Herland
+ */
+
+#include "cache.h"
+#include "tag.h"
+#include "refs.h"
+#include "softrefs.h"
+
+static const char builtin_softref_usage[] =
+	"git-softref [ --list [<from-object>]"
+		" | --has <from-object> <to-object>"
+		" | --add <from-object> <to-object>"
+		" | --rebuild-tags"
+		" | --merge-unsorted [<softrefs-file>]"
+		" | --merge-sorted <softrefs-file> ]";
+
+static int list_helper(
+		const unsigned char *from_sha1,
+		const unsigned char *to_sha1,
+		void *cb_data)
+{
+	printf("%s %s\n", sha1_to_hex(from_sha1), sha1_to_hex(to_sha1));
+	return 0;
+}
+
+static int rebuild_tags_helper(
+		const char *refname,
+		const unsigned char *sha1,
+		int flags,
+		void *cb_data)
+{
+	struct object *o = parse_object(sha1);
+	if (o && o->type == OBJ_TAG && ((struct tag *) o)->tagged) {
+		struct tag *t = (struct tag *) o;
+		struct softref_list **prev = (struct softref_list **) cb_data;
+		struct softref_list *current = xmalloc(sizeof(struct softref_list));
+		current->next = *prev;
+		hashcpy(current->from_sha1, t->tagged->sha1);
+		hashcpy(current->to_sha1,   t->object.sha1);
+		*prev = current;
+	}
+	return 0;
+}
+
+int cmd_softref(int argc, const char **argv, const char *prefix)
+{
+	int i;
+	int show_usage = 0, list = 0, has = 0, add = 0, rebuild_tags = 0,
+	    merge_unsorted = 0, merge_sorted = 0;
+	const char *from_name = NULL, *to_name = NULL, *softrefs_file = NULL;
+	unsigned char from_sha1[20], to_sha1[20];
+
+	git_config(git_default_config);
+
+	for (i = 1; i < argc; i++) {
+		const char *arg = argv[i];
+		if (!strcmp(arg, "--list")) {
+			list = 1;
+			if (i + 1 < argc) /* <from-object> given */
+				from_name = argv[++i];
+		}
+		else if (!strcmp(arg, "--has")) {
+			has = 1;
+			if (i + 2 >= argc)
+				show_usage = error("--has needs two arguments: <from-object> and <to-object>");
+			else {
+				from_name = argv[++i];
+				to_name = argv[++i];
+			}
+		}
+		else if (!strcmp(arg, "--add")) {
+			add = 1;
+			if (i + 2 >= argc)
+				show_usage = error("--add needs two arguments: <from-object> and <to-object>");
+			else {
+				from_name = argv[++i];
+				to_name = argv[++i];
+			}
+		}
+		else if (!strcmp(arg, "--rebuild-tags"))
+			rebuild_tags = 1;
+		else if (!strcmp(arg, "--merge-unsorted")) {
+			merge_unsorted = 1;
+			if (i + 1 < argc) /* <softrefs-file> given */
+				softrefs_file = argv[++i];
+		}
+		else if (!strcmp(arg, "--merge-sorted")) {
+			merge_sorted = 1;
+			if (i + 1 >= argc)
+				show_usage = error("--merge-sorted needs one argument: <softrefs-file>");
+			else
+				softrefs_file = argv[++i];
+		}
+		else
+			show_usage = error("Unknown argument '%s'", arg);
+	}
+
+	/* default to --list if no command given; fail if more than one */
+	switch (list + has + add + rebuild_tags + merge_unsorted + merge_sorted) {
+		case 0: list = 1; break;
+		case 1: break;
+		default: show_usage = 1;
+	}
+	if (show_usage)
+		usage(builtin_softref_usage);
+
+	if (list) {
+		if (from_name) { /* show from_name's softrefs */
+			if (get_sha1(from_name, from_sha1))
+				die("Not a valid object name %s", from_name);
+			if (for_each_softref_with_from(from_sha1, list_helper, 0))
+				die("Error encountered while listing softrefs");
+		}
+		else if (for_each_softref(list_helper, 0)) /* show all softrefs */
+			die("Error encountered while listing softrefs");
+	}
+	else if (has) {
+		if (get_sha1(from_name, from_sha1) || !has_sha1_file(from_sha1))
+			die("Not a valid object name %s", from_name);
+		if (get_sha1(to_name, to_sha1) || !has_sha1_file(to_sha1))
+			die("Not a valid object name %s", to_name);
+		return has_softref(from_sha1, to_sha1);
+	}
+	else if (add) {
+		if (get_sha1(from_name, from_sha1) || !has_sha1_file(from_sha1))
+			die("Not a valid object name %s", from_name);
+		if (get_sha1(to_name, to_sha1) || !has_sha1_file(to_sha1))
+			die("Not a valid object name %s", to_name);
+		if (add_softref(from_sha1, to_sha1) < 0)
+			die("Failed to create softref from %s to %s",
+				from_name, to_name);
+	}
+	else if (rebuild_tags) {
+		/*
+		 * Find all tag objects, and add their corresponding softrefs
+		 *
+		 * For now, we'll have to settle for referenced tag objects as
+		 * it seems to be non-trivial to find _all_ the tag objects in
+		 * the db (including unreachables).
+		 */
+		struct softref_list *to_add = NULL;
+		struct softref_list **p = &to_add;
+		int ret;
+		if (for_each_tag_ref(rebuild_tags_helper, (void *) p)) {
+			delete_softref_list(to_add);
+			die("Failed to find tag objects");
+		}
+		ret = add_softrefs(to_add);
+		delete_softref_list(to_add);
+		if (ret < 0)
+			die("Failed to add softrefs for tag objects");
+		printf("Added %i missing softrefs for tag objects.\n", ret);
+	}
+	else if (merge_unsorted) {
+		return merge_unsorted_softrefs(softrefs_file, 1);
+	}
+	else if (merge_sorted) {
+		return merge_sorted_softrefs(softrefs_file);
+	}
+
+	return 0;
+}
-- 
1.5.2.1.144.gabc40

^ permalink raw reply related	[flat|nested] 52+ messages in thread
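rebuild_tags_helper() collects tag softrefs by treating cb_data as a pointer to the head of a singly linked list. It prepends one node per callback, so the list ends up in reverse iteration order. The sketch below isolates that pattern. struct pair_list, collect_pair(), free_pair_list() and pair_list_length() are hypothetical stand-ins; struct pair_list takes the place of struct softref_list. Unlike the helper in the patch, this version reports allocation failure instead of relying on xmalloc() to die.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct softref_list: one (from, to) pair. */
struct pair_list {
	unsigned char from_sha1[20];
	unsigned char to_sha1[20];
	struct pair_list *next;
};

/*
 * Callback in the style of rebuild_tags_helper(): cb_data points at the
 * list head, and each call prepends a new node. Returns -1 on allocation
 * failure so that an iterator could stop early.
 */
static int collect_pair(const unsigned char *from, const unsigned char *to,
			void *cb_data)
{
	struct pair_list **head = cb_data;
	struct pair_list *node;

	assert(head != NULL);
	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	memcpy(node->from_sha1, from, 20);
	memcpy(node->to_sha1, to, 20);
	node->next = *head; /* prepend: newest node becomes the head */
	*head = node;
	return 0;
}

/* Free every node (the role delete_softref_list() plays in the patch). */
static void free_pair_list(struct pair_list *list)
{
	while (list) {
		struct pair_list *next = list->next;
		free(list);
		list = next;
	}
}

static size_t pair_list_length(const struct pair_list *list)
{
	size_t n = 0;
	for (; list; list = list->next)
		n++;
	return n;
}
```

The reversed order does no harm in the patch. add_softrefs() appends each list entry to the unsorted softrefs file, which makes no ordering promise anyway.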

* [PATCH 4/7] Softrefs: Add manual page documenting git-softref and softrefs subsystem in general
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
                     ` (2 preceding siblings ...)
  2007-06-09 18:22   ` [PATCH 3/7] Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs Johan Herland
@ 2007-06-09 18:23   ` Johan Herland
  2007-06-09 18:23   ` [PATCH 5/7] Softrefs: Add testcases for basic softrefs behaviour Johan Herland
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-softref.txt |  119 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 119 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-softref.txt

diff --git a/Documentation/git-softref.txt b/Documentation/git-softref.txt
new file mode 100644
index 0000000..6a3e13b
--- /dev/null
+++ b/Documentation/git-softref.txt
@@ -0,0 +1,119 @@
+git-softref(1)
+==============
+
+NAME
+----
+git-softref - Create, list and administer soft references
+
+
+SYNOPSIS
+--------
+[verse]
+'git-softref' --list [<from-object>]
+'git-softref' --has <from-object> <to-object>
+'git-softref' --add <from-object> <to-object>
+'git-softref' --rebuild-tags
+'git-softref' --merge-unsorted [<softrefs-file>]
+'git-softref' --merge-sorted <softrefs-file>
+
+
+DESCRIPTION
+-----------
+Query and administer soft references in a git repository.
+
+Soft references are used to declare reachability between already existing
+objects. An object (called the 'to-object') may be declared reachable from
+another object (the 'from-object') without affecting the immutability of either
+object.
+
+The `--list` option will list existing softrefs in the database. If given the
+optional <from-object>, the list is limited to softrefs from the given object.
+The `--has` option checks whether a softref exists between two given objects;
+similarly, the `--add` option adds such a softref.
+
+The `--rebuild-tags` option generates softrefs for all tag objects in the
+repository that are reachable from tag refs. Tag objects use softrefs to declare
+reachability 'from' the tagged object 'to' the tag object. This allows tag
+objects to be cloned/fetched/pushed along with their associated objects.
+
+Finally, the `--merge-unsorted` and `--merge-sorted` options merge softrefs
+files into the sorted softrefs db. The filename argument must name an existing
+file in unsorted or sorted softrefs format, respectively; its entries are
+merged into the sorted softrefs db. The `--merge-unsorted` option may also be
+used 'without' a filename, in which case the unsorted portion of the softrefs
+db is merged into the sorted db. This last operation is also performed
+automatically when adding softrefs, once the unsorted db exceeds a certain
+number of entries, so there is no need to invoke it during regular use.
+
+
+OPTIONS
+-------
+--list [<from-object>]::
+	List all softrefs whose 'from' object is the given '<from-object>'.
+	If '<from-object>' is not given, list 'all' softrefs in the repository.
+
+--has <from-object> <to-object>::
+	Exit with status 1 if the given softref exists in the repository,
+	and with status 0 otherwise (the reverse of the usual shell convention).
+
+--add <from-object> <to-object>::
+	Add a softref from the given '<from-object>' to the given '<to-object>'.
+	The '<to-object>' will from now on be considered reachable from the
+	'<from-object>'.
+
+--rebuild-tags::
+	Automatically generate softrefs for all tag objects reachable from
+	tag refs in the repository.
+
+--merge-unsorted [<softrefs-file>]::
+	Merge the softrefs in the given unsorted '<softrefs-file>' into the
+	sorted softrefs db. If a filename is not given, force a merge of the
+	internal unsorted softrefs store into the sorted softrefs db.
+
+--merge-sorted <softrefs-file>::
+	Merge the softrefs in the given sorted '<softrefs-file>' into the
+	sorted softrefs db.
+
+
+DISCUSSION
+----------
+Soft references (softrefs) are a general mechanism for declaring a relationship
+between two arbitrary, already existing objects in the repo. Softrefs differ
+from the regular reachability relationship in that a softref may be created
+after 'both' of the involved objects have been added to the repo. In contrast,
+regular reachability depends on the reachable object's name being stored
+'inside' the other object, so such a relationship cannot be added at a later
+time without violating the immutability of git objects.
+
+Softrefs are defined as going 'from' one object 'to' another object. Once
+a softref between two objects has been created, the "to" object is considered
+reachable from the "from" object.
+
+Also, softrefs are stored in a way that makes it easy and quick to find all
+the "to" objects reachable from a given "from" object.
+
+The softrefs db consists of two files: `.git/softrefs.unsorted` and
+`.git/softrefs.sorted`. Both files use the same format: one softref per line,
+of the form "`<from-sha1> <to-sha1>\n`". Each sha1 is written as 40 hex
+characters, so with the space and the newline every entry is exactly 82 bytes.
+
+The entries in `.git/softrefs.sorted` are sorted on `<from-sha1>`, in order to
+make lookup fast. This file is also known as the "sorted softrefs db".
+
+The entries in `.git/softrefs.unsorted` are 'not' sorted. This is to make
+insertion fast. This file is also known as the "unsorted softrefs db".
+
+When softrefs are created, they are appended to `.git/softrefs.unsorted`.
+When `.git/softrefs.unsorted` reaches a certain number of entries, all the
+entries in `.git/softrefs.unsorted` are automatically merged into
+`.git/softrefs.sorted`.
+
+
+Author
+------
+Written by Johan Herland <johan@herland.net>.
+
+
+GIT
+---
+Part of the gitlink:git[7] suite
-- 
1.5.2.1.144.gabc40

^ permalink raw reply related	[flat|nested] 52+ messages in thread
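The manual page says the sorted db is ordered on <from-sha1> to make lookup fast. A minimal lookup sketch follows. It assumes the file contents are already in memory (read or mmap'ed) as fixed 82-byte records, and it runs a lower-bound binary search on the first 40 bytes of each record. find_first_from() is a hypothetical name, not part of the softrefs API. A caller would scan forward from the returned index for as long as the from-field still matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEX_LEN 40
#define ENTRY_LEN (HEX_LEN + 1 + HEX_LEN + 1) /* "<from> <to>\n" = 82 bytes */

/*
 * Return the index of the first record in 'buf' whose <from-sha1> equals
 * from_hex, or -1 if there is none. 'buf' holds 'len' bytes, assumed to
 * be whole ENTRY_LEN-byte records sorted by <from-sha1>.
 */
static long find_first_from(const char *buf, size_t len, const char *from_hex)
{
	size_t nr = len / ENTRY_LEN;
	size_t lo = 0, hi = nr;

	assert(len % ENTRY_LEN == 0); /* mirrors the "whole entries" check */

	/* lower bound: first record whose from-field is >= from_hex */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (memcmp(buf + mid * ENTRY_LEN, from_hex, HEX_LEN) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < nr && !memcmp(buf + lo * ENTRY_LEN, from_hex, HEX_LEN))
		return (long)lo;
	return -1;
}
```

Because every record has the same length, record k starts at byte k * 82. The search therefore needs no line parsing. This is also why the unsorted file can only be appended to cheaply, not searched this way.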

* [PATCH 5/7] Softrefs: Add testcases for basic softrefs behaviour
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
                     ` (3 preceding siblings ...)
  2007-06-09 18:23   ` [PATCH 4/7] Softrefs: Add manual page documenting git-softref and softrefs subsystem in general Johan Herland
@ 2007-06-09 18:23   ` Johan Herland
  2007-06-09 18:24   ` [PATCH 6/7] Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin Johan Herland
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

Adds testing of the basic options available to the git-softref command.

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3050-softrefs.sh |  314 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 314 insertions(+), 0 deletions(-)
 create mode 100755 t/t3050-softrefs.sh

diff --git a/t/t3050-softrefs.sh b/t/t3050-softrefs.sh
new file mode 100755
index 0000000..a925178
--- /dev/null
+++ b/t/t3050-softrefs.sh
@@ -0,0 +1,314 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johan Herland
+#
+
+test_description='Basic functionality of soft references'
+. ./test-lib.sh
+
+
+# Prepare repo and create some tagged commits
+
+test_expect_success 'Populating repo with test data' '
+	echo "foo" > foo &&
+	git-add foo &&
+	test_tick &&
+	git-commit -m "Initial commit" &&
+	git-tag -a -m "Tagging initial commit" footag &&
+	echo "bar" >> foo &&
+	test_tick &&
+	git-commit -m "Second commit" foo &&
+	git-tag -a -m "Tagging second commit" bartag
+'
+
+# At this point we should have:
+# - commit @ 301711b66fe71164f646b798706a2c1f7024da8d ("Initial commit")
+#    - tag @ ad60bc179c6874af6d97f181c67f11adcca5122b ("footag")
+# - commit @ 9671cbee7ad26528645b2665c8f74d39a6288864 ("Second commit")
+#    - tag @ a927fc832d42f1f64d8318e8acec43545d9562de ("bartag")
+# - The tag creation should also have created softrefs:
+#   - From "Initial commit" to "footag"
+#   - From  "Second commit" to "bartag"
+
+# Testing git-softref --list
+
+test_expect_success 'Testing git-softref --list on initial test data (1)' '
+	cat > expected_output << EOF &&
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	git-softref > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list > actual_output 2>&1 &&
+	cmp actual_output expected_output
+'
+
+test_expect_success 'Testing git-softref --list on initial test data (2)' '
+	cat > expected_output << EOF &&
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	git-softref --list 9671cbee7ad26528645b2665c8f74d39a6288864 > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list HEAD > actual_output 2>&1 &&
+	cmp actual_output expected_output
+'
+
+test_expect_success 'Testing git-softref --list on initial test data (3)' '
+	cat > expected_output << EOF &&
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+EOF
+	git-softref --list 301711b66fe71164f646b798706a2c1f7024da8d > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list HEAD^ > actual_output 2>&1 &&
+	cmp actual_output expected_output
+'
+
+test_expect_success 'Testing git-softref --list on initial test data (4)' '
+	cat > expected_output << EOF &&
+EOF
+	git-softref --list footag > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list bartag > actual_output 2>&1 &&
+	cmp actual_output expected_output
+'
+
+# Testing git-softref --has
+
+test_expect_success 'Testing git-softref --has on initial test data' '
+	(git-softref --has 301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b;
+	test "$?" = "1") &&
+	(git-softref --has HEAD^ footag;
+	test "$?" = "1") &&
+	(git-softref --has footag^{} footag;
+	test "$?" = "1") &&
+	(git-softref --has 9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de;
+	test "$?" = "1") &&
+	(git-softref --has HEAD bartag;
+	test "$?" = "1") &&
+	(git-softref --has bartag^{} bartag;
+	test "$?" = "1") &&
+	(git-softref --has ad60bc179c6874af6d97f181c67f11adcca5122b 301711b66fe71164f646b798706a2c1f7024da8d;
+	test "$?" = "0") &&
+	(git-softref --has a927fc832d42f1f64d8318e8acec43545d9562de 9671cbee7ad26528645b2665c8f74d39a6288864;
+	test "$?" = "0") &&
+	(git-softref --has HEAD HEAD^;
+	test "$?" = "0") &&
+	(git-softref --has HEAD^ HEAD;
+	test "$?" = "0") &&
+	(git-softref --has footag HEAD;
+	test "$?" = "0") &&
+	(git-softref --has bartag HEAD;
+	test "$?" = "0") &&
+	(git-softref --has footag HEAD^;
+	test "$?" = "0") &&
+	(git-softref --has bartag HEAD^;
+	test "$?" = "0") &&
+	(git-softref --has footag bartag;
+	test "$?" = "0") &&
+	(git-softref --has bartag footag;
+	test "$?" = "0")
+'
+
+# Testing git-softref --rebuild-tags
+
+test_expect_success 'Testing git-softref --rebuild-tags on initial test data' '
+	cat > expected_output << EOF &&
+Added 0 missing softrefs for tag objects.
+EOF
+	cat .git/softrefs.* | sort > expected_softrefs &&
+	git-softref --rebuild-tags > actual_output 2>&1 &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs
+'
+
+# Testing git-softref --add
+
+test_expect_success 'Testing git-softref --add with existing softref' '
+	cat > expected_output << EOF &&
+EOF
+	cat .git/softrefs.* | sort > expected_softrefs &&
+	git-softref --add HEAD bartag > actual_output 2>&1 &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs
+'
+
+test_expect_success 'Testing git-softref --add with self-referential softref' '
+	cat > expected_output << EOF &&
+error: Cannot add self-reference (9671cbee7ad26528645b2665c8f74d39a6288864 -> 9671cbee7ad26528645b2665c8f74d39a6288864)
+fatal: Failed to create softref from HEAD to HEAD
+EOF
+	cat .git/softrefs.* | sort > expected_softrefs &&
+	(git-softref --add HEAD HEAD > actual_output 2>&1; test "$?" != "0") &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs
+'
+
+test_expect_success 'Testing git-softref --add with non-existing objects (1)' '
+	cat > expected_output << EOF &&
+fatal: Not a valid object name 1234567890123456789012345678901234567890
+EOF
+	cat .git/softrefs.* | sort > expected_softrefs &&
+	(git-softref --add 1234567890123456789012345678901234567890 HEAD > actual_output 2>&1;
+		test "$?" != "0") &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs &&
+	(git-softref --add HEAD 1234567890123456789012345678901234567890 > actual_output 2>&1;
+		test "$?" != "0") &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs
+'
+
+test_expect_success 'Testing git-softref --add with non-existing objects (2)' '
+	cat > expected_output << EOF &&
+fatal: Not a valid object name HEAD^^^
+EOF
+	cat .git/softrefs.* | sort > expected_softrefs &&
+	(git-softref --add HEAD^^^ HEAD > actual_output 2>&1; test "$?" != "0") &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs
+'
+
+test_expect_success 'Testing git-softref --add with valid arguments (1)' '
+	cat > expected_output << EOF &&
+EOF
+	cat > new_softref << EOF &&
+301711b66fe71164f646b798706a2c1f7024da8d 9671cbee7ad26528645b2665c8f74d39a6288864
+EOF
+	cat .git/softrefs.* new_softref | sort > expected_softrefs &&
+	git-softref --add HEAD^ HEAD > actual_output 2>&1 &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs
+'
+
+test_expect_success 'Testing git-softref --add with valid arguments (2)' '
+	cat > expected_output << EOF &&
+EOF
+	cat > new_softref << EOF &&
+ad60bc179c6874af6d97f181c67f11adcca5122b a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	cat .git/softrefs.* new_softref | sort > expected_softrefs &&
+	git-softref --add footag bartag > actual_output 2>&1 &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs
+'
+
+# Removing softrefs
+
+test_expect_success 'Removing all softrefs' '
+	rm .git/softrefs.*
+'
+
+# Testing git-softref --list and --has
+
+test_expect_success 'Testing git-softref with no softrefs' '
+	cat > expected_output << EOF &&
+EOF
+	git-softref > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list 9671cbee7ad26528645b2665c8f74d39a6288864 > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list HEAD > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list HEAD^ > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list footag > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list bartag > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --has HEAD bartag > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --has HEAD^ footag > actual_output 2>&1 &&
+	cmp actual_output expected_output
+'
+
+# Testing git-softref --rebuild-tags
+# (Should recreate missing softrefs for tag objects reachable from 'refs/tags')
+
+test_expect_success 'Testing git-softref --rebuild-tags to rebuild missing tag softrefs' '
+	cat > expected_output << EOF &&
+Added 2 missing softrefs for tag objects.
+EOF
+	cat > new_softref << EOF &&
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	cat .git/softrefs.* new_softref | sort > expected_softrefs &&
+	git-softref --rebuild-tags > actual_output 2>&1 &&
+	cat .git/softrefs.* | sort > actual_softrefs &&
+	cmp actual_output   expected_output &&
+	cmp actual_softrefs expected_softrefs
+'
+
+# Testing git-softref --merge-unsorted
+
+test_expect_success 'Testing git-softref --merge-unsorted' '
+	cat > expected_output << EOF &&
+EOF
+	rm -f .git/softrefs* &&
+	cat > .git/softrefs.unsorted << EOF &&
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+EOF
+	cat > expected_softrefs << EOF
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	git-softref --merge-unsorted > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	cmp .git/softrefs.sorted expected_softrefs &&
+	test ! -e .git/softrefs.unsorted
+'
+
+# Testing git-softref --merge-unsorted <filename>
+
+test_expect_success 'Testing git-softref --merge-unsorted <filename>' '
+	cat > expected_output << EOF &&
+EOF
+	rm -f .git/softrefs* &&
+	cat > new_softrefs << EOF &&
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+EOF
+	cat > expected_softrefs << EOF
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	git-softref --merge-unsorted new_softrefs > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	cmp .git/softrefs.sorted expected_softrefs &&
+	test -e new_softrefs
+'
+
+# Testing git-softref --merge-sorted <filename>
+
+test_expect_success 'Testing git-softref --merge-sorted <filename>' '
+	cat > expected_output << EOF &&
+EOF
+	rm -f .git/softrefs* &&
+	cat > new_softrefs << EOF &&
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	cat > expected_softrefs << EOF
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	git-softref --merge-sorted new_softrefs > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	cmp .git/softrefs.sorted expected_softrefs &&
+	test -e new_softrefs
+'
+
+# FIXME: More testing needed on how softrefs interact with the rest of git
+
+test_done
-- 
1.5.2.1.144.gabc40

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 6/7] Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
                     ` (4 preceding siblings ...)
  2007-06-09 18:23   ` [PATCH 5/7] Softrefs: Add testcases for basic softrefs behaviour Johan Herland
@ 2007-06-09 18:24   ` Johan Herland
  2007-06-09 18:24   ` [PATCH 7/7] Teach git-mktag to register softrefs for all tag objects Johan Herland
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

Also cleans up sorting and whitespace in Documentation/cmd-list.perl

Signed-off-by: Johan Herland <johan@herland.net>
---
 .gitignore                  |    1 +
 Documentation/cmd-list.perl |    7 ++++---
 Makefile                    |    6 ++++--
 builtin.h                   |    1 +
 git.c                       |    1 +
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/.gitignore b/.gitignore
index 27e5aeb..7fd6904 100644
--- a/.gitignore
+++ b/.gitignore
@@ -119,6 +119,7 @@ git-show
 git-show-branch
 git-show-index
 git-show-ref
+git-softref
 git-ssh-fetch
 git-ssh-pull
 git-ssh-push
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index a181f75..4e1f45b 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -90,6 +90,7 @@ git-clean                               mainporcelain
 git-clone                               mainporcelain
 git-commit                              mainporcelain
 git-commit-tree                         plumbingmanipulators
+git-config                              ancillarymanipulators
 git-convert-objects                     ancillarymanipulators
 git-count-objects                       ancillaryinterrogators
 git-cvsexportcommit                     foreignscminterface
@@ -101,13 +102,13 @@ git-diff-files                          plumbinginterrogators
 git-diff-index                          plumbinginterrogators
 git-diff                                mainporcelain
 git-diff-tree                           plumbinginterrogators
-git-fast-import				ancillarymanipulators
+git-fast-import                         ancillarymanipulators
 git-fetch                               mainporcelain
 git-fetch-pack                          synchingrepositories
 git-fmt-merge-msg                       purehelpers
 git-for-each-ref                        plumbinginterrogators
 git-format-patch                        mainporcelain
-git-fsck	                        ancillaryinterrogators
+git-fsck                                ancillaryinterrogators
 git-gc                                  mainporcelain
 git-get-tar-commit-id                   ancillaryinterrogators
 git-grep                                mainporcelain
@@ -155,7 +156,6 @@ git-receive-pack                        synchelpers
 git-reflog                              ancillarymanipulators
 git-relink                              ancillarymanipulators
 git-repack                              ancillarymanipulators
-git-config                              ancillarymanipulators
 git-remote                              ancillarymanipulators
 git-request-pull                        foreignscminterface
 git-rerere                              ancillaryinterrogators
@@ -174,6 +174,7 @@ git-show-branch                         ancillaryinterrogators
 git-show-index                          plumbinginterrogators
 git-show-ref                            plumbinginterrogators
 git-sh-setup                            purehelpers
+git-softref                             ancillarymanipulators
 git-ssh-fetch                           synchingrepositories
 git-ssh-upload                          synchingrepositories
 git-status                              mainporcelain
diff --git a/Makefile b/Makefile
index 0f75955..22e3e53 100644
--- a/Makefile
+++ b/Makefile
@@ -296,7 +296,7 @@ LIB_H = \
 	run-command.h strbuf.h tag.h tree.h git-compat-util.h revision.h \
 	tree-walk.h log-tree.h dir.h path-list.h unpack-trees.h builtin.h \
 	utf8.h reflog-walk.h patch-ids.h attr.h decorate.h progress.h \
-	mailmap.h remote.h
+	mailmap.h remote.h softrefs.h
 
 DIFF_OBJS = \
 	diff.o diff-lib.o diffcore-break.o diffcore-order.o \
@@ -318,7 +318,8 @@ LIB_OBJS = \
 	write_or_die.o trace.o list-objects.o grep.o match-trees.o \
 	alloc.o merge-file.o path-list.o help.o unpack-trees.o $(DIFF_OBJS) \
 	color.o wt-status.o archive-zip.o archive-tar.o shallow.o utf8.o \
-	convert.o attr.o decorate.o progress.o mailmap.o symlinks.o remote.o
+	convert.o attr.o decorate.o progress.o mailmap.o symlinks.o remote.o \
+	softrefs.o
 
 BUILTIN_OBJS = \
 	builtin-add.o \
@@ -370,6 +371,7 @@ BUILTIN_OBJS = \
 	builtin-runstatus.o \
 	builtin-shortlog.o \
 	builtin-show-branch.o \
+	builtin-softref.o \
 	builtin-stripspace.o \
 	builtin-symbolic-ref.o \
 	builtin-tar-tree.o \
diff --git a/builtin.h b/builtin.h
index da4834c..beae52c 100644
--- a/builtin.h
+++ b/builtin.h
@@ -67,6 +67,7 @@ extern int cmd_runstatus(int argc, const char **argv, const char *prefix);
 extern int cmd_shortlog(int argc, const char **argv, const char *prefix);
 extern int cmd_show(int argc, const char **argv, const char *prefix);
 extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
+extern int cmd_softref(int argc, const char **argv, const char *prefix);
 extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
 extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
 extern int cmd_tar_tree(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index 29b55a1..96cc0b8 100644
--- a/git.c
+++ b/git.c
@@ -283,6 +283,7 @@ static void handle_internal_command(int argc, const char **argv, char **envp)
 		{ "shortlog", cmd_shortlog, RUN_SETUP | USE_PAGER },
 		{ "show-branch", cmd_show_branch, RUN_SETUP },
 		{ "show", cmd_show, RUN_SETUP | USE_PAGER },
+		{ "softref", cmd_softref, RUN_SETUP },
 		{ "stripspace", cmd_stripspace },
 		{ "symbolic-ref", cmd_symbolic_ref, RUN_SETUP },
 		{ "tar-tree", cmd_tar_tree },
-- 
1.5.2.1.144.gabc40

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 7/7] Teach git-mktag to register softrefs for all tag objects
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
                     ` (5 preceding siblings ...)
  2007-06-09 18:24   ` [PATCH 6/7] Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin Johan Herland
@ 2007-06-09 18:24   ` Johan Herland
  2007-06-09 18:25   ` [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry) Johan Herland
  2007-06-09 23:55   ` Comment on weak refs Junio C Hamano
  8 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

For each tag object created, we create a corresponding softref from the
tagged object to the tag object itself. This is needed to enable efficient
lookup of which tag objects point to a given commit/object.
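A minimal sketch of the reverse mapping this creates, assuming an in-memory
array of 20-byte (from, to) pairs. The names struct softref,
make_tag_softref() and tags_pointing_at() are hypothetical and not part of
the patch; the point is only the direction of the mapping (tagged object
as "from", tag object as "to") and how a lookup is keyed on the tagged
object's SHA-1:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory softref: 20-byte "from" and "to" SHA-1s */
struct softref {
	unsigned char from[20];
	unsigned char to[20];
};

/* Record the reverse mapping: tagged object ("from") -> tag object ("to") */
void make_tag_softref(struct softref *out,
		      const unsigned char *tagged_sha1,
		      const unsigned char *tag_sha1)
{
	memcpy(out->from, tagged_sha1, 20);
	memcpy(out->to, tag_sha1, 20);
}

/*
 * Linear scan (as on the unsorted db): store pointers to the "to" SHA-1 of
 * every softref whose "from" equals obj, up to max of them. Returns the
 * total number of matches, which may exceed max.
 */
size_t tags_pointing_at(const struct softref *refs, size_t n,
			const unsigned char *obj,
			const unsigned char **out, size_t max)
{
	size_t i, found = 0;

	assert(out || max == 0);
	for (i = 0; i < n; i++) {
		if (memcmp(refs[i].from, obj, 20))
			continue;	/* softref from some other object */
		if (found < max)
			out[found] = refs[i].to;
		found++;
	}
	return found;
}
```

On the sorted db the same question is answered by a search on "from"
rather than a full scan.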

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/mktag.c b/mktag.c
index af0cfa6..db8a6b8 100644
--- a/mktag.c
+++ b/mktag.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "tag.h"
+#include "softrefs.h"
 
 /*
  * Tag object data has the following format: two mandatory lines of
@@ -32,6 +33,7 @@ int main(int argc, char **argv)
 {
 	unsigned long size = 4096;
 	char *buffer = xmalloc(size);
+	struct tag result_tag;
 	unsigned char result_sha1[20];
 
 	if (argc != 1)
@@ -46,7 +48,7 @@ int main(int argc, char **argv)
 	buffer[size] = 0;
 
 	/* Verify tag object data */
-	if (parse_and_verify_tag_buffer(0, buffer, size, 1)) {
+	if (parse_and_verify_tag_buffer(&result_tag, buffer, size, 1)) {
 		free(buffer);
 		die("invalid tag data file");
 	}
@@ -57,6 +59,13 @@ int main(int argc, char **argv)
 	}
 
 	free(buffer);
+
+	/* Create reverse mapping softref */
+	if (add_softref(result_tag.tagged->sha1, result_sha1) < 0) {
+		die("unable to create softref for resulting tag object %s",
+			sha1_to_hex(result_sha1));
+	}
+
 	printf("%s\n", sha1_to_hex(result_sha1));
 	return 0;
 }
-- 
1.5.2.1.144.gabc40

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry)
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
                     ` (6 preceding siblings ...)
  2007-06-09 18:24   ` [PATCH 7/7] Teach git-mktag to register softrefs for all tag objects Johan Herland
@ 2007-06-09 18:25   ` Johan Herland
  2007-06-10  8:02     ` Johannes Schindelin
  2007-06-09 23:55   ` Comment on weak refs Junio C Hamano
  8 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-09 18:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

The text-based softrefs file format uses 82 bytes per entry (40 bytes
from_sha1 in hex, 1 byte SP, 40 bytes to_sha1 in hex, 1 byte LF).

The binary softrefs file format uses 40 bytes per entry (20 bytes
from_sha1, 20 bytes to_sha1).

Moving to a binary format increases performance slightly, but sacrifices
easy readability of the softrefs files.
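For illustration only, converting one text entry to the binary layout
could look like the following. The struct and function names are
hypothetical, and the small hex decoder stands in for git's own
get_sha1_hex(). It also shows why the binary format no longer needs a
per-entry sanity check on separator bytes: there are none left to check.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical text layout: "<40 hex> <40 hex>\n" = 82 bytes */
struct text_entry {
	char from_hex[40];
	char space;
	char to_hex[40];
	char lf;
};

/* Hypothetical binary layout: raw from_sha1 then to_sha1 = 40 bytes */
struct binary_entry {
	unsigned char from_sha1[20];
	unsigned char to_sha1[20];
};

/* Decode one hex digit; returns -1 for non-hex input */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode 40 hex chars into 20 bytes; 0 on success, -1 on bad input */
static int hex_to_sha1(const char *hex, unsigned char *sha1)
{
	int i;
	for (i = 0; i < 20; i++) {
		int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Convert one text entry to binary, validating the text-only separators */
int text_to_binary_entry(const struct text_entry *in, struct binary_entry *out)
{
	assert(in && out);
	if (in->space != ' ' || in->lf != '\n')
		return -1;
	if (hex_to_sha1(in->from_hex, out->from_sha1) ||
	    hex_to_sha1(in->to_hex, out->to_sha1))
		return -1;
	return 0;
}
```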

Signed-off-by: Johan Herland <johan@herland.net>
---

To illustrate the performance impact of changing the softrefs file
format, I prepared a linux repo (holding 57274 commits) with 10 tag
objects, and created softrefs from every commit to every tag object
(572740 softrefs in total). The resulting softrefs db was 46964680 bytes
when using the text format, and 22909600 bytes when using the binary
format. The experiment was done on a 32-bit Intel Pentium 4
(3 GHz w/HyperThreading) with 1 GB RAM:


========
Operations on unsorted softrefs:
(572740 (10 per commit) entries in random/unsorted order)
========

Listing all softrefs
(sequential reading of unsorted softrefs file)
--------
[text format]
$ /usr/bin/time git softref --list > /dev/null
0.44user 0.02system 0:00.47elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --list > /dev/null
0.35user 0.01system 0:00.38elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5913minor)pagefaults 0swaps

Listing HEAD's softrefs
(sequential reading of unsorted softrefs file)
--------
[text format]
$ /usr/bin/time git softref --list HEAD > /dev/null
0.11user 0.01system 0:00.14elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11790minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --list HEAD > /dev/null
0.02user 0.01system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5918minor)pagefaults 0swaps

Sorting softrefs
--------
[text format]
$ /usr/bin/time git softref --merge-unsorted
2.73user 4.97system 0:07.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15833minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --merge-unsorted
1.78user 5.00system 0:06.79elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9961minor)pagefaults 0swaps

Sorting softrefs into existing sorted file
(throwing away duplicates)
--------
[text format]
$ /usr/bin/time git softref --merge-unsorted
3.49user 5.12system 0:08.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27300minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --merge-unsorted
2.03user 4.92system 0:07.05elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15556minor)pagefaults 0swaps


========
Operations on sorted softrefs:
(572740 (10 per commit) entries in sorted order)
========

Listing all softrefs
(sequential reading of sorted softrefs file)
--------
[text format]
$ /usr/bin/time git softref --list > /dev/null
0.43user 0.02system 0:00.48elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --list > /dev/null
0.37user 0.00system 0:00.38elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5914minor)pagefaults 0swaps

Listing HEAD's softrefs
(256-fanout followed by binary search in sorted softrefs file)
--------
[text format]
$ /usr/bin/time git softref --list HEAD > /dev/null
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+334minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --list HEAD > /dev/null
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+333minor)pagefaults 0swaps

Sorting softrefs
(no-op)
--------
[text format]
$ /usr/bin/time git softref --merge-unsorted
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+312minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --merge-unsorted
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+312minor)pagefaults 0swaps


As expected, the binary format almost halved the number of pagefaults for
operations that read the entire softrefs db (simply because the binary db
is less than half the size of the text db).

For the most common use case (looking up a given commit in a sorted db)
the binary format has no measurable effect on performance.
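The sorted-db lookup benchmarked above (256-fanout guess followed by binary
search) can be sketched roughly as below. This approximates the patch's
do_for_each_sorted() under simplified assumptions (entries already mapped
into an array, plain memcmp() in place of hashcmp()); find_first_softref()
is a hypothetical name. Because SHA-1 values are close to uniformly
distributed, the first byte of the key gives a good starting probe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of the binary (40-byte) softrefs entries */
struct softref_entry {
	unsigned char from_sha1[20];
	unsigned char to_sha1[20];
};

/*
 * Return the index of the first entry in the sorted array e[0..n) whose
 * from_sha1 equals key, or -1 if there is none.
 */
long find_first_softref(const struct softref_entry *e, size_t n,
			const unsigned char *key)
{
	size_t left = 0, right = n;	/* candidate window is [left, right) */
	size_t i;
	int found = 0;

	assert(e || n == 0);
	if (n == 0)
		return -1;

	/* Fanout guess: key[0] / 256 approximates the key's relative position */
	i = ((size_t)key[0] * n) / 256;

	/* Binary search, starting from the fanout guess instead of the middle */
	while (left < right) {
		int cmp = memcmp(key, e[i].from_sha1, 20);
		if (!cmp) {
			found = 1;
			break;
		}
		if (cmp > 0)
			left = i + 1;	/* key sorts after e[i] */
		else
			right = i;	/* key sorts before e[i] */
		i = left + (right - left) / 2;
	}
	if (!found)
		return -1;

	/* i is a match, but not necessarily the first; step back to it */
	while (i > 0 && !memcmp(key, e[i - 1].from_sha1, 20))
		i--;
	return (long)i;
}
```

A caller would then walk forward from the returned index while from_sha1
still matches, much as do_for_each_sequential() does when
stop_at_first_non_match is set.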


...Johan


 Documentation/git-softref.txt |    6 ++--
 softrefs.c                    |   78 +++++++++++------------------------------
 softrefs.h                    |    7 ++--
 3 files changed, 27 insertions(+), 64 deletions(-)

diff --git a/Documentation/git-softref.txt b/Documentation/git-softref.txt
index 6a3e13b..e21aaf3 100644
--- a/Documentation/git-softref.txt
+++ b/Documentation/git-softref.txt
@@ -93,9 +93,9 @@ Also, softrefs are stored in a way that makes it easy and quick to find all
 the "to" objects reachable from a given "from" object.
 
 The softrefs db consists of two files: `.git/softrefs.unsorted` and
-`.git/softrefs.sorted`. Both files use the same format; one softref per line
-of the form "`<from-sha1> <to-sha1>\n`". Each sha1 sum is 40 bytes long; this
-makes each entry exactly 82 bytes long.
+`.git/softrefs.sorted`. Both files use the same binary format; `<from-sha1>`
+followed by `<to-sha1>` per entry. Each sha1 sum is 20 bytes long; this
+makes each entry exactly 40 bytes long.
 
 The entries in `.git/softrefs.sorted` are sorted on `<from-sha1>`, in order to
 make lookup fast. This file is also known as the "sorted softrefs db".
diff --git a/softrefs.c b/softrefs.c
index c7308c8..8bb3a83 100644
--- a/softrefs.c
+++ b/softrefs.c
@@ -9,10 +9,8 @@ static const unsigned int MAX_UNSORTED_ENTRIES = 1000;
 
 /* softref entry as it appears in a softrefs file */
 struct softrefs_entry {
-	char from_sha1_hex[40];
-	char space;
-	char to_sha1_hex[40];
-	char lf;
+	unsigned char from_sha1[20];
+	unsigned char to_sha1[20];
 };
 
 /* simple encapsulation of a softrefs file */
@@ -106,8 +104,9 @@ static int write_entry(int fd, const struct softrefs_entry *entry)
 	if (write(fd, (const void *) entry, sizeof(struct softrefs_entry))
 		< sizeof(struct softrefs_entry))
 	{
-		return error("Failed to write entry '%.40s -> %.40s' to softrefs file descriptor %i: %s",
-				entry->from_sha1_hex, entry->to_sha1_hex,
+		return error("Failed to write entry '%s -> %s' to softrefs file descriptor %i: %s",
+				sha1_to_hex(entry->from_sha1),
+				sha1_to_hex(entry->to_sha1),
 				fd, strerror(errno));
 	}
 	return 0;
@@ -139,29 +138,13 @@ void deinit_softrefs_access()
 	internal_file = 0;
 }
 
-/* comparison between a SHA1 sum and a softrefs entry */
-static int sha1_to_entry_cmp(
-	const unsigned char *from_sha1, const struct softrefs_entry *entry)
-{
-	unsigned char sha1[20];
-	get_sha1_hex(entry->from_sha1_hex, sha1);
-	return hashcmp(from_sha1, sha1);
-}
-
 /* comparison between softrefs entries */
 static int softrefs_entry_cmp(
 		const struct softrefs_entry *a, const struct softrefs_entry *b)
 {
-	unsigned char sa[20], sb[20];
-	int ret;
-	get_sha1_hex(a->from_sha1_hex, sa);
-	get_sha1_hex(b->from_sha1_hex, sb);
-	ret = hashcmp(sa, sb);
-	if (!ret) {
-		get_sha1_hex(a->to_sha1_hex, sa);
-		get_sha1_hex(b->to_sha1_hex, sb);
-		ret = hashcmp(sa, sb);
-	}
+	int ret = hashcmp(a->from_sha1, b->from_sha1);
+	if (!ret)
+		ret = hashcmp(a->to_sha1, b->to_sha1);
 	return ret;
 }
 
@@ -193,26 +176,15 @@ static int do_for_each_sequential(
 		unsigned long i,
 		int stop_at_first_non_match)
 {
-	unsigned char f_sha1[20], t_sha1[20]; /* Holds sha1 per entry */
 	int ret = 0;
 	for (; i < file->data_len; ++i) { /* Step through file, starting at i */
-		/* sanity check entry */
-		if (file->data[i].space != ' ' || file->data[i].lf != '\n') {
-			ret = error("Entry #%lu in softrefs file %s failed sanity check",
-					i, file->filename);
-			break;
-		}
-		/* retrieve SHA1 values */
-		if (get_sha1_hex(file->data[i].from_sha1_hex, f_sha1) ||
-		    get_sha1_hex(file->data[i].to_sha1_hex,   t_sha1)) {
-			ret = error("Failed to read SHA1 values from entry #%lu in softrefs file %s",
-					i, file->filename);
-			break;
-		}
 		/* Compare to lookup value */
-		if (!from_sha1 || !hashcmp(from_sha1, f_sha1)) {
-			if ((ret = fn(f_sha1, t_sha1, cb_data)))
+		if (!from_sha1 || !hashcmp(from_sha1, file->data[i].from_sha1)) {
+			if ((ret = fn(file->data[i].from_sha1,
+					file->data[i].to_sha1, cb_data)))
+			{
 				break; /* bail out if callback returns != 0 */
+			}
 		}
 		else if (stop_at_first_non_match)
 			break;
@@ -277,7 +249,7 @@ static int do_for_each_sorted(
 	i = (from_sha1[0] * file->data_len) / 256;
 
 	/* Binary search */
-	while ((cmp_result = sha1_to_entry_cmp(from_sha1, &(file->data[i])))) {
+	while ((cmp_result = hashcmp(from_sha1, file->data[i].from_sha1))) {
 		if (right - left <= 1) /* not found; give up */
 			goto done;
 		if (cmp_result > 0) /* go right */
@@ -288,7 +260,7 @@ static int do_for_each_sorted(
 	}
 
 	/* i points to a matching entry, but not necessarily the first */
-	while (i >= 1 && sha1_to_entry_cmp(from_sha1, &(file->data[i - 1])) == 0)
+	while (i >= 1 && !hashcmp(from_sha1, file->data[i - 1].from_sha1))
 		--i;
 
 	/* i points to the first matching entry */
@@ -432,7 +404,6 @@ static int merge_unsorted_into_sorted(
 	while (!ret && (i < unsorted.data_len || j < sorted.data_len)) {
 		/* there are still entries in either list */
 		struct softrefs_entry *cur;
-		unsigned char from_sha1[20], to_sha1[20];
 		if (i < unsorted.data_len && j < sorted.data_len) {
 			/* there are still entries in _both_ lists */
 			/* choose "lowest" entry from either list */
@@ -451,10 +422,8 @@ static int merge_unsorted_into_sorted(
 		prev = cur;
 
 		/* skip writing if softref involves a non-existing object */
-		if (get_sha1_hex(cur->from_sha1_hex, from_sha1) ||
-			!has_sha1_file(from_sha1) ||
-		    get_sha1_hex(cur->to_sha1_hex,     to_sha1) ||
-			!has_sha1_file(  to_sha1))
+		if (!has_sha1_file(cur->from_sha1) ||
+		    !has_sha1_file(cur->to_sha1))
 		{
 			continue;
 		}
@@ -523,7 +492,6 @@ static int merge_sorted_into_sorted(const char *from_file, const char *to_file)
 	while (!ret && (i < file1.data_len || j < file2.data_len)) {
 		/* there are still entries in either list */
 		struct softrefs_entry *cur;
-		unsigned char from_sha1[20], to_sha1[20];
 		if (i < file1.data_len && j < file2.data_len) {
 			/* there are still entries in _both_ lists */
 			/* choose "lowest" entry from either list */
@@ -542,10 +510,8 @@ static int merge_sorted_into_sorted(const char *from_file, const char *to_file)
 		prev = cur;
 
 		/* skip writing if softref involves a non-existing object */
-		if (get_sha1_hex(cur->from_sha1_hex, from_sha1) ||
-			!has_sha1_file(from_sha1) ||
-		    get_sha1_hex(cur->to_sha1_hex,     to_sha1) ||
-			!has_sha1_file(  to_sha1))
+		if (!has_sha1_file(cur->from_sha1) ||
+		    !has_sha1_file(cur->to_sha1))
 		{
 			continue;
 		}
@@ -608,10 +574,8 @@ int add_softrefs(const struct softref_list *list)
 			/* nada */;
 		}
 		else {  /* softref is ok */
-			strcpy(entry.from_sha1_hex, sha1_to_hex(list->from_sha1));
-			strcpy(entry.to_sha1_hex, sha1_to_hex(list->to_sha1));
-			entry.space = ' ';
-			entry.lf = '\n';
+			hashcpy(entry.from_sha1, list->from_sha1);
+			hashcpy(entry.to_sha1, list->to_sha1);
 			if (write_entry(fd, &entry))
 				error("Failed to write entry to softrefs file %s: %s",
 						git_path(UNSORTED_FILENAME),
diff --git a/softrefs.h b/softrefs.h
index db0f8b9..89d25ce 100644
--- a/softrefs.h
+++ b/softrefs.h
@@ -19,10 +19,9 @@
  * the "to" objects reachable from a given "from" object.
  *
  * The softrefs db consists of two files: .git/softrefs.unsorted and
- * .git/softrefs.sorted. Both files use the same format; one softref per line
- * of the form "<from-sha1> <to-sha1>\n". Each sha1 sum is 40 bytes long; this
- * makes each entry exactly 82 bytes long (including the space between the sha1
- * sums and the terminating linefeed).
+ * .git/softrefs.sorted. Both files use the same binary format; <from-sha1>
+ * followed by <to-sha1> per entry. Each sha1 sum is 20 bytes long; this
+ * makes each entry exactly 40 bytes long.
  *
  * The entries in .git/softrefs.sorted are sorted on <from-sha1>, in order to
  * make lookup fast.
-- 
1.5.2.1.144.gabc40

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2)
  2007-06-04  0:51 Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Johan Herland
  2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
@ 2007-06-09 22:57 ` Steven Grimm
  2007-06-09 23:16   ` Johan Herland
  2 siblings, 1 reply; 52+ messages in thread
From: Steven Grimm @ 2007-06-09 22:57 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

(Resending this in plaintext; sorry to those who got it twice.)

Being able to specify relationships between commits after the fact seems 
like a very useful facility.

Does it make sense to have type information to record what the 
relationship between two objects means? Without that, it seems like 
it'll be hard to build much of a tool set on top of this feature, since 
no two tools that made use of it could unambiguously query just their 
own softrefs.
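One way such type information could be carried is sketched below, under
the assumption that each entry stores a small integer "kind" next to the
two SHA-1s. The enum values, struct and function are hypothetical and not
part of the posted series. Note that this grows each entry beyond the 40
bytes of the binary format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical relationship kinds; names are illustrative only */
enum softref_kind {
	SOFTREF_TAG = 1,
	SOFTREF_CHERRY_PICK = 2,
	SOFTREF_REBASED_FROM = 3
};

struct typed_softref {
	unsigned char from[20];
	unsigned char to[20];
	unsigned int kind;	/* one of enum softref_kind */
};

typedef void (*softref_fn)(const struct typed_softref *ref, void *cb_data);

/*
 * Call fn (if non-NULL) for every softref from "from" of the given kind,
 * so a tool only sees its own relationships. Returns the number of matches.
 */
size_t for_each_softref_of_kind(const struct typed_softref *refs, size_t n,
				const unsigned char *from, unsigned int kind,
				softref_fn fn, void *cb_data)
{
	size_t i, hits = 0;

	assert(refs || n == 0);
	for (i = 0; i < n; i++) {
		if (refs[i].kind != kind || memcmp(refs[i].from, from, 20))
			continue;
		if (fn)
			fn(&refs[i], cb_data);
		hits++;
	}
	return hits;
}
```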

A few use cases for relationships-after-the-fact come to mind in 
addition to the one the patch itself mentions:

A facility like this could replace the info/grafts file, or at least 
provide another way to turn a regular commit into a merge commit. Just 
put a "manually specified merge parent" ref between the target revision 
and the one you want git to think you've merged from. That would scale a 
lot better than info/grafts does, I suspect, if only by virtue of being 
O(log n) searchable thanks to the sorting.

One could easily imagine recording a "cherry picked" softref, which 
could, e.g., be used by the rebase machinery to skip over an already-applied 
revision. IMO the lack of any tool-readable history about cherry picking 
-- which is, after all, a sort of merge, at least conceptually -- is a 
shortcoming in present-day git. (Not a huge one, but if nothing else 
it'd be great to see cherry picking represented in, e.g., the gitk 
history display.)

It might be possible to annotate rebases to make pulling from rebased 
branches less troublesome. If you have

A--B--C--D
    \
     -E--F--G

and you rebase E onto D, a "rebased from" softref could be recorded 
between E and E':

A--B--C--D
    \     \
     -E....E'--F'--G'

Then a pulling client could potentially use that information to cleanly 
replay the rebase operation to keep its history straight. Perhaps if you 
could record historical rebases like that, you could do away with the 
current gotchas involving rebasing shared repositories. One negative 
side effect would be that you'd end up needing to keep E around where 
before you'd have been able to throw it away, but it should delta 
compress well, and you can, I think, still prune revisions F and G in 
the above picture. Or maybe it's enough to just keep E's SHA1 around 
without actually retaining its contents.

But in any event, this seems like the start of a useful new set of 
capabilities for git.

-Steve

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2)
  2007-06-09 22:57 ` Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Steven Grimm
@ 2007-06-09 23:16   ` Johan Herland
  2007-06-10  8:29     ` Pierre Habouzit
  0 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-09 23:16 UTC (permalink / raw)
  To: Steven Grimm; +Cc: git, Junio C Hamano

On Sunday 10 June 2007, Steven Grimm wrote:
> Being able to specify relationships between commits after the fact seems 
> like a very useful facility.
> 
> Does it make sense to have type information to record what the 
> relationship between two objects means? Without that, it seems like 
> it'll be hard to build much of a tool set on top of this feature, since 
> no two tools that made use of it could unambiguously query just their 
> own softrefs.

Actually MadCoder/Pierre had a similar idea on IRC. He wanted to separate 
softrefs into namespaces, so that softrefs for tags could live in a 
different place than softrefs associated with his "gits" bug tracker.

I haven't thought very much about this, but it's certainly possible to do 
something like this. What do the rest of y'all think?

> A few use cases for relationships-after-the-fact come to mind in 
> addition to the one the patch itself mentions:
> 
> A facility like this could replace the info/grafts file, or at least 
> provide another way to turn a regular commit into a merge commit. Just 
> put a "manually specified merge parent" ref between the target revision 
> and the one you want git to think you've merged from. That would scale a 
> lot better than info/grafts does, I suspect, if only by virtue of being 
> O(log n) searchable thanks to the sorting.

Yes, I _knew_ this was similar to grafts in some way :) While working on 
this, I tried to see if I could leverage grafts somewhere in my design, but 
I found them to be too commit-bound and specific. But when you look at it 
the other way it seems to make more sense.

> One could easily imagine recording a "cherry picked" softref, which 
> could, e.g., be used by the rebase machinery to skip over an already-applied 
> revision. IMO the lack of any tool-readable history about cherry picking 
> -- which is, after all, a sort of merge, at least conceptually -- is a 
> shortcoming in present-day git. (Not a huge one, but if nothing else 
> it'd be great to see cherry picking represented in, e.g., the gitk 
> history display.)
> 
> It might be possible to annotate rebases to make pulling from rebased 
> branches less troublesome. If you have
> 
> A--B--C--D
>     \
>      -E--F--G
> 
> and you rebase E onto D, a "rebased from" softref could be recorded 
> between E and E':
> 
> A--B--C--D
>     \     \
>      -E....E'--F'--G'
> 
> Then a pulling client could potentially use that information to cleanly 
> replay the rebase operation to keep its history straight. Perhaps if you 
> could record historical rebases like that, you could do away with the 
> current gotchas involving rebasing shared repositories. One negative 
> side effect would be that you'd end up needing to keep E around where 
> before you'd have been able to throw it away, but it should delta 
> compress well, and you can, I think, still prune revisions F and G in 
> the above picture. Or maybe it's enough to just keep E's SHA1 around 
> without actually retaining its contents.

Whoa. I hadn't even imagined this, but I guess you're right. I actually 
thought about solving the same problem (using a much worse method) way back 
in May [1], but I'd since totally forgotten about it. 

> But in any event, this seems like the start of a useful new set of 
> capabilities for git.

Thanks a lot for sharing your ideas. :)


Have fun!

...Johan

[1] http://article.gmane.org/gmane.comp.version-control.git/46137

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Comment on weak refs
  2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
                     ` (7 preceding siblings ...)
  2007-06-09 18:25   ` [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry) Johan Herland
@ 2007-06-09 23:55   ` Junio C Hamano
  2007-06-10  1:25     ` Johan Herland
  2007-06-10 15:26     ` Jakub Narebski
  8 siblings, 2 replies; 52+ messages in thread
From: Junio C Hamano @ 2007-06-09 23:55 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds, Pierre Habouzit

The patch series to look up and maintain the "softref" relationship
is trivially clean.  Although I probably would have many nits to
pick, I do not think it is _wrong_ in a major way per-se.  I
would not even mind saying that I liked the basic concept, until
I thought things a bit deeper.

Here are some initial notes I took while reading your patches.

Semantics
---------

Not all "softref" relationships are equal.  "This object is
referred to by these tags" is one obvious application, and only
because we already try to follow tags when git-fetch happens
anyway, it looks natural to make everybody follow such a softref
relationship.

However, as Pierre Habouzit wants to, we may want to make a bug
tracking sheet (the details of the implementation of such a bug
tracking sheet do not matter in this discussion -- it could be
a blob, a commit, or a tag) refer to commits using this
mechanism, by pointing at the blob from commits after the fact
(i.e. "later it was verified that this commit fixes the bug
described in this bug entry").

	Side note: I am assuming a simplest implementation where
	one blob would always capture the latest status of the
	bug.  refs/bugs/12127 would point at the latest version
	of bug 12127's tracking sheet.  An alternative
	implementation would be to represent each entry of the
	tracking sheet for a single bug as a blob, and have
	multiple blobs associated with a commit on the main
	project, or the other way around, but then you would
	need a way to give order between referers to a single
	referent, which I do not find in your proposed
	"softref".

Most users who want to download and compile the main project do
not care about bug tracker objects.  You would need to have a
way to describe what kind of relationship a particular softlink
represents, and adjust the definition of reachability for the
purposes of traversal of objects.

It gets worse when you actually start using softrefs.  I do not
think you would have a limited set of softrefs, such as
"reverse-tag-lookup-softref", "bug-tracker-softref".  For
example, a typical bug tracking sheet may look like this:

      - Hey I found a bug, you can reproduce like so... I am
        testing commit _A_.
      - It appears that commit _B_ introduced it; it does not
        reproduce with anything older.
      - Here is a patch to fix it; please try.
      - Oh, it seems to fix.  Committed as _C_.
      - No, it does not work for me, and the approach to fix is
        wrong; reopening the bug.
      - Ok, take 2 with different approach.  Committed as _D_.
        Please try.
      - Thanks, it finally fixes it.  Closing the bug.

The bug will be associated with commits A, B, C and D.  The
questions maintainers would want to ask are:

 - What caused this bug?
 - Which versions (or earlier) have this bug?
 - Which versions (or later) have proper fix?
 - What alternate approaches were taken to fix this bug?
 - In this upcoming release, which bugs have been fixed?
 - What bugs are still open after this release?

Depending on what you want to find out, you would need to ask
which commits are related to this bug tracking sheet object, and
the answer will differ.  Some "softref" relations should
extend along the ancestry (when "this fixes" is attached to a
commit, its children ought to inherit that property), some
shouldn't ("this is what broke it" should not propagate to its
parent nor child).
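A minimal C sketch of the distinction drawn above. It assumes each softref carries a kind tag, which the proposed softrefs do not have, and it models history as a linear sequence of commits numbered oldest-first. The enum values and both functions are hypothetical names used only for illustration. They are not git API.

```c
#include <stdbool.h>

/* Hypothetical relation kinds; the proposed softrefs carry no such tag. */
enum softref_kind {
	SOFTREF_TAGGED_BY,      /* reverse tag lookup */
	SOFTREF_BUG_INTRODUCED, /* "this is what broke it" */
	SOFTREF_BUG_FIXED       /* "this fixes it" */
};

/* Should a relation attached to a commit also hold for its descendants? */
static bool propagates_to_descendants(enum softref_kind kind)
{
	switch (kind) {
	case SOFTREF_BUG_FIXED:
		return true;    /* children of a fixing commit inherit the fix */
	case SOFTREF_TAGGED_BY:
	case SOFTREF_BUG_INTRODUCED:
	default:
		return false;   /* applies to that one commit only */
	}
}

/*
 * On a linear history numbered oldest-first, does a relation attached to
 * commit `at` apply to commit `query`?  Nothing propagates to ancestors.
 */
static bool relation_applies(enum softref_kind kind, int at, int query)
{
	if (query == at)
		return true;
	return query > at && propagates_to_descendants(kind);
}
```

On real history, "query > at" would become an ancestry test ("is `at` an ancestor of `query`?").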

It is also unclear how the relationship a "softref" introduces is
propagated across repositories (not the objects the softref binds,
but the fact that such a binding between two objects exists needs
to be propagated).  I would imagine that your assumption is
simply "to take union".  IOW, if you say A refers to B and
transfer object A to the other side, in addition to transferring
object B (if the other side does not have it yet), you would
tell the other side that B is related to A and have the other
side add that to its set of softrefs.  Technically it is a
simple and easy to implement semantics, but I suspect that would
not necessarily be useful in practice.  Maybe two people would
disagree if A and B are related or not.  Maybe you first think A
and B are related and then later change your mind.  Should
"softref" relationships be versioned?

Reachability
------------

The association brought in between referent and referer by
softref is "weak", in that referer needs to exist only if
referent needs to be there.  This has the following
consequences.

Fsck/prune/lost-found
.....................

The current object traversal starts from "known tip objects"
(i.e. refs, HEAD, index, and reflog entries when not doing
lost-found) and follows the reachability link embedded in
referer objects (i.e. tag to tagged object, commit to tree, tree
to tree and blob).  We only need to extend this "reachability"
with softref.  If a referer object refers to another object via
a softref, we consider referent reachable and we are done.

This should be reasonably straightforward, except that we
probably would need to worry about circular references that
softlink makes possible.
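A minimal sketch of the extended walk. It uses a toy object graph where small integer ids stand in for SHA-1s, instead of git's real object structures. `struct obj`, its fields, and `mark_reachable()` are hypothetical. The `seen` check is what stops a softref cycle from looping forever.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 8

/* Toy object: small integer ids stand in for SHA-1s (hypothetical type). */
struct obj {
	int hard[MAX_LINKS];   /* links embedded in the object itself */
	size_t nr_hard;
	int soft[MAX_LINKS];   /* softrefs recorded outside the object */
	size_t nr_soft;
	bool seen;             /* set once visited */
};

/* Mark everything reachable from `id`, following both kinds of link. */
static void mark_reachable(struct obj *objs, int id)
{
	struct obj *o = &objs[id];

	if (o->seen)
		return;        /* already visited: stops softref cycles */
	o->seen = true;
	for (size_t i = 0; i < o->nr_hard; i++)
		mark_reachable(objs, o->hard[i]);
	for (size_t i = 0; i < o->nr_soft; i++)
		mark_reachable(objs, o->soft[i]); /* referent kept alive by referer */
}
```

To get the extended prune semantics in this model, you would call mark_reachable() once for each known tip (refs, HEAD, index, reflog entries) and treat everything left unmarked as unreachable.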

push/fetch/rev-list --objects
.............................

We walk the revision range (object transfers essentially start
traversal from the tips of the sender until it meets what the
receiver already has), enumerating the reachable objects.  I
suspect that adding reachability with softref to this scheme would
have consequences for performance.

Imagine:

	A---B---C---D---E

The sender's tip is at E and the receiver claims to have C.  In
the sender's repository, E is associated with A (somebody
noticed that E fixes regression introduced by A, and added a
softlink to make A reachable from E).  Currently we only need to
know C is reachable from E to decide that we do not have to send
A again, but with softlink we would need to.

The ancestry chains of referent and referer do not have to share
any common commits.  Imagine a bug tracking system where each
bug's tracking sheet is represented as a DAG of commits (this
will allow you to merge and split bugs easily).  This history
would not share any tree or blob with the history of the source
code of the project.  And you would make a commit in the main
project associated with objects in the bug tracking project
using softrefs.  As sender and receiver exchange the commit
ancestry information on the main project, both ends may need to
negotiate which objects in the bug tracking history are already
present in the receiver.

One attractive point of softref is that you do not have to
anchor referents with explicit refs.  E.g. if a commit in the
main project is associated with bug tracking entries in the "bug
tracking" project via softrefs, that is enough to keep the bug
tracking objects alive.  But I suspect this property makes the
enumeration of "what do we have on this end" costly.  I dunno.

Come to think of it, and this is off topic for "softref", I
think using an isolated commit DAG for each bug is probably a
very natural way to implement a bug tracking system.  If you
want to refer to commits on the mainline, you can refer to them
by their object names, just like mainline commit log messages
would refer to earlier commits in the text (e.g. "This fixes the
regression introduced by commit ABC").  The new text entry for
traditional BTS will go to the commit message (a bug "project"
commit does not have to have anything in its tree), and you can
use "git log" to view what the ordinary BTS would provide, and
it will distribute the tracking.  You do not necessarily
need/benefit from "softref" for this, though.  It could be that
such a bug "project" commit might have a commit from the main
project in its tree (it would look like a subproject) whose tree
entry name may be something like "fixes/yyyy-mm-dd-author".


* Re: Comment on weak refs
  2007-06-09 23:55   ` Comment on weak refs Junio C Hamano
@ 2007-06-10  1:25     ` Johan Herland
  2007-06-10  6:33       ` Johannes Schindelin
  2007-06-10  9:03       ` Pierre Habouzit
  2007-06-10 15:26     ` Jakub Narebski
  1 sibling, 2 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-10  1:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Pierre Habouzit

On Sunday 10 June 2007, Junio C Hamano wrote:
> The patch series to look-up and maintain "softref" relationship
> is trivially clean.  Although I probably would have many nits to
> pick, I do not think it is _wrong_ in a major way per-se.  I
> would not even mind saying that I liked the basic concept, until
> I thought things a bit deeper.
> 
> Here are some initial notes I took while reading your patches.
> 
> 
> Semantics
> ---------
> 
> Not all "softref" relationships are equal.  "This object is
> referred to by these tags" is one obvious application, and only
> because we already try to follow tags when git-fetch happens
> anyway, it looks natural to make everybody follow such a softref
> relationship.
> 
> However, as Pierre Habouzit wants to, we may want to make a bug
> tracking sheet (the details of the implementation of such a bug
> tracking sheet do not matter in this discussion -- it could be
> a blob, a commit, or a tag) refer to commits using this
> mechanism, by pointing at the blob from commits after the fact
> (i.e. "later it was verified that this commit fixes the bug
> described in this bug entry").
> 
> 	Side note: I am assuming a simplest implementation where
> 	one blob would always capture the latest status of the
> 	bug.  refs/bugs/12127 would point at the latest version
> 	of bug 12127's tracking sheet.  An alternative
> 	implementation would be to represent each entry of the
> 	tracking sheet for a single bug as a blob, and have
> 	multiple blobs associated with a commit on the main
> 	project, or the other way around, but then you would
> 	need a way to give order between referers to a single
> 	referent, which I do not find in your proposed
> 	"softref".
> 
> Most users who want to download and compile the main project do
> not care about bug tracker objects.  You would need to have a
> way to describe what kind of relationship a particular softlink
> represents, and adjust the definition of reachability for the
> purposes of traversal of objects.

Yes, I'm starting to see that it's not a good idea to put _all_ softrefs in 
one bag.

> It gets worse when you actually start using softrefs.  I do not
> think you would have a limited set of softrefs, such as
> "reverse-tag-lookup-softref", "bug-tracker-softref".  For
> example, a typical bug tracking sheet may look like this:
> 
>       - Hey I found a bug, you can reproduce like so... I am
>         testing commit _A_.
>       - It appears that commit _B_ introduced it; it does not
>         reproduce with anything older.
>       - Here is a patch to fix it; please try.
>       - Oh, it seems to fix.  Committed as _C_.
>       - No, it does not work for me, and the approach to fix is
>         wrong; reopening the bug.
>       - Ok, take 2 with different approach.  Committed as _D_.
>         Please try.
>       - Thanks, it finally fixes it.  Closing the bug.
> 
> The bug will be associated with commits A, B, C and D.  The
> questions maintainers would want to ask are:
> 
>  - What caused this bug?
>  - Which versions (or earlier) have this bug?
>  - Which versions (or later) have proper fix?
>  - What alternate approaches were taken to fix this bug?
>  - In this upcoming release, which bugs have been fixed?
>  - What bugs are still open after this release?
> 
> Depending on what you want to find out, you would need to ask
> which commits are related to this bug tracking sheet object, and
> the answer will differ.  Some "softref" relations should
> extend along the ancestry (when "this fixes" is attached to a
> commit, its children ought to inherit that property), some
> shouldn't ("this is what broke it" should not propagate to its
> parent nor child).

We're getting a little ahead of ourselves, aren't we? IMHO, it would be up 
to the bug system to determine which (and how many) connections to make 
between the bug reports and the commits (or even if softrefs would be the 
correct mechanism for these connections at all). We shouldn't necessarily 
base the softrefs design on how we imagine a hypothetical bug system to 
work. But Pierre might have something to say on how he would want to use 
softrefs, and his system is hopefully _less_ hypothetical. :)

But I can see the use of letting the user/porcelain/bugtracker define 
classes/namespaces of softrefs (at runtime).

> It is also unclear how the relationship a "softref" introduces is
> propagated across repositories (not the objects the softref binds,
> but the fact that such a binding between two objects exists needs
> to be propagated).  I would imagine that your assumption is
> simply "to take union".  IOW, if you say A refers to B and
> transfer object A to the other side, in addition to transferring
> object B (if the other side does not have it yet), you would
> tell the other side that B is related to A and have the other
> side add that to its set of softrefs.  Technically it is a
> simple and easy to implement semantics, but I suspect that would
> not necessarily be useful in practice.  Maybe two people would
> disagree if A and B are related or not.

Yes, I see that different classes of softrefs would have different semantics
for propagation. We could probably try to set up some sane defaults, and
then let users put rules in their configs for how they would want to 
propagate the various softrefs classes.

> Maybe you first think A 
> and B are related and then later change your mind.  Should
> "softref" relationships be versioned?

Intriguing idea. Not immediately sure how we would implement it though...

> Reachability
> ------------
> 
> The association brought in between referent and referer by
> softref is "weak", in that referer needs to exist only if
> referent needs to be there.  This has the following
> consequences.
>
> Fsck/prune/lost-found
> .....................
> 
> The current object traversal starts from "known tip objects"
> (i.e. refs, HEAD, index, and reflog entries when not doing
> lost-found) and follows the reachability link embedded in
> referer objects (i.e. tag to tagged object, commit to tree, tree
> to tree and blob).  We only need to extend this "reachability"
> with softref.  If a referer object refers to another object via
> a softref, we consider referent reachable and we are done.

Agreed.

> This should be reasonably straightforward, except that we
> probably would need to worry about circular references that
> softlink makes possible.

Isn't there a .used flag on objects we could easily check to see if we've 
seen an object before, thus preventing us from following the circular 
reference?

> push/fetch/rev-list --objects
> .............................
> 
> We walk the revision range (object transfers essentially start
> traversal from the tips of the sender until it meets what the
> receiver already has), enumerating the reachable objects.  I
> suspect that adding reachability with softref to this scheme would
> have consequences for performance.
> 
> Imagine:
> 
> 	A---B---C---D---E
> 
> The sender's tip is at E and the receiver claims to have C.  In
> the sender's repository, E is associated with A (somebody
> noticed that E fixes regression introduced by A, and added a
> softlink to make A reachable from E).  Currently we only need to
> know C is reachable from E to decide that we do not have to send
> A again, but with softlink we would need to.

Hmm. First of all, I'm not sure it would be useful to add a _direct_ link 
between E and A, but even so...
I'm thinking we can do the regular/current reachability calculation first, 
and after it's done, we analyze the softrefs to see if there are more 
objects to be fetched. In that scenario, we wouldn't need to send A again, 
since it's already in our repo.

> The ancestry chains of referent and referer do not have to share
> any common commits.  Imagine a bug tracking system where each
> bug's tracking sheet is represented as a DAG of commits (this
> will allow you to merge and split bugs easily).  This history
> would not share any tree or blob with the history of the source
> code of the project.  And you would make a commit in the main
> project associated with objects in the bug tracking project
> using softrefs.  As sender and receiver exchange the commit
> ancestry information on the main project, both ends may need to
> negotiate which objects in the bug tracking history are already
> present in the receiver.

As above, if the receiving side gets the list of involved softrefs, it can
make the decision on which objects it needs to fetch.

Hmm. Thinking about it, this process would of course need to be recursive, 
which could potentially adversely affect the runtime of fetch...
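Here is a rough sketch of that two-phase idea, under three simplifying assumptions:

- Objects are small integer ids.
- The regular walk has already filled `present` (objects the receiver has or will receive) and `send`.
- Objects pulled in by a softref are not themselves walked for trees or parents.

Looping until a pass adds nothing is one way to handle the recursion. `struct softref` and `add_softref_closure()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical softref record; small integer ids stand in for SHA-1s. */
struct softref {
	int from;
	int to;
};

/*
 * present[i]: object i is on the receiver or already queued by the
 * regular walk.  send[i]: object i must be transferred.
 * Returns how many objects were added because of softrefs.
 */
static size_t add_softref_closure(const struct softref *refs, size_t nr_refs,
				  bool *present, bool *send)
{
	size_t added = 0;
	bool changed = true;

	while (changed) {       /* repeat until a pass adds nothing */
		changed = false;
		for (size_t i = 0; i < nr_refs; i++) {
			if (present[refs[i].from] && !present[refs[i].to]) {
				present[refs[i].to] = true;
				send[refs[i].to] = true;
				added++;
				changed = true;
			}
		}
	}
	return added;
}
```

Suppose the receiver holds A and the regular walk sends E. A softref E->A then adds nothing, while softrefs E->X and X->Y pull in both X and Y. Each pass scans the whole softref table, and in the worst case a chain of k softrefs takes k + 1 passes. That cost is where the fetch runtime concern above would show up.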

> One attractive point of softref is that you do not have to
> anchor referents with explicit refs.  E.g. if a commit in the
> main project is associated with bug tracking entries in the "bug
> tracking" project via softrefs, that is enough to keep the bug
> tracking objects alive.  But I suspect this property makes the
> enumeration of "what do we have on this end" costly.  I dunno.

Yeah, there are still many open questions. But I'm glad to see that most
people seem to find the basic concept useful, at least.


Thanks for taking the time and effort to comment.


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: Comment on weak refs
  2007-06-10  1:25     ` Johan Herland
@ 2007-06-10  6:33       ` Johannes Schindelin
  2007-06-10 13:41         ` Johan Herland
  2007-06-10  9:03       ` Pierre Habouzit
  1 sibling, 1 reply; 52+ messages in thread
From: Johannes Schindelin @ 2007-06-10  6:33 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git, Linus Torvalds, Pierre Habouzit

Hi,

On Sun, 10 Jun 2007, Johan Herland wrote:

> On Sunday 10 June 2007, Junio C Hamano wrote:
>
> > Maybe you first think A and B are related and then later change your 
> > mind.  Should "softref" relationships be versioned?
> 
> Intriguing idea. Not immediately sure how we would implement it 
> though...

Has my lightweight annotation patch reached you?

I like my approach better than yours, because it is

1) a way, way smaller patch, and
2) it automatically includes the versionability.

After thinking about it a little more (my plane was slow, and as a result 
I am allowed to spend 8 more hours in Paris), I think that a small but 
crucial change would make this thing even more useful:

Instead of having "core.showAnnotations" be a boolean config, it might be 
better to have "core.annotationsRef" instead, overrideable by the 
environment variable GIT_ANNOTATION_REF.

With this, you can have different refs for different kinds of annotations.

For example, some people might add bugtracker comments (even comments like 
"this commit was bad: introduced bug #798, solved by commit 9899fdadc.."). 
Those comments could live in refs/annotations/bugs. To see them, just say 

	GIT_ANNOTATION_REF=refs/annotations/bugs gitk

Voila.
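A small sketch of the proposed lookup order, assuming the config value has already been read. Both core.annotationsRef and GIT_ANNOTATION_REF are names suggested in this message, not existing git settings, and the function names are hypothetical.

```c
#include <stdlib.h>

/*
 * A non-empty environment value wins; otherwise fall back to the config
 * value, where NULL would mean "no annotations configured".
 */
const char *choose_annotation_ref(const char *env_value, const char *config_value)
{
	if (env_value && *env_value)
		return env_value;
	return config_value;
}

/* config_value is assumed to come from the (proposed) core.annotationsRef. */
const char *annotation_ref(const char *config_value)
{
	return choose_annotation_ref(getenv("GIT_ANNOTATION_REF"), config_value);
}
```

Treating an empty environment variable as unset is a design choice made for this sketch.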

I am quite certain that treating annotations as branches, containing
fan-out directories for the reverse lookup, is the way to go. I am even quite certain that
in most cases, a working-directory-less merging is possible for such 
annotations.

Ciao,
Dscho


* Re: [PATCH 1/7] Softrefs: Add softrefs header file with API documentation
  2007-06-09 18:21   ` [PATCH 1/7] Softrefs: Add softrefs header file with API documentation Johan Herland
@ 2007-06-10  6:58     ` Johannes Schindelin
  2007-06-10  7:43       ` Junio C Hamano
  2007-06-10 14:00       ` Johan Herland
  2007-06-10 14:27     ` Jakub Narebski
  1 sibling, 2 replies; 52+ messages in thread
From: Johannes Schindelin @ 2007-06-10  6:58 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano, Linus Torvalds

Hi,

On Sat, 9 Jun 2007, Johan Herland wrote:

> See patch for documentation.

This is preposterous. Either you substitute the patch for a documentation, 
or you document it in the commit message. I consider commit messages like 
"See patch for documentation" as reasonable as all those CVS "** no 
message **" abominations.

> + * The softrefs db consists of two files: .git/softrefs.unsorted and
> + * .git/softrefs.sorted. Both files use the same format; one softref per line
> + * of the form "<from-sha1> <to-sha1>\n". Each sha1 sum is 40 bytes long; this
> + * makes each entry exactly 82 bytes long (including the space between the sha1
> + * sums and the terminating linefeed).
> + *
> + * The entries in .git/softrefs.sorted are sorted on <from-sha1>, in order to
> + * make lookup fast.
> + *
> + * The entries in .git/softrefs.unsorted are _not_ sorted. This is to make
> + * insertion fast.

This sure sounds like you buy the disadvantages of both. Last time I 
checked, it was recommended to look at your needs and pick _one_ 
appropriate data structure fitting _all_ your needs.
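Here is a sketch of a lookup over the two files as described in the quoted header comment. It assumes both files have been read into memory as arrays of 82-byte "<from-sha1> <to-sha1>\n" entries. The sorted file allows a binary search, but the unsorted file still forces a full scan on every lookup. The function names and the callback type are hypothetical, not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

#define HEXSZ 40
#define ENTRY_SZ (HEXSZ + 1 + HEXSZ + 1) /* "<from> <to>\n" = 82 bytes */

typedef void (*softref_cb)(const char *to_hex, void *data);

/* Sorted part: binary-search the first entry whose <from> matches,
 * then report every consecutive match.  O(log n + matches). */
static void lookup_sorted(const char *buf, size_t nr, const char *from_hex,
			  softref_cb cb, void *data)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (memcmp(buf + mid * ENTRY_SZ, from_hex, HEXSZ) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	for (; lo < nr && !memcmp(buf + lo * ENTRY_SZ, from_hex, HEXSZ); lo++)
		cb(buf + lo * ENTRY_SZ + HEXSZ + 1, data);
}

/* Unsorted part: every entry has to be examined.  O(n). */
static void lookup_unsorted(const char *buf, size_t nr, const char *from_hex,
			    softref_cb cb, void *data)
{
	for (size_t i = 0; i < nr; i++)
		if (!memcmp(buf + i * ENTRY_SZ, from_hex, HEXSZ))
			cb(buf + i * ENTRY_SZ + HEXSZ + 1, data);
}

/* Every lookup pays for both. */
static void lookup_softrefs(const char *sorted, size_t nr_sorted,
			    const char *unsorted, size_t nr_unsorted,
			    const char *from_hex, softref_cb cb, void *data)
{
	lookup_sorted(sorted, nr_sorted, from_hex, cb, data);
	lookup_unsorted(unsorted, nr_unsorted, from_hex, cb, data);
}

/* Example callback: count the matches. */
static void count_cb(const char *to_hex, void *data)
{
	(void)to_hex;
	(*(size_t *)data)++;
}
```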

Besides, your lines are way too long. Yes, it is not in 
Documentation/SubmittingPatches, but even just a cursory look into the 
existing source shows you that it is mostly 80-chars-per-line. I think it 
goes without saying that you should try to imitate the existing practices 
in any project, and since you have to read the source to get acquainted 
with it _anyway_, it would only be a duplication to have it in 
SubmittingPatches, too.

Ciao,
Dscho


* Re: [PATCH 1/7] Softrefs: Add softrefs header file with API documentation
  2007-06-10  6:58     ` Johannes Schindelin
@ 2007-06-10  7:43       ` Junio C Hamano
  2007-06-10  7:54         ` Johannes Schindelin
  2007-06-10 14:00       ` Johan Herland
  1 sibling, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2007-06-10  7:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johan Herland, git, Linus Torvalds

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Besides, your lines are way too long. Yes, it is not in 
> Documentation/SubmittingPatches,...
> ... since you have to read the source to get acquainted 
> with it _anyway_, it would only be a duplication to have it in 
> SubmittingPatches, too.

Well, maybe we should do this.

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 01354c2..4bdfdfe 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -5,6 +5,7 @@ Checklist (and a short version for the impatient):
 	- make commits of logical units
 	- check for unnecessary whitespace with "git diff --check"
 	  before committing
+	- tab width is 8, the terminal is 80-columns wide.
 	- do not check in commented out code or unneeded files
 	- provide a meaningful commit message
 	- the first line of the commit message should be a short
@@ -82,6 +83,14 @@ option).
 Another thing: NULL pointers shall be written as NULL, not as 0.
 
 
+(1b) Tab width is 8, Terminal is 80-column wide.
+
+We generally follow the same coding style guidelines as the
+Linux kernel project.  Lines are indented with Tabs, each of
+which are 8 columns wide.  Lines should fit on 80-column wide
+terminals.
+
+
 (2) Generate your patch using git tools out of your commits.
 
 git based diff tools (git, Cogito, and StGIT included) generate


* Re: [PATCH 1/7] Softrefs: Add softrefs header file with API documentation
  2007-06-10  7:43       ` Junio C Hamano
@ 2007-06-10  7:54         ` Johannes Schindelin
  0 siblings, 0 replies; 52+ messages in thread
From: Johannes Schindelin @ 2007-06-10  7:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, git, Linus Torvalds

Hi,

On Sun, 10 Jun 2007, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Besides, your lines are way too long. Yes, it is not in 
> > Documentation/SubmittingPatches,...
> > ... since you have to read the source to get acquainted 
> > with it _anyway_, it would only be a duplication to have it in 
> > SubmittingPatches, too.
> 
> Well, maybe we should do this.
>
> [...]

But where to stop?

Many people want to put an opening curly bracket on its own line. Other
indenting is open to discussion, too. Whitespace after operators, but
not after function names, should be included, too.

I know you mean well, but I think it is not a bad idea to let people get
familiar with the code (and the formatting rules) first. This way we can 
even tell who did, and who did not do that, before submitting a patch.

Ciao,
Dscho


* Re: [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry)
  2007-06-09 18:25   ` [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry) Johan Herland
@ 2007-06-10  8:02     ` Johannes Schindelin
  2007-06-10  8:30       ` Junio C Hamano
  2007-06-10 14:03       ` Johan Herland
  0 siblings, 2 replies; 52+ messages in thread
From: Johannes Schindelin @ 2007-06-10  8:02 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano, Linus Torvalds

Hi,

On Sat, 9 Jun 2007, Johan Herland wrote:

> The text-based softrefs file format uses 82 bytes per entry (40 bytes 
> from_sha1 in hex, 1 byte SP, 40 bytes to_sha1 in hex, 1 byte LF).
> 
> The binary softrefs file format uses 40 bytes per entry (20 bytes 
> from_sha1, 20 bytes to_sha1).
> 
> Moving to a binary format increases performance slightly, but sacrifices 
> easy readability of the softrefs files.

It is bad style to introduce one type, and then change it to another in a 
backwards-incompatible way. Either you make it backwards compatible, or 
you start with the second format, never even mentioning that you had 
another format.
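For reference, the two formats carry the same information. The sketch below converts one 82-byte text entry into the 40-byte binary layout described in the quoted message: 20 raw bytes of from-sha1 followed by 20 raw bytes of to-sha1. This is the kind of step an upgrade path between the two formats would need. The helper names are hypothetical.

```c
#include <stddef.h>

#define RAWSZ 20
#define HEXSZ (2 * RAWSZ)
#define TEXT_ENTRY_SZ (HEXSZ + 1 + HEXSZ + 1) /* 82 bytes */
#define BIN_ENTRY_SZ (RAWSZ + RAWSZ)          /* 40 bytes */

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode HEXSZ hex digits into RAWSZ raw bytes; -1 on a non-hex digit. */
static int hex_to_raw(const char *hex, unsigned char *raw)
{
	for (size_t i = 0; i < RAWSZ; i++) {
		int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		raw[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Convert one "<from> <to>\n" text entry into a 40-byte binary entry. */
static int text_entry_to_binary(const char *text, unsigned char *bin)
{
	if (text[HEXSZ] != ' ' || text[TEXT_ENTRY_SZ - 1] != '\n')
		return -1;
	if (hex_to_raw(text, bin) < 0 ||
	    hex_to_raw(text + HEXSZ + 1, bin + RAWSZ) < 0)
		return -1;
	return 0;
}
```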

Ciao,
Dscho


* Re: Refactoring the tag object; Introducing soft references  (softrefs); Git 'notes' (take 2)
  2007-06-09 23:16   ` Johan Herland
@ 2007-06-10  8:29     ` Pierre Habouzit
  2007-06-10 14:31       ` Johan Herland
  0 siblings, 1 reply; 52+ messages in thread
From: Pierre Habouzit @ 2007-06-10  8:29 UTC (permalink / raw)
  To: Johan Herland; +Cc: Steven Grimm, git, Junio C Hamano


On Sun, Jun 10, 2007 at 01:16:45AM +0200, Johan Herland wrote:
> On Sunday 10 June 2007, Steven Grimm wrote:
> > Being able to specify relationships between commits after the fact seems 
> > like a very useful facility.
> > 
> > Does it make sense to have type information to record what the 
> > relationship between two objects means? Without that, it seems like 
> > it'll be hard to build much of a tool set on top of this feature, since 
> > no two tools that made use of it could unambiguously query just their 
> > own softrefs.
> 
> Actually MadCoder/Pierre had a similar idea on IRC. He wanted to separate 
> softrefs into namespaces, so that softrefs for tags could live in a 
> different place than softrefs associated with his "gits" bug tracker.
> 
> I haven't thought very much about this, but it's certainly possible to do 
> something like this. What do the rest of y'all think?

  Well, if we're two with the same idea, it's a good one, no ? :)

  In fact, the namespace idea like I told you on IRC isn't _that_
brilliant. But I'm sure recording a softref with:

  <from_sha> <to_sha> <token>

  The token would help classify the softref. And I'm sure we'll end up with:

  <from_sha> <to_sha> <token> <flags>

  with the flags to say what behaviour (e.g.) the reachability resolver
should have wrt that link ?
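A sketch of how the extended line format might be parsed. It assumes the token is a short word without whitespace and that the flags are written as a hexadecimal bit mask. The struct, the flag names and their meanings are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits; their meaning is an assumption. */
#define SOFTREF_FLAG_REACHABLE 0x1u /* reachability walk follows this link */
#define SOFTREF_FLAG_INHERIT   0x2u /* descendants inherit the relation */

/* One "<from_sha> <to_sha> <token> <flags>" record (hypothetical layout). */
struct typed_softref {
	char from[41];
	char to[41];
	char token[32];
	unsigned int flags;
};

/* Returns 0 on success, -1 if a field is missing or a sha has the wrong length. */
static int parse_typed_softref(const char *line, struct typed_softref *r)
{
	if (sscanf(line, "%40s %40s %31s %x",
		   r->from, r->to, r->token, &r->flags) != 4)
		return -1;
	if (strlen(r->from) != 40 || strlen(r->to) != 40)
		return -1;
	return 0;
}
```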

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org



* Re: [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry)
  2007-06-10  8:02     ` Johannes Schindelin
@ 2007-06-10  8:30       ` Junio C Hamano
  2007-06-10  9:46         ` Johannes Schindelin
  2007-06-10 14:03       ` Johan Herland
  1 sibling, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2007-06-10  8:30 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johan Herland, git, Linus Torvalds

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Sat, 9 Jun 2007, Johan Herland wrote:
>
>> The text-based softrefs file format uses 82 bytes per entry (40 bytes 
>> from_sha1 in hex, 1 byte SP, 40 bytes to_sha1 in hex, 1 byte LF).
>> 
>> The binary softrefs file format uses 40 bytes per entry (20 bytes 
>> from_sha1, 20 bytes to_sha1).
>> 
>> Moving to a binary format increases performance slightly, but sacrifices 
>> easy readability of the softrefs files.
>
> It is bad style to introduce one type, and then change it to another in a 
> backwards-incompatible way. Either you make it backwards compatible, or 
> you start with the second format, never even mentioning that you had 
> another format.

While I agree with that in principle, I think you are being a
bit too harsh on a set of patches that shows possible
alternatives for an idea that is not even in any unreleased
version of git.

Got out of the wrong side of bed this morning?


* Re: Comment on weak refs
  2007-06-10  1:25     ` Johan Herland
  2007-06-10  6:33       ` Johannes Schindelin
@ 2007-06-10  9:03       ` Pierre Habouzit
  1 sibling, 0 replies; 52+ messages in thread
From: Pierre Habouzit @ 2007-06-10  9:03 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git, Linus Torvalds


On Sun, Jun 10, 2007 at 03:25:32AM +0200, Johan Herland wrote:
> On Sunday 10 June 2007, Junio C Hamano wrote:
> > It gets worse when you actually start using softrefs.  I do not
> > think you would have a limited set of softrefs, such as
> > "reverse-tag-lookup-softref", "bug-tracker-softref".  For
> > example, a typical bug tracking sheet may look like this:
> > 
> >       - Hey I found a bug, you can reproduce like so... I am
> >         testing commit _A_.
> >       - It appears that commit _B_ introduced it; it does not
> >         reproduce with anything older.
> >       - Here is a patch to fix it; please try.
> >       - Oh, it seems to fix.  Committed as _C_.
> >       - No, it does not work for me, and the approach to fix is
> >         wrong; reopening the bug.
> >       - Ok, take 2 with different approach.  Committed as _D_.
> >         Please try.
> >       - Thanks, it finally fixes it.  Closing the bug.
> > 
> > The bug will be associated with commits A, B, C and D.  The
> > questions maintainers would want to ask are:
> > 
> >  - What caused this bug?
> >  - Which versions (or earlier) have this bug?
> >  - Which versions (or later) have proper fix?
> >  - What alternate approaches were taken to fix this bug?
> >  - In this upcoming release, which bugs have been fixed?
> >  - What bugs are still open after this release?
> > 
> > Depending on what you want to find out, you would need to ask
> > which commits are related to this bug tracking sheet object, and
> > the answer will differ.  Some "softref" relations should
> > extend along the ancestry (when "this fixes" is attached to a
> > commit, its children ought to inherit that property), some
> > shouldn't ("this is what broke it" should not propagate to its
> > parent nor child).
> 
> We're getting a little ahead of ourselves, aren't we? IMHO, it would be up 
> to the bug system to determine which (and how many) connections to make 
> between the bug reports and the commits (or even if softrefs would be the 
> correct mechanism for these connections at all). We shouldn't necessarily 
> base the softrefs design on how we imagine a hypothetical bug system to 
> work. But Pierre might have something to say on how he would want to use 
> softrefs, and his system is hopefully _less_ hypothetical. :)

  To be fair, I'm still struggling with the storage backend, trying
to make things fast enough (my current import rate is 10 mails per
second, which is not that brilliant I guess), and also to design some
simple things like "answering" a bug.

  For now, my design is the following: I have a 'bts' branch where the
bug reports (plain mailboxes) go. Grit can manage as many branches as
the user wants; bts is just the default name. Then, for a bts
branch, you have $GIT_DIR/grit/<branch>.index and
$GIT_DIR/grit/<branch>/. The former is the index of the tip of the bts
branch, and the latter contains some bits of the tip of the branch
checked out (a kind of cache, useful e.g. to run mutt -f on
a mbox).

  Each bug is identified by the sha1 of the first imported mail, and is
stored in a sha[:2]/sha[2:] file, à la .git/objects/. I should also have a
second file with annotations about the bug; its format is not really clear
yet, as "one file per bug" could be quite inefficient. OTOH, if I mix too
many bugs in the same file, the merge risk is bigger (but I suppose I
could use a specific merge strategy for this).
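The sha[:2]/sha[2:] fan-out mentioned above could be sketched in C as follows. bug_fanout_path() is a hypothetical helper, and it assumes ids are 40 lowercase hex digits.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a 40-hex-digit bug id into the
 * "sha[:2]/sha[2:]" path, the same fan-out used by .git/objects/.
 * Returns 0 on success, -1 on a malformed id or a too-small buffer.
 */
static int bug_fanout_path(char *out, size_t outlen, const char *hex)
{
	if (strlen(hex) != 40 || strspn(hex, "0123456789abcdef") != 40)
		return -1;
	if (outlen < 42) /* 2 chars + '/' + 38 chars + NUL */
		return -1;
	snprintf(out, outlen, "%.2s/%s", hex, hex + 2);
	return 0;
}
```

Splitting on the first two hex digits keeps any single directory from collecting every bug file, which is the same reason .git/objects/ is laid out this way.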

  That is the only non-hypothetical part so far. My plan then was to use
"links" (softlinks or not, I'm speaking generically; I hope softrefs
will match my needs, but I don't know yet) between specific commits and
bugs. Links would somehow carry information on whether this is an
"opening" tag (like: this bug is present starting at that commit), an
informational tag (like: this commit helps fix that bug, but is not
enough), or a closing/fixing tag (like: this commit fixes it). A fourth
kind may also be used, a not-found tag (like: this commit does
not fix the bug, but for sure it's not there anymore at that commit).
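One way to encode the four link kinds listed above is a hypothetical C enum with string names for storage. The names "opens", "informs", "fixes" and "not-found" are assumptions; no storage format was settled in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding of the four kinds of bug <-> commit links. */
enum bug_link_kind {
	BUG_LINK_INVALID = -1,
	BUG_LINK_OPENS,     /* bug is present starting at this commit */
	BUG_LINK_INFORMS,   /* commit helps fix the bug, but is not enough */
	BUG_LINK_FIXES,     /* commit fixes the bug */
	BUG_LINK_NOT_FOUND  /* commit is not the fix, but the bug is gone here */
};

static const char *const bug_link_names[] = {
	[BUG_LINK_OPENS]     = "opens",
	[BUG_LINK_INFORMS]   = "informs",
	[BUG_LINK_FIXES]     = "fixes",
	[BUG_LINK_NOT_FOUND] = "not-found",
};

#define NR_BUG_LINK_KINDS (sizeof(bug_link_names) / sizeof(bug_link_names[0]))

/* Name to store alongside the link; NULL for an invalid kind. */
static const char *bug_link_name(enum bug_link_kind kind)
{
	if ((int)kind < 0 || (size_t)kind >= NR_BUG_LINK_KINDS)
		return NULL;
	return bug_link_names[kind];
}

/* Parse a stored name back into a kind. */
static enum bug_link_kind bug_link_parse(const char *s)
{
	for (size_t i = 0; i < NR_BUG_LINK_KINDS; i++)
		if (!strcmp(s, bug_link_names[i]))
			return (enum bug_link_kind)i;
	return BUG_LINK_INVALID;
}
```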

  That said, softlinks do not really need to "carry" the information;
they just need to be linked somehow to the bug, and the bug would hold
the annotations for those softlinks.

  What is somewhat flawed for me is that when someone "answers" the
bug or changes some information about it, this generates a "new"
commit, and I would need to move the softlinks to that new commit
object to shorten the path and go directly to the latest version of
the bug status file.

  So to be useful to me, yes, I guess I would really like softlinks to
be versioned. Whether I will use them at all, I don't know yet.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

* Re: [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry)
  2007-06-10  8:30       ` Junio C Hamano
@ 2007-06-10  9:46         ` Johannes Schindelin
  0 siblings, 0 replies; 52+ messages in thread
From: Johannes Schindelin @ 2007-06-10  9:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, git, Linus Torvalds

Hi,

On Sun, 10 Jun 2007, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Sat, 9 Jun 2007, Johan Herland wrote:
> >
> >> The text-based softrefs file format uses 82 bytes per entry (40 bytes 
> >> from_sha1 in hex, 1 byte SP, 40 bytes to_sha1 in hex, 1 byte LF).
> >> 
> >> The binary softrefs file format uses 40 bytes per entry (20 bytes 
> >> from_sha1, 20 bytes to_sha1).
> >> 
> >> Moving to a binary format increases performance slightly, but sacrifices 
> >> easy readability of the softrefs files.
> >
> > It is bad style to introduce one type, and then change it to another in a 
> > backwards-incompatible way. Either you make it backwards compatible, or 
> > you start with the second format, never even mentioning that you had 
> > another format.
> 
> While I agree with that in principle, I think you are being a
> bit too harsh to a set of patches that shows possible
> alternatives for an idea that is not even in any unreleased
> version of git.
> 
> Got out of the wrong side of bed this morning?

Possibly. Except it was not a bed, but an airplane passenger seat.

And it did not help that I totally disagree with the approach: "sorted 
list does not do this well, unsorted not that... let's take both!"

So, please take my words with a grain of salt. But I still think that 
_what_ I was saying is correct.

Yes, the patch series shows an alternative, but I would have wished for 
either a smaller quick-n-dirty proof-of-concept implementation ([RFC]), or 
a better thought-through one ([PATCH]).

Ciao,
Dscho

* Re: Comment on weak refs
  2007-06-10  6:33       ` Johannes Schindelin
@ 2007-06-10 13:41         ` Johan Herland
  2007-06-10 14:09           ` Pierre Habouzit
  0 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-10 13:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git, Linus Torvalds, Pierre Habouzit

On Sunday 10 June 2007, Johannes Schindelin wrote:
> On Sun, 10 Jun 2007, Johan Herland wrote:
> > On Sunday 10 June 2007, Junio C Hamano wrote:
> Has my lightweight annotation patch reached you?
> 
> I like my approach better than yours, because it is
> 
> 1) a way, way smaller patch, and
> 2) it automatically includes the versionability.

I see your point, but your lightweight annotations are solving a different 
problem, aren't they? They do provide the after-the-fact annotations that 
sort of sparked off these discussions, but I can't see how your patch is a
replacement of the general "relationships between arbitrary objects" 
concept that softrefs try to solve.

Of course, it might be that the lightweight annotations are "good enough" 
for the use cases we currently see, and that softrefs are a bit overkill. 
We'll just have to see what features people (like Pierre) really need.

> After thinking about it a little more (my plane was slow, and as a result 
> I am allowed to spend 8 more hours in Paris), I think that a small but 
> crucial change would make this thing even more useful:
> 
> Instead of having "core.showAnnotations" be a boolean config, it might be 
> better to have "core.annotationsRef" instead, overrideable by the 
> environment variable GIT_ANNOTATION_REF.
> 
> With this, you can have different refs for different kinds of annotations.
> 
> For example, some people might add bugtracker comments (even comments like 
> "this commit was bad: introduced bug #798, solved by commit 9899fdadc.."). 
> Those comments could live in refs/annotations/bugs. To see them, just say 
> 
> 	GIT_ANNOTATION_REF=refs/annotations/bugs gitk
> 
> Voila.

Nice. Something similar should be possible to do with softrefs as well.
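The lookup order Dscho proposes (environment variable over config) could be sketched like this. pick_annotation_ref() and annotation_ref() are hypothetical names, config_value stands in for whatever core.annotationsRef was set to, and treating an empty variable as unset, or NULL as "show no annotations", are assumptions.

```c
#include <stdlib.h>

/*
 * Hypothetical: choose which ref holds the annotations to display.
 * A non-empty env_value (from GIT_ANNOTATION_REF) wins over
 * config_value (from core.annotationsRef); NULL means "show none".
 */
static const char *pick_annotation_ref(const char *env_value,
				       const char *config_value)
{
	if (env_value && *env_value)
		return env_value;
	return config_value;
}

static const char *annotation_ref(const char *config_value)
{
	return pick_annotation_ref(getenv("GIT_ANNOTATION_REF"), config_value);
}
```

Keeping the precedence rule in pick_annotation_ref() makes it testable without touching the process environment.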

> I am quite certain that treating annotations as branches, containing
> fan-out directories for the reverse lookup, is the way to go. I am even
> quite certain that in most cases, working-directory-less merging is
> possible for such annotations.

I'm not convinced about the working-directory-less merging. AFAICS the 
lightweight annotations will behave pretty much like the "regular" version 
controlled filesystem, and you'll have the same kind of conflicts when you 
merge stuff between repos. I'd be glad to be proven wrong, of course.

Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: [PATCH 1/7] Softrefs: Add softrefs header file with API documentation
  2007-06-10  6:58     ` Johannes Schindelin
  2007-06-10  7:43       ` Junio C Hamano
@ 2007-06-10 14:00       ` Johan Herland
  1 sibling, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-10 14:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Linus Torvalds

On Sunday 10 June 2007, Johannes Schindelin wrote:
> Hi,
> 
> On Sat, 9 Jun 2007, Johan Herland wrote:
> 
> > See patch for documentation.
> 
> This is preposterous. Either you substitute the patch for a documentation, 
> or you document it in the commit message. I consider commit messages like 
> "See patch for documentation" as reasonable as all those CVS "** no 
> message **" abominations.

Well, I could have copied documentation from the header file into the commit
message.

> > + * The softrefs db consists of two files: .git/softrefs.unsorted and
> > + * .git/softrefs.sorted. Both files use the same format; one softref per line
> > + * of the form "<from-sha1> <to-sha1>\n". Each sha1 sum is 40 bytes long; this
> > + * makes each entry exactly 82 bytes long (including the space between the sha1
> > + * sums and the terminating linefeed).
> > + *
> > + * The entries in .git/softrefs.sorted are sorted on <from-sha1>, in order to
> > + * make lookup fast.
> > + *
> > + * The entries in .git/softrefs.unsorted are _not_ sorted. This is to make
> > + * insertion fast.
> 
> This sure sounds like you buy the disadvantages of both. Last time I 
> checked, it was recommended to look at your needs and pick _one_ 
> appropriate data structure fitting _all_ your needs.

First, the unsorted file is bounded in size to make sure it never gets
large enough to really impact performance.
Second, I'd ask you to look at the performance numbers (in patch #0)
before commenting on the performance.
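For reference, a lookup over such a two-part layout might look like the sketch below: a binary search over the sorted entries plus a linear scan of the bounded unsorted tail. It is an in-memory illustration with hypothetical names, using 20-byte binary sha1s, not the code from the softrefs patches.

```c
#include <stddef.h>
#include <string.h>

/* One softref with binary sha1s (hypothetical in-memory layout). */
struct softref {
	unsigned char from[20];
	unsigned char to[20];
};

/*
 * Count softrefs originating at "from": binary search in the sorted
 * array (O(log n)) plus a linear scan of the small, bounded unsorted
 * array (O(k)).
 */
static size_t count_softrefs_from(const unsigned char from[20],
				  const struct softref *sorted, size_t nr_sorted,
				  const struct softref *unsorted, size_t nr_unsorted)
{
	size_t lo = 0, hi = nr_sorted, count = 0;

	/* lower bound: first sorted entry whose from-sha1 >= key */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (memcmp(sorted[mid].from, from, 20) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	for (; lo < nr_sorted && !memcmp(sorted[lo].from, from, 20); lo++)
		count++;
	for (size_t i = 0; i < nr_unsorted; i++)
		if (!memcmp(unsorted[i].from, from, 20))
			count++;
	return count;
}
```

As long as the unsorted part stays bounded, the linear scan costs a constant amount per lookup, which is the trade-off Johan describes.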

> Besides, your lines are way too long. Yes, it is not in 
> Documentation/SubmittingPatches, but even just a cursory look into the 
> existing source shows you that it is mostly 80-chars-per-line. I think it 
> goes without saying that you should try to imitate the existing practices 
> in any project, and since you have to read the source to get acquainted 
> with it _anyway_, it would only be a duplication to have it in 
> SubmittingPatches, too.

I have indeed tried to follow the 80-chars-per-line rule. In softrefs.h
I fail to see a _single_ line exceeding 80 chars. In the other
files I believe the number of long lines is comparable to other files
in the git repo.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry)
  2007-06-10  8:02     ` Johannes Schindelin
  2007-06-10  8:30       ` Junio C Hamano
@ 2007-06-10 14:03       ` Johan Herland
  1 sibling, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-10 14:03 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Linus Torvalds

On Sunday 10 June 2007, Johannes Schindelin wrote:
> On Sat, 9 Jun 2007, Johan Herland wrote:
> > The text-based softrefs file format uses 82 bytes per entry (40 bytes 
> > from_sha1 in hex, 1 byte SP, 40 bytes to_sha1 in hex, 1 byte LF).
> > 
> > The binary softrefs file format uses 40 bytes per entry (20 bytes 
> > from_sha1, 20 bytes to_sha1).
> > 
> > Moving to a binary format increases performance slightly, but sacrifices 
> > easy readability of the softrefs files.
> 
> It is bad style to introduce one type, and then change it to another in a 
> backwards-incompatible way. Either you make it backwards compatible, or 
> you start with the second format, never even mentioning that you had 
> another format.

As Junio correctly pointed out, this patch is only here to demonstrate an
alternative solution. Whether or not it is used should be decided _long_
before anyone even thinks about putting it into a release.
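To make the size difference between the two formats concrete, here is a hypothetical C sketch converting one 82-byte text entry ("<from-sha1> <to-sha1>\n") into one 40-byte binary entry. It assumes lowercase hex, and the function names are illustrative only.

```c
#define SOFTREF_TEXT_LEN 82 /* 40 hex + SP + 40 hex + LF */
#define SOFTREF_BIN_LEN  40 /* 20 raw bytes + 20 raw bytes */

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1; /* uppercase is rejected in this sketch */
}

/* Decode 40 hex digits into 20 raw bytes; -1 on invalid input. */
static int hex_to_raw(unsigned char *raw, const char *hex)
{
	for (int i = 0; i < 20; i++) {
		int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		raw[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Convert one 82-byte text entry into one 40-byte binary entry. */
static int softref_text_to_bin(unsigned char bin[SOFTREF_BIN_LEN],
			       const char text[SOFTREF_TEXT_LEN])
{
	if (text[40] != ' ' || text[81] != '\n')
		return -1;
	if (hex_to_raw(bin, text) || hex_to_raw(bin + 20, text + 41))
		return -1;
	return 0;
}
```

The binary form halves the file size and lets lookups use memcmp() on raw bytes, at the cost of no longer being readable with a text editor.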


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: Comment on weak refs
  2007-06-10 13:41         ` Johan Herland
@ 2007-06-10 14:09           ` Pierre Habouzit
  2007-06-10 14:25             ` Pierre Habouzit
  0 siblings, 1 reply; 52+ messages in thread
From: Pierre Habouzit @ 2007-06-10 14:09 UTC (permalink / raw)
  To: Johan Herland; +Cc: Johannes Schindelin, Junio C Hamano, git, Linus Torvalds

  Sorry for the noise, but I'm really pissed at git@vger lately.
That mail I'm answering to never made it to the list. Neither did my
quite long answer to Martin, who kindly forwarded it back to me so that I
could send it again. Sadly, git@vger just does not want my mail, even
though its SMTP server seems to accept it:

  Jun 10 16:02:09 pan postfix/smtp[20560]: DE84FCAD9: to=<git@vger.kernel.org>, relay=vger.kernel.org[209.132.176.167]:25, delay=4, delays=0.31/0.02/0.5/3.1, dsn=2.7.0, status=sent (250 2.7.0 nothing apparently wrong in the message. BF:<H 0.0800319>; S1754569AbXFJOCJ)

  I sent a mail to postmaster@vger.kernel.org a week ago, but it
seems it remained a dead letter. Is anyone able to tell what's going
wrong? It's _really_ irritating, and makes me want to give up on
discussions, as I _hate_ losing mail (I usually don't keep a copy of
mails I send to a mailing list, as I expect the list to send them back
to me, and anyway, what would a copy be worth if nobody can read the
mail?).

  So if anyone knows what can be done ....


On Sun, Jun 10, 2007 at 03:41:44PM +0200, Johan Herland wrote:
> On Sunday 10 June 2007, Johannes Schindelin wrote:
> > On Sun, 10 Jun 2007, Johan Herland wrote:
> > > On Sunday 10 June 2007, Junio C Hamano wrote:
> > Has my lightweight annotation patch reached you?
> > 
> > I like my approach better than yours, because it is
> > 
> > 1) a way, way smaller patch, and
> > 2) it automatically includes the versionability.
> 
> I see your point, but your lightweight annotations are solving a different 
> problem, aren't they? They do provide the after-the-fact annotations that 
> sort of sparked off these discussions, but I can't see how your patch is a
> replacement of the general "relationships between arbitrary objects" 
> concept that softrefs try to solve.
> 
> Of course, it might be that the lightweight annotations are "good enough" 
> for the use cases we currently see, and that softrefs are a bit overkill. 
> We'll just have to see what features people (like Pierre) really need.
> 
> > After thinking about it a little more (my plane was slow, and as a result 
> > I am allowed to spend 8 more hours in Paris), I think that a small but 
> > crucial change would make this thing even more useful:
> > 
> > Instead of having "core.showAnnotations" be a boolean config, it might be 
> > better to have "core.annotationsRef" instead, overrideable by the 
> > environment variable GIT_ANNOTATION_REF.
> > 
> > With this, you can have different refs for different kinds of annotations.
> > 
> > For example, some people might add bugtracker comments (even comments like 
> > "this commit was bad: introduced bug #798, solved by commit 9899fdadc.."). 
> > Those comments could live in refs/annotations/bugs. To see them, just say 
> > 
> > 	GIT_ANNOTATION_REF=refs/annotations/bugs gitk
> > 
> > Voila.
> 
> Nice. Something similar should be possible to do with softrefs as well.
> 
> > I am quite certain that treating annotations as branches, containing
> > fan-out directories for the reverse lookup, is the way to go. I am even
> > quite certain that in most cases, working-directory-less merging is
> > possible for such annotations.
> 
> I'm not convinced about the working-directory-less merging. AFAICS the 
> lightweight annotations will behave pretty much like the "regular" version 
> controlled filesystem, and you'll have the same kind of conflicts when you 
> merge stuff between repos. I'd be glad to be proven wrong, of course.
> 
> 
> Have fun!
> 
> ...Johan
> 
> -- 
> Johan Herland, <johan@herland.net>
> www.herland.net

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

* Re: Comment on weak refs
  2007-06-10 14:09           ` Pierre Habouzit
@ 2007-06-10 14:25             ` Pierre Habouzit
  0 siblings, 0 replies; 52+ messages in thread
From: Pierre Habouzit @ 2007-06-10 14:25 UTC (permalink / raw)
  To: git

On Sun, Jun 10, 2007 at 04:08:59PM +0200, Pierre Habouzit wrote:
>   Sorry for the noise, but I'm really pissed at git@vger lately.
> That mail I'm answering to never made it to the list. Neither did my
> quite long answer to Martin, who kindly forwarded it back to me so that I
> could send it again. Sadly, git@vger just does not want my mail, even
> though its SMTP server seems to accept it:
> 
>   Jun 10 16:02:09 pan postfix/smtp[20560]: DE84FCAD9: to=<git@vger.kernel.org>, relay=vger.kernel.org[209.132.176.167]:25, delay=4, delays=0.31/0.02/0.5/3.1, dsn=2.7.0, status=sent (250 2.7.0 nothing apparently wrong in the message. BF:<H 0.0800319>; S1754569AbXFJOCJ)
> 
>   I sent a mail to postmaster@vger.kernel.org a week ago, but it
> seems it remained a dead letter. Is anyone able to tell what's going
> wrong? It's _really_ irritating, and makes me want to give up on
> discussions, as I _hate_ losing mail (I usually don't keep a copy of
> mails I send to a mailing list, as I expect the list to send them back
> to me, and anyway, what would a copy be worth if nobody can read the
> mail?).
> 
>   So if anyone knows what can be done ....

  Sorry, *I* screwed up. I did not check first whether the archive had the
mails, and it did. So since I don't see the mails come back, the problem
is somewhere in between, and not necessarily vger.kernel.org's fault.
I apologize for the noise.

  So if anyone sent a mail to the list (without me in Cc) hoping that I
would get it, and I never answered, it's time to send it again :/

  That is also why some of my mails were sent several times: I thought
they had been eaten at some point. That should not happen again, sorry
about that.

Cheers,
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

* Re: [PATCH 1/7] Softrefs: Add softrefs header file with API documentation
  2007-06-09 18:21   ` [PATCH 1/7] Softrefs: Add softrefs header file with API documentation Johan Herland
  2007-06-10  6:58     ` Johannes Schindelin
@ 2007-06-10 14:27     ` Jakub Narebski
  2007-06-10 14:45       ` [PATCH] Teach git-gc to merge unsorted softrefs Johan Herland
  1 sibling, 1 reply; 52+ messages in thread
From: Jakub Narebski @ 2007-06-10 14:27 UTC (permalink / raw)
  To: git

Johan Herland wrote:

> + * When softrefs are created (by calling add_softref()/add_softrefs()), they
> + * are appended to .git/softrefs.unsorted. When .git/softrefs.unsorted reach a
> + * certain number of entries (determined by MAX_UNSORTED_ENTRIES), all the
> + * entries in .git/softrefs.unsorted are merged into .git/softrefs.sorted.

Perhaps git-gc should also sort softrefs (if it doesn't do it yet)?
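A rough in-memory sketch of the append-then-merge scheme quoted above, whose merge step is also what a git-gc hook would force. MAX_UNSORTED_ENTRIES is given an arbitrary small value here, and toy_add_softref()/toy_merge_unsorted() are hypothetical stand-ins, not the softrefs API (the actual git-gc patch calls merge_unsorted_softrefs(NULL, 1)).

```c
#include <stdlib.h>
#include <string.h>

/* Arbitrary small bound for illustration; the real value is not given here. */
#define MAX_UNSORTED_ENTRIES 4

struct softref {
	unsigned char from[20];
	unsigned char to[20];
};

struct softref_db {
	struct softref *sorted;                        /* ordered by (from, to) */
	size_t nr_sorted;
	struct softref unsorted[MAX_UNSORTED_ENTRIES]; /* append-only */
	size_t nr_unsorted;
};

/* Order by from-sha1, then to-sha1 (the struct is just 40 raw bytes). */
static int cmp_softref(const void *a, const void *b)
{
	return memcmp(a, b, sizeof(struct softref));
}

/* Sort the unsorted entries and merge them into the sorted array. */
static int toy_merge_unsorted(struct softref_db *db)
{
	size_t n = db->nr_sorted + db->nr_unsorted, i = 0, j = 0, k = 0;
	struct softref *merged;

	if (!db->nr_unsorted)
		return 0;
	merged = malloc(n * sizeof(*merged));
	if (!merged)
		return -1;
	qsort(db->unsorted, db->nr_unsorted, sizeof(struct softref), cmp_softref);

	/* two-way merge of two sorted runs, dropping exact duplicates */
	while (i < db->nr_sorted || j < db->nr_unsorted) {
		const struct softref *next;

		if (j == db->nr_unsorted ||
		    (i < db->nr_sorted &&
		     cmp_softref(&db->sorted[i], &db->unsorted[j]) <= 0))
			next = &db->sorted[i++];
		else
			next = &db->unsorted[j++];
		if (k && !cmp_softref(&merged[k - 1], next))
			continue;
		merged[k++] = *next;
	}
	free(db->sorted);
	db->sorted = merged;
	db->nr_sorted = k;
	db->nr_unsorted = 0;
	return 0;
}

/* Append to the unsorted area; merge once it reaches its bound. */
static int toy_add_softref(struct softref_db *db, const struct softref *ref)
{
	/* a previous failed merge may have left the area full */
	if (db->nr_unsorted == MAX_UNSORTED_ENTRIES && toy_merge_unsorted(db))
		return -1;
	db->unsorted[db->nr_unsorted++] = *ref;
	if (db->nr_unsorted == MAX_UNSORTED_ENTRIES)
		return toy_merge_unsorted(db);
	return 0;
}
```

A git-gc hook would simply call the merge step unconditionally, so that the unsorted area starts out empty after gc.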
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2)
  2007-06-10  8:29     ` Pierre Habouzit
@ 2007-06-10 14:31       ` Johan Herland
  2007-06-10 19:42         ` Steven Grimm
  0 siblings, 1 reply; 52+ messages in thread
From: Johan Herland @ 2007-06-10 14:31 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: Steven Grimm, git, Junio C Hamano

On Sunday 10 June 2007, Pierre Habouzit wrote:
> On Sun, Jun 10, 2007 at 01:16:45AM +0200, Johan Herland wrote:
> > On Sunday 10 June 2007, Steven Grimm wrote:
> > > Being able to specify relationships between commits after the fact seems 
> > > like a very useful facility.
> > > 
> > > Does it make sense to have type information to record what the 
> > > relationship between two objects means? Without that, it seems like 
> > > it'll be hard to build much of a tool set on top of this feature, since 
> > > no two tools that made use of it could unambiguously query just their 
> > > own softrefs.
> > 
> > Actually MadCoder/Pierre had a similar idea on IRC. He wanted to separate 
> > softrefs into namespaces, so that softrefs for tags could live in a 
> > different place than softrefs associated with his "gits" bug tracker.
> > 
> > I haven't thought very much about this, but it's certainly possible to do 
> > something like this. What do the rest of y'all think?
> 
>   Well, if two of us had the same idea, it must be a good one, no? :)
> 
>   In fact, the namespace idea like I told you on IRC isn't _that_
> brilliant. But I'm sure recording a softref with:
> 
>   <from_sha> <to_sha> <token>
> 
>   token would help classify the softref. And I'm sure we'll end up with:
> 
>   <from_sha> <to_sha> <token> <flags>
> 
>   with the flags to say what behaviour (e.g.) the reachability resolver
> should have wrt that link ?

Interesting. But I'm not sure I want to give up the fixed-length softref
records as I imagine it makes the lookup and processing _much_ faster.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* [PATCH] Teach git-gc to merge unsorted softrefs
  2007-06-10 14:27     ` Jakub Narebski
@ 2007-06-10 14:45       ` Johan Herland
  0 siblings, 0 replies; 52+ messages in thread
From: Johan Herland @ 2007-06-10 14:45 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebski

Signed-off-by: Johan Herland <johan@herland.net>
---

On Sunday 10 June 2007, Jakub Narebski wrote:
> Perhaps git-gc should also sort softrefs (if it doesn't do it yet)?

Sure.

 builtin-gc.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index 45025fb..30e1e44 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -12,6 +12,7 @@
 
 #include "cache.h"
 #include "run-command.h"
+#include "softrefs.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -87,6 +88,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (i != argc)
 		usage(builtin_gc_usage);
 
+	if (merge_unsorted_softrefs(NULL, 1))
+		return error("failed to merge unsorted softrefs");
+
 	if (pack_refs && run_command_v_opt(argv_pack_refs, RUN_GIT_CMD))
 		return error(FAILED_RUN, argv_pack_refs[0]);
 
-- 
1.5.2.1.144.gabc40

* Re: Comment on weak refs
  2007-06-09 23:55   ` Comment on weak refs Junio C Hamano
  2007-06-10  1:25     ` Johan Herland
@ 2007-06-10 15:26     ` Jakub Narebski
  1 sibling, 0 replies; 52+ messages in thread
From: Jakub Narebski @ 2007-06-10 15:26 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

> Semantics
> ---------
> 
> Not all "softref" relationships are equal.  "This object is
> referred to by these tags" is one obvious application, and only
> because we already try to follow tags when git-fetch happens
> anyway, it looks natural to make everybody follow such a softref
> relationship.

Or "this object is referred to by these _notes_", where notes differ
from tags in what is more important: the name or the comment (message).
For tags the name is most important; for notes the comment is (which
might be a bug message, but might also be a correction to the commit
message, or an additional Acked-by).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2)
  2007-06-10 14:31       ` Johan Herland
@ 2007-06-10 19:42         ` Steven Grimm
  0 siblings, 0 replies; 52+ messages in thread
From: Steven Grimm @ 2007-06-10 19:42 UTC (permalink / raw)
  To: Johan Herland; +Cc: Pierre Habouzit, git, Junio C Hamano

Johan Herland wrote:
> Interesting. But I'm not sure I want to give up the fixed-length softref
> records as I imagine it makes the lookup and processing _much_ faster.
>   

The token (really a namespace identifier) could be defined as a string 
with a fixed, probably small, maximum size. Or, better IMO, it could be 
an integer with a bunch of enumerated values for internal git uses and a 
range reserved for unofficial / experimental use. I agree that 
fixed-length records seem like a win here, assuming this is the general 
storage layout we end up with when all is said and done.
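A sketch of what a fixed-length record with an integer namespace might look like, along the lines Steven suggests: 20 + 20 bytes of sha1 plus a 4-byte big-endian namespace id, for 44 bytes per entry. The specific namespace values and the reserved range starting at 0x80000000 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical namespace ids; values are illustrative only. */
#define SOFTREF_NS_TAG          UINT32_C(1)          /* tag -> tagged object */
#define SOFTREF_NS_NOTE         UINT32_C(2)          /* note -> annotated object */
#define SOFTREF_NS_EXPERIMENTAL UINT32_C(0x80000000) /* start of unofficial range */

struct softref_record {
	unsigned char from[20];
	unsigned char to[20];
	unsigned char ns[4]; /* namespace id, big-endian; bytes avoid padding */
};

_Static_assert(sizeof(struct softref_record) == 44,
	       "records must stay fixed-length");

static void set_ns(struct softref_record *r, uint32_t ns)
{
	r->ns[0] = (unsigned char)(ns >> 24);
	r->ns[1] = (unsigned char)(ns >> 16);
	r->ns[2] = (unsigned char)(ns >> 8);
	r->ns[3] = (unsigned char)ns;
}

static uint32_t get_ns(const struct softref_record *r)
{
	return ((uint32_t)r->ns[0] << 24) | ((uint32_t)r->ns[1] << 16) |
	       ((uint32_t)r->ns[2] << 8) | (uint32_t)r->ns[3];
}

static int ns_is_experimental(uint32_t ns)
{
	return ns >= SOFTREF_NS_EXPERIMENTAL;
}
```

Keeping every record at 44 bytes preserves the property Johan wants to keep: entry n still starts at byte 44 * n, so a binary search over the file remains possible.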

-Steve

Thread overview: 52+ messages
2007-06-04  0:51 Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Johan Herland
2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
2007-06-04  0:52   ` [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header Johan Herland
2007-06-04  6:08     ` Matthias Lederhofer
2007-06-04  7:30       ` Johan Herland
2007-06-04  0:53   ` [PATCH 2/6] git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string Johan Herland
2007-06-04  0:53   ` [PATCH 3/6] git-fsck: Do thorough verification of tag objects Johan Herland
2007-06-04  5:56     ` Matthias Lederhofer
2007-06-04  7:51       ` Johan Herland
2007-06-06  7:18         ` Junio C Hamano
2007-06-06  8:06           ` Johan Herland
2007-06-06  9:03             ` Junio C Hamano
2007-06-06  9:21               ` Junio C Hamano
2007-06-06 10:26                 ` Johan Herland
2007-06-06 10:35                   ` Junio C Hamano
2007-06-04  0:54   ` [PATCH 4/6] Documentation/git-mktag: Document the changes in tag object structure Johan Herland
2007-06-04  0:54   ` [PATCH 5/6] git-mktag tests: Fix and expand the mktag tests according to the new " Johan Herland
2007-06-04  0:54   ` [PATCH 6/6] Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object Johan Herland
2007-06-04 20:32   ` [PATCH 0/6] Refactor the " Junio C Hamano
2007-06-07 22:13   ` [PATCH] Fix bug in tag parsing when thorough verification was in effect Johan Herland
2007-06-09 18:19 ` [PATCH 0/7] Introduce soft references (softrefs) Johan Herland
2007-06-09 18:21   ` [PATCH 1/7] Softrefs: Add softrefs header file with API documentation Johan Herland
2007-06-10  6:58     ` Johannes Schindelin
2007-06-10  7:43       ` Junio C Hamano
2007-06-10  7:54         ` Johannes Schindelin
2007-06-10 14:00       ` Johan Herland
2007-06-10 14:27     ` Jakub Narebski
2007-06-10 14:45       ` [PATCH] Teach git-gc to merge unsorted softrefs Johan Herland
2007-06-09 18:22   ` [PATCH 2/7] Softrefs: Add implementation of softrefs API Johan Herland
2007-06-09 18:22   ` [PATCH 3/7] Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs Johan Herland
2007-06-09 18:23   ` [PATCH 4/7] Softrefs: Add manual page documenting git-softref and softrefs subsystem in general Johan Herland
2007-06-09 18:23   ` [PATCH 5/7] Softrefs: Add testcases for basic softrefs behaviour Johan Herland
2007-06-09 18:24   ` [PATCH 6/7] Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin Johan Herland
2007-06-09 18:24   ` [PATCH 7/7] Teach git-mktag to register softrefs for all tag objects Johan Herland
2007-06-09 18:25   ` [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry) Johan Herland
2007-06-10  8:02     ` Johannes Schindelin
2007-06-10  8:30       ` Junio C Hamano
2007-06-10  9:46         ` Johannes Schindelin
2007-06-10 14:03       ` Johan Herland
2007-06-09 23:55   ` Comment on weak refs Junio C Hamano
2007-06-10  1:25     ` Johan Herland
2007-06-10  6:33       ` Johannes Schindelin
2007-06-10 13:41         ` Johan Herland
2007-06-10 14:09           ` Pierre Habouzit
2007-06-10 14:25             ` Pierre Habouzit
2007-06-10  9:03       ` Pierre Habouzit
2007-06-10 15:26     ` Jakub Narebski
2007-06-09 22:57 ` Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Steven Grimm
2007-06-09 23:16   ` Johan Herland
2007-06-10  8:29     ` Pierre Habouzit
2007-06-10 14:31       ` Johan Herland
2007-06-10 19:42         ` Steven Grimm
