Git development
 help / color / mirror / Atom feed
* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
From: Jeff King @ 2011-10-19 19:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List
In-Reply-To: <CACBZZX5PqYa0uWiGgs952rk2cy+QRCU95kF63qzSi3fKK-YrCQ@mail.gmail.com>

On Wed, Oct 19, 2011 at 08:03:24PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This is quick hack I wrote just before leaving work to show that I
> could indeed push patches to our main repository starting with
> 31337. Names hidden to protect the innocent.

Clever and amusing.

> Which in just over a minute will generate, in my case:
> 
>     $ git show --pretty=raw 313375d995e6f8b7773c6ed1ee165e5a9e15690b | head -n 7
>     commit 313375d995e6f8b7773c6ed1ee165e5a9e15690b
>     tree c9bebc99c05dfe61cccf02ebdf442945c8ff8b3c
>     parent 0dce2d45a79d26a593f0e12301cdfeb7eb23c17a
>     author Ævar Arnfjörð Bjarmason <avar@example.com> <censored> <censored>
>     committer Ævar Arnfjörð Bjarmason <avar@example.com> <censored> <censored>
>     lulz 697889

Nice header name.

> I also think it's interesting that we seemingly don't have (in my
> brief search, maybe I missed it) an API for writing a complete commit
> object into a strbuf, so I had to write a lot of commit objects to
> disk, and keep unlinking the unacceptable candidates so the repository
> wouldn't balloon in size.

Calculating an object sha1 is pretty straightforward. I think you could
just tack the "commit <size>" header in front of the commit strbuf, and
sha1 that, and then just send the "winner" to actually be written.

Too bad it won't work in an append only way.  The internal state of sha1
after a certain set of bytes is deterministic, so you could do something
like:

  void leetify_object(struct strbuf *data)
  {
          SHA_CTX base, guess;
          int len = data->len;
          int i = 0;

          /* come up with a base internal state representing
           * the commit */
          SHA1_Init(&base);
          SHA1_Update(&base, data->buf, data->len);

          while (1) {
                  char append[64];
                  int n = snprintf(append, sizeof(append),
                                   "x-lulz-by: %d\n", i);
                  unsigned char sha1[20];

                  /* and then from that state, see what the sha1 would
                   * be if we add our few new bytes */
                  memcpy(&guess, &base, sizeof(guess));
                  SHA1_Update(&guess, append, n);
                  SHA1_Final(sha1, &guess);

                  /* did we pick a winner? */
                  if (sha1[0] == 0x31 &&
                      sha1[1] == 0x33 &&
                      &sha1[2] & 0xf0) == 0x70) {
                            strbuf_addstr(data, append);
                            return;
                  }

                  i++;
          }
  }

There are two problems with this:

  1. The x-lulz-by header is visible in the commit message. I think you
     can actually get around this by appending a NUL character, and "git
     log" and friends will stop processing the commit message there.

  2. The object header will actually have the length of the object at
     the beginning. So as your lulz number grows, the length of the
     object will change, and invalidate your earlier sha1. You could get
     around this by using a fixed-size guess (e.g., start with 20 bytes
     of zeroes, and then just keep incrementing).

-Peff

^ permalink raw reply

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
From: Junio C Hamano @ 2011-10-19 19:29 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jeff King, cmn, A Large Angry SCM,
	Daniel Barkalow, Sverre Rabbelier
In-Reply-To: <7v39epft32.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> I was trying to summarize this topic for Release Notes.
>
>   Possibly incompatible changes
>   -----------------------------
>
>    * Special refs such as "HEAD", "MERGE_HEAD", and "FETCH_HEAD" are now
>      restricted to names with only uppercase letters and underscore. All
>      other refs must live under refs/ namespace. Earlier, you could
>      confuse git by storing an object name in $GIT_DIR/tmp/junk and say
>      "git show tmp/junk", but this will no longer work.
>
> But noticed that "git update-ref tmp/junk HEAD" does create such a ref
> that won't be recognized, and "git check-ref-format tmp/junk" is happy.
>
> I think we would need to restrict check_ref_format() so that these
> commands (and possibly others, but I think that single function will cover
> pretty much everything) also reject "tmp/junk" immediately below $GIT_DIR
> as a bad string. Otherwise we cannot merge these fixups, which would mean
> we would have to revert the "Clean up refname checks and normalization"
> series, at least the part that started emitting the "warning", which is a
> mess I would rather want to avoid.
>
> Opinions on how to get us out of this mess?

Let me step back a bit to avoid making hasty decisions that may cause
negative effect on the end users only because a single topic with a
possible fallout has been merged to 'master' already (a sane position for
such a situation always has been to revert the merge of the topic so far,
but I ended up being hasty, only because the merge was deliberately made
early in the cycle to give us enough time to deal with fallouts).

In general, the "Hey that file that appears in our dwimmed ref namespace
does not store correct [0-9a-f]{40} contents" warning message is a good
thing to have. When we try the dwimmery and disambiguation, we however
look at the potential refs and warn disambiguity only when two or more
such files have good contents. E.g. if I do this:

    $ git rev-parse HEAD >.git/refs/heads/frotz
    $ echo hello >.git/refs/tags/frotz
    $ git show frotz

we have never paid attention to the broken tag and showed the 'frotz'
branch without complaining. Once tags/frotz gets a real object name,
however, we start giving ambiguity warnings.

Perhaps that is what we should be fixing instead.

Right now, dwim_ref() classifies candidates into two categories, ones that
resolve_ref() succeeds, and others that resolve_ref() does not. And when
we have two or more candidates that successfully resolves, there is an
ambiguity.

Perhaps we need the third kind: ones that exist on the filesystem but
are ill-formed.

When naïvely looked at, the code in 'master' may seem to be doing just
that, but it does _not_ have any way to affect the ambiguity detection
logic even if the caller wanted to. The warning is issued at a wrong
level.

Perhaps resolve_ref() should return in its *flag parameter that "a file
exists there but incorrectly formatted", and dwim_ref() should notice and
use that information to warn about ambiguity and also illformed-ness.

A patch is attached at the end of this message to minimally fix what is in
'master' (without the jc/check-ref-format-fixup topic). This update allows
us to make dwim_ref() notice such "exists but broken" candidates and later
take them into consideration when deciding if a name is ambiguous, but I
did not want to change the semantics from the traditional implementation,
so it only uses this information to warn ref breakages. Also this squelches
the phony "index is not well formed" warning against "git show index --"
by knowing that many files directly under $GIT_DIR are not ref-like things.
Michaels's "when taken as a full refname, is this string valid?" update
mentioned in the thread may be used to replace the strchr(fullref, '/') check
that can be seen in this patch.

 refs.c      |   22 +++++++++++-----------
 refs.h      |    5 +++--
 sha1_name.c |    5 ++++-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/refs.c b/refs.c
index cab4394..0f58e46 100644
--- a/refs.c
+++ b/refs.c
@@ -4,9 +4,8 @@
 #include "tag.h"
 #include "dir.h"
 
-/* ISSYMREF=01 and ISPACKED=02 are public interfaces */
-#define REF_KNOWS_PEELED 04
-#define REF_BROKEN 010
+/* ISSYMREF=0x01, ISPACKED=0x02 and ISBROKEN=0x04 are public interfaces */
+#define REF_KNOWS_PEELED 0x10
 
 struct ref_entry {
 	unsigned char flag; /* ISSYMREF? ISPACKED? */
@@ -329,12 +328,12 @@ static void get_ref_dir(const char *submodule, const char *base,
 				flag = 0;
 				if (resolve_gitlink_ref(submodule, ref, sha1) < 0) {
 					hashclr(sha1);
-					flag |= REF_BROKEN;
+					flag |= REF_ISBROKEN;
 				}
 			} else
 				if (!resolve_ref(ref, sha1, 1, &flag)) {
 					hashclr(sha1);
-					flag |= REF_BROKEN;
+					flag |= REF_ISBROKEN;
 				}
 			add_ref(ref, sha1, flag, array, NULL);
 		}
@@ -501,7 +500,6 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	ssize_t len;
 	char buffer[256];
 	static char ref_buffer[256];
-	char path[PATH_MAX];
 
 	if (flag)
 		*flag = 0;
@@ -510,6 +508,7 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		return NULL;
 
 	for (;;) {
+		char path[PATH_MAX];
 		struct stat st;
 		char *buf;
 		int fd;
@@ -586,8 +585,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		while (isspace(*buf))
 			buf++;
 		if (check_refname_format(buf, REFNAME_ALLOW_ONELEVEL)) {
-			warning("symbolic reference in %s is formatted incorrectly",
-				path);
+			if (flag)
+				*flag |= REF_ISBROKEN;
 			return NULL;
 		}
 		ref = strcpy(ref_buffer, buf);
@@ -596,7 +595,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	}
 	/* Please note that FETCH_HEAD has a second line containing other data. */
 	if (get_sha1_hex(buffer, sha1) || (buffer[40] != '\0' && !isspace(buffer[40]))) {
-		warning("reference in %s is formatted incorrectly", path);
+		if (flag)
+			*flag |= REF_ISBROKEN;
 		return NULL;
 	}
 	return ref;
@@ -624,8 +624,8 @@ static int do_one_ref(const char *base, each_ref_fn fn, int trim,
 		return 0;
 
 	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
-		if (entry->flag & REF_BROKEN)
-			return 0; /* ignore dangling symref */
+		if (entry->flag & REF_ISBROKEN)
+			return 0; /* ignore dangling symref and corrupt ref */
 		if (!has_sha1_file(entry->sha1)) {
 			error("%s does not point to a valid object!", entry->name);
 			return 0;
diff --git a/refs.h b/refs.h
index 0229c57..7442b29 100644
--- a/refs.h
+++ b/refs.h
@@ -10,8 +10,9 @@ struct ref_lock {
 	int force_write;
 };
 
-#define REF_ISSYMREF 01
-#define REF_ISPACKED 02
+#define REF_ISSYMREF 0x01
+#define REF_ISPACKED 0x02
+#define REF_ISBROKEN 0x04
 
 /*
  * Calls the specified function for each ref file until it returns nonzero,
diff --git a/sha1_name.c b/sha1_name.c
index ba976b4..1fe37c6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -282,8 +282,11 @@ int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 				*ref = xstrdup(r);
 			if (!warn_ambiguous_refs)
 				break;
-		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD"))
+		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD")) {
 			warning("ignoring dangling symref %s.", fullref);
+		} else if ((flag & REF_ISBROKEN) && strchr(fullref, '/')) {
+			warning("ignoring broken ref %s.", fullref);
+		}
 	}
 	free(last_branch);
 	return refs_found;

^ permalink raw reply related

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
From: Jeff King @ 2011-10-19 19:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List
In-Reply-To: <20111019190114.GA4670@sigill.intra.peff.net>

On Wed, Oct 19, 2011 at 03:01:14PM -0400, Jeff King wrote:

> Too bad it won't work in an append only way.  The internal state of sha1
> after a certain set of bytes is deterministic, so you could do something
> like:

OK, here's a patch which does that. It's way faster:

  $ git init
  $ time for i in `seq 1 10`; do
      echo $i >file
      git add file
      git commit -q -m $i
    done
  real    0m1.212s
  user    0m1.132s
  sys     0m0.028s

So that's about .12 seconds per commit. Without my patch, it's about .01
seconds. So you waste a tenth of a second generating the collision. Not
too bad.

And the result:

  $ git log --oneline
  31337a1 10
  313376b 9
  3133782 8
  31337cf 7
  313377a 6
  313374b 5
  31337b1 4
  31337a3 3
  3133703 2
  3133706 1

And nothing shows up in the body, because git truncates at the NUL we
added:

  $ git show
  commit 31337a1093af2d97eb2e6c08b261c2946395fdd3
  Author: Jeff King <peff@peff.net>
  Date:   Wed Oct 19 15:34:00 2011 -0400

      10

  diff --git a/file b/file
  index ec63514..f599e28 100644
  --- a/file
  +++ b/file
  @@ -1 +1 @@
  -9
  +10

It also parameterizes the desired sha1, so you could easily find hashes
ending in 31337, or any other pattern. Or add "git commit
--collide=31337".

---
diff --git a/commit.c b/commit.c
index 73b7e00..c478752 100644
--- a/commit.c
+++ b/commit.c
@@ -840,6 +840,57 @@ struct commit_list *reduce_heads(struct commit_list *heads)
 	return result;
 }
 
+static unsigned char elite_want[20] = { 0x31, 0x33, 0x70 };
+static unsigned char elite_mask[20] = { 0xff, 0xff, 0xf0 };
+
+static inline int sha1_match_mask(unsigned char *sha1,
+				  unsigned char *want,
+				  unsigned char *mask)
+{
+	int i;
+	for (i = 0; i < 20; i++)
+		if ((want[i] & mask[i]) != (sha1[i] & mask[i]))
+		    return 0;
+	return 1;
+}
+
+static void collide_commit(struct strbuf *data,
+			   unsigned char want[20],
+			   unsigned char mask[20])
+{
+	static const char terminator[] = { 0 };
+	char header[32];
+	int header_len;
+	unsigned int lulz;
+	SHA_CTX base;
+
+	header_len = snprintf(header, sizeof(header),
+			      "commit %lu",
+			      data->len + 1 + sizeof(lulz)) + 1;
+	SHA1_Init(&base);
+	SHA1_Update(&base, header, header_len);
+	SHA1_Update(&base, data->buf, data->len);
+	SHA1_Update(&base, terminator, sizeof(terminator));
+
+	lulz = 0;
+	do {
+		SHA_CTX guess;
+		unsigned char sha1[20];
+
+		memcpy(&guess, &base, sizeof(guess));
+		SHA1_Update(&guess, &lulz, sizeof(lulz));
+		SHA1_Final(sha1, &guess);
+
+		if (sha1_match_mask(sha1, want, mask)) {
+			strbuf_add(data, terminator, sizeof(terminator));
+			strbuf_add(data, &lulz, sizeof(lulz));
+			return;
+		}
+
+		lulz++;
+	} while (1);
+}
+
 static const char commit_utf8_warn[] =
 "Warning: commit message does not conform to UTF-8.\n"
 "You may want to amend it after fixing the message, or set the config\n"
@@ -890,6 +941,8 @@ int commit_tree(const char *msg, unsigned char *tree,
 	if (encoding_is_utf8 && !is_utf8(buffer.buf))
 		fprintf(stderr, commit_utf8_warn);
 
+	collide_commit(&buffer, elite_want, elite_mask);
+
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
 	return result;

^ permalink raw reply related

* [PATCH] resolve_ref(): report breakage to the caller without warning
From: Junio C Hamano @ 2011-10-19 19:39 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jeff King, cmn, A Large Angry SCM,
	Daniel Barkalow, Sverre Rabbelier
In-Reply-To: <7vaa8wdbld.fsf@alter.siamese.dyndns.org>

629cd3a (resolve_ref(): emit warnings for improperly-formatted references,
2011-09-15) made resolve_ref() warn against files that are found in the
directories the ref dwimmery looks at. The intent may be good, but these
messages come from a wrong level of the API hierarchy.

Instead record the breakage in "flags" whose purpose is to explain the
result of the function to the caller, who is in a much better position to
make intelligent decision based on the information.

This updates sha1_name.c::dwim_ref() to warn against such a broken
candidate only when it does not appear directly below $GIT_DIR to restore
the traditional behaviour, as we know many files directly underneath
$GIT_DIR/ are are not refs.

Warning against "git show config --" with "$GIT_DIR/config does not look
like a well-formed ref" does not make sense, and we may later tweak the
dwimmery not to even consider them as candidates, but that is a longer
term topic.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * With a log message and the same patch text.

 refs.c      |   22 +++++++++++-----------
 refs.h      |    5 +++--
 sha1_name.c |    5 ++++-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/refs.c b/refs.c
index cab4394..0f58e46 100644
--- a/refs.c
+++ b/refs.c
@@ -4,9 +4,8 @@
 #include "tag.h"
 #include "dir.h"
 
-/* ISSYMREF=01 and ISPACKED=02 are public interfaces */
-#define REF_KNOWS_PEELED 04
-#define REF_BROKEN 010
+/* ISSYMREF=0x01, ISPACKED=0x02 and ISBROKEN=0x04 are public interfaces */
+#define REF_KNOWS_PEELED 0x10
 
 struct ref_entry {
 	unsigned char flag; /* ISSYMREF? ISPACKED? */
@@ -329,12 +328,12 @@ static void get_ref_dir(const char *submodule, const char *base,
 				flag = 0;
 				if (resolve_gitlink_ref(submodule, ref, sha1) < 0) {
 					hashclr(sha1);
-					flag |= REF_BROKEN;
+					flag |= REF_ISBROKEN;
 				}
 			} else
 				if (!resolve_ref(ref, sha1, 1, &flag)) {
 					hashclr(sha1);
-					flag |= REF_BROKEN;
+					flag |= REF_ISBROKEN;
 				}
 			add_ref(ref, sha1, flag, array, NULL);
 		}
@@ -501,7 +500,6 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	ssize_t len;
 	char buffer[256];
 	static char ref_buffer[256];
-	char path[PATH_MAX];
 
 	if (flag)
 		*flag = 0;
@@ -510,6 +508,7 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		return NULL;
 
 	for (;;) {
+		char path[PATH_MAX];
 		struct stat st;
 		char *buf;
 		int fd;
@@ -586,8 +585,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		while (isspace(*buf))
 			buf++;
 		if (check_refname_format(buf, REFNAME_ALLOW_ONELEVEL)) {
-			warning("symbolic reference in %s is formatted incorrectly",
-				path);
+			if (flag)
+				*flag |= REF_ISBROKEN;
 			return NULL;
 		}
 		ref = strcpy(ref_buffer, buf);
@@ -596,7 +595,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	}
 	/* Please note that FETCH_HEAD has a second line containing other data. */
 	if (get_sha1_hex(buffer, sha1) || (buffer[40] != '\0' && !isspace(buffer[40]))) {
-		warning("reference in %s is formatted incorrectly", path);
+		if (flag)
+			*flag |= REF_ISBROKEN;
 		return NULL;
 	}
 	return ref;
@@ -624,8 +624,8 @@ static int do_one_ref(const char *base, each_ref_fn fn, int trim,
 		return 0;
 
 	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
-		if (entry->flag & REF_BROKEN)
-			return 0; /* ignore dangling symref */
+		if (entry->flag & REF_ISBROKEN)
+			return 0; /* ignore dangling symref and corrupt ref */
 		if (!has_sha1_file(entry->sha1)) {
 			error("%s does not point to a valid object!", entry->name);
 			return 0;
diff --git a/refs.h b/refs.h
index 0229c57..7442b29 100644
--- a/refs.h
+++ b/refs.h
@@ -10,8 +10,9 @@ struct ref_lock {
 	int force_write;
 };
 
-#define REF_ISSYMREF 01
-#define REF_ISPACKED 02
+#define REF_ISSYMREF 0x01
+#define REF_ISPACKED 0x02
+#define REF_ISBROKEN 0x04
 
 /*
  * Calls the specified function for each ref file until it returns nonzero,
diff --git a/sha1_name.c b/sha1_name.c
index ba976b4..1fe37c6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -282,8 +282,11 @@ int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 				*ref = xstrdup(r);
 			if (!warn_ambiguous_refs)
 				break;
-		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD"))
+		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD")) {
 			warning("ignoring dangling symref %s.", fullref);
+		} else if ((flag & REF_ISBROKEN) && strchr(fullref, '/')) {
+			warning("ignoring broken ref %s.", fullref);
+		}
 	}
 	free(last_branch);
 	return refs_found;
-- 
1.7.7.494.g7d7707

^ permalink raw reply related

* Re: [PATCH] git-show-ref: fix escaping in asciidoc source
From: Junio C Hamano @ 2011-10-19 19:43 UTC (permalink / raw)
  To: mhagger; +Cc: git
In-Reply-To: <1319050336-24717-1-git-send-email-mhagger@alum.mit.edu>

Thanks.

^ permalink raw reply

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
From: Michael Haggerty @ 2011-10-19 20:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier
In-Reply-To: <7vaa8wdbld.fsf@alter.siamese.dyndns.org>

On 10/19/2011 09:29 PM, Junio C Hamano wrote:
> In general, the "Hey that file that appears in our dwimmed ref namespace
> does not store correct [0-9a-f]{40} contents" warning message is a good
> thing to have. When we try the dwimmery and disambiguation, we however
> look at the potential refs and warn disambiguity only when two or more
> such files have good contents. E.g. if I do this:
> 
>     $ git rev-parse HEAD >.git/refs/heads/frotz
>     $ echo hello >.git/refs/tags/frotz
>     $ git show frotz
> 
> we have never paid attention to the broken tag and showed the 'frotz'
> branch without complaining. Once tags/frotz gets a real object name,
> however, we start giving ambiguity warnings.
> 
> Perhaps that is what we should be fixing instead.

This all sounds very sane.  dwim_ref() currently casts about wildly
looking for possible references, so it makes sense that it be selective
about what warnings it emits.  I also agree with the principle that this
warning is better emitted from higher-level code.

> Perhaps resolve_ref() should return in its *flag parameter that "a file
> exists there but incorrectly formatted", and dwim_ref() should notice and
> use that information to warn about ambiguity and also illformed-ness.

ISTM that such warnings should also be emitted when (for example) the
following commands encounter corrupt references: "git fsck", "git
pack-refs", "git gc", and "git push".  (Maybe these commands already
emit warnings; I haven't checked.)  These commands are not run so
frequently (so that the warnings would not be so annoying).  They are
also presumably not so promiscuous about where they look for refs and
therefore won't generate so many false alarms.

But it will be easy to add warnings to these commands using the
REF_ISBROKEN flag that you made public.

> A patch is attached at the end of this message to minimally fix what is in
> 'master' (without the jc/check-ref-format-fixup topic).  [...]

I would have preferred this change being applied on top of your first
patch in jc/check-ref-format-fixup, because moving these functions to
refs.c makes sense.  (Not to mention that my "REFNAME_FULL" patch series
is built on top of jc/check-ref-format-fixup.)

>  refs.c      |   22 +++++++++++-----------
>  refs.h      |    5 +++--
>  sha1_name.c |    5 ++++-
>  3 files changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index cab4394..0f58e46 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -329,12 +328,12 @@ static void get_ref_dir(const char *submodule, const char *base,
>  				flag = 0;
>  				if (resolve_gitlink_ref(submodule, ref, sha1) < 0) {
>  					hashclr(sha1);
> -					flag |= REF_BROKEN;
> +					flag |= REF_ISBROKEN;
>  				}

The renaming of this constant could have been done in a separate patch
to reduce noise like this in the main patch.

Otherwise the patch looks fine to me.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
From: Junio C Hamano @ 2011-10-19 20:39 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier
In-Reply-To: <4E9F33B9.4040803@alum.mit.edu>

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I would have preferred this change being applied on top of your first
> patch in jc/check-ref-format-fixup, because moving these functions to
> refs.c makes sense.

Thanks for a quick sanity check, and I agree.

^ permalink raw reply

* [RFC 00/13] Checking full vs. partial refnames
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

From: Michael Haggerty <mhagger@alum.mit.edu>

There are many places where it is necessary to determine whether a
refname is a complete, valid, top-level reference name in the "refs/"
tree, one of the special refnames like "HEAD" or "FETCH_HEAD", or
whether it is potentially a valid fragment of a refname that can be
DWIMed into a true reference.  Until now such checks have been
incomplete and/or scattered around.

The first three patches in this series beef up check_refname_format()
(and adds another static function, parse_refname_prefix()) with the
ability to make such checks when the REFNAME_FULL flag is used.

The fourth patch removes the checking of refnames passed to
add_extra_ref(), allowing the function to tolerate oddities like
"refs/tags/3.1.1.1^{}".

The rest of the patches consist of wild-assed guesses about some
callers of check_refname_format() that I suppose can use the
REFNAME_FULL option, thereby tightening up what they accept.  About
all I can say is that the test suite passes with these patches
applied.  But recent experience indicates that the test suite has a
lot of holes.  Therefore, it would be great if experts would look over
these suggestions.

There are many other callers of check_refname_format() that I haven't
touched, because I'm not even brave enough to make wild-assed guesses
about them.  (Since I left them without the REFNAME_FULL option, they
rather allow too many references through than too few.)  It would be
great if a more experienced developer would look at the other callers
and decide which need to be changed.

BTW, this patch series does *not* fix the specific problem that Junio
mentioned (that "git update-ref tmp/junk HEAD" does not reject the
bogus refname), nor probably many others.  The gruelling work is not
this patch series; it is the effort of tracking down all of the code
paths that might try to pass bogus refnames to the refs API.

This patch series applies on top of jc/check-ref-format-fixup, and
also rebases smoothly to the current "next".

Michael Haggerty (13):
  check_refname_component(): iterate via index rather than via pointer
  parse_refname_prefix(): new function
  Teach check_refname_format() to check full refnames
  add_ref(): move the call of check_refname_format() to callers
  receive-pack::update(): use check_refname_format(..., REFNAME_FULL)
  strbuf_check_branch_ref(): use check_refname_format(...,
    REFNAME_FULL)
  one_local_ref(): use check_refname_format(..., REFNAME_FULL)
  expand_namespace(): the refname is full, so use REFNAME_FULL option
  new_branch(): verify that new branch name is a valid full refname
  strbuf_check_tag_ref(): the refname is full, so use REFNAME_FULL
    option
  replace_object(): the refname is full, so use REFNAME_FULL option
  resolve_ref: use check_refname_format(..., REFNAME_FULL)
  filter_refs(): the refname is full, so use REFNAME_FULL option

 Documentation/git-check-ref-format.txt |   14 ++++
 builtin/check-ref-format.c             |    4 +
 builtin/fetch-pack.c                   |    2 +-
 builtin/receive-pack.c                 |    2 +-
 builtin/replace.c                      |    2 +-
 builtin/tag.c                          |    2 +-
 environment.c                          |    2 +-
 fast-import.c                          |    2 +-
 refs.c                                 |  124 ++++++++++++++++++++------------
 refs.h                                 |   12 ++-
 remote.c                               |    2 +-
 sha1_name.c                            |    2 +-
 t/t1402-check-ref-format.sh            |   31 ++++++++
 13 files changed, 143 insertions(+), 58 deletions(-)

-- 
1.7.7

^ permalink raw reply

* [RFC 05/13] receive-pack::update(): use check_refname_format(..., REFNAME_FULL)
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

Replace local code with a use of check_refname_format()'s new
REFNAME_FULL option.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/receive-pack.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 261b610..508451b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -401,7 +401,7 @@ static const char *update(struct command *cmd)
 	struct ref_lock *lock;
 
 	/* only refs/... are allowed */
-	if (prefixcmp(name, "refs/") || check_refname_format(name + 5, 0)) {
+	if (check_refname_format(name, REFNAME_FULL)) {
 		rp_error("refusing to create funny ref '%s' remotely", name);
 		return "funny refname";
 	}
-- 
1.7.7

^ permalink raw reply related

* [RFC 04/13] add_ref(): move the call of check_refname_format() to callers
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

Do not call check_refname_format() in add_ref(); instead change its
callers to do the check.  (In fact, don't do any checking in
add_extra_ref(), because that function handles bizarre things like
"refs/tags/3.1.1.1^{}".)

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

I'm still not clear on how extra_refs are used.  Are they generated
from local refs or are they generated from remote refs?  If the
latter, then it is probably irresponsible not to do *some* sanity
checking in add_extra_ref() to prevent any chance of refnames like
"../../../etc/passwd".

 refs.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index 8299e51..a40dfa5 100644
--- a/refs.c
+++ b/refs.c
@@ -60,8 +60,6 @@ static void add_ref(const char *name, const unsigned char *sha1,
 	entry = xmalloc(sizeof(struct ref_entry) + len);
 	hashcpy(entry->sha1, sha1);
 	hashclr(entry->peeled);
-	if (check_refname_format(name, REFNAME_ALLOW_ONELEVEL|REFNAME_DOT_COMPONENT))
-		die("Reference has invalid format: '%s'", name);
 	memcpy(entry->name, name, len);
 	entry->flag = flag;
 	if (new_entry)
@@ -232,6 +230,8 @@ static void read_packed_refs(FILE *f, struct ref_array *array)
 
 		name = parse_ref_line(refline, sha1);
 		if (name) {
+			if (check_refname_format(name, REFNAME_FULL))
+				die("Reference has invalid format: '%s'", name);
 			add_ref(name, sha1, flag, array, &last);
 			continue;
 		}
@@ -336,6 +336,8 @@ static void get_ref_dir(const char *submodule, const char *base,
 					hashclr(sha1);
 					flag |= REF_BROKEN;
 				}
+			if (check_refname_format(ref, REFNAME_FULL))
+				die("Reference has invalid format: '%s'", ref);
 			add_ref(ref, sha1, flag, array, NULL);
 		}
 		free(ref);
-- 
1.7.7

^ permalink raw reply related

* [RFC 02/13] parse_refname_prefix(): new function
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

Add a function parse_refname_prefix() that can read a possible refname
from the front of a string.  This is like check_refname_format(),
except:

* It accepts (string, len) parameters and can therefore handle
  non-NUL-terminated strings.

* It returns the length of the part of the string that was parsed,
  allowing it to be used for refnames that are followed by additional
  characters.

Re-implement check_refname_format() using the new function.

Rename function check_refname_component() to parse_refname_component()
for consistency.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   64 ++++++++++++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/refs.c b/refs.c
index 3b41252..387da83 100644
--- a/refs.c
+++ b/refs.c
@@ -907,24 +907,30 @@ static inline int bad_ref_char(int ch)
 
 /*
  * Try to read one refname component from the front of ref.  Return
- * the length of the component found, or -1 if the component is not
- * legal.
+ * the length of the component found, or -1 if the start of the string
+ * cannot be interpreted as a component of a legal refname.
  */
-static int check_refname_component(const char *ref, int flags)
+static int parse_refname_component(const char *ref, int len, int flags)
 {
-	int i, len = strlen(ref);
+	int i;
 	char last = '\0';
 
 	for (i = 0; i < len; i++) {
 		char ch = ref[i];
-		if (ch == '\0' || ch == '/')
-			break;
+		if (ch == '/')
+			break; /* Component terminated by "..". */
 		if (bad_ref_char(ch))
-			return -1; /* Illegal character in refname. */
-		if (last == '.' && ch == '.')
-			return -1; /* Refname contains "..". */
-		if (last == '@' && ch == '{')
-			return -1; /* Refname contains "@{". */
+			break; /* Component terminated by illegal character. */
+		if (last == '.' && ch == '.') {
+			/* Component terminated by "..". */
+			i--;
+			break;
+		}
+		if (last == '@' && ch == '{') {
+			/* Component terminated by "@{". */
+			i--;
+			break;
+		}
 		last = ch;
 	}
 	if (i == 0)
@@ -944,17 +950,26 @@ static int check_refname_component(const char *ref, int flags)
 	return i;
 }
 
-int check_refname_format(const char *ref, int flags)
+/*
+ * Try to interpret the beginning of a string as a refname.  Return
+ * the length of the part of the string that could constitute a valid
+ * refname, or -1 if the start of the string cannot possibly be
+ * interpreted as a refname.  flags has the same interpretation as for
+ * check_refname_format().
+ */
+static int parse_refname_prefix(const char *ref, int len, int flags)
 {
 	int component_len, component_count = 0;
+	const char *p = ref;
+	int valid_len;
 
 	while (1) {
-		/* We are at the start of a path component. */
-		component_len = check_refname_component(ref, flags);
+		/* p is at the start of a path component. */
+		component_len = parse_refname_component(p, len, flags);
 		if (component_len < 0) {
 			if ((flags & REFNAME_REFSPEC_PATTERN) &&
-					ref[0] == '*' &&
-					(ref[1] == '\0' || ref[1] == '/')) {
+					len && p[0] == '*' &&
+					(len == 1 || p[1] == '/')) {
 				/* Accept one wildcard as a full refname component. */
 				flags &= ~REFNAME_REFSPEC_PATTERN;
 				component_len = 1;
@@ -963,17 +978,26 @@ int check_refname_format(const char *ref, int flags)
 			}
 		}
 		component_count++;
-		if (ref[component_len] == '\0')
+		if (component_len == len || p[component_len] != '/')
 			break;
 		/* Skip to next component. */
-		ref += component_len + 1;
+		p += component_len + 1;
+		len -= component_len + 1;
 	}
 
-	if (ref[component_len - 1] == '.')
+	valid_len = p + component_len - ref;
+
+	if (ref[valid_len - 1] == '.')
 		return -1; /* Refname ends with '.'. */
 	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
 		return -1; /* Refname has only one component. */
-	return 0;
+	return valid_len;
+}
+
+int check_refname_format(const char *ref, int flags)
+{
+	int len = strlen(ref);
+	return parse_refname_prefix(ref, len, flags) == len ? 0 : -1;
 }
 
 const char *prettify_refname(const char *name)
-- 
1.7.7

^ permalink raw reply related

* [RFC 06/13] strbuf_check_branch_ref(): use check_refname_format(..., REFNAME_FULL)
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

The reference name should be a full one, so adjust the check
accordingly.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 sha1_name.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 03ffc2c..335da37 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -883,7 +883,7 @@ int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 	if (name[0] == '-')
 		return -1;
 	strbuf_splice(sb, 0, 0, "refs/heads/", 11);
-	return check_refname_format(sb->buf, 0);
+	return check_refname_format(sb->buf, REFNAME_FULL);
 }
 
 /*
-- 
1.7.7

^ permalink raw reply related

* [RFC 09/13] new_branch(): verify that new branch name is a valid full refname
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

Is it possible to omit the REFNAME_ALLOW_ONELEVEL option from this
call?

 fast-import.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 8d8ea3c..51cf898 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -722,7 +722,7 @@ static struct branch *new_branch(const char *name)
 
 	if (b)
 		die("Invalid attempt to create duplicate branch: %s", name);
-	if (check_refname_format(name, REFNAME_ALLOW_ONELEVEL))
+	if (check_refname_format(name, REFNAME_FULL|REFNAME_ALLOW_ONELEVEL))
 		die("Branch name doesn't conform to GIT standards: %s", name);
 
 	b = pool_calloc(1, sizeof(struct branch));
-- 
1.7.7

^ permalink raw reply related

* [RFC 07/13] one_local_ref(): use check_refname_format(..., REFNAME_FULL)
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

Instead of manually skipping over "refs/", let check_refname_format()
handle the whole refname.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 remote.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/remote.c b/remote.c
index e52aa9b..30ee6d8 100644
--- a/remote.c
+++ b/remote.c
@@ -1595,7 +1595,7 @@ static int one_local_ref(const char *refname, const unsigned char *sha1, int fla
 	int len;
 
 	/* we already know it starts with refs/ to get here */
-	if (check_refname_format(refname + 5, 0))
+	if (check_refname_format(refname, REFNAME_FULL))
 		return 0;
 
 	len = strlen(refname) + 1;
-- 
1.7.7

^ permalink raw reply related

* [RFC 08/13] expand_namespace(): the refname is full, so use REFNAME_FULL option
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 environment.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/environment.c b/environment.c
index 0bee6a7..a6d6f04 100644
--- a/environment.c
+++ b/environment.c
@@ -107,7 +107,7 @@ static char *expand_namespace(const char *raw_namespace)
 		if (strcmp((*c)->buf, "/") != 0)
 			strbuf_addf(&buf, "refs/namespaces/%s", (*c)->buf);
 	strbuf_list_free(components);
-	if (check_refname_format(buf.buf, 0))
+	if (check_refname_format(buf.buf, REFNAME_FULL))
 		die("bad git namespace path \"%s\"", raw_namespace);
 	strbuf_addch(&buf, '/');
 	return strbuf_detach(&buf, NULL);
-- 
1.7.7

^ permalink raw reply related

* [RFC 03/13] Teach check_refname_format() to check full refnames
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

A full refname has to start with "refs/" or consist entirely of capital
letters and underscores.  Add an option REFNAME_FULL that causes
check_refname_format() and parse_refname_prefix() to verify that the
refname is a valid full refname.

Add a "--full" option to "git check-ref-format" that can be used to
access the new option.  Use this to add some tests involving "git
check-ref-format --full".

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Documentation/git-check-ref-format.txt |   14 +++++++++++
 builtin/check-ref-format.c             |    4 +++
 refs.c                                 |   39 ++++++++++++++++++-------------
 refs.h                                 |   12 ++++++---
 t/t1402-check-ref-format.sh            |   31 +++++++++++++++++++++++++
 5 files changed, 80 insertions(+), 20 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index 103e7b1..9a8aafb 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -27,6 +27,11 @@ if refs are packed by `git gc`).
 
 git imposes the following rules on how references are named:
 
+. They must either start with `refs/` (e.g., `refs/heads/master`) or
+  consist entirely of capital letters and underscores (e.g., `HEAD` or
+  `MERGE_HEAD`).  This rule is enforced if the `--full` option is
+  used.
+
 . They can include slash `/` for hierarchical (directory)
   grouping, but no slash-separated component can begin with a
   dot `.` or end with the sequence `.lock`.
@@ -83,6 +88,15 @@ typed the branch name.
 
 OPTIONS
 -------
+--full::
+--no-full::
+	Controls whether the refname must represent a full refname
+	(i.e., starting with `refs/`).  If `--full` and
+	`--allow-onelevel` are both specified, then special top-level
+	refnames consisting of capital letters and underscored (like
+	`HEAD` or `MERGE_HEAD`) are also allowed.  The default is
+	`--no-full`.
+
 --allow-onelevel::
 --no-allow-onelevel::
 	Controls whether one-level refnames are accepted (i.e.,
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 28a7320..a73626d 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -64,6 +64,10 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 	for (i = 1; i < argc && argv[i][0] == '-'; i++) {
 		if (!strcmp(argv[i], "--normalize") || !strcmp(argv[i], "--print"))
 			normalize = 1;
+		else if (!strcmp(argv[i], "--full"))
+			flags |= REFNAME_FULL;
+		else if (!strcmp(argv[i], "--no-full"))
+			flags &= ~REFNAME_FULL;
 		else if (!strcmp(argv[i], "--allow-onelevel"))
 			flags |= REFNAME_ALLOW_ONELEVEL;
 		else if (!strcmp(argv[i], "--no-allow-onelevel"))
diff --git a/refs.c b/refs.c
index 387da83..8299e51 100644
--- a/refs.c
+++ b/refs.c
@@ -980,6 +980,11 @@ static int parse_refname_prefix(const char *ref, int len, int flags)
 		component_count++;
 		if (component_len == len || p[component_len] != '/')
 			break;
+		if (component_count == 1 && (flags & REFNAME_FULL)) {
+			/* That was the first of multiple components */
+			if (component_len != 4 || strncmp(p, "refs", 4))
+				return -1;
+		}
 		/* Skip to next component. */
 		p += component_len + 1;
 		len -= component_len + 1;
@@ -989,8 +994,22 @@ static int parse_refname_prefix(const char *ref, int len, int flags)
 
 	if (ref[valid_len - 1] == '.')
 		return -1; /* Refname ends with '.'. */
-	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
-		return -1; /* Refname has only one component. */
+
+	if (component_count == 1) {
+		if (!(flags & REFNAME_ALLOW_ONELEVEL))
+			return -1;
+
+		if (flags & REFNAME_FULL) {
+			/* Make sure that the refname is ALL_CAPS. */
+			int i;
+			for (i = 0; i < component_len; i++) {
+				char ch = p[i];
+				if (ch != '_' && (ch < 'A' || 'Z' < ch))
+					return -1;
+			}
+		}
+	}
+
 	return valid_len;
 }
 
@@ -1028,20 +1047,8 @@ const char *ref_fetch_rules[] = {
 
 static int refname_ok_at_root_level(const char *str, int len)
 {
-	if (len >= 5 && !memcmp(str, "refs/", 5))
-		return 1;
-
-	while (len--) {
-		char ch = *str++;
-
-		/*
-		 * Only accept likes of .git/HEAD, .git/MERGE_HEAD at
-		 * the root level as a ref.
-		 */
-		if (ch != '_' && (ch < 'A' || 'Z' < ch))
-			return 0;
-	}
-	return 1;
+	return parse_refname_prefix(str, len,
+				    REFNAME_ALLOW_ONELEVEL|REFNAME_FULL) == len;
 }
 
 int refname_match(const char *abbrev_name, const char *full_name, const char **rules)
diff --git a/refs.h b/refs.h
index 0229c57..7707cda 100644
--- a/refs.h
+++ b/refs.h
@@ -97,9 +97,10 @@ int for_each_recent_reflog_ent(const char *ref, each_reflog_ent_fn fn, long, voi
  */
 extern int for_each_reflog(each_ref_fn, void *);
 
-#define REFNAME_ALLOW_ONELEVEL 1
-#define REFNAME_REFSPEC_PATTERN 2
-#define REFNAME_DOT_COMPONENT 4
+#define REFNAME_ALLOW_ONELEVEL 01
+#define REFNAME_REFSPEC_PATTERN 02
+#define REFNAME_DOT_COMPONENT 04
+#define REFNAME_FULL 010
 
 /*
  * Return 0 iff ref has the correct format for a refname according to
@@ -110,7 +111,10 @@ extern int for_each_reflog(each_ref_fn, void *);
  * components.  No leading or repeated slashes are accepted.  If
  * REFNAME_DOT_COMPONENT is set in flags, then allow refname
  * components to start with "." (but not a whole component equal to
- * "." or "..").
+ * "." or "..").  If REFNAME_FULL is set in flags, then additionally
+ * verify that the refname is a valid full refname--that it either
+ * starts with "refs/" or that it consists of only capital letters and
+ * underscores (like "HEAD" or "MERGE_HEAD").
  */
 extern int check_refname_format(const char *ref, int flags);
 
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 1ae4d87..2f8a48e 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -70,6 +70,7 @@ invalid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 invalid_ref "$ref" --normalize
 valid_ref "$ref" '--allow-onelevel --normalize'
+invalid_ref "$ref" '--full'
 
 ref='foo/bar'
 valid_ref "$ref"
@@ -77,12 +78,19 @@ valid_ref "$ref" --allow-onelevel
 valid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 valid_ref "$ref" --normalize
+invalid_ref "$ref" '--full'
 
 ref='foo/*'
 invalid_ref "$ref"
 invalid_ref "$ref" --allow-onelevel
 valid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+invalid_ref "$ref" '--full'
+invalid_ref "$ref" '--refspec-pattern --full'
+
+ref='refs/*'
+invalid_ref "$ref" '--full'
+valid_ref "$ref" '--refspec-pattern --full'
 
 ref='*/foo'
 invalid_ref "$ref"
@@ -91,18 +99,23 @@ valid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 invalid_ref "$ref" --normalize
 valid_ref "$ref" '--refspec-pattern --normalize'
+invalid_ref "$ref" '--full'
+invalid_ref "$ref" '--refspec-pattern --full'
 
 ref='foo/*/bar'
 invalid_ref "$ref"
 invalid_ref "$ref" --allow-onelevel
 valid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+invalid_ref "$ref" '--full'
 
 ref='*'
 invalid_ref "$ref"
 invalid_ref "$ref" --allow-onelevel
 invalid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+invalid_ref "$ref" '--full'
+invalid_ref "$ref" '--refspec-pattern --full'
 
 ref='foo/*/*'
 invalid_ref "$ref" --refspec-pattern
@@ -126,6 +139,24 @@ valid_ref NOT_MINGW "$ref" '--allow-onelevel --normalize'
 invalid_ref NOT_MINGW "$ref" '--refspec-pattern --normalize'
 valid_ref NOT_MINGW "$ref" '--refspec-pattern --allow-onelevel --normalize'
 
+ref='HEAD'
+invalid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+invalid_ref "$ref" --full
+valid_ref "$ref" '--allow-onelevel --full'
+
+valid_ref FETCH_HEAD '--allow-onelevel --full'
+valid_ref refs/foo --full
+invalid_ref HEAD/ '--allow-onelevel --full'
+valid_ref HEAD/ '--allow-onelevel --full --normalize'
+invalid_ref refs '--full'
+invalid_ref refs '--allow-onelevel --full'
+invalid_ref refs/ '--allow-onelevel --full'
+invalid_ref HEAD/foo '--full'
+invalid_ref head '--allow-onelevel --full'
+valid_ref 'refs/foo/*' '--refspec-pattern --full'
+valid_ref 'refs/*/foo' '--refspec-pattern --full'
+
 test_expect_success "check-ref-format --branch @{-1}" '
 	T=$(git write-tree) &&
 	sha1=$(echo A | git commit-tree $T) &&
-- 
1.7.7

^ permalink raw reply related

* [RFC 01/13] check_refname_component(): iterate via index rather than via pointer
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

This will make later changes easier.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index 0d74d70..3b41252 100644
--- a/refs.c
+++ b/refs.c
@@ -912,11 +912,11 @@ static inline int bad_ref_char(int ch)
  */
 static int check_refname_component(const char *ref, int flags)
 {
-	const char *cp;
+	int i, len = strlen(ref);
 	char last = '\0';
 
-	for (cp = ref; ; cp++) {
-		char ch = *cp;
+	for (i = 0; i < len; i++) {
+		char ch = ref[i];
 		if (ch == '\0' || ch == '/')
 			break;
 		if (bad_ref_char(ch))
@@ -927,7 +927,7 @@ static int check_refname_component(const char *ref, int flags)
 			return -1; /* Refname contains "@{". */
 		last = ch;
 	}
-	if (cp == ref)
+	if (i == 0)
 		return -1; /* Component has zero length. */
 	if (ref[0] == '.') {
 		if (!(flags & REFNAME_DOT_COMPONENT))
@@ -936,12 +936,12 @@ static int check_refname_component(const char *ref, int flags)
 		 * Even if leading dots are allowed, don't allow "."
 		 * as a component (".." is prevented by a rule above).
 		 */
-		if (ref[1] == '\0')
+		if (i == 1)
 			return -1; /* Component equals ".". */
 	}
-	if (cp - ref >= 5 && !memcmp(cp - 5, ".lock", 5))
+	if (i >= 5 && !memcmp(&ref[i - 5], ".lock", 5))
 		return -1; /* Refname ends with ".lock". */
-	return cp - ref;
+	return i;
 }
 
 int check_refname_format(const char *ref, int flags)
-- 
1.7.7

^ permalink raw reply related

* [RFC 10/13] strbuf_check_tag_ref(): the refname is full, so use REFNAME_FULL option
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/tag.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 9b6fd95..abf3cb0 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -412,7 +412,7 @@ static int strbuf_check_tag_ref(struct strbuf *sb, const char *name)
 	strbuf_reset(sb);
 	strbuf_addf(sb, "refs/tags/%s", name);
 
-	return check_refname_format(sb->buf, 0);
+	return check_refname_format(sb->buf, REFNAME_FULL);
 }
 
 int cmd_tag(int argc, const char **argv, const char *prefix)
-- 
1.7.7

^ permalink raw reply related

* [RFC 11/13] replace_object(): the refname is full, so use REFNAME_FULL option
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/replace.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index 517fa10..f5d4f77 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -94,7 +94,7 @@ static int replace_object(const char *object_ref, const char *replace_ref,
 		     "refs/replace/%s",
 		     sha1_to_hex(object)) > sizeof(ref) - 1)
 		die("replace ref name too long: %.*s...", 50, ref);
-	if (check_refname_format(ref, 0))
+	if (check_refname_format(ref, REFNAME_FULL))
 		die("'%s' is not a valid ref name.", ref);
 
 	if (!resolve_ref(ref, prev, 1, NULL))
-- 
1.7.7

^ permalink raw reply related

* [RFC 13/13] filter_refs(): the refname is full, so use REFNAME_FULL option
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/fetch-pack.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c6bc8eb..d72aa44 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -546,7 +546,7 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
 	for (ref = *refs; ref; ref = next) {
 		next = ref->next;
 		if (!memcmp(ref->name, "refs/", 5) &&
-		    check_refname_format(ref->name + 5, 0))
+		    check_refname_format(ref->name, REFNAME_FULL))
 			; /* trash */
 		else if (args.fetch_all &&
 			 (!args.depth || prefixcmp(ref->name, "refs/tags/") )) {
-- 
1.7.7

^ permalink raw reply related

* [RFC 12/13] resolve_ref: use check_refname_format(..., REFNAME_FULL)
From: mhagger @ 2011-10-19 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty
In-Reply-To: <1319057716-28094-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

...instead of own inline code.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index a40dfa5..67df3a6 100644
--- a/refs.c
+++ b/refs.c
@@ -548,8 +548,7 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 			if (len < 0)
 				return NULL;
 			buffer[len] = 0;
-			if (!prefixcmp(buffer, "refs/") &&
-					!check_refname_format(buffer, 0)) {
+			if (!check_refname_format(buffer, REFNAME_FULL)) {
 				strcpy(ref_buffer, buffer);
 				ref = ref_buffer;
 				if (flag)
-- 
1.7.7

^ permalink raw reply related

* [PATCH v2 0/6] Sequencer fixups mini-series
From: Ramkumar Ramachandra @ 2011-10-19 21:03 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Jonathan Nieder, Christian Couder

Hi,

This is the second iteration of series I first posted in
$gmane/183157.  The improvements can be summarized as follows:
1. Junio wrote some extra code to squash into the fourth part.  It
took me some time to resolve conflicts correctly while rebasing.
2. Jonathan and Tay (off-list) have helped improve all the commit
messages.

Thanks.

Jonathan Nieder (1):
  revert: simplify communicating command-line arguments

Ramkumar Ramachandra (5):
  revert: free msg in format_todo()
  revert: simplify getting commit subject in format_todo()
  revert: fix buffer overflow in insn sheet parser
  revert: make commit subjects in insn sheet optional
  revert: allow mixed pick and revert instructions

 builtin/revert.c                |  219 +++++++++++++++++++--------------------
 sequencer.h                     |    8 ++
 t/t3510-cherry-pick-sequence.sh |   86 +++++++++++++++
 3 files changed, 202 insertions(+), 111 deletions(-)

-- 
1.7.6.351.gb35ac.dirty

^ permalink raw reply

* [PATCH 1/6] revert: free msg in format_todo()
From: Ramkumar Ramachandra @ 2011-10-19 21:03 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Jonathan Nieder, Christian Couder
In-Reply-To: <1319058208-17923-1-git-send-email-artagnon@gmail.com>

Memory allocated to the fields of msg by get_message() isn't freed.
This is potentially a big leak, because fresh memory is allocated to
store the commit message for each commit.  Fix this using
free_message().

Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/revert.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/builtin/revert.c b/builtin/revert.c
index 010508d..94b7c50 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -680,6 +680,7 @@ static int format_todo(struct strbuf *buf, struct commit_list *todo_list,
 		if (get_message(cur->item, &msg))
 			return error(_("Cannot get commit message for %s"), sha1_abbrev);
 		strbuf_addf(buf, "%s %s %s\n", action_str, sha1_abbrev, msg.subject);
+		free_message(&msg);
 	}
 	return 0;
 }
-- 
1.7.6.351.gb35ac.dirty

^ permalink raw reply related

* [PATCH 2/6] revert: simplify getting commit subject in format_todo()
From: Ramkumar Ramachandra @ 2011-10-19 21:03 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Jonathan Nieder, Christian Couder
In-Reply-To: <1319058208-17923-1-git-send-email-artagnon@gmail.com>

format_todo() calls get_message(), but uses only the subject line of
the commit message.  Save work and unnecessary memory allocations by
using find_commit_subject() instead.

Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/revert.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/revert.c b/builtin/revert.c
index 94b7c50..acb357d 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -671,16 +671,16 @@ static int format_todo(struct strbuf *buf, struct commit_list *todo_list,
 		struct replay_opts *opts)
 {
 	struct commit_list *cur = NULL;
-	struct commit_message msg = { NULL, NULL, NULL, NULL, NULL };
 	const char *sha1_abbrev = NULL;
 	const char *action_str = opts->action == REVERT ? "revert" : "pick";
+	const char *subject;
+	int subject_len;
 
 	for (cur = todo_list; cur; cur = cur->next) {
 		sha1_abbrev = find_unique_abbrev(cur->item->object.sha1, DEFAULT_ABBREV);
-		if (get_message(cur->item, &msg))
-			return error(_("Cannot get commit message for %s"), sha1_abbrev);
-		strbuf_addf(buf, "%s %s %s\n", action_str, sha1_abbrev, msg.subject);
-		free_message(&msg);
+		subject_len = find_commit_subject(cur->item->buffer, &subject);
+		strbuf_addf(buf, "%s %s %.*s\n", action_str, sha1_abbrev,
+			subject_len, subject);
 	}
 	return 0;
 }
-- 
1.7.6.351.gb35ac.dirty

^ permalink raw reply related

* [PATCH 3/6] revert: fix buffer overflow in insn sheet parser
From: Ramkumar Ramachandra @ 2011-10-19 21:03 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Jonathan Nieder, Christian Couder
In-Reply-To: <1319058208-17923-1-git-send-email-artagnon@gmail.com>

Check that the commit name argument to a "pick" or "revert" action in
'.git/sequencer/todo' is not too long, to avoid overflowing an
on-stack buffer.  This fixes a regression introduced by 5a5d80f4
(revert: Introduce --continue to continue the operation, 2011-08-04).

Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/revert.c                |    2 +-
 t/t3510-cherry-pick-sequence.sh |   14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/builtin/revert.c b/builtin/revert.c
index acb357d..474dda1 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -705,7 +705,7 @@ static struct commit *parse_insn_line(char *start, struct replay_opts *opts)
 		return NULL;
 
 	q = strchr(p, ' ');
-	if (!q)
+	if (!q || q - p + 1 > sizeof(sha1_abbrev))
 		return NULL;
 	q++;
 
diff --git a/t/t3510-cherry-pick-sequence.sh b/t/t3510-cherry-pick-sequence.sh
index 3bca2b3..2113308 100755
--- a/t/t3510-cherry-pick-sequence.sh
+++ b/t/t3510-cherry-pick-sequence.sh
@@ -12,6 +12,9 @@ test_description='Test cherry-pick continuation features
 
 . ./test-lib.sh
 
+# Repeat first match 10 times
+_r10='\1\1\1\1\1\1\1\1\1\1'
+
 pristine_detach () {
 	git cherry-pick --reset &&
 	git checkout -f "$1^0" &&
@@ -211,4 +214,15 @@ test_expect_success 'malformed instruction sheet 2' '
 	test_must_fail git cherry-pick --continue
 '
 
+test_expect_success 'malformed instruction sheet 3' '
+	pristine_detach initial &&
+	test_must_fail git cherry-pick base..anotherpick &&
+	echo "resolved" >foo &&
+	git add foo &&
+	git commit &&
+	sed "s/pick \([0-9a-f]*\)/pick $_r10/" .git/sequencer/todo >new_sheet &&
+	cp new_sheet .git/sequencer/todo &&
+	test_must_fail git cherry-pick --continue
+'
+
 test_done
-- 
1.7.6.351.gb35ac.dirty

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox