git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] Improved infrastructure for refname normalization
@ 2011-09-10  6:50 Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 1/7] t1402: add some more tests Michael Haggerty
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  6:50 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, A Large Angry SCM, Michael Haggerty

Patch series re-roll:

The main difference is that I incorporated feedback from Junio that it
should be made more difficult to use unnormalized references.  The
mechanism that I selected is a bit different than discussed in the
mailing list; I changed normalize_refname() to accept unnormalized
refnames *only* if the dst argument is non-NULL.  I changed
check_ref_format() to reject unnormalized refnames but implemented a
transitional function check_ref_format_unsafe() that has the old
behavior and changed callers to use the _unsafe function to avoid
changing their behavior.


As a prerequisite to storing references caches hierarchically (itself
needed for performance reasons), here is a patch series to help us get
refname normalization under control.

The problem is that some UI accepts unnormalized reference names (like
"/foo/bar" or "foo///bar" instead of "foo/bar") and passes them on to
library routines without normalizing them.  The library, on the other
hand, assumes that the refnames are normalized.  Sometimes (mostly in
the case of loose references) unnormalized refnames happen to work,
but in other cases (like packed references or when looking up refnames
in the cache) they silently fail.  Given that refnames are sometimes
treated as path names, there is a chance that some security-relevant
bugs are lurking in this area, if not in git proper then in scripts
that interact with git.

This patch series adds the following tools for dealing with refnames
and their normalization (without actually doing much to fix the
problem; see below):

* Fix check_ref_format() to make it easier and reliable to specify
  which types of refnames are allowed in a particular situation
  (multi-level vs. one-level, with vs. without a refspec-like "*").
  This function only accepts normalized refnames.

* Add a function normalize_refname() that accepts unnormalized
  refnames, checks the format, and outputs a normalized version.

* Add a function check_ref_format_unsafe() that has the old behavior.
  Change callers to use this function until they can be fixed.

* Add options to "git check-ref-format" to give scripts access to
  these facilities (and to allow them to be tested in the test suite).

* Forbid ".lock" at the end of any refname component, as directories
  with such names can conflict with attempts to create lock files for
  other refnames.


I suggest the following policy for handling unnormalized refnames more
consistently:

Unnormalized refnames should only be accepted at the UI level and
should be normalized before use.  This can be done using code like

    int len = strlen(arg);
    char *normalized_refname = xmalloc(len);
    if (normalize_refname(normalized_refname, len, arg, flags))
        die("invalid refname '%s'", arg);

    /* From now on, use normalized_refname. */

Refnames coming from other sources, such as from a remote repository,
should be checked that they are in the correct *normalized* format,
like so:

    if (check_ref_format(refname, flags))
        die("invalid refname '%s'", refname);

Refnames from the local repository (e.g., from the packed references
file) should also be checked that they are in the correct normalized
format, though this policy could be debated if there are performance
concerns.

Refnames probably do not need to be checked at the entrance to the
refs.{c,h} library functions because callers are responsible for not
passing invalid or unnormalized refnames in.  However, some assert()s
would probably be justified, especially during the transition while we
are fixing broken callers.

If there is agreement about this policy, I would be happy to write it
up (presumably in Documentation/technical/api-ref.txt, maybe also
incorporating the content from
Documentation/technical/api-ref-iteration.txt).

I do not yet have enough global overview of the code to know which
callers need fixing, but once these tools are in place the callers can
be fixed incrementally.

Michael Haggerty (7):
  t1402: add some more tests
  Change bad_ref_char() to return a boolean value
  git check-ref-format: add options --allow-onelevel and
    --refspec-pattern
  Change check_ref_format() to take a flags argument
  Add a library function normalize_refname()
  Do not allow ".lock" at the end of any refname component
  Add tools to avoid the use of unnormalized refnames.

 Documentation/git-check-ref-format.txt |   44 ++++++++--
 builtin/check-ref-format.c             |   76 +++++++++---------
 builtin/checkout.c                     |    2 +-
 builtin/fetch-pack.c                   |    2 +-
 builtin/receive-pack.c                 |    2 +-
 builtin/replace.c                      |    2 +-
 builtin/show-ref.c                     |    2 +-
 builtin/tag.c                          |    4 +-
 connect.c                              |    2 +-
 environment.c                          |    2 +-
 fast-import.c                          |    7 +--
 notes-merge.c                          |    5 +-
 pack-refs.c                            |    2 +-
 refs.c                                 |  139 +++++++++++++++++++------------
 refs.h                                 |   46 +++++++++-
 remote.c                               |   54 ++++---------
 sha1_name.c                            |    4 +-
 t/t1402-check-ref-format.sh            |   96 ++++++++++++++++++++--
 transport.c                            |   17 +---
 walker.c                               |    2 +-
 20 files changed, 325 insertions(+), 185 deletions(-)

-- 
1.7.6.8.gd2879

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/7] t1402: add some more tests
  2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
@ 2011-09-10  6:50 ` Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 2/7] Change bad_ref_char() to return a boolean value Michael Haggerty
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  6:50 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, A Large Angry SCM, Michael Haggerty


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 t/t1402-check-ref-format.sh |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index ed4275a..eeb0b1a 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -13,6 +13,8 @@ invalid_ref() {
 		"test_must_fail git check-ref-format '$1'"
 }
 
+invalid_ref ''
+invalid_ref '/'
 valid_ref 'heads/foo'
 invalid_ref 'foo'
 valid_ref 'foo/bar/baz'
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/7] Change bad_ref_char() to return a boolean value
  2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 1/7] t1402: add some more tests Michael Haggerty
@ 2011-09-10  6:50 ` Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 3/7] git check-ref-format: add options --allow-onelevel and --refspec-pattern Michael Haggerty
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  6:50 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, A Large Angry SCM, Michael Haggerty

Previously most bad characters were indicated by returning 1, but "*"
was special-cased to return 2 instead of 1.  One caller examined the
return value to see whether the special case occurred.

But it is easier (to document and understand) for bad_ref_char()
simply to return a boolean value, treating "*" like any other bad
character.  Special-case the handling of "*" (which only occurs in
very specific circumstances) at the caller.  The resulting calling
code thereby also becomes more transparent.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

This is just a random refactoring.

 refs.c |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/refs.c b/refs.c
index a615043..fd29d89 100644
--- a/refs.c
+++ b/refs.c
@@ -860,22 +860,21 @@ int for_each_rawref(each_ref_fn fn, void *cb_data)
  * - it contains a "\" (backslash)
  */
 
+/* Return true iff ch is not allowed in reference names. */
 static inline int bad_ref_char(int ch)
 {
 	if (((unsigned) ch) <= ' ' || ch == 0x7f ||
 	    ch == '~' || ch == '^' || ch == ':' || ch == '\\')
 		return 1;
 	/* 2.13 Pattern Matching Notation */
-	if (ch == '?' || ch == '[') /* Unsupported */
+	if (ch == '*' || ch == '?' || ch == '[') /* Unsupported */
 		return 1;
-	if (ch == '*') /* Supported at the end */
-		return 2;
 	return 0;
 }
 
 int check_ref_format(const char *ref)
 {
-	int ch, level, bad_type, last;
+	int ch, level, last;
 	int ret = CHECK_REF_FORMAT_OK;
 	const char *cp = ref;
 
@@ -890,9 +889,8 @@ int check_ref_format(const char *ref)
 		/* we are at the beginning of the path component */
 		if (ch == '.')
 			return CHECK_REF_FORMAT_ERROR;
-		bad_type = bad_ref_char(ch);
-		if (bad_type) {
-			if (bad_type == 2 && (!*cp || *cp == '/') &&
+		if (bad_ref_char(ch)) {
+			if (ch == '*' && (!*cp || *cp == '/') &&
 			    ret == CHECK_REF_FORMAT_OK)
 				ret = CHECK_REF_FORMAT_WILDCARD;
 			else
@@ -902,8 +900,7 @@ int check_ref_format(const char *ref)
 		last = ch;
 		/* scan the rest of the path component */
 		while ((ch = *cp++) != 0) {
-			bad_type = bad_ref_char(ch);
-			if (bad_type)
+			if (bad_ref_char(ch))
 				return CHECK_REF_FORMAT_ERROR;
 			if (ch == '/')
 				break;
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/7] git check-ref-format: add options --allow-onelevel and --refspec-pattern
  2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 1/7] t1402: add some more tests Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 2/7] Change bad_ref_char() to return a boolean value Michael Haggerty
@ 2011-09-10  6:50 ` Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 4/7] Change check_ref_format() to take a flags argument Michael Haggerty
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  6:50 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, A Large Angry SCM, Michael Haggerty

Also add tests of the new options.  (Actually, one big reason to add
the new options is to make it easy to test check_ref_format(), though
the options should also be useful to other scripts.)

Interpret the result of check_ref_format() based on which types of
refnames are allowed.  However, because check_ref_format() can only
return a single value, one test case is still broken.  Specifically,
the case "git check-ref-format --onelevel '*'" incorrectly succeeds
because check_ref_format() returns CHECK_REF_FORMAT_ONELEVEL for this
refname even though the refname is also CHECK_REF_FORMAT_WILDCARD.
The type of check that leads to this failure is used elsewhere in
"real" code and could lead to bugs; it will be fixed over the next few
commits.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

The failing test is expressed awkwardly in
t/t1402-check-ref-format.sh, but since it will be fixed in the next
commit it doesn't seem worth investing any work in it.

 Documentation/git-check-ref-format.txt |   29 +++++++++--
 builtin/check-ref-format.c             |   56 +++++++++++++++++---
 t/t1402-check-ref-format.sh            |   88 +++++++++++++++++++++++++++++---
 3 files changed, 152 insertions(+), 21 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index c9fdf84..3ab22b9 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -8,8 +8,8 @@ git-check-ref-format - Ensures that a reference name is well formed
 SYNOPSIS
 --------
 [verse]
-'git check-ref-format' <refname>
-'git check-ref-format' --print <refname>
+'git check-ref-format' [--print]
+       [--[no-]allow-onelevel] [--refspec-pattern] <refname>
 'git check-ref-format' --branch <branchname-shorthand>
 
 DESCRIPTION
@@ -32,14 +32,18 @@ git imposes the following rules on how references are named:
 
 . They must contain at least one `/`. This enforces the presence of a
   category like `heads/`, `tags/` etc. but the actual names are not
-  restricted.
+  restricted.  If the `--allow-onelevel` option is used, this rule
+  is waived.
 
 . They cannot have two consecutive dots `..` anywhere.
 
 . They cannot have ASCII control characters (i.e. bytes whose
   values are lower than \040, or \177 `DEL`), space, tilde `~`,
-  caret `{caret}`, colon `:`, question-mark `?`, asterisk `*`,
-  or open bracket `[` anywhere.
+  caret `{caret}`, or colon `:` anywhere.
+
+. They cannot have question-mark `?`, asterisk `*`, or open bracket
+  `[` anywhere.  See the `--refspec-pattern` option below for an
+  exception to this rule.
 
 . They cannot end with a slash `/` nor a dot `.`.
 
@@ -78,6 +82,21 @@ were on.  This option should be used by porcelains to accept this
 syntax anywhere a branch name is expected, so they can act as if you
 typed the branch name.
 
+OPTIONS
+-------
+--allow-onelevel::
+--no-allow-onelevel::
+	Controls whether one-level refnames are accepted (i.e.,
+	refnames that do not contain multiple `/`-separated
+	components).  The default is `--no-allow-onelevel`.
+
+--refspec-pattern::
+	Interpret <refname> as a reference name pattern for a refspec
+	(as used with remote repositories).  If this option is
+	enabled, <refname> is allowed to contain a single `*` in place
+	of a one full pathname component (e.g., `foo/*/bar` but not
+	`foo/bar*`).
+
 EXAMPLES
 --------
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 0723cf2..6bb9377 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 static const char builtin_check_ref_format_usage[] =
-"git check-ref-format [--print] <refname>\n"
+"git check-ref-format [--print] [options] <refname>\n"
 "   or: git check-ref-format --branch <branchname-shorthand>";
 
 /*
@@ -45,27 +45,65 @@ static int check_ref_format_branch(const char *arg)
 	return 0;
 }
 
-static int check_ref_format_print(const char *arg)
+static void refname_format_print(const char *arg)
 {
 	char *refname = xmalloc(strlen(arg) + 1);
 
-	if (check_ref_format(arg))
-		return 1;
 	collapse_slashes(refname, arg);
 	printf("%s\n", refname);
-	return 0;
 }
 
+#define REFNAME_ALLOW_ONELEVEL 1
+#define REFNAME_REFSPEC_PATTERN 2
+
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
+	int i;
+	int print = 0;
+	int flags = 0;
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(builtin_check_ref_format_usage);
 
 	if (argc == 3 && !strcmp(argv[1], "--branch"))
 		return check_ref_format_branch(argv[2]);
-	if (argc == 3 && !strcmp(argv[1], "--print"))
-		return check_ref_format_print(argv[2]);
-	if (argc != 2)
+
+	for (i = 1; i < argc && argv[i][0] == '-'; ++i) {
+		if (!strcmp(argv[i], "--print"))
+			print = 1;
+		else if (!strcmp(argv[i], "--allow-onelevel"))
+			flags |= REFNAME_ALLOW_ONELEVEL;
+		else if (!strcmp(argv[i], "--no-allow-onelevel"))
+			flags &= ~REFNAME_ALLOW_ONELEVEL;
+		else if (!strcmp(argv[i], "--refspec-pattern"))
+			flags |= REFNAME_REFSPEC_PATTERN;
+		else
+			usage(builtin_check_ref_format_usage);
+	}
+	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
-	return !!check_ref_format(argv[1]);
+
+	switch (check_ref_format(argv[i])) {
+	case CHECK_REF_FORMAT_OK:
+		break;
+	case CHECK_REF_FORMAT_ERROR:
+		return 1;
+	case CHECK_REF_FORMAT_ONELEVEL:
+		if (!(flags & REFNAME_ALLOW_ONELEVEL))
+			return 1;
+		else
+			break;
+	case CHECK_REF_FORMAT_WILDCARD:
+		if (!(flags & REFNAME_REFSPEC_PATTERN))
+			return 1;
+		else
+			break;
+	default:
+		die("internal error: unexpected value from check_ref_format()");
+	}
+
+	if (print)
+		refname_format_print(argv[i]);
+
+	return 0;
 }
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index eeb0b1a..795301d 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -5,25 +5,38 @@ test_description='Test git check-ref-format'
 . ./test-lib.sh
 
 valid_ref() {
-	test_expect_success "ref name '$1' is valid" \
-		"git check-ref-format '$1'"
+	if test "$#" = 1
+	then
+		test_expect_success "ref name '$1' is valid" \
+			"git check-ref-format '$1'"
+	else
+		test_expect_success "ref name '$1' is valid with options $2" \
+			"git check-ref-format $2 '$1'"
+	fi
 }
 invalid_ref() {
-	test_expect_success "ref name '$1' is not valid" \
-		"test_must_fail git check-ref-format '$1'"
+	if test "$#" = 1
+	then
+		test_expect_success "ref name '$1' is invalid" \
+			"test_must_fail git check-ref-format '$1'"
+	else
+		test_expect_success "ref name '$1' is invalid with options $2" \
+			"test_must_fail git check-ref-format $2 '$1'"
+	fi
 }
 
 invalid_ref ''
 invalid_ref '/'
-valid_ref 'heads/foo'
-invalid_ref 'foo'
+invalid_ref '/' --allow-onelevel
 valid_ref 'foo/bar/baz'
 valid_ref 'refs///heads/foo'
 invalid_ref 'heads/foo/'
 valid_ref '/heads/foo'
 valid_ref '///heads/foo'
-invalid_ref '/foo'
 invalid_ref './foo'
+invalid_ref './foo/bar'
+invalid_ref 'foo/./bar'
+invalid_ref 'foo/bar/.'
 invalid_ref '.refs/foo'
 invalid_ref 'heads/foo..bar'
 invalid_ref 'heads/foo?bar'
@@ -35,6 +48,67 @@ invalid_ref 'heads/foo\bar'
 invalid_ref "$(printf 'heads/foo\t')"
 invalid_ref "$(printf 'heads/foo\177')"
 valid_ref "$(printf 'heads/fu\303\237')"
+invalid_ref 'heads/*foo/bar' --refspec-pattern
+invalid_ref 'heads/foo*/bar' --refspec-pattern
+invalid_ref 'heads/f*o/bar' --refspec-pattern
+
+ref='foo'
+invalid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/bar'
+valid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/foo'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*/bar'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*'
+invalid_ref "$ref"
+
+#invalid_ref "$ref" --allow-onelevel
+test_expect_failure "ref name '$ref' is invalid with options --allow-onelevel" \
+	"test_must_fail git check-ref-format --allow-onelevel '$ref'"
+
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*/*'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/foo/*'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/*/foo'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='/foo'
+invalid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
 test_expect_success "check-ref-format --branch @{-1}" '
 	T=$(git write-tree) &&
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 4/7] Change check_ref_format() to take a flags argument
  2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
                   ` (2 preceding siblings ...)
  2011-09-10  6:50 ` [PATCH v2 3/7] git check-ref-format: add options --allow-onelevel and --refspec-pattern Michael Haggerty
@ 2011-09-10  6:50 ` Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 5/7] Add a library function normalize_refname() Michael Haggerty
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  6:50 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, A Large Angry SCM, Michael Haggerty

Change check_ref_format() to take a flags argument that indicates what
is acceptable in the reference name (analogous to --allow-onelevel and
--refspec-pattern).  This is more convenient for callers and also
fixes a failure in the test suite (and likely elsewhere in the code)
by enabling "onelevel" and "refspec-pattern" to be allowed
independently of each other.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/check-ref-format.c  |   21 +----------------
 builtin/checkout.c          |    2 +-
 builtin/fetch-pack.c        |    2 +-
 builtin/receive-pack.c      |    2 +-
 builtin/replace.c           |    2 +-
 builtin/show-ref.c          |    2 +-
 builtin/tag.c               |    4 +-
 connect.c                   |    2 +-
 environment.c               |    2 +-
 fast-import.c               |    7 +-----
 notes-merge.c               |    5 ++-
 pack-refs.c                 |    2 +-
 refs.c                      |   42 +++++++++++++++------------------
 refs.h                      |   17 +++++++++----
 remote.c                    |   53 +++++++++++-------------------------------
 sha1_name.c                 |    4 +-
 t/t1402-check-ref-format.sh |    6 +----
 transport.c                 |   15 ++---------
 walker.c                    |    2 +-
 19 files changed, 67 insertions(+), 125 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 6bb9377..c639400 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -53,9 +53,6 @@ static void refname_format_print(const char *arg)
 	printf("%s\n", refname);
 }
 
-#define REFNAME_ALLOW_ONELEVEL 1
-#define REFNAME_REFSPEC_PATTERN 2
-
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -83,24 +80,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
 
-	switch (check_ref_format(argv[i])) {
-	case CHECK_REF_FORMAT_OK:
-		break;
-	case CHECK_REF_FORMAT_ERROR:
+	if (check_ref_format(argv[i], flags))
 		return 1;
-	case CHECK_REF_FORMAT_ONELEVEL:
-		if (!(flags & REFNAME_ALLOW_ONELEVEL))
-			return 1;
-		else
-			break;
-	case CHECK_REF_FORMAT_WILDCARD:
-		if (!(flags & REFNAME_REFSPEC_PATTERN))
-			return 1;
-		else
-			break;
-	default:
-		die("internal error: unexpected value from check_ref_format()");
-	}
 
 	if (print)
 		refname_format_print(argv[i]);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 3bb6525..2882116 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -882,7 +882,7 @@ static int parse_branchname_arg(int argc, const char **argv,
 	new->name = arg;
 	setup_branch_path(new);
 
-	if (check_ref_format(new->path) == CHECK_REF_FORMAT_OK &&
+	if (!check_ref_format(new->path, 0) &&
 	    resolve_ref(new->path, branch_rev, 1, NULL))
 		hashcpy(rev, branch_rev);
 	else
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 412bd32..411aa7d 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -544,7 +544,7 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
 	for (ref = *refs; ref; ref = next) {
 		next = ref->next;
 		if (!memcmp(ref->name, "refs/", 5) &&
-		    check_ref_format(ref->name + 5))
+		    check_ref_format(ref->name + 5, 0))
 			; /* trash */
 		else if (args.fetch_all &&
 			 (!args.depth || prefixcmp(ref->name, "refs/tags/") )) {
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index ae164da..4e880ef 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -356,7 +356,7 @@ static const char *update(struct command *cmd)
 	struct ref_lock *lock;
 
 	/* only refs/... are allowed */
-	if (prefixcmp(name, "refs/") || check_ref_format(name + 5)) {
+	if (prefixcmp(name, "refs/") || check_ref_format(name + 5, 0)) {
 		rp_error("refusing to create funny ref '%s' remotely", name);
 		return "funny refname";
 	}
diff --git a/builtin/replace.c b/builtin/replace.c
index fe3a647..15f0e5e 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -94,7 +94,7 @@ static int replace_object(const char *object_ref, const char *replace_ref,
 		     "refs/replace/%s",
 		     sha1_to_hex(object)) > sizeof(ref) - 1)
 		die("replace ref name too long: %.*s...", 50, ref);
-	if (check_ref_format(ref))
+	if (check_ref_format(ref, 0))
 		die("'%s' is not a valid ref name.", ref);
 
 	if (!resolve_ref(ref, prev, 1, NULL))
diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 45f0340..375a14b 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -145,7 +145,7 @@ static int exclude_existing(const char *match)
 			if (strncmp(ref, match, matchlen))
 				continue;
 		}
-		if (check_ref_format(ref)) {
+		if (check_ref_format(ref, 0)) {
 			warning("ref '%s' ignored", ref);
 			continue;
 		}
diff --git a/builtin/tag.c b/builtin/tag.c
index 667515e..7aceaab 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -407,12 +407,12 @@ static int parse_msg_arg(const struct option *opt, const char *arg, int unset)
 static int strbuf_check_tag_ref(struct strbuf *sb, const char *name)
 {
 	if (name[0] == '-')
-		return CHECK_REF_FORMAT_ERROR;
+		return -1;
 
 	strbuf_reset(sb);
 	strbuf_addf(sb, "refs/tags/%s", name);
 
-	return check_ref_format(sb->buf);
+	return check_ref_format(sb->buf, 0);
 }
 
 int cmd_tag(int argc, const char **argv, const char *prefix)
diff --git a/connect.c b/connect.c
index ee1d4b4..292a9e2 100644
--- a/connect.c
+++ b/connect.c
@@ -22,7 +22,7 @@ static int check_ref(const char *name, int len, unsigned int flags)
 	len -= 5;
 
 	/* REF_NORMAL means that we don't want the magic fake tag refs */
-	if ((flags & REF_NORMAL) && check_ref_format(name) < 0)
+	if ((flags & REF_NORMAL) && check_ref_format(name, 0))
 		return 0;
 
 	/* REF_HEADS means that we want regular branch heads */
diff --git a/environment.c b/environment.c
index e96edcf..8acbb87 100644
--- a/environment.c
+++ b/environment.c
@@ -106,7 +106,7 @@ static char *expand_namespace(const char *raw_namespace)
 		if (strcmp((*c)->buf, "/") != 0)
 			strbuf_addf(&buf, "refs/namespaces/%s", (*c)->buf);
 	strbuf_list_free(components);
-	if (check_ref_format(buf.buf) != CHECK_REF_FORMAT_OK)
+	if (check_ref_format(buf.buf, 0))
 		die("bad git namespace path \"%s\"", raw_namespace);
 	strbuf_addch(&buf, '/');
 	return strbuf_detach(&buf, NULL);
diff --git a/fast-import.c b/fast-import.c
index 742e7da..4d55ee6 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -722,13 +722,8 @@ static struct branch *new_branch(const char *name)
 
 	if (b)
 		die("Invalid attempt to create duplicate branch: %s", name);
-	switch (check_ref_format(name)) {
-	case 0: break; /* its valid */
-	case CHECK_REF_FORMAT_ONELEVEL:
-		break; /* valid, but too few '/', allow anyway */
-	default:
+	if (check_ref_format(name, REFNAME_ALLOW_ONELEVEL))
 		die("Branch name doesn't conform to GIT standards: %s", name);
-	}
 
 	b = pool_calloc(1, sizeof(struct branch));
 	b->name = pool_strdup(name);
diff --git a/notes-merge.c b/notes-merge.c
index e1aaf43..bb8d7c8 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -570,7 +570,8 @@ int notes_merge(struct notes_merge_options *o,
 	/* Dereference o->local_ref into local_sha1 */
 	if (!resolve_ref(o->local_ref, local_sha1, 0, NULL))
 		die("Failed to resolve local notes ref '%s'", o->local_ref);
-	else if (!check_ref_format(o->local_ref) && is_null_sha1(local_sha1))
+	else if (!check_ref_format(o->local_ref, 0) &&
+		is_null_sha1(local_sha1))
 		local = NULL; /* local_sha1 == null_sha1 indicates unborn ref */
 	else if (!(local = lookup_commit_reference(local_sha1)))
 		die("Could not parse local commit %s (%s)",
@@ -583,7 +584,7 @@ int notes_merge(struct notes_merge_options *o,
 		 * Failed to get remote_sha1. If o->remote_ref looks like an
 		 * unborn ref, perform the merge using an empty notes tree.
 		 */
-		if (!check_ref_format(o->remote_ref)) {
+		if (!check_ref_format(o->remote_ref, 0)) {
 			hashclr(remote_sha1);
 			remote = NULL;
 		} else {
diff --git a/pack-refs.c b/pack-refs.c
index 1290570..bc0032d 100644
--- a/pack-refs.c
+++ b/pack-refs.c
@@ -72,7 +72,7 @@ static void try_remove_empty_parents(char *name)
 	for (i = 0; i < 2; i++) { /* refs/{heads,tags,...}/ */
 		while (*p && *p != '/')
 			p++;
-		/* tolerate duplicate slashes; see check_ref_format() */
+		/* tolerate duplicate slashes; see normalize_refname() */
 		while (*p == '/')
 			p++;
 	}
diff --git a/refs.c b/refs.c
index fd29d89..a206a4c 100644
--- a/refs.c
+++ b/refs.c
@@ -872,10 +872,9 @@ static inline int bad_ref_char(int ch)
 	return 0;
 }
 
-int check_ref_format(const char *ref)
+int check_ref_format(const char *ref, int flags)
 {
 	int ch, level, last;
-	int ret = CHECK_REF_FORMAT_OK;
 	const char *cp = ref;
 
 	level = 0;
@@ -884,41 +883,42 @@ int check_ref_format(const char *ref)
 			; /* tolerate duplicated slashes */
 		if (!ch)
 			/* should not end with slashes */
-			return CHECK_REF_FORMAT_ERROR;
+			return -1;
 
 		/* we are at the beginning of the path component */
 		if (ch == '.')
-			return CHECK_REF_FORMAT_ERROR;
+			return -1;
 		if (bad_ref_char(ch)) {
-			if (ch == '*' && (!*cp || *cp == '/') &&
-			    ret == CHECK_REF_FORMAT_OK)
-				ret = CHECK_REF_FORMAT_WILDCARD;
+			if ((flags & REFNAME_REFSPEC_PATTERN) && ch == '*' &&
+				(!*cp || *cp == '/'))
+				/* Accept one wildcard as a full refname component. */
+				flags &= ~REFNAME_REFSPEC_PATTERN;
 			else
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 		}
 
 		last = ch;
 		/* scan the rest of the path component */
 		while ((ch = *cp++) != 0) {
 			if (bad_ref_char(ch))
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			if (ch == '/')
 				break;
 			if (last == '.' && ch == '.')
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			if (last == '@' && ch == '{')
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			last = ch;
 		}
 		level++;
 		if (!ch) {
 			if (ref <= cp - 2 && cp[-2] == '.')
-				return CHECK_REF_FORMAT_ERROR;
-			if (level < 2)
-				return CHECK_REF_FORMAT_ONELEVEL;
+				return -1;
+			if (level < 2 && !(flags & REFNAME_ALLOW_ONELEVEL))
+				return -1;
 			if (has_extension(ref, ".lock"))
-				return CHECK_REF_FORMAT_ERROR;
-			return ret;
+				return -1;
+			return 0;
 		}
 	}
 }
@@ -1103,7 +1103,7 @@ static struct ref_lock *lock_ref_sha1_basic(const char *ref, const unsigned char
 struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 {
 	char refpath[PATH_MAX];
-	if (check_ref_format(ref))
+	if (check_ref_format(ref, 0))
 		return NULL;
 	strcpy(refpath, mkpath("refs/%s", ref));
 	return lock_ref_sha1_basic(refpath, old_sha1, 0, NULL);
@@ -1111,13 +1111,9 @@ struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 
 struct ref_lock *lock_any_ref_for_update(const char *ref, const unsigned char *old_sha1, int flags)
 {
-	switch (check_ref_format(ref)) {
-	default:
+	if (check_ref_format(ref, REFNAME_ALLOW_ONELEVEL))
 		return NULL;
-	case 0:
-	case CHECK_REF_FORMAT_ONELEVEL:
-		return lock_ref_sha1_basic(ref, old_sha1, flags, NULL);
-	}
+	return lock_ref_sha1_basic(ref, old_sha1, flags, NULL);
 }
 
 static struct lock_file packlock;
diff --git a/refs.h b/refs.h
index dfb086e..b248ce6 100644
--- a/refs.h
+++ b/refs.h
@@ -97,11 +97,18 @@ int for_each_recent_reflog_ent(const char *ref, each_reflog_ent_fn fn, long, voi
  */
 extern int for_each_reflog(each_ref_fn, void *);
 
-#define CHECK_REF_FORMAT_OK 0
-#define CHECK_REF_FORMAT_ERROR (-1)
-#define CHECK_REF_FORMAT_ONELEVEL (-2)
-#define CHECK_REF_FORMAT_WILDCARD (-3)
-extern int check_ref_format(const char *target);
+#define REFNAME_ALLOW_ONELEVEL 1
+#define REFNAME_REFSPEC_PATTERN 2
+
+/*
+ * Return 0 iff ref has the correct format for a refname according to
+ * the rules described in Documentation/git-check-ref-format.txt.  If
+ * REFNAME_ALLOW_ONELEVEL is set in flags, then accept one-level
+ * reference names.  If REFNAME_REFSPEC_PATTERN is set in flags, then
+ * allow a "*" wildcard character in place of one of the name
+ * components.
+ */
+extern int check_ref_format(const char *target, int flags);
 
 extern const char *prettify_refname(const char *refname);
 extern char *shorten_unambiguous_ref(const char *ref, int strict);
diff --git a/remote.c b/remote.c
index b8ecfa5..7059885 100644
--- a/remote.c
+++ b/remote.c
@@ -493,23 +493,6 @@ static void read_config(void)
 }
 
 /*
- * We need to make sure the remote-tracking branches are well formed, but a
- * wildcard refspec in "struct refspec" must have a trailing slash. We
- * temporarily drop the trailing '/' while calling check_ref_format(),
- * and put it back.  The caller knows that a CHECK_REF_FORMAT_ONELEVEL
- * error return is Ok for a wildcard refspec.
- */
-static int verify_refname(char *name, int is_glob)
-{
-	int result;
-
-	result = check_ref_format(name);
-	if (is_glob && result == CHECK_REF_FORMAT_WILDCARD)
-		result = CHECK_REF_FORMAT_OK;
-	return result;
-}
-
-/*
  * This function frees a refspec array.
  * Warning: code paths should be checked to ensure that the src
  *          and dst pointers are always freeable pointers as well
@@ -532,13 +515,13 @@ static void free_refspecs(struct refspec *refspec, int nr_refspec)
 static struct refspec *parse_refspec_internal(int nr_refspec, const char **refspec, int fetch, int verify)
 {
 	int i;
-	int st;
 	struct refspec *rs = xcalloc(sizeof(*rs), nr_refspec);
 
 	for (i = 0; i < nr_refspec; i++) {
 		size_t llen;
 		int is_glob;
 		const char *lhs, *rhs;
+		int flags;
 
 		is_glob = 0;
 
@@ -576,6 +559,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 
 		rs[i].pattern = is_glob;
 		rs[i].src = xstrndup(lhs, llen);
+		flags = REFNAME_ALLOW_ONELEVEL | (is_glob ? REFNAME_REFSPEC_PATTERN : 0);
 
 		if (fetch) {
 			/*
@@ -585,26 +569,20 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			 */
 			if (!*rs[i].src)
 				; /* empty is ok */
-			else {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
-					goto invalid;
-			}
+			else if (check_ref_format(rs[i].src, flags))
+				goto invalid;
 			/*
 			 * RHS
 			 * - missing is ok, and is same as empty.
 			 * - empty is ok; it means not to store.
 			 * - otherwise it must be a valid looking ref.
 			 */
-			if (!rs[i].dst) {
+			if (!rs[i].dst)
 				; /* ok */
-			} else if (!*rs[i].dst) {
+			else if (!*rs[i].dst)
 				; /* ok */
-			} else {
-				st = verify_refname(rs[i].dst, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
-					goto invalid;
-			}
+			else if (check_ref_format(rs[i].dst, flags))
+				goto invalid;
 		} else {
 			/*
 			 * LHS
@@ -616,8 +594,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			if (!*rs[i].src)
 				; /* empty is ok */
 			else if (is_glob) {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_ref_format(rs[i].src, flags))
 					goto invalid;
 			}
 			else
@@ -630,14 +607,12 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			 * - otherwise it must be a valid looking ref.
 			 */
 			if (!rs[i].dst) {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_ref_format(rs[i].src, flags))
 					goto invalid;
 			} else if (!*rs[i].dst) {
 				goto invalid;
 			} else {
-				st = verify_refname(rs[i].dst, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_ref_format(rs[i].dst, flags))
 					goto invalid;
 			}
 		}
@@ -1427,8 +1402,8 @@ int get_fetch_map(const struct ref *remote_refs,
 
 	for (rmp = &ref_map; *rmp; ) {
 		if ((*rmp)->peer_ref) {
-			int st = check_ref_format((*rmp)->peer_ref->name + 5);
-			if (st && st != CHECK_REF_FORMAT_ONELEVEL) {
+			if (check_ref_format((*rmp)->peer_ref->name + 5,
+				REFNAME_ALLOW_ONELEVEL)) {
 				struct ref *ignore = *rmp;
 				error("* Ignoring funny ref '%s' locally",
 				      (*rmp)->peer_ref->name);
@@ -1620,7 +1595,7 @@ static int one_local_ref(const char *refname, const unsigned char *sha1, int fla
 	int len;
 
 	/* we already know it starts with refs/ to get here */
-	if (check_ref_format(refname + 5))
+	if (check_ref_format(refname + 5, 0))
 		return 0;
 
 	len = strlen(refname) + 1;
diff --git a/sha1_name.c b/sha1_name.c
index ff5992a..975ec3b 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -972,9 +972,9 @@ int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 {
 	strbuf_branchname(sb, name);
 	if (name[0] == '-')
-		return CHECK_REF_FORMAT_ERROR;
+		return -1;
 	strbuf_splice(sb, 0, 0, "refs/heads/", 11);
-	return check_ref_format(sb->buf);
+	return check_ref_format(sb->buf, 0);
 }
 
 /*
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 795301d..d7e8d90 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -84,11 +84,7 @@ valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
 ref='*'
 invalid_ref "$ref"
-
-#invalid_ref "$ref" --allow-onelevel
-test_expect_failure "ref name '$ref' is invalid with options --allow-onelevel" \
-	"test_must_fail git check-ref-format --allow-onelevel '$ref'"
-
+invalid_ref "$ref" --allow-onelevel
 invalid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
diff --git a/transport.c b/transport.c
index fa279d5..225d9b8 100644
--- a/transport.c
+++ b/transport.c
@@ -754,18 +754,9 @@ void transport_verify_remote_names(int nr_heads, const char **heads)
 			continue;
 
 		remote = remote ? (remote + 1) : local;
-		switch (check_ref_format(remote)) {
-		case 0: /* ok */
-		case CHECK_REF_FORMAT_ONELEVEL:
-			/* ok but a single level -- that is fine for
-			 * a match pattern.
-			 */
-		case CHECK_REF_FORMAT_WILDCARD:
-			/* ok but ends with a pattern-match character */
-			continue;
-		}
-		die("remote part of refspec is not a valid name in %s",
-		    heads[i]);
+		if (check_ref_format(remote, REFNAME_ALLOW_ONELEVEL|REFNAME_REFSPEC_PATTERN))
+			die("remote part of refspec is not a valid name in %s",
+				heads[i]);
 	}
 }
 
diff --git a/walker.c b/walker.c
index dce7128..e5d8eb2 100644
--- a/walker.c
+++ b/walker.c
@@ -190,7 +190,7 @@ static int interpret_target(struct walker *walker, char *target, unsigned char *
 {
 	if (!get_sha1_hex(target, sha1))
 		return 0;
-	if (!check_ref_format(target)) {
+	if (!check_ref_format(target, 0)) {
 		struct ref *ref = alloc_ref(target);
 		if (!walker->fetch_ref(walker, ref)) {
 			hashcpy(sha1, ref->old_sha1);
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 5/7] Add a library function normalize_refname()
  2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
                   ` (3 preceding siblings ...)
  2011-09-10  6:50 ` [PATCH v2 4/7] Change check_ref_format() to take a flags argument Michael Haggerty
@ 2011-09-10  6:50 ` Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 6/7] Do not allow ".lock" at the end of any refname component Michael Haggerty
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  6:50 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, A Large Angry SCM, Michael Haggerty

Implement check_ref_format() using the new function.  Also use it to
replace collapse_slashes() in check-ref-format.c.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/check-ref-format.c  |   41 ++++-------------
 refs.c                      |  109 +++++++++++++++++++++++++++----------------
 refs.h                      |   24 +++++++---
 t/t1402-check-ref-format.sh |    3 +
 4 files changed, 98 insertions(+), 79 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index c639400..4c202af 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -11,28 +11,6 @@ static const char builtin_check_ref_format_usage[] =
 "git check-ref-format [--print] [options] <refname>\n"
 "   or: git check-ref-format --branch <branchname-shorthand>";
 
-/*
- * Remove leading slashes and replace each run of adjacent slashes in
- * src with a single slash, and write the result to dst.
- *
- * This function is similar to normalize_path_copy(), but stripped down
- * to meet check_ref_format's simpler needs.
- */
-static void collapse_slashes(char *dst, const char *src)
-{
-	char ch;
-	char prev = '/';
-
-	while ((ch = *src++) != '\0') {
-		if (prev == '/' && ch == prev)
-			continue;
-
-		*dst++ = ch;
-		prev = ch;
-	}
-	*dst = '\0';
-}
-
 static int check_ref_format_branch(const char *arg)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -45,12 +23,15 @@ static int check_ref_format_branch(const char *arg)
 	return 0;
 }
 
-static void refname_format_print(const char *arg)
+static int check_ref_format_print(const char *arg, int flags)
 {
-	char *refname = xmalloc(strlen(arg) + 1);
+	int refnamelen = strlen(arg) + 1;
+	char *refname = xmalloc(refnamelen);
 
-	collapse_slashes(refname, arg);
+	if (normalize_refname(refname, refnamelen, arg, flags))
+		return 1;
 	printf("%s\n", refname);
+	return 0;
 }
 
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
@@ -79,12 +60,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 	}
 	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
-
-	if (check_ref_format(argv[i], flags))
-		return 1;
-
 	if (print)
-		refname_format_print(argv[i]);
-
-	return 0;
+		return check_ref_format_print(argv[i], flags);
+	else
+		return !!check_ref_format(argv[i], flags);
 }
diff --git a/refs.c b/refs.c
index a206a4c..372350e 100644
--- a/refs.c
+++ b/refs.c
@@ -872,55 +872,84 @@ static inline int bad_ref_char(int ch)
 	return 0;
 }
 
-int check_ref_format(const char *ref, int flags)
+int normalize_refname(char *dst, int dstlen, const char *ref, int flags)
 {
-	int ch, level, last;
-	const char *cp = ref;
-
-	level = 0;
-	while (1) {
-		while ((ch = *cp++) == '/')
-			; /* tolerate duplicated slashes */
-		if (!ch)
-			/* should not end with slashes */
-			return -1;
+	int ch, last, component_len, component_count = 0;
+	const char *cp = ref, *component;
 
-		/* we are at the beginning of the path component */
-		if (ch == '.')
-			return -1;
-		if (bad_ref_char(ch)) {
-			if ((flags & REFNAME_REFSPEC_PATTERN) && ch == '*' &&
-				(!*cp || *cp == '/'))
-				/* Accept one wildcard as a full refname component. */
-				flags &= ~REFNAME_REFSPEC_PATTERN;
-			else
-				return -1;
-		}
+	ch = *cp;
+	do {
+		while (ch == '/')
+			ch = *++cp; /* tolerate leading and repeated slashes */
 
-		last = ch;
-		/* scan the rest of the path component */
-		while ((ch = *cp++) != 0) {
-			if (bad_ref_char(ch))
-				return -1;
-			if (ch == '/')
-				break;
+		/*
+		 * We are at the start of a path component.  Record
+		 * its start for later reference.  If we are copying
+		 * to dst, use the copy there, because we might be
+		 * overwriting ref; otherwise, use the copy from the
+		 * input string.
+		 */
+		component = dst ? dst : cp;
+		component_len = 0;
+		last = '\0';
+		while (1) {
+			if (ch != 0 && bad_ref_char(ch)) {
+				if ((flags & REFNAME_REFSPEC_PATTERN) &&
+					ch == '*' &&
+					component_len == 0 &&
+					(cp[1] == 0 || cp[1] == '/')) {
+					/* Accept one wildcard as a full refname component. */
+					flags &= ~REFNAME_REFSPEC_PATTERN;
+				} else {
+					/* Illegal character in refname */
+					return -1;
+				}
+			}
 			if (last == '.' && ch == '.')
+				/* Refname must not contain "..". */
 				return -1;
 			if (last == '@' && ch == '{')
+				/* Refname must not contain "@{". */
 				return -1;
+			if (dst) {
+				if (dstlen-- <= 0)
+					/* Output array was too small. */
+					return -1;
+				*dst++ = ch;
+			}
+			if (ch == 0 || ch == '/')
+				break;
+			++component_len;
 			last = ch;
+			ch = *++cp;
 		}
-		level++;
-		if (!ch) {
-			if (ref <= cp - 2 && cp[-2] == '.')
-				return -1;
-			if (level < 2 && !(flags & REFNAME_ALLOW_ONELEVEL))
-				return -1;
-			if (has_extension(ref, ".lock"))
-				return -1;
-			return 0;
-		}
-	}
+
+		/* We are at the end of a path component. */
+		++component_count;
+		if (component_len == 0)
+			/* Either ref was zero length or it ended with slash. */
+			return -1;
+
+		if (component[0] == '.')
+			/* Components must not start with '.'. */
+			return -1;
+	} while (ch != 0);
+
+	if (last == '.')
+		/* Refname must not end with '.'. */
+		return -1;
+	if (component_len >= 5 && !memcmp(&component[component_len - 5], ".lock", 5))
+		/* Refname must not end with ".lock". */
+		return -1;
+	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
+		/* Refname must have at least two components. */
+		return -1;
+	return 0;
+}
+
+int check_ref_format(const char *ref, int flags)
+{
+	return normalize_refname(NULL, 0, ref, flags);
 }
 
 const char *prettify_refname(const char *name)
diff --git a/refs.h b/refs.h
index b248ce6..8a15f83 100644
--- a/refs.h
+++ b/refs.h
@@ -101,14 +101,24 @@ extern int for_each_reflog(each_ref_fn, void *);
 #define REFNAME_REFSPEC_PATTERN 2
 
 /*
- * Return 0 iff ref has the correct format for a refname according to
- * the rules described in Documentation/git-check-ref-format.txt.  If
- * REFNAME_ALLOW_ONELEVEL is set in flags, then accept one-level
- * reference names.  If REFNAME_REFSPEC_PATTERN is set in flags, then
- * allow a "*" wildcard character in place of one of the name
- * components.
+ * Check that ref is a valid refname according to the rules described
+ * in Documentation/git-check-ref-format.txt and normalize it by
+ * stripping out superfluous "/" characters.  If dst != NULL, write
+ * the normalized refname to dst, which must be an allocated character
+ * array with length dstlen (typically at least as long as ref).  dst
+ * may point at the same memory as ref.  Return 0 iff the refname was
+ * OK and fit into dst.  If REFNAME_ALLOW_ONELEVEL is set in flags,
+ * then accept one-level reference names.  If REFNAME_REFSPEC_PATTERN
+ * is set in flags, then allow a "*" wildcard characters in place of
+ * one of the name components.
  */
-extern int check_ref_format(const char *target, int flags);
+extern int normalize_refname(char *dst, int dstlen, const char *ref, int flags);
+
+/*
+ * Return 0 iff ref has the correct format for a refname.  See
+ * normalize_refname() for details.
+ */
+extern int check_ref_format(const char *ref, int flags);
 
 extern const char *prettify_refname(const char *refname);
 extern char *shorten_unambiguous_ref(const char *ref, int strict);
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index d7e8d90..b0b773b 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -42,6 +42,7 @@ invalid_ref 'heads/foo..bar'
 invalid_ref 'heads/foo?bar'
 valid_ref 'foo./bar'
 invalid_ref 'heads/foo.lock'
+invalid_ref 'heads///foo.lock'
 valid_ref 'heads/foo@bar'
 invalid_ref 'heads/v@{ation'
 invalid_ref 'heads/foo\bar'
@@ -155,5 +156,7 @@ invalid_ref_normalized '/foo'
 invalid_ref_normalized 'heads/foo/../bar'
 invalid_ref_normalized 'heads/./foo'
 invalid_ref_normalized 'heads\foo'
+invalid_ref_normalized 'heads/foo.lock'
+invalid_ref_normalized 'heads///foo.lock'
 
 test_done
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 6/7] Do not allow ".lock" at the end of any refname component
  2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
                   ` (4 preceding siblings ...)
  2011-09-10  6:50 ` [PATCH v2 5/7] Add a library function normalize_refname() Michael Haggerty
@ 2011-09-10  6:50 ` Michael Haggerty
  2011-09-10  6:50 ` [PATCH v2 7/7] Add tools to avoid the use of unnormalized refnames Michael Haggerty
  2011-09-12  4:28 ` [PATCH v2 0/7] Improved infrastructure for refname normalization Junio C Hamano
  7 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  6:50 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, A Large Angry SCM, Michael Haggerty

Allowing any refname component to end with ".lock" is looking for
trouble; for example,

    $ git br foo.lock/bar
    $ git br foo
    fatal: Unable to create '[...]/.git/refs/heads/foo.lock': File exists.

Therefore, do not allow any refname component to end with ".lock".

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Documentation/git-check-ref-format.txt |    4 +---
 refs.c                                 |    6 +++---
 t/t1402-check-ref-format.sh            |    4 ++++
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index 3ab22b9..f2d21c7 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -28,7 +28,7 @@ git imposes the following rules on how references are named:
 
 . They can include slash `/` for hierarchical (directory)
   grouping, but no slash-separated component can begin with a
-  dot `.`.
+  dot `.` or end with the sequence `.lock`.
 
 . They must contain at least one `/`. This enforces the presence of a
   category like `heads/`, `tags/` etc. but the actual names are not
@@ -47,8 +47,6 @@ git imposes the following rules on how references are named:
 
 . They cannot end with a slash `/` nor a dot `.`.
 
-. They cannot end with the sequence `.lock`.
-
 . They cannot contain a sequence `@{`.
 
 . They cannot contain a `\`.
diff --git a/refs.c b/refs.c
index 372350e..6985a3f 100644
--- a/refs.c
+++ b/refs.c
@@ -933,14 +933,14 @@ int normalize_refname(char *dst, int dstlen, const char *ref, int flags)
 		if (component[0] == '.')
 			/* Components must not start with '.'. */
 			return -1;
+		if (component_len >= 5 && !memcmp(&component[component_len - 5], ".lock", 5))
+			/* Components must not end with ".lock". */
+			return -1;
 	} while (ch != 0);
 
 	if (last == '.')
 		/* Refname must not end with '.'. */
 		return -1;
-	if (component_len >= 5 && !memcmp(&component[component_len - 5], ".lock", 5))
-		/* Refname must not end with ".lock". */
-		return -1;
 	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
 		/* Refname must have at least two components. */
 		return -1;
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index b0b773b..419788f 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -43,6 +43,8 @@ invalid_ref 'heads/foo?bar'
 valid_ref 'foo./bar'
 invalid_ref 'heads/foo.lock'
 invalid_ref 'heads///foo.lock'
+invalid_ref 'foo.lock/bar'
+invalid_ref 'foo.lock///bar'
 valid_ref 'heads/foo@bar'
 invalid_ref 'heads/v@{ation'
 invalid_ref 'heads/foo\bar'
@@ -158,5 +160,7 @@ invalid_ref_normalized 'heads/./foo'
 invalid_ref_normalized 'heads\foo'
 invalid_ref_normalized 'heads/foo.lock'
 invalid_ref_normalized 'heads///foo.lock'
+invalid_ref_normalized 'foo.lock/bar'
+invalid_ref_normalized 'foo.lock///bar'
 
 test_done
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 7/7] Add tools to avoid the use of unnormalized refnames.
  2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
                   ` (5 preceding siblings ...)
  2011-09-10  6:50 ` [PATCH v2 6/7] Do not allow ".lock" at the end of any refname component Michael Haggerty
@ 2011-09-10  6:50 ` Michael Haggerty
  2011-09-12  4:28 ` [PATCH v2 0/7] Improved infrastructure for refname normalization Junio C Hamano
  7 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  6:50 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, A Large Angry SCM, Michael Haggerty

Change normalize_refname() to only allow unnormalized refnames if its
dst parameter is non-NULL.  Callers who want to support unnormalized
refnames will need the unnormalized copy, and callers unprepared to
deal with a normalized copy shouldn't pretend to accept unnormalized
refnames.

Change check_ref_format() to reject unnormalized refnames.

Add a new temporary function, check_ref_format_unsafe(), which accepts
unnormalized refnames like the old check_ref_format().  Change callers
to use this function so that their behavior is preserved, even though
in many cases it is brokenness that is being preserved.  Callers
should be fixed to use one of the new functions, then
check_ref_format_unsafe() should be removed.

Add options --normalize and --no-normalize to "git check-ref-format"
to access the new functionality and use the new options to add tests.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Documentation/git-check-ref-format.txt |   13 ++++++++-
 builtin/check-ref-format.c             |   34 ++++++++++++++-----------
 builtin/checkout.c                     |    2 +-
 builtin/fetch-pack.c                   |    2 +-
 builtin/receive-pack.c                 |    2 +-
 builtin/replace.c                      |    2 +-
 builtin/show-ref.c                     |    2 +-
 builtin/tag.c                          |    2 +-
 connect.c                              |    2 +-
 environment.c                          |    2 +-
 fast-import.c                          |    2 +-
 notes-merge.c                          |    4 +-
 refs.c                                 |   17 ++++++++++---
 refs.h                                 |   41 +++++++++++++++++++++++--------
 remote.c                               |   17 +++++++------
 sha1_name.c                            |    2 +-
 t/t1402-check-ref-format.sh            |    3 ++
 transport.c                            |    4 ++-
 walker.c                               |    2 +-
 19 files changed, 101 insertions(+), 54 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index f2d21c7..25a7c43 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -9,7 +9,8 @@ SYNOPSIS
 --------
 [verse]
 'git check-ref-format' [--print]
-       [--[no-]allow-onelevel] [--refspec-pattern] <refname>
+       [--[no-]allow-onelevel] [--refspec-pattern] [--[no-]normalize]
+       <refname>
 'git check-ref-format' --branch <branchname-shorthand>
 
 DESCRIPTION
@@ -71,7 +72,7 @@ reference name expressions (see linkgit:gitrevisions[7]):
 . at-open-brace `@{` is used as a notation to access a reflog entry.
 
 With the `--print` option, if 'refname' is acceptable, it prints the
-canonicalized name of a hypothetical reference with that name.  That is,
+normalized name of a hypothetical reference with that name.  That is,
 it prints 'refname' with any extra `/` characters removed.
 
 With the `--branch` option, it expands the ``previous branch syntax''
@@ -95,6 +96,14 @@ OPTIONS
 	of a one full pathname component (e.g., `foo/*/bar` but not
 	`foo/bar*`).
 
+--normalize::
+--no-normalize::
+	Controls whether <refname> is allowed to contain extra `/`
+	characters at the beginning and between name components.  If
+	this option is set, such extra slashes are stripped out of the
+	value printed by the `--print` option.  The default is
+	`--normalize`.
+
 EXAMPLES
 --------
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 4c202af..fc1ffd2 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -23,22 +23,12 @@ static int check_ref_format_branch(const char *arg)
 	return 0;
 }
 
-static int check_ref_format_print(const char *arg, int flags)
-{
-	int refnamelen = strlen(arg) + 1;
-	char *refname = xmalloc(refnamelen);
-
-	if (normalize_refname(refname, refnamelen, arg, flags))
-		return 1;
-	printf("%s\n", refname);
-	return 0;
-}
-
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
 	int i;
 	int print = 0;
 	int flags = 0;
+	int normalize = 1;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(builtin_check_ref_format_usage);
@@ -55,13 +45,27 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 			flags &= ~REFNAME_ALLOW_ONELEVEL;
 		else if (!strcmp(argv[i], "--refspec-pattern"))
 			flags |= REFNAME_REFSPEC_PATTERN;
+		else if (!strcmp(argv[i], "--normalize"))
+			normalize = 1;
+		else if (!strcmp(argv[i], "--no-normalize"))
+			normalize = 0;
 		else
 			usage(builtin_check_ref_format_usage);
 	}
 	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
-	if (print)
-		return check_ref_format_print(argv[i], flags);
-	else
-		return !!check_ref_format(argv[i], flags);
+	if (normalize) {
+		int refnamelen = strlen(argv[i]) + 1;
+		char *refname = xmalloc(refnamelen);
+		if (normalize_refname(refname, refnamelen, argv[i], flags))
+			return 1;
+		if (print)
+			printf("%s\n", refname);
+	} else {
+		if (check_ref_format(argv[i], flags))
+			return 1;
+		if (print)
+			printf("%s\n", argv[i]);
+	}
+	return 0;
 }
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2882116..cf16ac3 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -882,7 +882,7 @@ static int parse_branchname_arg(int argc, const char **argv,
 	new->name = arg;
 	setup_branch_path(new);
 
-	if (!check_ref_format(new->path, 0) &&
+	if (!check_ref_format_unsafe(new->path, 0) &&
 	    resolve_ref(new->path, branch_rev, 1, NULL))
 		hashcpy(rev, branch_rev);
 	else
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 411aa7d..640e1ac 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -544,7 +544,7 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
 	for (ref = *refs; ref; ref = next) {
 		next = ref->next;
 		if (!memcmp(ref->name, "refs/", 5) &&
-		    check_ref_format(ref->name + 5, 0))
+		    check_ref_format_unsafe(ref->name + 5, 0))
 			; /* trash */
 		else if (args.fetch_all &&
 			 (!args.depth || prefixcmp(ref->name, "refs/tags/") )) {
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 4e880ef..24d917b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -356,7 +356,7 @@ static const char *update(struct command *cmd)
 	struct ref_lock *lock;
 
 	/* only refs/... are allowed */
-	if (prefixcmp(name, "refs/") || check_ref_format(name + 5, 0)) {
+	if (prefixcmp(name, "refs/") || check_ref_format_unsafe(name + 5, 0)) {
 		rp_error("refusing to create funny ref '%s' remotely", name);
 		return "funny refname";
 	}
diff --git a/builtin/replace.c b/builtin/replace.c
index 15f0e5e..ac6b154 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -94,7 +94,7 @@ static int replace_object(const char *object_ref, const char *replace_ref,
 		     "refs/replace/%s",
 		     sha1_to_hex(object)) > sizeof(ref) - 1)
 		die("replace ref name too long: %.*s...", 50, ref);
-	if (check_ref_format(ref, 0))
+	if (check_ref_format_unsafe(ref, 0))
 		die("'%s' is not a valid ref name.", ref);
 
 	if (!resolve_ref(ref, prev, 1, NULL))
diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 375a14b..ba5f66f 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -145,7 +145,7 @@ static int exclude_existing(const char *match)
 			if (strncmp(ref, match, matchlen))
 				continue;
 		}
-		if (check_ref_format(ref, 0)) {
+		if (check_ref_format_unsafe(ref, 0)) {
 			warning("ref '%s' ignored", ref);
 			continue;
 		}
diff --git a/builtin/tag.c b/builtin/tag.c
index 7aceaab..10ee87c 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -412,7 +412,7 @@ static int strbuf_check_tag_ref(struct strbuf *sb, const char *name)
 	strbuf_reset(sb);
 	strbuf_addf(sb, "refs/tags/%s", name);
 
-	return check_ref_format(sb->buf, 0);
+	return check_ref_format_unsafe(sb->buf, 0);
 }
 
 int cmd_tag(int argc, const char **argv, const char *prefix)
diff --git a/connect.c b/connect.c
index 292a9e2..3fea6b1 100644
--- a/connect.c
+++ b/connect.c
@@ -22,7 +22,7 @@ static int check_ref(const char *name, int len, unsigned int flags)
 	len -= 5;
 
 	/* REF_NORMAL means that we don't want the magic fake tag refs */
-	if ((flags & REF_NORMAL) && check_ref_format(name, 0))
+	if ((flags & REF_NORMAL) && check_ref_format_unsafe(name, 0))
 		return 0;
 
 	/* REF_HEADS means that we want regular branch heads */
diff --git a/environment.c b/environment.c
index 8acbb87..ec293e8 100644
--- a/environment.c
+++ b/environment.c
@@ -106,7 +106,7 @@ static char *expand_namespace(const char *raw_namespace)
 		if (strcmp((*c)->buf, "/") != 0)
 			strbuf_addf(&buf, "refs/namespaces/%s", (*c)->buf);
 	strbuf_list_free(components);
-	if (check_ref_format(buf.buf, 0))
+	if (check_ref_format_unsafe(buf.buf, 0))
 		die("bad git namespace path \"%s\"", raw_namespace);
 	strbuf_addch(&buf, '/');
 	return strbuf_detach(&buf, NULL);
diff --git a/fast-import.c b/fast-import.c
index 4d55ee6..70271f0 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -722,7 +722,7 @@ static struct branch *new_branch(const char *name)
 
 	if (b)
 		die("Invalid attempt to create duplicate branch: %s", name);
-	if (check_ref_format(name, REFNAME_ALLOW_ONELEVEL))
+	if (check_ref_format_unsafe(name, REFNAME_ALLOW_ONELEVEL))
 		die("Branch name doesn't conform to GIT standards: %s", name);
 
 	b = pool_calloc(1, sizeof(struct branch));
diff --git a/notes-merge.c b/notes-merge.c
index bb8d7c8..7e7dd3a 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -570,7 +570,7 @@ int notes_merge(struct notes_merge_options *o,
 	/* Dereference o->local_ref into local_sha1 */
 	if (!resolve_ref(o->local_ref, local_sha1, 0, NULL))
 		die("Failed to resolve local notes ref '%s'", o->local_ref);
-	else if (!check_ref_format(o->local_ref, 0) &&
+	else if (!check_ref_format_unsafe(o->local_ref, 0) &&
 		is_null_sha1(local_sha1))
 		local = NULL; /* local_sha1 == null_sha1 indicates unborn ref */
 	else if (!(local = lookup_commit_reference(local_sha1)))
@@ -584,7 +584,7 @@ int notes_merge(struct notes_merge_options *o,
 		 * Failed to get remote_sha1. If o->remote_ref looks like an
 		 * unborn ref, perform the merge using an empty notes tree.
 		 */
-		if (!check_ref_format(o->remote_ref, 0)) {
+		if (!check_ref_format_unsafe(o->remote_ref, 0)) {
 			hashclr(remote_sha1);
 			remote = NULL;
 		} else {
diff --git a/refs.c b/refs.c
index 6985a3f..26720e8 100644
--- a/refs.c
+++ b/refs.c
@@ -879,8 +879,11 @@ int normalize_refname(char *dst, int dstlen, const char *ref, int flags)
 
 	ch = *cp;
 	do {
-		while (ch == '/')
-			ch = *++cp; /* tolerate leading and repeated slashes */
+		if (dst) {
+			/* tolerate leading and repeated slashes */
+			while (ch == '/')
+				ch = *++cp;
+		}
 
 		/*
 		 * We are at the start of a path component.  Record
@@ -947,6 +950,12 @@ int normalize_refname(char *dst, int dstlen, const char *ref, int flags)
 	return 0;
 }
 
+int check_ref_format_unsafe(const char *ref, int flags)
+{
+	char unused[PATH_MAX];
+	return normalize_refname(unused, sizeof(unused), ref, flags);
+}
+
 int check_ref_format(const char *ref, int flags)
 {
 	return normalize_refname(NULL, 0, ref, flags);
@@ -1132,7 +1141,7 @@ static struct ref_lock *lock_ref_sha1_basic(const char *ref, const unsigned char
 struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 {
 	char refpath[PATH_MAX];
-	if (check_ref_format(ref, 0))
+	if (check_ref_format_unsafe(ref, 0))
 		return NULL;
 	strcpy(refpath, mkpath("refs/%s", ref));
 	return lock_ref_sha1_basic(refpath, old_sha1, 0, NULL);
@@ -1140,7 +1149,7 @@ struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 
 struct ref_lock *lock_any_ref_for_update(const char *ref, const unsigned char *old_sha1, int flags)
 {
-	if (check_ref_format(ref, REFNAME_ALLOW_ONELEVEL))
+	if (check_ref_format_unsafe(ref, REFNAME_ALLOW_ONELEVEL))
 		return NULL;
 	return lock_ref_sha1_basic(ref, old_sha1, flags, NULL);
 }
diff --git a/refs.h b/refs.h
index 8a15f83..bdfd230 100644
--- a/refs.h
+++ b/refs.h
@@ -102,24 +102,43 @@ extern int for_each_reflog(each_ref_fn, void *);
 
 /*
  * Check that ref is a valid refname according to the rules described
- * in Documentation/git-check-ref-format.txt and normalize it by
- * stripping out superfluous "/" characters.  If dst != NULL, write
- * the normalized refname to dst, which must be an allocated character
- * array with length dstlen (typically at least as long as ref).  dst
- * may point at the same memory as ref.  Return 0 iff the refname was
- * OK and fit into dst.  If REFNAME_ALLOW_ONELEVEL is set in flags,
- * then accept one-level reference names.  If REFNAME_REFSPEC_PATTERN
- * is set in flags, then allow a "*" wildcard characters in place of
- * one of the name components.
+ * in Documentation/git-check-ref-format.txt.  If
+ * REFNAME_ALLOW_ONELEVEL is set in flags, then accept reference names
+ * with only a single component.  If REFNAME_REFSPEC_PATTERN is set in
+ * flags, then allow a "*" wildcard character in place of one of the
+ * name components.
+ *
+ * The handling of normalized/unnormalized refnames is determined by
+ * dst:
+ *
+ * - If dst is non-NULL, then it must be an allocated character array
+ *   with length dstlen.  Usually dstlen should be at least
+ *   strlen(ref)+1.  Dst may point at the same memory as ref.  In this
+ *   case, write a normalized copy of ref to dst, stripping out
+ *   superfluous "/" characters.
+ *
+ * - If dst is NULL, verify that the reference is already normalized
+ *   and do no copying.
+ *
+ * Return 0 iff the refname was OK and fit into dst.
  */
 extern int normalize_refname(char *dst, int dstlen, const char *ref, int flags);
 
 /*
- * Return 0 iff ref has the correct format for a refname.  See
- * normalize_refname() for details.
+ * Return 0 iff ref has the correct format for a refname and is
+ * correctly normalized.  See normalize_refname() for details.
  */
 extern int check_ref_format(const char *ref, int flags);
 
+/*
+ * Return 0 iff ref has the correct format for a refname, *without*
+ * requiring it to be normalized.  This function is unsafe because
+ * unnormalized refnames should not be used; please use
+ * normalize_refname() if you want to accept unnormalized refnames or
+ * check_ref_format() if you only want to accept normalized refnames.
+ */
+extern int check_ref_format_unsafe(const char *ref, int flags);
+
 extern const char *prettify_refname(const char *refname);
 extern char *shorten_unambiguous_ref(const char *ref, int strict);
 
diff --git a/remote.c b/remote.c
index 7059885..00e7523 100644
--- a/remote.c
+++ b/remote.c
@@ -559,7 +559,8 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 
 		rs[i].pattern = is_glob;
 		rs[i].src = xstrndup(lhs, llen);
-		flags = REFNAME_ALLOW_ONELEVEL | (is_glob ? REFNAME_REFSPEC_PATTERN : 0);
+		flags = REFNAME_ALLOW_ONELEVEL |
+			(is_glob ? REFNAME_REFSPEC_PATTERN : 0);
 
 		if (fetch) {
 			/*
@@ -569,7 +570,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			 */
 			if (!*rs[i].src)
 				; /* empty is ok */
-			else if (check_ref_format(rs[i].src, flags))
+			else if (check_ref_format_unsafe(rs[i].src, flags))
 				goto invalid;
 			/*
 			 * RHS
@@ -581,7 +582,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 				; /* ok */
 			else if (!*rs[i].dst)
 				; /* ok */
-			else if (check_ref_format(rs[i].dst, flags))
+			else if (check_ref_format_unsafe(rs[i].dst, flags))
 				goto invalid;
 		} else {
 			/*
@@ -594,7 +595,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			if (!*rs[i].src)
 				; /* empty is ok */
 			else if (is_glob) {
-				if (check_ref_format(rs[i].src, flags))
+				if (check_ref_format_unsafe(rs[i].src, flags))
 					goto invalid;
 			}
 			else
@@ -607,12 +608,12 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			 * - otherwise it must be a valid looking ref.
 			 */
 			if (!rs[i].dst) {
-				if (check_ref_format(rs[i].src, flags))
+				if (check_ref_format_unsafe(rs[i].src, flags))
 					goto invalid;
 			} else if (!*rs[i].dst) {
 				goto invalid;
 			} else {
-				if (check_ref_format(rs[i].dst, flags))
+				if (check_ref_format_unsafe(rs[i].dst, flags))
 					goto invalid;
 			}
 		}
@@ -1402,7 +1403,7 @@ int get_fetch_map(const struct ref *remote_refs,
 
 	for (rmp = &ref_map; *rmp; ) {
 		if ((*rmp)->peer_ref) {
-			if (check_ref_format((*rmp)->peer_ref->name + 5,
+			if (check_ref_format_unsafe((*rmp)->peer_ref->name + 5,
 				REFNAME_ALLOW_ONELEVEL)) {
 				struct ref *ignore = *rmp;
 				error("* Ignoring funny ref '%s' locally",
@@ -1595,7 +1596,7 @@ static int one_local_ref(const char *refname, const unsigned char *sha1, int fla
 	int len;
 
 	/* we already know it starts with refs/ to get here */
-	if (check_ref_format(refname + 5, 0))
+	if (check_ref_format_unsafe(refname + 5, 0))
 		return 0;
 
 	len = strlen(refname) + 1;
diff --git a/sha1_name.c b/sha1_name.c
index 975ec3b..8458454 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -974,7 +974,7 @@ int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 	if (name[0] == '-')
 		return -1;
 	strbuf_splice(sb, 0, 0, "refs/heads/", 11);
-	return check_ref_format(sb->buf, 0);
+	return check_ref_format_unsafe(sb->buf, 0);
 }
 
 /*
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 419788f..411c271 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -30,9 +30,12 @@ invalid_ref '/'
 invalid_ref '/' --allow-onelevel
 valid_ref 'foo/bar/baz'
 valid_ref 'refs///heads/foo'
+invalid_ref 'refs///heads/foo' --no-normalize
 invalid_ref 'heads/foo/'
 valid_ref '/heads/foo'
+invalid_ref '/heads/foo' --no-normalize
 valid_ref '///heads/foo'
+invalid_ref '///heads/foo' --no-normalize
 invalid_ref './foo'
 invalid_ref './foo/bar'
 invalid_ref 'foo/./bar'
diff --git a/transport.c b/transport.c
index 225d9b8..e00610d 100644
--- a/transport.c
+++ b/transport.c
@@ -754,7 +754,9 @@ void transport_verify_remote_names(int nr_heads, const char **heads)
 			continue;
 
 		remote = remote ? (remote + 1) : local;
-		if (check_ref_format(remote, REFNAME_ALLOW_ONELEVEL|REFNAME_REFSPEC_PATTERN))
+		if (check_ref_format_unsafe(remote,
+				     REFNAME_ALLOW_ONELEVEL|
+				     REFNAME_REFSPEC_PATTERN))
 			die("remote part of refspec is not a valid name in %s",
 				heads[i]);
 	}
diff --git a/walker.c b/walker.c
index e5d8eb2..a1c1ee2 100644
--- a/walker.c
+++ b/walker.c
@@ -190,7 +190,7 @@ static int interpret_target(struct walker *walker, char *target, unsigned char *
 {
 	if (!get_sha1_hex(target, sha1))
 		return 0;
-	if (!check_ref_format(target, 0)) {
+	if (!check_ref_format_unsafe(target, 0)) {
 		struct ref *ref = alloc_ref(target);
 		if (!walker->fetch_ref(walker, ref)) {
 			hashcpy(sha1, ref->old_sha1);
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/7] Improved infrastructure for refname normalization
  2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
                   ` (6 preceding siblings ...)
  2011-09-10  6:50 ` [PATCH v2 7/7] Add tools to avoid the use of unnormalized refnames Michael Haggerty
@ 2011-09-12  4:28 ` Junio C Hamano
  2011-09-12 15:11   ` Michael Haggerty
  7 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2011-09-12  4:28 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, cmn, A Large Angry SCM

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Patch series re-roll:

Thanks for working on this. I very much like the general direction of the
series, the strategy to avoid wholesale audit of the callers and marking
the places that needs fixing with "_unsafe()".

There were a few minor things that looked worth mentioning while
reviewing, though.

 - (style) You seem to be fond of pre-increment a lot, but in general our
   codebase prefers post-increment especially when the end result does not
   make any difference, e.g.

	for (i = 1; ...; ++i) {
        	...

 - (series structure) It might make the series progress easier to follow
   if you introduced check_ref_format_unsafe() in the same commit where
   you change check_ref_format() to take flags parameter.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/7] Improved infrastructure for refname normalization
  2011-09-12  4:28 ` [PATCH v2 0/7] Improved infrastructure for refname normalization Junio C Hamano
@ 2011-09-12 15:11   ` Michael Haggerty
  2011-09-12 16:44     ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Haggerty @ 2011-09-12 15:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, cmn, A Large Angry SCM

On 09/12/2011 06:28 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> There were a few minor things that looked worth mentioning while
> reviewing, though.
> 
>  - (style) You seem to be fond of pre-increment a lot, but in general our
>    codebase prefers post-increment especially when the end result does not
>    make any difference, e.g.
> 
> 	for (i = 1; ...; ++i) {
>         	...

OK, changed.

>  - (series structure) It might make the series progress easier to follow
>    if you introduced check_ref_format_unsafe() in the same commit where
>    you change check_ref_format() to take flags parameter.

OK.  I'll take the opportunity to rename the functions to
check_refname_format*(), to make it more obvious that they only concern
themselves with the refnames and not the references themselves.

I discovered a bug in my code for handling refnames without
normalization; I will also fix that in v3.

OTOH I am again having serious doubts that trying to support
unnormalized refnames is a good idea.  I will write more when I have
time to argue my case.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/7] Improved infrastructure for refname normalization
  2011-09-12 15:11   ` Michael Haggerty
@ 2011-09-12 16:44     ` Junio C Hamano
  2011-09-13  4:16       ` Michael Haggerty
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2011-09-12 16:44 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, cmn, A Large Angry SCM

Michael Haggerty <mhagger@alum.mit.edu> writes:

> OTOH I am again having serious doubts that trying to support
> unnormalized refnames is a good idea.

Sorry, I am confused. I thought the way you are planning forward is to
leave unnormalized ones unchecked as the current code does (and mark the
places that do so with _unsafe()), with the eventual goal of fixing the
caller to pass only normalized ones to make call to the "safe" version?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/7] Improved infrastructure for refname normalization
  2011-09-12 16:44     ` Junio C Hamano
@ 2011-09-13  4:16       ` Michael Haggerty
  2011-09-13 17:43         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Haggerty @ 2011-09-13  4:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, cmn, A Large Angry SCM

On 09/12/2011 06:44 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>> OTOH I am again having serious doubts that trying to support
>> unnormalized refnames is a good idea.
> 
> Sorry, I am confused. I thought the way you are planning forward is to
> leave unnormalized ones unchecked as the current code does (and mark the
> places that do so with _unsafe()), with the eventual goal of fixing the
> caller to pass only normalized ones to make call to the "safe" version?

That was the plan, but after my experience trying to fix this problem I
have come to doubt that it is doable within a reasonable amount of work
or even that support for unnormalized refnames is desirable.  The
problem is that the API provided by refs.{c,h} is far from waterproof,
and I keep finding more code elsewhere that manipulates and parses
refnames and makes implicit assumptions (sometimes spread over many
functions) about their form.

Consistency of the UI should be the goal.  Supporting unnormalized
refnames some places, but not others, will just confuse and frustrate
users.  The only two consistent alternatives are

1. Unnormalized refnames are supported everywhere in the UI that
refnames are accepted, including clients like gitweb, gitk, egit, etc.

2. Only normalized refnames are supported; unnormalized refnames are
errors that we report on a best-effort basis.

Option (1) has a number of problems:

* Claiming to support unnormalized refnames is far from the current
situation; therefore lots of current code would have to be considered
broken.

* Fixing the code requires many new unit tests and fixes to many areas
of the code, including clients outside of the main git project.  I have
tried fixing a couple of examples ("git branch", "git rev-parse", and
the first argument of "git update-ref") and it is pretty messy.

* Some of the needed changes seem like they might conflict with other
forms of DWIM; for example, the ambiguous_path() kludge.

* The extra copying needed for normalization has a runtime cost and
complicates memory management.

* Unnormalized refnames are *themselves* a form of UI inconsistency and
therefore not a very noble goal.  It is better that people learn that
each reference has a single name, and unlearn that references were once
1:1 with files under .git/refs.

What is the benefit of (2) that justifies all of this work?  To enable
sloppy script writers to throw slashes around carelessly?

Option (1) would be far easier.  Then code only needs to treat
unnormalized refnames like any other kind of invalid refname rather than
making sure that unnormalized refnames work properly in combination with
all other features.

So I propose the following:

* Institute a policy that refnames in the UI must always be normalized

* On a best-effort basis, emit errors whenever unnormalized refnames are
encountered

* Continue to support "git check-ref-format --print", which script
writers can use to normalize refnames if they need to.

The only disadvantage of a stricter policy is that some of today's
sloppy scripts, which today might sometimes work (but not reliably),
break in a pretty obvious way that can be fixed with a single call to
"git check-ref-format --print".

I'd rather get beyond this swamp and start working on the hierarchical
reference cache, which will bring some real benefits.  (The hierarchical
reference cache requires some sanity in refname handling, which is why I
got into this mess in the first place.)

What do people think?
Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/7] Improved infrastructure for refname normalization
  2011-09-13  4:16       ` Michael Haggerty
@ 2011-09-13 17:43         ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2011-09-13 17:43 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, cmn, A Large Angry SCM

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Consistency of the UI should be the goal.  Supporting unnormalized
> refnames some places, but not others, will just confuse and frustrate
> users.
> ...
> So I propose the following:
>
> * Institute a policy that refnames in the UI must always be normalized
>
> * On a best-effort basis, emit errors whenever unnormalized refnames are
> encountered
>
> * Continue to support "git check-ref-format --print", which script
> writers can use to normalize refnames if they need to.
>
> The only disadvantage of a stricter policy is that some of today's
> sloppy scripts, which today might sometimes work (but not reliably),
> break in a pretty obvious way that can be fixed with a single call to
> "git check-ref-format --print".
>
> I'd rather get beyond this swamp and start working on the hierarchical
> reference cache, which will bring some real benefits.  (The hierarchical
> reference cache requires some sanity in refname handling, which is why I
> got into this mess in the first place.)

Both the analysis and theproposal I find very sane and sensible. Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-09-13 17:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-10  6:50 [PATCH v2 0/7] Improved infrastructure for refname normalization Michael Haggerty
2011-09-10  6:50 ` [PATCH v2 1/7] t1402: add some more tests Michael Haggerty
2011-09-10  6:50 ` [PATCH v2 2/7] Change bad_ref_char() to return a boolean value Michael Haggerty
2011-09-10  6:50 ` [PATCH v2 3/7] git check-ref-format: add options --allow-onelevel and --refspec-pattern Michael Haggerty
2011-09-10  6:50 ` [PATCH v2 4/7] Change check_ref_format() to take a flags argument Michael Haggerty
2011-09-10  6:50 ` [PATCH v2 5/7] Add a library function normalize_refname() Michael Haggerty
2011-09-10  6:50 ` [PATCH v2 6/7] Do not allow ".lock" at the end of any refname component Michael Haggerty
2011-09-10  6:50 ` [PATCH v2 7/7] Add tools to avoid the use of unnormalized refnames Michael Haggerty
2011-09-12  4:28 ` [PATCH v2 0/7] Improved infrastructure for refname normalization Junio C Hamano
2011-09-12 15:11   ` Michael Haggerty
2011-09-12 16:44     ` Junio C Hamano
2011-09-13  4:16       ` Michael Haggerty
2011-09-13 17:43         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).