All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call
Date: Sat, 4 Apr 2015 21:15:57 -0400	[thread overview]
Message-ID: <20150405011557.GF30127@peff.net> (raw)
In-Reply-To: <20150405010611.GA15901@peff.net>

Usually refs are not allowed to contain a ".." component.
However, since d0f810f (refs.c: allow listing and deleting
badly named refs, 2014-09-03), we relax these rules in some
cases in order to help users examine and get rid of
badly-named refs. However, we do still check that these refs
cannot "escape" the refs hierarchy (e.g., "refs/../foo").

This check is implemented by calling normalize_path_copy,
which requires us to allocate a new buffer to hold the
result. But we don't care about the result; we only care
whether the "up" components outnumbered the "down".

We can therefore implement this check ourselves without
requiring any extra allocations. With this patch, running
"git rev-parse refs/heads/does-not-exist" on a repo with
large (1.6GB) packed-refs file went from:

  real    0m8.910s
  user    0m8.452s
  sys     0m0.468s

to:

  real    0m8.529s
  user    0m8.044s
  sys     0m0.492s

for a wall-clock speedup of 4%.

Signed-off-by: Jeff King <peff@peff.net>
---
This was a lot less than I was hoping for, especially considering that
going from d0f810f^ to d0f810f is more like a 15% slowdown (or in
absolute numbers, ~1.1s versus only 400ms here). What's doubly confusing
is that I think we were running check_ref_format before d0f810f, which
does way more than the check we're doing here. So we should have ended
up faster than either.

So this is certainly _an_ improvement, but I think there may be more
going on.

 cache.h |  7 +++++++
 path.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 refs.c  | 16 ++--------------
 3 files changed, 73 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 3d3244b..c2b3df4 100644
--- a/cache.h
+++ b/cache.h
@@ -836,6 +836,13 @@ char *strip_path_suffix(const char *path, const char *suffix);
 int daemon_avoid_alias(const char *path);
 extern int is_ntfs_dotgit(const char *name);
 
+/**
+ * Returns 1 if the path "escapes" its root (e.g., "foo/../../bar") and 0
+ * otherwise. Note that this is purely textual, and does not care about
+ * symlinks or other aspects of the filesystem.
+ */
+int check_path_escape(const char *path);
+
 /* object replacement */
 #define LOOKUP_REPLACE_OBJECT 1
 extern void *read_sha1_file_extended(const unsigned char *sha1, enum object_type *type, unsigned long *size, unsigned flag);
diff --git a/path.c b/path.c
index e608993..1127cbd 100644
--- a/path.c
+++ b/path.c
@@ -703,6 +703,70 @@ int normalize_path_copy(char *dst, const char *src)
 }
 
 /*
+ * We want to detect a path that "escapes" its root. The general strategy
+ * is to parse components left to right, keeping track of our depth,
+ * which is increased by non-empty components and decreased by ".."
+ * components.
+ */
+int check_path_escape(const char *path)
+{
+	int depth = 0;
+
+	while (*path) {
+		char ch = *path++;
+
+		/*
+		 * We always start our loop at the beginning of a path component. So
+		 * we can skip past any dir separators. This handles leading
+		 * "/", as well as any internal "////".
+		 */
+		if (is_dir_sep(ch))
+			continue;
+
+		/*
+		 * If we start with a dot, we care about the four cases
+		 * (similar to normalize_path_copy above):
+		 *
+		 *  (1)  "."  - does not affect depth; we are done
+		 *  (2)  "./" - does not affect depth; skip
+		 *  (3)  ".." - check depth and finish
+		 *  (4)  "../" - drop depth, check, and keep looking
+		 */
+		if (ch == '.') {
+			ch = *path++;
+
+			if (!ch)
+				return 1; /* case (1) */
+			if (is_dir_sep(ch))
+				continue; /* case (2) */
+			if (ch == '.') {
+				ch = *path++;
+				if (!ch)
+					return depth > 0; /* case (3) */
+				if (is_dir_sep(ch)) {
+					/* case (4) */
+					if (--depth < 0)
+						return 0;
+					continue;
+				}
+				/* otherwise, "..foo"; fall through */
+			}
+			/* otherwise ".foo"; fall through */
+		}
+
+		/*
+		 * We have a real component; inrement the depth and eat the
+		 * rest of the component
+		 */
+		depth++;
+		while (*path && !is_dir_sep(*path))
+			path++;
+	}
+
+	return 1;
+}
+
+/*
  * path = Canonical absolute path
  * prefixes = string_list containing normalized, absolute paths without
  * trailing slashes (except for the root directory, which is denoted by "/").
diff --git a/refs.c b/refs.c
index 47e4e53..daca499 100644
--- a/refs.c
+++ b/refs.c
@@ -312,20 +312,8 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry)
  */
 static int refname_is_safe(const char *refname)
 {
-	if (starts_with(refname, "refs/")) {
-		char *buf;
-		int result;
-
-		buf = xmalloc(strlen(refname) + 1);
-		/*
-		 * Does the refname try to escape refs/?
-		 * For example: refs/foo/../bar is safe but refs/foo/../../bar
-		 * is not.
-		 */
-		result = !normalize_path_copy(buf, refname + strlen("refs/"));
-		free(buf);
-		return result;
-	}
+	if (skip_prefix(refname, "refs/", &refname))
+		return check_path_escape(refname);
 	while (*refname) {
 		if (!isupper(*refname) && *refname != '_')
 			return 0;
-- 
2.4.0.rc0.363.gf9f328b

  parent reply	other threads:[~2015-04-05  1:16 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
2015-04-05  1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King
2015-04-05  1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King
2015-04-05  1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King
2015-04-05  4:56   ` Jeff King
2015-04-05  5:27     ` Jeff King
2015-04-05  5:35       ` Jeff King
2015-04-05 20:49         ` Junio C Hamano
2015-04-05 14:36     ` Duy Nguyen
2015-04-05 18:24       ` Jeff King
2015-04-05 20:09     ` Junio C Hamano
2015-04-07 13:48     ` Rasmus Villemoes
2015-04-07 19:04       ` Jeff King
2015-04-07 22:43         ` Rasmus Villemoes
2015-04-08  0:17           ` Jeff King
2015-04-05  1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King
2015-04-06  2:13   ` Eric Sunshine
2015-04-06  5:05     ` Jeff King
2015-04-05  1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King
2015-04-05  1:15 ` Jeff King [this message]
2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe
2015-04-05 18:52   ` Jeff King
2015-04-05 18:59     ` Jeff King
2015-04-05 23:04       ` René Scharfe
2015-04-05 22:39     ` René Scharfe
2015-04-06  4:49       ` Jeff King
2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
2015-04-16  8:48   ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King
2015-04-16  8:48   ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King
2015-04-16  8:49   ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King
2015-04-16  8:51   ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King
2015-04-16  8:53   ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King
2015-04-16  8:58   ` [PATCH 6/9] strbuf_getwholeline: " Jeff King
2015-04-16  9:01   ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King
2015-04-17 10:16     ` Eric Sunshine
2015-04-21 23:09       ` Jeff King
2015-05-08 23:56         ` Eric Sunshine
2015-05-09  1:09           ` Jeff King
2015-06-02 18:22             ` Eric Sunshine
2015-04-22 18:00       ` Johannes Schindelin
2015-04-22 18:06         ` Jeff King
2015-04-16  9:03   ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King
2015-04-16  9:04   ` [PATCH 9/9] t1430: add another refs-escape test Jeff King
2015-04-16  9:25   ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150405011557.GF30127@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.