From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call
Date: Sat, 4 Apr 2015 21:15:57 -0400 [thread overview]
Message-ID: <20150405011557.GF30127@peff.net> (raw)
In-Reply-To: <20150405010611.GA15901@peff.net>
Usually refs are not allowed to contain a ".." component.
However, since d0f810f (refs.c: allow listing and deleting
badly named refs, 2014-09-03), we relax these rules in some
cases in order to help users examine and get rid of
badly-named refs. However, we do still check that these refs
cannot "escape" the refs hierarchy (e.g., "refs/../foo").
This check is implemented by calling normalize_path_copy,
which requires us to allocate a new buffer to hold the
result. But we don't care about the result; we only care
whether the "up" components outnumbered the "down".
We can therefore implement this check ourselves without
requiring any extra allocations. With this patch, running
"git rev-parse refs/heads/does-not-exist" on a repo with
large (1.6GB) packed-refs file went from:
real 0m8.910s
user 0m8.452s
sys 0m0.468s
to:
real 0m8.529s
user 0m8.044s
sys 0m0.492s
for a wall-clock speedup of 4%.
Signed-off-by: Jeff King <peff@peff.net>
---
This was a lot less than I was hoping for, especially considering that
going from d0f810f^ to d0f810f is more like a 15% slowdown (or in
absolute numbers, ~1.1s versus only 400ms here). What's doubly confusing
is that I think we were running check_ref_format before d0f810f, which
does way more than the check we're doing here. So we should have ended
up faster than either.
So this is certainly _an_ improvement, but I think there may be more
going on.
cache.h | 7 +++++++
path.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
refs.c | 16 ++--------------
3 files changed, 73 insertions(+), 14 deletions(-)
diff --git a/cache.h b/cache.h
index 3d3244b..c2b3df4 100644
--- a/cache.h
+++ b/cache.h
@@ -836,6 +836,13 @@ char *strip_path_suffix(const char *path, const char *suffix);
int daemon_avoid_alias(const char *path);
extern int is_ntfs_dotgit(const char *name);
+/**
+ * Returns 1 if the path "escapes" its root (e.g., "foo/../../bar") and 0
+ * otherwise. Note that this is purely textual, and does not care about
+ * symlinks or other aspects of the filesystem.
+ */
+int check_path_escape(const char *path);
+
/* object replacement */
#define LOOKUP_REPLACE_OBJECT 1
extern void *read_sha1_file_extended(const unsigned char *sha1, enum object_type *type, unsigned long *size, unsigned flag);
diff --git a/path.c b/path.c
index e608993..1127cbd 100644
--- a/path.c
+++ b/path.c
@@ -703,6 +703,70 @@ int normalize_path_copy(char *dst, const char *src)
}
/*
+ * We want to detect a path that "escapes" its root. The general strategy
+ * is to parse components left to right, keeping track of our depth,
+ * which is increased by non-empty components and decreased by ".."
+ * components.
+ */
+int check_path_escape(const char *path)
+{
+ int depth = 0;
+
+ while (*path) {
+ char ch = *path++;
+
+ /*
+ * We always start our loop at the beginning of a path component. So
+ * we can skip past any dir separators. This handles leading
+ * "/", as well as any internal "////".
+ */
+ if (is_dir_sep(ch))
+ continue;
+
+ /*
+ * If we start with a dot, we care about the four cases
+ * (similar to normalize_path_copy above):
+ *
+ * (1) "." - does not affect depth; we are done
+ * (2) "./" - does not affect depth; skip
+ * (3) ".." - check depth and finish
+ * (4) "../" - drop depth, check, and keep looking
+ */
+ if (ch == '.') {
+ ch = *path++;
+
+ if (!ch)
+ return 1; /* case (1) */
+ if (is_dir_sep(ch))
+ continue; /* case (2) */
+ if (ch == '.') {
+ ch = *path++;
+ if (!ch)
+ return depth > 0; /* case (3) */
+ if (is_dir_sep(ch)) {
+ /* case (4) */
+ if (--depth < 0)
+ return 0;
+ continue;
+ }
+ /* otherwise, "..foo"; fall through */
+ }
+ /* otherwise ".foo"; fall through */
+ }
+
+ /*
+ * We have a real component; inrement the depth and eat the
+ * rest of the component
+ */
+ depth++;
+ while (*path && !is_dir_sep(*path))
+ path++;
+ }
+
+ return 1;
+}
+
+/*
* path = Canonical absolute path
* prefixes = string_list containing normalized, absolute paths without
* trailing slashes (except for the root directory, which is denoted by "/").
diff --git a/refs.c b/refs.c
index 47e4e53..daca499 100644
--- a/refs.c
+++ b/refs.c
@@ -312,20 +312,8 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry)
*/
static int refname_is_safe(const char *refname)
{
- if (starts_with(refname, "refs/")) {
- char *buf;
- int result;
-
- buf = xmalloc(strlen(refname) + 1);
- /*
- * Does the refname try to escape refs/?
- * For example: refs/foo/../bar is safe but refs/foo/../../bar
- * is not.
- */
- result = !normalize_path_copy(buf, refname + strlen("refs/"));
- free(buf);
- return result;
- }
+ if (skip_prefix(refname, "refs/", &refname))
+ return check_path_escape(refname);
while (*refname) {
if (!isupper(*refname) && *refname != '_')
return 0;
--
2.4.0.rc0.363.gf9f328b
next prev parent reply other threads:[~2015-04-05 1:16 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
2015-04-05 1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King
2015-04-05 1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King
2015-04-05 1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King
2015-04-05 4:56 ` Jeff King
2015-04-05 5:27 ` Jeff King
2015-04-05 5:35 ` Jeff King
2015-04-05 20:49 ` Junio C Hamano
2015-04-05 14:36 ` Duy Nguyen
2015-04-05 18:24 ` Jeff King
2015-04-05 20:09 ` Junio C Hamano
2015-04-07 13:48 ` Rasmus Villemoes
2015-04-07 19:04 ` Jeff King
2015-04-07 22:43 ` Rasmus Villemoes
2015-04-08 0:17 ` Jeff King
2015-04-05 1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King
2015-04-06 2:13 ` Eric Sunshine
2015-04-06 5:05 ` Jeff King
2015-04-05 1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King
2015-04-05 1:15 ` Jeff King [this message]
2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe
2015-04-05 18:52 ` Jeff King
2015-04-05 18:59 ` Jeff King
2015-04-05 23:04 ` René Scharfe
2015-04-05 22:39 ` René Scharfe
2015-04-06 4:49 ` Jeff King
2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King
2015-04-16 8:48 ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King
2015-04-16 8:48 ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King
2015-04-16 8:49 ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King
2015-04-16 8:51 ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King
2015-04-16 8:53 ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King
2015-04-16 8:58 ` [PATCH 6/9] strbuf_getwholeline: " Jeff King
2015-04-16 9:01 ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King
2015-04-17 10:16 ` Eric Sunshine
2015-04-21 23:09 ` Jeff King
2015-05-08 23:56 ` Eric Sunshine
2015-05-09 1:09 ` Jeff King
2015-06-02 18:22 ` Eric Sunshine
2015-04-22 18:00 ` Johannes Schindelin
2015-04-22 18:06 ` Jeff King
2015-04-16 9:03 ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King
2015-04-16 9:04 ` [PATCH 9/9] t1430: add another refs-escape test Jeff King
2015-04-16 9:25 ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150405011557.GF30127@peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.