git.vger.kernel.org archive mirror
From: Brandon Williams <bmwill@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, sbeller@google.com, peff@peff.net,
	jacob.keller@gmail.com, ramsay@ramsayjones.plus.com,
	tboegi@web.de, j6t@kdbg.org, pclouds@gmail.com
Subject: Re: [PATCH v3 1/4] real_path: resolve symlinks by hand
Date: Mon, 12 Dec 2016 14:50:06 -0800
Message-ID: <20161212225006.GB193413@google.com>
In-Reply-To: <xmqqd1gw94f1.fsf@gitster.mtv.corp.google.com>

On 12/12, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > +/* removes the last path component from 'path' except if 'path' is root */
> > +static void strip_last_component(struct strbuf *path)
> > +{
> > +	size_t offset = offset_1st_component(path->buf);
> > +	size_t len = path->len;
> > +
> > +	/* Find start of the last component */
> > +	while (offset < len && !is_dir_sep(path->buf[len - 1]))
> > +		len--;
> 
> If somebody at a higher level in the callchain has already
> normalized path, this is not a problem, but this will behave
> "unexpectedly" when path ends with a dir_sep byte (or more).
> 
> E.g. for input path "foo/bar/", the above loop runs zero times and
> then ...
> 
> > +	/* Skip sequences of multiple path-separators */
> > +	while (offset < len && is_dir_sep(path->buf[len - 1]))
> > +		len--;
> 
> ... the slash at the end is removed, leaving "foo/bar" in path.
> 

The way this is currently used, I don't believe this scenario can happen
(input to this function shouldn't have trailing slashes), but if others
begin to use this function then yes, that is an implicit assumption.  I
think it may be an OK assumption though, since this is only called on
"resolved", which is the output and needs to be normalized to begin with.
To fix this we could simply run the loop that strips dir_sep both before
and after the loop that finds the start of the last component, like so:

  /* Strip trailing path-separators */
  while (offset < len && is_dir_sep(path->buf[len - 1]))
  	len--;
  /* Find start of the last component */
  while (offset < len && !is_dir_sep(path->buf[len - 1]))
  	len--;
  /* Skip sequences of multiple path-separators */
  while (offset < len && is_dir_sep(path->buf[len - 1]))
  	len--;

> > +	strbuf_setlen(path, len);
> > +}
> > ...
> > +/* get (and remove) the next component in 'remaining' and place it in 'next' */
> > +static void get_next_component(struct strbuf *next, struct strbuf *remaining)
> > +{
> > +	char *start = NULL;
> > +	char *end = NULL;
> > +
> > +	strbuf_reset(next);
> > +
> > +	/* look for the next component */
> > +	/* Skip sequences of multiple path-separators */
> > +	for (start = remaining->buf; is_dir_sep(*start); start++)
> > +		; /* nothing */
> > +	/* Find end of the path component */
> > +	for (end = start; *end && !is_dir_sep(*end); end++)
> > +		; /* nothing */
> > +
> > +	strbuf_add(next, start, end - start);
> > +	/* remove the component from 'remaining' */
> > +	strbuf_remove(remaining, 0, end - remaining->buf);
> > +}
> 
> Unlike the strip_last_component(), I think this one is more
> carefully done and avoids getting fooled by //extra//slashes// at
> the beginning or at the end, which does help in the correctness of
> the loop we see below.
> 
> > @@ -58,74 +88,112 @@ static const char *real_path_internal(const char *path, int die_on_error)
> >  			goto error_out;
> >  	}
> >  
> > +	strbuf_reset(&resolved);
> > +
> > +	if (is_absolute_path(path)) {
> > +		/* absolute path; start with only root as being resolved */
> > +		int offset = offset_1st_component(path);
> > +		strbuf_add(&resolved, path, offset);
> > +		strbuf_addstr(&remaining, path + offset);
> > +	} else {
> > +		/* relative path; can use CWD as the initial resolved path */
> > +		if (strbuf_getcwd(&resolved)) {
> > +			if (die_on_error)
> > +				die_errno("unable to get current working directory");
> > +			else
> > +				goto error_out;
> >  		}
> > +		strbuf_addstr(&remaining, path);
> > +	}
> >  
> > +	/* Iterate over the remaining path components */
> > +	while (remaining.len > 0) {
> > +		get_next_component(&next, &remaining);
> > +
> > +		if (next.len == 0) {
> > +			continue; /* empty component */
> > +		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
> > +			continue; /* '.' component */
> > +		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
> > +			/* '..' component; strip the last path component */
> > +			strip_last_component(&resolved);
> 
> Wouldn't this let "resolved" eventually run out of the path
> components to strip for a malformed input e.g. "/a/../../b"?
> 

As I understand it, that path is valid and would resolve to "/b".  The
strip_last_component() function doesn't allow stripping the "1st"
component, i.e. the root.  So if resolved is "/" and we encounter a ".."
which requires stripping the last component, the result is still "/".

> > + ...
> > +			/*
> > +			 * if there are still remaining components to resolve
> > +			 * then append them to symlink
> > +			 */
> > +			if (remaining.len) {
> > +				strbuf_addch(&symlink, '/');
> 
> This can add duplicate dir_sep if readlink(2)'ed contents of the
> symbolic link already ends with a slash, but I think it (together
> with the fact that the code does nothing to normalize what is read
> from the symbolic link) probably does not matter, given the way how
> get_next_component() is implemented.
> 

Yes, I think the way get_next_component() is written accounts for
non-normalized symlink contents.  This way we only have to worry about
normalizing input in one location (maybe two, counting
strip_last_component()).

> > +				strbuf_addbuf(&symlink, &remaining);
> > +			}
> > +
> > +			/*
> > +			 * use the symlink as the remaining components that
> > +			 * need to be resolved
> > +			 */
> > +			strbuf_swap(&symlink, &remaining);
> > +		}
> >  	}
> >  
> > +	retval = resolved.buf;
> > +
> >  error_out:
> > +	strbuf_release(&remaining);
> > +	strbuf_release(&next);
> > +	strbuf_release(&symlink);
> >  
> >  	return retval;
> >  }
> 

-- 
Brandon Williams
