Re: [PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag
Date: Tue, 30 Apr 2024 06:54:04 +0200	[thread overview]
Message-ID: <ZjB5bHiF5kAcRMpP@tanuki> (raw)
In-Reply-To: <20240429083325.GE233423@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 6460 bytes --]

On Mon, Apr 29, 2024 at 04:33:25AM -0400, Jeff King wrote:
> Before operating on a refname we get from a user, we usually check that
> it's syntactically valid. As a general rule, refs should be in the
> "refs/" namespace, the exception being HEAD and pseudorefs like
> FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> dash. But the syntactic rules are not enforced by check_refname_format().

s/dash/underscore, right?

> So for example we will happily operate on a ref "foo/bar" that will use
> the file ".git/foo/bar" under the hood (when using the files backend, of
> course).
> 
> Making things even more complicated, refname_is_safe() does enforce
> these syntax restrictions! When that function was added in 2014, we
> would have refused to work with such refs entirely. But we stopped being
> so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
> refs, 2015-04-16). That rationale there is that check_refname_format()
> is supposed to contain a superset of the checks of refname_is_safe().
> The idea being that we usually would rely on the more-strict
> check_refname_format(), but for certain operations (e.g., deleting a
> ref) we want to allow invalid names as long as they are not unsafe
> (e.g., not escaping the on-disk "refs/" hierarchy).

I still think we should eventually merge these functions. It's not
exactly obvious why one would use one or the other. So if we had a
function with strict default behaviour, where the caller can ask for
some loosening of the behaviour via flags, then I think it would become
a ton easier to do the right thing.

In any case, that doesn't need to be part of this patch series.

> But the pseudoref handling flips this logic; check_refname_format() is
> more lenient than refname_is_safe(). So you can create "foo/bar" and
> read it, but you cannot delete it:
> 
>   $ git update-ref foo/bar HEAD
>   $ git rev-parse foo/bar
>   747a29934757b7e695781e13e2511c43b951da2
>   $ git update-ref -d foo/bar
>   error: refusing to update ref with bad name 'foo/bar'
> 
> So we probably want check_ref_format() to learn the same syntactic
> restrictions that refname_is_safe() uses (namely insisting that anything
> outside of "refs/" matches the pseudoref syntax). The most obvious way
> to do that is simply to call refname_is_safe(). But the point of
> 03afcbee9b is that doing so is expensive. Without the syntactic
> restrictions of check_refname_format(), refname_is_safe() has to
> actually normalize the pathname to make sure it does not escape "refs/".
> That's redundant for us in check_refname_format(); we just need to make
> sure it either starts with "refs/" or is a pseudoref.
> 
> But wait, it gets more complicated! We also allow "worktrees/foo/$X"
> and "main-worktree/$X". In that case we should only be checking "$X"
> (which should be either a pseudoref or start with "refs/"). We can
> use parse_worktree_ref(), which fairly efficiently gives us the "bare"
> refname (even for a non-worktree ref, where it returns the original
> name).
> 
> And now when should this new logic kick in? Unfortunately we can't just
> do it all the time, because many callers pass in partial ref components.
> E.g., if they are thinking about making "refs/heads/foo", they'll pass
> us "foo". This is usually accompanied by the ALLOW_ONELEVEL flag. But we
> likewise can't take the absence of ALLOW_ONELEVEL as a hint that the
> name is fully qualified, because that flag is also used to indicate that
> pseudorefs should be allowed!

Indeed, it's a proper mess :)

> We need a new flag to tell these two situations apart. So let's add a
> FULLY_QUALIFIED flag that callers can use to ask us to enforce these
> syntactic rules. There are no callers yet, but we should be able to
> examine users of ALLOW_ONELEVEL, figure out which semantics they
> wanted, and convert as needed.

Makes sense.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
> unfortunately has made it into the hands of users via "git
> check-ref-format --allow-onelevel". So I think it is there to stay.
> Possibly we should expose this new feature as --fully-qualified or
> similar.

Hm, that's really too bad. I wonder whether we should eventually start
to deprecate `--allow-onelevel` in favor of `--fully-qualified`. We
would continue to accept the flag, but remove it from our documentation
such that scripts start to move over. Then some day, we may replace
`ALLOW_ONELEVEL` with something like `ALLOW_ROOT_REF` that allows refs
in the root directory while honoring `is_pseudoref_syntax()`.

>  refs.c | 14 +++++++++++++-
>  refs.h |  1 +
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/refs.c b/refs.c
> index 8cac7e7e59..44b4419050 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags,
>  {
>  	int component_len, component_count = 0;
>  
> +	if ((flags & REFNAME_FULLY_QUALIFIED)) {
> +		const char *bare_ref;
> +
> +		parse_worktree_ref(refname, NULL, NULL, &bare_ref);
> +		if (!starts_with(bare_ref, "refs/") &&
> +		    !is_pseudoref_syntax(bare_ref))
> +			return -1;
> +	}
> +
>  	if (!strcmp(refname, "@")) {
>  		/* Refname is a single character '@'. */
>  		if (sanitized)
> @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
>  		else
>  			return -1;
>  	}
> -	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
> +
> +	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
> +	    component_count < 2)
>  		return -1; /* Refname has only one component. */
> +

I first thought that we don't have to handle REFNAME_FULLY_QUALIFIED
here because the above should already handle it. But we can of course
have a single component, only, when the ref is "refs/".

Patrick

>  	return 0;
>  }
>  
> diff --git a/refs.h b/refs.h
> index d278775e08..cdd859c8b7 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -571,6 +571,7 @@ int for_each_reflog(each_reflog_fn fn, void *cb_data);
>  
>  #define REFNAME_ALLOW_ONELEVEL 1
>  #define REFNAME_REFSPEC_PATTERN 2
> +#define REFNAME_FULLY_QUALIFIED 4
>  
>  /*
>   * Return 0 iff refname has the correct format for a refname according
> -- 
> 2.45.0.rc1.416.gbe2a76c799
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

next prev parent reply	other threads:[~2024-04-30  4:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-29  8:33 [PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag Jeff King
2024-04-29 16:01 ` Karthik Nayak
2024-04-30  9:47   ` Jeff King
2024-04-30 10:05     ` Patrick Steinhardt
2024-04-30  4:54 ` Patrick Steinhardt [this message]
2024-04-30 10:01   ` Jeff King
2024-04-30 10:09     ` Patrick Steinhardt
2024-04-30 17:00       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjB5bHiF5kAcRMpP@tanuki \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.