From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag
Date: Tue, 30 Apr 2024 06:01:45 -0400 [thread overview]
Message-ID: <20240430100145.GB1279403@coredump.intra.peff.net> (raw)
In-Reply-To: <ZjB5bHiF5kAcRMpP@tanuki>
On Tue, Apr 30, 2024 at 06:54:04AM +0200, Patrick Steinhardt wrote:
> On Mon, Apr 29, 2024 at 04:33:25AM -0400, Jeff King wrote:
> > Before operating on a refname we get from a user, we usually check that
> > it's syntactically valid. As a general rule, refs should be in the
> > "refs/" namespace, the exception being HEAD and pseudorefs like
> > FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> > dash. But the syntactic rules are not enforced by check_refname_format().
>
> s/dash/underscore, right?
Yep, thanks.
> > Making things even more complicated, refname_is_safe() does enforce
> > these syntax restrictions! When that function was added in 2014, we
> > would have refused to work with such refs entirely. But we stopped being
> > so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
> > refs, 2015-04-16). That rationale there is that check_refname_format()
> > is supposed to contain a superset of the checks of refname_is_safe().
> > The idea being that we usually would rely on the more-strict
> > check_refname_format(), but for certain operations (e.g., deleting a
> > ref) we want to allow invalid names as long as they are not unsafe
> > (e.g., not escaping the on-disk "refs/" hierarchy).
>
> I still think we should eventually merge these functions. It's not
> exactly obvious why one would use one or the other. So if we had a
> function with strict default behaviour, where the caller can ask for
> some loosening of the behaviour via flags, then I think it would become
> a ton easier to do the right thing.
>
> In any case, that doesn't need to be part of this patch series.
Yeah, I think it would be fine to merge them with a flag. With the one
exception that started this topic (and which I think could go away after
this series), every caller of refname_is_safe() only does so in
conjunction with check_refname_format() to check "well, is it at least
safe?".
You will have to tweak the return value somehow to indicate "ok" versus
"bad but safe" versus "unsafe" (e.g., resolving sets REF_BAD_NAME but
continues for the middle one). I think I'd prefer to leave that out of
this series (and after this, I think it becomes easier as a pure
refactoring since the behavior remains the same).
> > The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
> > unfortunately has made it into the hands of users via "git
> > check-ref-format --allow-onelevel". So I think it is there to stay.
> > Possibly we should expose this new feature as --fully-qualified or
> > similar.
>
> Hm, that's really too bad. I wonder whether we should eventually start
> to deprecate `--allow-onelevel` in favor of `--fully-qualified`. We
> would continue to accept the flag, but remove it from our documentation
> such that scripts start to move over. Then some day, we may replace
> `ALLOW_ONELEVEL` with something like `ALLOW_ROOT_REF` that allows refs
> in the root directory while honoring `is_pseudoref_syntax()`.
I don't know if we could ever get rid of --allow-onelevel. If you want
to check a branch name, say, the replacement for it is to ask about
"refs/heads/$name". But sometimes you don't actually know how the short
name is going to be used, but you want to make sure it's syntactically
valid. E.g., validating a refspec may involve a name like "main" on its
own. I suspect it would be OK in practice to just give it an arbitrary
"refs/foo/$main", but that feels kind of hacky.
-Peff
> > @@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> > {
> > int component_len, component_count = 0;
> >
> > + if ((flags & REFNAME_FULLY_QUALIFIED)) {
> > + const char *bare_ref;
> > +
> > + parse_worktree_ref(refname, NULL, NULL, &bare_ref);
> > + if (!starts_with(bare_ref, "refs/") &&
> > + !is_pseudoref_syntax(bare_ref))
> > + return -1;
> > + }
> > +
> > if (!strcmp(refname, "@")) {
> > /* Refname is a single character '@'. */
> > if (sanitized)
> > @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> > else
> > return -1;
> > }
> > - if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
> > +
> > + if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
> > + component_count < 2)
> > return -1; /* Refname has only one component. */
> > +
>
> I first thought that we don't have to handle REFNAME_FULLY_QUALIFIED
> here because the above should already handle it. But we can of course
> have a single component, only, when the ref is "refs/".
I hadn't really considered that case. The reason we have to handle
FULLY_QUALIFIED here is that without it, "FETCH_HEAD" (or for that
matter "HEAD") is forbidden as having only a single component. The
earlier hunk only rejects bad things, so we still end up in this code.
I think that "refs/" is forbidden both before and after my patch because
it's invalid to have a zero-length component (so "foo//bar" is
forbidden, but so is "foo/" because of the empty component on the end).a
-Peff
next prev parent reply other threads:[~2024-04-30 10:01 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-29 8:33 [PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag Jeff King
2024-04-29 16:01 ` Karthik Nayak
2024-04-30 9:47 ` Jeff King
2024-04-30 10:05 ` Patrick Steinhardt
2024-04-30 4:54 ` Patrick Steinhardt
2024-04-30 10:01 ` Jeff King [this message]
2024-04-30 10:09 ` Patrick Steinhardt
2024-04-30 17:00 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240430100145.GB1279403@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).