git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: shejialuo <shejialuo@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Collin Funk <collin.funk1@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] wrapper: Fix a errno discrepancy on NetBSD.
Date: Sat, 3 May 2025 22:58:58 +0800	[thread overview]
Message-ID: <aBYvMjtGjzEhKg4s@ArchLinux> (raw)
In-Reply-To: <20250503133158.GA4450@coredump.intra.peff.net>

On Sat, May 03, 2025 at 09:31:58AM -0400, Jeff King wrote:
> On Sat, May 03, 2025 at 12:57:11AM +0000, brian m. carlson wrote:
> 
> > > diff --git a/wrapper.c b/wrapper.c
> > > index 3c79778055..4d448d7c57 100644
> > > --- a/wrapper.c
> > > +++ b/wrapper.c
> > > @@ -737,7 +737,19 @@ int is_empty_or_missing_file(const char *filename)
> > >  int open_nofollow(const char *path, int flags)
> > >  {
> > >  #ifdef O_NOFOLLOW
> > > -	return open(path, flags | O_NOFOLLOW);
> > > +	int ret = open(path, flags | O_NOFOLLOW);
> > > +#ifdef __NetBSD__
> > > +	/*
> > > +	 * NetBSD sets errno to EFTYPE when path is a symlink. The only other
> > > +	 * time this errno occurs when O_REGULAR is used. Since we don't use
> > > +	 * it anywhere we can avoid an lstat here.
> > > +	 */
> > > +	if (ret < 0 && errno == EFTYPE) {
> > > +		errno = ELOOP;
> > > +		return -1;
> > > +	}
> > > +#endif
> > > +	return ret;
> > 
> > This patch seems reasonable and correct.  I don't use NetBSD, but I do
> > often test there, and I'm aware of this infelicity.  I'm surprised we
> > haven't hit it before.
> > 
> > I suspect we'll also hit this on FreeBSD, which has a similar issue in
> > that it returns `EMLINK` instead of `ELOOP`.  I do wish these two OSes
> > would provide an appropriate POSIX-compatible `open` call when set with
> > `_POSIX_SOURCE`, since this is one of the biggest portability problems
> > with them.
> 
> The inconsistency in errno has been there since open_nofollow() was
> added years ago. But we didn't notice it because in general we try not
> to be too particular about which errno value we receive.
> 
> That changed in cfea2f2da8 (packed-backend: check whether the
> "packed-refs" is regular file, 2025-02-28), which uses open_nofollow()
> to check for symlinks while we open it. But it feels like it would be
> more direct to just lstat() the file in the first place (which we end up
> doing anyway to check for other things besides symlinks!).
> 
> I.e., I'd think this would just work everywhere:
> 
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 3ad1ed0787..a247220df9 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -2071,35 +2071,32 @@ static int packed_fsck(struct ref_store *ref_store,
>  	if (o->verbose)
>  		fprintf_ln(stderr, "Checking packed-refs file %s", refs->path);
>  
> -	fd = open_nofollow(refs->path, O_RDONLY);
> -	if (fd < 0) {
> +	if (lstat(refs->path, &st) < 0) {
>  		/*
>  		 * If the packed-refs file doesn't exist, there's nothing
>  		 * to check.
>  		 */
>  		if (errno == ENOENT)
>  			goto cleanup;
>  
> -		if (errno == ELOOP) {
> -			struct fsck_ref_report report = { 0 };
> -			report.path = "packed-refs";
> -			ret = fsck_report_ref(o, &report,
> -					      FSCK_MSG_BAD_REF_FILETYPE,
> -					      "not a regular file but a symlink");
> -			goto cleanup;
> -		}
> -
> -		ret = error_errno(_("unable to open '%s'"), refs->path);
> -		goto cleanup;
> -	} else if (fstat(fd, &st) < 0) {
>  		ret = error_errno(_("unable to stat '%s'"), refs->path);
>  		goto cleanup;
> -	} else if (!S_ISREG(st.st_mode)) {
> +	}
> +
> +	if (!S_ISREG(st.st_mode)) {
>  		struct fsck_ref_report report = { 0 };
>  		report.path = "packed-refs";
>  		ret = fsck_report_ref(o, &report,
>  				      FSCK_MSG_BAD_REF_FILETYPE,
>  				      "not a regular file");
> +		/* XXX optionally could keep going here and actually
> +		 * check the contents we do find */
> +		goto cleanup;
> +	}
> +
> +	fd = open(refs->path, O_RDONLY);
> +	if (fd < 0) {
> +		ret = error_errno(_("unable to open '%s'"), refs->path);
>  		goto cleanup;
>  	}
>  
> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index 9d1dc2144c..34d54a7c05 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -632,7 +632,7 @@ test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' '
>  		ln -sf packed-refs-back .git/packed-refs &&
>  		test_must_fail git refs verify 2>err &&
>  		cat >expect <<-EOF &&
> -		error: packed-refs: badRefFiletype: not a regular file but a symlink
> +		error: packed-refs: badRefFiletype: not a regular file
>  		EOF
>  		rm .git/packed-refs &&
>  		test_cmp expect err &&
> 
> It's not as "atomic" as open_nofollow() and fstat(), but I don't think
> we care about that for fsck. This is about consistency checking, not
> trying to beat races against active adversaries (not to mention that our
> open_nofollow() is best-effort anyway, and may be racy).
> 

The motivation why I use "open_nofollow" is that we want to avoid race
conditions as many as possible. However, as you have said, our
"open_nofollow" function still has the race. And it would be a little
cumbersome to make "open_nofollow" compatible. So, I rather prefer that
we simply use this way to solve the problem.

> I dunno. I don't mind making errno returns more consistent to prevent a
> future foot-gun, but I think as a general rule we may be better off not
> looking too hard at errno for exotic conditions.
> 

I don't mind either.

> -Peff
> 
> PS I notice that this same function reads the whole packed-refs file
>    into a strbuf. That may be a problem, as they can grow pretty big in
>    extreme cases (e.g., GitHub's fork networks easily got into the
>    gigabytes, as it was every ref of every fork). We usually mmap it.
>    Not related to this discussion, but just something I noticed while
>    reading the function.

Peff, thanks for notifying me. I want to know more background.
Initially, the reason why I don't use `mmap` is that when checking the
ref consistency, we usually don't need to share the "packed-refs"
content for multiple processes via `mmap`.

I don't know how Github executes "git fsck" for the forked repositories.
Is there any regular tasks for "git fsck"? And would "packed-refs" file
be shared for all these repositories?

If above is the case, I agree that we should reuse the logic of
"load_contents" to enhance. But I don't know whether we need to do this
in the first place.

Thanks,
Jialuo

  reply	other threads:[~2025-05-03 14:58 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-02 23:33 [PATCH] wrapper: Fix a errno discrepancy on NetBSD Collin Funk
2025-05-03  0:57 ` brian m. carlson
2025-05-03  1:05   ` Junio C Hamano
2025-05-03  4:21     ` Collin Funk
2025-05-03 17:32       ` Junio C Hamano
2025-05-03  3:48   ` Collin Funk
2025-05-03 13:31   ` Jeff King
2025-05-03 14:58     ` shejialuo [this message]
2025-05-03 15:49       ` Jeff King
2025-05-05  6:39         ` Patrick Steinhardt
2025-05-05 12:17           ` shejialuo
2025-05-03 18:56     ` Collin Funk
2025-05-05 15:43     ` Junio C Hamano
2025-05-05 18:03       ` Jeff King
2025-05-06 13:43         ` shejialuo
2025-05-06 22:58         ` Junio C Hamano
2025-05-03  4:16 ` [PATCH v2] wrapper: NetBSD gives EFTYPE and FreeBSD gives EMFILE where POSIX uses ELOOP Collin Funk
2025-05-03 15:45   ` brian m. carlson
2025-05-03 18:44     ` Collin Funk
2025-05-05  6:43   ` Patrick Steinhardt
2025-05-05 20:41     ` Junio C Hamano
2025-05-06  1:16       ` Collin Funk
2025-05-06 13:23         ` Patrick Steinhardt
2025-05-06  1:08   ` [PATCH v3] " Collin Funk
2025-05-06 13:24     ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBYvMjtGjzEhKg4s@ArchLinux \
    --to=shejialuo@gmail.com \
    --cc=collin.funk1@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).