From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/2] object-file: retry linking file into place when occluding file vanishes
Date: Mon, 6 Jan 2025 12:11:43 +0100 [thread overview]
Message-ID: <Z3u6bzyaDyzjV65X@pks.im> (raw)
In-Reply-To: <20250103194058.GE3208749@coredump.intra.peff.net>
On Fri, Jan 03, 2025 at 02:40:58PM -0500, Jeff King wrote:
> On Fri, Jan 03, 2025 at 09:19:55AM +0100, Patrick Steinhardt wrote:
>
> > In a preceding commit, we have adapted `check_collision()` to ignore
>
> If it's in next and we are building on top, I think we can mention it by
> name: 0ad3d65652 (object-file: fix race in object collision check,
> 2024-12-30).
Okay, good to know. I wasn't sure whether the patches might get rewound
when `next` gets rewound.
> > the case where either of the colliding files vanishes. This should be
> > safe in general when we assume that the contents of these two files were
> > the same. But the check is all about detecting collisions, so that
> > assumption may be too optimistic.
>
> I found this a little vague about what "too optimistic" means. ;)
> Maybe something like:
>
> Prior to 0ad3d65652, callers could expect that a successful return
> from finalize_object_file() means that either the file was moved into
> place, or the identical bytes were already present. If neither of
> those happens, we'd return an error.
>
> Since that commit, if the destination file disappears between our
> link() call and the collision check, we'd return success without
> actually checking the contents, and without retrying the link. This
> solves the common case that the files were indeed the same, but it
> means that we may corrupt the repository if they weren't (this implies
> a hash collision, but the whole point of this function is protecting
> against hash collisions).
>
> We can't be pessimistic and assume they're different; that hurts the
> common case that 0ad3d65652 was trying to fix. But after seeing that
> the destination file went away, we can retry linking again...
Well, as usual your commit messages are something to aspire to :)
Thanks.
> > Furthermore, stop treating `ENOENT` specially for the source file. It
> > shouldn't happen that the source vanishes as we're using a fresh
> > temporary file for it, so if it does vanish it indicates an actual
> > error.
>
> OK. I think this is worth doing, but I'd probably have put it into its
> own commit.
Fair enough, can do.
> > @@ -2034,8 +2037,10 @@ int finalize_object_file(const char *tmpfile, const char *filename)
> > int finalize_object_file_flags(const char *tmpfile, const char *filename,
> > enum finalize_object_file_flags flags)
> > {
> > - struct stat st;
> > - int ret = 0;
> > + int ret;
> > +
> > +retry:
> > + ret = 0;
> >
> > if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
> > goto try_rename;
> > @@ -2056,6 +2061,8 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename,
> > * left to unlink.
> > */
> > if (ret && ret != EEXIST) {
> > + struct stat st;
> > +
>
> OK, we move the stat struct here where it's needed. I think that's
> somewhat orthogonal to your patch, but reduced scoping does help make
> the goto's less confusing.
>
> I suspect there's a way to write this as a loop that would be more
> structured, but it would be a bigger refactor. Bonus points if it also
> get rid of the try_rename goto, too. ;)
>
> I'm OK punting on that, though.
Yeah, I'll punt on it for now. I don't love the resulting structure, but
it's also not that uncommon in our codebase.
> > @@ -2071,9 +2078,13 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename,
> > errno = saved_errno;
> > return error_errno(_("unable to write file %s"), filename);
> > }
> > - if (!(flags & FOF_SKIP_COLLISION_CHECK) &&
> > - check_collision(tmpfile, filename))
> > + if (!(flags & FOF_SKIP_COLLISION_CHECK)) {
> > + ret = check_collision(tmpfile, filename);
> > + if (ret == CHECK_COLLISION_DEST_VANISHED)
> > + goto retry;
> > + else if (ret)
> > return -1;
> > + }
> > unlink_or_warn(tmpfile);
> > }
>
> I share Junio's uneasiness with looping forever based on external input
> from the filesystem (even though you _should_ eventually win the race,
> that's not guaranteed, and of course a weird filesystem might confuse
> us). Could we put a stop-gap in it like:
Fair enough. I was also wondering about whether or not I should have a
retry counter when writing it but couldn't think about a (sane) scenario
where it would be needed. But yeah, filesystems can be weird, and it's
not a lot of code either, so I'll add it.
Patrick
next prev parent reply other threads:[~2025-01-06 11:11 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-03 8:19 [PATCH 0/2] object-file: retry linking file into place when occluding file vanishes Patrick Steinhardt
2025-01-03 8:19 ` [PATCH 1/2] object-file: rename variables in `check_collision()` Patrick Steinhardt
2025-01-03 19:10 ` Jeff King
2025-01-03 8:19 ` [PATCH 2/2] object-file: retry linking file into place when occluding file vanishes Patrick Steinhardt
2025-01-03 16:18 ` Junio C Hamano
2025-01-03 19:40 ` Jeff King
2025-01-03 19:59 ` Jeff King
2025-01-03 22:29 ` Junio C Hamano
2025-01-06 11:11 ` Patrick Steinhardt
2025-01-07 1:25 ` Jeff King
2025-01-03 20:25 ` Junio C Hamano
2025-01-06 11:11 ` Patrick Steinhardt [this message]
2025-01-06 9:24 ` [PATCH v2 0/3] " Patrick Steinhardt
2025-01-06 9:24 ` [PATCH v2 1/3] object-file: rename variables in `check_collision()` Patrick Steinhardt
2025-01-06 9:24 ` [PATCH v2 2/3] object-file: don't special-case missing source file in collision check Patrick Steinhardt
2025-01-06 9:24 ` [PATCH v2 3/3] object-file: retry linking file into place when occluding file vanishes Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z3u6bzyaDyzjV65X@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).