From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH] object-file: fix race in object collision check
Date: Fri, 3 Jan 2025 08:16:48 +0100 [thread overview]
Message-ID: <Z3eO2zxDOzbbFFNk@pks.im> (raw)
In-Reply-To: <20250101181952.GA1391912@coredump.intra.peff.net>
On Wed, Jan 01, 2025 at 01:19:52PM -0500, Jeff King wrote:
> On Wed, Jan 01, 2025 at 08:50:51AM -0800, Junio C Hamano wrote:
>
> > Jeff King <peff@peff.net> writes:
> >
> > > There is one gotcha here, though. We call this collision check only if
> > > we got EEXIST trying to move the tempfile into place. If the destination
> > > file then goes away, we can't do the collision check. But is it right to
> > > quietly return success?
> > >
> > > If the contents of the two were the same, that's fine. We don't need the
> > > extra copy.
> > >
> > > But if the contents were not the same, we'd prefer either to actually
> > > copy the contents into place, or to return an error.
> > >
> > > Of course we can't know, because the destination file has gone away. In
> > > the common case they will be the same, but the whole point of this check
> > > is to allow loosening the cryptographic collision of the packfile
> > > contents. So the safest thing would be to retain the tempfile, copying
> > > it into the destination file. That errs on the side of keeping data when
> > > we cannot make a determination.
> > >
> > > IOW, if we see ENOENT on filename_b, should we then loop back in the
> > > caller to try the link() again?
> >
> > Yuck, I think you're absolutely right.
>
> So I think this part, if adjusted as I suggested, would fix the race in
> the tests without making anything worse (it's just more code).
Hm, okay, I think that makes sense. Instead of being optimistic like my
first version was we should be pessimistic and assume the worst. Which I
guess is a sensible stance to take in a collision check.
> And then this...
>
> > > I think check_collision() is used _after_ the attempt to rename() into
> > > place. So there's a race when the tempfile goes away, but I think the
> > > outcome is made a bit worse by your patch.
> > >
> > > Consider a sequence like this:
> > >
> > > a. Process A writes tmp_pack_foo.
> > >
> > > b. Process A tries to link tmp_pack_foo to pack-<hash> but finds it
> > > already exists.
> > >
> > > c. Process A opens both tmp_pack_foo and pack-<hash>.
> > >
> > > d. Process A compares the two byte-for-byte, and then returns
> > > success/failure based on whether they were actually identical.
> > >
> > > Now imagine there is a process B that deletes the file (maybe an
> > > over-zealous "gc --prune=now" deletes the in-use temporary file):
> > >
> > > - if process B deletes it between steps (a) and (b), process A returns
> > > an error (there is nothing to link). The caller knows that the data
> > > was not stored.
> > >
> > > - if process B deletes it between (b) and (c), then before your patch
> > > we see an error (because we can't compare the files). After your
> > > patch, we continue on and return success. The caller knows the data
> > > was stored (via the original file, not our new copy).
> > >
> > > - if process B deletes it between (c) and (d), then process A has no
> > > idea. But at this point it does not matter. If the files were
> > > identical, we return success (and in fact, process A deletes the file
> > > itself). And if not identical, then we return error, and the callers
> > > knows the data was not stored.
> > >
> > > So even though the exact behavior may depend on where we hit the race, I
> > > think ignoring an ENOENT open() error on the tempfile meaningfully
> > > changes what happens in the middle case.
> > >
> > > In practice I don't really expect this to happen, and "gc --prune=now"
> > > is inherently risky in a live repository. But I think we're probably
> > > better off to continue treating it as an error if we can't open our own
> > > tempfile.
> >
> > So we'd ignore the racy and flaky tests, as hiding the flake by
> > ignoring the error would only hurt the real world users.
>
> ...is all about ignoring ENOENT on the tmpfile itself. And I think we
> can just drop that hunk entirely. The tests do not care here (they are
> running simultaneous gc, but _not_ a simultaneous "--prune=now" gc).
Yeah, we don't need this hunk then.
I'll send a follow-up change.
Patrick
prev parent reply other threads:[~2025-01-03 7:16 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-30 10:32 [PATCH] object-file: fix race in object collision check Patrick Steinhardt
2024-12-30 14:40 ` Junio C Hamano
2024-12-30 14:50 ` Patrick Steinhardt
2024-12-30 14:54 ` Junio C Hamano
2024-12-31 1:42 ` Jeff King
2025-01-01 16:50 ` Junio C Hamano
2025-01-01 18:19 ` Jeff King
2025-01-03 7:16 ` Patrick Steinhardt [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z3eO2zxDOzbbFFNk@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).