From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Patrick Steinhardt <ps@pks.im>,
git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH] object-file: fix race in object collision check
Date: Wed, 01 Jan 2025 08:50:51 -0800 [thread overview]
Message-ID: <xmqq7c7exytw.fsf@gitster.g> (raw)
In-Reply-To: <20241231014220.GA225521@coredump.intra.peff.net> (Jeff King's message of "Mon, 30 Dec 2024 20:42:20 -0500")
Jeff King <peff@peff.net> writes:
> There is one gotcha here, though. We call this collision check only if
> we got EEXIST trying to move the tempfile into place. If the destination
> file then goes away, we can't do the collision check. But is it right to
> quietly return success?
>
> If the contents of the two were the same, that's fine. We don't need the
> extra copy.
>
> But if the contents were not the same, we'd prefer either to actually
> copy the contents into place, or to return an error.
>
> Of course we can't know, because the destination file has gone away. In
> the common case they will be the same, but the whole point of this check
> is to allow loosening the cryptographic collision of the packfile
> contents. So the safest thing would be to retain the tempfile, copying
> it into the destination file. That errs on the side of keeping data when
> we cannot make a determination.
>
> IOW, if we see ENOENT on filename_b, should we then loop back in the
> caller to try the link() again?
Yuck, I think you're absolutely right.
> I think check_collision() is used _after_ the attempt to rename() into
> place. So there's a race when the tempfile goes away, but I think the
> outcome is made a bit worse by your patch.
>
> Consider a sequence like this:
>
> a. Process A writes tmp_pack_foo.
>
> b. Process A tries to link tmp_pack_foo to pack-<hash> but finds it
> already exists.
>
> c. Process A opens both tmp_pack_foo and pack-<hash>.
>
> d. Process A compares the two byte-for-byte, and then returns
> success/failure based on whether they were actually identical.
>
> Now imagine there is a process B that deletes the file (maybe an
> over-zealous "gc --prune=now" deletes the in-use temporary file):
>
> - if process B deletes it between steps (a) and (b), process A returns
> an error (there is nothing to link). The caller knows that the data
> was not stored.
>
> - if process B deletes it between (b) and (c), then before your patch
> we see an error (because we can't compare the files). After your
> patch, we continue on and return success. The caller knows the data
> was stored (via the original file, not our new copy).
>
> - if process B deletes it between (c) and (d), then process A has no
> idea. But at this point it does not matter. If the files were
> identical, we return success (and in fact, process A deletes the file
> itself). And if not identical, then we return error, and the callers
> knows the data was not stored.
>
> So even though the exact behavior may depend on where we hit the race, I
> think ignoring an ENOENT open() error on the tempfile meaningfully
> changes what happens in the middle case.
>
> In practice I don't really expect this to happen, and "gc --prune=now"
> is inherently risky in a live repository. But I think we're probably
> better off to continue treating it as an error if we can't open our own
> tempfile.
So we'd ignore the racy and flaky tests, as hiding the flake by
ignoring the error would only hurt the real world users.
next prev parent reply other threads:[~2025-01-01 16:50 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-30 10:32 [PATCH] object-file: fix race in object collision check Patrick Steinhardt
2024-12-30 14:40 ` Junio C Hamano
2024-12-30 14:50 ` Patrick Steinhardt
2024-12-30 14:54 ` Junio C Hamano
2024-12-31 1:42 ` Jeff King
2025-01-01 16:50 ` Junio C Hamano [this message]
2025-01-01 18:19 ` Jeff King
2025-01-03 7:16 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq7c7exytw.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.