git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v4 3/8] finalize_object_file(): implement collision check
Date: Tue, 24 Sep 2024 17:59:10 -0400	[thread overview]
Message-ID: <ZvM2Lkb0/LPrqizO@nand.local> (raw)
In-Reply-To: <20240924203718.GA586150@coredump.intra.peff.net>

On Tue, Sep 24, 2024 at 04:37:18PM -0400, Jeff King wrote:
> But I think we can expand the argument a bit. I don't think this is a
> very good place for such a collision check, because race conditions
> aside, we'll already have bailed long before! When we do a write via
> write_object_file(), for example, we'll hit the freshen paths that
> assume the contents are identical, and skip the write (and thus this
> finalize) entirely.
>
> So if you want a collision check, we have to do it at a much higher
> level. And indeed we do: index-pack already implements such a check
> (through the confusingly named "check_collison" function). I don't think
> unpack-objects does so (it looks like it just feeds the result to
> write_object_file(), which does the freshening thing).
>
> So the argument I'd make here is more like: this is the wrong place to
> do it.

I think that is reasonable, and I agree with your reasoning here. I'm
happy to reword the commit message if you think doing so would be
useful, or drop the paragraph entirely if you think it's just confusing.

Let me know what you think, I'm happy to do whatever here, reroll or not
:-).

>   Side thoughts on collision checks:
>
>     I think index-pack is safe in the sense that it will always prefer
>     on-disk copies and will complain if it sees a collision.
>     unpack-objects is not, nor are regular in-repo writes (which
>     normally cannot be triggered by remote, but on forges that do
>     merges, etc, that's not always true). Both of the latter are also
>     subject to races, where a simultaneous collision might let the
>     attacker win.

Yup.

>     That race is kind of moot in a world where anybody can push to a
>     fork of a repo that ends up in the same shared location (so they can
>     actually win and become the "on disk" copy), and we're relying on
>     sha1dc for protection there. That's specific to certain forges, but
>     they do represent a lot of Git use.
>
>     In general, the use of unpack-objects versus index-pack is up to the
>     attacker (based on pack size). So I think it would be nice if
>     unpack-objects did the collision check. Even if the attacker beats
>     you to writing the object, it would be nice to see "holy crap, there
>     was a collision" instead of just silently throwing your pushed
>     object away.

Right, I agree that unpack-objects definitely should do the collision
check here. And indeed, that is the case, since that code (which all is
directly implemented in the builtin) uses the regular
collision-detecting SHA-1 implementation.

>     I know that GitHub only ever runs index-pack, which may give some
>     amount of protection here. In general, I think we should consider
>     deprecating unpack-objects in favor of teaching index-pack to
>     unpack. It has many enhancements (this one, but also threading) that
>     unpack-objects does not. I have an old patch series for this (sorry,
>     only from 2017, I'm slipping). IIRC the sticking point was that
>     unpack-objects is better about memory use in some cases (streaming
>     large blobs?) and I hadn't figured out a way around that.

Only seven years old? Sheesh, you really are slipping ;-).

> Phew. Sorry, that was a long digression for "I think you're right, but I
> might have argued it a little differently". I think the direction of the
> patch (skipping checks entirely for loose objects) is the right thing.
>
> > As a small note related to the latter bullet point above, we must teach
> > the tmp-objdir routines to similarly skip the content-level collision
> > checks when calling migrate_one() on a loose object file, which we do by
> > setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose
> > object shard.
>
> OK, this is the part I was wondering how you were going to deal with. :)
> Let's look at the implementation...
>
> > @@ -239,11 +247,15 @@ static int migrate_paths(struct strbuf *src, struct strbuf *dst)
> >
> >  	for (i = 0; i < paths.nr; i++) {
> >  		const char *name = paths.items[i].string;
> > +		enum finalize_object_file_flags flags_copy = flags;
> >
> >  		strbuf_addf(src, "/%s", name);
> >  		strbuf_addf(dst, "/%s", name);
> >
> > -		ret |= migrate_one(src, dst);
> > +		if (is_loose_object_shard(name))
> > +			flags_copy |= FOF_SKIP_COLLISION_CHECK;
> > +
> > +		ret |= migrate_one(src, dst, flags_copy);
> >
> >  		strbuf_setlen(src, src_len);
> >  		strbuf_setlen(dst, dst_len);
>
> This looks pretty reasonable overall, though I'd note that
> migrate_paths() is called recursively. So if I had:
>
>   tmp-objdir-XXXXXX/pack/ab/foo.pack
>
> we'd skip the collision check. I'm not sure how much we want to worry
> about that. I don't think we'd ever create such a path, and the general
> form of the paths is all under local control, so an attacker can't
> convince us to do so. And I find it pretty unlikely that we'd change the
> on-disk layout to accidentally trigger this.

I thought about this when writing it, and came to the conclusion that
the checks we have for "are we in something that looks like a loose
object shard?" are sane. That's only because we won't bother reading a
pack in $GIT_DIR/objects/pack/xx/yy.pack, since we do not read packs
recursively from the pack sub-directory.

So I think that the diff you posted below isn't hurting anything, and
the implementation looks correct to me, but I also don't think it's
helping anything either as a consequence of the above.

Thanks,
Taylor

  reply	other threads:[~2024-09-24 21:59 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-01 16:03 [PATCH 0/4] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Taylor Blau
2024-09-01 16:03 ` [PATCH 1/4] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 19:34     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 2/4] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 17:27     ` Junio C Hamano
2024-09-03 19:52       ` Taylor Blau
2024-09-03 20:47         ` Junio C Hamano
2024-09-03 21:24           ` Taylor Blau
2024-09-04  7:05           ` Patrick Steinhardt
2024-09-04 14:53             ` Junio C Hamano
2024-09-03 19:40     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 3/4] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 19:43     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 4/4] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03  1:22     ` brian m. carlson
2024-09-03 19:50     ` Taylor Blau
2024-09-02  3:41 ` [PATCH 0/4] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Junio C Hamano
2024-09-03 19:48   ` Taylor Blau
2024-09-03 20:44     ` Junio C Hamano
2024-09-02 14:08 ` brian m. carlson
2024-09-03 19:47   ` Taylor Blau
2024-09-03 22:41     ` Junio C Hamano
2024-09-04 14:01     ` brian m. carlson
2024-09-05 10:37     ` Jeff King
2024-09-05 15:41       ` Junio C Hamano
2024-09-05 16:23         ` Taylor Blau
2024-09-05 16:51           ` Junio C Hamano
2024-09-05 17:04             ` Taylor Blau
2024-09-05 17:51               ` Taylor Blau
2024-09-05 20:21                 ` Taylor Blau
2024-09-05 20:27               ` Jeff King
2024-09-05 21:27                 ` Junio C Hamano
2024-09-05 15:11 ` [PATCH v2 " Taylor Blau
2024-09-05 15:12   ` [PATCH v2 1/4] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-05 15:12   ` [PATCH v2 2/4] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-05 15:12   ` [PATCH v2 3/4] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-05 15:12   ` [PATCH v2 4/4] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-06 19:46 ` [PATCH v3 0/9] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Taylor Blau
2024-09-06 19:46   ` [PATCH v3 1/9] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-06 19:46   ` [PATCH v3 2/9] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-06 19:46   ` [PATCH v3 3/9] finalize_object_file(): implement collision check Taylor Blau
2024-09-06 21:44     ` Junio C Hamano
2024-09-06 21:51       ` Chris Torek
2024-09-10  6:53       ` Jeff King
2024-09-10 15:14         ` Junio C Hamano
2024-09-16 10:45     ` Patrick Steinhardt
2024-09-16 15:54       ` Taylor Blau
2024-09-16 16:03         ` Taylor Blau
2024-09-17 20:40       ` Junio C Hamano
2024-09-06 19:46   ` [PATCH v3 4/9] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-06 19:46   ` [PATCH v3 5/9] i5500-git-daemon.sh: use compile-able version of Git without OpenSSL Taylor Blau
2024-09-11  6:10     ` Jeff King
2024-09-11  6:12       ` Jeff King
2024-09-12 20:28         ` Junio C Hamano
2024-09-11 15:28       ` Junio C Hamano
2024-09-11 21:23         ` Jeff King
2024-09-06 19:46   ` [PATCH v3 6/9] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-06 19:46   ` [PATCH v3 7/9] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-06 19:46   ` [PATCH v3 8/9] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-06 19:46   ` [PATCH v3 9/9] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-06 21:50   ` [PATCH v3 0/9] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Junio C Hamano
2024-09-24 17:32 ` [PATCH v4 0/8] " Taylor Blau
2024-09-24 17:32   ` [PATCH v4 1/8] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-25 17:02     ` Junio C Hamano
2024-09-24 17:32   ` [PATCH v4 2/8] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-24 17:32   ` [PATCH v4 3/8] finalize_object_file(): implement collision check Taylor Blau
2024-09-24 20:37     ` Jeff King
2024-09-24 21:59       ` Taylor Blau [this message]
2024-09-24 22:20         ` Jeff King
2024-09-25 18:06           ` Taylor Blau
2024-09-24 21:32     ` Junio C Hamano
2024-09-24 22:02       ` Taylor Blau
2024-09-24 17:32   ` [PATCH v4 4/8] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-24 21:34     ` Junio C Hamano
2024-09-24 17:32   ` [PATCH v4 5/8] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-24 17:32   ` [PATCH v4 6/8] hash.h: scaffolding for _unsafe hashing variants Taylor Blau
2024-09-24 17:32   ` [PATCH v4 7/8] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-24 17:32   ` [PATCH v4 8/8] csum-file.c: use unsafe SHA-1 implementation when available Taylor Blau
2024-09-24 20:52   ` [PATCH v4 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Jeff King
2024-09-25 16:58   ` Elijah Newren
2024-09-25 17:11     ` Junio C Hamano
2024-09-25 17:22       ` Taylor Blau
2024-09-25 17:22     ` Taylor Blau
2024-09-26 15:22 ` [PATCH v5 " Taylor Blau
2024-09-26 15:22   ` [PATCH v5 1/8] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-26 15:22   ` [PATCH v5 2/8] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-26 15:22   ` [PATCH v5 3/8] finalize_object_file(): implement collision check Taylor Blau
2024-09-26 15:22   ` [PATCH v5 4/8] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-26 15:22   ` [PATCH v5 5/8] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-26 15:22   ` [PATCH v5 6/8] hash.h: scaffolding for _unsafe hashing variants Taylor Blau
2024-09-26 15:22   ` [PATCH v5 7/8] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-26 15:22   ` [PATCH v5 8/8] csum-file.c: use unsafe SHA-1 implementation when available Taylor Blau
2024-09-26 22:47   ` [PATCH v5 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Elijah Newren
2024-09-27  0:44     ` Junio C Hamano
2024-09-27  3:57   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZvM2Lkb0/LPrqizO@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).