git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org,  Jeff King <peff@peff.net>,
	 "brian m. carlson" <sandals@crustytoothpaste.net>,
	 Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v3 3/9] finalize_object_file(): implement collision check
Date: Fri, 06 Sep 2024 14:44:12 -0700	[thread overview]
Message-ID: <xmqqh6asv4nn.fsf@gitster.g> (raw)
In-Reply-To: <0feee5d1d4fc1e0aa1ca5d98f2f404ecdbb2f6b6.1725651952.git.me@ttaylorr.com> (Taylor Blau's message of "Fri, 6 Sep 2024 15:46:12 -0400")

Taylor Blau <me@ttaylorr.com> writes:

> Note that this may cause some extra computation when the files are in
> fact identical, but this should happen rarely. For example, when writing
> a loose object, we compute the object id first, then check manually for
> an existing object (so that we can freshen its timestamp) before
> actually trying to write and link the new data.

True.

> +static int check_collision(const char *filename_a, const char *filename_b)
> +{
> +	char buf_a[4096], buf_b[4096];
> +	int fd_a = -1, fd_b = -1;
> +	int ret = 0;
> +
> +	fd_a = open(filename_a, O_RDONLY);
> +	if (fd_a < 0) {
> +		ret = error_errno(_("unable to open %s"), filename_a);
> +		goto out;
> +	}
> +
> +	fd_b = open(filename_b, O_RDONLY);
> +	if (fd_b < 0) {
> +		ret = error_errno(_("unable to open %s"), filename_b);
> +		goto out;
> +	}

Two and two half comments on this function.

 * We compare 4k at a time here, while copy.c copies 8k at a time,
   and bulk-checkin.c uses 16k at a time.  Outside the scope of this
   topic, we probably should pick one number and stick to it, unless
   we have measured to pick perfect number for each case (and I know
   I picked 8k for copy.c and 16k for bulk-checkin.c both out of
   thin air).

 * I would have expected at least we would fstat() them to declare
   difference immediately after we find their sizes differ, for
   example.  As we assume that calling into this function should be
   rare, we prefer not to pay in complexity for performance here?

 * We use read_in_full() and assume that a short-read return from
   the function happens only at the end of file due to EOF, which is
   another reason why we can do away without fstat() on these files.

 * An error causes the caller to assume collision (because we assign
   the return value of error() to ret), which should do the same
   action as an actual collision to abort and keep the problematic
   file for forensics.

>  /*
>   * Move the just written object into its final resting place.
>   */
> @@ -1941,8 +1992,8 @@ int finalize_object_file(const char *tmpfile, const char *filename)
>  			errno = saved_errno;
>  			return error_errno(_("unable to write file %s"), filename);
>  		}
> -		/* FIXME!!! Collision check here ? */
> -		unlink_or_warn(tmpfile);
> +		if (check_collision(tmpfile, filename))
> +			return -1;
>  	}

OK.

  reply	other threads:[~2024-09-06 21:44 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-01 16:03 [PATCH 0/4] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Taylor Blau
2024-09-01 16:03 ` [PATCH 1/4] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 19:34     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 2/4] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 17:27     ` Junio C Hamano
2024-09-03 19:52       ` Taylor Blau
2024-09-03 20:47         ` Junio C Hamano
2024-09-03 21:24           ` Taylor Blau
2024-09-04  7:05           ` Patrick Steinhardt
2024-09-04 14:53             ` Junio C Hamano
2024-09-03 19:40     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 3/4] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 19:43     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 4/4] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03  1:22     ` brian m. carlson
2024-09-03 19:50     ` Taylor Blau
2024-09-02  3:41 ` [PATCH 0/4] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Junio C Hamano
2024-09-03 19:48   ` Taylor Blau
2024-09-03 20:44     ` Junio C Hamano
2024-09-02 14:08 ` brian m. carlson
2024-09-03 19:47   ` Taylor Blau
2024-09-03 22:41     ` Junio C Hamano
2024-09-04 14:01     ` brian m. carlson
2024-09-05 10:37     ` Jeff King
2024-09-05 15:41       ` Junio C Hamano
2024-09-05 16:23         ` Taylor Blau
2024-09-05 16:51           ` Junio C Hamano
2024-09-05 17:04             ` Taylor Blau
2024-09-05 17:51               ` Taylor Blau
2024-09-05 20:21                 ` Taylor Blau
2024-09-05 20:27               ` Jeff King
2024-09-05 21:27                 ` Junio C Hamano
2024-09-05 15:11 ` [PATCH v2 " Taylor Blau
2024-09-05 15:12   ` [PATCH v2 1/4] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-05 15:12   ` [PATCH v2 2/4] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-05 15:12   ` [PATCH v2 3/4] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-05 15:12   ` [PATCH v2 4/4] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-06 19:46 ` [PATCH v3 0/9] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Taylor Blau
2024-09-06 19:46   ` [PATCH v3 1/9] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-06 19:46   ` [PATCH v3 2/9] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-06 19:46   ` [PATCH v3 3/9] finalize_object_file(): implement collision check Taylor Blau
2024-09-06 21:44     ` Junio C Hamano [this message]
2024-09-06 21:51       ` Chris Torek
2024-09-10  6:53       ` Jeff King
2024-09-10 15:14         ` Junio C Hamano
2024-09-16 10:45     ` Patrick Steinhardt
2024-09-16 15:54       ` Taylor Blau
2024-09-16 16:03         ` Taylor Blau
2024-09-17 20:40       ` Junio C Hamano
2024-09-06 19:46   ` [PATCH v3 4/9] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-06 19:46   ` [PATCH v3 5/9] i5500-git-daemon.sh: use compile-able version of Git without OpenSSL Taylor Blau
2024-09-11  6:10     ` Jeff King
2024-09-11  6:12       ` Jeff King
2024-09-12 20:28         ` Junio C Hamano
2024-09-11 15:28       ` Junio C Hamano
2024-09-11 21:23         ` Jeff King
2024-09-06 19:46   ` [PATCH v3 6/9] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-06 19:46   ` [PATCH v3 7/9] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-06 19:46   ` [PATCH v3 8/9] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-06 19:46   ` [PATCH v3 9/9] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-06 21:50   ` [PATCH v3 0/9] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Junio C Hamano
2024-09-24 17:32 ` [PATCH v4 0/8] " Taylor Blau
2024-09-24 17:32   ` [PATCH v4 1/8] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-25 17:02     ` Junio C Hamano
2024-09-24 17:32   ` [PATCH v4 2/8] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-24 17:32   ` [PATCH v4 3/8] finalize_object_file(): implement collision check Taylor Blau
2024-09-24 20:37     ` Jeff King
2024-09-24 21:59       ` Taylor Blau
2024-09-24 22:20         ` Jeff King
2024-09-25 18:06           ` Taylor Blau
2024-09-24 21:32     ` Junio C Hamano
2024-09-24 22:02       ` Taylor Blau
2024-09-24 17:32   ` [PATCH v4 4/8] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-24 21:34     ` Junio C Hamano
2024-09-24 17:32   ` [PATCH v4 5/8] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-24 17:32   ` [PATCH v4 6/8] hash.h: scaffolding for _unsafe hashing variants Taylor Blau
2024-09-24 17:32   ` [PATCH v4 7/8] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-24 17:32   ` [PATCH v4 8/8] csum-file.c: use unsafe SHA-1 implementation when available Taylor Blau
2024-09-24 20:52   ` [PATCH v4 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Jeff King
2024-09-25 16:58   ` Elijah Newren
2024-09-25 17:11     ` Junio C Hamano
2024-09-25 17:22       ` Taylor Blau
2024-09-25 17:22     ` Taylor Blau
2024-09-26 15:22 ` [PATCH v5 " Taylor Blau
2024-09-26 15:22   ` [PATCH v5 1/8] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-26 15:22   ` [PATCH v5 2/8] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-26 15:22   ` [PATCH v5 3/8] finalize_object_file(): implement collision check Taylor Blau
2024-09-26 15:22   ` [PATCH v5 4/8] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-26 15:22   ` [PATCH v5 5/8] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-26 15:22   ` [PATCH v5 6/8] hash.h: scaffolding for _unsafe hashing variants Taylor Blau
2024-09-26 15:22   ` [PATCH v5 7/8] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-26 15:22   ` [PATCH v5 8/8] csum-file.c: use unsafe SHA-1 implementation when available Taylor Blau
2024-09-26 22:47   ` [PATCH v5 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Elijah Newren
2024-09-27  0:44     ` Junio C Hamano
2024-09-27  3:57   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqh6asv4nn.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).