From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: <git@vger.kernel.org>, Taylor Blau <me@ttaylorr.com>,
	Jason Hatton <jhatton@globalfinishing.com>
Subject: Re: [PATCH v2 2/2] Prevent git from rehashing 4GiB files
Date: Thu, 12 Oct 2023 10:58:42 -0700
Message-ID: <xmqqpm1jn2nh.fsf@gitster.g>
In-Reply-To: <20231012160930.330618-3-sandals@crustytoothpaste.net> (brian m. carlson's message of "Thu, 12 Oct 2023 16:09:30 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> An example would be to have a 2^32 sized file in the index of
> patched git. Patched git would save the file as 2^31 in the cache.
> An unpatched git would very much see the file has changed in size
> and force it to rehash the file, which is safe.

The reason why this is "safe" is because an older Git would keep
rehashing whether 2^31 or 0 is stored as its sd_size, so the
change is not making things worse?  With older Git, "git diff-files"
will report that such a file is not up to date, and then the user
will refresh the index, which will store 0 as its sd_size, so
tentatively "git status" may give wrong information, but we
probably do not care?  Is that how the reasoning goes?

> +/*
> + * Munge st_size into an unsigned int.
> + */
> +static unsigned int munge_st_size(off_t st_size) {
> +	unsigned int sd_size = st_size;
> +
> +	/*
> +	 * If the file is an exact multiple of 4 GiB, modify the value so it
> +	 * doesn't get marked as racily clean (zero).
> +	 */
> +	if (!sd_size && st_size)
> +		return 0x80000000;
> +	else
> +		return sd_size;
> +}

This assumes typeof(sd_size) aka "unsigned int" is always 32-bit,
which does not sound reasonable.  Reference to 4GiB, 2^32 and 2^31
in the code and the proposed commit log message need to be qualified
with "on a platform whose uint is 32-bit" or something, or better
yet, phrased in a way that is agnostic to the integer size.  At
the very least, the hardcoded 0x80000000 needs to be rethought, I
am afraid.
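
To illustrate the "agnostic to the integer size" phrasing, here is a
rough sketch that derives the replacement value from UINT_MAX instead
of hardcoding 0x80000000.  The name munge_st_size_portable() is
hypothetical and not part of the patch, and the sketch assumes
unsigned int has no padding bits:

```c
#include <limits.h>
#include <sys/types.h>

/*
 * Hypothetical width-agnostic variant of munge_st_size(): truncate
 * st_size to unsigned int, but never return 0 for a non-empty file.
 */
static unsigned int munge_st_size_portable(off_t st_size)
{
	unsigned int sd_size = (unsigned int)st_size;

	/* Only the highest bit of unsigned int set, whatever its width. */
	const unsigned int high_bit = ~(UINT_MAX >> 1);

	/*
	 * A non-empty file whose size truncates to zero would look
	 * racily clean, so store the high bit instead.
	 */
	if (!sd_size && st_size)
		return high_bit;
	return sd_size;
}
```

On a platform with a 32-bit unsigned int this yields the same
0x80000000 as the patch; elsewhere it follows the actual width.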

Other than that, the workaround for the racy-git avoidance code does
sound good.  I actually wonder if we should always use 1 regardless
of integer size.
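
The "always use 1" idea might look like the sketch below.  Again,
munge_st_size_one() is a hypothetical name used only for illustration,
not code from the patch:

```c
#include <sys/types.h>

/*
 * Hypothetical variant: record 1 for any non-empty file whose size
 * truncates to zero, so no constant depends on the width of
 * unsigned int.
 */
static unsigned int munge_st_size_one(off_t st_size)
{
	unsigned int sd_size = (unsigned int)st_size;

	if (!sd_size && st_size)
		return 1;
	return sd_size;
}
```

Whichever value is chosen will coincide with the stored size of some
other file length (here, a 1-byte file), so it only has to be nonzero
to keep the entry from looking racily clean.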


