git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: <git@vger.kernel.org>, Taylor Blau <me@ttaylorr.com>,
	Jason Hatton <jhatton@globalfinishing.com>
Subject: Re: [PATCH v2 2/2] Prevent git from rehashing 4GiB files
Date: Thu, 12 Oct 2023 10:58:42 -0700
Message-ID: <xmqqpm1jn2nh.fsf@gitster.g>
In-Reply-To: <20231012160930.330618-3-sandals@crustytoothpaste.net> (brian m. carlson's message of "Thu, 12 Oct 2023 16:09:30 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> An example would be to have a 2^32 sized file in the index of
> patched git. Patched git would save the file as 2^31 in the cache.
> An unpatched git would very much see the file has changed in size
> and force it to rehash the file, which is safe.

The reason why this is "safe" is that an older Git would keep
rehashing whether 2^31 or 0 is stored as its sd_size, so the change
is not making things worse?  With an older Git, "git diff-files"
will report that such a file is not up to date, and then the user
will refresh the index, which will store 0 as its sd_size, so
"git status" may tentatively give wrong information, but we
probably do not care?  Is that how the reasoning goes?
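A simplified, hypothetical model of the "not making things worse" argument, for illustration only. It assumes a 32-bit cached size field and models only two things an older Git does: compare the cached size with the truncated on-disk size, and treat a cached size of 0 as "possibly racily clean". Real Git also compares mtime, ctime, inode and more; the function name `old_git_must_rehash` is invented here.

```c
#include <stdint.h>

/*
 * Hypothetical model of an older Git's size check (not Git's code).
 * Returns 1 when the file contents would have to be rehashed.
 */
static int old_git_must_rehash(uint32_t cached_sd_size, uint64_t st_size)
{
	/* The on-disk size is truncated to 32 bits: 4 GiB becomes 0. */
	uint32_t truncated = (uint32_t)st_size;

	if (cached_sd_size != truncated)
		return 1; /* size looks changed, so the contents are rehashed */
	if (cached_sd_size == 0 && st_size != 0)
		return 1; /* 0 is the "racily clean" marker, so rehash anyway */
	return 0;
}
```

Under this model, for a file of exactly 2^32 bytes an older Git rehashes whether the patched Git left 2^31 or the older Git itself left 0 in the cache, which is the sense in which the change does not make older versions worse.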

> +/*
> + * Munge st_size into an unsigned int.
> + */
> +static unsigned int munge_st_size(off_t st_size) {
> +	unsigned int sd_size = st_size;
> +
> +	/*
> +	 * If the file is an exact multiple of 4 GiB, modify the value so it
> +	 * doesn't get marked as racily clean (zero).
> +	 */
> +	if (!sd_size && st_size)
> +		return 0x80000000;
> +	else
> +		return sd_size;
> +}

This assumes typeof(sd_size) aka "unsigned int" is always 32-bit,
which does not sound reasonable.  References to 4GiB, 2^32 and 2^31
in the code and in the proposed commit log message need to be qualified
with "on a platform whose uint is 32-bit" or something, or better
yet, phrased in a way that is agnostic to the integer size.  At
the very least, the hardcoded 0x80000000 needs to be rethought, I
am afraid.
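One way the hardcoded 0x80000000 could be made agnostic to the width of `unsigned int` is to derive the highest bit from `UINT_MAX`. This is only a sketch of that idea, not the patch itself; the name `munge_st_size_portable` is hypothetical and `off_t` is assumed to come from POSIX `<sys/types.h>`.

```c
#include <limits.h>
#include <sys/types.h>

/*
 * Hypothetical width-agnostic variant of munge_st_size(): instead of
 * assuming a 32-bit unsigned int, compute its top bit from UINT_MAX.
 */
static unsigned int munge_st_size_portable(off_t st_size)
{
	unsigned int sd_size = (unsigned int)st_size;
	/* 2^(N-1) for an N-bit unsigned int, e.g. 0x80000000 when N == 32. */
	const unsigned int high_bit = UINT_MAX - (UINT_MAX >> 1);

	/* Nonzero size that truncates to 0: avoid the racily-clean marker. */
	if (!sd_size && st_size)
		return high_bit;
	return sd_size;
}
```

When `off_t` is no wider than `unsigned int`, the truncation never yields 0 for a nonzero size and the special case simply never fires.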

Other than that, the workaround for the racy-git avoidance code does
sound good.  I actually wonder if we should always use 1 regardless
of integer size.
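A sketch of the "always use 1" idea, assuming (as the quoted patch implies) that the munged value is used both when storing sd_size and when comparing it against a fresh stat. Both helper names below are hypothetical, and the 2^32 example assumes a 32-bit `unsigned int`.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical variant: substitute 1 instead of a width-dependent bit. */
static unsigned int munge_st_size_one(off_t st_size)
{
	unsigned int sd_size = (unsigned int)st_size;

	/* Any nonzero value avoids the racily-clean marker 0; 1 needs no width. */
	return (!sd_size && st_size) ? 1u : sd_size;
}

/*
 * Hypothetical size comparison: the cached value and the freshly
 * stat()ed size go through the same munging, so an unchanged 4 GiB
 * file compares equal without 0 ever being stored for it.
 */
static bool size_unchanged(unsigned int cached_sd_size, off_t st_size)
{
	return cached_sd_size == munge_st_size_one(st_size);
}
```

The trade-off is the same as with 0x80000000: a file whose size changes from 1 byte to exactly 2^32 bytes would look unchanged by size alone, just as 2^31 collides with a 2 GiB file, and sizes that differ by a multiple of 2^32 already collide after truncation. Git's other stat checks, such as mtime, are what catch those cases.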



Thread overview: 16+ messages
2023-10-12 16:09 [PATCH v2 0/2] Prevent re-reading 4 GiB files on every status brian m. carlson
2023-10-12 16:09 ` [PATCH v2 1/2] t: add a test helper to truncate files brian m. carlson
2023-10-12 17:49   ` Eric Sunshine
2023-10-13 20:23     ` brian m. carlson
2023-10-12 22:52   ` Junio C Hamano
2023-10-13 20:18     ` brian m. carlson
2023-10-13 20:32       ` Junio C Hamano
2023-10-16 23:53   ` Jeff King
2023-10-12 16:09 ` [PATCH v2 2/2] Prevent git from rehashing 4GiB files brian m. carlson
2023-10-12 17:58   ` Junio C Hamano [this message]
2023-10-12 21:58     ` brian m. carlson
2023-10-12 22:11       ` Junio C Hamano
2023-10-17  0:00   ` Jeff King
2023-10-17 14:49     ` Jason Hatton
2023-10-17 17:02       ` Junio C Hamano
2023-10-18  0:42     ` brian m. carlson
