From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: <git@vger.kernel.org>, Taylor Blau <me@ttaylorr.com>,
Jason Hatton <jhatton@globalfinishing.com>
Subject: Re: [PATCH v2 2/2] Prevent git from rehashing 4GiB files
Date: Thu, 12 Oct 2023 10:58:42 -0700
Message-ID: <xmqqpm1jn2nh.fsf@gitster.g>
In-Reply-To: <20231012160930.330618-3-sandals@crustytoothpaste.net> (brian m. carlson's message of "Thu, 12 Oct 2023 16:09:30 +0000")
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> An example would be to have a 2^32 sized file in the index of
> patched git. Patched git would save the file as 2^31 in the cache.
> An unpatched git would very much see the file has changed in size
> and force it to rehash the file, which is safe.
The reason why this is "safe" is because an older Git would keep
rehashing whether 2^31 or 0 is stored as its sd_size, so the
change is not making things worse? With older Git, "git diff-files"
will report that such a file is not up to date, and then the user
will refresh the index, which will store 0 as its sd_size, so
tentatively "git status" may give wrong information, but we
probably do not care? Is that how the reasoning goes?
> +/*
> + * Munge st_size into an unsigned int.
> + */
> +static unsigned int munge_st_size(off_t st_size) {
> +	unsigned int sd_size = st_size;
> +
> +	/*
> +	 * If the file is an exact multiple of 4 GiB, modify the value so it
> +	 * doesn't get marked as racily clean (zero).
> +	 */
> +	if (!sd_size && st_size)
> +		return 0x80000000;
> +	else
> +		return sd_size;
> +}
This assumes typeof(sd_size) aka "unsigned int" is always 32-bit,
which does not sound reasonable. References to 4GiB, 2^32 and 2^31
in the code and the proposed commit log message need to be qualified
with "on a platform whose uint is 32-bit" or something, or better
yet, phrased in a way that is agnostic to the integer size. At
the very least, the hardcoded 0x80000000 needs to be rethought, I
am afraid.
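To illustrate what "agnostic to the integer size" could look like, here is a minimal sketch, not taken from the patch. The names munge_st_size_portable and SD_SIZE_HIGH_BIT are hypothetical, and the sketch assumes off_t is wider than unsigned int. It derives the sentinel from UINT_MAX instead of hardcoding 0x80000000, so it sets the top bit of whatever width "unsigned int" has.

```c
#include <limits.h>
#include <sys/types.h>

/*
 * Hypothetical constant: only the most significant bit of an
 * unsigned int set, computed from UINT_MAX so that no particular
 * width is assumed.  With a 32-bit unsigned int this is 0x80000000.
 */
#define SD_SIZE_HIGH_BIT (~(UINT_MAX >> 1))

/*
 * Hypothetical variant of munge_st_size() from the quoted patch.
 * Converting off_t to unsigned int reduces the value modulo
 * UINT_MAX + 1, so a nonzero size that is an exact multiple of
 * that modulus would otherwise be stored as zero.
 */
static unsigned int munge_st_size_portable(off_t st_size)
{
	unsigned int sd_size = (unsigned int)st_size;

	if (!sd_size && st_size)
		return SD_SIZE_HIGH_BIT;
	return sd_size;
}
```

The comments and the log message could then speak of "a size that is a nonzero multiple of UINT_MAX + 1" rather than of 4GiB.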
Other than that, the workaround for the racy-git avoidance code does
sound good. I actually wonder if we should always use 1 regardless
of integer size.