From: shejialuo <shejialuo@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
Karthik Nayak <karthik.188@gmail.com>,
git@vger.kernel.org, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
Date: Sun, 5 Oct 2025 22:11:08 +0800
Message-ID: <aOJ8fAZVQ8y1oMgR@ArchLinux>
In-Reply-To: <20250924053601.GC1173044@coredump.intra.peff.net>

On Wed, Sep 24, 2025 at 01:36:01AM -0400, Jeff King wrote:
> On Tue, Sep 23, 2025 at 11:48:36AM -0700, Junio C Hamano wrote:
>
> > >> 1. It prevents us from using the full range of size_t, which is
> > >> necessary for large string list.
> >
> > It is a disease to think that countable things must be counted in
> > size_t and it needs to be somehow cured.
> >
> > It is a type to count the size of memory allocations, nothing more.
> > If you are holding 1000 bytes per item you are counting, you
> > would not need the full range of size_t --- you'll run out of
> > memory way before you fill size_t with the things you are counting.
> >
> > When there are no external constraints (like you need to specify
> > exact size to describe a file format to be interoperable), the most
> > appropriate type to count things in is a platform natural "int".
> > You wouldn't be handling billions of strings in string-list anyway
> > (and that is smaller than half of 32-bit size_t; 64-bit size_t is
> > much larger).
>
> I agree that size_t is much more than one needs for counting most
> things. But the problem is that "int" is much too small, if you are
> worried about malicious input causing integer overflows that could cause
> memory access errors.
>
> A nice property of counting everything as size_t is that if we are
> storing even a single byte per item, we will fail to allocate before
> hitting an integer overflow. So no, we do not expect to store billions
> of strings. But it is not that hard to convince Git to allocate billions
> of items in a list on a 64-bit system with 32-bit ints. And it is nice
> to know that iterating over them or trying to extend the array will
> never hit an integer overflow bug.
>
Makes sense.
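To make the overflow argument concrete, here is a minimal sketch,
assuming a platform where "int" is narrower than "size_t". The helper
names are hypothetical and only for illustration; they are not Git's
API:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: can "nr" be stored in an int without truncation? */
static int fits_in_int(size_t nr)
{
	return nr <= (size_t)INT_MAX;
}

/* Hypothetical helper: narrow a count to int, refusing to truncate. */
static int size_to_int(size_t nr)
{
	assert(fits_in_int(nr));
	return (int)nr;
}

/*
 * Hypothetical helper: would "nr" items of "item_size" bytes need more
 * bytes than size_t can express? With item_size == 1 this is never
 * true, so a size_t item count cannot wrap before the allocation
 * itself becomes impossible.
 */
static int alloc_size_overflows(size_t nr, size_t item_size)
{
	return item_size && nr > SIZE_MAX / item_size;
}
```

With 32-bit ints, fits_in_int() already fails at 2^31 items, which is
reachable on a 64-bit system, whereas a size_t count of items that
each take at least one byte runs out of memory first.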
> I'd say the "right" size for preventing overflows probably only needs to
> be 58-60 bits or so, since usually we are storing more than one byte
> (plus overhead). But 64-bit is the natural machine word size that
> matches what we want. However, we should _not_ be worried about losing
> one bit to making it signed, especially if that makes it less
> error-prone to convert instances of "int" to use "size_t". I would be
> surprised if an attacker could convince a program to truly use up half
> of its address space.
>
> > >> 2. Using int for indices while other parts of the codebase use size_t
> > >> creates signed comparison warnings when these values are compared.
> >
> > The other thing may be (mis)using size_t when it should not be. If
> > they were also using "int" that would also squelch the warnings from
> > "-Wsign-compare".
>
> So I really care only about truncation and overflow above. Sign issues
> can cause bugs, of course, but the real issue is the size mismatch
> between "int" and "size_t". And while -Wsign-compare is sometimes an
> easy way to find those mismatches (because of the sign mismatch between
> them), it may bring more hassle than it's worth.
>
That's right. I will improve my commit message to state the correct
motivation.
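For reference, the shape of the interface this patch moves toward can
be sketched roughly as follows. This is a simplified, hypothetical
stand-in over a plain sorted array, not the actual string-list code;
the function name and signature are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: binary-search a sorted array and return the
 * position where "key" is, or where it would have to be inserted.
 * Whether it was found is reported through "exact_match" (which may
 * be NULL) instead of being encoded as a negative return value, so
 * the index itself can be a size_t.
 */
static size_t find_insert_index(const char **items, size_t nr,
				const char *key, bool *exact_match)
{
	size_t lo = 0, hi = nr;

	assert(items || !nr);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(key, items[mid]);

		if (!cmp) {
			if (exact_match)
				*exact_match = true;
			return mid;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}

	if (exact_match)
		*exact_match = false;
	return lo;
}
```

Reporting the match through a separate flag leaves the whole
non-negative range available for the index, which a negative-encoded
"int" return value cannot offer.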
Thanks,
Jialuo