git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Cc: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>,
	Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH] git-compat-util: introduce `count_t` typedef
Date: Thu, 7 Aug 2025 15:17:14 +0100	[thread overview]
Message-ID: <582e8e75-c6eb-4845-8f3b-62f234f0964f@gmail.com> (raw)
In-Reply-To: <20250807-pks-introduce-count-t-v1-1-e96be52d8db1@pks.im>

Hi Patrick

On 07/08/2025 10:22, Patrick Steinhardt wrote:
> Historically, Git has been very lenient with its use of integer types
> and didn't really give much thought into which type to use in what
> situation. We interchangeably mix and match signed and unsigned types
> and often times blindly convert them. This use has led to several
> out-of-bounds reads and writes in the past, some of which could be
> turned into arbitrary code execution.

My feeling is that one of the main problems has been using different 
types for loop indexes and loop limits. If we mandated that the loop 
index had to be the same type as the limit that would improve things 
considerably and without mandating a particular type.

> A discussion that regularly comes up in this context though is what
> types to use for counting entities:
> 
>    - One question is whether the type should be signed or unsigned.
>      Arguably, the answer should be to use unsigned types as long as we
>      know that we never need a negative value, e.g. as a sentinel. This
>      helps guide the reader and explicitly conveys the sense that such a
>      counter is only ever going to be a non-negative number. Otherwise,
>      code would need to be more careful as it may hold negative values.

The counter argument to this is that it is easy to write incorrect loops 
when counting down if the loop variable is unsigned. Using a typedef 
that hides the actual type makes that harder to spot as it is not 
immediately obvious whether the loop index is signed or not. As we have 
cases that do need to store a negative value then we're still left with 
using a mix of signed and unsigned types for counting in our code base.

> Introduce a new typedef for `count_t` that is of type `uintptr_t` to
> give clear guidance what type to use for counting entities. This type
> was chosen because in the worst case, an entity may be a single byte and
> we fill all of our memory with these entities. As `uintptr_t` is
> guaranteed to hold at least the value of a pointer, we know that it
> could be used to index into every single such entity.

How many sites actually allocate anything like that number of 
entities?Generally we use ALLOC_GROW() or ALLOC_GROW_BY() which means 
that we're not normally counting bytes. ALLOC_GROW_BY() assumes the 
number of entities fits into a size_t so should be be changing that to 
use count_t? If we're worried about overflows then maybe we should look 
at alloc_nr() which calculates the new allocation with

     (nr + 16) * 3 / 2

which which will start overflowing long before we starting allocating 
UINTPTR_MAX single byte entities.

Thanks

Phillip


  parent reply	other threads:[~2025-08-07 14:17 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-07  9:22 [PATCH] git-compat-util: introduce `count_t` typedef Patrick Steinhardt
2025-08-07 11:00 ` Matthias Aßhauer
2025-08-07 14:17 ` Phillip Wood [this message]
2025-08-07 16:43   ` Junio C Hamano
2025-08-07 16:38 ` Junio C Hamano
2025-08-07 22:07   ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=582e8e75-c6eb-4845-8f3b-62f234f0964f@gmail.com \
    --to=phillip.wood123@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=oswald.buddenhagen@gmx.de \
    --cc=peff@peff.net \
    --cc=phillip.wood@dunelm.org.uk \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).