All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Patrick Steinhardt <ps@pks.im>,
	Oswald Buddenhagen <oswald.buddenhagen@gmx.de>,
	git@vger.kernel.org
Subject: Re: [PATCH 2/9] commit-graph: stop using signed integers to count bloom filters
Date: Mon, 4 Aug 2025 17:44:06 -0400	[thread overview]
Message-ID: <aJEppnTkY+66IEza@nand.local> (raw)
In-Reply-To: <xmqq5xf35429.fsf@gitster.g>

On Mon, Aug 04, 2025 at 11:34:22AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > On Mon, Aug 04, 2025 at 11:13:28AM +0200, Oswald Buddenhagen wrote:
> >> On Mon, Aug 04, 2025 at 10:17:18AM +0200, Patrick Steinhardt wrote:
> >> > When writing a new commit graph we have a couple of counters that
> >> > provide statistics around what kind of bloom filters we have or have not
> >> > written. These counters naturally count from zero and are only ever
> >> > incremented, but they use a signed integer as type regardless.
> >> >
> >> > Refactor those fields to be of type `size_t` instead.
> >> >
> >> mind elaborating on that choice?
> >
> > We tend to use `size_t` when counting stuff.
>
> And I would have to say that it is wrong and we need to wean
> ourselves from such a superstition.  Unless you are measuring how
> big a memory block you ask from the allocator, the platform natural
> integer is often the right type to do the counting.
>
> Each of your "stuff" may weigh N megabytes in core, and if you have
> M of them, you may have to ask (N*2**20)*M bytes of memory from the
> allocator.  Your (N*2**20)*M must fit size_t _and_ you must compute
> it without overflowing or wrapping around.
>
> None of the above mean you have to express N in size_t, though.
> And more importantly, nobody gives you any extra guarantee that you
> would compute the result correctly if you used size_t.  You can write
> the right code with platform natural integer, and you have to take
> the same care (e.g. by using st_mult()) to catch integer overflows
> even if you used size_t.

Agreed. I think it makes sense to use size_t to keep track of, say, the
length and allocated size of a buffer, but when it comes to "counting"
something that isn't directly related to memory allocation or pointer
arithmetic, size_t is usually not the right choice.

For instance, the MIDX code counts the number of objects and packs in a
given MIDX (and likewise for its base MIDX(s)), but those are all
uint32_t. You could make the case to say that, "well, they are encoded
in the file format as 4-byte unsigned values, so we should treat them
the same way in memory at read-time", and I think that's reasonable. In
that instance, using "int" would be the wrong choice, since I have
definitely seen repositories that have in excess of 2^32-1 objects.
But there is no reason that we shouldn't use size_t to make that count.

> > ... Regarding the data size I
> > don't really think that matters much. It's not like we have hundreds of
> > thousands of commit graphs in-memory at any point in time.
>
> Aren't you saying that a platform natural integer is a much better
> fit?
>
> As to signedness, it sometimes is better for a struct member that is
> used to record the number of "stuff" you have to be a signed integer
> that is initialized to -1 to signal "we haven't counted so we do not
> yet know how many there are".  So
>
>     These counters naturally count from zero and are only ever
>     incremented.
>
> is not always a valid excuse to insist that such a variable must be
> unsigned.

I wrote these counters in 312cff5207 (bloom: split 'get_bloom_filter()'
in two, 2020-09-16) and 59f0d5073f (bloom: encode out-of-bounds filters
as non-empty, 2020-09-17), and I don't see a compelling reason that
these should be unsigned.

It's true that we don't have any need for negative values here since we
are counting from zero, but I don't think that alone justifies changing
the signed-ness here.

Is there a reason beyond "these are always non-negative" that changing
the signed-ness is warranted? If so, let's discuss that and make sure
that it is documented in the commit message. If not, I think we could
drop this patch (and optionally the patch before it as well).

Thanks,
Taylor

  reply	other threads:[~2025-08-04 21:44 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-04  8:17 [PATCH 0/9] commit-graph: remove reliance on global state Patrick Steinhardt
2025-08-04  8:17 ` [PATCH 1/9] trace2: introduce function to trace unsigned integers Patrick Steinhardt
2025-08-04 21:33   ` Taylor Blau
2025-08-04  8:17 ` [PATCH 2/9] commit-graph: stop using signed integers to count bloom filters Patrick Steinhardt
2025-08-04  9:13   ` Oswald Buddenhagen
2025-08-04 11:18     ` Patrick Steinhardt
2025-08-04 18:34       ` Junio C Hamano
2025-08-04 21:44         ` Taylor Blau [this message]
2025-08-06  6:23           ` Patrick Steinhardt
2025-08-06 12:54             ` Oswald Buddenhagen
2025-08-06 19:04               ` Junio C Hamano
2025-08-06 15:41             ` Junio C Hamano
2025-08-07  7:04               ` Patrick Steinhardt
2025-08-07 22:41                 ` Junio C Hamano
2025-08-11  8:05                   ` Patrick Steinhardt
2025-08-05 15:13         ` Junio C Hamano
2025-08-04 21:42   ` Taylor Blau
2025-08-04  8:17 ` [PATCH 3/9] commit-graph: fix type for some write options Patrick Steinhardt
2025-08-04 21:52   ` Taylor Blau
2025-08-04  8:17 ` [PATCH 4/9] commit-graph: fix sign comparison warnings Patrick Steinhardt
2025-08-04 22:04   ` Taylor Blau
2025-08-06  6:52     ` Patrick Steinhardt
2025-08-04  8:17 ` [PATCH 5/9] commit-graph: stop using `the_hash_algo` via macros Patrick Steinhardt
2025-08-04 22:05   ` Taylor Blau
2025-08-04  8:17 ` [PATCH 6/9] commit-graph: store the hash algorithm instead of its length Patrick Steinhardt
2025-08-04 22:07   ` Taylor Blau
2025-08-04  8:17 ` [PATCH 7/9] commit-graph: stop using `the_hash_algo` Patrick Steinhardt
2025-08-04 22:10   ` Taylor Blau
2025-08-06  6:53     ` Patrick Steinhardt
2025-08-04  8:17 ` [PATCH 8/9] commit-graph: stop using `the_repository` Patrick Steinhardt
2025-08-04 22:11   ` Taylor Blau
2025-08-04  8:17 ` [PATCH 9/9] commit-graph: stop passing in redundant repository Patrick Steinhardt
2025-08-05  4:27 ` [PATCH 0/9] commit-graph: remove reliance on global state Derrick Stolee
2025-08-06  6:53   ` Patrick Steinhardt
2025-08-06 12:00 ` [PATCH v2 00/10] " Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 01/10] trace2: introduce function to trace unsigned integers Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 02/10] commit-graph: stop using signed integers to count Bloom filters Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 03/10] commit-graph: fix type for some write options Patrick Steinhardt
2025-08-06 12:34     ` Oswald Buddenhagen
2025-08-06 15:40       ` Junio C Hamano
2025-08-07  7:07         ` Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 04/10] commit-graph: fix sign comparison warnings Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 05/10] commit-graph: stop using `the_hash_algo` via macros Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 06/10] commit-graph: store the hash algorithm instead of its length Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 07/10] commit-graph: refactor `parse_commit_graph()` to take a repository Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 08/10] commit-graph: stop using `the_hash_algo` Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 09/10] commit-graph: stop using `the_repository` Patrick Steinhardt
2025-08-06 12:00   ` [PATCH v2 10/10] commit-graph: stop passing in redundant repository Patrick Steinhardt
2025-08-07  8:04 ` [PATCH v3 00/10] commit-graph: remove reliance on global state Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 01/10] trace2: introduce function to trace unsigned integers Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 02/10] commit-graph: stop using signed integers to count Bloom filters Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 03/10] commit-graph: fix type for some write options Patrick Steinhardt
2025-08-07 22:40     ` Junio C Hamano
2025-08-11  8:24       ` Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 04/10] commit-graph: fix sign comparison warnings Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 05/10] commit-graph: stop using `the_hash_algo` via macros Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 06/10] commit-graph: store the hash algorithm instead of its length Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 07/10] commit-graph: refactor `parse_commit_graph()` to take a repository Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 08/10] commit-graph: stop using `the_hash_algo` Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 09/10] commit-graph: stop using `the_repository` Patrick Steinhardt
2025-08-07  8:04   ` [PATCH v3 10/10] commit-graph: stop passing in redundant repository Patrick Steinhardt
2025-08-15  5:49 ` [PATCH v4 0/6] commit-graph: remove reliance on global state Patrick Steinhardt
2025-08-15  5:49   ` [PATCH v4 1/6] commit-graph: stop using `the_hash_algo` via macros Patrick Steinhardt
2025-08-15  5:49   ` [PATCH v4 2/6] commit-graph: store the hash algorithm instead of its length Patrick Steinhardt
2025-08-15  5:49   ` [PATCH v4 3/6] commit-graph: refactor `parse_commit_graph()` to take a repository Patrick Steinhardt
2025-08-15  5:49   ` [PATCH v4 4/6] commit-graph: stop using `the_hash_algo` Patrick Steinhardt
2025-08-15  5:49   ` [PATCH v4 5/6] commit-graph: stop using `the_repository` Patrick Steinhardt
2025-08-15  5:49   ` [PATCH v4 6/6] commit-graph: stop passing in redundant repository Patrick Steinhardt
2025-08-15 15:17   ` [PATCH v4 0/6] commit-graph: remove reliance on global state Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJEppnTkY+66IEza@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=oswald.buddenhagen@gmx.de \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.