From: Taylor Blau <me@ttaylorr.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Patrick Steinhardt <ps@pks.im>,
Oswald Buddenhagen <oswald.buddenhagen@gmx.de>,
git@vger.kernel.org
Subject: Re: [PATCH 2/9] commit-graph: stop using signed integers to count bloom filters
Date: Mon, 4 Aug 2025 17:44:06 -0400
Message-ID: <aJEppnTkY+66IEza@nand.local>
In-Reply-To: <xmqq5xf35429.fsf@gitster.g>

On Mon, Aug 04, 2025 at 11:34:22AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > On Mon, Aug 04, 2025 at 11:13:28AM +0200, Oswald Buddenhagen wrote:
> >> On Mon, Aug 04, 2025 at 10:17:18AM +0200, Patrick Steinhardt wrote:
> >> > When writing a new commit graph we have a couple of counters that
> >> > provide statistics around what kind of bloom filters we have or have not
> >> > written. These counters naturally count from zero and are only ever
> >> > incremented, but they use a signed integer as type regardless.
> >> >
> >> > Refactor those fields to be of type `size_t` instead.
> >> >
> >> mind elaborating on that choice?
> >
> > We tend to use `size_t` when counting stuff.
>
> And I would have to say that it is wrong and we need to wean
> ourselves from such a superstition. Unless you are measuring how
> big a memory block you ask from the allocator, the platform natural
> integer is often the right type to do the counting.
>
> Each of your "stuff" may weigh N megabytes in core, and if you have
> M of them, you may have to ask (N*2**20)*M bytes of memory from the
> allocator. Your (N*2**20)*M must fit size_t _and_ you must compute
> it without overflowing or wrapping around.
>
> None of the above mean you have to express N in size_t, though.
> And more importantly, nobody gives you any extra guarantee that you
> would compute the result correctly if you used size_t. You can write
> the right code with platform natural integer, and you have to take
> the same care (e.g. by using st_mult()) to catch integer overflows
> even if you used size_t.

Agreed. I think it makes sense to use size_t to keep track of, say, the
length and allocated size of a buffer, but when it comes to "counting"
something that isn't directly related to memory allocation or pointer
arithmetic, size_t is usually not the right choice.
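
As a minimal sketch of that distinction (checked_mult() and
alloc_items() are hypothetical names made up for illustration, not Git's
actual helpers), the count can stay a plain int while only the byte size
handed to the allocator is a size_t, and the multiplication is checked
for overflow regardless of which type holds the count:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: store a * b in *out and return 0, or return -1
 * if the product would not fit in a size_t.
 */
static int checked_mult(size_t a, size_t b, size_t *out)
{
	if (a && b > SIZE_MAX / a)
		return -1;
	*out = a * b;
	return 0;
}

/* Hypothetical item type, only here to give the allocation a size. */
struct item {
	char payload[64];
};

/*
 * The number of items is a platform natural integer. Only the number
 * of bytes requested from malloc() is expressed as a size_t.
 */
static struct item *alloc_items(int nr)
{
	size_t bytes;

	if (nr < 0 || checked_mult((size_t)nr, sizeof(struct item), &bytes))
		return NULL;
	return malloc(bytes);
}
```

Using size_t for "nr" here would not remove the need for the overflow
check; it would only change the type of the value being checked.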

For instance, the MIDX code counts the number of objects and packs in a
given MIDX (and likewise for its base MIDX(s)), but those are all
uint32_t. You could make the case that, "well, they are encoded in the
file format as 4-byte unsigned values, so we should treat them the same
way in memory at read-time", and I think that's reasonable. In that
instance, using "int" would be the wrong choice, since I have definitely
seen repositories that have in excess of 2^31-1 objects.

But there is no reason that we should use size_t to keep that count.

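To make the file-format argument concrete, here is a rough sketch
(read_be32() and fits_in_int() are hypothetical helpers written for this
example, not functions from the MIDX code, and the big-endian layout is
an assumption of the example) of decoding a 4-byte unsigned count and
checking whether it would also fit in a signed int:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: decode a 4-byte big-endian unsigned value. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}

/* Return 1 if the decoded count could also be stored in an int. */
static int fits_in_int(uint32_t count)
{
	return (unsigned long long)count <= (unsigned long long)INT_MAX;
}
```

On a platform with a 32-bit int, any count above INT_MAX fails
fits_in_int(), while a uint32_t holds every value the 4-byte field can
encode.
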
> > ... Regarding the data size I
> > don't really think that matters much. It's not like we have hundreds of
> > thousands of commit graphs in-memory at any point in time.
>
> Aren't you saying that a platform natural integer is a much better
> fit?
>
> As to signedness, it sometimes is better for a struct member that is
> used to record the number of "stuff" you have to be a signed integer
> that is initialized to -1 to signal "we haven't counted so we do not
> yet know how many there are". So
>
> These counters naturally count from zero and are only ever
> incremented.
>
> is not always a valid excuse to insist that such a variable must be
> unsigned.

I wrote these counters in 312cff5207 (bloom: split 'get_bloom_filter()'
in two, 2020-09-16) and 59f0d5073f (bloom: encode out-of-bounds filters
as non-empty, 2020-09-17), and I don't see a compelling reason that
these should be unsigned.

It's true that we don't have any need for negative values here since we
are counting from zero, but I don't think that alone justifies changing
the signed-ness here.
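
For illustration only, the "-1 means we haven't counted yet" convention
from the quoted text could look like the sketch below. The struct,
macro, and function names are hypothetical and are not taken from the
commit-graph code:

```c
#include <stddef.h>

/* Hypothetical structure, not Git's commit-graph write context. */
struct filter_stats {
	int nr_filters; /* -1 means "not counted yet" */
};

#define FILTER_STATS_INIT { -1 }

/*
 * Count the non-zero entries of "present" the first time this is
 * called, then return the cached result on later calls.
 */
static int count_filters(struct filter_stats *stats,
			 const unsigned char *present, size_t len)
{
	if (stats->nr_filters < 0) {
		int nr = 0;

		for (size_t i = 0; i < len; i++)
			if (present[i])
				nr++;
		stats->nr_filters = nr;
	}
	return stats->nr_filters;
}
```

An unsigned field has no spare value for that "unknown" state unless one
of its valid counts is reserved as a marker.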

Is there a reason beyond "these are always non-negative" that changing
the signed-ness is warranted? If so, let's discuss that and make sure
that it is documented in the commit message. If not, I think we could
drop this patch (and optionally the patch before it as well).

Thanks,
Taylor