From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <derrickstolee@github.com>,
Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Subject: Re: [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation
Date: Fri, 24 Mar 2023 19:08:56 -0400 [thread overview]
Message-ID: <ZB4tiLfNsD3LW/JS@nand.local> (raw)
In-Reply-To: <20230321175612.GF3119834@coredump.intra.peff.net>
On Tue, Mar 21, 2023 at 01:56:12PM -0400, Jeff King wrote:
> I like the idea of centralizing the bounds checks here.
>
> I do think copying lseek() is a little awkward, though. As you note, we
> only care about SEEK_SET and SEEK_CUR, and we have to BUG() on anything
> else. Which means we have a run-time check, rather than a compile time
> check. If we had two functions like:
>
> void bitmap_index_seek_to(struct bitmap_index *bitmap_git, size_t pos)
> {
> bitmap_git->map_pos = pos;
> if (bitmap_git->map_pos >= bitmap_git->map_size)
> ...etc...
> }
>
> void bitmap_index_incr(struct bitmap_index *bitmap_git, size_t offset)
> {
> bitmap_index_seek_to(bitmap_git, st_add(bitmap_git->map_pos, offset));
> }
>
> then the compiler can see all cases directly. I dunno how much it
> matters.
Yeah, I think that trying to copy the interface of lseek() was a
mistake, since it ended up creating more awkwardness than anything else.
I like the combination of bitmap_index_seek_to() and _incr(), though
though I called the latter bitmap_index_seek_set() instead.
We've got to be a little reminiscent of lseek() after all ;-).
> The other thing that's interesting from a bounds-checking perspective is
> that you're checking the seek _after_ you've read an item. Which has two
> implications:
>
> - I don't think we could ever overflow size_t here.
Yep, agreed. Let's drop the st_add() call there entirely, since it's not
doing anything useful and just serving to confuse the reader.
> - The more interesting case is that we are not overflowing size_t, but
> simply walking past bitmap_git->map_size. But again, we are reading
> first and then seeking. So if our seek does go past, then by
> definition we just read garbage bytes, which is undefined behavior.
>
> For a bounds-check, wouldn't we want it the other way around? To
> make sure we have 4 bytes available, and then if so read them and
> increment the offset?
I thought on first blush that this would be a much larger change, but
actually it's quite small. The EWAH code already checks its reads ahead
of time, so we don't have to worry about those. It's really that
read_be32() and read_u8() do the read-then-seek.
So I think that the change here is limited to making sure that we can
read enough bytes in those functions before actually executing the read.
> > + if (bitmap_git->map_pos >= bitmap_git->map_size)
> > + BUG("bitmap position exceeds size (%"PRIuMAX" >= %"PRIuMAX")",
> > + (uintmax_t)bitmap_git->map_pos,
> > + (uintmax_t)bitmap_git->map_size);
>
> This uses ">=", which is good, because it is valid to walk the pointer
> to one past the end of the map, which is effectively EOF. But as above,
> in that condition the callers should be checking for this EOF state
> before reading the bytes.
I think that having both is useful. Checking before you read avoids
undefined behavior. Checking after you seek ensures that we never have a
bitmap_index struct with a bogus map_pos value.
Thanks,
Taylor
next prev parent reply other threads:[~2023-03-24 23:09 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-20 20:02 [PATCH 0/6] pack-bitmap: miscellaneous mmap read hardening Taylor Blau
2023-03-20 20:02 ` [PATCH 1/6] pack-bitmap.c: hide bitmap internals in `read_u8()` Taylor Blau
2023-03-21 17:35 ` Jeff King
2023-03-24 17:52 ` Derrick Stolee
2023-03-20 20:02 ` [PATCH 2/6] pack-bitmap.c: hide bitmap internals in `read_be32()` Taylor Blau
2023-03-20 20:02 ` [PATCH 3/6] pack-bitmap.c: drop unnecessary 'inline's Taylor Blau
2023-03-21 17:40 ` Jeff King
2023-03-20 20:02 ` [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation Taylor Blau
2023-03-21 17:56 ` Jeff King
2023-03-24 18:04 ` Derrick Stolee
2023-03-24 18:29 ` Jeff King
2023-03-24 23:23 ` Taylor Blau
2023-03-25 4:57 ` Jeff King
2023-03-24 23:13 ` Taylor Blau
2023-03-24 23:24 ` Taylor Blau
2023-03-24 23:08 ` Taylor Blau [this message]
2023-03-20 20:02 ` [PATCH 5/6] pack-bitmap.c: use `bitmap_index_seek()` where possible Taylor Blau
2023-03-21 18:05 ` Jeff King
2023-03-24 18:06 ` Derrick Stolee
2023-03-24 18:35 ` Jeff King
2023-03-24 19:43 ` Junio C Hamano
2023-03-24 20:37 ` Jeff King
2023-03-24 21:38 ` Junio C Hamano
2023-03-24 22:57 ` Taylor Blau
2023-03-20 20:02 ` [PATCH 6/6] pack-bitmap.c: factor out `bitmap_index_seek_commit()` Taylor Blau
2023-03-21 18:13 ` Jeff King
2023-03-21 18:16 ` Taylor Blau
2023-03-21 18:27 ` Jeff King
2023-03-24 18:09 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZB4tiLfNsD3LW/JS@nand.local \
--to=me@ttaylorr.com \
--cc=chakrabortyabhradeep79@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.