All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Karthik Nayak <karthik.188@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 03/10] reftable/record: handle overflows when decoding varints
Date: Mon, 20 Jan 2025 16:09:48 +0100	[thread overview]
Message-ID: <Z45nMqLUHwMGEy11@pks.im> (raw)
In-Reply-To: <CAOLa=ZQxp=tmmBwAV2OR9ODLGf_VHLxG_50-YwN7-s7+c6pmNQ@mail.gmail.com>

On Mon, Jan 20, 2025 at 04:47:47AM -0500, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/reftable/record.c b/reftable/record.c
> > index 04429d23fe..4e6541c307 100644
> > --- a/reftable/record.c
> > +++ b/reftable/record.c
> > @@ -21,47 +21,40 @@ static void *reftable_record_data(struct reftable_record *rec);
> >
> >  int get_var_int(uint64_t *dest, struct string_view *in)
> >  {
> > -	int ptr = 0;
> > +	const unsigned char *buf = in->buf;
> > +	unsigned char c;
> >  	uint64_t val;
> >
> > -	if (in->len == 0)
> > +	if (!in->len)
> >  		return -1;
> > -	val = in->buf[ptr] & 0x7f;
> > -
> > -	while (in->buf[ptr] & 0x80) {
> > -		ptr++;
> > -		if (ptr > in->len) {
> > +	c = *buf++;
> > +	val = c & 0x7f;
> > +
> > +	while (c & 0x80) {
> > +		val += 1;
> 
> I was at first confused, I understand that we add 1 to check if there is
> an overflow before adding the next section. But this actually modifies
> the value itself, but looking below at `put_var_int()`, we did value--
> before storing each continuation byte. So during decoding.
> 
> Nit: it would be nice to explain that part a bit here with comments.

Yeah, I had to think about it a bit myself. It's quite a clever
optimization: when the 0x80 bit is set, we know that the remaining value
cannot be 0. We thus don't have to represent that value, which is why we
can subtract 1 when encoding and re-add 1 when decoding. This allows us
to save a byte in some edge cases.

[snip]
> > -int put_var_int(struct string_view *dest, uint64_t val)
> > +int put_var_int(struct string_view *dest, uint64_t value)
> >  {
> > -	uint8_t buf[10] = { 0 };
> > -	int i = 9;
> > -	int n = 0;
> > -	buf[i] = (uint8_t)(val & 0x7f);
> > -	i--;
> > -	while (1) {
> > -		val >>= 7;
> > -		if (!val) {
> > -			break;
> > -		}
> > -		val--;
> > -		buf[i] = 0x80 | (uint8_t)(val & 0x7f);
> > -		i--;
> > -	}
> > -
> > -	n = sizeof(buf) - i - 1;
> > -	if (dest->len < n)
> > +	unsigned char varint[10];
> > +	unsigned pos = sizeof(varint) - 1;
> > +	varint[pos] = value & 127;
> 
> Nit: While the `get_var_int()` uses hexes, here we use ints. Would be
> nicer to use `0x7f` and so on and be consistent.

Yup, makes sense.

> > +	while (value >>= 7)
> > +		varint[--pos] = 128 | (--value & 127);
> > +	if (dest->len < sizeof(varint) - pos)
> >  		return -1;
> > -	memcpy(dest->buf, &buf[i + 1], n);
> > -	return n;
> > +	memcpy(dest->buf, varint + pos, sizeof(varint) - pos);
> > +	return sizeof(varint) - pos;
> >  }
> >
> >  int reftable_is_block_type(uint8_t typ)
> > diff --git a/reftable/record.h b/reftable/record.h
> > index a24cb23bd4..721d6c949a 100644
> > --- a/reftable/record.h
> > +++ b/reftable/record.h
> > @@ -34,6 +34,10 @@ static inline void string_view_consume(struct string_view *s, int n)
> >
> >  /* utilities for de/encoding varints */
> >
> 
> We should remove this, no?

Yup, good catch.

Patrick

  reply	other threads:[~2025-01-20 15:09 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-16 10:08 [PATCH 00/10] reftable: fix -Wsign-compare warnings Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 01/10] meson: stop disabling -Wsign-compare Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 02/10] reftable/record: drop unused `print` function pointer Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 03/10] reftable/record: handle overflows when decoding varints Patrick Steinhardt
2025-01-20  9:47   ` Karthik Nayak
2025-01-20 15:09     ` Patrick Steinhardt [this message]
2025-01-16 10:08 ` [PATCH 04/10] reftable/basics: adjust `common_prefix_size()` to return `size_t` Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 05/10] reftable/basics: adjust `hash_size()` to return `uint32_t` Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 06/10] reftable/block: adapt header and footer size to return a `size_t` Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 07/10] reftable/block: adjust type of the restart length Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 08/10] reftable/blocksource: adjust type of the block length Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 09/10] reftable/blocksource: adjust `read_block()` to return `ssize_t` Patrick Steinhardt
2025-01-16 10:08 ` [PATCH 10/10] reftable: address trivial -Wsign-compare warnings Patrick Steinhardt
2025-01-16 22:12   ` Junio C Hamano
2025-01-17  6:10     ` Patrick Steinhardt
2025-01-20 10:07 ` [PATCH 00/10] reftable: fix " Karthik Nayak
2025-01-20 15:10   ` Patrick Steinhardt
2025-01-20 16:17 ` [PATCH v2 " Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 01/10] meson: stop disabling -Wsign-compare Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 02/10] reftable/record: drop unused `print` function pointer Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 03/10] reftable/record: handle overflows when decoding varints Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 04/10] reftable/basics: adjust `common_prefix_size()` to return `size_t` Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 05/10] reftable/basics: adjust `hash_size()` to return `uint32_t` Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 06/10] reftable/block: adapt header and footer size to return a `size_t` Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 07/10] reftable/block: adjust type of the restart length Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 08/10] reftable/blocksource: adjust type of the block length Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 09/10] reftable/blocksource: adjust `read_block()` to return `ssize_t` Patrick Steinhardt
2025-01-20 16:17   ` [PATCH v2 10/10] reftable: address trivial -Wsign-compare warnings Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z45nMqLUHwMGEy11@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.