All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Vernet <void@manifault.com>
To: Andrew Werner <awerner32@gmail.com>
Cc: bpf@vger.kernel.org, kernel-team@dataexmachina.dev,
	alexei.starovoitov@gmail.com, andrii@kernel.org,
	olsajiri@gmail.com, houtao@huaweicloud.com
Subject: Re: [PATCH bpf-next v3] libbpf: handle producer position overflow
Date: Thu, 24 Aug 2023 17:46:01 -0500	[thread overview]
Message-ID: <20230824224601.GC11642@maniforge> (raw)
In-Reply-To: <20230824220907.1172808-1-awerner32@gmail.com>

On Thu, Aug 24, 2023 at 06:09:08PM -0400, Andrew Werner wrote:
> Before this patch, the producer position could overflow `unsigned
> long`, in which case libbpf would forever stop processing new writes to
> the ringbuf. Similarly, overflows of the producer position could result
> in __bpf_user_ringbuf_peek not discovering available data. This patch
> addresses that bug by computing using the signed delta between the
> consumer and producer position to determine if data is available; the
> delta computation is robust to overflow.
> 
> A more defensive check could be to ensure that the delta is within
> the allowed range, but such defensive checks are neither present in
> the kernel side code nor in libbpf. The overflow that this patch
> handles can occur while the producer and consumer follow a correct
> protocol.
> 
> Secondarily, the type used to represent the positions in the
> user_ring_buffer functions in both libbpf and the kernel has been
> changed from u64 to unsigned long to match the type used in the
> kernel's representation of the structure. The change occurs in the
> same patch because it's required to align the data availability
> calculations between the userspace producing ringbuf and the bpf
> producing ringbuf.
> 
> Not included in this patch, a selftest was written to demonstrate the
> bug, and indeed this patch allows the test to continue to make progress
> past the overflow. The shape of the self test was as follows:
> 
>  a) Set up ringbuf of 2GB size (the maximum permitted size).
>  b) reserve+commit maximum-sized records (ULONG_MAX/4) constantly as
>     fast as possible.
> 
> With 1 million records per second repro time should be about 4.7 hours.
> Such a test duration is impractical to run, hence the omission.
> 
> Additionally, this patch adds commentary around a separate point to note
> that the modular arithmetic is valid in the face of overflows, as that
> fact may not be obvious to future readers.
> 
> v2->v3:
>   - Changed the representation of the consumer and producer positions
>     from u64 to unsigned long in user_ring_buffer functions. 
>   - Addressed overflow in __bpf_user_ringbuf_peek.
>   - Changed data availability computations to use the signed delta
>     between the consumer and producer positions rather than merely
>     checking whether their values were unequal.
> v1->v2:
>   - Fixed comment grammar.
>   - Properly formatted subject line.
> 
> Signed-off-by: Andrew Werner <awerner32@gmail.com>

Hey Andrew,

This LGTM, thanks for finding and fixing this. I left a comment below
about the cast >= vs. equality comparison, but I won't block on it given
that it's already been discussed on another thread. Here's my tag:

Reviewed-by: David Vernet<void@manifault.com>

> ---
>  kernel/bpf/ringbuf.c    | 11 ++++++++---
>  tools/lib/bpf/ringbuf.c | 16 +++++++++++++---
>  2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index f045fde632e5..0c48673520fb 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c
> @@ -658,7 +658,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  {
>  	int err;
>  	u32 hdr_len, sample_len, total_len, flags, *hdr;
> -	u64 cons_pos, prod_pos;
> +	unsigned long cons_pos, prod_pos;
>  
>  	/* Synchronizes with smp_store_release() in user-space producer. */
>  	prod_pos = smp_load_acquire(&rb->producer_pos);
> @@ -667,7 +667,12 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  
>  	/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
>  	cons_pos = smp_load_acquire(&rb->consumer_pos);
> -	if (cons_pos >= prod_pos)
> +
> +	/* Check if there's data available by computing the signed delta between
> +	 * cons_pos and prod_pos; a negative delta indicates that the consumer has
> +	 * not caught up. This formulation is robust to prod_pos wrapping around.
> +	 */
> +	if ((long)(cons_pos - prod_pos) >= 0)

I see that Andrii suggested doing it this way in [0] so I won't insist
on changing it, but IMO this is much less readable and more confusing
than just doing an if (cond_pos == prod_pos) with a comment. The way
it's written this way, it makes it look like there could be a situation
where cond_pos could be ahead of prod_pos, whereas that would actually
just be a bug elsewhere that we'd be papering over. I guess this is more
defensive. In any case, I won't insit on it needing to change.

[0]: https://lore.kernel.org/all/CAEf4BzZQQ=fz+NqFHhJcqKoVAvh4=XbH7HWaHKjUg5OOzi-PTw@mail.gmail.com/

>  		return -ENODATA;
>  
>  	hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
> @@ -711,7 +716,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  
>  static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
>  {
> -	u64 consumer_pos;
> +	unsigned long consumer_pos;
>  	u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
>  
>  	/* Using smp_load_acquire() is unnecessary here, as the busy-bit
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index 02199364db13..141030a89370 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -237,7 +237,13 @@ static int64_t ringbuf_process_ring(struct ring *r)
>  	do {
>  		got_new_data = false;
>  		prod_pos = smp_load_acquire(r->producer_pos);
> -		while (cons_pos < prod_pos) {
> +
> +		/* Check if there's data available by computing the signed delta
> +		 * between cons_pos and prod_pos; a negative delta indicates that the
> +		 * consumer has not caught up. This formulation is robust to prod_pos
> +		 * wrapping around.
> +		 */
> +		while ((long)(cons_pos - prod_pos) < 0) {

Same here. Doing a != is much more clear, in my opinion. Not a blocker
though.

[...]

Thanks,
David

  reply	other threads:[~2023-08-24 22:46 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-24 22:09 [PATCH bpf-next v3] libbpf: handle producer position overflow Andrew Werner
2023-08-24 22:46 ` David Vernet [this message]
2023-08-24 23:42   ` Andrew Werner
2023-08-25 18:25     ` Andrii Nakryiko
2023-08-25  3:17 ` Hou Tao
2023-08-25 16:00   ` Daniel Borkmann
2023-08-25 16:40     ` Andrew Werner
2023-08-25 15:27 ` Daniel Borkmann
2023-08-25 16:39   ` Andrew Werner
2023-08-25 17:23     ` Daniel Borkmann
2023-08-25 17:49       ` Andrew Werner
2023-08-25 18:28       ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230824224601.GC11642@maniforge \
    --to=void@manifault.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=awerner32@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=houtao@huaweicloud.com \
    --cc=kernel-team@dataexmachina.dev \
    --cc=olsajiri@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.