BPF List
 help / color / mirror / Atom feed
* [PATCH bpf v2 0/8] bpf: Fix ring buffer handling
@ 2026-06-19  0:26 Tamir Duberstein
  2026-06-19  0:26 ` [PATCH bpf v2 1/8] libbpf: ringbuf: Honor zero consume bounds Tamir Duberstein
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein, Sashiko

Fix correctness issues in BPF ring buffer handling.

A zero record bound currently consumes one record. A NULL callback is
accepted during manager construction but crashes when callback-based
consumption reaches the ring.

Native-width position counters are also compared by magnitude in
libbpf consumption and pending reservation tracking. The kernel
user-ring drain widens them before arithmetic. On native 32-bit systems
these paths fail after counter wrap.

The libbpf consumer can also miss a readiness notification after
publishing its position and checking for new data without a full
StoreLoad barrier. Use compiler atomics and add the missing barrier
between publishing a consumer position and the next producer-position
load, including across busy retries and early returns.

Callback traversal does not follow the overwrite position maintained by
BPF_F_RB_OVERWRITE maps. Reject callback consumption of those maps, as
discussed here:
https://lore.kernel.org/bpf/CAEf4Bzaq5drHWChXoRBnrmkb6reAsSVj8r=uByFSup31FMA7hw@mail.gmail.com/

Andrew Werner found the position-wrap and missed-wakeup failures while
implementing Aya ring buffer support. The original implementation
contains the equality reasoning and edge-triggered regression test:
https://github.com/aya-rs/aya/commit/e2cf734490bc188bcedb1eac92d23d81123e42cd

Aya later corrected the consumer ordering with the same explicit fence:
https://github.com/aya-rs/aya/commit/7277a57ea8cdb74918d3096a4b22b6d814481973

Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
Changes in v2:
- Fix additional native 32-bit wrap paths found by Sashiko review.
- Explain modular position comparisons and leave mixed-bitness ABI work
  separate.
- Preserve the wakeup handshake across bounded and callback returns.
- Follow BPF comment style and simplify wakeup test cleanup.
- Link to v1: https://patch.msgid.link/20260613-bpf-ringbuf-fixes-v1-0-e623481cb724@kernel.org

---
Tamir Duberstein (8):
      libbpf: ringbuf: Honor zero consume bounds
      libbpf: ringbuf: Prevent NULL callback crash
      libbpf: ringbuf: Reject overwrite callback use
      libbpf: ringbuf: Handle position counter wrap
      bpf: ringbuf: Handle pending position wrap
      bpf: user_ringbuf: Handle position wrap
      libbpf: ringbuf: Use compiler atomics
      libbpf: ringbuf: Prevent missed wakeups

 kernel/bpf/ringbuf.c                             |  12 +-
 tools/lib/bpf/libbpf.h                           |  35 +++-
 tools/lib/bpf/ringbuf.c                          |  97 +++++++---
 tools/testing/selftests/bpf/prog_tests/ringbuf.c | 228 +++++++++++++++++++++++
 4 files changed, 340 insertions(+), 32 deletions(-)
---
base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
change-id: 20260613-bpf-ringbuf-fixes-e9a8b3c6125b

Best regards,
--  
Tamir Duberstein <tamird@kernel.org>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH bpf v2 1/8] libbpf: ringbuf: Honor zero consume bounds
  2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
@ 2026-06-19  0:26 ` Tamir Duberstein
  2026-06-19 21:13   ` Eduard Zingerman
  2026-06-19  0:26 ` [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash Tamir Duberstein
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein

ringbuf_process_ring() checks the record bound only after advancing the
consumer position and invoking the callback. A zero bound therefore
consumes the first available record.

Return before reading the ring positions when the bound is zero so
ring_buffer__consume_n() and ring__consume_n() leave all records queued.

Fixes: 4d22ea94ea33 ("libbpf: Add ring__consume_n / ring_buffer__consume_n")
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
 tools/lib/bpf/ringbuf.c                          |  3 +++
 tools/testing/selftests/bpf/prog_tests/ringbuf.c | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index 00ec4837a06d..f2bb619d5a75 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -240,6 +240,9 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 	bool got_new_data;
 	void *sample;
 
+	if (n == 0)
+		return 0;
+
 	cons_pos = smp_load_acquire(r->consumer_pos);
 	do {
 		got_new_data = false;
diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
index 64520684d2cb..4f0558f14847 100644
--- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
@@ -404,6 +404,7 @@ static int process_n_sample(void *ctx, void *data, size_t len)
 static void ringbuf_n_subtest(void)
 {
 	struct test_ringbuf_n_lskel *skel_n;
+	struct ring *ring;
 	int err, i;
 
 	skel_n = test_ringbuf_n_lskel__open();
@@ -431,6 +432,18 @@ static void ringbuf_n_subtest(void)
 	for (i = 0; i < N_TOT_SAMPLES; i++)
 		syscall(__NR_getpgid);
 
+	ring = ring_buffer__ring(ringbuf, 0);
+	if (!ASSERT_OK_PTR(ring, "ring_buffer__ring"))
+		goto cleanup_ringbuf;
+
+	err = ring_buffer__consume_n(ringbuf, 0);
+	if (!ASSERT_EQ(err, 0, "ringbuf_consume_zero"))
+		goto cleanup_ringbuf;
+
+	err = ring__consume_n(ring, 0);
+	if (!ASSERT_EQ(err, 0, "ring_consume_zero"))
+		goto cleanup_ringbuf;
+
 	/* Consume all samples from the ring buffer in batches of N_SAMPLES */
 	for (i = 0; i < N_TOT_SAMPLES; i += err) {
 		err = ring_buffer__consume_n(ringbuf, N_SAMPLES);

-- 
2.55.0.rc0.159.gbe5d7338c2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash
  2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
  2026-06-19  0:26 ` [PATCH bpf v2 1/8] libbpf: ringbuf: Honor zero consume bounds Tamir Duberstein
@ 2026-06-19  0:26 ` Tamir Duberstein
  2026-06-19 13:54   ` Jiri Olsa
  2026-06-19  0:26 ` [PATCH bpf v2 3/8] libbpf: ringbuf: Reject overwrite callback use Tamir Duberstein
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein

ring_buffer__new() and ring_buffer__add() allow a NULL sample
callback. When callback-based consumption reaches such a ring, it calls
through the NULL function pointer and crashes.

Check every ring before manager polling or consumption so a missing
callback returns -EINVAL before the manager waits or consumes another
ring. Check the selected ring before direct per-ring consumption. Perform
both checks before honoring a zero record bound so invalid callback use
always reports the error.

Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
 tools/lib/bpf/libbpf.h                           | 11 ++-
 tools/lib/bpf/ringbuf.c                          | 43 +++++++++--
 tools/testing/selftests/bpf/prog_tests/ringbuf.c | 93 ++++++++++++++++++++++++
 3 files changed, 136 insertions(+), 11 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index bba4e8464396..9ba6b9ad3498 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1526,18 +1526,17 @@ LIBBPF_API int ring__map_fd(const struct ring *r);
  *
  * @param r A ringbuffer object.
  * @return The number of records consumed (or INT_MAX, whichever is less), or
- * a negative number if any of the callbacks return an error.
+ * a negative error code on failure.
  */
 LIBBPF_API int ring__consume(struct ring *r);
 
 /**
- * @brief **ring__consume_n()** consumes up to a requested amount of items from
- * a ringbuffer without event polling.
+ * @brief **ring__consume_n()** consumes up to a requested number of records
+ * from a ringbuffer without event polling.
  *
  * @param r A ringbuffer object.
- * @param n Maximum amount of items to consume.
- * @return The number of items consumed, or a negative number if any of the
- * callbacks return an error.
+ * @param n Maximum number of records to consume.
+ * @return The number of records consumed, or a negative error code on failure.
  */
 LIBBPF_API int ring__consume_n(struct ring *r, size_t n);
 
diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index f2bb619d5a75..0dae4d95d309 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -231,6 +231,26 @@ static inline int roundup_len(__u32 len)
 	return (len + 7) / 8 * 8;
 }
 
+static int ringbuf_validate(const struct ring *r)
+{
+	if (unlikely(!r->sample_cb))
+		return -EINVAL;
+	return 0;
+}
+
+static int ringbuf_validate_callbacks(const struct ring_buffer *rb)
+{
+	int i, err;
+
+	for (i = 0; i < rb->ring_cnt; i++) {
+		err = ringbuf_validate(rb->rings[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 {
 	int *len_ptr, len, err;
@@ -240,6 +260,9 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 	bool got_new_data;
 	void *sample;
 
+	err = ringbuf_validate(r);
+	if (err)
+		return err;
 	if (n == 0)
 		return 0;
 
@@ -284,14 +307,17 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
  * records.
  *
  * Returns number of records consumed across all registered ring buffers (or
- * n, whichever is less), or negative number if any of the callbacks return
- * error.
+ * n, whichever is less), or a negative error code on failure.
  */
 int ring_buffer__consume_n(struct ring_buffer *rb, size_t n)
 {
 	int64_t err, res = 0;
 	int i;
 
+	err = ringbuf_validate_callbacks(rb);
+	if (err)
+		return libbpf_err(err);
+
 	for (i = 0; i < rb->ring_cnt; i++) {
 		struct ring *ring = rb->rings[i];
 
@@ -309,14 +335,17 @@ int ring_buffer__consume_n(struct ring_buffer *rb, size_t n)
 
 /* Consume available ring buffer(s) data without event polling.
  * Returns number of records consumed across all registered ring buffers (or
- * INT_MAX, whichever is less), or negative number if any of the callbacks
- * return error.
+ * INT_MAX, whichever is less), or a negative error code on failure.
  */
 int ring_buffer__consume(struct ring_buffer *rb)
 {
 	int64_t err, res = 0;
 	int i;
 
+	err = ringbuf_validate_callbacks(rb);
+	if (err)
+		return libbpf_err(err);
+
 	for (i = 0; i < rb->ring_cnt; i++) {
 		struct ring *ring = rb->rings[i];
 
@@ -334,13 +363,17 @@ int ring_buffer__consume(struct ring_buffer *rb)
 
 /* Poll for available data and consume records, if any are available.
  * Returns number of records consumed (or INT_MAX, whichever is less), or
- * negative number, if any of the registered callbacks returned error.
+ * a negative error code on failure.
  */
 int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
 {
 	int i, cnt;
 	int64_t err, res = 0;
 
+	err = ringbuf_validate_callbacks(rb);
+	if (err)
+		return libbpf_err(err);
+
 	cnt = epoll_wait(rb->epoll_fd, rb->events, rb->ring_cnt, timeout_ms);
 	if (cnt < 0)
 		return libbpf_err(-errno);
diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
index 4f0558f14847..9ce996bcea8c 100644
--- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
@@ -401,6 +401,97 @@ static int process_n_sample(void *ctx, void *data, size_t len)
 	return 0;
 }
 
+static int process_noop_sample(void *ctx, void *data, size_t len)
+{
+	return 0;
+}
+
+static void ringbuf_null_cb_subtest(void)
+{
+	struct test_ringbuf_n_lskel *skel_n;
+	struct ring_buffer *ringbuf = NULL;
+	struct ring *ring;
+	unsigned long consumer_pos;
+	int no_cb_map_fd = -1;
+	int err;
+
+	skel_n = test_ringbuf_n_lskel__open();
+	if (!ASSERT_OK_PTR(skel_n, "test_ringbuf_n_lskel__open"))
+		return;
+
+	skel_n->maps.ringbuf.max_entries = getpagesize();
+	skel_n->bss->pid = getpid();
+	skel_n->bss->value = SAMPLE_VALUE;
+
+	err = test_ringbuf_n_lskel__load(skel_n);
+	if (!ASSERT_OK(err, "test_ringbuf_n_lskel__load"))
+		goto cleanup;
+
+	err = test_ringbuf_n_lskel__attach(skel_n);
+	if (!ASSERT_OK(err, "test_ringbuf_n_lskel__attach"))
+		goto cleanup;
+
+	syscall(__NR_getpgid);
+
+	no_cb_map_fd = bpf_map_create(BPF_MAP_TYPE_RINGBUF, NULL, 0, 0,
+				      getpagesize(), NULL);
+	if (!ASSERT_OK_FD(no_cb_map_fd, "bpf_map_create"))
+		goto cleanup;
+
+	/* Manager APIs must validate all rings before consuming any of them. */
+	ringbuf = ring_buffer__new(skel_n->maps.ringbuf.map_fd,
+				   process_noop_sample, NULL, NULL);
+	if (!ASSERT_OK_PTR(ringbuf, "ring_buffer__new"))
+		goto cleanup_fd;
+
+	ring = ring_buffer__ring(ringbuf, 0);
+	if (!ASSERT_OK_PTR(ring, "ring_buffer__ring"))
+		goto cleanup_ringbuf;
+
+	err = ring_buffer__add(ringbuf, no_cb_map_fd, NULL, NULL);
+	if (!ASSERT_OK(err, "ring_buffer__add_no_cb"))
+		goto cleanup_ringbuf;
+
+	consumer_pos = ring__consumer_pos(ring);
+	ASSERT_GT(ring__producer_pos(ring), consumer_pos,
+		  "producer_pos_mixed_cb");
+
+	err = ring_buffer__consume_n(ringbuf, 0);
+	ASSERT_EQ(err, -EINVAL, "ringbuf_consume_zero_mixed_cb");
+	err = ring_buffer__consume(ringbuf);
+	ASSERT_EQ(err, -EINVAL, "ringbuf_consume_mixed_cb");
+	err = ring_buffer__poll(ringbuf, 0);
+	ASSERT_EQ(err, -EINVAL, "ringbuf_poll_mixed_cb");
+	ASSERT_EQ(ring__consumer_pos(ring), consumer_pos,
+		  "consumer_pos_mixed_cb");
+
+	ring_buffer__free(ringbuf);
+	ringbuf =
+		ring_buffer__new(skel_n->maps.ringbuf.map_fd, NULL, NULL, NULL);
+	if (!ASSERT_OK_PTR(ringbuf, "ring_buffer__new_no_cb"))
+		goto cleanup_fd;
+
+	ring = ring_buffer__ring(ringbuf, 0);
+	if (!ASSERT_OK_PTR(ring, "ring_buffer__ring_no_cb"))
+		goto cleanup_ringbuf;
+	consumer_pos = ring__consumer_pos(ring);
+
+	err = ring_buffer__consume_n(ringbuf, 0);
+	ASSERT_EQ(err, -EINVAL, "ringbuf_consume_zero_no_cb");
+	err = ring__consume_n(ring, 0);
+	ASSERT_EQ(err, -EINVAL, "ring_consume_zero_no_cb");
+	err = ring__consume(ring);
+	ASSERT_EQ(err, -EINVAL, "ring_consume_no_cb");
+	ASSERT_EQ(ring__consumer_pos(ring), consumer_pos, "consumer_pos_no_cb");
+
+cleanup_ringbuf:
+	ring_buffer__free(ringbuf);
+cleanup_fd:
+	close(no_cb_map_fd);
+cleanup:
+	test_ringbuf_n_lskel__destroy(skel_n);
+}
+
 static void ringbuf_n_subtest(void)
 {
 	struct test_ringbuf_n_lskel *skel_n;
@@ -579,6 +670,8 @@ void test_ringbuf(void)
 		ringbuf_subtest();
 	if (test__start_subtest("ringbuf_n"))
 		ringbuf_n_subtest();
+	if (test__start_subtest("ringbuf_null_cb"))
+		ringbuf_null_cb_subtest();
 	if (test__start_subtest("ringbuf_map_key"))
 		ringbuf_map_key_subtest();
 	if (test__start_subtest("ringbuf_write"))

-- 
2.55.0.rc0.159.gbe5d7338c2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf v2 3/8] libbpf: ringbuf: Reject overwrite callback use
  2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
  2026-06-19  0:26 ` [PATCH bpf v2 1/8] libbpf: ringbuf: Honor zero consume bounds Tamir Duberstein
  2026-06-19  0:26 ` [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash Tamir Duberstein
@ 2026-06-19  0:26 ` Tamir Duberstein
  2026-06-19  0:26 ` [PATCH bpf v2 4/8] libbpf: ringbuf: Handle position counter wrap Tamir Duberstein
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein

BPF_F_RB_OVERWRITE can advance overwrite_pos past consumer_pos.
Callback traversal does not read overwrite_pos, so after the producer
laps the consumer it can treat overwritten data as a record header.

An earlier proposal[0] copied the readable window before invoking
callbacks. Review concluded that callbacks are a poor fit because
copying penalizes zero-copy users and the API cannot report skipped
records.

Record the map flag and reject callback consumption with -EOPNOTSUPP.

Link: https://lore.kernel.org/bpf/CAEf4Bzaq5drHWChXoRBnrmkb6reAsSVj8r=uByFSup31FMA7hw@mail.gmail.com/ [0]
Fixes: feeaf1346f80 ("bpf: Add overwrite mode for BPF ring buffer")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
 tools/lib/bpf/libbpf.h                           |  1 +
 tools/lib/bpf/ringbuf.c                          |  4 +++
 tools/testing/selftests/bpf/prog_tests/ringbuf.c | 39 ++++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 9ba6b9ad3498..ae46b17feaa6 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1439,6 +1439,7 @@ struct ring_buffer;
 struct ring;
 struct user_ring_buffer;
 
+/* Callback-based consumption is unsupported for BPF_F_RB_OVERWRITE maps. */
 typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
 
 struct ring_buffer_opts {
diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index 0dae4d95d309..5737e02c1670 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -30,6 +30,7 @@ struct ring {
 	unsigned long *producer_pos;
 	unsigned long mask;
 	int map_fd;
+	bool overwrite;
 };
 
 struct ring_buffer {
@@ -118,6 +119,7 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
 	r->sample_cb = sample_cb;
 	r->ctx = ctx;
 	r->mask = info.max_entries - 1;
+	r->overwrite = info.map_flags & BPF_F_RB_OVERWRITE;
 
 	/* Map writable consumer page */
 	tmp = mmap(NULL, rb->page_size, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
@@ -233,6 +235,8 @@ static inline int roundup_len(__u32 len)
 
 static int ringbuf_validate(const struct ring *r)
 {
+	if (unlikely(r->overwrite))
+		return -EOPNOTSUPP;
 	if (unlikely(!r->sample_cb))
 		return -EINVAL;
 	return 0;
diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
index 9ce996bcea8c..29be2476c478 100644
--- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
@@ -602,6 +602,43 @@ static void ringbuf_map_key_subtest(void)
 	test_ringbuf_map_key_lskel__destroy(skel_map_key);
 }
 
+static void ringbuf_overwrite_callback_subtest(void)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_RB_OVERWRITE);
+	struct ring_buffer *ringbuf;
+	struct ring *ring;
+	int map_fd, err;
+
+	map_fd = bpf_map_create(BPF_MAP_TYPE_RINGBUF, NULL, 0, 0, getpagesize(),
+				&opts);
+	if (!ASSERT_OK_FD(map_fd, "bpf_map_create"))
+		return;
+
+	ringbuf = ring_buffer__new(map_fd, process_noop_sample, NULL, NULL);
+	if (!ASSERT_OK_PTR(ringbuf, "ring_buffer__new"))
+		goto cleanup_fd;
+
+	ring = ring_buffer__ring(ringbuf, 0);
+	if (!ASSERT_OK_PTR(ring, "ring_buffer__ring"))
+		goto cleanup_ringbuf;
+
+	err = ring_buffer__consume_n(ringbuf, 0);
+	ASSERT_EQ(err, -EOPNOTSUPP, "ringbuf_consume_zero");
+	err = ring_buffer__consume(ringbuf);
+	ASSERT_EQ(err, -EOPNOTSUPP, "ringbuf_consume");
+	err = ring_buffer__poll(ringbuf, 0);
+	ASSERT_EQ(err, -EOPNOTSUPP, "ringbuf_poll");
+	err = ring__consume_n(ring, 0);
+	ASSERT_EQ(err, -EOPNOTSUPP, "ring_consume_zero");
+	err = ring__consume(ring);
+	ASSERT_EQ(err, -EOPNOTSUPP, "ring_consume");
+
+cleanup_ringbuf:
+	ring_buffer__free(ringbuf);
+cleanup_fd:
+	close(map_fd);
+}
+
 static void ringbuf_overwrite_mode_subtest(void)
 {
 	unsigned long size, len1, len2, len3, len4, len5;
@@ -676,6 +713,8 @@ void test_ringbuf(void)
 		ringbuf_map_key_subtest();
 	if (test__start_subtest("ringbuf_write"))
 		ringbuf_write_subtest();
+	if (test__start_subtest("ringbuf_overwrite_callback"))
+		ringbuf_overwrite_callback_subtest();
 	if (test__start_subtest("ringbuf_overwrite_mode"))
 		ringbuf_overwrite_mode_subtest();
 }

-- 
2.55.0.rc0.159.gbe5d7338c2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf v2 4/8] libbpf: ringbuf: Handle position counter wrap
  2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
                   ` (2 preceding siblings ...)
  2026-06-19  0:26 ` [PATCH bpf v2 3/8] libbpf: ringbuf: Reject overwrite callback use Tamir Duberstein
@ 2026-06-19  0:26 ` Tamir Duberstein
  2026-06-19  0:41   ` sashiko-bot
  2026-06-19  0:26 ` [PATCH bpf v2 5/8] bpf: ringbuf: Handle pending position wrap Tamir Duberstein
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein

Ring buffer positions are unsigned long counters and can wrap on 32-bit
systems. ringbuf_process_ring() stops consuming when producer_pos wraps
below consumer_pos because it compares the counters by magnitude.

Compare the positions for equality instead. The producer cannot move
logically behind the consumer in a non-overwrite ring.

Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Reported-by: Andrew Werner <awerner32@gmail.com>
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
 tools/lib/bpf/ringbuf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index 5737e02c1670..1c5bce2b5e12 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -274,7 +274,8 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 	do {
 		got_new_data = false;
 		prod_pos = smp_load_acquire(r->producer_pos);
-		while (cons_pos < prod_pos) {
+		/* Positions wrap; the consumer cannot logically pass the producer. */
+		while (cons_pos != prod_pos) {
 			len_ptr = r->data + (cons_pos & r->mask);
 			len = smp_load_acquire(len_ptr);
 

-- 
2.55.0.rc0.159.gbe5d7338c2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf v2 5/8] bpf: ringbuf: Handle pending position wrap
  2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
                   ` (3 preceding siblings ...)
  2026-06-19  0:26 ` [PATCH bpf v2 4/8] libbpf: ringbuf: Handle position counter wrap Tamir Duberstein
@ 2026-06-19  0:26 ` Tamir Duberstein
  2026-06-19  0:45   ` sashiko-bot
  2026-06-19  0:26 ` [PATCH bpf v2 6/8] bpf: user_ringbuf: Handle " Tamir Duberstein
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein, Sashiko

Ring buffer positions are unsigned long counters and can wrap on 32-bit
systems. __bpf_ringbuf_reserve() stops advancing pending_pos after
producer_pos wraps because it compares the counters by magnitude.

Compare the positions for equality instead. The pending position cannot
logically move ahead of the producer position.

Fixes: cfa1a2329a69 ("bpf: Fix overrunning reservations in ringbuf")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/bpf/20260614020552.022A11F000E9@smtp.kernel.org/
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
 kernel/bpf/ringbuf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 35ae64ade36b..909880031fd3 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -482,7 +482,8 @@ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
 	prod_pos = rb->producer_pos;
 	new_prod_pos = prod_pos + len;
 
-	while (pend_pos < prod_pos) {
+	/* Positions wrap; pending_pos cannot logically pass producer_pos. */
+	while (pend_pos != prod_pos) {
 		hdr = (void *)rb->data + (pend_pos & rb->mask);
 		hdr_len = READ_ONCE(hdr->len);
 		if (hdr_len & BPF_RINGBUF_BUSY_BIT)

-- 
2.55.0.rc0.159.gbe5d7338c2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf v2 6/8] bpf: user_ringbuf: Handle position wrap
  2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
                   ` (4 preceding siblings ...)
  2026-06-19  0:26 ` [PATCH bpf v2 5/8] bpf: ringbuf: Handle pending position wrap Tamir Duberstein
@ 2026-06-19  0:26 ` Tamir Duberstein
  2026-06-19  0:40   ` sashiko-bot
  2026-06-19  0:26 ` [PATCH bpf v2 7/8] libbpf: ringbuf: Use compiler atomics Tamir Duberstein
  2026-06-19  0:26 ` [PATCH bpf v2 8/8] libbpf: ringbuf: Prevent missed wakeups Tamir Duberstein
  7 siblings, 1 reply; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein, Sashiko

User ring buffer positions are unsigned long counters, but
__bpf_user_ringbuf_peek() widens them to u64 before comparing and
subtracting them. On 32-bit systems, producer_pos wrapping below
consumer_pos therefore appears to move backwards and permanently stalls
the ring. The widened subtraction can also bypass the advertised-window
check.

Keep the positions word-sized and derive the available data with
word-sized subtraction so the arithmetic wraps with the counters. Treat
a zero span as empty and reject spans larger than the ring before
reading a record header.

Fixes: 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/bpf/20260614020552.022A11F000E9@smtp.kernel.org/
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
 kernel/bpf/ringbuf.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 909880031fd3..19cbb9b74ee7 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -748,9 +748,9 @@ const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto = {
 
 static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *size)
 {
+	unsigned long avail, cons_pos, prod_pos;
 	int err;
 	u32 hdr_len, sample_len, total_len, flags, *hdr;
-	u64 cons_pos, prod_pos;
 
 	/* Synchronizes with smp_store_release() in user-space producer. */
 	prod_pos = smp_load_acquire(&rb->producer_pos);
@@ -759,8 +759,11 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
 
 	/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
 	cons_pos = smp_load_acquire(&rb->consumer_pos);
-	if (cons_pos >= prod_pos)
+	avail = prod_pos - cons_pos;
+	if (!avail)
 		return -ENODATA;
+	if (avail > ringbuf_total_data_sz(rb))
+		return -EINVAL;
 
 	hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
 	/* Synchronizes with smp_store_release() in user-space producer. */
@@ -770,7 +773,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
 	total_len = round_up(sample_len + BPF_RINGBUF_HDR_SZ, 8);
 
 	/* The sample must fit within the region advertised by the producer position. */
-	if (total_len > prod_pos - cons_pos)
+	if (total_len > avail)
 		return -EINVAL;
 
 	/* The sample must fit within the data region of the ring buffer. */

-- 
2.55.0.rc0.159.gbe5d7338c2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf v2 7/8] libbpf: ringbuf: Use compiler atomics
  2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
                   ` (5 preceding siblings ...)
  2026-06-19  0:26 ` [PATCH bpf v2 6/8] bpf: user_ringbuf: Handle " Tamir Duberstein
@ 2026-06-19  0:26 ` Tamir Duberstein
  2026-06-19  0:26 ` [PATCH bpf v2 8/8] libbpf: ringbuf: Prevent missed wakeups Tamir Duberstein
  7 siblings, 0 replies; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein

Consumer-side ring buffer code uses architecture-specific smp_* helpers
for shared memory accesses.

Use compiler atomics instead. They provide equivalent acquire and
release ordering through a portable userspace interface and allow the
next commit to use compiler fences in the wakeup protocol without mixing
atomic interfaces.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
 tools/lib/bpf/ringbuf.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index 1c5bce2b5e12..141f2cbe56eb 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -270,14 +270,14 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 	if (n == 0)
 		return 0;
 
-	cons_pos = smp_load_acquire(r->consumer_pos);
+	cons_pos = __atomic_load_n(r->consumer_pos, __ATOMIC_ACQUIRE);
 	do {
 		got_new_data = false;
-		prod_pos = smp_load_acquire(r->producer_pos);
+		prod_pos = __atomic_load_n(r->producer_pos, __ATOMIC_ACQUIRE);
 		/* Positions wrap; the consumer cannot logically pass the producer. */
 		while (cons_pos != prod_pos) {
 			len_ptr = r->data + (cons_pos & r->mask);
-			len = smp_load_acquire(len_ptr);
+			len = __atomic_load_n(len_ptr, __ATOMIC_ACQUIRE);
 
 			/* sample not committed yet, bail out for now */
 			if (len & BPF_RINGBUF_BUSY_BIT)
@@ -291,14 +291,16 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 				err = r->sample_cb(r->ctx, sample, len);
 				if (err < 0) {
 					/* update consumer pos and bail out */
-					smp_store_release(r->consumer_pos,
-							  cons_pos);
+					__atomic_store_n(r->consumer_pos,
+							 cons_pos,
+							 __ATOMIC_RELEASE);
 					return err;
 				}
 				cnt++;
 			}
 
-			smp_store_release(r->consumer_pos, cons_pos);
+			__atomic_store_n(r->consumer_pos, cons_pos,
+					 __ATOMIC_RELEASE);
 
 			if (cnt >= n)
 				goto done;
@@ -413,8 +415,8 @@ struct ring *ring_buffer__ring(struct ring_buffer *rb, unsigned int idx)
 
 unsigned long ring__consumer_pos(const struct ring *r)
 {
-	/* Synchronizes with smp_store_release() in ringbuf_process_ring(). */
-	return smp_load_acquire(r->consumer_pos);
+	/* Synchronizes with the release store in ringbuf_process_ring(). */
+	return __atomic_load_n(r->consumer_pos, __ATOMIC_ACQUIRE);
 }
 
 unsigned long ring__producer_pos(const struct ring *r)
@@ -422,7 +424,7 @@ unsigned long ring__producer_pos(const struct ring *r)
 	/* Synchronizes with smp_store_release() in __bpf_ringbuf_reserve() in
 	 * the kernel.
 	 */
-	return smp_load_acquire(r->producer_pos);
+	return __atomic_load_n(r->producer_pos, __ATOMIC_ACQUIRE);
 }
 
 size_t ring__avail_data_size(const struct ring *r)

-- 
2.55.0.rc0.159.gbe5d7338c2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf v2 8/8] libbpf: ringbuf: Prevent missed wakeups
  2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
                   ` (6 preceding siblings ...)
  2026-06-19  0:26 ` [PATCH bpf v2 7/8] libbpf: ringbuf: Use compiler atomics Tamir Duberstein
@ 2026-06-19  0:26 ` Tamir Duberstein
  2026-06-20  1:53   ` Eduard Zingerman
  7 siblings, 1 reply; 17+ messages in thread
From: Tamir Duberstein @ 2026-06-19  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Tamir Duberstein, Sashiko

After consuming the last visible record, ringbuf_process_ring()
publishes the consumer position and checks the producer position. These
operations lack a full StoreLoad barrier. A producer can therefore
commit a new record but read the old consumer position while the
consumer reads the old producer position. The producer sends no
notification and the consumer waits despite a queued record.

Insert a full barrier between publishing a consumer position and the
next producer position load. When a record bound or callback ends the
current invocation first, execute the barrier before returning so the
load in a later invocation completes the same handshake.

Add an edge-triggered epoll test that drains one record per call while a
concurrent producer fills the ring. Without the barrier, a missed
notification leaves the producer dropping records from a full ring while
the consumer times out. Document that bounded consumers and callbacks
that terminate consumption must drain before waiting again.

Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Reported-by: Andrew Werner <awerner32@gmail.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/bpf/20260614015716.945AF1F000E9@smtp.kernel.org/
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
 tools/lib/bpf/libbpf.h                           | 23 +++++++
 tools/lib/bpf/ringbuf.c                          | 24 +++++--
 tools/testing/selftests/bpf/prog_tests/ringbuf.c | 83 ++++++++++++++++++++++++
 3 files changed, 123 insertions(+), 7 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index ae46b17feaa6..3a649ed87034 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1440,6 +1440,11 @@ struct ring;
 struct user_ring_buffer;
 
 /* Callback-based consumption is unsupported for BPF_F_RB_OVERWRITE maps. */
+/*
+ * A negative return stops consumption; non-negative values continue. Stopping
+ * can leave records queued without a new readiness notification. Before
+ * waiting for readiness again, consume until no records remain.
+ */
 typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
 
 struct ring_buffer_opts {
@@ -1456,6 +1461,20 @@ LIBBPF_API int ring_buffer__add(struct ring_buffer *rb, int map_fd,
 				ring_buffer_sample_fn sample_cb, void *ctx);
 LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
 LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
+
+/**
+ * @brief **ring_buffer__consume_n()** consumes up to a requested number of
+ * records from a ring buffer manager without event polling.
+ *
+ * @param rb A ring buffer manager object.
+ * @param n Maximum number of records to consume.
+ * @return The number of records consumed, or a negative error code on failure.
+ *
+ * Reaching the requested bound does not establish that every ring is empty.
+ * Records can remain queued without a new readiness notification. Before
+ * calling ring_buffer__poll() or waiting on ring_buffer__epoll_fd(), call
+ * ring_buffer__consume() until it returns 0.
+ */
 LIBBPF_API int ring_buffer__consume_n(struct ring_buffer *rb, size_t n);
 LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
 
@@ -1538,6 +1557,10 @@ LIBBPF_API int ring__consume(struct ring *r);
  * @param r A ringbuffer object.
  * @param n Maximum number of records to consume.
  * @return The number of records consumed, or a negative error code on failure.
+ *
+ * Reaching the requested bound does not establish that the ring is empty.
+ * Records can remain queued without a new readiness notification. Before
+ * waiting on ring__map_fd(), call ring__consume() until it returns 0.
  */
 LIBBPF_API int ring__consume_n(struct ring *r, size_t n);
 
diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index 141f2cbe56eb..0598f6c2f7da 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -271,7 +271,7 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 		return 0;
 
 	cons_pos = __atomic_load_n(r->consumer_pos, __ATOMIC_ACQUIRE);
-	do {
+	for (;;) {
 		got_new_data = false;
 		prod_pos = __atomic_load_n(r->producer_pos, __ATOMIC_ACQUIRE);
 		/* Positions wrap; the consumer cannot logically pass the producer. */
@@ -279,9 +279,9 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 			len_ptr = r->data + (cons_pos & r->mask);
 			len = __atomic_load_n(len_ptr, __ATOMIC_ACQUIRE);
 
-			/* sample not committed yet, bail out for now */
+			/* Retry a busy record once after publishing prior records. */
 			if (len & BPF_RINGBUF_BUSY_BIT)
-				goto done;
+				break;
 
 			got_new_data = true;
 			cons_pos += roundup_len(len);
@@ -294,7 +294,8 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 					__atomic_store_n(r->consumer_pos,
 							 cons_pos,
 							 __ATOMIC_RELEASE);
-					return err;
+					cnt = err;
+					break;
 				}
 				cnt++;
 			}
@@ -303,10 +304,19 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
 					 __ATOMIC_RELEASE);
 
 			if (cnt >= n)
-				goto done;
+				break;
 		}
-	} while (got_new_data);
-done:
+		if (!got_new_data)
+			break;
+
+		/*
+		 * Order the published consumer position before the next
+		 * producer-position load, whether below or in a later invocation.
+		 */
+		__atomic_thread_fence(__ATOMIC_SEQ_CST);
+		if (cnt < 0 || cnt >= n)
+			break;
+	}
 	return cnt;
 }
 
diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
index 29be2476c478..0d45a766a580 100644
--- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
@@ -492,6 +492,87 @@ static void ringbuf_null_cb_subtest(void)
 	test_ringbuf_n_lskel__destroy(skel_n);
 }
 
+#define N_WAKEUP_SAMPLES 20000
+
+struct wakeup_ctx {
+	bool stop;
+};
+
+static void *wakeup_producer(void *arg)
+{
+	struct wakeup_ctx *ctx = arg;
+
+	while (!__atomic_load_n(&ctx->stop, __ATOMIC_RELAXED))
+		syscall(__NR_getpgid);
+	return NULL;
+}
+
+static void ringbuf_wakeup_subtest(void)
+{
+	struct test_ringbuf_n_lskel *skel_n;
+	struct ring_buffer *ringbuf = NULL;
+	struct epoll_event event = {
+		.events = EPOLLIN | EPOLLET,
+	};
+	struct wakeup_ctx ctx = {};
+	pthread_t producer;
+	int epoll_fd = -1;
+	int err, total = 0;
+
+	skel_n = test_ringbuf_n_lskel__open();
+	if (!ASSERT_OK_PTR(skel_n, "test_ringbuf_n_lskel__open"))
+		return;
+
+	skel_n->maps.ringbuf.max_entries = getpagesize();
+	skel_n->bss->pid = getpid();
+	skel_n->bss->value = SAMPLE_VALUE;
+
+	err = test_ringbuf_n_lskel__load(skel_n);
+	if (!ASSERT_OK(err, "test_ringbuf_n_lskel__load"))
+		goto cleanup;
+
+	err = test_ringbuf_n_lskel__attach(skel_n);
+	if (!ASSERT_OK(err, "test_ringbuf_n_lskel__attach"))
+		goto cleanup;
+
+	ringbuf = ring_buffer__new(skel_n->maps.ringbuf.map_fd,
+				   process_noop_sample, NULL, NULL);
+	if (!ASSERT_OK_PTR(ringbuf, "ring_buffer__new"))
+		goto cleanup;
+
+	epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+	if (!ASSERT_OK_FD(epoll_fd, "epoll_create1"))
+		goto cleanup_ringbuf;
+
+	err = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, skel_n->maps.ringbuf.map_fd,
+			&event);
+	if (!ASSERT_OK(err, "epoll_ctl"))
+		goto cleanup_epoll;
+
+	err = pthread_create(&producer, NULL, wakeup_producer, &ctx);
+	if (!ASSERT_OK(err, "pthread_create"))
+		goto cleanup_epoll;
+
+	while (total < N_WAKEUP_SAMPLES) {
+		err = epoll_wait(epoll_fd, &event, 1, 1000);
+		if (!ASSERT_EQ(err, 1, "epoll_wait"))
+			break;
+		while ((err = ring_buffer__consume_n(ringbuf, 1)) > 0)
+			total += err;
+		if (!ASSERT_OK(err, "ring_buffer__consume_n"))
+			break;
+	}
+
+	__atomic_store_n(&ctx.stop, true, __ATOMIC_RELAXED);
+	pthread_join(producer, NULL);
+cleanup_epoll:
+	close(epoll_fd);
+cleanup_ringbuf:
+	ring_buffer__free(ringbuf);
+cleanup:
+	test_ringbuf_n_lskel__destroy(skel_n);
+}
+
 static void ringbuf_n_subtest(void)
 {
 	struct test_ringbuf_n_lskel *skel_n;
@@ -709,6 +790,8 @@ void test_ringbuf(void)
 		ringbuf_n_subtest();
 	if (test__start_subtest("ringbuf_null_cb"))
 		ringbuf_null_cb_subtest();
+	if (test__start_subtest("ringbuf_wakeup"))
+		ringbuf_wakeup_subtest();
 	if (test__start_subtest("ringbuf_map_key"))
 		ringbuf_map_key_subtest();
 	if (test__start_subtest("ringbuf_write"))

-- 
2.55.0.rc0.159.gbe5d7338c2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf v2 6/8] bpf: user_ringbuf: Handle position wrap
  2026-06-19  0:26 ` [PATCH bpf v2 6/8] bpf: user_ringbuf: Handle " Tamir Duberstein
@ 2026-06-19  0:40   ` sashiko-bot
  0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-19  0:40 UTC (permalink / raw)
  To: Tamir Duberstein; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] ringbuf_avail_data_sz() uses max() for wrapping counters, yielding incorrect availability estimates after wrap-around on 32-bit systems.
- [High] 32-bit user-space processes on a 64-bit kernel will permanently stall the ring buffer upon crossing the 4GB counter boundary.
--

commit ffd4323e55c15970e65b2c3f90be01a022bf7ad4
Author: Tamir Duberstein <tamird@kernel.org>

bpf: user_ringbuf: Handle position wrap

This commit fixes an issue where user ring buffer positions are widened to u64
before comparing and subtracting them. This caused producer_pos wrapping below
consumer_pos on 32-bit systems to appear as backward movement, permanently
stalling the ring. The patch keeps the positions word-sized and derives
available data using word-sized subtraction so arithmetic wraps with the
counters.

Closes: https://lore.kernel.org/bpf/20260614020552.022A11F000E9@smtp.kernel.org/

> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index 909880031fd38..19cbb9b74ee7c 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c

[ ... ]

> @@ -759,8 +759,11 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  
>  	/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
>  	cons_pos = smp_load_acquire(&rb->consumer_pos);
> -	if (cons_pos >= prod_pos)
> +	avail = prod_pos - cons_pos;

[Severity: High]
This isn't a bug introduced by this patch, but does this 64-bit subtraction
fail to account for the 32-bit wrapping behavior of a compat userspace
producer?

The shared producer_pos and consumer_pos counters in struct bpf_ringbuf are
defined as unsigned long, making them 8 bytes on a 64-bit kernel but 4 bytes
for a 32-bit user-space process.

When a 32-bit user-space process updates the counter, it only updates the
lower 4 bytes. Upon reaching 4GB, it wraps to 0, leaving the upper 4 bytes
as 0. The 64-bit kernel reads this as a massive backward jump to 0.

Since avail is evaluated as an 8-byte unsigned long here, the subtraction
underflows to a huge 64-bit number (e.g., 0xFFFFFFFF00000020).

> +	if (!avail)
>  		return -ENODATA;
> +	if (avail > ringbuf_total_data_sz(rb))
> +		return -EINVAL;

[Severity: High]
This isn't a bug introduced by this patch, but will a 32-bit user-space
process on a 64-bit kernel permanently stall the ring buffer upon crossing
the 4GB counter boundary here?

Due to the underflow mentioned above, the massively inflated avail value
fails this bounds check. This causes __bpf_user_ringbuf_peek() to
permanently return -EINVAL.

Could we use (u32)prod_pos - (u32)cons_pos to natively wrap at 32-bits,
mitigating the issue for compat userspace since the ring size is inherently
bounded by 32 bits?

[ ... ]

[Severity: High]
This isn't a bug introduced by this patch, but does ringbuf_avail_data_sz()
yield incorrect availability estimates after wrap-around on 32-bit systems?

In ringbuf_avail_data_sz():

kernel/bpf/ringbuf.c:ringbuf_avail_data_sz() {
    ...
    if (unlikely(rb->overwrite_mode)) {
        over_pos = smp_load_acquire(&rb->overwrite_pos);
        prod_pos = smp_load_acquire(&rb->producer_pos);
        return prod_pos - max(cons_pos, over_pos);
    ...
}

The available data calculation for overwrite mode uses max(cons_pos,
over_pos). Because these counters wrap around at 4GB on 32-bit systems,
the max() macro will incorrectly evaluate a wrapped counter (e.g.,
0x0000000A) as smaller than an unwrapped one (e.g., 0xFFFFFFF0), even
though it is logically ahead.

This causes the function to calculate the difference from the wrong
offset, resulting in inaccurate data availability metrics, which can
lead to spurious or missed epoll wakeups and garbage values returned by
bpf_ringbuf_query(BPF_RB_AVAIL_DATA).

Should a wrap-safe macro like time_after() or simple signed modulo
arithmetic be used instead?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618-bpf-ringbuf-fixes-v2-0-33fde039ddf3@kernel.org?part=6

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf v2 4/8] libbpf: ringbuf: Handle position counter wrap
  2026-06-19  0:26 ` [PATCH bpf v2 4/8] libbpf: ringbuf: Handle position counter wrap Tamir Duberstein
@ 2026-06-19  0:41   ` sashiko-bot
  0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-19  0:41 UTC (permalink / raw)
  To: Tamir Duberstein; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The BPF ring buffer permanently fails for 32-bit user space applications running on 64-bit kernels after 4GB of data transfer.
--

commit f2aeeb4a1cf7d41d48bd7bbff2ebee0477b1c33c
Author: Tamir Duberstein <tamird@kernel.org>

libbpf: ringbuf: Handle position counter wrap

This commit addresses an issue in ringbuf_process_ring() on 32-bit systems
where the counters can wrap. It changes the loop condition to compare the
consumer and producer positions for equality instead of magnitude.

> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index 5737e02c16703..1c5bce2b5e12d 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -274,7 +274,8 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
>  	do {
>  		got_new_data = false;
>  		prod_pos = smp_load_acquire(r->producer_pos);
> -		while (cons_pos < prod_pos) {
> +		/* Positions wrap; the consumer cannot logically pass the producer. */
> +		while (cons_pos != prod_pos) {

[Severity: High]
This isn't a bug introduced by this patch, but since we are handling 32-bit
counter wrap logic here, does this also expose the compat mismatch between
32-bit userspace and 64-bit kernels?

In tools/lib/bpf/ringbuf.c, struct ring defines consumer_pos as an
unsigned long pointer. On a 32-bit userspace, this maps and writes 4 bytes.
However, on a 64-bit kernel, struct bpf_ringbuf in kernel/bpf/ringbuf.c
defines consumer_pos as an 8-byte unsigned long.

Because of little-endian mapping, a 32-bit userspace app only overwrites
the lower 4 bytes of the kernel's 64-bit consumer_pos. When the userspace
counter wraps around 4GB, it writes a small value back to the lower 4 bytes
while the upper 4 bytes remain zero.

Meanwhile, the kernel's 64-bit producer_pos continues to increment past 4GB.
When bpf_ringbuf_has_space() evaluates new_prod_pos - cons_pos, the
difference evaluates to larger than 4GB, exceeding the ring buffer's
capacity mask.

Could this permanently fail all further kernel ring buffer reservations,
effectively deadlocking compat tasks once 4GB of data is transferred?

>  			len_ptr = r->data + (cons_pos & r->mask);
>  			len = smp_load_acquire(len_ptr);
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618-bpf-ringbuf-fixes-v2-0-33fde039ddf3@kernel.org?part=4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf v2 5/8] bpf: ringbuf: Handle pending position wrap
  2026-06-19  0:26 ` [PATCH bpf v2 5/8] bpf: ringbuf: Handle pending position wrap Tamir Duberstein
@ 2026-06-19  0:45   ` sashiko-bot
  0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-19  0:45 UTC (permalink / raw)
  To: Tamir Duberstein; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] `ringbuf_avail_data_sz` incorrectly uses the `max()` macro on wrapping `unsigned long` counters, leading to erroneous available data calculations on 32-bit systems.
--

commit c0d823f17573cdb3b8da49e472d0ae2b71115825
Author: Tamir Duberstein <tamird@kernel.org>

bpf: ringbuf: Handle pending position wrap

This commit fixes a counter wrapping issue on 32-bit systems in
__bpf_ringbuf_reserve(). It ensures the pending position continues to
advance after the producer position wraps by changing the loop condition
from a magnitude comparison to a check for equality.

Closes: https://lore.kernel.org/bpf/20260614020552.022A11F000E9@smtp.kernel.org/

> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index 35ae64ade36b8..909880031fd38 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c
> @@ -482,7 +482,8 @@ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
>  	prod_pos = rb->producer_pos;
>  	new_prod_pos = prod_pos + len;
>  
> -	while (pend_pos < prod_pos) {
> +	/* Positions wrap; pending_pos cannot logically pass producer_pos. */
> +	while (pend_pos != prod_pos) {

[Severity: High]
This is a pre-existing issue, but I noticed another problem with counter
wrapping on 32-bit systems.

kernel/bpf/ringbuf.c:ringbuf_avail_data_sz() {
    ...
    if (unlikely(rb->overwrite_mode)) {
        over_pos = smp_load_acquire(&rb->overwrite_pos);
        prod_pos = smp_load_acquire(&rb->producer_pos);
        return prod_pos - max(cons_pos, over_pos);
    ...
}

Does the use of max() on wrapping unsigned long counters here lead to
erroneous available data calculations?

If over_pos wraps and becomes numerically smaller than a stalled cons_pos,
max() will incorrectly evaluate cons_pos as larger. This evaluates the
available data as prod_pos - cons_pos instead of prod_pos - over_pos, which
could inflate the reported available data size up to 4GB.

Additionally, if prod_pos exactly laps cons_pos by a multiple of 4GB, the
calculation would yield 0. Could this cause ringbuf_map_poll_kern() to
erroneously block the consumer in an infinite wait despite the buffer being
full?

>  		hdr = (void *)rb->data + (pend_pos & rb->mask);
>  		hdr_len = READ_ONCE(hdr->len);
>  		if (hdr_len & BPF_RINGBUF_BUSY_BIT)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618-bpf-ringbuf-fixes-v2-0-33fde039ddf3@kernel.org?part=5

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash
  2026-06-19  0:26 ` [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash Tamir Duberstein
@ 2026-06-19 13:54   ` Jiri Olsa
  2026-06-19 21:11     ` Eduard Zingerman
  0 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2026-06-19 13:54 UTC (permalink / raw)
  To: Tamir Duberstein
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Shuah Khan, Andrea Righi, Xu Kuohai,
	Andrea Righi, Bing-Jhong Billy Jheng, David Vernet, bpf,
	linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis

On Thu, Jun 18, 2026 at 08:26:40PM -0400, Tamir Duberstein wrote:
> ring_buffer__new() and ring_buffer__add() allow a NULL sample
> callback. When callback-based consumption reaches such a ring, it calls
> through the NULL function pointer and crashes.
> 
> Check every ring before manager polling or consumption so a missing
> callback returns -EINVAL before the manager waits or consumes another
> ring. Check the selected ring before direct per-ring consumption. Perform
> both checks before honoring a zero record bound so invalid callback use
> always reports the error.
> 
> Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
> Assisted-by: Codex:gpt-5.5
> Signed-off-by: Tamir Duberstein <tamird@kernel.org>
> ---
>  tools/lib/bpf/libbpf.h                           | 11 ++-
>  tools/lib/bpf/ringbuf.c                          | 43 +++++++++--
>  tools/testing/selftests/bpf/prog_tests/ringbuf.c | 93 ++++++++++++++++++++++++
>  3 files changed, 136 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index bba4e8464396..9ba6b9ad3498 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1526,18 +1526,17 @@ LIBBPF_API int ring__map_fd(const struct ring *r);
>   *
>   * @param r A ringbuffer object.
>   * @return The number of records consumed (or INT_MAX, whichever is less), or
> - * a negative number if any of the callbacks return an error.
> + * a negative error code on failure.
>   */
>  LIBBPF_API int ring__consume(struct ring *r);
>  
>  /**
> - * @brief **ring__consume_n()** consumes up to a requested amount of items from
> - * a ringbuffer without event polling.
> + * @brief **ring__consume_n()** consumes up to a requested number of records
> + * from a ringbuffer without event polling.
>   *
>   * @param r A ringbuffer object.
> - * @param n Maximum amount of items to consume.
> - * @return The number of items consumed, or a negative number if any of the
> - * callbacks return an error.
> + * @param n Maximum number of records to consume.
> + * @return The number of records consumed, or a negative error code on failure.
>   */
>  LIBBPF_API int ring__consume_n(struct ring *r, size_t n);
>  
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index f2bb619d5a75..0dae4d95d309 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -231,6 +231,26 @@ static inline int roundup_len(__u32 len)
>  	return (len + 7) / 8 * 8;
>  }
>  
> +static int ringbuf_validate(const struct ring *r)
> +{
> +	if (unlikely(!r->sample_cb))
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int ringbuf_validate_callbacks(const struct ring_buffer *rb)
> +{
> +	int i, err;
> +
> +	for (i = 0; i < rb->ring_cnt; i++) {
> +		err = ringbuf_validate(rb->rings[i]);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
>  static int64_t ringbuf_process_ring(struct ring *r, size_t n)
>  {
>  	int *len_ptr, len, err;
> @@ -240,6 +260,9 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
>  	bool got_new_data;
>  	void *sample;
>  
> +	err = ringbuf_validate(r);
> +	if (err)
> +		return err;

as Emil noted in first version, this seems like overkill, could you
just check sample_cb != NULL before it's actualy called?

in [1] you mentioned you plan to add some feature that won't use sample_cb,
it'd be easier to review/suggest the solution with more details on that

jirka

[1] https://lore.kernel.org/bpf/e384ef2f478093a70af11980d2d1cdeb@kernel.org/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash
  2026-06-19 13:54   ` Jiri Olsa
@ 2026-06-19 21:11     ` Eduard Zingerman
  2026-06-20  2:48       ` Alexei Starovoitov
  0 siblings, 1 reply; 17+ messages in thread
From: Eduard Zingerman @ 2026-06-19 21:11 UTC (permalink / raw)
  To: Jiri Olsa, Tamir Duberstein
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Shuah Khan, Andrea Righi, Xu Kuohai, Andrea Righi,
	Bing-Jhong Billy Jheng, David Vernet, bpf, linux-kernel,
	linux-kselftest, Andrew Werner, Zvi Effron, Andrii Nakryiko,
	Emil Tsalapatis

On Fri, 2026-06-19 at 15:54 +0200, Jiri Olsa wrote:
> On Thu, Jun 18, 2026 at 08:26:40PM -0400, Tamir Duberstein wrote:
> > ring_buffer__new() and ring_buffer__add() allow a NULL sample
> > callback. When callback-based consumption reaches such a ring, it calls
> > through the NULL function pointer and crashes.
> > 
> > Check every ring before manager polling or consumption so a missing
> > callback returns -EINVAL before the manager waits or consumes another
> > ring. Check the selected ring before direct per-ring consumption. Perform
> > both checks before honoring a zero record bound so invalid callback use
> > always reports the error.
> > 
> > Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
> > Assisted-by: Codex:gpt-5.5
> > Signed-off-by: Tamir Duberstein <tamird@kernel.org>
> > ---
> >  tools/lib/bpf/libbpf.h                           | 11 ++-
> >  tools/lib/bpf/ringbuf.c                          | 43 +++++++++--
> >  tools/testing/selftests/bpf/prog_tests/ringbuf.c | 93 ++++++++++++++++++++++++
> >  3 files changed, 136 insertions(+), 11 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index bba4e8464396..9ba6b9ad3498 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -1526,18 +1526,17 @@ LIBBPF_API int ring__map_fd(const struct ring *r);
> >   *
> >   * @param r A ringbuffer object.
> >   * @return The number of records consumed (or INT_MAX, whichever is less), or
> > - * a negative number if any of the callbacks return an error.
> > + * a negative error code on failure.
> >   */
> >  LIBBPF_API int ring__consume(struct ring *r);
> >  
> >  /**
> > - * @brief **ring__consume_n()** consumes up to a requested amount of items from
> > - * a ringbuffer without event polling.
> > + * @brief **ring__consume_n()** consumes up to a requested number of records
> > + * from a ringbuffer without event polling.
> >   *
> >   * @param r A ringbuffer object.
> > - * @param n Maximum amount of items to consume.
> > - * @return The number of items consumed, or a negative number if any of the
> > - * callbacks return an error.
> > + * @param n Maximum number of records to consume.
> > + * @return The number of records consumed, or a negative error code on failure.
> >   */
> >  LIBBPF_API int ring__consume_n(struct ring *r, size_t n);
> >  
> > diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> > index f2bb619d5a75..0dae4d95d309 100644
> > --- a/tools/lib/bpf/ringbuf.c
> > +++ b/tools/lib/bpf/ringbuf.c
> > @@ -231,6 +231,26 @@ static inline int roundup_len(__u32 len)
> >  	return (len + 7) / 8 * 8;
> >  }
> >  
> > +static int ringbuf_validate(const struct ring *r)
> > +{
> > +	if (unlikely(!r->sample_cb))
> > +		return -EINVAL;
> > +	return 0;
> > +}
> > +
> > +static int ringbuf_validate_callbacks(const struct ring_buffer *rb)
> > +{
> > +	int i, err;
> > +
> > +	for (i = 0; i < rb->ring_cnt; i++) {
> > +		err = ringbuf_validate(rb->rings[i]);
> > +		if (err)
> > +			return err;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int64_t ringbuf_process_ring(struct ring *r, size_t n)
> >  {
> >  	int *len_ptr, len, err;
> > @@ -240,6 +260,9 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
> >  	bool got_new_data;
> >  	void *sample;
> >  
> > +	err = ringbuf_validate(r);
> > +	if (err)
> > +		return err;
> 
> as Emil noted in first version, this seems like overkill, could you
> just check sample_cb != NULL before it's actualy called?
> 
> in [1] you mentioned you plan to add some feature that won't use sample_cb,
> it'd be easier to review/suggest the solution with more details on that
> 
> jirka

As-is it appears that the simplest thing to do is to error out in
ring_buffer__new() and ring_buffer__add() when callback is NULL or
when overwrite map flag is present.
I checked our internal usage for then __new/__add functions,
and it seems there are no cases when NULL is passed as a callback.

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf v2 1/8] libbpf: ringbuf: Honor zero consume bounds
  2026-06-19  0:26 ` [PATCH bpf v2 1/8] libbpf: ringbuf: Honor zero consume bounds Tamir Duberstein
@ 2026-06-19 21:13   ` Eduard Zingerman
  0 siblings, 0 replies; 17+ messages in thread
From: Eduard Zingerman @ 2026-06-19 21:13 UTC (permalink / raw)
  To: Tamir Duberstein, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis

On Thu, 2026-06-18 at 20:26 -0400, Tamir Duberstein wrote:
> ringbuf_process_ring() checks the record bound only after advancing the
> consumer position and invoking the callback. A zero bound therefore
> consumes the first available record.
> 
> Return before reading the ring positions when the bound is zero so
> ring_buffer__consume_n() and ring__consume_n() leave all records queued.
> 
> Fixes: 4d22ea94ea33 ("libbpf: Add ring__consume_n / ring_buffer__consume_n")
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> Assisted-by: Codex:gpt-5.5
> Signed-off-by: Tamir Duberstein <tamird@kernel.org>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
> index 64520684d2cb..4f0558f14847 100644
> --- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c
> +++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
> @@ -404,6 +404,7 @@ static int process_n_sample(void *ctx, void *data, size_t len)
>  static void ringbuf_n_subtest(void)
>  {
>  	struct test_ringbuf_n_lskel *skel_n;
> +	struct ring *ring;
>  	int err, i;
>  
>  	skel_n = test_ringbuf_n_lskel__open();
> @@ -431,6 +432,18 @@ static void ringbuf_n_subtest(void)
>  	for (i = 0; i < N_TOT_SAMPLES; i++)
>  		syscall(__NR_getpgid);
>  
> +	ring = ring_buffer__ring(ringbuf, 0);
> +	if (!ASSERT_OK_PTR(ring, "ring_buffer__ring"))
> +		goto cleanup_ringbuf;
> +
> +	err = ring_buffer__consume_n(ringbuf, 0);
> +	if (!ASSERT_EQ(err, 0, "ringbuf_consume_zero"))
> +		goto cleanup_ringbuf;
> +
> +	err = ring__consume_n(ring, 0);
> +	if (!ASSERT_EQ(err, 0, "ring_consume_zero"))
> +		goto cleanup_ringbuf;
> +
>  	/* Consume all samples from the ring buffer in batches of N_SAMPLES */
>  	for (i = 0; i < N_TOT_SAMPLES; i += err) {
>  		err = ring_buffer__consume_n(ringbuf, N_SAMPLES);

Could you please pack the tests as a separate commit (or commits)
at the end of the series?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf v2 8/8] libbpf: ringbuf: Prevent missed wakeups
  2026-06-19  0:26 ` [PATCH bpf v2 8/8] libbpf: ringbuf: Prevent missed wakeups Tamir Duberstein
@ 2026-06-20  1:53   ` Eduard Zingerman
  0 siblings, 0 replies; 17+ messages in thread
From: Eduard Zingerman @ 2026-06-20  1:53 UTC (permalink / raw)
  To: Tamir Duberstein, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Andrea Righi,
	Xu Kuohai, Andrea Righi, Bing-Jhong Billy Jheng, David Vernet
  Cc: bpf, linux-kernel, linux-kselftest, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis, Sashiko

On Thu, 2026-06-18 at 20:26 -0400, Tamir Duberstein wrote:
> After consuming the last visible record, ringbuf_process_ring()
> publishes the consumer position and checks the producer position. These
> operations lack a full StoreLoad barrier. A producer can therefore
> commit a new record but read the old consumer position while the
> consumer reads the old producer position. The producer sends no
> notification and the consumer waits despite a queued record.
> 
> Insert a full barrier between publishing a consumer position and the
> next producer position load. When a record bound or callback ends the
> current invocation first, execute the barrier before returning so the
> load in a later invocation completes the same handshake.
> 
> Add an edge-triggered epoll test that drains one record per call while a
> concurrent producer fills the ring. Without the barrier, a missed
> notification leaves the producer dropping records from a full ring while
> the consumer times out. Document that bounded consumers and callbacks
> that terminate consumption must drain before waiting again.
> 
> Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
> Reported-by: Andrew Werner <awerner32@gmail.com>
> Reported-by: Sashiko <sashiko-bot@kernel.org>
> Closes: https://lore.kernel.org/bpf/20260614015716.945AF1F000E9@smtp.kernel.org/
> Assisted-by: Codex:gpt-5.5
> Signed-off-by: Tamir Duberstein <tamird@kernel.org>
> ---

Took me a while, but I agree there is a race here.
FWIW, here is the description of the race as far as I understand it.

Simplified pseudo-code for the consumer (ringbuf_process_ring)

    cons_pos = load_acquire(consumer_pos);      // op#1 [orders before 2,3,4,5]
    do {
      got_new_data = false;
      prod_pos = load_acquire(producer_pos);    // op#2 [orders before 3,4,5]
      while (cons_pos < prod_pos) {
        len_ptr = ... cons_pos ...;
        len = load_acquire(len_ptr);            // op#3 [orders before 4,5]
        got_new_data = true;
        callback(... cons_pos ...);             // op#4
        cons_pos += len;
        store_release(consumer_pos, cons_pos);  // op#5 [orders after 2,3,4; ordering relative to 2' not defined]
      }
    } while (got_new_data);

  The patch fixes the issue with op#5, store_release operation is
  ordered after operations 2,3,4, but it's ordering relative to op#2
  from the next outer loop iteration (denoted as 2') is not defined.

Simplified pseudo-code for the producer (bpf_ringbuf_{reserve,commit})

    // reserve
    store_release(producer_pos, ...);

    // commit
    xchg(... clear BUSY_BIT ...);
    cons_pos = load_acquire(consumer_pos);
    if (cons_pos == rec_pos)
        schedule_wakeup();

A possible sequence of events

  Producer:

    // submits a record such that:
    consumer_pos = 0   (initial state)
    producer_pos = 10

  Consumer:

    cons_pos = load_acquire(consumer_pos)     -> cons_pos = 0
    prod_pos = load_acquire(producer_pos)     -> prod_pos = 10
      len = load_acquire(len_ptr)             -> len = 10
      callback(...)                           -> (record at pos 0 consumed)
      cons_pos += len                         -> cons_pos = 10
      store_release(consumer_pos, cons_pos)   -> consumer_pos = 10   (issued; global visibility deferred)
      (cons_pos < prod_pos) 10 < 10           -> false, exit inner loop
    while (got_new_data)                      -> true, re-enter outer loop
    prod_pos = load_acquire(producer_pos)     -> prod_pos = 10       (reads current value; record at pos 10 not published yet)
      (cons_pos < prod_pos) 10 < 10           -> false, skip inner loop
    while (got_new_data)                      -> false; consumer returns 0 and enters an epoll wait

  Producer (reserves and submits a new record):

    store_release(producer_pos, 20)           -> producer_pos = 20   (record at pos 10 reserved)
    xchg(... clear BUSY_BIT ...)              -> (record at pos 10 committed)
    cons_pos = load_acquire(consumer_pos)     -> cons_pos = 0        (consumer's store of 10 not visible yet)
      rec_pos = 10; cons_pos == rec_pos       -> 0 == 10 false -> NO WAKEUP

  Consumer:

    (deferred store becomes visible)          -> consumer_pos = 10   (too late)

---

Consumer exits while a new record is available in the buffer,
and the producer fails to notify the consumer via epoll.
The added barrier prevents such sequence of events.

For the fix:

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>

Please move the test to a separate commit, I'll review it a bit later.

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash
  2026-06-19 21:11     ` Eduard Zingerman
@ 2026-06-20  2:48       ` Alexei Starovoitov
  0 siblings, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2026-06-20  2:48 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Jiri Olsa, Tamir Duberstein, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Shuah Khan, Andrea Righi, Xu Kuohai,
	Andrea Righi, Bing-Jhong Billy Jheng, David Vernet, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Andrew Werner, Zvi Effron,
	Andrii Nakryiko, Emil Tsalapatis

On Fri, Jun 19, 2026 at 2:12 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> > >     int *len_ptr, len, err;
> > > @@ -240,6 +260,9 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
> > >     bool got_new_data;
> > >     void *sample;
> > >
> > > +   err = ringbuf_validate(r);
> > > +   if (err)
> > > +           return err;
> >
> > as Emil noted in first version, this seems like overkill, could you
> > just check sample_cb != NULL before it's actualy called?
> >
> > in [1] you mentioned you plan to add some feature that won't use sample_cb,
> > it'd be easier to review/suggest the solution with more details on that
> >
> > jirka
>
> As-is it appears that the simplest thing to do is to error out in
> ring_buffer__new() and ring_buffer__add() when callback is NULL or
> when overwrite map flag is present.
> I checked our internal usage for then __new/__add functions,
> and it seems there are no cases when NULL is passed as a callback.

Even such an early check is probably unnecessary.
A valid callback is a part of api today.
In general we don't add defensive programing checks for NULL
here and in other places.

Sounds like there is a follow up to add callback-less api.
Let's see how it looks. If there is a need to check
callback nullness it's better to discuss altogether.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-06-20  2:48 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-19  0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
2026-06-19  0:26 ` [PATCH bpf v2 1/8] libbpf: ringbuf: Honor zero consume bounds Tamir Duberstein
2026-06-19 21:13   ` Eduard Zingerman
2026-06-19  0:26 ` [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash Tamir Duberstein
2026-06-19 13:54   ` Jiri Olsa
2026-06-19 21:11     ` Eduard Zingerman
2026-06-20  2:48       ` Alexei Starovoitov
2026-06-19  0:26 ` [PATCH bpf v2 3/8] libbpf: ringbuf: Reject overwrite callback use Tamir Duberstein
2026-06-19  0:26 ` [PATCH bpf v2 4/8] libbpf: ringbuf: Handle position counter wrap Tamir Duberstein
2026-06-19  0:41   ` sashiko-bot
2026-06-19  0:26 ` [PATCH bpf v2 5/8] bpf: ringbuf: Handle pending position wrap Tamir Duberstein
2026-06-19  0:45   ` sashiko-bot
2026-06-19  0:26 ` [PATCH bpf v2 6/8] bpf: user_ringbuf: Handle " Tamir Duberstein
2026-06-19  0:40   ` sashiko-bot
2026-06-19  0:26 ` [PATCH bpf v2 7/8] libbpf: ringbuf: Use compiler atomics Tamir Duberstein
2026-06-19  0:26 ` [PATCH bpf v2 8/8] libbpf: ringbuf: Prevent missed wakeups Tamir Duberstein
2026-06-20  1:53   ` Eduard Zingerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox