From: Tamir Duberstein <tamird@kernel.org>
To: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
Andrea Righi <arighi@nvidia.com>,
Xu Kuohai <xukuohai@huawei.com>,
Andrea Righi <andrea.righi@canonical.com>,
Bing-Jhong Billy Jheng <billy@starlabs.sg>,
David Vernet <void@manifault.com>
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org,
Andrew Werner <awerner32@gmail.com>,
Zvi Effron <zeffron@riotgames.com>,
Andrii Nakryiko <andriin@fb.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
Tamir Duberstein <tamird@kernel.org>,
Sashiko <sashiko-bot@kernel.org>
Subject: [PATCH bpf v2 8/8] libbpf: ringbuf: Prevent missed wakeups
Date: Thu, 18 Jun 2026 20:26:46 -0400 [thread overview]
Message-ID: <20260618-bpf-ringbuf-fixes-v2-8-33fde039ddf3@kernel.org> (raw)
In-Reply-To: <20260618-bpf-ringbuf-fixes-v2-0-33fde039ddf3@kernel.org>
After consuming the last visible record, ringbuf_process_ring()
publishes the consumer position and checks the producer position. These
operations lack a full StoreLoad barrier. A producer can therefore
commit a new record but read the old consumer position while the
consumer reads the old producer position. The producer sends no
notification and the consumer waits despite a queued record.
Insert a full barrier between publishing a consumer position and the
next producer position load. When a record bound or callback ends the
current invocation first, execute the barrier before returning so the
load in a later invocation completes the same handshake.
Add an edge-triggered epoll test that drains one record per call while a
concurrent producer fills the ring. Without the barrier, a missed
notification leaves the producer dropping records from a full ring while
the consumer times out. Document that bounded consumers and callbacks
that terminate consumption must drain before waiting again.
Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Reported-by: Andrew Werner <awerner32@gmail.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/bpf/20260614015716.945AF1F000E9@smtp.kernel.org/
Assisted-by: Codex:gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
---
tools/lib/bpf/libbpf.h | 23 +++++++
tools/lib/bpf/ringbuf.c | 24 +++++--
tools/testing/selftests/bpf/prog_tests/ringbuf.c | 83 ++++++++++++++++++++++++
3 files changed, 123 insertions(+), 7 deletions(-)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index ae46b17feaa6..3a649ed87034 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1440,6 +1440,11 @@ struct ring;
struct user_ring_buffer;
/* Callback-based consumption is unsupported for BPF_F_RB_OVERWRITE maps. */
+/*
+ * A negative return stops consumption; non-negative values continue. Stopping
+ * can leave records queued without a new readiness notification. Before
+ * waiting for readiness again, consume until no records remain.
+ */
typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
struct ring_buffer_opts {
@@ -1456,6 +1461,20 @@ LIBBPF_API int ring_buffer__add(struct ring_buffer *rb, int map_fd,
ring_buffer_sample_fn sample_cb, void *ctx);
LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
+
+/**
+ * @brief **ring_buffer__consume_n()** consumes up to a requested number of
+ * records from a ring buffer manager without event polling.
+ *
+ * @param rb A ring buffer manager object.
+ * @param n Maximum number of records to consume.
+ * @return The number of records consumed, or a negative error code on failure.
+ *
+ * Reaching the requested bound does not establish that every ring is empty.
+ * Records can remain queued without a new readiness notification. Before
+ * calling ring_buffer__poll() or waiting on ring_buffer__epoll_fd(), call
+ * ring_buffer__consume() until it returns 0.
+ */
LIBBPF_API int ring_buffer__consume_n(struct ring_buffer *rb, size_t n);
LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
@@ -1538,6 +1557,10 @@ LIBBPF_API int ring__consume(struct ring *r);
* @param r A ringbuffer object.
* @param n Maximum number of records to consume.
* @return The number of records consumed, or a negative error code on failure.
+ *
+ * Reaching the requested bound does not establish that the ring is empty.
+ * Records can remain queued without a new readiness notification. Before
+ * waiting on ring__map_fd(), call ring__consume() until it returns 0.
*/
LIBBPF_API int ring__consume_n(struct ring *r, size_t n);
diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index 141f2cbe56eb..0598f6c2f7da 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -271,7 +271,7 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
return 0;
cons_pos = __atomic_load_n(r->consumer_pos, __ATOMIC_ACQUIRE);
- do {
+ for (;;) {
got_new_data = false;
prod_pos = __atomic_load_n(r->producer_pos, __ATOMIC_ACQUIRE);
/* Positions wrap; the consumer cannot logically pass the producer. */
@@ -279,9 +279,9 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
len_ptr = r->data + (cons_pos & r->mask);
len = __atomic_load_n(len_ptr, __ATOMIC_ACQUIRE);
- /* sample not committed yet, bail out for now */
+ /* Retry a busy record once after publishing prior records. */
if (len & BPF_RINGBUF_BUSY_BIT)
- goto done;
+ break;
got_new_data = true;
cons_pos += roundup_len(len);
@@ -294,7 +294,8 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
__atomic_store_n(r->consumer_pos,
cons_pos,
__ATOMIC_RELEASE);
- return err;
+ cnt = err;
+ break;
}
cnt++;
}
@@ -303,10 +304,19 @@ static int64_t ringbuf_process_ring(struct ring *r, size_t n)
__ATOMIC_RELEASE);
if (cnt >= n)
- goto done;
+ break;
}
- } while (got_new_data);
-done:
+ if (!got_new_data)
+ break;
+
+ /*
+ * Order the published consumer position before the next
+ * producer-position load, whether below or in a later invocation.
+ */
+ __atomic_thread_fence(__ATOMIC_SEQ_CST);
+ if (cnt < 0 || cnt >= n)
+ break;
+ }
return cnt;
}
diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
index 29be2476c478..0d45a766a580 100644
--- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
@@ -492,6 +492,87 @@ static void ringbuf_null_cb_subtest(void)
test_ringbuf_n_lskel__destroy(skel_n);
}
+#define N_WAKEUP_SAMPLES 20000
+
+struct wakeup_ctx {
+ bool stop;
+};
+
+static void *wakeup_producer(void *arg)
+{
+ struct wakeup_ctx *ctx = arg;
+
+ while (!__atomic_load_n(&ctx->stop, __ATOMIC_RELAXED))
+ syscall(__NR_getpgid);
+ return NULL;
+}
+
+static void ringbuf_wakeup_subtest(void)
+{
+ struct test_ringbuf_n_lskel *skel_n;
+ struct ring_buffer *ringbuf = NULL;
+ struct epoll_event event = {
+ .events = EPOLLIN | EPOLLET,
+ };
+ struct wakeup_ctx ctx = {};
+ pthread_t producer;
+ int epoll_fd = -1;
+ int err, total = 0;
+
+ skel_n = test_ringbuf_n_lskel__open();
+ if (!ASSERT_OK_PTR(skel_n, "test_ringbuf_n_lskel__open"))
+ return;
+
+ skel_n->maps.ringbuf.max_entries = getpagesize();
+ skel_n->bss->pid = getpid();
+ skel_n->bss->value = SAMPLE_VALUE;
+
+ err = test_ringbuf_n_lskel__load(skel_n);
+ if (!ASSERT_OK(err, "test_ringbuf_n_lskel__load"))
+ goto cleanup;
+
+ err = test_ringbuf_n_lskel__attach(skel_n);
+ if (!ASSERT_OK(err, "test_ringbuf_n_lskel__attach"))
+ goto cleanup;
+
+ ringbuf = ring_buffer__new(skel_n->maps.ringbuf.map_fd,
+ process_noop_sample, NULL, NULL);
+ if (!ASSERT_OK_PTR(ringbuf, "ring_buffer__new"))
+ goto cleanup;
+
+ epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+ if (!ASSERT_OK_FD(epoll_fd, "epoll_create1"))
+ goto cleanup_ringbuf;
+
+ err = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, skel_n->maps.ringbuf.map_fd,
+ &event);
+ if (!ASSERT_OK(err, "epoll_ctl"))
+ goto cleanup_epoll;
+
+ err = pthread_create(&producer, NULL, wakeup_producer, &ctx);
+ if (!ASSERT_OK(err, "pthread_create"))
+ goto cleanup_epoll;
+
+ while (total < N_WAKEUP_SAMPLES) {
+ err = epoll_wait(epoll_fd, &event, 1, 1000);
+ if (!ASSERT_EQ(err, 1, "epoll_wait"))
+ break;
+ while ((err = ring_buffer__consume_n(ringbuf, 1)) > 0)
+ total += err;
+ if (!ASSERT_OK(err, "ring_buffer__consume_n"))
+ break;
+ }
+
+ __atomic_store_n(&ctx.stop, true, __ATOMIC_RELAXED);
+ pthread_join(producer, NULL);
+cleanup_epoll:
+ close(epoll_fd);
+cleanup_ringbuf:
+ ring_buffer__free(ringbuf);
+cleanup:
+ test_ringbuf_n_lskel__destroy(skel_n);
+}
+
static void ringbuf_n_subtest(void)
{
struct test_ringbuf_n_lskel *skel_n;
@@ -709,6 +790,8 @@ void test_ringbuf(void)
ringbuf_n_subtest();
if (test__start_subtest("ringbuf_null_cb"))
ringbuf_null_cb_subtest();
+ if (test__start_subtest("ringbuf_wakeup"))
+ ringbuf_wakeup_subtest();
if (test__start_subtest("ringbuf_map_key"))
ringbuf_map_key_subtest();
if (test__start_subtest("ringbuf_write"))
--
2.55.0.rc0.159.gbe5d7338c2
prev parent reply other threads:[~2026-06-19 0:27 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-19 0:26 [PATCH bpf v2 0/8] bpf: Fix ring buffer handling Tamir Duberstein
2026-06-19 0:26 ` [PATCH bpf v2 1/8] libbpf: ringbuf: Honor zero consume bounds Tamir Duberstein
2026-06-19 0:26 ` [PATCH bpf v2 2/8] libbpf: ringbuf: Prevent NULL callback crash Tamir Duberstein
2026-06-19 0:26 ` [PATCH bpf v2 3/8] libbpf: ringbuf: Reject overwrite callback use Tamir Duberstein
2026-06-19 0:26 ` [PATCH bpf v2 4/8] libbpf: ringbuf: Handle position counter wrap Tamir Duberstein
2026-06-19 0:41 ` sashiko-bot
2026-06-19 0:26 ` [PATCH bpf v2 5/8] bpf: ringbuf: Handle pending position wrap Tamir Duberstein
2026-06-19 0:45 ` sashiko-bot
2026-06-19 0:26 ` [PATCH bpf v2 6/8] bpf: user_ringbuf: Handle " Tamir Duberstein
2026-06-19 0:40 ` sashiko-bot
2026-06-19 0:26 ` [PATCH bpf v2 7/8] libbpf: ringbuf: Use compiler atomics Tamir Duberstein
2026-06-19 0:26 ` Tamir Duberstein [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260618-bpf-ringbuf-fixes-v2-8-33fde039ddf3@kernel.org \
--to=tamird@kernel.org \
--cc=andrea.righi@canonical.com \
--cc=andrii@kernel.org \
--cc=andriin@fb.com \
--cc=arighi@nvidia.com \
--cc=ast@kernel.org \
--cc=awerner32@gmail.com \
--cc=billy@starlabs.sg \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=emil@etsalapatis.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=sashiko-bot@kernel.org \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=void@manifault.com \
--cc=xukuohai@huawei.com \
--cc=yonghong.song@linux.dev \
--cc=zeffron@riotgames.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.