From: Jordan Rife <jordan@jrife.io>
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife <jordan@jrife.io>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@linux.dev>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Kuniyuki Iwashima <kuniyu@amazon.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subject: [RESEND PATCH v2 bpf-next 00/12] bpf: tcp: Exactly-once socket iteration
Date: Wed, 18 Jun 2025 09:25:31 -0700
Message-ID: <20250618162545.15633-1-jordan@jrife.io>

TCP socket iterators use iter->offset to track progress through a
bucket; it counts the matching sockets from the current bucket that the
iterator has already seen or processed. On subsequent iterations, if the
current bucket still has unprocessed items, we skip at least iter->offset
matching items in the bucket before adding any remaining items to the
next batch. However, iter->offset isn't always an accurate measure of
"things already seen" when the underlying bucket changes between reads,
which can lead to repeated or skipped sockets. For example, if a socket
we already saw is removed from the bucket, skipping iter->offset items
now also skips the first socket we had not seen yet. Instead, this series
remembers the cookies of the sockets we haven't seen yet in the current
bucket and, on the next iteration, resumes from the first cookie in that
list that can still be found.

This is a continuation of the work started in [1]. This series largely
applies the patterns previously used for UDP socket iterators to TCP
socket iterators.

CHANGES
=======
v1 -> v2:
* In patch five ("bpf: tcp: Avoid socket skips and repeats during
iteration"), remove unnecessary bucket bounds checks in
bpf_iter_tcp_resume. For both listening and established sockets, if
st->bucket is outside the current table's range, bpf_iter_tcp_resume_*
calls *_get_first, which immediately returns NULL, and the logic falls
through anyway. (Martin)
* Add a check at the top of bpf_iter_tcp_resume_listening and
bpf_iter_tcp_resume_established to see if we're done with the current
bucket and, if so, advance to the next bucket immediately instead of
wasting time finding the first matching socket in the current bucket
with (listening|established)_get_first. In v1, we discussed adding logic
to advance the bucket in bpf_iter_tcp_seq_next and bpf_iter_tcp_seq_stop,
but after trying this, the logic seemed harder to follow. Overall,
keeping everything inside bpf_iter_tcp_resume_* seemed clearer. (Martin)
* Instead of using a timeout in the last patch ("selftests/bpf: Add
tests for bucket resume logic in established sockets") to wait for
sockets to leave the ehash table after calling close(), use
bpf_sock_destroy to deterministically destroy and remove them. This
introduces one more patch ("selftests/bpf: Create iter_tcp_destroy
test program") to create the iterator program that destroys a selected
socket. Drive this through a destroy() function in the last patch
which, just like close(), accepts a socket file descriptor. (Martin)
* Introduce one more patch ("selftests/bpf: Allow for iteration over
multiple states") to fix a latent bug in iter_tcp_soreuse where the
sk->sk_state != TCP_LISTEN check was ignored. Add the "ss" variable to
let test code configure which socket states to match.

[1]: https://lore.kernel.org/bpf/20250502161528.264630-1-jordan@jrife.io/

Jordan Rife (12):
  bpf: tcp: Make mem flags configurable through
    bpf_iter_tcp_realloc_batch
  bpf: tcp: Make sure iter->batch always contains a full bucket snapshot
  bpf: tcp: Get rid of st_bucket_done
  bpf: tcp: Use bpf_tcp_iter_batch_item for bpf_tcp_iter_state batch
    items
  bpf: tcp: Avoid socket skips and repeats during iteration
  selftests/bpf: Add tests for bucket resume logic in listening sockets
  selftests/bpf: Allow for iteration over multiple ports
  selftests/bpf: Allow for iteration over multiple states
  selftests/bpf: Make ehash buckets configurable in socket iterator
    tests
  selftests/bpf: Create established sockets in socket iterator tests
  selftests/bpf: Create iter_tcp_destroy test program
  selftests/bpf: Add tests for bucket resume logic in established
    sockets

 net/ipv4/tcp_ipv4.c                       | 263 +++++++---
 .../bpf/prog_tests/sock_iter_batch.c      | 450 +++++++++++++++++-
 .../selftests/bpf/progs/sock_iter_batch.c |  37 +-
 3 files changed, 668 insertions(+), 82 deletions(-)

--
2.43.0