From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jordan Rife <jordan@jrife.io>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
Aditi Ghag <aditi.ghag@isovalent.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Subject: Re: [RFC PATCH bpf-next 2/3] bpf: udp: Avoid socket skips and repeats during iteration
Date: Mon, 7 Apr 2025 14:56:54 -0700 [thread overview]
Message-ID: <58bfc722-5dc4-4119-9c5c-49fb6b3da6cd@linux.dev> (raw)
In-Reply-To: <20250404220221.1665428-3-jordan@jrife.io>
On 4/4/25 3:02 PM, Jordan Rife wrote:
> +static struct sock *bpf_iter_udp_resume(struct sock *first_sk,
> + union bpf_udp_iter_batch_item *cookies,
> + int n_cookies)
> +{
> + struct sock *sk = NULL;
> + int i = 0;
> +
> + for (; i < n_cookies; i++) {
> + sk = first_sk;
> + udp_portaddr_for_each_entry_from(sk)
> + if (cookies[i].cookie == atomic64_read(&sk->sk_cookie))
> + goto done;
> + }
> +done:
> + return sk;
> +}
> +
> static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
> {
> struct bpf_udp_iter_state *iter = seq->private;
> struct udp_iter_state *state = &iter->state;
> + unsigned int find_cookie, end_cookie = 0;
> struct net *net = seq_file_net(seq);
> - int resume_bucket, resume_offset;
> struct udp_table *udptable;
> unsigned int batch_sks = 0;
> bool resized = false;
> + int resume_bucket;
> struct sock *sk;
>
> resume_bucket = state->bucket;
> - resume_offset = iter->offset;
>
> /* The current batch is done, so advance the bucket. */
> if (iter->st_bucket_done)
> @@ -3428,6 +3446,8 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
> * before releasing the bucket lock. This allows BPF programs that are
> * called in seq_show to acquire the bucket lock if needed.
> */
> + find_cookie = iter->cur_sk;
> + end_cookie = iter->end_sk;
> iter->cur_sk = 0;
> iter->end_sk = 0;
> iter->st_bucket_done = false;
> @@ -3439,18 +3459,26 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
> if (hlist_empty(&hslot2->head))
> continue;
>
> - iter->offset = 0;
> spin_lock_bh(&hslot2->lock);
> - udp_portaddr_for_each_entry(sk, &hslot2->head) {
> + /* Initialize sk to the first socket in hslot2. */
> + udp_portaddr_for_each_entry(sk, &hslot2->head)
> + break;
nit. It is to get the first entry? May be directly do
hlist_entry_safe(hslot2->head.first, ... ) instead.
> + /* Resume from the first (in iteration order) unseen socket from
> + * the last batch that still exists in resume_bucket. Most of
> + * the time this will just be where the last iteration left off
> + * in resume_bucket unless that socket disappeared between
> + * reads.
> + *
> + * Skip this if end_cookie isn't set; this is the first
> + * batch, we're on bucket zero, and we want to start from the
> + * beginning.
> + */
> + if (state->bucket == resume_bucket && end_cookie)
> + sk = bpf_iter_udp_resume(sk,
> + &iter->batch[find_cookie],
> + end_cookie - find_cookie);
> + udp_portaddr_for_each_entry_from(sk) {
> if (seq_sk_match(seq, sk)) {
> - /* Resume from the last iterated socket at the
> - * offset in the bucket before iterator was stopped.
> - */
> - if (state->bucket == resume_bucket &&
> - iter->offset < resume_offset) {
> - ++iter->offset;
> - continue;
> - }
> if (iter->end_sk < iter->max_sk) {
> sock_hold(sk);
> iter->batch[iter->end_sk++].sock = sk;
I looked at the details for these two functions. The approach looks good to me.
Thanks for trying it.
This should stop the potential duplicates during stop() and then re-start().
My understanding is that it may or may not batch something newer than the last
stop(). This behavior should be similar to the current offset approach also. I
think it is fine. The similar situation is true for the next bucket anyway.
next prev parent reply other threads:[~2025-04-07 21:57 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-04 22:02 [RFC PATCH bpf-next 0/3] Exactly-once UDP socket iteration Jordan Rife
2025-04-04 22:02 ` [RFC PATCH bpf-next 1/3] bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items Jordan Rife
2025-04-04 22:02 ` [RFC PATCH bpf-next 2/3] bpf: udp: Avoid socket skips and repeats during iteration Jordan Rife
2025-04-04 23:20 ` Kuniyuki Iwashima
2025-04-07 23:30 ` Jordan Rife
2025-04-08 0:16 ` Kuniyuki Iwashima
2025-04-08 2:39 ` Jordan Rife
2025-04-08 5:23 ` Martin KaFai Lau
2025-04-09 0:11 ` Jordan Rife
2025-04-07 21:56 ` Martin KaFai Lau [this message]
2025-04-07 23:39 ` Jordan Rife
2025-04-04 22:02 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add tests for bucket resume logic in UDP socket iterators Jordan Rife
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=58bfc722-5dc4-4119-9c5c-49fb6b3da6cd@linux.dev \
--to=martin.lau@linux.dev \
--cc=aditi.ghag@isovalent.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jordan@jrife.io \
--cc=netdev@vger.kernel.org \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox