Re: [RFC PATCH bpf-next 2/3] bpf: udp: Avoid socket skips and repeats during iteration

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jordan Rife <jordan@jrife.io>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
	Aditi Ghag <aditi.ghag@isovalent.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Subject: Re: [RFC PATCH bpf-next 2/3] bpf: udp: Avoid socket skips and repeats during iteration
Date: Mon, 7 Apr 2025 14:56:54 -0700	[thread overview]
Message-ID: <58bfc722-5dc4-4119-9c5c-49fb6b3da6cd@linux.dev> (raw)
In-Reply-To: <20250404220221.1665428-3-jordan@jrife.io>

On 4/4/25 3:02 PM, Jordan Rife wrote:
> +static struct sock *bpf_iter_udp_resume(struct sock *first_sk,
> +					union bpf_udp_iter_batch_item *cookies,
> +					int n_cookies)
> +{
> +	struct sock *sk = NULL;
> +	int i = 0;
> +
> +	for (; i < n_cookies; i++) {
> +		sk = first_sk;
> +		udp_portaddr_for_each_entry_from(sk)
> +			if (cookies[i].cookie == atomic64_read(&sk->sk_cookie))
> +				goto done;
> +	}
> +done:
> +	return sk;
> +}
> +
>   static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>   {
>   	struct bpf_udp_iter_state *iter = seq->private;
>   	struct udp_iter_state *state = &iter->state;
> +	unsigned int find_cookie, end_cookie = 0;
>   	struct net *net = seq_file_net(seq);
> -	int resume_bucket, resume_offset;
>   	struct udp_table *udptable;
>   	unsigned int batch_sks = 0;
>   	bool resized = false;
> +	int resume_bucket;
>   	struct sock *sk;
>   
>   	resume_bucket = state->bucket;
> -	resume_offset = iter->offset;
>   
>   	/* The current batch is done, so advance the bucket. */
>   	if (iter->st_bucket_done)
> @@ -3428,6 +3446,8 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>   	 * before releasing the bucket lock. This allows BPF programs that are
>   	 * called in seq_show to acquire the bucket lock if needed.
>   	 */
> +	find_cookie = iter->cur_sk;
> +	end_cookie = iter->end_sk;
>   	iter->cur_sk = 0;
>   	iter->end_sk = 0;
>   	iter->st_bucket_done = false;
> @@ -3439,18 +3459,26 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>   		if (hlist_empty(&hslot2->head))
>   			continue;
>   
> -		iter->offset = 0;
>   		spin_lock_bh(&hslot2->lock);
> -		udp_portaddr_for_each_entry(sk, &hslot2->head) {
> +		/* Initialize sk to the first socket in hslot2. */
> +		udp_portaddr_for_each_entry(sk, &hslot2->head)
> +			break;

nit. It is to get the first entry? May be directly do
hlist_entry_safe(hslot2->head.first, ... ) instead.

> +		/* Resume from the first (in iteration order) unseen socket from
> +		 * the last batch that still exists in resume_bucket. Most of
> +		 * the time this will just be where the last iteration left off
> +		 * in resume_bucket unless that socket disappeared between
> +		 * reads.
> +		 *
> +		 * Skip this if end_cookie isn't set; this is the first
> +		 * batch, we're on bucket zero, and we want to start from the
> +		 * beginning.
> +		 */
> +		if (state->bucket == resume_bucket && end_cookie)
> +			sk = bpf_iter_udp_resume(sk,
> +						 &iter->batch[find_cookie],
> +						 end_cookie - find_cookie);
> +		udp_portaddr_for_each_entry_from(sk) {
>   			if (seq_sk_match(seq, sk)) {
> -				/* Resume from the last iterated socket at the
> -				 * offset in the bucket before iterator was stopped.
> -				 */
> -				if (state->bucket == resume_bucket &&
> -				    iter->offset < resume_offset) {
> -					++iter->offset;
> -					continue;
> -				}
>   				if (iter->end_sk < iter->max_sk) {
>   					sock_hold(sk);
>   					iter->batch[iter->end_sk++].sock = sk;

I looked at the details for these two functions. The approach looks good to me. 
Thanks for trying it.

This should stop the potential duplicates during stop() and then re-start().

My understanding is that it may or may not batch something newer than the last 
stop(). This behavior should be similar to the current offset approach also. I 
think it is fine. The similar situation is true for the next bucket anyway.

next prev parent reply	other threads:[~2025-04-07 21:57 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-04 22:02 [RFC PATCH bpf-next 0/3] Exactly-once UDP socket iteration Jordan Rife
2025-04-04 22:02 ` [RFC PATCH bpf-next 1/3] bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items Jordan Rife
2025-04-04 22:02 ` [RFC PATCH bpf-next 2/3] bpf: udp: Avoid socket skips and repeats during iteration Jordan Rife
2025-04-04 23:20   ` Kuniyuki Iwashima
2025-04-07 23:30     ` Jordan Rife
2025-04-08  0:16       ` Kuniyuki Iwashima
2025-04-08  2:39         ` Jordan Rife
2025-04-08  5:23           ` Martin KaFai Lau
2025-04-09  0:11             ` Jordan Rife
2025-04-07 21:56   ` Martin KaFai Lau [this message]
2025-04-07 23:39     ` Jordan Rife
2025-04-04 22:02 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add tests for bucket resume logic in UDP socket iterators Jordan Rife

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58bfc722-5dc4-4119-9c5c-49fb6b3da6cd@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=aditi.ghag@isovalent.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jordan@jrife.io \
    --cc=netdev@vger.kernel.org \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.