From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jordan Rife <jordan@jrife.io>
Cc: Aditi Ghag <aditi.ghag@isovalent.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Kuniyuki Iwashima <kuniyu@amazon.com>,
bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v2 bpf-next 2/5] bpf: udp: Propagate ENOMEM up from bpf_iter_udp_batch
Date: Fri, 11 Apr 2025 20:47:36 -0700 [thread overview]
Message-ID: <daa3f02a-c982-4a7a-afcd-41f5e9b2f79c@linux.dev> (raw)
In-Reply-To: <CABi4-ojQVb=8SKGNubpy=bG4pg1o=tNaz9UspYDTbGTPZTu8gQ@mail.gmail.com>
On 4/11/25 4:31 PM, Jordan Rife wrote:
>> The resized == true case will have a similar issue. Meaning the next
>> bpf_iter_udp_batch() will end up skipping the remaining sk in that bucket, e.g.
>> the partial-bucket batch has been consumed, so cur_sk == end_sk but
>> st_bucket_done == false and bpf_iter_udp_resume() returns NULL. It is sort of a
>> regression from the current "offset" implementation for this case. Any thought
>> on how to make it better?
>
> Are you referring to the case where the bucket grows in size so much
> between releasing and reacquiring the bucket's lock to where we still
> can't fit all sockets into our batch even after a
> bpf_iter_udp_realloc_batch()? If so, I think we touched on this a bit
> in [1]:
Right, and it is also the same as the kvmalloc failure case that this patch is
handling. Let see if it can be done better without returning error in both cases.
> 1) Loop until iter->end_sk == batch_sks, possibly calling realloc a
> couple times. The unbounded loop is a bit worrying; I guess
> bpf_iter_udp_batch could "race" if the bucket size keeps growing here.
> 2) Loop some bounded number of times and return some ERR_PTR(x) if the
> loop can't keep up after a few tries so we don't break the invariant
> that the batch is always a full snapshot of a bucket.
> 3) Set some flag in the iterator state, e.g. iter->is_partial,
> indicating to the next call to bpf_iter_udp_realloc_batch() that the
> last batch was actually partial and that if it can't find any of the
> cookies from last time it should start over from the beginning of the
> bucket instead of advancing to the next bucket. This may repeat
> sockets we've already seen in the worst case, but still better than
> skipping them.
Probably something like (3) but I don't think it needs a new "is_partial". The
existing "st_bucket_done" should do.
How about for the "st_bucket_done == false" case, it also stores the
cookie before advancing the cur_sk in bpf_iter_udp_seq_next().
not-compiled code:
static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
if (iter->cur_sk < iter->end_sk) {
u64 cookie;
cookie = iter->st_bucket_done ?
0 : __sock_gen_cookie(iter->batch[iter->cur_sk].sock);
sock_put(iter->batch[iter->cur_sk].sock);
iter->batch[iter->cur_sk++].cookie = cookie;
}
/* ... */
}
In bpf_iter_udp_resume(), if it cannot find the first sk from find_cookie to
end_cookie, then it searches backward from find_cookie to 0. If nothing found,
then it should start from the beginning of the resume_bucket. Would it work?
next prev parent reply other threads:[~2025-04-12 3:47 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-11 17:35 [PATCH v2 bpf-next 0/5] bpf: udp: Exactly-once socket iteration Jordan Rife
2025-04-11 17:35 ` [PATCH v2 bpf-next 1/5] bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items Jordan Rife
2025-04-11 17:35 ` [PATCH v2 bpf-next 2/5] bpf: udp: Propagate ENOMEM up from bpf_iter_udp_batch Jordan Rife
2025-04-11 22:10 ` Martin KaFai Lau
2025-04-11 23:31 ` Jordan Rife
2025-04-12 3:47 ` Martin KaFai Lau [this message]
2025-04-14 0:02 ` Jordan Rife
2025-04-14 21:54 ` Martin KaFai Lau
2025-04-14 23:59 ` Jordan Rife
2025-04-11 17:35 ` [PATCH v2 bpf-next 3/5] bpf: udp: Avoid socket skips and repeats during iteration Jordan Rife
2025-04-11 20:12 ` Kuniyuki Iwashima
2025-04-11 17:35 ` [PATCH v2 bpf-next 4/5] selftests/bpf: Return socket cookies from sock_iter_batch progs Jordan Rife
2025-04-11 17:35 ` [PATCH v2 bpf-next 5/5] selftests/bpf: Add tests for bucket resume logic in UDP socket iterators Jordan Rife
2025-04-11 22:32 ` Martin KaFai Lau
2025-04-15 0:04 ` Jordan Rife
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=daa3f02a-c982-4a7a-afcd-41f5e9b2f79c@linux.dev \
--to=martin.lau@linux.dev \
--cc=aditi.ghag@isovalent.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jordan@jrife.io \
--cc=kuniyu@amazon.com \
--cc=netdev@vger.kernel.org \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.