From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@amazon.com>, jordan@jrife.io
Cc: aditi.ghag@isovalent.com, bpf@vger.kernel.org,
daniel@iogearbox.net, netdev@vger.kernel.org,
willemdebruijn.kernel@gmail.com
Subject: Re: [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot
Date: Thu, 17 Apr 2025 16:12:57 -0700 [thread overview]
Message-ID: <42b84ea3-b3c1-4839-acfc-bd182e7af313@linux.dev> (raw)
In-Reply-To: <20250417224219.29946-1-kuniyu@amazon.com>
On 4/17/25 3:41 PM, Kuniyuki Iwashima wrote:
> From: Jordan Rife <jordan@jrife.io>
> Date: Wed, 16 Apr 2025 16:36:17 -0700
>> Require that iter->batch always contains a full bucket snapshot. This
>> invariant is important to avoid skipping or repeating sockets during
>> iteration when combined with the next few patches. Before, there were
>> two cases where a call to bpf_iter_udp_batch may only capture part of a
>> bucket:
>>
>> 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1].
>> 2. When more sockets are added to the bucket while calling
>> bpf_iter_udp_realloc_batch(), making the updated batch size
>> insufficient [2].
>>
>> In cases where the batch size only covers part of a bucket, it is
>> possible to forget which sockets were already visited, especially if we
>> have to process a bucket in more than two batches. This forces us to
>> choose between repeating or skipping sockets, so don't allow this:
>>
>> 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
>> fails instead of continuing with a partial batch.
>> 2. Retry bpf_iter_udp_realloc_batch() up to two times if we fail to
>> capture the full bucket. On the third attempt, hold onto the bucket
>> lock (hslot2->lock) through bpf_iter_udp_realloc_batch() with
>> GFP_ATOMIC to guarantee that the bucket size doesn't change before
>> our next attempt. Try with GFP_USER first to improve the chances
>> that memory allocation succeeds; only use GFP_ATOMIC as a last
>> resort.
>
> kvmalloc() tries the kmalloc path, 1. slab and 2. page allocator, and
> if both of them fails, then tries 3. vmalloc(). But, vmalloc() does not
> support GFP_ATOMIC, __kvmalloc_node_noprof() returns at
> !gfpflags_allow_blocking().
>
> So, falling back to GFP_ATOMIC is most unlikely to work as the last resort.
>
> GFP_ATOMIC first and falling back to GFP_USER few more times, or not using
> GFP_ATOMIC makes more sense to me.
If I read it correctly, the last retry with GFP_ATOMIC is not because of the
earlier GFP_USER allocation failure but the size of the bucket has changed a lot
that it is doing one final attempt to get the whole bucket and this requires to
hold the bucket lock to ensure the size stays the same which then must use
GFP_ATOMIC.
next prev parent reply other threads:[~2025-04-17 23:13 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-16 23:36 [PATCH v3 bpf-next 0/6] bpf: udp: Exactly-once socket iteration Jordan Rife
2025-04-16 23:36 ` [PATCH v3 bpf-next 1/6] bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch Jordan Rife
2025-04-17 23:35 ` Kuniyuki Iwashima
2025-04-16 23:36 ` [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot Jordan Rife
2025-04-17 22:41 ` Kuniyuki Iwashima
2025-04-17 23:12 ` Martin KaFai Lau [this message]
2025-04-17 23:32 ` Kuniyuki Iwashima
2025-04-17 23:51 ` Jordan Rife
2025-04-17 23:45 ` Kuniyuki Iwashima
2025-04-17 23:57 ` Jordan Rife
2025-04-18 0:15 ` Kuniyuki Iwashima
2025-04-16 23:36 ` [PATCH v3 bpf-next 3/6] bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items Jordan Rife
2025-04-16 23:36 ` [PATCH v3 bpf-next 4/6] bpf: udp: Avoid socket skips and repeats during iteration Jordan Rife
2025-04-16 23:36 ` [PATCH v3 bpf-next 5/6] selftests/bpf: Return socket cookies from sock_iter_batch progs Jordan Rife
2025-04-16 23:36 ` [PATCH v3 bpf-next 6/6] selftests/bpf: Add tests for bucket resume logic in UDP socket iterators Jordan Rife
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=42b84ea3-b3c1-4839-acfc-bd182e7af313@linux.dev \
--to=martin.lau@linux.dev \
--cc=aditi.ghag@isovalent.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jordan@jrife.io \
--cc=kuniyu@amazon.com \
--cc=netdev@vger.kernel.org \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).