From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: <martin.lau@linux.dev>
Cc: <aditi.ghag@isovalent.com>, <bpf@vger.kernel.org>,
<daniel@iogearbox.net>, <jordan@jrife.io>, <kuniyu@amazon.com>,
<netdev@vger.kernel.org>, <willemdebruijn.kernel@gmail.com>
Subject: Re: [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot
Date: Thu, 17 Apr 2025 16:32:58 -0700 [thread overview]
Message-ID: <20250417233303.37489-1-kuniyu@amazon.com> (raw)
In-Reply-To: <42b84ea3-b3c1-4839-acfc-bd182e7af313@linux.dev>
From: Martin KaFai Lau <martin.lau@linux.dev>
Date: Thu, 17 Apr 2025 16:12:57 -0700
> On 4/17/25 3:41 PM, Kuniyuki Iwashima wrote:
> > From: Jordan Rife <jordan@jrife.io>
> > Date: Wed, 16 Apr 2025 16:36:17 -0700
> >> Require that iter->batch always contains a full bucket snapshot. This
> >> invariant is important to avoid skipping or repeating sockets during
> >> iteration when combined with the next few patches. Before, there were
> >> two cases where a call to bpf_iter_udp_batch may only capture part of a
> >> bucket:
> >>
> >> 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1].
> >> 2. When more sockets are added to the bucket while calling
> >> bpf_iter_udp_realloc_batch(), making the updated batch size
> >> insufficient [2].
> >>
> >> In cases where the batch size only covers part of a bucket, it is
> >> possible to forget which sockets were already visited, especially if we
> >> have to process a bucket in more than two batches. This forces us to
> >> choose between repeating or skipping sockets, so don't allow this:
> >>
> >> 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
> >> fails instead of continuing with a partial batch.
> >> 2. Retry bpf_iter_udp_realloc_batch() up to two times if we fail to
> >> capture the full bucket. On the third attempt, hold onto the bucket
> >> lock (hslot2->lock) through bpf_iter_udp_realloc_batch() with
> >> GFP_ATOMIC to guarantee that the bucket size doesn't change before
> >> our next attempt. Try with GFP_USER first to improve the chances
> >> that memory allocation succeeds; only use GFP_ATOMIC as a last
> >> resort.
> >
> > kvmalloc() tries the kmalloc path, 1. slab and 2. page allocator, and
> > if both of them fails, then tries 3. vmalloc(). But, vmalloc() does not
> > support GFP_ATOMIC, __kvmalloc_node_noprof() returns at
> > !gfpflags_allow_blocking().
> >
> > So, falling back to GFP_ATOMIC is most unlikely to work as the last resort.
> >
> > GFP_ATOMIC first and falling back to GFP_USER few more times, or not using
> > GFP_ATOMIC makes more sense to me.
>
> If I read it correctly, the last retry with GFP_ATOMIC is not because of the
> earlier GFP_USER allocation failure but the size of the bucket has changed a lot
> that it is doing one final attempt to get the whole bucket and this requires to
> hold the bucket lock to ensure the size stays the same which then must use
> GFP_ATOMIC.
Ah exactly, when allocation fails, it always returned an error.
Sorry, I should've read code first.
next prev parent reply other threads:[~2025-04-17 23:33 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-16 23:36 [PATCH v3 bpf-next 0/6] bpf: udp: Exactly-once socket iteration Jordan Rife
2025-04-16 23:36 ` [PATCH v3 bpf-next 1/6] bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch Jordan Rife
2025-04-17 23:35 ` Kuniyuki Iwashima
2025-04-16 23:36 ` [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot Jordan Rife
2025-04-17 22:41 ` Kuniyuki Iwashima
2025-04-17 23:12 ` Martin KaFai Lau
2025-04-17 23:32 ` Kuniyuki Iwashima [this message]
2025-04-17 23:51 ` Jordan Rife
2025-04-17 23:45 ` Kuniyuki Iwashima
2025-04-17 23:57 ` Jordan Rife
2025-04-18 0:15 ` Kuniyuki Iwashima
2025-04-16 23:36 ` [PATCH v3 bpf-next 3/6] bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items Jordan Rife
2025-04-16 23:36 ` [PATCH v3 bpf-next 4/6] bpf: udp: Avoid socket skips and repeats during iteration Jordan Rife
2025-04-16 23:36 ` [PATCH v3 bpf-next 5/6] selftests/bpf: Return socket cookies from sock_iter_batch progs Jordan Rife
2025-04-16 23:36 ` [PATCH v3 bpf-next 6/6] selftests/bpf: Add tests for bucket resume logic in UDP socket iterators Jordan Rife
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250417233303.37489-1-kuniyu@amazon.com \
--to=kuniyu@amazon.com \
--cc=aditi.ghag@isovalent.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jordan@jrife.io \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).