From: Paolo Abeni <pabeni@redhat.com>
To: Alexander Lobakin <aleksander.lobakin@intel.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>
Cc: "Lorenzo Bianconi" <lorenzo@kernel.org>,
"Daniel Xu" <dxu@dxuuu.xyz>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
"Toke Høiland-Jørgensen" <toke@kernel.org>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v2 5/8] net: skbuff: introduce napi_skb_cache_get_bulk()
Date: Thu, 9 Jan 2025 14:16:22 +0100
Message-ID: <d97b6ec6-59fb-4123-9d96-27b9b32dc5cc@redhat.com>
In-Reply-To: <20250107152940.26530-6-aleksander.lobakin@intel.com>

On 1/7/25 4:29 PM, Alexander Lobakin wrote:
> Add a function to get an array of skbs from the NAPI percpu cache.
> It's supposed to be a drop-in replacement for
> kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and
> xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the
> requirement to call it only from the BH) is that it tries to use
> as many NAPI cache entries for skbs as possible, and allocate new
> ones only if needed.
>
> The logic is as follows:
>
> * there are enough skbs in the cache: decache them and return to the
>   caller;
> * not enough: try refilling the cache first. If there are now enough
>   skbs, return;
> * still not enough: try allocating skbs directly to the output array
>   with %GFP_ZERO; maybe we'll be able to get some. If there are now
>   enough, return;
> * still not enough: return as many as we were able to obtain.
>
> Most of the time, when called from the NAPI polling loop, the first
> case holds; sometimes (rarely) the second. The third and fourth occur
> only under heavy memory pressure.
> It can save a significant number of CPU cycles if there are GRO cycles
> and/or Tx completion cycles (anything that descends to
> napi_skb_cache_put()) happening on this CPU.
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> Tested-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  include/linux/skbuff.h |  1 +
>  net/core/skbuff.c      | 62 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index bb2b751d274a..1c089c7c14e1 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1315,6 +1315,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
>  				 void *data, unsigned int frag_size);
>  void skb_attempt_defer_free(struct sk_buff *skb);
>  
> +u32 napi_skb_cache_get_bulk(void **skbs, u32 n);
>  struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
>  struct sk_buff *slab_build_skb(void *data);
>  
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a441613a1e6c..42eb31dcc9ce 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -367,6 +367,68 @@ static struct sk_buff *napi_skb_cache_get(void)
>  	return skb;
>  }
>
> +/**
> + * napi_skb_cache_get_bulk - obtain a number of zeroed skb heads from the cache
> + * @skbs: pointer to an at least @n-sized array to fill with skb pointers
> + * @n: number of entries to provide
> + *
> + * Tries to obtain @n &sk_buff entries from the NAPI percpu cache and writes
> + * the pointers into the provided array @skbs. If there are fewer entries
> + * available, tries to replenish the cache and bulk-allocates the difference
> + * from the MM layer if needed.
> + * The heads are zeroed with either memset() or %__GFP_ZERO, so they are
> + * ready for {,__}build_skb_around() and don't have any data buffers attached.
> + * Must be called *only* from the BH context.
> + *
> + * Return: number of successfully allocated skbs (@n if no actual allocation
> + * needed or kmem_cache_alloc_bulk() didn't fail).
> + */
> +u32 napi_skb_cache_get_bulk(void **skbs, u32 n)
> +{
> +	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> +	u32 bulk, total = n;
> +
> +	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
> +
> +	if (nc->skb_count >= n)
> +		goto get;
I (mis?)understood from the commit message that this condition should
be marked likely(), too?!?
> +	/* Not enough cached skbs. Try refilling the cache first */
> +	bulk = min(NAPI_SKB_CACHE_SIZE - nc->skb_count, NAPI_SKB_CACHE_BULK);
> +	nc->skb_count += kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
> +					       GFP_ATOMIC | __GFP_NOWARN, bulk,
> +					       &nc->skb_cache[nc->skb_count]);
> +	if (likely(nc->skb_count >= n))
> +		goto get;
> +
> +	/* Still not enough. Bulk-allocate the missing part directly, zeroed */
> +	n -= kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
> +				   GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
> +				   n - nc->skb_count, &skbs[nc->skb_count]);
You should probably cap 'n' at NAPI_SKB_CACHE_SIZE. Also, what about
latency spikes when n == 48 (which should be the maximum possible with
such a limit) here?

/P