All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 net 0/2] net: remove the single page frag cache for good
@ 2025-02-18 18:29 Paolo Abeni
  2025-02-18 18:29 ` [PATCH v2 net 1/2] net: allow small head cache usage with large MAX_SKB_FRAGS values Paolo Abeni
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Paolo Abeni @ 2025-02-18 18:29 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Simon Horman,
	Andrew Lunn, Alexander Duyck, Nikolay Aleksandrov, Jason Xing

This is another attempt at reverting commit dbae2b062824 ("net: skb:
introduce and use a single page frag cache"), as it causes regressions
in specific use-cases.

Reverting such commit uncovers an allocation issue for build with 
CONFIG_MAX_SKB_FRAGS=45, as reported by Sabrina.

This series handle the latter in patch 1 and brings the revert in patch
2.

Note that there is a little chicken-egg problem, as I included into the
patch 1's changelog the splat that would be visible only applying first
the revert: I think current patch order is better for bisectability,
still the splat is useful for correct attribution.

Paolo Abeni (2):
  net: allow small head cache usage with large MAX_SKB_FRAGS values
  Revert "net: skb: introduce and use a single page frag cache"

 include/linux/netdevice.h |   1 -
 include/net/gro.h         |   3 ++
 net/core/dev.c            |  17 ++++++
 net/core/gro.c            |   3 --
 net/core/skbuff.c         | 110 ++++----------------------------------
 5 files changed, 30 insertions(+), 104 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 net 1/2] net: allow small head cache usage with large MAX_SKB_FRAGS values
  2025-02-18 18:29 [PATCH v2 net 0/2] net: remove the single page frag cache for good Paolo Abeni
@ 2025-02-18 18:29 ` Paolo Abeni
  2025-02-20  9:41   ` Eric Dumazet
  2025-02-18 18:29 ` [PATCH v2 net 2/2] Revert "net: skb: introduce and use a single page frag cache" Paolo Abeni
  2025-02-20 10:10 ` [PATCH v2 net 0/2] net: remove the single page frag cache for good patchwork-bot+netdevbpf
  2 siblings, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2025-02-18 18:29 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Simon Horman,
	Andrew Lunn, Alexander Duyck, Nikolay Aleksandrov, Jason Xing

Sabrina reported the following splat:

    WARNING: CPU: 0 PID: 1 at net/core/dev.c:6935 netif_napi_add_weight_locked+0x8f2/0xba0
    Modules linked in:
    CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-net-00092-g011b03359038 #996
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
    RIP: 0010:netif_napi_add_weight_locked+0x8f2/0xba0
    Code: e8 c3 e6 6a fe 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc c7 44 24 10 ff ff ff ff e9 8f fb ff ff e8 9e e6 6a fe <0f> 0b e9 d3 fe ff ff e8 92 e6 6a fe 48 8b 04 24 be ff ff ff ff 48
    RSP: 0000:ffffc9000001fc60 EFLAGS: 00010293
    RAX: 0000000000000000 RBX: ffff88806ce48128 RCX: 1ffff11001664b9e
    RDX: ffff888008f00040 RSI: ffffffff8317ca42 RDI: ffff88800b325cb6
    RBP: ffff88800b325c40 R08: 0000000000000001 R09: ffffed100167502c
    R10: ffff88800b3a8163 R11: 0000000000000000 R12: ffff88800ac1c168
    R13: ffff88800ac1c168 R14: ffff88800ac1c168 R15: 0000000000000007
    FS:  0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888008201000 CR3: 0000000004c94001 CR4: 0000000000370ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    <TASK>
    gro_cells_init+0x1ba/0x270
    xfrm_input_init+0x4b/0x2a0
    xfrm_init+0x38/0x50
    ip_rt_init+0x2d7/0x350
    ip_init+0xf/0x20
    inet_init+0x406/0x590
    do_one_initcall+0x9d/0x2e0
    do_initcalls+0x23b/0x280
    kernel_init_freeable+0x445/0x490
    kernel_init+0x20/0x1d0
    ret_from_fork+0x46/0x80
    ret_from_fork_asm+0x1a/0x30
    </TASK>
    irq event stamp: 584330
    hardirqs last  enabled at (584338): [<ffffffff8168bf87>] __up_console_sem+0x77/0xb0
    hardirqs last disabled at (584345): [<ffffffff8168bf6c>] __up_console_sem+0x5c/0xb0
    softirqs last  enabled at (583242): [<ffffffff833ee96d>] netlink_insert+0x14d/0x470
    softirqs last disabled at (583754): [<ffffffff8317c8cd>] netif_napi_add_weight_locked+0x77d/0xba0

on kernel built with MAX_SKB_FRAGS=45, where SKB_WITH_OVERHEAD(1024)
is smaller than GRO_MAX_HEAD.

Such built additionally contains the revert of the single page frag cache
so that napi_get_frags() ends up using the page frag allocator, triggering
the splat.

Note that the underlying issue is independent from the mentioned
revert; address it ensuring that the small head cache will fit either TCP
and GRO allocation and updating napi_alloc_skb() and __netdev_alloc_skb()
to select kmalloc() usage for any allocation fitting such cache.

Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
I preferred moving the GRO_MAX_HEAD definition into the gro.h header
instead of the (suggested) tcp.h, to avoid including such a large file
even from gro.c
---
 include/net/gro.h |  3 +++
 net/core/gro.c    |  3 ---
 net/core/skbuff.c | 10 +++++++---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index b9b58c1f8d19..7b548f91754b 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -11,6 +11,9 @@
 #include <net/udp.h>
 #include <net/hotdata.h>
 
+/* This should be increased if a protocol with a bigger head is added. */
+#define GRO_MAX_HEAD (MAX_HEADER + 128)
+
 struct napi_gro_cb {
 	union {
 		struct {
diff --git a/net/core/gro.c b/net/core/gro.c
index d1f44084e978..78b320b63174 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -7,9 +7,6 @@
 
 #define MAX_GRO_SKBS 8
 
-/* This should be increased if a protocol with a bigger head is added. */
-#define GRO_MAX_HEAD (MAX_HEADER + 128)
-
 static DEFINE_SPINLOCK(offload_lock);
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a441613a1e6c..f5a6d50570c4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
 #include <net/dst.h>
 #include <net/sock.h>
 #include <net/checksum.h>
+#include <net/gro.h>
 #include <net/gso.h>
 #include <net/hotdata.h>
 #include <net/ip6_checksum.h>
@@ -95,7 +96,9 @@
 static struct kmem_cache *skbuff_ext_cache __ro_after_init;
 #endif
 
-#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
+#define GRO_MAX_HEAD_PAD (GRO_MAX_HEAD + NET_SKB_PAD + NET_IP_ALIGN)
+#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(max(MAX_TCP_HEADER, \
+					       GRO_MAX_HEAD_PAD))
 
 /* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two.
  * This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique
@@ -736,7 +739,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 	/* If requested length is either too small or too big,
 	 * we use kmalloc() for skb->head allocation.
 	 */
-	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	if (len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE) ||
 	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
@@ -816,7 +819,8 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
 	 * When the small frag allocator is available, prefer it over kmalloc
 	 * for small fragments
 	 */
-	if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) ||
+	if ((!NAPI_HAS_SMALL_PAGE_FRAG &&
+	     len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)) ||
 	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 net 2/2] Revert "net: skb: introduce and use a single page frag cache"
  2025-02-18 18:29 [PATCH v2 net 0/2] net: remove the single page frag cache for good Paolo Abeni
  2025-02-18 18:29 ` [PATCH v2 net 1/2] net: allow small head cache usage with large MAX_SKB_FRAGS values Paolo Abeni
@ 2025-02-18 18:29 ` Paolo Abeni
  2025-02-20  9:41   ` Eric Dumazet
  2025-02-20 10:10 ` [PATCH v2 net 0/2] net: remove the single page frag cache for good patchwork-bot+netdevbpf
  2 siblings, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2025-02-18 18:29 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Simon Horman,
	Andrew Lunn, Alexander Duyck, Nikolay Aleksandrov, Jason Xing

After the previous commit is finally safe to revert commit dbae2b062824
("net: skb: introduce and use a single page frag cache"): do it here.

The intended goal of such change was to counter a performance regression
introduced by commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs").

Unfortunately, the blamed commit introduces another regression for the
virtio_net driver. Such a driver calls napi_alloc_skb() with a tiny
size, so that the whole head frag could fit a 512-byte block.

The single page frag cache uses a 1K fragment for such allocation, and
the additional overhead, under small UDP packets flood, makes the page
allocator a bottleneck.

Thanks to commit bf9f1baa279f ("net: add dedicated kmem_cache for
typical/small skb->head"), this revert does not re-introduce the
original regression. Actually, in the relevant test on top of this
revert, I measure a small but noticeable positive delta, just above
noise level.

The revert itself required some additional mangling due to recent updates
in the affected code.

Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: dbae2b062824 ("net: skb: introduce and use a single page frag cache")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/netdevice.h |   1 -
 net/core/dev.c            |  17 +++++++
 net/core/skbuff.c         | 104 ++------------------------------------
 3 files changed, 22 insertions(+), 100 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c0a86afb85da..365f0e2098d1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4115,7 +4115,6 @@ void netif_receive_skb_list(struct list_head *head);
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
 void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
-void napi_get_frags_check(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 
 static inline void napi_free_frags(struct napi_struct *napi)
diff --git a/net/core/dev.c b/net/core/dev.c
index b91658e8aedb..55e356a68db6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6920,6 +6920,23 @@ netif_napi_dev_list_add(struct net_device *dev, struct napi_struct *napi)
 	list_add_rcu(&napi->dev_list, higher); /* adds after higher */
 }
 
+/* Double check that napi_get_frags() allocates skbs with
+ * skb->head being backed by slab, not a page fragment.
+ * This is to make sure bug fixed in 3226b158e67c
+ * ("net: avoid 32 x truesize under-estimation for tiny skbs")
+ * does not accidentally come back.
+ */
+static void napi_get_frags_check(struct napi_struct *napi)
+{
+	struct sk_buff *skb;
+
+	local_bh_disable();
+	skb = napi_get_frags(napi);
+	WARN_ON_ONCE(skb && skb->head_frag);
+	napi_free_frags(napi);
+	local_bh_enable();
+}
+
 void netif_napi_add_weight_locked(struct net_device *dev,
 				  struct napi_struct *napi,
 				  int (*poll)(struct napi_struct *, int),
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f5a6d50570c4..7b03b64fdcb2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -223,67 +223,9 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
 #define NAPI_SKB_CACHE_BULK	16
 #define NAPI_SKB_CACHE_HALF	(NAPI_SKB_CACHE_SIZE / 2)
 
-#if PAGE_SIZE == SZ_4K
-
-#define NAPI_HAS_SMALL_PAGE_FRAG	1
-#define NAPI_SMALL_PAGE_PFMEMALLOC(nc)	((nc).pfmemalloc)
-
-/* specialized page frag allocator using a single order 0 page
- * and slicing it into 1K sized fragment. Constrained to systems
- * with a very limited amount of 1K fragments fitting a single
- * page - to avoid excessive truesize underestimation
- */
-
-struct page_frag_1k {
-	void *va;
-	u16 offset;
-	bool pfmemalloc;
-};
-
-static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp)
-{
-	struct page *page;
-	int offset;
-
-	offset = nc->offset - SZ_1K;
-	if (likely(offset >= 0))
-		goto use_frag;
-
-	page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
-	if (!page)
-		return NULL;
-
-	nc->va = page_address(page);
-	nc->pfmemalloc = page_is_pfmemalloc(page);
-	offset = PAGE_SIZE - SZ_1K;
-	page_ref_add(page, offset / SZ_1K);
-
-use_frag:
-	nc->offset = offset;
-	return nc->va + offset;
-}
-#else
-
-/* the small page is actually unused in this build; add dummy helpers
- * to please the compiler and avoid later preprocessor's conditionals
- */
-#define NAPI_HAS_SMALL_PAGE_FRAG	0
-#define NAPI_SMALL_PAGE_PFMEMALLOC(nc)	false
-
-struct page_frag_1k {
-};
-
-static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp_mask)
-{
-	return NULL;
-}
-
-#endif
-
 struct napi_alloc_cache {
 	local_lock_t bh_lock;
 	struct page_frag_cache page;
-	struct page_frag_1k page_small;
 	unsigned int skb_count;
 	void *skb_cache[NAPI_SKB_CACHE_SIZE];
 };
@@ -293,23 +235,6 @@ static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache) = {
 	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
 };
 
-/* Double check that napi_get_frags() allocates skbs with
- * skb->head being backed by slab, not a page fragment.
- * This is to make sure bug fixed in 3226b158e67c
- * ("net: avoid 32 x truesize under-estimation for tiny skbs")
- * does not accidentally come back.
- */
-void napi_get_frags_check(struct napi_struct *napi)
-{
-	struct sk_buff *skb;
-
-	local_bh_disable();
-	skb = napi_get_frags(napi);
-	WARN_ON_ONCE(!NAPI_HAS_SMALL_PAGE_FRAG && skb && skb->head_frag);
-	napi_free_frags(napi);
-	local_bh_enable();
-}
-
 void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
@@ -816,11 +741,8 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
 
 	/* If requested length is either too small or too big,
 	 * we use kmalloc() for skb->head allocation.
-	 * When the small frag allocator is available, prefer it over kmalloc
-	 * for small fragments
 	 */
-	if ((!NAPI_HAS_SMALL_PAGE_FRAG &&
-	     len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)) ||
+	if (len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE) ||
 	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
@@ -830,32 +752,16 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
 		goto skb_success;
 	}
 
+	len = SKB_HEAD_ALIGN(len);
+
 	if (sk_memalloc_socks())
 		gfp_mask |= __GFP_MEMALLOC;
 
 	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
 	nc = this_cpu_ptr(&napi_alloc_cache);
-	if (NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) {
-		/* we are artificially inflating the allocation size, but
-		 * that is not as bad as it may look like, as:
-		 * - 'len' less than GRO_MAX_HEAD makes little sense
-		 * - On most systems, larger 'len' values lead to fragment
-		 *   size above 512 bytes
-		 * - kmalloc would use the kmalloc-1k slab for such values
-		 * - Builds with smaller GRO_MAX_HEAD will very likely do
-		 *   little networking, as that implies no WiFi and no
-		 *   tunnels support, and 32 bits arches.
-		 */
-		len = SZ_1K;
 
-		data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
-		pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small);
-	} else {
-		len = SKB_HEAD_ALIGN(len);
-
-		data = page_frag_alloc(&nc->page, len, gfp_mask);
-		pfmemalloc = page_frag_cache_is_pfmemalloc(&nc->page);
-	}
+	data = page_frag_alloc(&nc->page, len, gfp_mask);
+	pfmemalloc = page_frag_cache_is_pfmemalloc(&nc->page);
 	local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
 
 	if (unlikely(!data))
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net 1/2] net: allow small head cache usage with large MAX_SKB_FRAGS values
  2025-02-18 18:29 ` [PATCH v2 net 1/2] net: allow small head cache usage with large MAX_SKB_FRAGS values Paolo Abeni
@ 2025-02-20  9:41   ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2025-02-20  9:41 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, David S. Miller, Jakub Kicinski, Simon Horman,
	Andrew Lunn, Alexander Duyck, Nikolay Aleksandrov, Jason Xing

On Tue, Feb 18, 2025 at 7:32 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Sabrina reported the following splat:
>
>     WARNING: CPU: 0 PID: 1 at net/core/dev.c:6935 netif_napi_add_weight_locked+0x8f2/0xba0
>     Modules linked in:
>     CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-net-00092-g011b03359038 #996
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>     RIP: 0010:netif_napi_add_weight_locked+0x8f2/0xba0
>     Code: e8 c3 e6 6a fe 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc c7 44 24 10 ff ff ff ff e9 8f fb ff ff e8 9e e6 6a fe <0f> 0b e9 d3 fe ff ff e8 92 e6 6a fe 48 8b 04 24 be ff ff ff ff 48
>     RSP: 0000:ffffc9000001fc60 EFLAGS: 00010293
>     RAX: 0000000000000000 RBX: ffff88806ce48128 RCX: 1ffff11001664b9e
>     RDX: ffff888008f00040 RSI: ffffffff8317ca42 RDI: ffff88800b325cb6
>     RBP: ffff88800b325c40 R08: 0000000000000001 R09: ffffed100167502c
>     R10: ffff88800b3a8163 R11: 0000000000000000 R12: ffff88800ac1c168
>     R13: ffff88800ac1c168 R14: ffff88800ac1c168 R15: 0000000000000007
>     FS:  0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: ffff888008201000 CR3: 0000000004c94001 CR4: 0000000000370ef0
>     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>     Call Trace:
>     <TASK>
>     gro_cells_init+0x1ba/0x270
>     xfrm_input_init+0x4b/0x2a0
>     xfrm_init+0x38/0x50
>     ip_rt_init+0x2d7/0x350
>     ip_init+0xf/0x20
>     inet_init+0x406/0x590
>     do_one_initcall+0x9d/0x2e0
>     do_initcalls+0x23b/0x280
>     kernel_init_freeable+0x445/0x490
>     kernel_init+0x20/0x1d0
>     ret_from_fork+0x46/0x80
>     ret_from_fork_asm+0x1a/0x30
>     </TASK>
>     irq event stamp: 584330
>     hardirqs last  enabled at (584338): [<ffffffff8168bf87>] __up_console_sem+0x77/0xb0
>     hardirqs last disabled at (584345): [<ffffffff8168bf6c>] __up_console_sem+0x5c/0xb0
>     softirqs last  enabled at (583242): [<ffffffff833ee96d>] netlink_insert+0x14d/0x470
>     softirqs last disabled at (583754): [<ffffffff8317c8cd>] netif_napi_add_weight_locked+0x77d/0xba0
>
> on kernel built with MAX_SKB_FRAGS=45, where SKB_WITH_OVERHEAD(1024)
> is smaller than GRO_MAX_HEAD.
>
> Such built additionally contains the revert of the single page frag cache
> so that napi_get_frags() ends up using the page frag allocator, triggering
> the splat.
>
> Note that the underlying issue is independent from the mentioned
> revert; address it ensuring that the small head cache will fit either TCP
> and GRO allocation and updating napi_alloc_skb() and __netdev_alloc_skb()
> to select kmalloc() usage for any allocation fitting such cache.
>
> Reported-by: Sabrina Dubroca <sd@queasysnail.net>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net 2/2] Revert "net: skb: introduce and use a single page frag cache"
  2025-02-18 18:29 ` [PATCH v2 net 2/2] Revert "net: skb: introduce and use a single page frag cache" Paolo Abeni
@ 2025-02-20  9:41   ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2025-02-20  9:41 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, David S. Miller, Jakub Kicinski, Simon Horman,
	Andrew Lunn, Alexander Duyck, Nikolay Aleksandrov, Jason Xing

On Tue, Feb 18, 2025 at 7:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> After the previous commit is finally safe to revert commit dbae2b062824
> ("net: skb: introduce and use a single page frag cache"): do it here.
>
> The intended goal of such change was to counter a performance regression
> introduced by commit 3226b158e67c ("net: avoid 32 x truesize
> under-estimation for tiny skbs").
>
> Unfortunately, the blamed commit introduces another regression for the
> virtio_net driver. Such a driver calls napi_alloc_skb() with a tiny
> size, so that the whole head frag could fit a 512-byte block.
>
> The single page frag cache uses a 1K fragment for such allocation, and
> the additional overhead, under small UDP packets flood, makes the page
> allocator a bottleneck.
>
> Thanks to commit bf9f1baa279f ("net: add dedicated kmem_cache for
> typical/small skb->head"), this revert does not re-introduce the
> original regression. Actually, in the relevant test on top of this
> revert, I measure a small but noticeable positive delta, just above
> noise level.
>
> The revert itself required some additional mangling due to recent updates
> in the affected code.
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Fixes: dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net 0/2] net: remove the single page frag cache for good
  2025-02-18 18:29 [PATCH v2 net 0/2] net: remove the single page frag cache for good Paolo Abeni
  2025-02-18 18:29 ` [PATCH v2 net 1/2] net: allow small head cache usage with large MAX_SKB_FRAGS values Paolo Abeni
  2025-02-18 18:29 ` [PATCH v2 net 2/2] Revert "net: skb: introduce and use a single page frag cache" Paolo Abeni
@ 2025-02-20 10:10 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-02-20 10:10 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, edumazet, davem, kuba, horms, andrew+netdev,
	alexanderduyck, razor, kernelxing

Hello:

This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Tue, 18 Feb 2025 19:29:38 +0100 you wrote:
> This is another attempt at reverting commit dbae2b062824 ("net: skb:
> introduce and use a single page frag cache"), as it causes regressions
> in specific use-cases.
> 
> Reverting such commit uncovers an allocation issue for build with
> CONFIG_MAX_SKB_FRAGS=45, as reported by Sabrina.
> 
> [...]

Here is the summary with links:
  - [v2,net,1/2] net: allow small head cache usage with large MAX_SKB_FRAGS values
    https://git.kernel.org/netdev/net/c/14ad6ed30a10
  - [v2,net,2/2] Revert "net: skb: introduce and use a single page frag cache"
    https://git.kernel.org/netdev/net/c/6bc7e4eb0499

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-02-20 10:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-18 18:29 [PATCH v2 net 0/2] net: remove the single page frag cache for good Paolo Abeni
2025-02-18 18:29 ` [PATCH v2 net 1/2] net: allow small head cache usage with large MAX_SKB_FRAGS values Paolo Abeni
2025-02-20  9:41   ` Eric Dumazet
2025-02-18 18:29 ` [PATCH v2 net 2/2] Revert "net: skb: introduce and use a single page frag cache" Paolo Abeni
2025-02-20  9:41   ` Eric Dumazet
2025-02-20 10:10 ` [PATCH v2 net 0/2] net: remove the single page frag cache for good patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.