* [PATCH net-next 0/3] net: skbuff: cache one skb_ext for use by GRO
@ 2023-02-15 3:43 Jakub Kicinski
2023-02-15 3:43 ` [PATCH net-next 1/3] net: skb: carve the allocation out of skb_ext_add() Jakub Kicinski
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-02-15 3:43 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, willemb, fw, Jakub Kicinski
New offload features may require attaching skb_ext to every packet.
PSP support adds fields to the skb in the patches published by Google:
https://github.com/google/psp
Since PSP is designed to be stateless on the device, we need to carry
metadata through the stack and perform validation at the socket layer.
Upstream we'll likely try to use skb_ext to place such metadata.
Lower the cost of allocation of skb_ext by caching a single object
per core.
Obviously the PSP support isn't upstream yet, but I think that this
should also help the TC fallback, and any other skb_ext uses people
may come up with in the meantime.
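To make the intended use concrete - purely as an illustration, since
neither the extension ID nor the struct below exist anywhere yet - a
PSP-enabled driver's RX path would attach the metadata roughly like the
TC helper in patch 3 does:
	/* hypothetical: SKB_EXT_PSP and struct psp_skb_ext stand in for
	 * future PSP metadata; the SPI value comes from the Rx descriptor
	 */
	struct psp_skb_ext *pse;

	pse = napi_skb_ext_add(skb, SKB_EXT_PSP);
	if (!pse)
		return false;
	pse->spi = le32_to_cpu(rx_desc->psp_spi);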
Jakub Kicinski (3):
net: skb: carve the allocation out of skb_ext_add()
net: skbuff: cache one skb_ext for use by GRO
net: create and use NAPI version of tc_skb_ext_alloc()
.../ethernet/mellanox/mlx5/core/en/rep/tc.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +-
include/linux/skbuff.h | 1 +
include/net/pkt_cls.h | 9 ++
net/core/skbuff.c | 131 ++++++++++++++----
5 files changed, 117 insertions(+), 28 deletions(-)
--
2.39.1
* [PATCH net-next 1/3] net: skb: carve the allocation out of skb_ext_add()
2023-02-15 3:43 [PATCH net-next 0/3] net: skbuff: cache one skb_ext for use by GRO Jakub Kicinski
@ 2023-02-15 3:43 ` Jakub Kicinski
2023-02-15 3:43 ` [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO Jakub Kicinski
2023-02-15 3:43 ` [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc() Jakub Kicinski
2 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-02-15 3:43 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, willemb, fw, Jakub Kicinski
Subsequent patches will try to add different skb_ext allocation
methods. Refactor skb_ext_add() so that most of its code moves
into a tail-called helper, and allocation can be easily swapped
out.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/skbuff.c | 52 ++++++++++++++++++++++++++++-------------------
1 file changed, 31 insertions(+), 21 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 13ea10cf8544..6f0fc1f09536 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6672,26 +6672,13 @@ void *__skb_ext_set(struct sk_buff *skb, enum skb_ext_id id,
return skb_ext_get_ptr(ext, id);
}
-/**
- * skb_ext_add - allocate space for given extension, COW if needed
- * @skb: buffer
- * @id: extension to allocate space for
- *
- * Allocates enough space for the given extension.
- * If the extension is already present, a pointer to that extension
- * is returned.
- *
- * If the skb was cloned, COW applies and the returned memory can be
- * modified without changing the extension space of clones buffers.
- *
- * Returns pointer to the extension or NULL on allocation failure.
- */
-void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id)
+static void *skb_ext_add_finalize(struct sk_buff *skb, enum skb_ext_id id,
+ struct skb_ext *new)
{
- struct skb_ext *new, *old = NULL;
unsigned int newlen, newoff;
+ struct skb_ext *old;
- if (skb->active_extensions) {
+ if (!new) {
old = skb->extensions;
new = skb_ext_maybe_cow(old, skb->active_extensions);
@@ -6704,10 +6691,6 @@ void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id)
newoff = new->chunks;
} else {
newoff = SKB_EXT_CHUNKSIZEOF(*new);
-
- new = __skb_ext_alloc(GFP_ATOMIC);
- if (!new)
- return NULL;
}
newlen = newoff + skb_ext_type_len[id];
@@ -6719,6 +6702,33 @@ void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id)
skb->active_extensions |= 1 << id;
return skb_ext_get_ptr(new, id);
}
+
+/**
+ * skb_ext_add - allocate space for given extension, COW if needed
+ * @skb: buffer
+ * @id: extension to allocate space for
+ *
+ * Allocates enough space for the given extension.
+ * If the extension is already present, a pointer to that extension
+ * is returned.
+ *
+ * If the skb was cloned, COW applies and the returned memory can be
+ * modified without changing the extension space of clones buffers.
+ *
+ * Returns pointer to the extension or NULL on allocation failure.
+ */
+void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id)
+{
+ struct skb_ext *new = NULL;
+
+ if (!skb->active_extensions) {
+ new = __skb_ext_alloc(GFP_ATOMIC);
+ if (!new)
+ return NULL;
+ }
+
+ return skb_ext_add_finalize(skb, id, new);
+}
EXPORT_SYMBOL(skb_ext_add);
#ifdef CONFIG_XFRM
--
2.39.1
* [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 3:43 [PATCH net-next 0/3] net: skbuff: cache one skb_ext for use by GRO Jakub Kicinski
2023-02-15 3:43 ` [PATCH net-next 1/3] net: skb: carve the allocation out of skb_ext_add() Jakub Kicinski
@ 2023-02-15 3:43 ` Jakub Kicinski
2023-02-15 8:41 ` Paolo Abeni
2023-02-15 15:37 ` Edward Cree
2023-02-15 3:43 ` [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc() Jakub Kicinski
2 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-02-15 3:43 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, willemb, fw, Jakub Kicinski
On the driver -> GRO path we can avoid thrashing the kmemcache
by holding onto one skb_ext.
Drivers usually report static data, so don't bother trying to
hold onto the skb_ext if the ext has contents which require
a destructor.
With a single flow and SW GRO, adding a tc_skb_ext to every
frame costs around 16.6% of performance (21.2Gbps -> 17.6Gbps,
yes, it's a relatively slow CPU). Using the cache reduces
the loss to 9.3% (-> 19.2Gbps), although obviously in real
life the recycling will be less effective.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/linux/skbuff.h | 1 +
net/core/skbuff.c | 79 +++++++++++++++++++++++++++++++++++++++---
2 files changed, 75 insertions(+), 5 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d5602b15c714..e68cb0a777b9 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4622,6 +4622,7 @@ struct skb_ext *__skb_ext_alloc(gfp_t flags);
void *__skb_ext_set(struct sk_buff *skb, enum skb_ext_id id,
struct skb_ext *ext);
void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id);
+void *napi_skb_ext_add(struct sk_buff *skb, enum skb_ext_id id);
void __skb_ext_del(struct sk_buff *skb, enum skb_ext_id id);
void __skb_ext_put(struct skb_ext *ext);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6f0fc1f09536..feb5034b13ad 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -224,6 +224,9 @@ static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp_mask)
struct napi_alloc_cache {
struct page_frag_cache page;
struct page_frag_1k page_small;
+#ifdef CONFIG_SKB_EXTENSIONS
+ struct skb_ext *ext;
+#endif
unsigned int skb_count;
void *skb_cache[NAPI_SKB_CACHE_SIZE];
};
@@ -1228,6 +1231,43 @@ static void napi_skb_cache_put(struct sk_buff *skb)
}
}
+static bool skb_ext_needs_destruct(const struct skb_ext *ext)
+{
+ bool needs_destruct = false;
+
+#ifdef CONFIG_XFRM
+ needs_destruct |= __skb_ext_exist(ext, SKB_EXT_SEC_PATH);
+#endif
+#ifdef CONFIG_MCTP_FLOWS
+ needs_destruct |= __skb_ext_exist(ext, SKB_EXT_MCTP);
+#endif
+
+ return needs_destruct;
+}
+
+static void napi_skb_ext_put(struct sk_buff *skb)
+{
+#ifdef CONFIG_SKB_EXTENSIONS
+ struct skb_ext *ext;
+
+ if (!skb->active_extensions)
+ return;
+
+ ext = skb->extensions;
+ if (!skb_ext_needs_destruct(ext)) {
+ struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+
+ if (refcount_read(&ext->refcnt) == 1 && !nc->ext) {
+ kasan_poison_object_data(skbuff_ext_cache, ext);
+ nc->ext = ext;
+ return;
+ }
+ }
+
+ __skb_ext_put(ext);
+#endif
+}
+
void __kfree_skb_defer(struct sk_buff *skb)
{
skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED);
@@ -1239,7 +1279,7 @@ void napi_skb_free_stolen_head(struct sk_buff *skb)
if (unlikely(skb->slow_gro)) {
nf_reset_ct(skb);
skb_dst_drop(skb);
- skb_ext_put(skb);
+ napi_skb_ext_put(skb);
skb_orphan(skb);
skb->slow_gro = 0;
}
@@ -6599,6 +6639,12 @@ static void *skb_ext_get_ptr(struct skb_ext *ext, enum skb_ext_id id)
return (void *)ext + (ext->offset[id] * SKB_EXT_ALIGN_VALUE);
}
+static void skb_ext_init(struct skb_ext *new)
+{
+ memset(new->offset, 0, sizeof(new->offset));
+ refcount_set(&new->refcnt, 1);
+}
+
/**
* __skb_ext_alloc - allocate a new skb extensions storage
*
@@ -6612,10 +6658,8 @@ struct skb_ext *__skb_ext_alloc(gfp_t flags)
{
struct skb_ext *new = kmem_cache_alloc(skbuff_ext_cache, flags);
- if (new) {
- memset(new->offset, 0, sizeof(new->offset));
- refcount_set(&new->refcnt, 1);
- }
+ if (new)
+ skb_ext_init(new);
return new;
}
@@ -6731,6 +6775,31 @@ void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id)
}
EXPORT_SYMBOL(skb_ext_add);
+void *napi_skb_ext_add(struct sk_buff *skb, enum skb_ext_id id)
+{
+ struct skb_ext *new = NULL;
+
+ if (!skb->active_extensions) {
+ struct napi_alloc_cache *nc;
+
+ nc = this_cpu_ptr(&napi_alloc_cache);
+ new = nc->ext;
+ if (new) {
+ kasan_unpoison_object_data(skbuff_ext_cache, new);
+ nc->ext = NULL;
+ } else {
+ new = kmem_cache_alloc(skbuff_ext_cache, GFP_ATOMIC);
+ if (!new)
+ return NULL;
+ }
+
+ skb_ext_init(new);
+ }
+
+ return skb_ext_add_finalize(skb, id, new);
+}
+EXPORT_SYMBOL(napi_skb_ext_add);
+
#ifdef CONFIG_XFRM
static void skb_ext_put_sp(struct sec_path *sp)
{
--
2.39.1
* [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc()
2023-02-15 3:43 [PATCH net-next 0/3] net: skbuff: cache one skb_ext for use by GRO Jakub Kicinski
2023-02-15 3:43 ` [PATCH net-next 1/3] net: skb: carve the allocation out of skb_ext_add() Jakub Kicinski
2023-02-15 3:43 ` [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO Jakub Kicinski
@ 2023-02-15 3:43 ` Jakub Kicinski
2023-02-15 16:50 ` Jamal Hadi Salim
2 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-02-15 3:43 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, willemb, fw, Jakub Kicinski, saeedm,
leon, jhs, xiyou.wangcong, jiri, roid, ozsh, paulb
Try to use the cached skb_ext in the drivers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: saeedm@nvidia.com
CC: leon@kernel.org
CC: jhs@mojatatu.com
CC: xiyou.wangcong@gmail.com
CC: jiri@resnulli.us
CC: roid@nvidia.com
CC: ozsh@nvidia.com
CC: paulb@nvidia.com
---
drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +-
include/net/pkt_cls.h | 9 +++++++++
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
index 3b590cfe33b8..ffbed5a92eab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
@@ -770,7 +770,7 @@ static bool mlx5e_restore_skb_chain(struct sk_buff *skb, u32 chain, u32 reg_c1,
struct mlx5_eswitch *esw;
u32 zone_restore_id;
- tc_skb_ext = tc_skb_ext_alloc(skb);
+ tc_skb_ext = tc_skb_ext_alloc_napi(skb);
if (!tc_skb_ext) {
WARN_ON(1);
return false;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 2d06b4412762..3d9da4ccaf5d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -5643,7 +5643,7 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
if (mapped_obj.type == MLX5_MAPPED_OBJ_CHAIN) {
chain = mapped_obj.chain;
- tc_skb_ext = tc_skb_ext_alloc(skb);
+ tc_skb_ext = tc_skb_ext_alloc_napi(skb);
if (WARN_ON(!tc_skb_ext))
return false;
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index ace437c6754b..82821a3f8a8b 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -764,6 +764,15 @@ static inline struct tc_skb_ext *tc_skb_ext_alloc(struct sk_buff *skb)
memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
return tc_skb_ext;
}
+
+static inline struct tc_skb_ext *tc_skb_ext_alloc_napi(struct sk_buff *skb)
+{
+ struct tc_skb_ext *tc_skb_ext = napi_skb_ext_add(skb, TC_SKB_EXT);
+
+ if (tc_skb_ext)
+ memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
+ return tc_skb_ext;
+}
#endif
enum tc_matchall_command {
--
2.39.1
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 3:43 ` [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO Jakub Kicinski
@ 2023-02-15 8:41 ` Paolo Abeni
2023-02-15 17:45 ` Jakub Kicinski
2023-02-15 15:37 ` Edward Cree
1 sibling, 1 reply; 19+ messages in thread
From: Paolo Abeni @ 2023-02-15 8:41 UTC (permalink / raw)
To: Jakub Kicinski, davem; +Cc: netdev, edumazet, willemb, fw
On Tue, 2023-02-14 at 19:43 -0800, Jakub Kicinski wrote:
> On the driver -> GRO path we can avoid thrashing the kmemcache
> by holding onto one skb_ext.
>
> Drivers usually report static data, so don't bother trying to
> hold onto the skb_ext if the ext has contents which require
> a destructor.
>
> With a single flow and SW GRO adding a tc_skb_ext to every
> frame costs around 16.6% of performance (21.2Gbps -> 17.6Gbps,
> yes it's a relatively slow CPU). Using the cache reduces
> the loss to 9.3%, (-> 19.2Gbps) although obviously in real
> life the recycling will be less effective.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/linux/skbuff.h | 1 +
> net/core/skbuff.c | 79 +++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 75 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index d5602b15c714..e68cb0a777b9 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -4622,6 +4622,7 @@ struct skb_ext *__skb_ext_alloc(gfp_t flags);
> void *__skb_ext_set(struct sk_buff *skb, enum skb_ext_id id,
> struct skb_ext *ext);
> void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id);
> +void *napi_skb_ext_add(struct sk_buff *skb, enum skb_ext_id id);
> void __skb_ext_del(struct sk_buff *skb, enum skb_ext_id id);
> void __skb_ext_put(struct skb_ext *ext);
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 6f0fc1f09536..feb5034b13ad 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -224,6 +224,9 @@ static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp_mask)
> struct napi_alloc_cache {
> struct page_frag_cache page;
> struct page_frag_1k page_small;
> +#ifdef CONFIG_SKB_EXTENSIONS
> + struct skb_ext *ext;
> +#endif
> unsigned int skb_count;
> void *skb_cache[NAPI_SKB_CACHE_SIZE];
> };
> @@ -1228,6 +1231,43 @@ static void napi_skb_cache_put(struct sk_buff *skb)
> }
> }
>
> +static bool skb_ext_needs_destruct(const struct skb_ext *ext)
> +{
> + bool needs_destruct = false;
> +
> +#ifdef CONFIG_XFRM
> + needs_destruct |= __skb_ext_exist(ext, SKB_EXT_SEC_PATH);
> +#endif
> +#ifdef CONFIG_MCTP_FLOWS
> + needs_destruct |= __skb_ext_exist(ext, SKB_EXT_MCTP);
> +#endif
> +
> + return needs_destruct;
> +}
> +
> +static void napi_skb_ext_put(struct sk_buff *skb)
> +{
> +#ifdef CONFIG_SKB_EXTENSIONS
> + struct skb_ext *ext;
> +
> + if (!skb->active_extensions)
> + return;
> +
> + ext = skb->extensions;
> + if (!skb_ext_needs_destruct(ext)) {
> + struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> +
> + if (refcount_read(&ext->refcnt) == 1 && !nc->ext) {
> + kasan_poison_object_data(skbuff_ext_cache, ext);
> + nc->ext = ext;
> + return;
> + }
> + }
> +
> + __skb_ext_put(ext);
> +#endif
> +}
> +
> void __kfree_skb_defer(struct sk_buff *skb)
I'm wondering if napi_reuse_skb() should be touched, too? Even though it's not
directly used by the following patch...
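Something like the below, perhaps? Just a sketch - napi_skb_ext_reset()
is a hypothetical helper that would recycle the extension into the per-CPU
cache like napi_skb_ext_put() from patch 2 and then clear
skb->active_extensions, and the surrounding napi_reuse_skb() context is
only approximate:
	/* net/core/gro.c, napi_reuse_skb() - sketch */
	if (unlikely(skb->slow_gro)) {
		skb_orphan(skb);
		napi_skb_ext_reset(skb);	/* hypothetical recycling variant of skb_ext_reset() */
		nf_reset_ct(skb);
		skb->slow_gro = 0;
	}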
Cheers,
Paolo
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 3:43 ` [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO Jakub Kicinski
2023-02-15 8:41 ` Paolo Abeni
@ 2023-02-15 15:37 ` Edward Cree
2023-02-15 16:17 ` Alexander Lobakin
1 sibling, 1 reply; 19+ messages in thread
From: Edward Cree @ 2023-02-15 15:37 UTC (permalink / raw)
To: Jakub Kicinski, davem; +Cc: netdev, edumazet, pabeni, willemb, fw
On 15/02/2023 03:43, Jakub Kicinski wrote:
> On the driver -> GRO path we can avoid thrashing the kmemcache
> by holding onto one skb_ext.
Hmm, will one be enough if we're doing GRO_NORMAL batching?
As for e.g. UDP traffic up to 8 skbs (by default) can have
overlapping lifetimes.
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 15:37 ` Edward Cree
@ 2023-02-15 16:17 ` Alexander Lobakin
2023-02-15 17:52 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2023-02-15 16:17 UTC (permalink / raw)
To: Edward Cree, Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, willemb, fw
From: Edward Cree <ecree.xilinx@gmail.com>
Date: Wed, 15 Feb 2023 15:37:44 +0000
> On 15/02/2023 03:43, Jakub Kicinski wrote:
>> On the driver -> GRO path we can avoid thrashing the kmemcache
>> by holding onto one skb_ext.
>
> Hmm, will one be enough if we're doing GRO_NORMAL batching?
> As for e.g. UDP traffic up to 8 skbs (by default) can have
> overlapping lifetimes.
>
I thought of an array of %NAPI_SKB_CACHE_SIZE to be honest. From what
I've ever tested, no cache (for any netstack-related object) is enough
if it can't serve one full NAPI poll :D
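Something along these lines is what I mean (sketch only, field names made
up here, not part of Jakub's series):
struct napi_alloc_cache {
	struct page_frag_cache page;
	struct page_frag_1k page_small;
#ifdef CONFIG_SKB_EXTENSIONS
	unsigned int ext_count;
	struct skb_ext *ext_cache[NAPI_SKB_CACHE_SIZE];
#endif
	unsigned int skb_count;
	void *skb_cache[NAPI_SKB_CACHE_SIZE];
};
	/* put side, instead of the single-pointer check in napi_skb_ext_put() */
	if (refcount_read(&ext->refcnt) == 1 &&
	    nc->ext_count < NAPI_SKB_CACHE_SIZE) {
		kasan_poison_object_data(skbuff_ext_cache, ext);
		nc->ext_cache[nc->ext_count++] = ext;
		return;
	}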
+ agree with Paolo re napi_reuse_skb(), it's used only in the NAPI
context and recycles a lot of stuff already, so we can speed it up safely here.
Thanks,
Olek
* Re: [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc()
2023-02-15 3:43 ` [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc() Jakub Kicinski
@ 2023-02-15 16:50 ` Jamal Hadi Salim
2023-02-15 17:03 ` Jiri Pirko
2023-02-15 17:35 ` Jakub Kicinski
0 siblings, 2 replies; 19+ messages in thread
From: Jamal Hadi Salim @ 2023-02-15 16:50 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, willemb, fw, saeedm, leon,
xiyou.wangcong, jiri, roid, ozsh, paulb
On Tue, Feb 14, 2023 at 10:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Try to use the cached skb_ext in the drivers.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: saeedm@nvidia.com
> CC: leon@kernel.org
> CC: jhs@mojatatu.com
> CC: xiyou.wangcong@gmail.com
> CC: jiri@resnulli.us
> CC: roid@nvidia.com
> CC: ozsh@nvidia.com
> CC: paulb@nvidia.com
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 2 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +-
> include/net/pkt_cls.h | 9 +++++++++
> 3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> index 3b590cfe33b8..ffbed5a92eab 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> @@ -770,7 +770,7 @@ static bool mlx5e_restore_skb_chain(struct sk_buff *skb, u32 chain, u32 reg_c1,
> struct mlx5_eswitch *esw;
> u32 zone_restore_id;
>
> - tc_skb_ext = tc_skb_ext_alloc(skb);
> + tc_skb_ext = tc_skb_ext_alloc_napi(skb);
> if (!tc_skb_ext) {
> WARN_ON(1);
> return false;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 2d06b4412762..3d9da4ccaf5d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -5643,7 +5643,7 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
>
> if (mapped_obj.type == MLX5_MAPPED_OBJ_CHAIN) {
> chain = mapped_obj.chain;
> - tc_skb_ext = tc_skb_ext_alloc(skb);
> + tc_skb_ext = tc_skb_ext_alloc_napi(skb);
> if (WARN_ON(!tc_skb_ext))
> return false;
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index ace437c6754b..82821a3f8a8b 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -764,6 +764,15 @@ static inline struct tc_skb_ext *tc_skb_ext_alloc(struct sk_buff *skb)
> memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
> return tc_skb_ext;
> }
> +
> +static inline struct tc_skb_ext *tc_skb_ext_alloc_napi(struct sk_buff *skb)
> +{
> + struct tc_skb_ext *tc_skb_ext = napi_skb_ext_add(skb, TC_SKB_EXT);
> +
> + if (tc_skb_ext)
> + memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
> + return tc_skb_ext;
> +}
> #endif
>
Dumb question: Would this work with a consumer of the metadata past
the RPS stage? I didn't look closely but assumed a per-cpu skb_ext because
of the napi context - which would require a per-cpu pointer to fetch it
later.
In P4TC we make heavy use of skb_ext and found it to be very pricey,
so we ended up making it per cpu - but that's post RPS so we are safe.
cheers,
jamal
* Re: [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc()
2023-02-15 16:50 ` Jamal Hadi Salim
@ 2023-02-15 17:03 ` Jiri Pirko
2023-02-15 18:36 ` Jamal Hadi Salim
2023-02-15 17:35 ` Jakub Kicinski
1 sibling, 1 reply; 19+ messages in thread
From: Jiri Pirko @ 2023-02-15 17:03 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, willemb, fw,
saeedm, leon, xiyou.wangcong, roid, ozsh, paulb
Wed, Feb 15, 2023 at 05:50:55PM CET, jhs@mojatatu.com wrote:
>On Tue, Feb 14, 2023 at 10:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> Try to use the cached skb_ext in the drivers.
>>
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> ---
>> CC: saeedm@nvidia.com
>> CC: leon@kernel.org
>> CC: jhs@mojatatu.com
>> CC: xiyou.wangcong@gmail.com
>> CC: jiri@resnulli.us
>> CC: roid@nvidia.com
>> CC: ozsh@nvidia.com
>> CC: paulb@nvidia.com
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 2 +-
>> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +-
>> include/net/pkt_cls.h | 9 +++++++++
>> 3 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
>> index 3b590cfe33b8..ffbed5a92eab 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
>> @@ -770,7 +770,7 @@ static bool mlx5e_restore_skb_chain(struct sk_buff *skb, u32 chain, u32 reg_c1,
>> struct mlx5_eswitch *esw;
>> u32 zone_restore_id;
>>
>> - tc_skb_ext = tc_skb_ext_alloc(skb);
>> + tc_skb_ext = tc_skb_ext_alloc_napi(skb);
>> if (!tc_skb_ext) {
>> WARN_ON(1);
>> return false;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> index 2d06b4412762..3d9da4ccaf5d 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> @@ -5643,7 +5643,7 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
>>
>> if (mapped_obj.type == MLX5_MAPPED_OBJ_CHAIN) {
>> chain = mapped_obj.chain;
>> - tc_skb_ext = tc_skb_ext_alloc(skb);
>> + tc_skb_ext = tc_skb_ext_alloc_napi(skb);
>> if (WARN_ON(!tc_skb_ext))
>> return false;
>>
>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index ace437c6754b..82821a3f8a8b 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -764,6 +764,15 @@ static inline struct tc_skb_ext *tc_skb_ext_alloc(struct sk_buff *skb)
>> memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
>> return tc_skb_ext;
>> }
>> +
>> +static inline struct tc_skb_ext *tc_skb_ext_alloc_napi(struct sk_buff *skb)
>> +{
>> + struct tc_skb_ext *tc_skb_ext = napi_skb_ext_add(skb, TC_SKB_EXT);
>> +
>> + if (tc_skb_ext)
>> + memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
>> + return tc_skb_ext;
>> +}
>> #endif
>>
>
>Dumb question: Would this work with a consumer of the metadata past
>RPS stage? didnt look closely but assumed per cpu skb_ext because of
>the napi context - which will require a per cpu pointer to fetch it
>later.
>In P4TC we make heavy use of skb_ext and found it to be very pricey,
Why don't you use skb->cb internally in p4tc? Or does the skb leave p4tc
and arrive back again? When?
>so we ended making it per cpu - but that's post RPS so we are safe.
>
>cheers,
>jamal
* Re: [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc()
2023-02-15 16:50 ` Jamal Hadi Salim
2023-02-15 17:03 ` Jiri Pirko
@ 2023-02-15 17:35 ` Jakub Kicinski
2023-02-15 18:38 ` Jamal Hadi Salim
1 sibling, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-02-15 17:35 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: davem, netdev, edumazet, pabeni, willemb, fw, saeedm, leon,
xiyou.wangcong, jiri, roid, ozsh, paulb
On Wed, 15 Feb 2023 11:50:55 -0500 Jamal Hadi Salim wrote:
> Dumb question: Would this work with a consumer of the metadata past
> RPS stage? didnt look closely but assumed per cpu skb_ext because of
> the napi context - which will require a per cpu pointer to fetch it
> later.
The cache is just for allocation, specifically for the case where
driver allocates the skb and it's quickly coalesced by GRO.
The lifetime of the allocated object is not changed.
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 8:41 ` Paolo Abeni
@ 2023-02-15 17:45 ` Jakub Kicinski
2023-02-15 18:08 ` Alexander Lobakin
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-02-15 17:45 UTC (permalink / raw)
To: Paolo Abeni; +Cc: davem, netdev, edumazet, willemb, fw
On Wed, 15 Feb 2023 09:41:13 +0100 Paolo Abeni wrote:
> I'm wondering if napi_reuse_skb() should be touched, too? Even it's not
> directly used by the following patch...
I didn't touch it because I (sadly) don't have access to any driver
using GRO frags to test :( But I certainly can.
What about __kfree_skb_defer() and napi_consume_skb() (basically
the other two napi_skb_cache_put() callers) ?
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 16:17 ` Alexander Lobakin
@ 2023-02-15 17:52 ` Jakub Kicinski
2023-02-15 18:01 ` Alexander Lobakin
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-02-15 17:52 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Edward Cree, davem, netdev, edumazet, pabeni, willemb, fw
On Wed, 15 Feb 2023 17:17:53 +0100 Alexander Lobakin wrote:
> > On 15/02/2023 03:43, Jakub Kicinski wrote:
> >> On the driver -> GRO path we can avoid thrashing the kmemcache
> >> by holding onto one skb_ext.
> >
> > Hmm, will one be enough if we're doing GRO_NORMAL batching?
> > As for e.g. UDP traffic up to 8 skbs (by default) can have
> > overlapping lifetimes.
> >
> I thought of an array of %NAPI_SKB_CACHE_SIZE to be honest. From what
> I've ever tested, no cache (for any netstack-related object) is enough
> if it can't serve one full NAPI poll :D
I was hoping to leave sizing of the cache until we have some data from
a production network (or at least representative packet traces).
NAPI_SKB_CACHE_SIZE kinda assumes we're not doing much GRO, right?
And the current patch feeds the cache exclusively from GRO...
> + agree with Paolo re napi_reuse_skb(), it's used only in the NAPI
> context and recycles a lot o'stuff already, we can speed it up safely here.
LMK what's your opinion on touching the other potential spots, too.
(in Paolo's subthread).
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 17:52 ` Jakub Kicinski
@ 2023-02-15 18:01 ` Alexander Lobakin
2023-02-15 18:20 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2023-02-15 18:01 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Edward Cree, davem, netdev, edumazet, pabeni, willemb, fw
From: Jakub Kicinski <kuba@kernel.org>
Date: Wed, 15 Feb 2023 09:52:00 -0800
> On Wed, 15 Feb 2023 17:17:53 +0100 Alexander Lobakin wrote:
>>> On 15/02/2023 03:43, Jakub Kicinski wrote:
>>>> On the driver -> GRO path we can avoid thrashing the kmemcache
>>>> by holding onto one skb_ext.
>>>
>>> Hmm, will one be enough if we're doing GRO_NORMAL batching?
>>> As for e.g. UDP traffic up to 8 skbs (by default) can have
>>> overlapping lifetimes.
>>>
>> I thought of an array of %NAPI_SKB_CACHE_SIZE to be honest. From what
>> I've ever tested, no cache (for any netstack-related object) is enough
>> if it can't serve one full NAPI poll :D
>
> I was hoping to leave sizing of the cache until we have some data from
> a production network (or at least representative packet traces).
>
> NAPI_SKB_CACHE_SIZE kinda assumes we're not doing much GRO, right?
It assumes we GRO a lot :D
Imagine that you have 64 frames during one poll and the GRO layer
decides to coalesce them by batches of 16. Then only 4 skbs will be
used, the rest will go as frags (with "stolen heads") -> 60 of 64 skbs
will return to that skb cache and will then be reused by napi_build_skb().
> And the current patch feeds the cache exclusively from GRO...
>
>> + agree with Paolo re napi_reuse_skb(), it's used only in the NAPI
>> context and recycles a lot o'stuff already, we can speed it up safely here.
>
> LMK what's your opinion on touching the other potential spots, too.
> (in Paolo's subthread).
<went to take a look already>
Thanks,
Olek
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 17:45 ` Jakub Kicinski
@ 2023-02-15 18:08 ` Alexander Lobakin
2023-02-15 19:08 ` Paolo Abeni
0 siblings, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2023-02-15 18:08 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Paolo Abeni, davem, netdev, edumazet, willemb, fw
From: Jakub Kicinski <kuba@kernel.org>
Date: Wed, 15 Feb 2023 09:45:42 -0800
> On Wed, 15 Feb 2023 09:41:13 +0100 Paolo Abeni wrote:
>> I'm wondering if napi_reuse_skb() should be touched, too? Even it's not
>> directly used by the following patch...
>
> I didn't touch it because I (sadly) don't have access to any driver
> using GRO frags to test :( But I certainly can.
>
> What about __kfree_skb_defer() and napi_consume_skb() (basically
> the other two napi_skb_cache_put() callers) ?
>
Sounds good. Basically any caller of napi_skb_cache_put() can be
switched to recycle extensions.
But you certainly need to have a pool instead of just one pointer then,
since napi_consume_skb() will return a lot of them if exts are actively used :)
Having just one pointer will only make freeing extensions 1 step longer
(`caller -> skb cache -> kmem_cache_free()`, e.g. 63 times per Tx completion
poll, just to save you 1 pointer to be used on Rx later).
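With a pool like that, the completion path could also flush in bulk, the
same way napi_skb_cache_put() does for skbs - roughly (sketch only, built
on the hypothetical ext_cache[] array from my other reply, KASAN
poisoning/unpoisoning elided):
	nc->ext_cache[nc->ext_count++] = ext;
	if (unlikely(nc->ext_count == NAPI_SKB_CACHE_SIZE)) {
		kmem_cache_free_bulk(skbuff_ext_cache, NAPI_SKB_CACHE_HALF,
				     (void **)nc->ext_cache + NAPI_SKB_CACHE_HALF);
		nc->ext_count = NAPI_SKB_CACHE_HALF;
	}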
Thanks,
Olek
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 18:01 ` Alexander Lobakin
@ 2023-02-15 18:20 ` Jakub Kicinski
2023-02-16 12:04 ` Alexander Lobakin
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-02-15 18:20 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Edward Cree, davem, netdev, edumazet, pabeni, willemb, fw
On Wed, 15 Feb 2023 19:01:19 +0100 Alexander Lobakin wrote:
> > I was hoping to leave sizing of the cache until we have some data from
> > a production network (or at least representative packet traces).
> >
> > NAPI_SKB_CACHE_SIZE kinda assumes we're not doing much GRO, right?
>
> It assumes we GRO a lot :D
>
> Imagine that you have 64 frames during one poll and the GRO layer
> decides to coalesce them by batches of 16. Then only 4 skbs will be
> used, the rest will go as frags (with "stolen heads") -> 60 of 64 skbs
> will return to that skb cache and will then be reused by napi_build_skb().
Let's say 5 - for 4 resulting skbs GRO will need the 4 resulting and
one extra to shuttle between the driver and GRO (worst case).
With a cache of 1 I'm guaranteed to save 59 alloc calls, 92%, right?
That's why I'm saying - the larger cache would help workloads which
don't GRO as much. Am I missing the point or how GRO works?
* Re: [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc()
2023-02-15 17:03 ` Jiri Pirko
@ 2023-02-15 18:36 ` Jamal Hadi Salim
0 siblings, 0 replies; 19+ messages in thread
From: Jamal Hadi Salim @ 2023-02-15 18:36 UTC (permalink / raw)
To: Jiri Pirko
Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, willemb, fw,
saeedm, leon, xiyou.wangcong, roid, ozsh, paulb
On Wed, Feb 15, 2023 at 12:03 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Wed, Feb 15, 2023 at 05:50:55PM CET, jhs@mojatatu.com wrote:
> >On Tue, Feb 14, 2023 at 10:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> Try to use the cached skb_ext in the drivers.
> >>
> >> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> >> ---
> >> CC: saeedm@nvidia.com
> >> CC: leon@kernel.org
> >> CC: jhs@mojatatu.com
> >> CC: xiyou.wangcong@gmail.com
> >> CC: jiri@resnulli.us
> >> CC: roid@nvidia.com
> >> CC: ozsh@nvidia.com
> >> CC: paulb@nvidia.com
> >> ---
> >> drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 2 +-
> >> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +-
> >> include/net/pkt_cls.h | 9 +++++++++
> >> 3 files changed, 11 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> >> index 3b590cfe33b8..ffbed5a92eab 100644
> >> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> >> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> >> @@ -770,7 +770,7 @@ static bool mlx5e_restore_skb_chain(struct sk_buff *skb, u32 chain, u32 reg_c1,
> >> struct mlx5_eswitch *esw;
> >> u32 zone_restore_id;
> >>
> >> - tc_skb_ext = tc_skb_ext_alloc(skb);
> >> + tc_skb_ext = tc_skb_ext_alloc_napi(skb);
> >> if (!tc_skb_ext) {
> >> WARN_ON(1);
> >> return false;
> >> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> >> index 2d06b4412762..3d9da4ccaf5d 100644
> >> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> >> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> >> @@ -5643,7 +5643,7 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
> >>
> >> if (mapped_obj.type == MLX5_MAPPED_OBJ_CHAIN) {
> >> chain = mapped_obj.chain;
> >> - tc_skb_ext = tc_skb_ext_alloc(skb);
> >> + tc_skb_ext = tc_skb_ext_alloc_napi(skb);
> >> if (WARN_ON(!tc_skb_ext))
> >> return false;
> >>
> >> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> >> index ace437c6754b..82821a3f8a8b 100644
> >> --- a/include/net/pkt_cls.h
> >> +++ b/include/net/pkt_cls.h
> >> @@ -764,6 +764,15 @@ static inline struct tc_skb_ext *tc_skb_ext_alloc(struct sk_buff *skb)
> >> memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
> >> return tc_skb_ext;
> >> }
> >> +
> >> +static inline struct tc_skb_ext *tc_skb_ext_alloc_napi(struct sk_buff *skb)
> >> +{
> >> + struct tc_skb_ext *tc_skb_ext = napi_skb_ext_add(skb, TC_SKB_EXT);
> >> +
> >> + if (tc_skb_ext)
> >> + memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
> >> + return tc_skb_ext;
> >> +}
> >> #endif
> >>
> >
> >Dumb question: Would this work with a consumer of the metadata past
> >RPS stage? didnt look closely but assumed per cpu skb_ext because of
> >the napi context - which will require a per cpu pointer to fetch it
> >later.
> >In P4TC we make heavy use of skb_ext and found it to be very pricey,
>
> Why don't you use skb->cb internally in p4tc?
Just for space reasons since we have a lot of stuff we carry around
(headers, metadata, and table keys which could each be up to an imposed
512b in size).
> Or does the skb leave p4tc and arrive back again? When?
The only time it leaves is if we do a "recirculate" - which is essentially
a mirred redirect to ingress/egress of another port.
It still causes memory pressure in alloc/free - we are thinking of
getting rid of it altogether and replacing it with a statically
allocated per-cpu scratchpad; if it gets recirculated we start from
scratch.
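Roughly what we have in mind (hypothetical - P4TC isn't upstream and all
of these names and sizes are invented):
struct p4tc_scratch {
	u8 hdrs[P4TC_SCRATCH_HDR_LEN];	/* parsed headers, hypothetical size */
	u8 meta[P4TC_SCRATCH_META_LEN];	/* metadata, hypothetical size */
	u8 key[64];			/* table key, up to 512 bits */
};
static DEFINE_PER_CPU(struct p4tc_scratch, p4tc_scratch);
	/* on the datapath, instead of a per-packet skb_ext allocation */
	struct p4tc_scratch *sp = this_cpu_ptr(&p4tc_scratch);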
cheers,
jamal
* Re: [PATCH net-next 3/3] net: create and use NAPI version of tc_skb_ext_alloc()
2023-02-15 17:35 ` Jakub Kicinski
@ 2023-02-15 18:38 ` Jamal Hadi Salim
0 siblings, 0 replies; 19+ messages in thread
From: Jamal Hadi Salim @ 2023-02-15 18:38 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, willemb, fw, saeedm, leon,
xiyou.wangcong, jiri, roid, ozsh, paulb
On Wed, Feb 15, 2023 at 12:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 15 Feb 2023 11:50:55 -0500 Jamal Hadi Salim wrote:
> > Dumb question: Would this work with a consumer of the metadata past
> > RPS stage? didnt look closely but assumed per cpu skb_ext because of
> > the napi context - which will require a per cpu pointer to fetch it
> > later.
>
> The cache is just for allocation, specifically for the case where
> driver allocates the skb and it's quickly coalesced by GRO.
> The lifetime of the allocated object is not changed.
ah. That certainly works.
cheers,
jamal
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 18:08 ` Alexander Lobakin
@ 2023-02-15 19:08 ` Paolo Abeni
0 siblings, 0 replies; 19+ messages in thread
From: Paolo Abeni @ 2023-02-15 19:08 UTC (permalink / raw)
To: Alexander Lobakin, Jakub Kicinski; +Cc: davem, netdev, edumazet, willemb, fw
On Wed, 2023-02-15 at 19:08 +0100, Alexander Lobakin wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Wed, 15 Feb 2023 09:45:42 -0800
>
> > On Wed, 15 Feb 2023 09:41:13 +0100 Paolo Abeni wrote:
> > > I'm wondering if napi_reuse_skb() should be touched, too? Even it's not
> > > directly used by the following patch...
> >
> > I didn't touch it because I (sadly) don't have access to any driver
> > using GRO frags to test :( But I certainly can.
> >
> > What about __kfree_skb_defer() and napi_consume_skb() (basically
> > the other two napi_skb_cache_put() callers) ?
> >
>
> Sounds good. Basically any caller of napi_skb_cache_put() can be
> switched to recycle extensions.
> But you certainly need to have a pool instead of just one pointer then,
> since napi_consume_skb() will return a lot if exts are actively used :)
This could also be a reason to (initially) exclude napi_consume_skb()
and keep the (initial) implementation as simple as possible.
Is the expected use-case more related to forwarded traffic, to local
traffic, or is it independent of such considerations?
Cheers,
Paolo
* Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by GRO
2023-02-15 18:20 ` Jakub Kicinski
@ 2023-02-16 12:04 ` Alexander Lobakin
0 siblings, 0 replies; 19+ messages in thread
From: Alexander Lobakin @ 2023-02-16 12:04 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Edward Cree, davem, netdev, edumazet, pabeni, willemb, fw
From: Jakub Kicinski <kuba@kernel.org>
Date: Wed, 15 Feb 2023 10:20:15 -0800
> On Wed, 15 Feb 2023 19:01:19 +0100 Alexander Lobakin wrote:
>>> I was hoping to leave sizing of the cache until we have some data from
>>> a production network (or at least representative packet traces).
>>>
>>> NAPI_SKB_CACHE_SIZE kinda assumes we're not doing much GRO, right?
>>
>> It assumes we GRO a lot :D
>>
>> Imagine that you have 64 frames during one poll and the GRO layer
>> decides to coalesce them by batches of 16. Then only 4 skbs will be
>> used, the rest will go as frags (with "stolen heads") -> 60 of 64 skbs
>> will return to that skb cache and will then be reused by napi_build_skb().
>
> Let's say 5 - for 4 resulting skbs GRO will need the 4 resulting and
> one extra to shuttle between the driver and GRO (worst case).
> With a cache of 1 I'm guaranteed to save 59 alloc calls, 92%, right?
>
> That's why I'm saying - the larger cache would help workloads which
> don't GRO as much. Am I missing the point or how GRO works?
Maybe I'm missing something now :D
The driver receives 5 frames, so it allocates 5 skbs. GRO coalesces them
into one big, so the first one remains as an skb, the following 4 get
their data added as frags and then are moved to the NAPI cache
(%NAPI_GRO_FREE_STOLEN_HEAD).
After GRO decides it's enough for this skb, it gets moved to the pending
list to be flushed soon. @gro_normal_batch is usually 8, so it means
there can be up to 8....
Oh wait, Eric changed this to count segments, not skbs :D
...there can be up to 2* such skbs waiting for a flush (the first one
sets the counter to 5, the second adds 5 more => flush happens). So you
would need at least 2* skb extensions cached anyway, otherwise there
will be new allocations.
This is not counting fraglists: when GRO decides to fraglist an skb, it
requires at least 1 more skb. UDP fraglisted GRO (I know almost nobody
uses it, still it does exist) doesn't use frags at all and requires 1
skb per segment.
You're right that the cache size of %NAPI_POLL_WEIGHT is needed only for
corner cases like big @gro_normal_batch, fraglists, UDP fraglisted GRO
and so on, still think we shouldn't ignore them :) Also this cache can
then be reused later to bulk-free extensions on Tx completion, just like
it's done for skbs.
* or less/more if customized by user, for example I set 16 on MIPS,
x86_64 works better with 8.
Thanks,
Olek