* [PATCH v2 net-next] net: introduce a config option to tweak MAX_SKB_FRAGS
From: Eric Dumazet @ 2023-03-21 16:35 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Eric Dumazet, Eric Dumazet
From: Eric Dumazet <edumazet@google.com>
Currently, MAX_SKB_FRAGS value is 17.
For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
attempts order-3 allocations, stuffing 32768 bytes per frag.
But with zero copy, we use order-0 pages.
For BIG TCP to show its full potential, we add a config option
to be able to fit up to 45 fragments per skb.
This is also needed for BIG TCP rx zerocopy, as zerocopy currently
does not support skbs with a frag_list.
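To put numbers on the order-0 versus order-3 point, here is a rough sketch of the paged payload ceiling per skb. It assumes 4096-byte pages (which is what makes an order-3 allocation the 32768 bytes quoted above) and frags that are completely full. The helper names and ASSUMED_PAGE_SIZE are illustrative only; they are not kernel symbols.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The real PAGE_SIZE is arch/config dependent. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Sanity check against the "32768 bytes per frag" figure for order-3. */
static_assert((ASSUMED_PAGE_SIZE << 3) == 32768UL,
	      "order-3 allocation should be 32768 bytes with 4 KiB pages");

/* Bytes covered by one frag backed by a single order-N allocation. */
static unsigned long frag_bytes(unsigned int order)
{
	return ASSUMED_PAGE_SIZE << order;
}

/* Upper bound on paged payload when every frag is completely full. */
static unsigned long max_paged_payload(unsigned int nr_frags,
				       unsigned int order)
{
	return (unsigned long)nr_frags * frag_bytes(order);
}
```

Under these assumptions, 17 order-0 frags cap out at 69632 bytes, 45 order-0 frags reach 184320 bytes, and 17 order-3 frags would cover 557056 bytes.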
We have used MAX_SKB_FRAGS=45 value for years at Google before
we deployed 4K MTU, with no adverse effect, other than
a recent issue in mlx4, fixed in commit 26782aad00cc
("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS")
Back then, the goal was to be able to receive full size (64KB) GRO
packets without the frag_list overhead.
Note that /proc/sys/net/core/max_skb_frags can also be used to limit
the number of fragments TCP can use in tx packets.
By default we keep the old/legacy value of 17 until we get
more coverage for the updated values.
Sizes of struct skb_shared_info on 64bit arches

MAX_SKB_FRAGS | sizeof(struct skb_shared_info):
==============================================
         17     320
         21     320+64  = 384
         25     320+128 = 448
         29     320+192 = 512
         33     320+256 = 576
         37     320+320 = 640
         41     320+384 = 704
         45     320+448 = 768
This inflation might cause problems for drivers assuming they could pack
both the incoming packet and skb_shared_info in half a page, using build_skb().
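The table and the half-page concern can be sketched as simple arithmetic. This is only an illustration: the 16 bytes per frag is inferred from the table (+64 bytes for every 4 extra frags), the 2048-byte half page assumes 4096-byte pages, and headroom and alignment padding are ignored. All names below are hypothetical, not kernel symbols.

```c
#include <assert.h>

#define BASE_FRAGS       17u   /* legacy MAX_SKB_FRAGS from the message */
#define BASE_SHINFO_SIZE 320u  /* table entry for 17 frags */
#define BYTES_PER_FRAG   16u   /* inferred: 64 extra bytes per 4 frags */
#define HALF_PAGE        2048u /* assumption: half of a 4096-byte page */

static_assert(BYTES_PER_FRAG * 4u == 64u, "table grows by 64 bytes per 4 frags");

/* Size of skb_shared_info for a given frag count, per the table (64-bit). */
static unsigned int shinfo_size(unsigned int max_frags)
{
	return BASE_SHINFO_SIZE + (max_frags - BASE_FRAGS) * BYTES_PER_FRAG;
}

/* Bytes left for packet data in half a page after skb_shared_info. */
static unsigned int half_page_room(unsigned int max_frags)
{
	return HALF_PAGE - shinfo_size(max_frags);
}
```

With these assumptions, half_page_room(17) is 1728 bytes, which still holds a 1514-byte Ethernet frame, while half_page_room(45) drops to 1280 bytes, which does not, even before any headroom is reserved.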
v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long"
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/scsi/cxgbi/libcxgbi.c | 4 ++--
include/linux/skbuff.h | 14 ++------------
net/Kconfig | 12 ++++++++++++
net/packet/af_packet.c | 4 ++--
4 files changed, 18 insertions(+), 16 deletions(-)
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index af281e271f886041b397ea881e2ce7be00eff625..3e1de4c842cc6102e25a5972d6b11e05c3e4c060 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -2314,9 +2314,9 @@ static int cxgbi_sock_tx_queue_up(struct cxgbi_sock *csk, struct sk_buff *skb)
frags++;
if (frags >= SKB_WR_LIST_SIZE) {
- pr_err("csk 0x%p, frags %u, %u,%u >%lu.\n",
+ pr_err("csk 0x%p, frags %u, %u,%u >%u.\n",
csk, skb_shinfo(skb)->nr_frags, skb->len,
- skb->data_len, SKB_WR_LIST_SIZE);
+ skb->data_len, (unsigned int)SKB_WR_LIST_SIZE);
return -EINVAL;
}
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fe661011644b8f468ff5e92075a6624f0557584c..43726ca7d20f232461a4d2e5b984032806e9c13e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -345,18 +345,8 @@ struct sk_buff_head {
struct sk_buff;
-/* To allow 64K frame to be packed as single skb without frag_list we
- * require 64K/PAGE_SIZE pages plus 1 additional page to allow for
- * buffers which do not start on a page boundary.
- *
- * Since GRO uses frags we allocate at least 16 regardless of page
- * size.
- */
-#if (65536/PAGE_SIZE + 1) < 16
-#define MAX_SKB_FRAGS 16UL
-#else
-#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
-#endif
+#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
+
extern int sysctl_max_skb_frags;
/* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
diff --git a/net/Kconfig b/net/Kconfig
index 48c33c2221999e575c83a409ab773b9cc3656eab..f806722bccf450c62e07bfdb245e5195ac4a156d 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -251,6 +251,18 @@ config PCPU_DEV_REFCNT
network device refcount are using per cpu variables if this option is set.
This can be forced to N to detect underflows (with a performance drop).
+config MAX_SKB_FRAGS
+ int "Maximum number of fragments per skb_shared_info"
+ range 17 45
+ default 17
+ help
+ Having more fragments per skb_shared_info can help GRO efficiency.
+ This helps BIG TCP workloads, but might expose bugs in some
+ legacy drivers.
+ This also increases memory overhead of small packets,
+ and in drivers using build_skb().
+ If unsure, say 17.
+
config RPS
bool
depends on SMP && SYSFS
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 497193f73030c385a2d33b71dfbc299fbf9b763d..568f8d76e3c124f3b322a8d88dc3dcfbc45e7c0e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2622,8 +2622,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
nr_frags = skb_shinfo(skb)->nr_frags;
if (unlikely(nr_frags >= MAX_SKB_FRAGS)) {
- pr_err("Packet exceed the number of skb frags(%lu)\n",
- MAX_SKB_FRAGS);
+ pr_err("Packet exceed the number of skb frags(%u)\n",
+ (unsigned int)MAX_SKB_FRAGS);
return -EFAULT;
}
--
2.40.0.rc1.284.g88254d51c5-goog
* Re: [PATCH v2 net-next] net: introduce a config option to tweak MAX_SKB_FRAGS
From: Nikolay Aleksandrov @ 2023-03-22 10:14 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Eric Dumazet, daniel@iogearbox.net
On 21/03/2023 18:35, Eric Dumazet wrote:
> [...]
Nice! I was statically increasing it for our datapath performance tests
with BIG TCP and zerocopy. I had to implement a custom header-data split
for mlx to get it all working, but the improvements are impressive, as
expected.
FWIW,
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
* Re: [PATCH v2 net-next] net: introduce a config option to tweak MAX_SKB_FRAGS
From: kernel test robot @ 2023-03-22 10:36 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: oe-kbuild-all, netdev, Eric Dumazet
Hi Eric,
I love your patch! Yet something to improve:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-introduce-a-config-option-to-tweak-MAX_SKB_FRAGS/20230322-003641
patch link: https://lore.kernel.org/r/20230321163550.1574254-1-eric.dumazet%40gmail.com
patch subject: [PATCH v2 net-next] net: introduce a config option to tweak MAX_SKB_FRAGS
config: ia64-randconfig-r005-20230322 (https://download.01.org/0day-ci/archive/20230322/202303221833.CjbkODlQ-lkp@intel.com/config)
compiler: ia64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/69776fdcf56a3d545d8b37c25829fcadec2d9144
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Eric-Dumazet/net-introduce-a-config-option-to-tweak-MAX_SKB_FRAGS/20230322-003641
git checkout 69776fdcf56a3d545d8b37c25829fcadec2d9144
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=ia64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=ia64 SHELL=/bin/bash kernel/
If you fix the issue, kindly add the following tags where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303221833.CjbkODlQ-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/linux/filter.h:12,
from include/linux/bpf_verifier.h:9,
from kernel/bpf/btf.c:19:
include/linux/skbuff.h:348:23: error: 'CONFIG_MAX_SKB_FRAGS' undeclared here (not in a function); did you mean 'MAX_SKB_FRAGS'?
348 | #define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
| ^~~~~~~~~~~~~~~~~~~~
include/linux/skbuff.h:593:31: note: in expansion of macro 'MAX_SKB_FRAGS'
593 | skb_frag_t frags[MAX_SKB_FRAGS];
| ^~~~~~~~~~~~~
include/linux/skbuff.h: In function '__skb_fill_page_desc_noacc':
include/linux/skbuff.h:2392:51: warning: parameter 'i' set but not used [-Wunused-but-set-parameter]
2392 | int i, struct page *page,
| ~~~~^
include/linux/skbuff.h: In function 'skb_frag_ref':
include/linux/skbuff.h:3380:58: warning: parameter 'f' set but not used [-Wunused-but-set-parameter]
3380 | static inline void skb_frag_ref(struct sk_buff *skb, int f)
| ~~~~^
include/linux/skbuff.h: In function 'skb_frag_unref':
include/linux/skbuff.h:3411:60: warning: parameter 'f' set but not used [-Wunused-but-set-parameter]
3411 | static inline void skb_frag_unref(struct sk_buff *skb, int f)
| ~~~~^
include/linux/skbuff.h: In function 'skb_frag_set_page':
include/linux/skbuff.h:3478:63: warning: parameter 'f' set but not used [-Wunused-but-set-parameter]
3478 | static inline void skb_frag_set_page(struct sk_buff *skb, int f,
| ~~~~^
In file included from <command-line>:
include/linux/skmsg.h: In function 'sk_msg_init':
>> include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
| ^
include/linux/compiler_types.h:377:23: note: in definition of macro '__compiletime_assert'
377 | if (!(condition)) \
| ^~~~~~~~~
include/linux/compiler_types.h:397:9: note: in expansion of macro '_compiletime_assert'
397 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
include/linux/skmsg.h:177:9: note: in expansion of macro 'BUILD_BUG_ON'
177 | BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != NR_MSG_FRAG_IDS);
| ^~~~~~~~~~~~
include/linux/compiler.h:232:33: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
232 | #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
| ^~~~~~~~~~~~~~~~~
include/linux/kernel.h:55:59: note: in expansion of macro '__must_be_array'
55 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
| ^~~~~~~~~~~~~~~
include/linux/skmsg.h:177:22: note: in expansion of macro 'ARRAY_SIZE'
177 | BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != NR_MSG_FRAG_IDS);
| ^~~~~~~~~~
In file included from kernel/bpf/btf.c:23:
include/linux/skmsg.h: In function 'sk_msg_xfer':
include/linux/skmsg.h:183:36: warning: parameter 'which' set but not used [-Wunused-but-set-parameter]
183 | int which, u32 size)
| ~~~~^~~~~
include/linux/skmsg.h: In function 'sk_msg_elem':
include/linux/skmsg.h:209:71: warning: parameter 'which' set but not used [-Wunused-but-set-parameter]
209 | static inline struct scatterlist *sk_msg_elem(struct sk_msg *msg, int which)
| ~~~~^~~~~
include/linux/skmsg.h: In function 'sk_msg_elem_cpy':
include/linux/skmsg.h:214:74: warning: parameter 'which' set but not used [-Wunused-but-set-parameter]
214 | static inline struct scatterlist sk_msg_elem_cpy(struct sk_msg *msg, int which)
| ~~~~^~~~~
kernel/bpf/btf.c: In function 'btf_seq_show':
kernel/bpf/btf.c:7101:29: warning: function 'btf_seq_show' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
7101 | seq_vprintf((struct seq_file *)show->target, fmt, args);
| ^~~~~~~~
kernel/bpf/btf.c: In function 'btf_snprintf_show':
kernel/bpf/btf.c:7138:9: warning: function 'btf_snprintf_show' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
7138 | len = vsnprintf(show->target, ssnprintf->len_left, fmt, args);
| ^~~
vim +16 include/linux/build_bug.h
bc6245e5efd70c Ian Abbott 2017-07-10 6
bc6245e5efd70c Ian Abbott 2017-07-10 7 #ifdef __CHECKER__
bc6245e5efd70c Ian Abbott 2017-07-10 8 #define BUILD_BUG_ON_ZERO(e) (0)
bc6245e5efd70c Ian Abbott 2017-07-10 9 #else /* __CHECKER__ */
bc6245e5efd70c Ian Abbott 2017-07-10 10 /*
bc6245e5efd70c Ian Abbott 2017-07-10 11 * Force a compilation error if condition is true, but also produce a
8788994376d84d Rikard Falkeborn 2019-12-04 12 * result (of value 0 and type int), so the expression can be used
bc6245e5efd70c Ian Abbott 2017-07-10 13 * e.g. in a structure initializer (or where-ever else comma expressions
bc6245e5efd70c Ian Abbott 2017-07-10 14 * aren't permitted).
bc6245e5efd70c Ian Abbott 2017-07-10 15 */
8788994376d84d Rikard Falkeborn 2019-12-04 @16 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
527edbc18a70e7 Masahiro Yamada 2019-01-03 17 #endif /* __CHECKER__ */
527edbc18a70e7 Masahiro Yamada 2019-01-03 18
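The log does not say why the symbol is missing. One plausible reading is that this randconfig never defines CONFIG_MAX_SKB_FRAGS (for instance, if the menu carrying the new option is not enabled), while skbuff.h is still pulled in by kernel/bpf/btf.c. Under that assumption, one possible guard is a fallback to the legacy value. The sketch below is hypothetical and not part of the posted patch; struct demo_frag and struct demo_shared_info are stand-ins for the real kernel types.

```c
#include <assert.h>

/* Hypothetical fallback: use the legacy value when Kconfig gives none. */
#ifndef CONFIG_MAX_SKB_FRAGS
# define CONFIG_MAX_SKB_FRAGS 17
#endif

#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS

/* Stand-in for skb_frag_t; the layout is illustrative only. */
struct demo_frag {
	void *page;
	unsigned int offset;
	unsigned int len;
};

/* Stand-in for skb_shared_info: the array bound must be a constant. */
struct demo_shared_info {
	unsigned char nr_frags;
	struct demo_frag frags[MAX_SKB_FRAGS];
};

/* Mirror the Kconfig "range 17 45" at compile time. */
static_assert(MAX_SKB_FRAGS >= 17 && MAX_SKB_FRAGS <= 45,
	      "MAX_SKB_FRAGS outside the range allowed by the Kconfig entry");
```

With the fallback in place, the frags[] array bound is an integer constant in every configuration, which is what the first error in the log complains about.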
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests