* [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Jason Xing @ 2025-06-17  0:22 UTC
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend
Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Introduce a control method in the xsk path to let users have the chance
to tune it manually.

Jason Xing (2):
  net: xsk: make MAX_PER_SOCKET_BUDGET tunable
  net: xsk: make xsk_tx_batch_size tunable

 include/net/hotdata.h      |  2 ++
 net/core/hotdata.c         |  4 +++-
 net/core/sysctl_net_core.c | 15 +++++++++++++++
 net/xdp/xsk.c              | 10 +++++-----
 4 files changed, 25 insertions(+), 6 deletions(-)

-- 
2.43.5
* [PATCH net-next 1/2] net: xsk: make MAX_PER_SOCKET_BUDGET tunable
From: Jason Xing @ 2025-06-17  0:22 UTC
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend
Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Make MAX_PER_SOCKET_BUDGET tunable and let users decide how many descs
they expect to peek at one time.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/hotdata.h      | 1 +
 net/core/hotdata.c         | 3 ++-
 net/core/sysctl_net_core.c | 7 +++++++
 net/xdp/xsk.c              | 6 ++++--
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/net/hotdata.h b/include/net/hotdata.h
index fda94b2647ff..2df1e8175a85 100644
--- a/include/net/hotdata.h
+++ b/include/net/hotdata.h
@@ -40,6 +40,7 @@ struct net_hotdata {
 	int sysctl_max_skb_frags;
 	int sysctl_skb_defer_max;
 	int sysctl_mem_pcpu_rsv;
+	int sysctl_xsk_max_per_socket_budget;
 };
 
 #define inet_ehash_secret net_hotdata.tcp_protocol.secret
diff --git a/net/core/hotdata.c b/net/core/hotdata.c
index 0bc893d5f07b..5131714eee63 100644
--- a/net/core/hotdata.c
+++ b/net/core/hotdata.c
@@ -19,6 +19,7 @@ struct net_hotdata net_hotdata __cacheline_aligned = {
 	.dev_rx_weight = 64,
 	.sysctl_max_skb_frags = MAX_SKB_FRAGS,
 	.sysctl_skb_defer_max = 64,
-	.sysctl_mem_pcpu_rsv = SK_MEMORY_PCPU_RESERVE
+	.sysctl_mem_pcpu_rsv = SK_MEMORY_PCPU_RESERVE,
+	.sysctl_xsk_max_per_socket_budget = 32
 };
 EXPORT_SYMBOL(net_hotdata);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 5dbb2c6f371d..9f9946b7ffc0 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -640,6 +640,13 @@ static struct ctl_table net_core_table[] = {
 		.proc_handler = proc_dointvec_minmax,
 		.extra1 = SYSCTL_ZERO,
 	},
+	{
+		.procname = "xsk_max_per_socket_budget",
+		.data = &net_hotdata.sysctl_xsk_max_per_socket_budget,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec
+	},
 };
 
 static struct ctl_table netns_core_table[] = {
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 72c000c0ae5f..95027f964858 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -28,13 +28,13 @@
 #include <net/netdev_lock.h>
 #include <net/netdev_rx_queue.h>
 #include <net/xdp.h>
+#include <net/hotdata.h>
 
 #include "xsk_queue.h"
 #include "xdp_umem.h"
 #include "xsk.h"
 
 #define TX_BATCH_SIZE 32
-#define MAX_PER_SOCKET_BUDGET (TX_BATCH_SIZE)
 
 void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
@@ -424,7 +424,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 	rcu_read_lock();
 again:
 	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
-		if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
+		int max_budget = READ_ONCE(net_hotdata.sysctl_xsk_max_per_socket_budget);
+
+		if (xs->tx_budget_spent >= max_budget) {
 			budget_exhausted = true;
 			continue;
 		}
-- 
2.43.5
* [PATCH net-next 2/2] net: xsk: make xsk_tx_batch_size tunable
From: Jason Xing @ 2025-06-17  0:22 UTC
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend
Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

In some cases, users probably expect xsk_generic_xmit() to handle more
descriptors at one time. Make it tunable; the knob is similar to how
dev_tx_weight works, but for the xsk path.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/hotdata.h      | 1 +
 net/core/hotdata.c         | 3 ++-
 net/core/sysctl_net_core.c | 8 ++++++++
 net/xdp/xsk.c              | 4 +---
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/net/hotdata.h b/include/net/hotdata.h
index 2df1e8175a85..862bd168fdd8 100644
--- a/include/net/hotdata.h
+++ b/include/net/hotdata.h
@@ -41,6 +41,7 @@ struct net_hotdata {
 	int sysctl_skb_defer_max;
 	int sysctl_mem_pcpu_rsv;
 	int sysctl_xsk_max_per_socket_budget;
+	int sysctl_xsk_tx_batch_size;
 };
 
 #define inet_ehash_secret net_hotdata.tcp_protocol.secret
diff --git a/net/core/hotdata.c b/net/core/hotdata.c
index 5131714eee63..e08fd0ee1b05 100644
--- a/net/core/hotdata.c
+++ b/net/core/hotdata.c
@@ -20,6 +20,7 @@ struct net_hotdata net_hotdata __cacheline_aligned = {
 	.sysctl_max_skb_frags = MAX_SKB_FRAGS,
 	.sysctl_skb_defer_max = 64,
 	.sysctl_mem_pcpu_rsv = SK_MEMORY_PCPU_RESERVE,
-	.sysctl_xsk_max_per_socket_budget = 32
+	.sysctl_xsk_max_per_socket_budget = 32,
+	.sysctl_xsk_tx_batch_size = 32
 };
 EXPORT_SYMBOL(net_hotdata);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 9f9946b7ffc0..73be2d9e6bab 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -647,6 +647,14 @@ static struct ctl_table net_core_table[] = {
 		.mode = 0644,
 		.proc_handler = proc_dointvec
 	},
+	{
+		.procname = "xsk_tx_batch_size",
+		.data = &net_hotdata.sysctl_xsk_tx_batch_size,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec
+	},
+
 };
 
 static struct ctl_table netns_core_table[] = {
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 95027f964858..4215a40abb69 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -34,8 +34,6 @@
 #include "xdp_umem.h"
 #include "xsk.h"
 
-#define TX_BATCH_SIZE 32
-
 void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
 	if (pool->cached_need_wakeup & XDP_WAKEUP_RX)
@@ -780,8 +778,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 
 static int __xsk_generic_xmit(struct sock *sk)
 {
+	u32 max_batch = READ_ONCE(net_hotdata.sysctl_xsk_tx_batch_size);
 	struct xdp_sock *xs = xdp_sk(sk);
-	u32 max_batch = TX_BATCH_SIZE;
 	bool sent_frame = false;
 	struct xdp_desc desc;
 	struct sk_buff *skb;
-- 
2.43.5
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Stanislav Fomichev @ 2025-06-17  1:11 UTC
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend, bpf, netdev, Jason Xing

On 06/17, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> Introduce a control method in the xsk path to let users have the chance
> to tune it manually.

Can you expand more on why the defaults don't work for you?

Also, can we put these settings into the socket instead of (global/ns)
sysctl?
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Jason Xing @ 2025-06-17  2:15 UTC
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend, bpf, netdev, Jason Xing

Hi Stanislav,

On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 06/17, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Introduce a control method in the xsk path to let users have the chance
> > to tune it manually.
>
> Can you expand more on why the defaults don't work for you?

We use a user-level tcp stack with xsk to transmit packets that have
higher priorities than other normal kernel tcp flows. It turns out
that enlarging the number can minimize the times we need to trigger
the sendto syscall, which contributes to faster transmission. It's
very easy to hit the upper bound (namely, 32) if you log the return
value of sendto. I mentioned a bit about this in the second patch,
saying that a similar knob already appears in the qdisc layer.
Furthermore, exposing important parameters can help applications
complete their AI/auto-tuning to judge which one is the best fit in
their production workload. That is also one of the promising
tendencies :)

> Also, can we put these settings into the socket instead of (global/ns)
> sysctl?

As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
corresponding netns? I have no strong opinion on this point for now.

Thanks,
Jason
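As background for the batching behaviour described above, a rough userspace
sketch of the copy-mode TX kick loop is shown below. The socket setup and
descriptor filling are assumed to happen elsewhere (for example via libxdp),
and the retry-on-EAGAIN pattern reflects the -EAGAIN that the generic xmit
path returns once its per-call batch budget runs out; the exact return
semantics can vary between kernel versions.

/* Illustrative sketch only: flush a copy-mode AF_XDP TX ring and count
 * how many sendto() kicks a burst needs.  Ring setup and descriptor
 * filling (not shown) are assumed to use the usual libxdp helpers.
 */
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

static int kick_tx(int xsk_fd)
{
	int kicks = 0;
	int ret;

	do {
		/* A NULL, zero-length sendto() is the standard AF_XDP TX
		 * wakeup.  With a 32-descriptor budget per call, a large
		 * burst typically needs several kicks before the ring is
		 * drained, which is the syscall overhead discussed above.
		 */
		ret = sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
		kicks++;
	} while (ret < 0 && errno == EAGAIN);

	if (ret < 0)
		perror("sendto");

	fprintf(stderr, "burst flushed after %d sendto() calls\n", kicks);
	return ret;
}

With a larger batch budget the same burst would, in this sketch, exit the
loop after fewer iterations, which is the effect the series is after.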
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Stanislav Fomichev @ 2025-06-17 17:46 UTC
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend, bpf, netdev, Jason Xing

On 06/17, Jason Xing wrote:
> Hi Stanislav,
>
> On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 06/17, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Introduce a control method in the xsk path to let users have the chance
> > > to tune it manually.
> >
> > Can you expand more on why the defaults don't work for you?
>
> We use a user-level tcp stack with xsk to transmit packets that have
> higher priorities than other normal kernel tcp flows. It turns out
> that enlarging the number can minimize the times we need to trigger
> the sendto syscall, which contributes to faster transmission. It's
> very easy to hit the upper bound (namely, 32) if you log the return
> value of sendto. I mentioned a bit about this in the second patch,
> saying that a similar knob already appears in the qdisc layer.
> Furthermore, exposing important parameters can help applications
> complete their AI/auto-tuning to judge which one is the best fit in
> their production workload. That is also one of the promising
> tendencies :)
>
> > Also, can we put these settings into the socket instead of (global/ns)
> > sysctl?
>
> As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> corresponding netns? I have no strong opinion on this point for now.

I'm suggesting something along these lines (see below). And then add
some way to configure it (plus, obviously, set the default value
on init). There is also a question on whether you need separate
values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,
then why.

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 72c000c0ae5f..fb2caec9914d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 	rcu_read_lock();
 again:
 	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
-		if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
+		if (xs->tx_budget_spent >= xs->max_tx_budget) {
 			budget_exhausted = true;
 			continue;
 		}
@@ -779,7 +779,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 static int __xsk_generic_xmit(struct sock *sk)
 {
 	struct xdp_sock *xs = xdp_sk(sk);
-	u32 max_batch = TX_BATCH_SIZE;
 	bool sent_frame = false;
 	struct xdp_desc desc;
 	struct sk_buff *skb;
@@ -797,7 +796,7 @@ static int __xsk_generic_xmit(struct sock *sk)
 		goto out;
 
 	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
-		if (max_batch-- == 0) {
+		if (xs->max_tx_budget-- == 0) {
 			err = -EAGAIN;
 			goto out;
 		}
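To make the "add some way to configure it" part concrete, a purely
hypothetical per-socket setter is sketched below. The helper name, the
bounds, and the idea of calling it from xsk_setsockopt() for some new XDP_*
option are assumptions for illustration; they are not part of the posted
patches or of the diff above.

/* Hypothetical sketch, not from the patch: validate and store the
 * per-socket TX budget that the diff above reads as xs->max_tx_budget.
 * A new XDP_* setsockopt option (name to be decided) could call this
 * from xsk_setsockopt(); the 4096 upper bound is arbitrary.
 */
static int xsk_set_max_tx_budget(struct xdp_sock *xs, unsigned int budget)
{
	if (budget == 0 || budget > 4096)
		return -EINVAL;

	mutex_lock(&xs->mutex);
	WRITE_ONCE(xs->max_tx_budget, budget);
	mutex_unlock(&xs->mutex);

	return 0;
}

In such a scheme the default would presumably be set in xsk_create() to
today's value of 32, so that unmodified applications keep the current
behaviour.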
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Joe Damato @ 2025-06-17 19:17 UTC
To: Stanislav Fomichev
Cc: Jason Xing, davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend, bpf, netdev, Jason Xing

On Tue, Jun 17, 2025 at 10:46:26AM -0700, Stanislav Fomichev wrote:
> On 06/17, Jason Xing wrote:
> > >
> > > Also, can we put these settings into the socket instead of (global/ns)
> > > sysctl?
> >
> > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > corresponding netns? I have no strong opinion on this point for now.
>
> I'm suggesting something along these lines (see below). And then add
> some way to configure it (plus, obviously, set the default value
> on init).

+1 from me on making this per-socket instead of global with sysctl.

I feel like the direction networking has taken lately (netdev-genl, for
example) is to make things more granular instead of global.
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Jason Xing @ 2025-06-18  0:31 UTC
To: Joe Damato, Stanislav Fomichev, davem, edumazet, kuba, pabeni, bjorn,
    magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel,
    hawk, john.fastabend, bpf, netdev, Jason Xing

On Wed, Jun 18, 2025 at 3:17 AM Joe Damato <joe@dama.to> wrote:
>
> On Tue, Jun 17, 2025 at 10:46:26AM -0700, Stanislav Fomichev wrote:
> > On 06/17, Jason Xing wrote:
> > > >
> > > > Also, can we put these settings into the socket instead of (global/ns)
> > > > sysctl?
> > >
> > > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > > corresponding netns? I have no strong opinion on this point for now.
> >
> > I'm suggesting something along these lines (see below). And then add
> > some way to configure it (plus, obviously, set the default value
> > on init).
>
> +1 from me on making this per-socket instead of global with sysctl.
>
> I feel like the direction networking has taken lately (netdev-genl, for
> example) is to make things more granular instead of global.

Agree. Previously I missed that sock_net(sk)->core->xxx could be used
to make it per-netns :(

Thanks,
Jason
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Jason Xing @ 2025-06-18  0:29 UTC
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend, bpf, netdev, Jason Xing

On Wed, Jun 18, 2025 at 1:46 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 06/17, Jason Xing wrote:
> > [...]
> >
> > > Also, can we put these settings into the socket instead of (global/ns)
> > > sysctl?
> >
> > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > corresponding netns? I have no strong opinion on this point for now.

To add to that, after digging into this part, I realized that we're
able to use sock_net(sk)->core.max_tx_budget directly to handle the
namespace part, because xsk_create() calls sk_alloc(), which ties the
socket to its netns via sk->sk_net. Sounds reasonable?

> I'm suggesting something along these lines (see below). And then add
> some way to configure it (plus, obviously, set the default value
> on init). There is also a question on whether you need separate
> values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,

For now, actually I don't see a specific reason to separate them, so
let me use a single one in V2. My use case only expects to see the
TX_BATCH_SIZE adjustment.

> then why.
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 72c000c0ae5f..fb2caec9914d 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
>  	rcu_read_lock();
>  again:
>  	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> -		if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> +		if (xs->tx_budget_spent >= xs->max_tx_budget) {

If we implement it like this, xs->max_tx_budget has to be initialized
from a per-netns value read from somewhere. The core problem still
remains: where to store the per-netns value.

Do you think using the aforementioned sock_net(sk)->core.max_tx_budget
is possible?

Thanks,
Jason
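For clarity, a rough sketch of the per-netns variant being discussed is
shown below. The field name follows the message above
(sock_net(sk)->core.max_tx_budget) and does not exist upstream at this
point, so both fragments are illustrative assumptions only.

/* include/net/netns/core.h: hypothetical per-netns field (sketch) */
struct netns_core {
	/* ... existing fields ... */
	int	max_tx_budget;	/* would default to 32, today's TX_BATCH_SIZE */
};

/* net/xdp/xsk.c, in __xsk_generic_xmit(): read the budget through the
 * socket's own netns, which works because xsk_create() -> sk_alloc()
 * already associates the socket with its namespace via sk->sk_net.
 */
u32 max_batch = READ_ONCE(sock_net(sk)->core.max_tx_budget);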
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Jason Xing @ 2025-06-18  6:59 UTC
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend, bpf, netdev, Jason Xing

On Wed, Jun 18, 2025 at 8:29 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> [...]
>
> To add to that, after digging into this part, I realized that we're
> able to use sock_net(sk)->core.max_tx_budget directly to handle the
> namespace part, because xsk_create() calls sk_alloc(), which ties the
> socket to its netns via sk->sk_net. Sounds reasonable?

Updated patch here:
https://lore.kernel.org/all/20250618065553.96822-1-kerneljasonxing@gmail.com/

Please review :)

Thanks,
Jason
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Willem de Bruijn @ 2025-06-18 17:57 UTC
To: Jason Xing, Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
    maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
    john.fastabend, bpf, netdev, Jason Xing

Jason Xing wrote:
> Hi Stanislav,
>
> On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 06/17, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Introduce a control method in the xsk path to let users have the chance
> > > to tune it manually.
> >
> > Can you expand more on why the defaults don't work for you?
>
> We use a user-level tcp stack with xsk to transmit packets that have
> higher priorities than other normal kernel tcp flows. It turns out
> that enlarging the number can minimize the times we need to trigger
> the sendto syscall, which contributes to faster transmission. It's
> very easy to hit the upper bound (namely, 32) if you log the return
> value of sendto. I mentioned a bit about this in the second patch,
> saying that a similar knob already appears in the qdisc layer.
> Furthermore, exposing important parameters can help applications
> complete their AI/auto-tuning to judge which one is the best fit in
> their production workload. That is also one of the promising
> tendencies :)

It would be informative to include this in the commit.

Or more broadly: suggestions for when and how to pick good settings
for these new tunables.
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
From: Jason Xing @ 2025-06-18 19:24 UTC
To: Willem de Bruijn
Cc: Stanislav Fomichev, davem, edumazet, kuba, pabeni, bjorn,
    magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast,
    daniel, hawk, john.fastabend, bpf, netdev, Jason Xing

On Thu, Jun 19, 2025 at 1:58 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > [...]
> >
> > We use a user-level tcp stack with xsk to transmit packets that have
> > higher priorities than other normal kernel tcp flows. It turns out
> > that enlarging the number can minimize the times we need to trigger
> > the sendto syscall, which contributes to faster transmission. It's
> > very easy to hit the upper bound (namely, 32) if you log the return
> > value of sendto. I mentioned a bit about this in the second patch,
> > saying that a similar knob already appears in the qdisc layer.
> > Furthermore, exposing important parameters can help applications
> > complete their AI/auto-tuning to judge which one is the best fit in
> > their production workload. That is also one of the promising
> > tendencies :)
>
> It would be informative to include this in the commit.

Sure, I will add more in V3.

> Or more broadly: suggestions for when and how to pick good settings
> for these new tunables.

I'm not sure what a good setting is in general. But I will try to
describe how I use it and what my understanding of it is in the
commit message.

Thanks,
Jason