* [PATCH net-next 0/2] net: xsk: add two sysctl knobs
@ 2025-06-17 0:22 Jason Xing
2025-06-17 0:22 ` [PATCH net-next 1/2] net: xsk: make MAX_PER_SOCKET_BUDGET tunable Jason Xing
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Jason Xing @ 2025-06-17 0:22 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
Introduce a control method in the xsk path to let users have the chance
to tune it manually.
Jason Xing (2):
net: xsk: make MAX_PER_SOCKET_BUDGET tunable
net: xsk: make xsk_tx_batch_size tunable
include/net/hotdata.h | 2 ++
net/core/hotdata.c | 4 +++-
net/core/sysctl_net_core.c | 15 +++++++++++++++
net/xdp/xsk.c | 10 +++++-----
4 files changed, 25 insertions(+), 6 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next 1/2] net: xsk: make MAX_PER_SOCKET_BUDGET tunable
2025-06-17 0:22 [PATCH net-next 0/2] net: xsk: add two sysctl knobs Jason Xing
@ 2025-06-17 0:22 ` Jason Xing
2025-06-17 0:22 ` [PATCH net-next 2/2] net: xsk: make xsk_tx_batch_size tunable Jason Xing
2025-06-17 1:11 ` [PATCH net-next 0/2] net: xsk: add two sysctl knobs Stanislav Fomichev
2 siblings, 0 replies; 12+ messages in thread
From: Jason Xing @ 2025-06-17 0:22 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
Make MAX_PER_SOCKET_BUDGET tunable and let users decide how many descs
they expect to peek at one time.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
include/net/hotdata.h | 1 +
net/core/hotdata.c | 3 ++-
net/core/sysctl_net_core.c | 7 +++++++
net/xdp/xsk.c | 6 ++++--
4 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/include/net/hotdata.h b/include/net/hotdata.h
index fda94b2647ff..2df1e8175a85 100644
--- a/include/net/hotdata.h
+++ b/include/net/hotdata.h
@@ -40,6 +40,7 @@ struct net_hotdata {
int sysctl_max_skb_frags;
int sysctl_skb_defer_max;
int sysctl_mem_pcpu_rsv;
+ int sysctl_xsk_max_per_socket_budget;
};
#define inet_ehash_secret net_hotdata.tcp_protocol.secret
diff --git a/net/core/hotdata.c b/net/core/hotdata.c
index 0bc893d5f07b..5131714eee63 100644
--- a/net/core/hotdata.c
+++ b/net/core/hotdata.c
@@ -19,6 +19,7 @@ struct net_hotdata net_hotdata __cacheline_aligned = {
.dev_rx_weight = 64,
.sysctl_max_skb_frags = MAX_SKB_FRAGS,
.sysctl_skb_defer_max = 64,
- .sysctl_mem_pcpu_rsv = SK_MEMORY_PCPU_RESERVE
+ .sysctl_mem_pcpu_rsv = SK_MEMORY_PCPU_RESERVE,
+ .sysctl_xsk_max_per_socket_budget = 32
};
EXPORT_SYMBOL(net_hotdata);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 5dbb2c6f371d..9f9946b7ffc0 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -640,6 +640,13 @@ static struct ctl_table net_core_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
},
+ {
+ .procname = "xsk_max_per_socket_budget",
+ .data = &net_hotdata.sysctl_xsk_max_per_socket_budget,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
};
static struct ctl_table netns_core_table[] = {
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 72c000c0ae5f..95027f964858 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -28,13 +28,13 @@
#include <net/netdev_lock.h>
#include <net/netdev_rx_queue.h>
#include <net/xdp.h>
+#include <net/hotdata.h>
#include "xsk_queue.h"
#include "xdp_umem.h"
#include "xsk.h"
#define TX_BATCH_SIZE 32
-#define MAX_PER_SOCKET_BUDGET (TX_BATCH_SIZE)
void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
{
@@ -424,7 +424,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
rcu_read_lock();
again:
list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
- if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
+ int max_budget = READ_ONCE(net_hotdata.sysctl_xsk_max_per_socket_budget);
+
+ if (xs->tx_budget_spent >= max_budget) {
budget_exhausted = true;
continue;
}
--
2.43.5
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net-next 2/2] net: xsk: make xsk_tx_batch_size tunable
2025-06-17 0:22 [PATCH net-next 0/2] net: xsk: add two sysctl knobs Jason Xing
2025-06-17 0:22 ` [PATCH net-next 1/2] net: xsk: make MAX_PER_SOCKET_BUDGET tunable Jason Xing
@ 2025-06-17 0:22 ` Jason Xing
2025-06-17 1:11 ` [PATCH net-next 0/2] net: xsk: add two sysctl knobs Stanislav Fomichev
2 siblings, 0 replies; 12+ messages in thread
From: Jason Xing @ 2025-06-17 0:22 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
In some cases, users probably expect to see xsk_generic_xmit() can
handle more desc at one time. Make it tunable and it would be similar
with how dev_tx_weight works in xsk path.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
include/net/hotdata.h | 1 +
net/core/hotdata.c | 3 ++-
net/core/sysctl_net_core.c | 8 ++++++++
net/xdp/xsk.c | 4 +---
4 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/include/net/hotdata.h b/include/net/hotdata.h
index 2df1e8175a85..862bd168fdd8 100644
--- a/include/net/hotdata.h
+++ b/include/net/hotdata.h
@@ -41,6 +41,7 @@ struct net_hotdata {
int sysctl_skb_defer_max;
int sysctl_mem_pcpu_rsv;
int sysctl_xsk_max_per_socket_budget;
+ int sysctl_xsk_tx_batch_size;
};
#define inet_ehash_secret net_hotdata.tcp_protocol.secret
diff --git a/net/core/hotdata.c b/net/core/hotdata.c
index 5131714eee63..e08fd0ee1b05 100644
--- a/net/core/hotdata.c
+++ b/net/core/hotdata.c
@@ -20,6 +20,7 @@ struct net_hotdata net_hotdata __cacheline_aligned = {
.sysctl_max_skb_frags = MAX_SKB_FRAGS,
.sysctl_skb_defer_max = 64,
.sysctl_mem_pcpu_rsv = SK_MEMORY_PCPU_RESERVE,
- .sysctl_xsk_max_per_socket_budget = 32
+ .sysctl_xsk_max_per_socket_budget = 32,
+ .sysctl_xsk_tx_batch_size = 32
};
EXPORT_SYMBOL(net_hotdata);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 9f9946b7ffc0..73be2d9e6bab 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -647,6 +647,14 @@ static struct ctl_table net_core_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "xsk_tx_batch_size",
+ .data = &net_hotdata.sysctl_xsk_tx_batch_size,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
+
};
static struct ctl_table netns_core_table[] = {
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 95027f964858..4215a40abb69 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -34,8 +34,6 @@
#include "xdp_umem.h"
#include "xsk.h"
-#define TX_BATCH_SIZE 32
-
void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
{
if (pool->cached_need_wakeup & XDP_WAKEUP_RX)
@@ -780,8 +778,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
static int __xsk_generic_xmit(struct sock *sk)
{
+ u32 max_batch = READ_ONCE(net_hotdata.sysctl_xsk_tx_batch_size);
struct xdp_sock *xs = xdp_sk(sk);
- u32 max_batch = TX_BATCH_SIZE;
bool sent_frame = false;
struct xdp_desc desc;
struct sk_buff *skb;
--
2.43.5
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-17 0:22 [PATCH net-next 0/2] net: xsk: add two sysctl knobs Jason Xing
2025-06-17 0:22 ` [PATCH net-next 1/2] net: xsk: make MAX_PER_SOCKET_BUDGET tunable Jason Xing
2025-06-17 0:22 ` [PATCH net-next 2/2] net: xsk: make xsk_tx_batch_size tunable Jason Xing
@ 2025-06-17 1:11 ` Stanislav Fomichev
2025-06-17 2:15 ` Jason Xing
2 siblings, 1 reply; 12+ messages in thread
From: Stanislav Fomichev @ 2025-06-17 1:11 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
On 06/17, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> Introduce a control method in the xsk path to let users have the chance
> to tune it manually.
Can you expand more on why the defaults don't work for you?
Also, can we put these settings into the socket instead of (global/ns)
sysctl?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-17 1:11 ` [PATCH net-next 0/2] net: xsk: add two sysctl knobs Stanislav Fomichev
@ 2025-06-17 2:15 ` Jason Xing
2025-06-17 17:46 ` Stanislav Fomichev
2025-06-18 17:57 ` Willem de Bruijn
0 siblings, 2 replies; 12+ messages in thread
From: Jason Xing @ 2025-06-17 2:15 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
Hi Stanislav,
On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 06/17, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Introduce a control method in the xsk path to let users have the chance
> > to tune it manually.
>
> Can you expand more on why the defaults don't work for you?
We use a user-level tcp stack with xsk to transmit packets that have
higher priorities than other normal kernel tcp flows. It turns out
that enlarging the number can minimize times of triggering sendto
sysctl, which contributes to faster transmission. it's very easy to
hit the upper bound (namely, 32) if you log the return value of
sendto. I mentioned a bit about this in the second patch, saying that
we can have a similar knob already appearing in the qdisc layer.
Furthermore, exposing important parameters can help applications
complete their AI/auto-tuning to judge which one is the best fit in
their production workload. That is also one of the promising
tendencies :)
>
> Also, can we put these settings into the socket instead of (global/ns)
> sysctl?
As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
corresponding netns? I have no strong opinion on this point for now.
Thanks,
Jason
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-17 2:15 ` Jason Xing
@ 2025-06-17 17:46 ` Stanislav Fomichev
2025-06-17 19:17 ` Joe Damato
2025-06-18 0:29 ` Jason Xing
2025-06-18 17:57 ` Willem de Bruijn
1 sibling, 2 replies; 12+ messages in thread
From: Stanislav Fomichev @ 2025-06-17 17:46 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
On 06/17, Jason Xing wrote:
> Hi Stanislav,
>
> On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 06/17, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Introduce a control method in the xsk path to let users have the chance
> > > to tune it manually.
> >
> > Can you expand more on why the defaults don't work for you?
>
> We use a user-level tcp stack with xsk to transmit packets that have
> higher priorities than other normal kernel tcp flows. It turns out
> that enlarging the number can minimize times of triggering sendto
> sysctl, which contributes to faster transmission. it's very easy to
> hit the upper bound (namely, 32) if you log the return value of
> sendto. I mentioned a bit about this in the second patch, saying that
> we can have a similar knob already appearing in the qdisc layer.
> Furthermore, exposing important parameters can help applications
> complete their AI/auto-tuning to judge which one is the best fit in
> their production workload. That is also one of the promising
> tendencies :)
>
> >
> > Also, can we put these settings into the socket instead of (global/ns)
> > sysctl?
>
> As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> corresponding netns? I have no strong opinion on this point for now.
I'm suggesting something along these lines (see below). And then add
some way to configure it (plus, obviously, set the default value
on init). There is also a question on whether you need separate
values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,
then why.
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 72c000c0ae5f..fb2caec9914d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
rcu_read_lock();
again:
list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
- if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
+ if (xs->tx_budget_spent >= xs->max_tx_budget) {
budget_exhausted = true;
continue;
}
@@ -779,7 +779,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
static int __xsk_generic_xmit(struct sock *sk)
{
struct xdp_sock *xs = xdp_sk(sk);
- u32 max_batch = TX_BATCH_SIZE;
bool sent_frame = false;
struct xdp_desc desc;
struct sk_buff *skb;
@@ -797,7 +796,7 @@ static int __xsk_generic_xmit(struct sock *sk)
goto out;
while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
- if (max_batch-- == 0) {
+ if (xs->max_tx_budget-- == 0) {
err = -EAGAIN;
goto out;
}
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-17 17:46 ` Stanislav Fomichev
@ 2025-06-17 19:17 ` Joe Damato
2025-06-18 0:31 ` Jason Xing
2025-06-18 0:29 ` Jason Xing
1 sibling, 1 reply; 12+ messages in thread
From: Joe Damato @ 2025-06-17 19:17 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Jason Xing, davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
On Tue, Jun 17, 2025 at 10:46:26AM -0700, Stanislav Fomichev wrote:
> On 06/17, Jason Xing wrote:
> > >
> > > Also, can we put these settings into the socket instead of (global/ns)
> > > sysctl?
> >
> > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > corresponding netns? I have no strong opinion on this point for now.
>
> I'm suggesting something along these lines (see below). And then add
> some way to configure it (plus, obviously, set the default value
> on init).
+1 from me on making this per-socket instead of global with sysfs.
I feel like the direction networking has taken lately (netdev-genl, for
example) is to make things more granular instead of global.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-17 17:46 ` Stanislav Fomichev
2025-06-17 19:17 ` Joe Damato
@ 2025-06-18 0:29 ` Jason Xing
2025-06-18 6:59 ` Jason Xing
1 sibling, 1 reply; 12+ messages in thread
From: Jason Xing @ 2025-06-18 0:29 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
On Wed, Jun 18, 2025 at 1:46 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 06/17, Jason Xing wrote:
> > Hi Stanislav,
> >
> > On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > >
> > > On 06/17, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Introduce a control method in the xsk path to let users have the chance
> > > > to tune it manually.
> > >
> > > Can you expand more on why the defaults don't work for you?
> >
> > We use a user-level tcp stack with xsk to transmit packets that have
> > higher priorities than other normal kernel tcp flows. It turns out
> > that enlarging the number can minimize times of triggering sendto
> > sysctl, which contributes to faster transmission. it's very easy to
> > hit the upper bound (namely, 32) if you log the return value of
> > sendto. I mentioned a bit about this in the second patch, saying that
> > we can have a similar knob already appearing in the qdisc layer.
> > Furthermore, exposing important parameters can help applications
> > complete their AI/auto-tuning to judge which one is the best fit in
> > their production workload. That is also one of the promising
> > tendencies :)
> >
> > >
> > > Also, can we put these settings into the socket instead of (global/ns)
> > > sysctl?
> >
> > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > corresponding netns? I have no strong opinion on this point for now.
To add to that, after digging into this part, I realized that we're
able to use sock_net(sk)->core.max_tx_budget directly to finish the
namespace stuff because xsk_create() calls sk_alloc() which correlates
its netns in the sk->sk_net. Sounds reasonable?
>
> I'm suggesting something along these lines (see below). And then add
> some way to configure it (plus, obviously, set the default value
> on init). There is also a question on whether you need separate
> values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,
For now, actually I don't see a specific reason to separate them, so
let me use a single one in V2. My use case only expects to see the
TX_BATCH_SIZE adjustment.
> then why.
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 72c000c0ae5f..fb2caec9914d 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> rcu_read_lock();
> again:
> list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> - if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> + if (xs->tx_budget_spent >= xs->max_tx_budget) {
If we implement it like this, xs->max_tx_budget has to read a
per-netns from somewhere and then initialize it. The core problem
still remains: where to store the per netns value.
Do you think using the aforementioned sock_net(sk)->core.max_tx_budget
is possible?
Thanks,
Jason
> budget_exhausted = true;
> continue;
> }
> @@ -779,7 +779,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> static int __xsk_generic_xmit(struct sock *sk)
> {
> struct xdp_sock *xs = xdp_sk(sk);
> - u32 max_batch = TX_BATCH_SIZE;
> bool sent_frame = false;
> struct xdp_desc desc;
> struct sk_buff *skb;
> @@ -797,7 +796,7 @@ static int __xsk_generic_xmit(struct sock *sk)
> goto out;
>
> while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> - if (max_batch-- == 0) {
> + if (xs->max_tx_budget-- == 0) {
> err = -EAGAIN;
> goto out;
> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-17 19:17 ` Joe Damato
@ 2025-06-18 0:31 ` Jason Xing
0 siblings, 0 replies; 12+ messages in thread
From: Jason Xing @ 2025-06-18 0:31 UTC (permalink / raw)
To: Joe Damato, Stanislav Fomichev, Jason Xing, davem, edumazet, kuba,
pabeni, bjorn, magnus.karlsson, maciej.fijalkowski,
jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
netdev, Jason Xing
On Wed, Jun 18, 2025 at 3:17 AM Joe Damato <joe@dama.to> wrote:
>
> On Tue, Jun 17, 2025 at 10:46:26AM -0700, Stanislav Fomichev wrote:
> > On 06/17, Jason Xing wrote:
>
> > > >
> > > > Also, can we put these settings into the socket instead of (global/ns)
> > > > sysctl?
> > >
> > > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > > corresponding netns? I have no strong opinion on this point for now.
> >
> > I'm suggesting something along these lines (see below). And then add
> > some way to configure it (plus, obviously, set the default value
> > on init).
>
> +1 from me on making this per-socket instead of global with sysfs.
>
> I feel like the direction networking has taken lately (netdev-genl, for
> example) is to make things more granular instead of global.
Agree. Previously I missed using sock_net(sk)->core->xxx to make it
namespecified :(
Thanks,
Jason
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-18 0:29 ` Jason Xing
@ 2025-06-18 6:59 ` Jason Xing
0 siblings, 0 replies; 12+ messages in thread
From: Jason Xing @ 2025-06-18 6:59 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
On Wed, Jun 18, 2025 at 8:29 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Jun 18, 2025 at 1:46 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 06/17, Jason Xing wrote:
> > > Hi Stanislav,
> > >
> > > On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > >
> > > > On 06/17, Jason Xing wrote:
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > Introduce a control method in the xsk path to let users have the chance
> > > > > to tune it manually.
> > > >
> > > > Can you expand more on why the defaults don't work for you?
> > >
> > > We use a user-level tcp stack with xsk to transmit packets that have
> > > higher priorities than other normal kernel tcp flows. It turns out
> > > that enlarging the number can minimize times of triggering sendto
> > > sysctl, which contributes to faster transmission. it's very easy to
> > > hit the upper bound (namely, 32) if you log the return value of
> > > sendto. I mentioned a bit about this in the second patch, saying that
> > > we can have a similar knob already appearing in the qdisc layer.
> > > Furthermore, exposing important parameters can help applications
> > > complete their AI/auto-tuning to judge which one is the best fit in
> > > their production workload. That is also one of the promising
> > > tendencies :)
> > >
> > > >
> > > > Also, can we put these settings into the socket instead of (global/ns)
> > > > sysctl?
> > >
> > > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > > corresponding netns? I have no strong opinion on this point for now.
>
> To add to that, after digging into this part, I realized that we're
> able to use sock_net(sk)->core.max_tx_budget directly to finish the
> namespace stuff because xsk_create() calls sk_alloc() which correlates
> its netns in the sk->sk_net. Sounds reasonable?
Updated patch here:
https://lore.kernel.org/all/20250618065553.96822-1-kerneljasonxing@gmail.com/
Please review :0
Thanks,
Jason
>
> >
> > I'm suggesting something along these lines (see below). And then add
> > some way to configure it (plus, obviously, set the default value
> > on init). There is also a question on whether you need separate
> > values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,
>
> For now, actually I don't see a specific reason to separate them, so
> let me use a single one in V2. My use case only expects to see the
> TX_BATCH_SIZE adjustment.
>
> > then why.
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 72c000c0ae5f..fb2caec9914d 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> > rcu_read_lock();
> > again:
> > list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > - if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > + if (xs->tx_budget_spent >= xs->max_tx_budget) {
>
> If we implement it like this, xs->max_tx_budget has to read a
> per-netns from somewhere and then initialize it. The core problem
> still remains: where to store the per netns value.
>
> Do you think using the aforementioned sock_net(sk)->core.max_tx_budget
> is possible?
>
> Thanks,
> Jason
>
> > budget_exhausted = true;
> > continue;
> > }
> > @@ -779,7 +779,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > static int __xsk_generic_xmit(struct sock *sk)
> > {
> > struct xdp_sock *xs = xdp_sk(sk);
> > - u32 max_batch = TX_BATCH_SIZE;
> > bool sent_frame = false;
> > struct xdp_desc desc;
> > struct sk_buff *skb;
> > @@ -797,7 +796,7 @@ static int __xsk_generic_xmit(struct sock *sk)
> > goto out;
> >
> > while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> > - if (max_batch-- == 0) {
> > + if (xs->max_tx_budget-- == 0) {
> > err = -EAGAIN;
> > goto out;
> > }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-17 2:15 ` Jason Xing
2025-06-17 17:46 ` Stanislav Fomichev
@ 2025-06-18 17:57 ` Willem de Bruijn
2025-06-18 19:24 ` Jason Xing
1 sibling, 1 reply; 12+ messages in thread
From: Willem de Bruijn @ 2025-06-18 17:57 UTC (permalink / raw)
To: Jason Xing, Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
Jason Xing wrote:
> Hi Stanislav,
>
> On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 06/17, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Introduce a control method in the xsk path to let users have the chance
> > > to tune it manually.
> >
> > Can you expand more on why the defaults don't work for you?
>
> We use a user-level tcp stack with xsk to transmit packets that have
> higher priorities than other normal kernel tcp flows. It turns out
> that enlarging the number can minimize times of triggering sendto
> sysctl, which contributes to faster transmission. it's very easy to
> hit the upper bound (namely, 32) if you log the return value of
> sendto. I mentioned a bit about this in the second patch, saying that
> we can have a similar knob already appearing in the qdisc layer.
> Furthermore, exposing important parameters can help applications
> complete their AI/auto-tuning to judge which one is the best fit in
> their production workload. That is also one of the promising
> tendencies :)
It would be informative to include this in the commit.
Or more broadly: suggestions for when and how to pick good settings
for these new tunables.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
2025-06-18 17:57 ` Willem de Bruijn
@ 2025-06-18 19:24 ` Jason Xing
0 siblings, 0 replies; 12+ messages in thread
From: Jason Xing @ 2025-06-18 19:24 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Stanislav Fomichev, davem, edumazet, kuba, pabeni, bjorn,
magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast,
daniel, hawk, john.fastabend, bpf, netdev, Jason Xing
On Thu, Jun 19, 2025 at 1:58 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > Hi Stanislav,
> >
> > On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > >
> > > On 06/17, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Introduce a control method in the xsk path to let users have the chance
> > > > to tune it manually.
> > >
> > > Can you expand more on why the defaults don't work for you?
> >
> > We use a user-level tcp stack with xsk to transmit packets that have
> > higher priorities than other normal kernel tcp flows. It turns out
> > that enlarging the number can minimize times of triggering sendto
> > sysctl, which contributes to faster transmission. it's very easy to
> > hit the upper bound (namely, 32) if you log the return value of
> > sendto. I mentioned a bit about this in the second patch, saying that
> > we can have a similar knob already appearing in the qdisc layer.
> > Furthermore, exposing important parameters can help applications
> > complete their AI/auto-tuning to judge which one is the best fit in
> > their production workload. That is also one of the promising
> > tendencies :)
>
> It would be informative to include this in the commit.
Sure, I will add more in V3.
>
> Or more broadly: suggestions for when and how to pick good settings
> for these new tunables.
I'm not sure what is a good setting in general. But I will try to
describe how I use it and what my understanding of it is in the commit
message.
Thanks,
Jason
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-06-18 19:25 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-17 0:22 [PATCH net-next 0/2] net: xsk: add two sysctl knobs Jason Xing
2025-06-17 0:22 ` [PATCH net-next 1/2] net: xsk: make MAX_PER_SOCKET_BUDGET tunable Jason Xing
2025-06-17 0:22 ` [PATCH net-next 2/2] net: xsk: make xsk_tx_batch_size tunable Jason Xing
2025-06-17 1:11 ` [PATCH net-next 0/2] net: xsk: add two sysctl knobs Stanislav Fomichev
2025-06-17 2:15 ` Jason Xing
2025-06-17 17:46 ` Stanislav Fomichev
2025-06-17 19:17 ` Joe Damato
2025-06-18 0:31 ` Jason Xing
2025-06-18 0:29 ` Jason Xing
2025-06-18 6:59 ` Jason Xing
2025-06-18 17:57 ` Willem de Bruijn
2025-06-18 19:24 ` Jason Xing
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).