* [PATCH net-next v6 1/5] net: Create separate gro_flush_normal function
2025-07-18 23:20 [PATCH net-next v6 0/5] Add support to do threaded napi busy poll Samiullah Khawaja
@ 2025-07-18 23:20 ` Samiullah Khawaja
2025-07-18 23:20 ` [PATCH net-next v6 2/5] net: Use dev_set_threaded_hint instead of dev_set_threaded in drivers Samiullah Khawaja
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Samiullah Khawaja @ 2025-07-18 23:20 UTC
To: Jakub Kicinski, David S . Miller , Eric Dumazet, Paolo Abeni,
almasrymina, willemb, jdamato, mkarsten
Cc: netdev, skhawaja
Move multiple copies of the same code snippet that calls `gro_flush` and
`gro_normal_list` into a separate helper function.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
v6:
- Renamed gro_flush_helper to gro_flush_normal and moved it to gro.h. Also
used it in kernel/bpf/cpumap.c.
---
include/net/gro.h | 6 ++++++
kernel/bpf/cpumap.c | 3 +--
net/core/dev.c | 9 +++------
3 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/include/net/gro.h b/include/net/gro.h
index 22d3a69e4404..a0fca7ac6e7e 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -534,6 +534,12 @@ static inline void gro_normal_list(struct gro_node *gro)
gro->rx_count = 0;
}
+static inline void gro_flush_normal(struct gro_node *gro, bool flush_old)
+{
+ gro_flush(gro, flush_old);
+ gro_normal_list(gro);
+}
+
/* Queue one GRO_NORMAL SKB up for list processing. If batch size exceeded,
* pass the whole batch up to the stack.
*/
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 67e8a2fc1a99..b2b7b8ec2c2a 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -282,8 +282,7 @@ static void cpu_map_gro_flush(struct bpf_cpu_map_entry *rcpu, bool empty)
* This is equivalent to how NAPI decides whether to perform a full
* flush.
*/
- gro_flush(&rcpu->gro, !empty && HZ >= 1000);
- gro_normal_list(&rcpu->gro);
+ gro_flush_normal(&rcpu->gro, !empty && HZ >= 1000);
}
static int cpu_map_kthread_run(void *data)
diff --git a/net/core/dev.c b/net/core/dev.c
index 621a639aeba1..cc216a461743 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6574,8 +6574,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
* it, we need to bound somehow the time packets are kept in
* the GRO layer.
*/
- gro_flush(&n->gro, !!timeout);
- gro_normal_list(&n->gro);
+ gro_flush_normal(&n->gro, !!timeout);
if (unlikely(!list_empty(&n->poll_list))) {
/* If n->poll_list is not empty, we need to mask irqs */
@@ -6645,8 +6644,7 @@ static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
}
/* Flush too old packets. If HZ < 1000, flush all packets */
- gro_flush(&napi->gro, HZ >= 1000);
- gro_normal_list(&napi->gro);
+ gro_flush_normal(&napi->gro, HZ >= 1000);
clear_bit(NAPI_STATE_SCHED, &napi->state);
}
@@ -7511,8 +7509,7 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
}
/* Flush too old packets. If HZ < 1000, flush all packets */
- gro_flush(&n->gro, HZ >= 1000);
- gro_normal_list(&n->gro);
+ gro_flush_normal(&n->gro, HZ >= 1000);
/* Some drivers may have called napi_schedule
* prior to exhausting their budget.
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH net-next v6 2/5] net: Use dev_set_threaded_hint instead of dev_set_threaded in drivers
2025-07-18 23:20 [PATCH net-next v6 0/5] Add support to do threaded napi busy poll Samiullah Khawaja
2025-07-18 23:20 ` [PATCH net-next v6 1/5] net: Create separate gro_flush_normal function Samiullah Khawaja
@ 2025-07-18 23:20 ` Samiullah Khawaja
2025-07-18 23:20 ` [PATCH net-next v6 3/5] net: define an enum for the napi threaded state Samiullah Khawaja
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Samiullah Khawaja @ 2025-07-18 23:20 UTC
To: Jakub Kicinski, David S . Miller , Eric Dumazet, Paolo Abeni,
almasrymina, willemb, jdamato, mkarsten
Cc: netdev, skhawaja
Prepare for adding an enum type for NAPI threaded states by adding a
dev_set_threaded_hint API. De-export the existing dev_set_threaded API
and use it only internally. Update existing drivers to use
dev_set_threaded_hint instead of the de-exported dev_set_threaded.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2 +-
drivers/net/ethernet/mellanox/mlxsw/pci.c | 2 +-
drivers/net/ethernet/renesas/ravb_main.c | 2 +-
drivers/net/wireguard/device.c | 2 +-
drivers/net/wireless/ath/ath10k/snoc.c | 2 +-
drivers/net/wireless/mediatek/mt76/debugfs.c | 2 +-
include/linux/netdevice.h | 2 +-
net/core/dev.c | 7 ++++++-
net/core/dev.h | 2 ++
9 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index ef1a51347351..4519379d284c 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -2688,7 +2688,7 @@ static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
adapter->mii.mdio_write = atl1c_mdio_write;
adapter->mii.phy_id_mask = 0x1f;
adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
- dev_set_threaded(netdev, true);
+ dev_set_threaded_hint(netdev);
for (i = 0; i < adapter->rx_queue_count; ++i)
netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
atl1c_clean_rx);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 058dcabfaa2e..268b830ce17e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -156,7 +156,7 @@ static int mlxsw_pci_napi_devs_init(struct mlxsw_pci *mlxsw_pci)
}
strscpy(mlxsw_pci->napi_dev_rx->name, "mlxsw_rx",
sizeof(mlxsw_pci->napi_dev_rx->name));
- dev_set_threaded(mlxsw_pci->napi_dev_rx, true);
+ dev_set_threaded_hint(mlxsw_pci->napi_dev_rx);
return 0;
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index c9f4976a3527..31b2cb11764d 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -3075,7 +3075,7 @@ static int ravb_probe(struct platform_device *pdev)
if (info->coalesce_irqs) {
netdev_sw_irq_coalesce_default_on(ndev);
if (num_present_cpus() == 1)
- dev_set_threaded(ndev, true);
+ dev_set_threaded_hint(ndev);
}
/* Network device register */
diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 4a529f1f9bea..1f3e4e7cc90a 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -366,7 +366,7 @@ static int wg_newlink(struct net_device *dev,
if (ret < 0)
goto err_free_handshake_queue;
- dev_set_threaded(dev, true);
+ dev_set_threaded_hint(dev);
ret = register_netdevice(dev);
if (ret < 0)
goto err_uninit_ratelimiter;
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index d51f2e5a79a4..d6412330d8ef 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -936,7 +936,7 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
- dev_set_threaded(ar->napi_dev, true);
+ dev_set_threaded_hint(ar->napi_dev);
ath10k_core_napi_enable(ar);
/* IRQs are left enabled when we restart due to a firmware crash */
if (!test_bit(ATH10K_SNOC_FLAG_RECOVERY, &ar_snoc->flags))
diff --git a/drivers/net/wireless/mediatek/mt76/debugfs.c b/drivers/net/wireless/mediatek/mt76/debugfs.c
index b6a2746c187d..bd62a87aabfe 100644
--- a/drivers/net/wireless/mediatek/mt76/debugfs.c
+++ b/drivers/net/wireless/mediatek/mt76/debugfs.c
@@ -34,7 +34,7 @@ mt76_napi_threaded_set(void *data, u64 val)
return -EOPNOTSUPP;
if (dev->napi_dev->threaded != val)
- return dev_set_threaded(dev->napi_dev, val);
+ return dev_set_threaded_hint(dev->napi_dev);
return 0;
}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e49d8c98d284..87591448a008 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -589,7 +589,7 @@ static inline bool napi_complete(struct napi_struct *n)
return napi_complete_done(n, 0);
}
-int dev_set_threaded(struct net_device *dev, bool threaded);
+int dev_set_threaded_hint(struct net_device *dev);
void napi_disable(struct napi_struct *n);
void napi_disable_locked(struct napi_struct *n);
diff --git a/net/core/dev.c b/net/core/dev.c
index cc216a461743..d3f72e5f4904 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7025,7 +7025,12 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
return err;
}
-EXPORT_SYMBOL(dev_set_threaded);
+
+int dev_set_threaded_hint(struct net_device *dev)
+{
+ return dev_set_threaded(dev, true);
+}
+EXPORT_SYMBOL(dev_set_threaded_hint);
/**
* netif_queue_set_napi - Associate queue with the napi
diff --git a/net/core/dev.h b/net/core/dev.h
index a603387fb566..23cbeaad8ca2 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -322,6 +322,8 @@ static inline bool napi_get_threaded(struct napi_struct *n)
int napi_set_threaded(struct napi_struct *n, bool threaded);
+int dev_set_threaded(struct net_device *dev, bool threaded);
+
int rps_cpumask_housekeeping(struct cpumask *mask);
#if defined(CONFIG_DEBUG_NET) && defined(CONFIG_BPF_SYSCALL)
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH net-next v6 3/5] net: define an enum for the napi threaded state
2025-07-18 23:20 [PATCH net-next v6 0/5] Add support to do threaded napi busy poll Samiullah Khawaja
2025-07-18 23:20 ` [PATCH net-next v6 1/5] net: Create separate gro_flush_normal function Samiullah Khawaja
2025-07-18 23:20 ` [PATCH net-next v6 2/5] net: Use dev_set_threaded_hint instead of dev_set_threaded in drivers Samiullah Khawaja
@ 2025-07-18 23:20 ` Samiullah Khawaja
2025-07-21 23:48 ` Jakub Kicinski
2025-07-18 23:20 ` [PATCH net-next v6 4/5] Extend napi threaded polling to allow kthread based busy polling Samiullah Khawaja
` (2 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: Samiullah Khawaja @ 2025-07-18 23:20 UTC
To: Jakub Kicinski, David S . Miller , Eric Dumazet, Paolo Abeni,
almasrymina, willemb, jdamato, mkarsten
Cc: netdev, skhawaja
Instead of using '0' and '1' for the napi threaded state, use an enum
with 'disabled' and 'enabled' states. This prepares for the next patch,
which adds a new 'busy-poll-enabled' state. Also move the 'threaded'
field in struct net_device and change its type from bool to u8.
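The enum conversion can be sketched in user space as follows. The enumerators match the uapi enum added by this patch; the napi_struct and net_device stand-ins are hypothetical (the kernel tests NAPI_STATE_THREADED with test_bit() rather than reading a bool), and napi_threaded_parse() is a hypothetical helper showing how the spec's string names, such as the "enabled" value the updated selftest passes, map back to enum values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same enumerators as the uapi enum added by this patch. */
enum netdev_napi_threaded {
	NETDEV_NAPI_THREADED_DISABLED,
	NETDEV_NAPI_THREADED_ENABLED,
};

/* Hypothetical stand-ins for the kernel structures. */
struct napi_struct {
	bool state_threaded; /* kernel: NAPI_STATE_THREADED bit */
};

struct net_device {
	uint8_t threaded; /* u8 instead of bool, as in this patch */
};

/* Mirrors the new napi_get_threaded(): map the state bit to the enum. */
static enum netdev_napi_threaded napi_get_threaded(const struct napi_struct *n)
{
	assert(n != NULL);
	if (n->state_threaded)
		return NETDEV_NAPI_THREADED_ENABLED;
	return NETDEV_NAPI_THREADED_DISABLED;
}

/* Hypothetical parser for the entry names of the napi-threaded enum. */
static int napi_threaded_parse(const char *name, enum netdev_napi_threaded *out)
{
	if (strcmp(name, "disabled") == 0)
		*out = NETDEV_NAPI_THREADED_DISABLED;
	else if (strcmp(name, "enabled") == 0)
		*out = NETDEV_NAPI_THREADED_ENABLED;
	else
		return -1; /* not a member of the napi-threaded enum */
	return 0;
}
```

Keeping the numeric values 0 and 1 for 'disabled' and 'enabled' means the stored value is unchanged; only its decoding by netlink clients becomes symbolic.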
Tested:
./tools/testing/selftests/net/nl_netdev.py
TAP version 13
1..7
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
ok 5 nl_netdev.dev_set_threaded
ok 6 nl_netdev.napi_set_threaded
ok 7 nl_netdev.nsim_rxq_reset_down
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
v6:
- Moved threaded in struct net_device up to fill the cacheline hole.
- Changed dev_set_threaded to dev_set_threaded_hint and removed the
second argument that was always set to true by all the drivers.
Exported only dev_set_threaded_hint and made dev_set_threaded a
core-only function. This change is done in a separate commit.
- Updated the documentation comment for threaded in struct net_device.
---
Documentation/netlink/specs/netdev.yaml | 13 ++++---
.../networking/net_cachelines/net_device.rst | 2 +-
include/linux/netdevice.h | 7 ++--
include/uapi/linux/netdev.h | 5 +++
net/core/dev.c | 12 ++++---
net/core/dev.h | 13 ++++---
net/core/netdev-genl-gen.c | 2 +-
net/core/netdev-genl.c | 2 +-
tools/include/uapi/linux/netdev.h | 5 +++
tools/testing/selftests/net/nl_netdev.py | 36 +++++++++----------
10 files changed, 58 insertions(+), 39 deletions(-)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 85d0ea6ac426..11edbf9c5727 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -85,6 +85,10 @@ definitions:
name: qstats-scope
type: flags
entries: [queue]
+ -
+ name: napi-threaded
+ type: enum
+ entries: [ disabled, enabled ]
attribute-sets:
-
@@ -286,11 +290,10 @@ attribute-sets:
-
name: threaded
doc: Whether the NAPI is configured to operate in threaded polling
- mode. If this is set to 1 then the NAPI context operates in
- threaded polling mode.
- type: uint
- checks:
- max: 1
+ mode. If this is set to `enabled` then the NAPI context operates
+ in threaded polling mode.
+ type: u32
+ enum: napi-threaded
-
name: xsk-info
attributes: []
diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index c69cc89c958e..cb6daccac0b6 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -68,6 +68,7 @@ unsigned_char addr_assign_type
unsigned_char addr_len
unsigned_char upper_level
unsigned_char lower_level
+u8 threaded napi_poll(napi_enable,dev_set_threaded)
unsigned_short neigh_priv_len
unsigned_short padded
unsigned_short dev_id
@@ -165,7 +166,6 @@ struct sfp_bus* sfp_bus
struct lock_class_key* qdisc_tx_busylock
bool proto_down
unsigned:1 wol_enabled
-unsigned:1 threaded napi_poll(napi_enable,dev_set_threaded)
unsigned_long:1 see_all_hwtstamp_requests
unsigned_long:1 change_proto_down
unsigned_long:1 netns_immutable
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 87591448a008..97cf14a9b469 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -369,7 +369,7 @@ struct napi_config {
u64 irq_suspend_timeout;
u32 defer_hard_irqs;
cpumask_t affinity_mask;
- bool threaded;
+ u8 threaded;
unsigned int napi_id;
};
@@ -1871,6 +1871,7 @@ enum netdev_reg_state {
* @addr_len: Hardware address length
* @upper_level: Maximum depth level of upper devices.
* @lower_level: Maximum depth level of lower devices.
+ * @threaded: napi threaded state.
* @neigh_priv_len: Used in neigh_alloc()
* @dev_id: Used to differentiate devices that share
* the same link layer address
@@ -2010,8 +2011,6 @@ enum netdev_reg_state {
* switch driver and used to set the phys state of the
* switch port.
*
- * @threaded: napi threaded mode is enabled
- *
* @irq_affinity_auto: driver wants the core to store and re-assign the IRQ
* affinity. Set by netif_enable_irq_affinity(), then
* the driver must create a persistent napi by
@@ -2247,6 +2246,7 @@ struct net_device {
unsigned char addr_len;
unsigned char upper_level;
unsigned char lower_level;
+ u8 threaded;
unsigned short neigh_priv_len;
unsigned short dev_id;
@@ -2428,7 +2428,6 @@ struct net_device {
struct sfp_bus *sfp_bus;
struct lock_class_key *qdisc_tx_busylock;
bool proto_down;
- bool threaded;
bool irq_affinity_auto;
bool rx_cpu_rmap_auto;
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 1f3719a9a0eb..48eb49aa03d4 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -77,6 +77,11 @@ enum netdev_qstats_scope {
NETDEV_QSTATS_SCOPE_QUEUE = 1,
};
+enum netdev_napi_threaded {
+ NETDEV_NAPI_THREADED_DISABLED,
+ NETDEV_NAPI_THREADED_ENABLED,
+};
+
enum {
NETDEV_A_DEV_IFINDEX = 1,
NETDEV_A_DEV_PAD,
diff --git a/net/core/dev.c b/net/core/dev.c
index d3f72e5f4904..ec65b03492b1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6959,7 +6959,8 @@ static void napi_stop_kthread(struct napi_struct *napi)
napi->thread = NULL;
}
-int napi_set_threaded(struct napi_struct *napi, bool threaded)
+int napi_set_threaded(struct napi_struct *napi,
+ enum netdev_napi_threaded threaded)
{
if (threaded) {
if (!napi->thread) {
@@ -6984,7 +6985,8 @@ int napi_set_threaded(struct napi_struct *napi, bool threaded)
return 0;
}
-int dev_set_threaded(struct net_device *dev, bool threaded)
+int dev_set_threaded(struct net_device *dev,
+ enum netdev_napi_threaded threaded)
{
struct napi_struct *napi;
int err = 0;
@@ -6996,7 +6998,7 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
if (!napi->thread) {
err = napi_kthread_create(napi);
if (err) {
- threaded = false;
+ threaded = NETDEV_NAPI_THREADED_DISABLED;
break;
}
}
@@ -7028,7 +7030,7 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
int dev_set_threaded_hint(struct net_device *dev)
{
- return dev_set_threaded(dev, true);
+ return dev_set_threaded(dev, NETDEV_NAPI_THREADED_ENABLED);
}
EXPORT_SYMBOL(dev_set_threaded_hint);
@@ -7345,7 +7347,7 @@ void netif_napi_add_weight_locked(struct net_device *dev,
* threaded mode will not be enabled in napi_enable().
*/
if (dev->threaded && napi_kthread_create(napi))
- dev->threaded = false;
+ dev->threaded = NETDEV_NAPI_THREADED_DISABLED;
netif_napi_set_irq_locked(napi, -1);
}
EXPORT_SYMBOL(netif_napi_add_weight_locked);
diff --git a/net/core/dev.h b/net/core/dev.h
index 23cbeaad8ca2..ab6fac65ec24 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -315,14 +315,19 @@ static inline void napi_set_irq_suspend_timeout(struct napi_struct *n,
WRITE_ONCE(n->irq_suspend_timeout, timeout);
}
-static inline bool napi_get_threaded(struct napi_struct *n)
+static inline enum netdev_napi_threaded napi_get_threaded(struct napi_struct *n)
{
- return test_bit(NAPI_STATE_THREADED, &n->state);
+ if (test_bit(NAPI_STATE_THREADED, &n->state))
+ return NETDEV_NAPI_THREADED_ENABLED;
+
+ return NETDEV_NAPI_THREADED_DISABLED;
}
-int napi_set_threaded(struct napi_struct *n, bool threaded);
+int napi_set_threaded(struct napi_struct *n,
+ enum netdev_napi_threaded threaded);
-int dev_set_threaded(struct net_device *dev, bool threaded);
+int dev_set_threaded(struct net_device *dev,
+ enum netdev_napi_threaded threaded);
int rps_cpumask_housekeeping(struct cpumask *mask);
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index 0994bd68a7e6..e9a2a6f26cb7 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -97,7 +97,7 @@ static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_THREADED
[NETDEV_A_NAPI_DEFER_HARD_IRQS] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_napi_defer_hard_irqs_range),
[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT] = { .type = NLA_UINT, },
[NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT] = { .type = NLA_UINT, },
- [NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_UINT, 1),
+ [NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
};
/* NETDEV_CMD_BIND_TX - do */
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 5875df372415..6314eb7bdf69 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -333,7 +333,7 @@ netdev_nl_napi_set_config(struct napi_struct *napi, struct genl_info *info)
int ret;
threaded = nla_get_uint(info->attrs[NETDEV_A_NAPI_THREADED]);
- ret = napi_set_threaded(napi, !!threaded);
+ ret = napi_set_threaded(napi, threaded);
if (ret)
return ret;
}
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 1f3719a9a0eb..48eb49aa03d4 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -77,6 +77,11 @@ enum netdev_qstats_scope {
NETDEV_QSTATS_SCOPE_QUEUE = 1,
};
+enum netdev_napi_threaded {
+ NETDEV_NAPI_THREADED_DISABLED,
+ NETDEV_NAPI_THREADED_ENABLED,
+};
+
enum {
NETDEV_A_DEV_IFINDEX = 1,
NETDEV_A_DEV_PAD,
diff --git a/tools/testing/selftests/net/nl_netdev.py b/tools/testing/selftests/net/nl_netdev.py
index c8ffade79a52..5c66421ab8aa 100755
--- a/tools/testing/selftests/net/nl_netdev.py
+++ b/tools/testing/selftests/net/nl_netdev.py
@@ -52,14 +52,14 @@ def napi_set_threaded(nf) -> None:
napi1_id = napis[1]['id']
# set napi threaded and verify
- nf.napi_set({'id': napi0_id, 'threaded': 1})
+ nf.napi_set({'id': napi0_id, 'threaded': "enabled"})
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 1)
+ ksft_eq(napi0['threaded'], "enabled")
ksft_ne(napi0.get('pid'), None)
# check it is not set for napi1
napi1 = nf.napi_get({'id': napi1_id})
- ksft_eq(napi1['threaded'], 0)
+ ksft_eq(napi1['threaded'], "disabled")
ksft_eq(napi1.get('pid'), None)
ip(f"link set dev {nsim.ifname} down")
@@ -67,18 +67,18 @@ def napi_set_threaded(nf) -> None:
# verify if napi threaded is still set
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 1)
+ ksft_eq(napi0['threaded'], "enabled")
ksft_ne(napi0.get('pid'), None)
# check it is still not set for napi1
napi1 = nf.napi_get({'id': napi1_id})
- ksft_eq(napi1['threaded'], 0)
+ ksft_eq(napi1['threaded'], "disabled")
ksft_eq(napi1.get('pid'), None)
# unset napi threaded and verify
- nf.napi_set({'id': napi0_id, 'threaded': 0})
+ nf.napi_set({'id': napi0_id, 'threaded': "disabled"})
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 0)
+ ksft_eq(napi0['threaded'], "disabled")
ksft_eq(napi0.get('pid'), None)
# set threaded at device level
@@ -86,10 +86,10 @@ def napi_set_threaded(nf) -> None:
# check napi threaded is set for both napis
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 1)
+ ksft_eq(napi0['threaded'], "enabled")
ksft_ne(napi0.get('pid'), None)
napi1 = nf.napi_get({'id': napi1_id})
- ksft_eq(napi1['threaded'], 1)
+ ksft_eq(napi1['threaded'], "enabled")
ksft_ne(napi1.get('pid'), None)
# unset threaded at device level
@@ -97,16 +97,16 @@ def napi_set_threaded(nf) -> None:
# check napi threaded is unset for both napis
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 0)
+ ksft_eq(napi0['threaded'], "disabled")
ksft_eq(napi0.get('pid'), None)
napi1 = nf.napi_get({'id': napi1_id})
- ksft_eq(napi1['threaded'], 0)
+ ksft_eq(napi1['threaded'], "disabled")
ksft_eq(napi1.get('pid'), None)
# set napi threaded for napi0
nf.napi_set({'id': napi0_id, 'threaded': 1})
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 1)
+ ksft_eq(napi0['threaded'], "enabled")
ksft_ne(napi0.get('pid'), None)
# unset threaded at device level
@@ -114,10 +114,10 @@ def napi_set_threaded(nf) -> None:
# check napi threaded is unset for both napis
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 0)
+ ksft_eq(napi0['threaded'], "disabled")
ksft_eq(napi0.get('pid'), None)
napi1 = nf.napi_get({'id': napi1_id})
- ksft_eq(napi1['threaded'], 0)
+ ksft_eq(napi1['threaded'], "disabled")
ksft_eq(napi1.get('pid'), None)
def dev_set_threaded(nf) -> None:
@@ -141,10 +141,10 @@ def dev_set_threaded(nf) -> None:
# check napi threaded is set for both napis
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 1)
+ ksft_eq(napi0['threaded'], "enabled")
ksft_ne(napi0.get('pid'), None)
napi1 = nf.napi_get({'id': napi1_id})
- ksft_eq(napi1['threaded'], 1)
+ ksft_eq(napi1['threaded'], "enabled")
ksft_ne(napi1.get('pid'), None)
# unset threaded
@@ -152,10 +152,10 @@ def dev_set_threaded(nf) -> None:
# check napi threaded is unset for both napis
napi0 = nf.napi_get({'id': napi0_id})
- ksft_eq(napi0['threaded'], 0)
+ ksft_eq(napi0['threaded'], "disabled")
ksft_eq(napi0.get('pid'), None)
napi1 = nf.napi_get({'id': napi1_id})
- ksft_eq(napi1['threaded'], 0)
+ ksft_eq(napi1['threaded'], "disabled")
ksft_eq(napi1.get('pid'), None)
def nsim_rxq_reset_down(nf) -> None:
--
2.50.0.727.gbf7dc18ff4-goog
* Re: [PATCH net-next v6 3/5] net: define an enum for the napi threaded state
2025-07-18 23:20 ` [PATCH net-next v6 3/5] net: define an enum for the napi threaded state Samiullah Khawaja
@ 2025-07-21 23:48 ` Jakub Kicinski
0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2025-07-21 23:48 UTC
To: Samiullah Khawaja
Cc: David S . Miller , Eric Dumazet, Paolo Abeni, almasrymina,
willemb, jdamato, mkarsten, netdev
On Fri, 18 Jul 2025 23:20:49 +0000 Samiullah Khawaja wrote:
> Documentation/netlink/specs/netdev.yaml | 13 ++++---
yamllint says:
91:15 error too many spaces inside brackets (brackets)
91:33 error too many spaces inside brackets (brackets)
Please fix, rebase (the series does not apply any more), and repost
the first 3 patches ASAP. We really need to merge the decoding of
threaded as enum before the merge window. We don't want it to be
a bare int in 6.17 and enum in 6.18. It may break some Python scripts.
--
pw-bot: cr
* [PATCH net-next v6 4/5] Extend napi threaded polling to allow kthread based busy polling
2025-07-18 23:20 [PATCH net-next v6 0/5] Add support to do threaded napi busy poll Samiullah Khawaja
` (2 preceding siblings ...)
2025-07-18 23:20 ` [PATCH net-next v6 3/5] net: define an enum for the napi threaded state Samiullah Khawaja
@ 2025-07-18 23:20 ` Samiullah Khawaja
2025-07-18 23:20 ` [PATCH net-next v6 5/5] selftests: Add napi threaded busy poll test in `busy_poller` Samiullah Khawaja
2025-07-19 21:27 ` [PATCH net-next v6 0/5] Add support to do threaded napi busy poll Martin Karsten
5 siblings, 0 replies; 8+ messages in thread
From: Samiullah Khawaja @ 2025-07-18 23:20 UTC
To: Jakub Kicinski, David S . Miller , Eric Dumazet, Paolo Abeni,
almasrymina, willemb, jdamato, mkarsten
Cc: netdev, skhawaja
Add new states to the napi state enum:
- NAPI_STATE_THREADED_BUSY_POLL
Threaded busy poll is enabled for this napi.
- NAPI_STATE_SCHED_THREADED_BUSY_POLL
The threaded napi poller is currently busy polling.
The following changes are introduced in the napi scheduling and state logic:
- When threaded busy poll is enabled through sysfs or netlink, it also
enables NAPI_STATE_THREADED so that a kthread is created per napi. It
also sets the NAPI_STATE_THREADED_BUSY_POLL bit on each napi to indicate
that the napi is going to be busy polled.
- When a napi is scheduled with NAPI_STATE_SCHED_THREADED and the
associated kthread is woken up, the kthread owns the context. If both
NAPI_STATE_THREADED_BUSY_POLL and NAPI_STATE_SCHED_THREADED are set,
the kthread can busy poll.
- To keep busy polling and to avoid scheduling interrupts,
napi_complete_done() returns false when both NAPI_STATE_SCHED_THREADED
and NAPI_STATE_THREADED_BUSY_POLL are set. napi_complete_done() also
returns early so that NAPI_STATE_SCHED_THREADED is not unset.
- If at any point NAPI_STATE_THREADED_BUSY_POLL is unset,
napi_complete_done() runs to completion and also unsets the
NAPI_STATE_SCHED_THREADED bit. This makes the associated kthread go to
sleep as per the existing logic.
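The interaction between the kthread loop and napi_complete_done() described above can be simulated in user space. This is a simplified model that follows the commit message (completion is refused while both flags are set). The diff itself tests NAPIF_STATE_SCHED_THREADED_BUSY_POLL in napi_complete_done(), and how that bit is set is not shown in this excerpt. The SIM_ flags and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this simulation only. */
#define SIM_SCHED_THREADED     (1u << 0)
#define SIM_THREADED_BUSY_POLL (1u << 1)

/* Completion rule: while both flags are set, completion is refused, so
 * SCHED_THREADED stays set and the kthread keeps ownership of the napi.
 * Otherwise SCHED_THREADED is cleared and the kthread may sleep. */
static bool sim_napi_complete_done(unsigned int *state)
{
	if ((*state & SIM_SCHED_THREADED) &&
	    (*state & SIM_THREADED_BUSY_POLL))
		return false;
	*state &= ~SIM_SCHED_THREADED;
	return true;
}

/* Kthread loop: poll while SCHED_THREADED is set. After disable_after
 * polls, busy polling is switched off, as a user might do through sysfs
 * or netlink. Returns the number of polls before the kthread sleeps. */
static int sim_kthread_run(unsigned int state, int disable_after)
{
	int polls = 0;

	assert(disable_after > 0);
	while (state & SIM_SCHED_THREADED) {
		polls++; /* one napi poll */
		if (polls == disable_after)
			state &= ~SIM_THREADED_BUSY_POLL;
		sim_napi_complete_done(&state);
	}
	return polls;
}
```

For example, sim_kthread_run(SIM_SCHED_THREADED | SIM_THREADED_BUSY_POLL, 5) keeps polling until busy polling is switched off on the fifth poll and returns 5, while a napi scheduled without the busy-poll flag completes after a single poll.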
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
Documentation/ABI/testing/sysfs-class-net | 3 +-
Documentation/netlink/specs/netdev.yaml | 5 +-
Documentation/networking/napi.rst | 63 +++++++++++++++++++-
include/linux/netdevice.h | 11 +++-
include/uapi/linux/netdev.h | 1 +
net/core/dev.c | 71 +++++++++++++++++++----
net/core/dev.h | 3 +
net/core/net-sysfs.c | 2 +-
net/core/netdev-genl-gen.c | 2 +-
tools/include/uapi/linux/netdev.h | 1 +
10 files changed, 145 insertions(+), 17 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index ebf21beba846..15d7d36a8294 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -343,7 +343,7 @@ Date: Jan 2021
KernelVersion: 5.12
Contact: netdev@vger.kernel.org
Description:
- Boolean value to control the threaded mode per device. User could
+ Integer value to control the threaded mode per device. User could
set this value to enable/disable threaded mode for all napi
belonging to this device, without the need to do device up/down.
@@ -351,4 +351,5 @@ Description:
== ==================================
0 threaded mode disabled for this dev
1 threaded mode enabled for this dev
+ 2 threaded mode and busy polling enabled for this dev
== ==================================
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 11edbf9c5727..70a4a9c8afef 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -88,7 +88,7 @@ definitions:
-
name: napi-threaded
type: enum
- entries: [ disabled, enabled ]
+ entries: [ disabled, enabled, busy-poll-enabled ]
attribute-sets:
-
@@ -291,7 +291,8 @@ attribute-sets:
name: threaded
doc: Whether the NAPI is configured to operate in threaded polling
mode. If this is set to `enabled` then the NAPI context operates
- in threaded polling mode.
+ in threaded polling mode. If this is set to `busy-poll-enabled`
+ then the NAPI kthread also does busy polling.
type: u32
enum: napi-threaded
-
diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
index a15754adb041..a1e76341a99a 100644
--- a/Documentation/networking/napi.rst
+++ b/Documentation/networking/napi.rst
@@ -263,7 +263,9 @@ are not well known).
Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
selected sockets or using the global ``net.core.busy_poll`` and
``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
-also exists.
+also exists. Threaded polling of NAPI also has a mode to busy poll for
+packets (:ref:`threaded busy polling<threaded_busy_poll>`) using the same
+thread that is used for NAPI processing.
epoll-based busy polling
------------------------
@@ -426,6 +428,65 @@ Therefore, setting ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` is
the recommended usage, because otherwise setting ``irq-suspend-timeout``
might not have any discernible effect.
+.. _threaded_busy_poll:
+
+Threaded NAPI busy polling
+--------------------------
+
+Threaded NAPI processes the packets of each NAPI in a dedicated kernel
+thread (kthread). Threaded NAPI busy polling extends this with support for
+continuous busy polling of the NAPI by that kthread. This enables busy polling
+independently of the userspace application and of the API (epoll, io_uring,
+raw sockets) it uses to process the packets.
+
+It can be enabled for each NAPI using the netlink interface, or at the device
+level using the ``threaded`` sysfs attribute.
+
+For example, using the following command:
+
+.. code-block:: bash
+
+ $ ynl --family netdev --do napi-set \
+ --json='{"id": 66, "threaded": "busy-poll-enabled"}'
+
+
+Enabling it per NAPI allows finer control: busy polling can be enabled for
+only the set of NIC queues that receive traffic with low-latency requirements.
+
+Depending on the application's requirements, the user might want to set the
+affinity of the kthread that busy polls each NAPI. The user might also want to
+set the priority and scheduling policy of the thread based on latency needs.
+
+For applications with hard low-latency requirements, the user might want to
+dedicate a full core to NAPI polling so that NIC queue descriptors are picked
+up as soon as they appear. Once enabled, the NAPI thread polls the NIC queues
+continuously without sleeping, keeping the CPU core at 100% usage. For more
+relaxed low-latency requirements, the user might want to share the core with
+other threads by setting thread affinity and priority.
+
+Once threaded busy polling is enabled for a NAPI, the PID of its kthread can
+be fetched using the netlink interface so that affinity, priority and
+scheduler configuration can be done.
+
+For example, the following command can be used to fetch the PID:
+
+.. code-block:: bash
+
+ $ ynl --family netdev --do napi-get --json='{"id": 66}'
+
+This will output something like the following, where ``258`` is the PID of
+the kthread that is polling this NAPI.
+
+.. code-block:: bash
+
+ {'defer-hard-irqs': 0,
+ 'gro-flush-timeout': 0,
+ 'id': 66,
+ 'ifindex': 2,
+ 'irq-suspend-timeout': 0,
+ 'pid': 258,
+ 'threaded': 'busy-poll-enabled'}
+
.. _threaded:
Threaded NAPI
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97cf14a9b469..6682b975febd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -427,6 +427,8 @@ enum {
NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/
NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */
NAPI_STATE_HAS_NOTIFIER, /* Napi has an IRQ notifier */
+ NAPI_STATE_THREADED_BUSY_POLL, /* The threaded napi poller will busy poll */
+ NAPI_STATE_SCHED_THREADED_BUSY_POLL, /* The threaded napi poller is busy polling */
};
enum {
@@ -441,8 +443,14 @@ enum {
NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED),
NAPIF_STATE_HAS_NOTIFIER = BIT(NAPI_STATE_HAS_NOTIFIER),
+ NAPIF_STATE_THREADED_BUSY_POLL = BIT(NAPI_STATE_THREADED_BUSY_POLL),
+ NAPIF_STATE_SCHED_THREADED_BUSY_POLL =
+ BIT(NAPI_STATE_SCHED_THREADED_BUSY_POLL),
};
+#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
+ (NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)
+
enum gro_result {
GRO_MERGED,
GRO_MERGED_FREE,
@@ -1871,7 +1879,8 @@ enum netdev_reg_state {
* @addr_len: Hardware address length
* @upper_level: Maximum depth level of upper devices.
* @lower_level: Maximum depth level of lower devices.
- * @threaded: napi threaded state.
+ * @threaded: napi threaded mode is disabled, enabled or
+ * enabled with busy polling.
* @neigh_priv_len: Used in neigh_alloc()
* @dev_id: Used to differentiate devices that share
* the same link layer address
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 48eb49aa03d4..8163afb15377 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -80,6 +80,7 @@ enum netdev_qstats_scope {
enum netdev_napi_threaded {
NETDEV_NAPI_THREADED_DISABLED,
NETDEV_NAPI_THREADED_ENABLED,
+ NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED,
};
enum {
diff --git a/net/core/dev.c b/net/core/dev.c
index ec65b03492b1..9511c69dc8e8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -78,6 +78,7 @@
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/sched/isolation.h>
+#include <linux/sched/types.h>
#include <linux/sched/mm.h>
#include <linux/smpboot.h>
#include <linux/mutex.h>
@@ -6554,7 +6555,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
* the guarantee we will be called later.
*/
if (unlikely(n->state & (NAPIF_STATE_NPSVC |
- NAPIF_STATE_IN_BUSY_POLL)))
+ NAPIF_STATE_IN_BUSY_POLL |
+ NAPIF_STATE_SCHED_THREADED_BUSY_POLL)))
return false;
if (work_done) {
@@ -6959,6 +6961,19 @@ static void napi_stop_kthread(struct napi_struct *napi)
napi->thread = NULL;
}
+static void napi_set_threaded_state(struct napi_struct *napi,
+ enum netdev_napi_threaded threaded)
+{
+ unsigned long val;
+
+ val = 0;
+ if (threaded == NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED)
+ val |= NAPIF_STATE_THREADED_BUSY_POLL;
+ if (threaded)
+ val |= NAPIF_STATE_THREADED;
+ set_mask_bits(&napi->state, NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
+}
+
int napi_set_threaded(struct napi_struct *napi,
enum netdev_napi_threaded threaded)
{
@@ -6979,7 +6994,7 @@ int napi_set_threaded(struct napi_struct *napi,
} else {
/* Make sure kthread is created before THREADED bit is set. */
smp_mb__before_atomic();
- assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+ napi_set_threaded_state(napi, threaded);
}
return 0;
@@ -7017,12 +7032,15 @@ int dev_set_threaded(struct net_device *dev,
* polled. In this case, the switch between threaded mode and
* softirq mode will happen in the next round of napi_schedule().
* This should not cause hiccups/stalls to the live traffic.
+ *
+ * Switch to busy_poll threaded napi will occur after the threaded
+ * napi is scheduled.
*/
list_for_each_entry(napi, &dev->napi_list, dev_list) {
if (!threaded && napi->thread)
napi_stop_kthread(napi);
else
- assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+ napi_set_threaded_state(napi, threaded);
}
return err;
@@ -7369,7 +7387,9 @@ void napi_disable_locked(struct napi_struct *n)
}
new = val | NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC;
- new &= ~(NAPIF_STATE_THREADED | NAPIF_STATE_PREFER_BUSY_POLL);
+ new &= ~(NAPIF_STATE_THREADED
+ | NAPIF_STATE_THREADED_BUSY_POLL
+ | NAPIF_STATE_PREFER_BUSY_POLL);
} while (!try_cmpxchg(&n->state, &val, new));
hrtimer_cancel(&n->timer);
@@ -7413,7 +7433,7 @@ void napi_enable_locked(struct napi_struct *n)
new = val & ~(NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC);
if (n->dev->threaded && n->thread)
- new |= NAPIF_STATE_THREADED;
+ napi_set_threaded_state(n, n->dev->threaded);
} while (!try_cmpxchg(&n->state, &val, new));
}
EXPORT_SYMBOL(napi_enable_locked);
@@ -7581,7 +7601,7 @@ static int napi_thread_wait(struct napi_struct *napi)
return -1;
}
-static void napi_threaded_poll_loop(struct napi_struct *napi)
+static void napi_threaded_poll_loop(struct napi_struct *napi, bool busy_poll)
{
struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
struct softnet_data *sd;
@@ -7610,22 +7630,53 @@ static void napi_threaded_poll_loop(struct napi_struct *napi)
}
skb_defer_free_flush(sd);
bpf_net_ctx_clear(bpf_net_ctx);
+
+ /* Flush too old packets. If HZ < 1000, flush all packets */
+ if (busy_poll)
+ gro_flush_normal(&napi->gro, HZ >= 1000);
local_bh_enable();
- if (!repoll)
+ /* If busy polling then do not break here because we need to
+ * call cond_resched and rcu_softirq_qs_periodic to prevent
+ * watchdog warnings.
+ */
+ if (!repoll && !busy_poll)
break;
rcu_softirq_qs_periodic(last_qs);
cond_resched();
+
+ if (!repoll)
+ break;
}
}
static int napi_threaded_poll(void *data)
{
struct napi_struct *napi = data;
+ bool busy_poll_sched;
+ unsigned long val;
+ bool busy_poll;
+
+ while (!napi_thread_wait(napi)) {
+ /* Once woken up, this means that we are scheduled as threaded
+ * napi and this thread owns the napi context, if busy poll
+ * state is set then busy poll this napi.
+ */
+ val = READ_ONCE(napi->state);
+ busy_poll = val & NAPIF_STATE_THREADED_BUSY_POLL;
+ busy_poll_sched = val & NAPIF_STATE_SCHED_THREADED_BUSY_POLL;
- while (!napi_thread_wait(napi))
- napi_threaded_poll_loop(napi);
+ /* Do not busy poll if napi is disabled. */
+ if (unlikely(val & NAPIF_STATE_DISABLE))
+ busy_poll = false;
+
+ if (busy_poll != busy_poll_sched)
+ assign_bit(NAPI_STATE_SCHED_THREADED_BUSY_POLL,
+ &napi->state, busy_poll);
+
+ napi_threaded_poll_loop(napi, busy_poll);
+ }
return 0;
}
@@ -12808,7 +12859,7 @@ static void run_backlog_napi(unsigned int cpu)
{
struct softnet_data *sd = per_cpu_ptr(&softnet_data, cpu);
- napi_threaded_poll_loop(&sd->backlog);
+ napi_threaded_poll_loop(&sd->backlog, false);
}
static void backlog_napi_setup(unsigned int cpu)
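The per-wakeup decision in the new napi_threaded_poll() can be modeled outside the kernel to make the state transitions easier to follow. This is a sketch under stated assumptions: the `M_*` bit positions are hypothetical stand-ins for the NAPIF_STATE_* flags, updates are plain and non-atomic (the patch uses READ_ONCE() and assign_bit()), and `model_complete_allowed()` reproduces only the new NAPIF_STATE_SCHED_THREADED_BUSY_POLL term of the napi_complete_done() check, ignoring NPSVC and IN_BUSY_POLL.

```c
#include <stdbool.h>

/* Hypothetical bit positions standing in for the NAPIF_STATE_* flags. */
#define M_DISABLE                  (1UL << 0)
#define M_THREADED_BUSY_POLL       (1UL << 1)
#define M_SCHED_THREADED_BUSY_POLL (1UL << 2)

/*
 * One wakeup of the modeled poller thread: decide whether to busy poll
 * and keep the "currently busy polling" bit in sync with that decision.
 * *state is updated in place, non-atomically.
 */
static bool model_poll_wakeup(unsigned long *state)
{
	unsigned long val = *state;
	bool busy_poll = val & M_THREADED_BUSY_POLL;
	bool busy_poll_sched = val & M_SCHED_THREADED_BUSY_POLL;

	/* Never busy poll a NAPI that is being disabled. */
	if (val & M_DISABLE)
		busy_poll = false;

	/* Only write the SCHED bit when it disagrees with the decision. */
	if (busy_poll != busy_poll_sched) {
		if (busy_poll)
			*state |= M_SCHED_THREADED_BUSY_POLL;
		else
			*state &= ~M_SCHED_THREADED_BUSY_POLL;
	}
	return busy_poll;
}

/* Only the new term of the napi_complete_done() early-return check. */
static bool model_complete_allowed(unsigned long state)
{
	return !(state & M_SCHED_THREADED_BUSY_POLL);
}
```

While the SCHED bit is set, napi_complete_done() returns false, so a driver following the usual convention does not re-arm its interrupt while the kthread is still busy polling; clearing the bit on a disabled NAPI restores normal completion.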
diff --git a/net/core/dev.h b/net/core/dev.h
index ab6fac65ec24..082270ed5b92 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -317,6 +317,9 @@ static inline void napi_set_irq_suspend_timeout(struct napi_struct *n,
static inline enum netdev_napi_threaded napi_get_threaded(struct napi_struct *n)
{
+ if (test_bit(NAPI_STATE_THREADED_BUSY_POLL, &n->state))
+ return NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED;
+
if (test_bit(NAPI_STATE_THREADED, &n->state))
return NETDEV_NAPI_THREADED_ENABLED;
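The enum-to-bit mapping in napi_set_threaded_state() and its inverse in napi_get_threaded() can be checked in isolation with a user-space model. This is a sketch: the `MODEL_*` bit positions are hypothetical stand-ins for the real NAPIF_STATE_* values, the update is non-atomic (the kernel uses set_mask_bits()), and the final DISABLED return of napi_get_threaded() is assumed because the hunk above is cut off.

```c
/* Values mirror enum netdev_napi_threaded in include/uapi/linux/netdev.h. */
enum netdev_napi_threaded {
	NETDEV_NAPI_THREADED_DISABLED,
	NETDEV_NAPI_THREADED_ENABLED,
	NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED,
};

/* Hypothetical bit positions; the kernel's NAPI_STATE_* values differ. */
#define MODEL_STATE_THREADED           (1UL << 0)
#define MODEL_STATE_THREADED_BUSY_POLL (1UL << 1)
#define MODEL_STATE_THREADED_MASK \
	(MODEL_STATE_THREADED | MODEL_STATE_THREADED_BUSY_POLL)

/* Non-atomic stand-in for set_mask_bits(): clear the mask, OR in bits. */
static unsigned long model_set_threaded_state(unsigned long state,
					      enum netdev_napi_threaded threaded)
{
	unsigned long val = 0;

	if (threaded == NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED)
		val |= MODEL_STATE_THREADED_BUSY_POLL;
	if (threaded)	/* both non-zero modes imply plain THREADED */
		val |= MODEL_STATE_THREADED;
	return (state & ~MODEL_STATE_THREADED_MASK) | val;
}

/* Mirrors napi_get_threaded(): the busy-poll bit takes precedence. */
static enum netdev_napi_threaded model_get_threaded(unsigned long state)
{
	if (state & MODEL_STATE_THREADED_BUSY_POLL)
		return NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED;
	if (state & MODEL_STATE_THREADED)
		return NETDEV_NAPI_THREADED_ENABLED;
	return NETDEV_NAPI_THREADED_DISABLED;
}
```

Because BUSY_POLL_ENABLED sets both bits, code that only tests NAPI_STATE_THREADED keeps treating a busy-polling NAPI as threaded, and only napi_get_threaded() needs to check the new bit first.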
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 8f897e2c8b4f..3ebf8153666b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -754,7 +754,7 @@ static int modify_napi_threaded(struct net_device *dev, unsigned long val)
if (list_empty(&dev->napi_list))
return -EOPNOTSUPP;
- if (val != 0 && val != 1)
+ if (val > NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED)
return -EOPNOTSUPP;
ret = dev_set_threaded(dev, val);
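With this patch the sysfs `threaded` attribute accepts 0, 1 or 2, and modify_napi_threaded() rejects anything larger with -EOPNOTSUPP. A small user-space writer for that attribute might look like the sketch below; `write_threaded_mode()` is hypothetical, the path is supplied by the caller (for example /sys/class/net/<ifname>/threaded), and the local range check only mirrors the kernel's so that obviously bad values fail early.

```c
#include <errno.h>
#include <stdio.h>

/* Values mirror enum netdev_napi_threaded in include/uapi/linux/netdev.h. */
enum netdev_napi_threaded {
	NETDEV_NAPI_THREADED_DISABLED,
	NETDEV_NAPI_THREADED_ENABLED,
	NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED,
};

/*
 * Hypothetical helper: write `val` to a sysfs-style attribute such as
 * /sys/class/net/<ifname>/threaded. Returns 0 or a negative errno.
 */
static int write_threaded_mode(const char *path, unsigned long val)
{
	FILE *f;
	int ret = 0;

	/* Same bound as the patched modify_napi_threaded(). */
	if (val > NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED)
		return -EOPNOTSUPP;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%lu\n", val) < 0)
		ret = -EIO;

	/* stdio buffers the write; a sysfs store error surfaces here. */
	if (fclose(f) != 0 && ret == 0)
		ret = -errno;

	return ret;
}
```

Errors returned by a sysfs store callback are only reported when the buffered data is actually written, which with stdio typically happens at fclose(); that is why the sketch checks fclose() instead of relying on fprintf() alone.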
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index e9a2a6f26cb7..ff20435c45d2 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -97,7 +97,7 @@ static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_THREADED
[NETDEV_A_NAPI_DEFER_HARD_IRQS] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_napi_defer_hard_irqs_range),
[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT] = { .type = NLA_UINT, },
[NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT] = { .type = NLA_UINT, },
- [NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
+ [NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 2),
};
/* NETDEV_CMD_BIND_TX - do */
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 48eb49aa03d4..8163afb15377 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -80,6 +80,7 @@ enum netdev_qstats_scope {
enum netdev_napi_threaded {
NETDEV_NAPI_THREADED_DISABLED,
NETDEV_NAPI_THREADED_ENABLED,
+ NETDEV_NAPI_THREADED_BUSY_POLL_ENABLED,
};
enum {
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH net-next v6 5/5] selftests: Add napi threaded busy poll test in `busy_poller`
2025-07-18 23:20 [PATCH net-next v6 0/5] Add support to do threaded napi busy poll Samiullah Khawaja
` (3 preceding siblings ...)
2025-07-18 23:20 ` [PATCH net-next v6 4/5] Extend napi threaded polling to allow kthread based busy polling Samiullah Khawaja
@ 2025-07-18 23:20 ` Samiullah Khawaja
2025-07-19 21:27 ` [PATCH net-next v6 0/5] Add support to do threaded napi busy poll Martin Karsten
5 siblings, 0 replies; 8+ messages in thread
From: Samiullah Khawaja @ 2025-07-18 23:20 UTC (permalink / raw)
To: Jakub Kicinski, David S . Miller , Eric Dumazet, Paolo Abeni,
almasrymina, willemb, jdamato, mkarsten
Cc: netdev, skhawaja
Add a testcase that runs the busy poll test with threaded NAPI busy poll enabled.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
tools/testing/selftests/net/busy_poll_test.sh | 25 ++++++++++++++++++-
tools/testing/selftests/net/busy_poller.c | 14 ++++++++---
2 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/busy_poll_test.sh b/tools/testing/selftests/net/busy_poll_test.sh
index 7d2d40812074..ab230df1057e 100755
--- a/tools/testing/selftests/net/busy_poll_test.sh
+++ b/tools/testing/selftests/net/busy_poll_test.sh
@@ -27,6 +27,9 @@ NAPI_DEFER_HARD_IRQS=100
GRO_FLUSH_TIMEOUT=50000
SUSPEND_TIMEOUT=20000000
+# NAPI threaded busy poll config
+NAPI_THREADED_POLL=2
+
setup_ns()
{
set -e
@@ -62,6 +65,9 @@ cleanup_ns()
test_busypoll()
{
suspend_value=${1:-0}
+ napi_threaded_value=${2:-0}
+ prefer_busy_poll_value=${3:-$PREFER_BUSY_POLL}
+
tmp_file=$(mktemp)
out_file=$(mktemp)
@@ -73,10 +79,11 @@ test_busypoll()
-b${SERVER_IP} \
-m${MAX_EVENTS} \
-u${BUSY_POLL_USECS} \
- -P${PREFER_BUSY_POLL} \
+ -P${prefer_busy_poll_value} \
-g${BUSY_POLL_BUDGET} \
-i${NSIM_SV_IFIDX} \
-s${suspend_value} \
+ -t${napi_threaded_value} \
-o${out_file}&
wait_local_port_listen nssv ${SERVER_PORT} tcp
@@ -109,6 +116,15 @@ test_busypoll_with_suspend()
return $?
}
+test_busypoll_with_napi_threaded()
+{
+ # Only enable napi threaded poll. Set suspend timeout and prefer busy
+ # poll to 0.
+ test_busypoll 0 ${NAPI_THREADED_POLL} 0
+
+ return $?
+}
+
###
### Code start
###
@@ -154,6 +170,13 @@ if [ $? -ne 0 ]; then
exit 1
fi
+test_busypoll_with_napi_threaded
+if [ $? -ne 0 ]; then
+ echo "test_busypoll_with_napi_threaded failed"
+ cleanup_ns
+ exit 1
+fi
+
echo "$NSIM_SV_FD:$NSIM_SV_IFIDX" > $NSIM_DEV_SYS_UNLINK
echo $NSIM_CL_ID > $NSIM_DEV_SYS_DEL
diff --git a/tools/testing/selftests/net/busy_poller.c b/tools/testing/selftests/net/busy_poller.c
index 04c7ff577bb8..f7407f09f635 100644
--- a/tools/testing/selftests/net/busy_poller.c
+++ b/tools/testing/selftests/net/busy_poller.c
@@ -65,15 +65,16 @@ static uint32_t cfg_busy_poll_usecs;
static uint16_t cfg_busy_poll_budget;
static uint8_t cfg_prefer_busy_poll;
-/* IRQ params */
+/* NAPI params */
static uint32_t cfg_defer_hard_irqs;
static uint64_t cfg_gro_flush_timeout;
static uint64_t cfg_irq_suspend_timeout;
+static enum netdev_napi_threaded cfg_napi_threaded_poll = NETDEV_NAPI_THREADED_DISABLED;
static void usage(const char *filepath)
{
error(1, 0,
- "Usage: %s -p<port> -b<addr> -m<max_events> -u<busy_poll_usecs> -P<prefer_busy_poll> -g<busy_poll_budget> -o<outfile> -d<defer_hard_irqs> -r<gro_flush_timeout> -s<irq_suspend_timeout> -i<ifindex>",
+ "Usage: %s -p<port> -b<addr> -m<max_events> -u<busy_poll_usecs> -P<prefer_busy_poll> -g<busy_poll_budget> -o<outfile> -d<defer_hard_irqs> -r<gro_flush_timeout> -s<irq_suspend_timeout> -t<napi_threaded_poll> -i<ifindex>",
filepath);
}
@@ -86,7 +87,7 @@ static void parse_opts(int argc, char **argv)
if (argc <= 1)
usage(argv[0]);
- while ((c = getopt(argc, argv, "p:m:b:u:P:g:o:d:r:s:i:")) != -1) {
+ while ((c = getopt(argc, argv, "p:m:b:u:P:g:o:d:r:s:i:t:")) != -1) {
/* most options take integer values, except o and b, so reduce
* code duplication a bit for the common case by calling
* strtoull here and leave bounds checking and casting per
@@ -168,6 +169,12 @@ static void parse_opts(int argc, char **argv)
cfg_ifindex = (int)tmp;
break;
+ case 't':
+ if (tmp == ULLONG_MAX || tmp > 2)
+ error(1, ERANGE, "napi threaded poll value must be 0-2");
+
+ cfg_napi_threaded_poll = (enum netdev_napi_threaded)tmp;
+ break;
}
}
@@ -246,6 +253,7 @@ static void setup_queue(void)
cfg_gro_flush_timeout);
netdev_napi_set_req_set_irq_suspend_timeout(set_req,
cfg_irq_suspend_timeout);
+ netdev_napi_set_req_set_threaded(set_req, cfg_napi_threaded_poll);
if (netdev_napi_set(ys, set_req))
error(1, 0, "can't set NAPI params: %s\n", yerr.msg);
--
2.50.0.727.gbf7dc18ff4-goog
* Re: [PATCH net-next v6 0/5] Add support to do threaded napi busy poll
2025-07-18 23:20 [PATCH net-next v6 0/5] Add support to do threaded napi busy poll Samiullah Khawaja
` (4 preceding siblings ...)
2025-07-18 23:20 ` [PATCH net-next v6 5/5] selftests: Add napi threaded busy poll test in `busy_poller` Samiullah Khawaja
@ 2025-07-19 21:27 ` Martin Karsten
5 siblings, 0 replies; 8+ messages in thread
From: Martin Karsten @ 2025-07-19 21:27 UTC (permalink / raw)
To: Samiullah Khawaja, Jakub Kicinski, David S . Miller, Eric Dumazet,
Paolo Abeni, almasrymina, willemb, jdamato, joe
Cc: netdev
On 2025-07-18 19:20, Samiullah Khawaja wrote:
[snip]
>
> | Experiment | Percentile | interrupts | SO_BUSYPOLL | SO_BUSYPOLL(separate) | NAPI threaded |
> |---|---|---|---|---|---|
> | 12 Kpkt/s + 0us delay | p5 | 12700 | 12900 | 13300 | 12800 |
> | | p50 | 13100 | 13600 | 14100 | 13000 |
> | | p95 | 13200 | 13800 | 14400 | 13000 |
> | | p99 | 13200 | 13800 | 14400 | 13000 |
> | 32 Kpkt/s + 30us delay | p5 | 19900 | 16600 | 13100 | 12800 |
> | | p50 | 21100 | 17000 | 13700 | 13000 |
> | | p95 | 21200 | 17100 | 14000 | 13000 |
> | | p99 | 21200 | 17100 | 14000 | 13000 |
> | 125 Kpkt/s + 6us delay | p5 | 14600 | 17100 | 13300 | 12900 |
> | | p50 | 15400 | 17400 | 13800 | 13100 |
> | | p95 | 15600 | 17600 | 14000 | 13100 |
> | | p99 | 15600 | 17600 | 14000 | 13100 |
> | 12 Kpkt/s + 78us delay | p5 | 14100 | 16700 | 13200 | 12600 |
> | | p50 | 14300 | 17100 | 13900 | 12800 |
> | | p95 | 14300 | 17200 | 14200 | 12800 |
> | | p99 | 14300 | 17200 | 14200 | 12800 |
> | 25 Kpkt/s + 38us delay | p5 | 19900 | 16600 | 13000 | 12700 |
> | | p50 | 21000 | 17100 | 13800 | 12900 |
> | | p95 | 21100 | 17100 | 14100 | 12900 |
> | | p99 | 21100 | 17100 | 14100 | 12900 |
>
> ## Observations
>
> - Without application processing, all approaches give the same latency
> within a 1 usec range, and NAPI threaded gives the minimum latency.
> - With application processing, the latency increases by 3-4 usecs when
> doing inline polling.
> - Using a dedicated core to drive NAPI polling keeps the latency the same
> even with application processing. This is observed both in userspace
> and with threaded NAPI (in the kernel).
> - Using NAPI threaded polling in the kernel gives 1-1.5 usecs lower
> latency compared to userspace-driven polling on a separate core.
> - With application processing, userspace gets the packet from the recv
> ring, spends some time on application processing, and then does NAPI
> polling. While application processing is happening, a dedicated core
> doing NAPI polling can pull packets off the NAPI RX queue and populate
> the AF_XDP recv ring. This means that when the application thread is
> done with application processing, it has new packets ready to receive
> and process in the recv ring.
> - NAPI threaded busy polling in the kernel with a dedicated core gives
> consistent P5-P99 latency.
Hi Samiullah.
I notice that you still present the experiments with application delay.
I previously asked what these experiments represent, since it's highly
unlikely that a latency-critical service would run at 100% load.
I also notice that you have not added any warning to the cover letter
that explicitly spells out the trade-off between performance and efficiency.
However, most importantly, I am trying to rerun the experiments, but
when running xsk_rr with threaded napi busy poll, networking locks up
and the machine needs a hard reset to reboot. This is after applying
your patches to commit c3886ccaadf8fdc2c91bfbdcdca36ccdc6ef8f70. I have
tested with Intel E810-XXV-2 using the ice driver and Mellanox
ConnectX-4 Lx using the mlx5 driver.
I am enclosing the various stack backtraces that I find in the logs.
Best,
Martin
**** ice ****
Jul 19 16:51:31 husky07 kernel: INFO: task systemd-network:542 blocked for more than 122 seconds.
Jul 19 16:51:31 husky07 kernel: Tainted: G I E 6.16.0-rc5-test #1
Jul 19 16:51:31 husky07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 16:51:31 husky07 kernel: task:systemd-network state:D stack:0 pid:542 tgid:542 ppid:1 task_flags:0x400100 flags:0x00004006
Jul 19 16:51:31 husky07 kernel: Call Trace:
Jul 19 16:51:31 husky07 kernel: <TASK>
Jul 19 16:51:31 husky07 kernel: __schedule+0x49b/0x1530
Jul 19 16:51:31 husky07 kernel: schedule+0x27/0xf0
Jul 19 16:51:31 husky07 kernel: schedule_preempt_disabled+0x15/0x30
Jul 19 16:51:31 husky07 kernel: __mutex_lock.constprop.0+0x4c9/0x870
Jul 19 16:51:31 husky07 kernel: ? __nla_validate_parse+0x5a/0xe30
Jul 19 16:51:31 husky07 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 19 16:51:31 husky07 kernel: mutex_lock+0x3b/0x50
Jul 19 16:51:31 husky07 kernel: rtnl_lock+0x15/0x20
Jul 19 16:51:31 husky07 kernel: inet_rtm_newaddr+0x101/0x540
Jul 19 16:51:31 husky07 kernel: ? __pfx_inet_rtm_newaddr+0x10/0x10
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv_msg+0x37e/0x450
Jul 19 16:51:31 husky07 kernel: ? shmem_undo_range+0x283/0x850
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jul 19 16:51:31 husky07 kernel: netlink_rcv_skb+0x5c/0x110
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv+0x15/0x30
Jul 19 16:51:31 husky07 kernel: netlink_unicast+0x282/0x3d0
Jul 19 16:51:31 husky07 kernel: netlink_sendmsg+0x214/0x470
Jul 19 16:51:31 husky07 kernel: __sys_sendto+0x23d/0x250
Jul 19 16:51:31 husky07 kernel: __x64_sys_sendto+0x24/0x40
Jul 19 16:51:31 husky07 kernel: x64_sys_call+0x1c32/0x2660
Jul 19 16:51:31 husky07 kernel: do_syscall_64+0x80/0x990
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:51:31 husky07 kernel: ? do_syscall_64+0x1be/0x990
Jul 19 16:51:31 husky07 kernel: ? kmem_cache_free+0x43a/0x470
Jul 19 16:51:31 husky07 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:51:31 husky07 kernel: ? sched_clock+0x10/0x30
Jul 19 16:51:31 husky07 kernel: ? get_vtime_delta+0x14/0xc0
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:51:31 husky07 kernel: ? do_syscall_64+0x1be/0x990
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:51:31 husky07 kernel: ? do_syscall_64+0x1be/0x990
Jul 19 16:51:31 husky07 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:51:31 husky07 kernel: ? sched_clock+0x10/0x30
Jul 19 16:51:31 husky07 kernel: ? get_vtime_delta+0x14/0xc0
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:51:31 husky07 kernel: ? do_syscall_64+0x1be/0x990
Jul 19 16:51:31 husky07 kernel: ? do_syscall_64+0x1be/0x990
Jul 19 16:51:31 husky07 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 19 16:51:31 husky07 kernel: RIP: 0033:0x7018b7b2c0a7
Jul 19 16:51:31 husky07 kernel: RSP: 002b:00007ffd470844a8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
Jul 19 16:51:31 husky07 kernel: RAX: ffffffffffffffda RBX: 000062ff08206d00 RCX: 00007018b7b2c0a7
Jul 19 16:51:31 husky07 kernel: RDX: 0000000000000044 RSI: 000062ff08235ae0 RDI: 0000000000000003
Jul 19 16:51:31 husky07 kernel: RBP: 00007ffd47084540 R08: 00007ffd470844b0 R09: 0000000000000080
Jul 19 16:51:31 husky07 kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 000062ff0823a0a0
Jul 19 16:51:31 husky07 kernel: R13: 000062ff082105b8 R14: 0000000000000000 R15: 000062ff08210570
Jul 19 16:51:31 husky07 kernel: </TASK>
Jul 19 16:51:31 husky07 kernel: INFO: task systemd-network:542 is blocked on a mutex likely owned by task xsk_rr:5912.
Jul 19 16:51:31 husky07 kernel: task:xsk_rr state:D stack:0 pid:5912 tgid:5912 ppid:5911 task_flags:0x400100 flags:0x00004006
Jul 19 16:51:31 husky07 kernel: Call Trace:
Jul 19 16:51:31 husky07 kernel: <TASK>
Jul 19 16:51:31 husky07 kernel: __schedule+0x49b/0x1530
Jul 19 16:51:31 husky07 kernel: ? _raw_spin_unlock_irqrestore+0x21/0x60
Jul 19 16:51:31 husky07 kernel: schedule+0x27/0xf0
Jul 19 16:51:31 husky07 kernel: schedule_timeout+0x85/0x110
Jul 19 16:51:31 husky07 kernel: ? __pfx_process_timeout+0x10/0x10
Jul 19 16:51:31 husky07 kernel: msleep+0x34/0x60
Jul 19 16:51:31 husky07 kernel: napi_stop_kthread+0x78/0x80
Jul 19 16:51:31 husky07 kernel: napi_set_threaded+0x33/0xc0
Jul 19 16:51:31 husky07 kernel: napi_enable_locked+0xb5/0x250
Jul 19 16:51:31 husky07 kernel: napi_enable+0x25/0x50
Jul 19 16:51:31 husky07 kernel: ice_up_complete+0x91/0x260 [ice]
Jul 19 16:51:31 husky07 kernel: ice_xdp+0x388/0x5d0 [ice]
Jul 19 16:51:31 husky07 kernel: ? __pfx_ice_xdp+0x10/0x10 [ice]
Jul 19 16:51:31 husky07 kernel: dev_xdp_install+0x157/0x320
Jul 19 16:51:31 husky07 kernel: dev_xdp_attach+0x23f/0x9d0
Jul 19 16:51:31 husky07 kernel: ? __bpf_prog_get+0x1f/0xf0
Jul 19 16:51:31 husky07 kernel: dev_change_xdp_fd+0x164/0x210
Jul 19 16:51:31 husky07 kernel: do_setlink.isra.0+0x110a/0x12c0
Jul 19 16:51:31 husky07 kernel: ? get_page_from_freelist+0x167f/0x1bd0
Jul 19 16:51:31 husky07 kernel: ? __nla_validate_parse+0x5a/0xe30
Jul 19 16:51:31 husky07 kernel: ? ns_capable+0x2a/0x60
Jul 19 16:51:31 husky07 kernel: rtnl_setlink+0x289/0x600
Jul 19 16:51:31 husky07 kernel: ? security_capable+0x7c/0x1e0
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnl_setlink+0x10/0x10
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv_msg+0x37e/0x450
Jul 19 16:51:31 husky07 kernel: ? kvfree+0x31/0x40
Jul 19 16:51:31 husky07 kernel: ? map_update_elem+0x203/0x330
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jul 19 16:51:31 husky07 kernel: netlink_rcv_skb+0x5c/0x110
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv+0x15/0x30
Jul 19 16:51:31 husky07 kernel: netlink_unicast+0x282/0x3d0
Jul 19 16:51:31 husky07 kernel: netlink_sendmsg+0x214/0x470
Jul 19 16:51:31 husky07 kernel: __sys_sendto+0x23d/0x250
Jul 19 16:51:31 husky07 kernel: __x64_sys_sendto+0x24/0x40
Jul 19 16:51:31 husky07 kernel: x64_sys_call+0x1c32/0x2660
Jul 19 16:51:31 husky07 kernel: do_syscall_64+0x80/0x990
Jul 19 16:51:31 husky07 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:51:31 husky07 kernel: ? sched_clock+0x10/0x30
Jul 19 16:51:31 husky07 kernel: ? get_vtime_delta+0x14/0xc0
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:51:31 husky07 kernel: ? irqentry_exit_to_user_mode+0x167/0x270
Jul 19 16:51:31 husky07 kernel: ? irqentry_exit+0x43/0x50
Jul 19 16:51:31 husky07 kernel: ? exc_page_fault+0x90/0x1b0
Jul 19 16:51:31 husky07 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 19 16:51:31 husky07 kernel: RIP: 0033:0x73175152bead
Jul 19 16:51:31 husky07 kernel: RSP: 002b:00007ffd54363fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
Jul 19 16:51:31 husky07 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 000073175152bead
Jul 19 16:51:31 husky07 kernel: RDX: 0000000000000034 RSI: 00007ffd54364030 RDI: 0000000000000008
Jul 19 16:51:31 husky07 kernel: RBP: 00007ffd54364000 R08: 0000000000000000 R09: 0000000000000000
Jul 19 16:51:31 husky07 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000019
Jul 19 16:51:31 husky07 kernel: R13: 0000000000000000 R14: 000063c753f8cd78 R15: 000073175187c000
Jul 19 16:51:31 husky07 kernel: </TASK>
Jul 19 16:51:31 husky07 kernel: INFO: task kworker/u50:3:3472 blocked for more than 122 seconds.
Jul 19 16:51:31 husky07 kernel: Tainted: G I E 6.16.0-rc5-test #1
Jul 19 16:51:31 husky07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 16:51:31 husky07 kernel: task:kworker/u50:3 state:D stack:0 pid:3472 tgid:3472 ppid:2 task_flags:0x4208060 flags:0x00004000
Jul 19 16:51:31 husky07 kernel: Workqueue: events_unbound linkwatch_event
Jul 19 16:51:31 husky07 kernel: Call Trace:
Jul 19 16:51:31 husky07 kernel: <TASK>
Jul 19 16:51:31 husky07 kernel: __schedule+0x49b/0x1530
Jul 19 16:51:31 husky07 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:51:31 husky07 kernel: ? sched_clock+0x10/0x30
Jul 19 16:51:31 husky07 kernel: ? sched_clock_cpu+0x10/0x1e0
Jul 19 16:51:31 husky07 kernel: schedule+0x27/0xf0
Jul 19 16:51:31 husky07 kernel: schedule_preempt_disabled+0x15/0x30
Jul 19 16:51:31 husky07 kernel: __mutex_lock.constprop.0+0x4c9/0x870
Jul 19 16:51:31 husky07 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 19 16:51:31 husky07 kernel: mutex_lock+0x3b/0x50
Jul 19 16:51:31 husky07 kernel: rtnl_lock+0x15/0x20
Jul 19 16:51:31 husky07 kernel: linkwatch_event+0x12/0x40
Jul 19 16:51:31 husky07 kernel: process_one_work+0x191/0x3e0
Jul 19 16:51:31 husky07 kernel: worker_thread+0x2e3/0x420
Jul 19 16:51:31 husky07 kernel: ? __pfx_worker_thread+0x10/0x10
Jul 19 16:51:31 husky07 kernel: kthread+0x10d/0x230
Jul 19 16:51:31 husky07 kernel: ? __pfx_kthread+0x10/0x10
Jul 19 16:51:31 husky07 kernel: ret_from_fork+0x1d7/0x210
Jul 19 16:51:31 husky07 kernel: ? __pfx_kthread+0x10/0x10
Jul 19 16:51:31 husky07 kernel: ret_from_fork_asm+0x1a/0x30
Jul 19 16:51:31 husky07 kernel: </TASK>
Jul 19 16:51:31 husky07 kernel: INFO: task kworker/u50:3:3472 is blocked on a mutex likely owned by task xsk_rr:5912.
Jul 19 16:51:31 husky07 kernel: task:xsk_rr state:D stack:0 pid:5912 tgid:5912 ppid:5911 task_flags:0x400100 flags:0x00004006
Jul 19 16:51:31 husky07 kernel: Call Trace:
Jul 19 16:51:31 husky07 kernel: <TASK>
Jul 19 16:51:31 husky07 kernel: __schedule+0x49b/0x1530
Jul 19 16:51:31 husky07 kernel: ? _raw_spin_unlock_irqrestore+0x21/0x60
Jul 19 16:51:31 husky07 kernel: schedule+0x27/0xf0
Jul 19 16:51:31 husky07 kernel: schedule_timeout+0x85/0x110
Jul 19 16:51:31 husky07 kernel: ? __pfx_process_timeout+0x10/0x10
Jul 19 16:51:31 husky07 kernel: msleep+0x34/0x60
Jul 19 16:51:31 husky07 kernel: napi_stop_kthread+0x78/0x80
Jul 19 16:51:31 husky07 kernel: napi_set_threaded+0x33/0xc0
Jul 19 16:51:31 husky07 kernel: napi_enable_locked+0xb5/0x250
Jul 19 16:51:31 husky07 kernel: napi_enable+0x25/0x50
Jul 19 16:51:31 husky07 kernel: ice_up_complete+0x91/0x260 [ice]
Jul 19 16:51:31 husky07 kernel: ice_xdp+0x388/0x5d0 [ice]
Jul 19 16:51:31 husky07 kernel: ? __pfx_ice_xdp+0x10/0x10 [ice]
Jul 19 16:51:31 husky07 kernel: dev_xdp_install+0x157/0x320
Jul 19 16:51:31 husky07 kernel: dev_xdp_attach+0x23f/0x9d0
Jul 19 16:51:31 husky07 kernel: ? __bpf_prog_get+0x1f/0xf0
Jul 19 16:51:31 husky07 kernel: dev_change_xdp_fd+0x164/0x210
Jul 19 16:51:31 husky07 kernel: do_setlink.isra.0+0x110a/0x12c0
Jul 19 16:51:31 husky07 kernel: ? get_page_from_freelist+0x167f/0x1bd0
Jul 19 16:51:31 husky07 kernel: ? __nla_validate_parse+0x5a/0xe30
Jul 19 16:51:31 husky07 kernel: ? ns_capable+0x2a/0x60
Jul 19 16:51:31 husky07 kernel: rtnl_setlink+0x289/0x600
Jul 19 16:51:31 husky07 kernel: ? security_capable+0x7c/0x1e0
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnl_setlink+0x10/0x10
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv_msg+0x37e/0x450
Jul 19 16:51:31 husky07 kernel: ? kvfree+0x31/0x40
Jul 19 16:51:31 husky07 kernel: ? map_update_elem+0x203/0x330
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jul 19 16:51:31 husky07 kernel: netlink_rcv_skb+0x5c/0x110
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv+0x15/0x30
Jul 19 16:51:31 husky07 kernel: netlink_unicast+0x282/0x3d0
Jul 19 16:51:31 husky07 kernel: netlink_sendmsg+0x214/0x470
Jul 19 16:51:31 husky07 kernel: __sys_sendto+0x23d/0x250
Jul 19 16:51:31 husky07 kernel: __x64_sys_sendto+0x24/0x40
Jul 19 16:51:31 husky07 kernel: x64_sys_call+0x1c32/0x2660
Jul 19 16:51:31 husky07 kernel: do_syscall_64+0x80/0x990
Jul 19 16:51:31 husky07 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:51:31 husky07 kernel: ? sched_clock+0x10/0x30
Jul 19 16:51:31 husky07 kernel: ? get_vtime_delta+0x14/0xc0
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:51:31 husky07 kernel: ? irqentry_exit_to_user_mode+0x167/0x270
Jul 19 16:51:31 husky07 kernel: ? irqentry_exit+0x43/0x50
Jul 19 16:51:31 husky07 kernel: ? exc_page_fault+0x90/0x1b0
Jul 19 16:51:31 husky07 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 19 16:51:31 husky07 kernel: RIP: 0033:0x73175152bead
Jul 19 16:51:31 husky07 kernel: RSP: 002b:00007ffd54363fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
Jul 19 16:51:31 husky07 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 000073175152bead
Jul 19 16:51:31 husky07 kernel: RDX: 0000000000000034 RSI: 00007ffd54364030 RDI: 0000000000000008
Jul 19 16:51:31 husky07 kernel: RBP: 00007ffd54364000 R08: 0000000000000000 R09: 0000000000000000
Jul 19 16:51:31 husky07 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000019
Jul 19 16:51:31 husky07 kernel: R13: 0000000000000000 R14: 000063c753f8cd78 R15: 000073175187c000
Jul 19 16:51:31 husky07 kernel: </TASK>
Jul 19 16:51:31 husky07 kernel: INFO: task sudo:5918 blocked for more than 122 seconds.
Jul 19 16:51:31 husky07 kernel: Tainted: G I E 6.16.0-rc5-test #1
Jul 19 16:51:31 husky07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 16:51:31 husky07 kernel: task:sudo state:D stack:0 pid:5918 tgid:5918 ppid:5856 task_flags:0x400100 flags:0x00004006
Jul 19 16:51:31 husky07 kernel: Call Trace:
Jul 19 16:51:31 husky07 kernel: <TASK>
Jul 19 16:51:31 husky07 kernel: __schedule+0x49b/0x1530
Jul 19 16:51:31 husky07 kernel: ? xa_load+0x6d/0xa0
Jul 19 16:51:31 husky07 kernel: schedule+0x27/0xf0
Jul 19 16:51:31 husky07 kernel: schedule_preempt_disabled+0x15/0x30
Jul 19 16:51:31 husky07 kernel: __mutex_lock.constprop.0+0x4c9/0x870
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnl_dump_ifinfo+0x10/0x10
Jul 19 16:51:31 husky07 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 19 16:51:31 husky07 kernel: mutex_lock+0x3b/0x50
Jul 19 16:51:31 husky07 kernel: rtnl_dumpit+0x83/0xc0
Jul 19 16:51:31 husky07 kernel: netlink_dump+0x197/0x3c0
Jul 19 16:51:31 husky07 kernel: ? obj_cgroup_charge_account+0x139/0x370
Jul 19 16:51:31 husky07 kernel: __netlink_dump_start+0x204/0x340
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnl_dump_ifinfo+0x10/0x10
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv_msg+0x2d6/0x450
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnl_dumpit+0x10/0x10
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnl_dump_ifinfo+0x10/0x10
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jul 19 16:51:31 husky07 kernel: netlink_rcv_skb+0x5c/0x110
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv+0x15/0x30
Jul 19 16:51:31 husky07 kernel: netlink_unicast+0x282/0x3d0
Jul 19 16:51:31 husky07 kernel: netlink_sendmsg+0x214/0x470
Jul 19 16:51:31 husky07 kernel: __sys_sendto+0x23d/0x250
Jul 19 16:51:31 husky07 kernel: __x64_sys_sendto+0x24/0x40
Jul 19 16:51:31 husky07 kernel: x64_sys_call+0x1c32/0x2660
Jul 19 16:51:31 husky07 kernel: do_syscall_64+0x80/0x990
Jul 19 16:51:31 husky07 kernel: ? walk_system_ram_range+0xa8/0x110
Jul 19 16:51:31 husky07 kernel: ? __pfx_pagerange_is_ram_callback+0x10/0x10
Jul 19 16:51:31 husky07 kernel: ? ___pte_offset_map+0x1c/0x1b0
Jul 19 16:51:31 husky07 kernel: ? __pte_offset_map_lock+0xa2/0x120
Jul 19 16:51:31 husky07 kernel: ? __get_locked_pte+0x3f/0x90
Jul 19 16:51:31 husky07 kernel: ? insert_pfn+0xbb/0x220
Jul 19 16:51:31 husky07 kernel: ? vmf_insert_pfn_prot+0x99/0x100
Jul 19 16:51:31 husky07 kernel: ? vmf_insert_pfn+0x12/0x20
Jul 19 16:51:31 husky07 kernel: ? vvar_fault+0xa1/0x110
Jul 19 16:51:31 husky07 kernel: ? special_mapping_fault+0x21/0xd0
Jul 19 16:51:31 husky07 kernel: ? __do_fault+0x3d/0x190
Jul 19 16:51:31 husky07 kernel: ? do_fault+0x2d5/0x570
Jul 19 16:51:31 husky07 kernel: ? __handle_mm_fault+0x838/0x1070
Jul 19 16:51:31 husky07 kernel: ? security_task_setrlimit+0xa3/0x1b0
Jul 19 16:51:31 husky07 kernel: ? do_prlimit+0x144/0x230
Jul 19 16:51:31 husky07 kernel: ? count_memcg_events+0x180/0x200
Jul 19 16:51:31 husky07 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:51:31 husky07 kernel: ? sched_clock+0x10/0x30
Jul 19 16:51:31 husky07 kernel: ? get_vtime_delta+0x14/0xc0
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:51:31 husky07 kernel: ? irqentry_exit_to_user_mode+0x167/0x270
Jul 19 16:51:31 husky07 kernel: ? irqentry_exit+0x43/0x50
Jul 19 16:51:31 husky07 kernel: ? exc_page_fault+0x90/0x1b0
Jul 19 16:51:31 husky07 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 19 16:51:31 husky07 kernel: RIP: 0033:0x7cb59032c0a7
Jul 19 16:51:31 husky07 kernel: RSP: 002b:00007ffcf07c2808 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
Jul 19 16:51:31 husky07 kernel: RAX: ffffffffffffffda RBX: 00007ffcf07c2850 RCX: 00007cb59032c0a7
Jul 19 16:51:31 husky07 kernel: RDX: 0000000000000014 RSI: 00007ffcf07c2890 RDI: 0000000000000003
Jul 19 16:51:31 husky07 kernel: RBP: 00007ffcf07c28e0 R08: 00007ffcf07c2850 R09: 000000000000000c
Jul 19 16:51:31 husky07 kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffcf07c2980
Jul 19 16:51:31 husky07 kernel: R13: 00007ffcf07c2890 R14: 00007ffcf07c29b0 R15: 00007ffcf07c2e58
Jul 19 16:51:31 husky07 kernel: </TASK>
Jul 19 16:51:31 husky07 kernel: INFO: task sudo:5918 is blocked on a mutex likely owned by task xsk_rr:5912.
Jul 19 16:51:31 husky07 kernel: task:xsk_rr state:D stack:0 pid:5912 tgid:5912 ppid:5911 task_flags:0x400100 flags:0x00004006
Jul 19 16:51:31 husky07 kernel: Call Trace:
Jul 19 16:51:31 husky07 kernel: <TASK>
Jul 19 16:51:31 husky07 kernel: __schedule+0x49b/0x1530
Jul 19 16:51:31 husky07 kernel: ? _raw_spin_unlock_irqrestore+0x21/0x60
Jul 19 16:51:31 husky07 kernel: schedule+0x27/0xf0
Jul 19 16:51:31 husky07 kernel: schedule_timeout+0x85/0x110
Jul 19 16:51:31 husky07 kernel: ? __pfx_process_timeout+0x10/0x10
Jul 19 16:51:31 husky07 kernel: msleep+0x34/0x60
Jul 19 16:51:31 husky07 kernel: napi_stop_kthread+0x78/0x80
Jul 19 16:51:31 husky07 kernel: napi_set_threaded+0x33/0xc0
Jul 19 16:51:31 husky07 kernel: napi_enable_locked+0xb5/0x250
Jul 19 16:51:31 husky07 kernel: napi_enable+0x25/0x50
Jul 19 16:51:31 husky07 kernel: ice_up_complete+0x91/0x260 [ice]
Jul 19 16:51:31 husky07 kernel: ice_xdp+0x388/0x5d0 [ice]
Jul 19 16:51:31 husky07 kernel: ? __pfx_ice_xdp+0x10/0x10 [ice]
Jul 19 16:51:31 husky07 kernel: dev_xdp_install+0x157/0x320
Jul 19 16:51:31 husky07 kernel: dev_xdp_attach+0x23f/0x9d0
Jul 19 16:51:31 husky07 kernel: ? __bpf_prog_get+0x1f/0xf0
Jul 19 16:51:31 husky07 kernel: dev_change_xdp_fd+0x164/0x210
Jul 19 16:51:31 husky07 kernel: do_setlink.isra.0+0x110a/0x12c0
Jul 19 16:51:31 husky07 kernel: ? get_page_from_freelist+0x167f/0x1bd0
Jul 19 16:51:31 husky07 kernel: ? __nla_validate_parse+0x5a/0xe30
Jul 19 16:51:31 husky07 kernel: ? ns_capable+0x2a/0x60
Jul 19 16:51:31 husky07 kernel: rtnl_setlink+0x289/0x600
Jul 19 16:51:31 husky07 kernel: ? security_capable+0x7c/0x1e0
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnl_setlink+0x10/0x10
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv_msg+0x37e/0x450
Jul 19 16:51:31 husky07 kernel: ? kvfree+0x31/0x40
Jul 19 16:51:31 husky07 kernel: ? map_update_elem+0x203/0x330
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jul 19 16:51:31 husky07 kernel: netlink_rcv_skb+0x5c/0x110
Jul 19 16:51:31 husky07 kernel: rtnetlink_rcv+0x15/0x30
Jul 19 16:51:31 husky07 kernel: netlink_unicast+0x282/0x3d0
Jul 19 16:51:31 husky07 kernel: netlink_sendmsg+0x214/0x470
Jul 19 16:51:31 husky07 kernel: __sys_sendto+0x23d/0x250
Jul 19 16:51:31 husky07 kernel: __x64_sys_sendto+0x24/0x40
Jul 19 16:51:31 husky07 kernel: x64_sys_call+0x1c32/0x2660
Jul 19 16:51:31 husky07 kernel: do_syscall_64+0x80/0x990
Jul 19 16:51:31 husky07 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:51:31 husky07 kernel: ? sched_clock+0x10/0x30
Jul 19 16:51:31 husky07 kernel: ? get_vtime_delta+0x14/0xc0
Jul 19 16:51:31 husky07 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:51:31 husky07 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:51:31 husky07 kernel: ? irqentry_exit_to_user_mode+0x167/0x270
Jul 19 16:51:31 husky07 kernel: ? irqentry_exit+0x43/0x50
Jul 19 16:51:31 husky07 kernel: ? exc_page_fault+0x90/0x1b0
Jul 19 16:51:31 husky07 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 19 16:51:31 husky07 kernel: RIP: 0033:0x73175152bead
Jul 19 16:51:31 husky07 kernel: RSP: 002b:00007ffd54363fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
Jul 19 16:51:31 husky07 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 000073175152bead
Jul 19 16:51:31 husky07 kernel: RDX: 0000000000000034 RSI: 00007ffd54364030 RDI: 0000000000000008
Jul 19 16:51:31 husky07 kernel: RBP: 00007ffd54364000 R08: 0000000000000000 R09: 0000000000000000
Jul 19 16:51:31 husky07 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000019
Jul 19 16:51:31 husky07 kernel: R13: 0000000000000000 R14: 000063c753f8cd78 R15: 000073175187c000
Jul 19 16:51:31 husky07 kernel: </TASK>
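The ice report above has a simple shape: xsk_rr is inside rtnl_setlink() and, per the hung-task detector, likely owns the mutex that sudo is waiting on in rtnl_dumpit(), while xsk_rr itself sleeps in msleep() called from napi_stop_kthread() under napi_enable_locked(). A minimal userspace model of that pattern follows. It is not kernel code: `big_lock`, `worker_stopped`, `holder()`, `waiter()` and `run_model()` are hypothetical names, and unlike the kernel loop the model bounds the polling so the program terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins: big_lock plays the contended mutex, and
 * worker_stopped is the acknowledgement the holder polls for. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile bool worker_stopped; /* never set in this model */
static sem_t lock_taken;

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Models xsk_rr: takes the lock, then polls with short sleeps.
 * Bounded by max_polls only so that the demo ends. */
static void *holder(void *arg)
{
	int max_polls = *(int *)arg;

	pthread_mutex_lock(&big_lock);
	sem_post(&lock_taken);
	for (int i = 0; i < max_polls && !worker_stopped; i++)
		sleep_ms(10);
	pthread_mutex_unlock(&big_lock);
	return NULL;
}

/* Models sudo: wants the same lock but waits at most timeout_ms.
 * Returns 0 if it got the lock, ETIMEDOUT otherwise. */
static int waiter(long timeout_ms)
{
	struct timespec deadline;
	int ret;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	ret = pthread_mutex_timedlock(&big_lock, &deadline);
	if (ret == 0)
		pthread_mutex_unlock(&big_lock);
	return ret;
}

/* Runs one scenario and returns the waiter's result. */
static int run_model(int max_polls, long timeout_ms)
{
	pthread_t t;
	int ret;

	sem_init(&lock_taken, 0, 0);
	pthread_create(&t, NULL, holder, &max_polls);
	sem_wait(&lock_taken); /* make sure the holder owns the lock first */
	ret = waiter(timeout_ms);
	pthread_join(t, NULL);
	sem_destroy(&lock_taken);
	return ret;
}
```

With run_model(50, 100) the holder keeps the lock for about 500 ms and the waiter gives up with ETIMEDOUT; in the kernel the poll loop has no such bound, so every later rtnetlink user piles up behind the holder.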
**** mlx5 ****
Jul 19 16:52:28 tilly02 kernel: INFO: task kworker/u129:1:255 blocked for more than 122 seconds.
Jul 19 16:52:28 tilly02 kernel: Tainted: G I E 6.16.0-rc5-test #1
Jul 19 16:52:28 tilly02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 16:52:28 tilly02 kernel: task:kworker/u129:1 state:D stack:0 pid:255 tgid:255 ppid:2 task_flags:0x4208060 flags:0x00004000
Jul 19 16:52:28 tilly02 kernel: Workqueue: events_unbound linkwatch_event
Jul 19 16:52:28 tilly02 kernel: Call Trace:
Jul 19 16:52:28 tilly02 kernel: <TASK>
Jul 19 16:52:28 tilly02 kernel: __schedule+0x493/0x1630
Jul 19 16:52:28 tilly02 kernel: ? sched_clock+0x10/0x30
Jul 19 16:52:28 tilly02 kernel: ? sched_clock_cpu+0x10/0x1e0
Jul 19 16:52:28 tilly02 kernel: schedule+0x27/0xf0
Jul 19 16:52:28 tilly02 kernel: schedule_preempt_disabled+0x15/0x30
Jul 19 16:52:28 tilly02 kernel: __mutex_lock.constprop.0+0x4c9/0x870
Jul 19 16:52:28 tilly02 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 19 16:52:28 tilly02 kernel: mutex_lock+0x3b/0x50
Jul 19 16:52:28 tilly02 kernel: rtnl_lock+0x15/0x20
Jul 19 16:52:28 tilly02 kernel: linkwatch_event+0x12/0x40
Jul 19 16:52:28 tilly02 kernel: process_one_work+0x18e/0x3e0
Jul 19 16:52:28 tilly02 kernel: worker_thread+0x2e3/0x420
Jul 19 16:52:28 tilly02 kernel: ? __pfx_worker_thread+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: kthread+0x10a/0x230
Jul 19 16:52:28 tilly02 kernel: ? __pfx_kthread+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: ret_from_fork+0x1d4/0x210
Jul 19 16:52:28 tilly02 kernel: ? __pfx_kthread+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: ret_from_fork_asm+0x1a/0x30
Jul 19 16:52:28 tilly02 kernel: </TASK>
Jul 19 16:52:28 tilly02 kernel: INFO: task kworker/u129:1:255 is blocked on a mutex likely owned by task xsk_rr:1612.
Jul 19 16:52:28 tilly02 kernel: task:xsk_rr state:D stack:0 pid:1612 tgid:1612 ppid:1611 task_flags:0x400100 flags:0x00004002
Jul 19 16:52:28 tilly02 kernel: Call Trace:
Jul 19 16:52:28 tilly02 kernel: <TASK>
Jul 19 16:52:28 tilly02 kernel: __schedule+0x493/0x1630
Jul 19 16:52:28 tilly02 kernel: schedule+0x27/0xf0
Jul 19 16:52:28 tilly02 kernel: schedule_timeout+0x85/0x110
Jul 19 16:52:28 tilly02 kernel: ? __pfx_process_timeout+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: msleep+0x34/0x60
Jul 19 16:52:28 tilly02 kernel: napi_stop_kthread+0x78/0x80
Jul 19 16:52:28 tilly02 kernel: napi_set_threaded+0x33/0xc0
Jul 19 16:52:28 tilly02 kernel: napi_enable_locked+0xb5/0x250
Jul 19 16:52:28 tilly02 kernel: mlx5e_activate_priv_channels+0x1bc/0x490 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: mlx5e_switch_priv_channels+0xeb/0x150 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: mlx5e_safe_switch_params+0xef/0x140 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: mlx5e_xdp_set+0xd0/0x220 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: ? __pfx_mlx5e_xdp+0x10/0x10 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: mlx5e_xdp+0x47/0x60 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: dev_xdp_install+0x154/0x320
Jul 19 16:52:28 tilly02 kernel: dev_xdp_attach+0x23f/0x9d0
Jul 19 16:52:28 tilly02 kernel: ? __bpf_prog_get+0x1f/0xf0
Jul 19 16:52:28 tilly02 kernel: dev_change_xdp_fd+0x164/0x210
Jul 19 16:52:28 tilly02 kernel: do_setlink.isra.0+0x110a/0x12c0
Jul 19 16:52:28 tilly02 kernel: ? __call_rcu_common+0x233/0x730
Jul 19 16:52:28 tilly02 kernel: ? __rmqueue_pcplist+0x86e/0xed0
Jul 19 16:52:28 tilly02 kernel: ? __nla_validate_parse+0x5a/0xe30
Jul 19 16:52:28 tilly02 kernel: ? ns_capable+0x2a/0x60
Jul 19 16:52:28 tilly02 kernel: rtnl_setlink+0x289/0x600
Jul 19 16:52:28 tilly02 kernel: ? __memcg_slab_post_alloc_hook+0x1b0/0x3e0
Jul 19 16:52:28 tilly02 kernel: ? security_capable+0x77/0x1c0
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnl_setlink+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: rtnetlink_rcv_msg+0x37b/0x450
Jul 19 16:52:28 tilly02 kernel: ? bpf_map_kzalloc+0xd1/0x110
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: netlink_rcv_skb+0x59/0x110
Jul 19 16:52:28 tilly02 kernel: rtnetlink_rcv+0x15/0x30
Jul 19 16:52:28 tilly02 kernel: netlink_unicast+0x27f/0x3d0
Jul 19 16:52:28 tilly02 kernel: netlink_sendmsg+0x214/0x470
Jul 19 16:52:28 tilly02 kernel: __sys_sendto+0x23a/0x250
Jul 19 16:52:28 tilly02 kernel: __x64_sys_sendto+0x24/0x40
Jul 19 16:52:28 tilly02 kernel: x64_sys_call+0x1c32/0x2660
Jul 19 16:52:28 tilly02 kernel: do_syscall_64+0x80/0x9a0
Jul 19 16:52:28 tilly02 kernel: ? vmf_insert_pfn_prot+0x99/0x100
Jul 19 16:52:28 tilly02 kernel: ? vmf_insert_pfn+0x12/0x20
Jul 19 16:52:28 tilly02 kernel: ? vvar_fault+0xa1/0x110
Jul 19 16:52:28 tilly02 kernel: ? special_mapping_fault+0x1e/0xd0
Jul 19 16:52:28 tilly02 kernel: ? __do_fault+0x3a/0x190
Jul 19 16:52:28 tilly02 kernel: ? do_fault+0x2d5/0x570
Jul 19 16:52:28 tilly02 kernel: ? __handle_mm_fault+0x838/0x1070
Jul 19 16:52:28 tilly02 kernel: ? count_memcg_events+0x180/0x200
Jul 19 16:52:28 tilly02 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:52:28 tilly02 kernel: ? sched_clock+0x10/0x30
Jul 19 16:52:28 tilly02 kernel: ? get_vtime_delta+0x14/0xc0
Jul 19 16:52:28 tilly02 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:52:28 tilly02 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:52:28 tilly02 kernel: ? irqentry_exit_to_user_mode+0x167/0x270
Jul 19 16:52:28 tilly02 kernel: ? irqentry_exit+0x43/0x50
Jul 19 16:52:28 tilly02 kernel: ? exc_page_fault+0x90/0x1b0
Jul 19 16:52:28 tilly02 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 19 16:52:28 tilly02 kernel: RIP: 0033:0x7e1395f2bead
Jul 19 16:52:28 tilly02 kernel: RSP: 002b:00007ffd2ac39398 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
Jul 19 16:52:28 tilly02 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007e1395f2bead
Jul 19 16:52:28 tilly02 kernel: RDX: 0000000000000034 RSI: 00007ffd2ac39420 RDI: 0000000000000008
Jul 19 16:52:28 tilly02 kernel: RBP: 00007ffd2ac393f0 R08: 0000000000000000 R09: 0000000000000000
Jul 19 16:52:28 tilly02 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000019
Jul 19 16:52:28 tilly02 kernel: R13: 0000000000000000 R14: 00006305503a1d78 R15: 00007e1396267000
Jul 19 16:52:28 tilly02 kernel: </TASK>
Jul 19 16:52:28 tilly02 kernel: INFO: task sudo:1619 blocked for more than 122 seconds.
Jul 19 16:52:28 tilly02 kernel: Tainted: G I E 6.16.0-rc5-test #1
Jul 19 16:52:28 tilly02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 16:52:28 tilly02 kernel: task:sudo state:D stack:0 pid:1619 tgid:1619 ppid:1544 task_flags:0x400100 flags:0x00004002
Jul 19 16:52:28 tilly02 kernel: Call Trace:
Jul 19 16:52:28 tilly02 kernel: <TASK>
Jul 19 16:52:28 tilly02 kernel: __schedule+0x493/0x1630
Jul 19 16:52:28 tilly02 kernel: ? obj_cgroup_charge_account+0x139/0x370
Jul 19 16:52:28 tilly02 kernel: schedule+0x27/0xf0
Jul 19 16:52:28 tilly02 kernel: schedule_preempt_disabled+0x15/0x30
Jul 19 16:52:28 tilly02 kernel: __mutex_lock.constprop.0+0x4c9/0x870
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnl_dump_ifinfo+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 19 16:52:28 tilly02 kernel: mutex_lock+0x3b/0x50
Jul 19 16:52:28 tilly02 kernel: rtnl_dumpit+0x83/0xc0
Jul 19 16:52:28 tilly02 kernel: netlink_dump+0x194/0x3c0
Jul 19 16:52:28 tilly02 kernel: __netlink_dump_start+0x204/0x340
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnl_dump_ifinfo+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: rtnetlink_rcv_msg+0x2d6/0x450
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnl_dumpit+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnl_dump_ifinfo+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: netlink_rcv_skb+0x59/0x110
Jul 19 16:52:28 tilly02 kernel: rtnetlink_rcv+0x15/0x30
Jul 19 16:52:28 tilly02 kernel: netlink_unicast+0x27f/0x3d0
Jul 19 16:52:28 tilly02 kernel: netlink_sendmsg+0x214/0x470
Jul 19 16:52:28 tilly02 kernel: __sys_sendto+0x23a/0x250
Jul 19 16:52:28 tilly02 kernel: __x64_sys_sendto+0x24/0x40
Jul 19 16:52:28 tilly02 kernel: x64_sys_call+0x1c32/0x2660
Jul 19 16:52:28 tilly02 kernel: do_syscall_64+0x80/0x9a0
Jul 19 16:52:28 tilly02 kernel: ? __pte_offset_map_lock+0xa2/0x120
Jul 19 16:52:28 tilly02 kernel: ? __get_locked_pte+0x3f/0x90
Jul 19 16:52:28 tilly02 kernel: ? insert_pfn+0xbb/0x220
Jul 19 16:52:28 tilly02 kernel: ? vmf_insert_pfn_prot+0x99/0x100
Jul 19 16:52:28 tilly02 kernel: ? vmf_insert_pfn+0x12/0x20
Jul 19 16:52:28 tilly02 kernel: ? vvar_fault+0xa1/0x110
Jul 19 16:52:28 tilly02 kernel: ? special_mapping_fault+0x1e/0xd0
Jul 19 16:52:28 tilly02 kernel: ? __do_fault+0x3a/0x190
Jul 19 16:52:28 tilly02 kernel: ? do_fault+0x2d5/0x570
Jul 19 16:52:28 tilly02 kernel: ? __handle_mm_fault+0x838/0x1070
Jul 19 16:52:28 tilly02 kernel: ? __do_sys_prlimit64+0x244/0x2e0
Jul 19 16:52:28 tilly02 kernel: ? count_memcg_events+0x180/0x200
Jul 19 16:52:28 tilly02 kernel: ? handle_mm_fault+0xbc/0x300
Jul 19 16:52:28 tilly02 kernel: ? __ct_user_enter+0x2d/0x100
Jul 19 16:52:28 tilly02 kernel: ? irqentry_exit_to_user_mode+0x167/0x270
Jul 19 16:52:28 tilly02 kernel: ? irqentry_exit+0x43/0x50
Jul 19 16:52:28 tilly02 kernel: ? exc_page_fault+0x90/0x1b0
Jul 19 16:52:28 tilly02 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 19 16:52:28 tilly02 kernel: RIP: 0033:0x75041712c0a7
Jul 19 16:52:28 tilly02 kernel: RSP: 002b:00007ffc92399938 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
Jul 19 16:52:28 tilly02 kernel: RAX: ffffffffffffffda RBX: 00007ffc92399980 RCX: 000075041712c0a7
Jul 19 16:52:28 tilly02 kernel: RDX: 0000000000000014 RSI: 00007ffc923999c0 RDI: 0000000000000003
Jul 19 16:52:28 tilly02 kernel: RBP: 00007ffc92399a10 R08: 00007ffc92399980 R09: 000000000000000c
Jul 19 16:52:28 tilly02 kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffc92399ab0
Jul 19 16:52:28 tilly02 kernel: R13: 00007ffc923999c0 R14: 00007ffc92399ae0 R15: 00007ffc92399f88
Jul 19 16:52:28 tilly02 kernel: </TASK>
Jul 19 16:52:28 tilly02 kernel: INFO: task sudo:1619 is blocked on a mutex likely owned by task xsk_rr:1612.
Jul 19 16:52:28 tilly02 kernel: task:xsk_rr state:D stack:0 pid:1612 tgid:1612 ppid:1611 task_flags:0x400100 flags:0x00004002
Jul 19 16:52:28 tilly02 kernel: Call Trace:
Jul 19 16:52:28 tilly02 kernel: <TASK>
Jul 19 16:52:28 tilly02 kernel: __schedule+0x493/0x1630
Jul 19 16:52:28 tilly02 kernel: schedule+0x27/0xf0
Jul 19 16:52:28 tilly02 kernel: schedule_timeout+0x85/0x110
Jul 19 16:52:28 tilly02 kernel: ? __pfx_process_timeout+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: msleep+0x34/0x60
Jul 19 16:52:28 tilly02 kernel: napi_stop_kthread+0x78/0x80
Jul 19 16:52:28 tilly02 kernel: napi_set_threaded+0x33/0xc0
Jul 19 16:52:28 tilly02 kernel: napi_enable_locked+0xb5/0x250
Jul 19 16:52:28 tilly02 kernel: mlx5e_activate_priv_channels+0x1bc/0x490 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: mlx5e_switch_priv_channels+0xeb/0x150 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: mlx5e_safe_switch_params+0xef/0x140 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: mlx5e_xdp_set+0xd0/0x220 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: ? __pfx_mlx5e_xdp+0x10/0x10 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: mlx5e_xdp+0x47/0x60 [mlx5_core]
Jul 19 16:52:28 tilly02 kernel: dev_xdp_install+0x154/0x320
Jul 19 16:52:28 tilly02 kernel: dev_xdp_attach+0x23f/0x9d0
Jul 19 16:52:28 tilly02 kernel: ? __bpf_prog_get+0x1f/0xf0
Jul 19 16:52:28 tilly02 kernel: dev_change_xdp_fd+0x164/0x210
Jul 19 16:52:28 tilly02 kernel: do_setlink.isra.0+0x110a/0x12c0
Jul 19 16:52:28 tilly02 kernel: ? __call_rcu_common+0x233/0x730
Jul 19 16:52:28 tilly02 kernel: ? __rmqueue_pcplist+0x86e/0xed0
Jul 19 16:52:28 tilly02 kernel: ? __nla_validate_parse+0x5a/0xe30
Jul 19 16:52:28 tilly02 kernel: ? ns_capable+0x2a/0x60
Jul 19 16:52:28 tilly02 kernel: rtnl_setlink+0x289/0x600
Jul 19 16:52:28 tilly02 kernel: ? __memcg_slab_post_alloc_hook+0x1b0/0x3e0
Jul 19 16:52:28 tilly02 kernel: ? security_capable+0x77/0x1c0
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnl_setlink+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: rtnetlink_rcv_msg+0x37b/0x450
Jul 19 16:52:28 tilly02 kernel: ? bpf_map_kzalloc+0xd1/0x110
Jul 19 16:52:28 tilly02 kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jul 19 16:52:28 tilly02 kernel: netlink_rcv_skb+0x59/0x110
Jul 19 16:52:28 tilly02 kernel: rtnetlink_rcv+0x15/0x30
Jul 19 16:52:28 tilly02 kernel: netlink_unicast+0x27f/0x3d0
Jul 19 16:52:28 tilly02 kernel: netlink_sendmsg+0x214/0x470
Jul 19 16:52:28 tilly02 kernel: __sys_sendto+0x23a/0x250
Jul 19 16:52:28 tilly02 kernel: __x64_sys_sendto+0x24/0x40
Jul 19 16:52:28 tilly02 kernel: x64_sys_call+0x1c32/0x2660
Jul 19 16:52:28 tilly02 kernel: do_syscall_64+0x80/0x9a0
Jul 19 16:52:28 tilly02 kernel: ? vmf_insert_pfn_prot+0x99/0x100
Jul 19 16:52:28 tilly02 kernel: ? vmf_insert_pfn+0x12/0x20
Jul 19 16:52:28 tilly02 kernel: ? vvar_fault+0xa1/0x110
Jul 19 16:52:28 tilly02 kernel: ? special_mapping_fault+0x1e/0xd0
Jul 19 16:52:28 tilly02 kernel: ? __do_fault+0x3a/0x190
Jul 19 16:52:28 tilly02 kernel: ? do_fault+0x2d5/0x570
Jul 19 16:52:28 tilly02 kernel: ? __handle_mm_fault+0x838/0x1070
Jul 19 16:52:28 tilly02 kernel: ? count_memcg_events+0x180/0x200
Jul 19 16:52:28 tilly02 kernel: ? sched_clock_noinstr+0x9/0x10
Jul 19 16:52:28 tilly02 kernel: ? sched_clock+0x10/0x30
Jul 19 16:52:28 tilly02 kernel: ? get_vtime_delta+0x14/0xc0
Jul 19 16:52:28 tilly02 kernel: ? ct_kernel_exit.isra.0+0x84/0xb0
Jul 19 16:52:28 tilly02 kernel: ? __ct_user_enter+0x72/0x100
Jul 19 16:52:28 tilly02 kernel: ? irqentry_exit_to_user_mode+0x167/0x270
Jul 19 16:52:28 tilly02 kernel: ? irqentry_exit+0x43/0x50
Jul 19 16:52:28 tilly02 kernel: ? exc_page_fault+0x90/0x1b0
Jul 19 16:52:28 tilly02 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 19 16:52:28 tilly02 kernel: RIP: 0033:0x7e1395f2bead
Jul 19 16:52:28 tilly02 kernel: RSP: 002b:00007ffd2ac39398 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
Jul 19 16:52:28 tilly02 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007e1395f2bead
Jul 19 16:52:28 tilly02 kernel: RDX: 0000000000000034 RSI: 00007ffd2ac39420 RDI: 0000000000000008
Jul 19 16:52:28 tilly02 kernel: RBP: 00007ffd2ac393f0 R08: 0000000000000000 R09: 0000000000000000
Jul 19 16:52:28 tilly02 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000019
Jul 19 16:52:28 tilly02 kernel: R13: 0000000000000000 R14: 00006305503a1d78 R15: 00007e1396267000
Jul 19 16:52:28 tilly02 kernel: </TASK>
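When sorting through several reports like these, it can help to pull the waiter/owner pairs out of the "is blocked on a mutex likely owned by" lines. A minimal sketch follows; `struct hung_pair`, `split_comm_pid()` and `parse_hung_owner()` are hypothetical helpers, and the sketch assumes each message sits on a single line as the kernel emits it. The task/pid split is done at the last colon because names such as kworker/u129:1 contain colons themselves.

```c
#include <stdlib.h>
#include <string.h>

struct hung_pair {
	char waiter[64];
	int waiter_pid;
	char owner[64];
	int owner_pid;
};

/* Split the len-byte "comm:pid" slice at s at its LAST ':'. */
static int split_comm_pid(const char *s, size_t len, char *name,
			  size_t name_sz, int *pid)
{
	const char *colon = NULL;

	for (size_t i = 0; i < len; i++)
		if (s[i] == ':')
			colon = s + i;
	if (!colon || (size_t)(colon - s) >= name_sz)
		return -1;
	memcpy(name, s, (size_t)(colon - s));
	name[colon - s] = '\0';
	*pid = atoi(colon + 1); /* stops at the first non-digit */
	return 0;
}

/* Returns 0 and fills *out for a "blocked on a mutex" line, -1 otherwise. */
static int parse_hung_owner(const char *line, struct hung_pair *out)
{
	static const char pre[] = "INFO: task ";
	static const char mid[] = " is blocked on a mutex likely owned by task ";
	const char *w = strstr(line, pre);
	const char *m, *o;
	size_t olen;

	if (!w)
		return -1;
	w += sizeof(pre) - 1;
	m = strstr(w, mid);
	if (!m)
		return -1;
	if (split_comm_pid(w, (size_t)(m - w), out->waiter,
			   sizeof(out->waiter), &out->waiter_pid))
		return -1;
	o = m + sizeof(mid) - 1;
	olen = strlen(o);
	/* Drop the trailing '.' (and any newline/space) after the pid. */
	while (olen && (o[olen - 1] == '.' || o[olen - 1] == '\n' ||
			o[olen - 1] == ' '))
		olen--;
	return split_comm_pid(o, olen, out->owner, sizeof(out->owner),
			      &out->owner_pid);
}
```

Run over the mlx5 log, this yields kworker/u129:1 (255) and sudo (1619), both waiting on xsk_rr (1612); the plain "blocked for more than 122 seconds" lines are rejected.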
> The following histogram measures the time spent in recvfrom when busy
> polling inline in the application thread with SO_BUSY_POLL. It was
> generated with the bpftrace command below. In this experiment there
> are 32K packets per second and the application processing delay is
> 30 usecs. The goal is to check whether pulling packets from the
> descriptor queue takes long enough to affect overall latency when it
> is done inline.
>
> ```
> bpftrace -e '
> kprobe:xsk_recvmsg {
>   @start[tid] = nsecs;
> }
> kretprobe:xsk_recvmsg {
>   if (@start[tid]) {
>     $sample = (nsecs - @start[tid]);
>     @xsk_recvfrom_hist = hist($sample);
>     delete(@start[tid]);
>   }
> }
> END { clear(@start); }'
> ```
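bpftrace's hist() groups samples into power-of-two buckets, which is why the output is labelled [128, 256), [256, 512) and so on; since the probes record nsecs deltas, the labels are in nanoseconds, so [1K, 2K) means 1-2 usecs. A minimal C sketch of that bucketing, assuming [2^k, 2^(k+1)) buckets; `log2_bucket_lo()` is a hypothetical helper, not bpftrace code:

```c
#include <stdint.h>

/* Lower bound of the power-of-two bucket holding v, i.e. the largest
 * 2^k <= v, or 0 for v == 0. Halving v first avoids overflow. */
static uint64_t log2_bucket_lo(uint64_t v)
{
	uint64_t lo = 1;

	if (v == 0)
		return 0;
	while (lo <= v / 2)
		lo <<= 1;
	return lo;
}
```

For example, a 1500 ns sample lands in the bucket starting at 1024, the [1K, 2K) row.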
>
> With inline busy polling, about 25 percent of calls take 1-2 usecs
> and about 40 percent take 0.5-2 usecs.
>
> @xsk_recvfrom_hist:
> [128, 256) 24073 |@@@@@@@@@@@@@@@@@@@@@@ |
> [256, 512) 55633 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [512, 1K) 20974 |@@@@@@@@@@@@@@@@@@@ |
> [1K, 2K) 34234 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [2K, 4K) 3266 |@@@ |
> [4K, 8K) 19 | |
>
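The bucket shares can be checked directly against the counts above. A small sketch, with the counts copied from @xsk_recvfrom_hist; `hist_share_pct()` is a hypothetical helper that only counts buckets lying entirely inside the requested range:

```c
#include <stddef.h>

/* Copied from @xsk_recvfrom_hist above: bucket lower bounds (ns), counts. */
static const unsigned long hist_lo_ns[] = { 128, 256, 512, 1024, 2048, 4096 };
static const unsigned long hist_count[] = { 24073, 55633, 20974, 34234, 3266, 19 };

/* Percentage of samples whose whole bucket [lo, 2*lo) lies inside
 * [from_ns, to_ns). Buckets straddling a boundary are excluded. */
static double hist_share_pct(unsigned long from_ns, unsigned long to_ns)
{
	size_t n = sizeof(hist_count) / sizeof(hist_count[0]);
	unsigned long total = 0, in_range = 0;

	for (size_t i = 0; i < n; i++) {
		total += hist_count[i];
		if (hist_lo_ns[i] >= from_ns && 2 * hist_lo_ns[i] <= to_ns)
			in_range += hist_count[i];
	}
	return total ? 100.0 * (double)in_range / (double)total : 0.0;
}
```

Out of 138199 samples, hist_share_pct(1024, 2048) is about 24.8 and hist_share_pct(512, 2048) about 39.9.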
> v6:
> - Moved threaded in struct net_device up to fill a cacheline hole.
> - Changed dev_set_threaded to dev_set_threaded_hint and removed the
> second argument, which all drivers always set to true. Exported only
> dev_set_threaded_hint and made dev_set_threaded a core-only function.
> This change is done in a separate commit.
> - Updated documentation comment for threaded in struct net_device.
> - gro_flush_helper renamed to gro_flush_normal and moved to gro.h. Also
> used it in kernel/bpf/cpumap.c
> - Updated documentation to state explicitly that NAPI threaded busy
> polling keeps the CPU core busy at 100% usage.
> - Updated documentation and commit messages.
>
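The dev_set_threaded_hint change is a common refactor: when every caller passes the same constant, export a narrower wrapper and keep the general function internal. A userspace sketch of that shape only; `struct mock_net_device` and both mock functions are hypothetical stand-ins, not the kernel's definitions or signatures:

```c
#include <stdbool.h>

/* Hypothetical stand-in; not the kernel's struct net_device. */
struct mock_net_device {
	bool threaded;
};

/* Core-only setter: keeps the explicit argument, not exported. */
static int mock_dev_set_threaded(struct mock_net_device *dev, bool threaded)
{
	dev->threaded = threaded;
	return 0;
}

/* Driver-facing wrapper: every driver passed true, so the argument
 * is dropped and the wrapper always requests threaded mode. */
int mock_dev_set_threaded_hint(struct mock_net_device *dev)
{
	return mock_dev_set_threaded(dev, true);
}
```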
> v5:
> - Updated experiment data with 'SO_PREFER_BUSY_POLL' usage as
> suggested.
> - Sent 'Add support to set napi threaded for individual napi'
> separately. This series depends on that patch.
> https://lore.kernel.org/netdev/20250423201413.1564527-1-skhawaja@google.com/
> - Added a separate patch to use enum for napi threaded state. Updated
> the nl_netdev python test.
> - Using "write all" semantics when napi settings are set at device
> level. This aligns with the existing behaviour for other settings.
> - Fixed comments to make them kdoc compatible.
> - Updated Documentation/networking/net_cachelines/net_device.rst
> - Applied the previously missed gro_flush change in napi_complete_done.
>
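"Write all" here means that a device-level write overwrites the value of every NAPI instance, including ones previously configured individually. A minimal model of that behaviour; the mock structs, the fixed array of four NAPIs and both functions are hypothetical:

```c
#include <stddef.h>

/* Hypothetical model of per-NAPI settings under one device. */
struct mock_napi {
	int threaded;
};

struct mock_dev {
	struct mock_napi napi[4];
	size_t n_napi;
	int threaded;
};

/* Per-NAPI write: touches only this instance. */
static void mock_napi_set_threaded(struct mock_napi *napi, int threaded)
{
	napi->threaded = threaded;
}

/* Device-level write: records the device value, then overwrites every
 * NAPI, discarding any per-NAPI override. */
static void mock_dev_set_threaded_all(struct mock_dev *dev, int threaded)
{
	dev->threaded = threaded;
	for (size_t i = 0; i < dev->n_napi; i++)
		mock_napi_set_threaded(&dev->napi[i], threaded);
}
```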
> v4:
> - Using an AF_XDP-based benchmark for experiments.
> - Re-enable device-level napi threaded busy poll after a soft reset.
>
> v3:
> - Fixed calls to dev_set_threaded in drivers
>
> v2:
> - Add documentation in napi.rst.
> - Provide experiment data and usecase details.
> - Update busy_poller selftest to include napi threaded poll testcase.
> - Define threaded mode enum in netlink interface.
> - Included NAPI threaded state in napi config to save/restore.
>
> Samiullah Khawaja (5):
> net: Create separate gro_flush_normal function
> net: Use dev_set_threaded_hint instead of dev_set_threaded in drivers
> net: define an enum for the napi threaded state
> Extend napi threaded polling to allow kthread based busy polling
> selftests: Add napi threaded busy poll test in `busy_poller`
>
> Documentation/ABI/testing/sysfs-class-net | 3 +-
> Documentation/netlink/specs/netdev.yaml | 14 ++-
> Documentation/networking/napi.rst | 63 +++++++++++-
> .../networking/net_cachelines/net_device.rst | 2 +-
> .../net/ethernet/atheros/atl1c/atl1c_main.c | 2 +-
> drivers/net/ethernet/mellanox/mlxsw/pci.c | 2 +-
> drivers/net/ethernet/renesas/ravb_main.c | 2 +-
> drivers/net/wireguard/device.c | 2 +-
> drivers/net/wireless/ath/ath10k/snoc.c | 2 +-
> drivers/net/wireless/mediatek/mt76/debugfs.c | 2 +-
> include/linux/netdevice.h | 18 +++-
> include/net/gro.h | 6 ++
> include/uapi/linux/netdev.h | 6 ++
> kernel/bpf/cpumap.c | 3 +-
> net/core/dev.c | 97 +++++++++++++++----
> net/core/dev.h | 16 ++-
> net/core/net-sysfs.c | 2 +-
> net/core/netdev-genl-gen.c | 2 +-
> net/core/netdev-genl.c | 2 +-
> tools/include/uapi/linux/netdev.h | 6 ++
> tools/testing/selftests/net/busy_poll_test.sh | 25 ++++-
> tools/testing/selftests/net/busy_poller.c | 14 ++-
> tools/testing/selftests/net/nl_netdev.py | 36 +++----
> 23 files changed, 257 insertions(+), 70 deletions(-)
>
>
> base-commit: c3886ccaadf8fdc2c91bfbdcdca36ccdc6ef8f70