* Re: [PATCH] net: airoha: Fix skb->priority underflow in airoha_dev_select_queue()
From: Lorenzo Bianconi @ 2026-06-14 8:09 UTC
To: Wayen.Yan
Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <6a2de8c5.2c570c9e.53b1a.0e1b@mx.google.com>
> In airoha_dev_select_queue(), the expression:
>
> queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES;
>
> implicitly converts to unsigned arithmetic: when skb->priority is 0
> (the default for unclassified traffic), (0u - 1u) wraps to UINT_MAX,
> and UINT_MAX % 8 = 7, routing default best-effort packets to the
> highest-priority QoS queue. This causes QoS inversion where the
> majority of traffic on a PON gateway starves actual high-priority
> flows (VoIP, gaming, etc.).
>
> Fix by guarding the subtraction: when priority is 0, map to queue 0
> (lowest priority), otherwise apply the original (priority - 1) % 8
> mapping.
>
> Fixes: 2b288b81560b ("net: airoha: Introduce ndo_select_queue callback")
> Signed-off-by: Wayen <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 31cdb11cd7..d476ef83c3 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1933,7 +1933,7 @@ static u16 airoha_dev_select_queue(struct net_device *dev, struct sk_buff *skb,
> */
> channel = netdev_uses_dsa(dev) ? skb_get_queue_mapping(skb) : port->id;
> channel = channel % AIROHA_NUM_QOS_CHANNELS;
> - queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES; /* QoS queue */
> + queue = skb->priority ? (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES : 0;
> queue = channel * AIROHA_NUM_QOS_QUEUES + queue;
>
> return queue < dev->num_tx_queues ? queue : 0;
> --
> 2.51.0
>
>
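A minimal userspace sketch of the old and new mappings. It assumes AIROHA_NUM_QOS_QUEUES is 8 (the "UINT_MAX % 8 = 7" arithmetic above implies this) and models skb->priority as a uint32_t. The function names and SKETCH_NUM_QOS_QUEUES are hypothetical and are not driver code:

```c
#include <stdint.h>

/* Assumption: stands in for AIROHA_NUM_QOS_QUEUES == 8. */
#define SKETCH_NUM_QOS_QUEUES 8u

/* Pre-fix mapping: priority 0 wraps to UINT32_MAX, and
 * UINT32_MAX % 8 == 7, so best-effort traffic lands on queue 7. */
static uint32_t queue_old(uint32_t priority)
{
	return (priority - 1) % SKETCH_NUM_QOS_QUEUES;
}

/* Post-fix mapping: priority 0 goes to queue 0; every other
 * priority keeps the original (priority - 1) % 8 mapping. */
static uint32_t queue_new(uint32_t priority)
{
	return priority ? (priority - 1) % SKETCH_NUM_QOS_QUEUES : 0;
}
```

Only priority 0 changes queues: it maps to queue 7 under the old expression and to queue 0 under the new one, while priorities 1..8 map to queues 0..7 either way.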
* Re: [PATCH] net: airoha: Remove dead MT7996 NPU firmware declarations
From: Lorenzo Bianconi @ 2026-06-14 8:16 UTC
To: Wayen.Yan
Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <6a2dea77.01c4f138.336eeb.a256@mx.google.com>
> Remove the NPU_EN7581_7996_FIRMWARE_DATA/RV32 #define macros and
> their corresponding MODULE_FIRMWARE() declarations. Neither the
> en7581_npu_soc_data nor the an7583_npu_soc_data references these
> firmware names, and no firmware loading path in the driver ever
> requests them. The only references are the #define lines themselves
> and the MODULE_FIRMWARE() declarations below.
>
> Keeping dead MODULE_FIRMWARE entries causes modprobe/udev to attempt
> pre-loading non-existent firmware files, generating kernel log noise
> and misleading distributors about which firmware files to package.
>
> Fixes: 23290c7bc190 ("net: airoha: Introduce Airoha NPU support")
> Signed-off-by: Wayen <win847@gmail.com>
Please drop this patch since EN7581_7996 firmware is defined via dts
for 7581:
commit 3847173525e307ebcd23bd4863da943ea78b0057
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue Jan 20 11:17:18 2026 +0100
net: airoha: npu: Add the capability to read firmware names from dts
Introduce the capability to read the firmware binary names from device-tree
using the firmware-name property if available.
This patch is needed because NPU firmware binaries are board specific since
they depend on the MediaTek WiFi chip used on the board (e.g. MT7996 or
MT7992) and the WiFi chip version info is not available in the NPU driver.
This is a preliminary patch to enable MT76 NPU offloading if the Airoha SoC
is equipped with MT7996 (Eagle) WiFi chipset.
https://github.com/openwrt/openwrt/blob/main/target/linux/airoha/dts/an7581-npu-mt7996.dtsi
and these macros are used here to notify userspace about firmware loading.
Regards,
Lorenzo
> ---
> drivers/net/ethernet/airoha/airoha_npu.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_npu.c b/drivers/net/ethernet/airoha/airoha_npu.c
> index 17dbdc8325..93095f3894 100644
> --- a/drivers/net/ethernet/airoha/airoha_npu.c
> +++ b/drivers/net/ethernet/airoha/airoha_npu.c
> @@ -16,8 +16,6 @@
>
> #define NPU_EN7581_FIRMWARE_DATA "airoha/en7581_npu_data.bin"
> #define NPU_EN7581_FIRMWARE_RV32 "airoha/en7581_npu_rv32.bin"
> -#define NPU_EN7581_7996_FIRMWARE_DATA "airoha/en7581_MT7996_npu_data.bin"
> -#define NPU_EN7581_7996_FIRMWARE_RV32 "airoha/en7581_MT7996_npu_rv32.bin"
> #define NPU_AN7583_FIRMWARE_DATA "airoha/an7583_npu_data.bin"
> #define NPU_AN7583_FIRMWARE_RV32 "airoha/an7583_npu_rv32.bin"
> #define NPU_EN7581_FIRMWARE_RV32_MAX_SIZE 0x200000
> @@ -822,8 +820,6 @@ module_platform_driver(airoha_npu_driver);
>
> MODULE_FIRMWARE(NPU_EN7581_FIRMWARE_DATA);
> MODULE_FIRMWARE(NPU_EN7581_FIRMWARE_RV32);
> -MODULE_FIRMWARE(NPU_EN7581_7996_FIRMWARE_DATA);
> -MODULE_FIRMWARE(NPU_EN7581_7996_FIRMWARE_RV32);
> MODULE_FIRMWARE(NPU_AN7583_FIRMWARE_DATA);
> MODULE_FIRMWARE(NPU_AN7583_FIRMWARE_RV32);
> MODULE_LICENSE("GPL");
> --
> 2.51.0
>
>
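The selection the quoted commit describes (use the board's firmware-name property when it is present, otherwise fall back to the built-in names) can be sketched as follows. Everything here is hypothetical: the struct, the function, and modelling the DT property as an optional pair of strings are assumptions, not the driver's code. The default names are the NPU_EN7581_FIRMWARE_* values from the quoted diff:

```c
#include <stddef.h>

/* Built-in defaults, copied from the NPU_EN7581_FIRMWARE_* macros above. */
static const char *const sketch_default_fw[2] = {
	"airoha/en7581_npu_data.bin",
	"airoha/en7581_npu_rv32.bin",
};

/* Hypothetical model of a parsed firmware-name property: a NULL entry
 * means the board's device tree did not provide that name. */
struct sketch_dt_fw_names {
	const char *names[2];
};

/* Return the name for slot idx (0 = data, 1 = rv32 in this sketch):
 * the DT-provided name if present, otherwise the built-in default. */
static const char *sketch_pick_fw(const struct sketch_dt_fw_names *dt, int idx)
{
	if (dt && dt->names[idx])
		return dt->names[idx];
	return sketch_default_fw[idx];
}
```

Under this model, a board whose DTS supplies the MT7996-specific names loads those files instead of the defaults, so the names remain reachable even though no soc_data entry references them.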
* Re: net: airoha: Remove dead MT7996 NPU firmware declarations
From: win847 @ 2026-06-14 8:22 UTC
To: netdev; +Cc: lorenzo
In-Reply-To: <6a2dea77.01c4f138.336eeb.a256@mx.google.com>
Thank you for the clarification, Lorenzo.
You're right - I missed the DTS-based firmware loading mechanism
introduced in commit 3847173525e30. The MODULE_FIRMWARE() macros
are indeed used to notify userspace for firmware loading, and the
firmware names are defined via DTS for different board configurations.
I'll drop this patch.
Regards,
Wayen
* [PATCH v3 1/3] net/smc: bound the wire-controlled producer cursor to the RMB
From: Bryam Vargas via B4 Relay @ 2026-06-14 8:23 UTC
To: Wenjia Zhang, Dust Li, D. Wythe, Sidraya Jayagond
Cc: Eric Dumazet, David S. Miller, Mahanta Jambigi, Wen Gu,
Simon Horman, netdev, Ursula Braun, Stefan Raspl, linux-s390,
Paolo Abeni, linux-kernel, linux-rdma, Jakub Kicinski, Tony Lu
In-Reply-To: <20260614-b4-disp-edd64be9-v3-0-551fa514257e@proton.me>
From: Bryam Vargas <hexlabsecurity@proton.me>
smc_cdc_cursor_to_host() (SMC-R) and smcd_cdc_msg_to_host() (SMC-D)
import the peer's producer cursor from the wire into the local
connection cursor with no upper bound against the receive buffer (RMB).
The urgent path then uses that count as a raw index:
base = conn->rmb_desc->cpu_addr + conn->rx_off;
conn->urg_rx_byte = *(base + conn->urg_curs.count - 1);
so a peer that advertises a producer cursor past rmb_desc->len reads
out of bounds of the RMB allocation in the receive tasklet (softirq).
Bound the producer cursor count to rmb_desc->len at the conversion
boundary, for both SMC-R and SMC-D. Apply the bound to the producer
cursor only: the consumer cursor indexes the peer's RMB and is bounded
by peer_rmbe_size, so clamping it to our rmb_desc->len would
under-credit peer_rmbe_space and stall transmit to a peer whose RMB is
larger than ours.
Fixes: de8474eb9d50 ("net/smc: urgent data support")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
net/smc/smc_cdc.h | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h
index 696cc11f2303..ca76ef630356 100644
--- a/net/smc/smc_cdc.h
+++ b/net/smc/smc_cdc.h
@@ -221,7 +221,8 @@ static inline void smc_host_msg_to_cdc(struct smc_cdc_msg *peer,
static inline void smc_cdc_cursor_to_host(union smc_host_cursor *local,
union smc_cdc_cursor *peer,
- struct smc_connection *conn)
+ struct smc_connection *conn,
+ int max_count)
{
union smc_host_cursor temp, old;
union smc_cdc_cursor net;
@@ -235,6 +236,15 @@ static inline void smc_cdc_cursor_to_host(union smc_host_cursor *local,
if ((old.wrap == temp.wrap) &&
(old.count > temp.count))
return;
+ /* The peer producer cursor is wire-controlled and is later used as a
+ * raw index into our RMB by the urgent path; bound its count to the
+ * RMB. max_count == 0 leaves the consumer cursor unbounded here: it
+ * indexes the peer's RMB (bounded by peer_rmbe_size, not our
+ * rmb_desc->len), so clamping it to rmb_desc->len would under-credit
+ * peer_rmbe_space and stall transmit to peers with a larger RMB.
+ */
+ if (max_count && temp.count > max_count)
+ temp.count = max_count;
smc_curs_copy(local, &temp, conn);
}
@@ -246,8 +256,13 @@ static inline void smcr_cdc_msg_to_host(struct smc_host_cdc_msg *local,
local->len = peer->len;
local->seqno = ntohs(peer->seqno);
local->token = ntohl(peer->token);
- smc_cdc_cursor_to_host(&local->prod, &peer->prod, conn);
- smc_cdc_cursor_to_host(&local->cons, &peer->cons, conn);
+ /* bound the wire-controlled producer cursor to our RMB (used as a raw
+ * index by the urgent path); leave the consumer cursor unbounded -- it
+ * indexes the peer's RMB and is bounded by peer_rmbe_size.
+ */
+ smc_cdc_cursor_to_host(&local->prod, &peer->prod, conn,
+ conn->rmb_desc->len);
+ smc_cdc_cursor_to_host(&local->cons, &peer->cons, conn, 0);
local->prod_flags = peer->prod_flags;
local->conn_state_flags = peer->conn_state_flags;
}
@@ -260,6 +275,12 @@ static inline void smcd_cdc_msg_to_host(struct smc_host_cdc_msg *local,
temp.wrap = peer->prod.wrap;
temp.count = peer->prod.count;
+ /* the peer producer cursor is wire-controlled and is used as a raw
+ * index into our RMB by the urgent path; bound it to the RMB. The
+ * consumer cursor below indexes the peer's RMB and is left unbounded.
+ */
+ if (temp.count > conn->rmb_desc->len)
+ temp.count = conn->rmb_desc->len;
smc_curs_copy(&local->prod, &temp, conn);
temp.wrap = peer->cons.wrap;
--
2.43.0
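The conversion-time bound in this patch can be modelled in userspace. This sketch assumes a simplified cursor, because the kernel's union smc_host_cursor is more involved and is copied via smc_curs_copy(). It also omits the byte-order conversion and the stale-cursor early returns of smc_cdc_cursor_to_host(). All names are hypothetical. A max_count of 0 means "no bound", matching how the patch passes 0 for the consumer cursor:

```c
#include <stdint.h>

/* Simplified stand-in for union smc_host_cursor (assumption). */
struct sketch_cursor {
	uint16_t wrap;
	uint32_t count;
};

/* Import a wire cursor into the local one. max_count == 0 leaves the count
 * unbounded (consumer cursor); otherwise it is clamped to max_count
 * (producer cursor, bounded by the local RMB length). */
static void sketch_cursor_to_host(struct sketch_cursor *local,
				  const struct sketch_cursor *wire,
				  uint32_t max_count)
{
	struct sketch_cursor temp = *wire;

	if (max_count && temp.count > max_count)
		temp.count = max_count;
	*local = temp;
}

/* Offset the urgent path would read (base + count - 1). Returns -1 when that
 * offset would fall outside an RMB of rmb_len bytes. */
static long sketch_urg_index(const struct sketch_cursor *prod, uint32_t rmb_len)
{
	if (prod->count == 0 || prod->count > rmb_len)
		return -1;
	return (long)prod->count - 1;
}
```

For example, with a 4096-byte RMB, a forged producer count of 5000 is clamped to 4096, so the urgent byte is read from offset 4095. The same wire value imported as a consumer cursor keeps its full count.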
* [PATCH v3 0/3] net/smc: bound wire-controlled CDC cursors against the local buffers
From: Bryam Vargas via B4 Relay @ 2026-06-14 8:23 UTC
To: Wenjia Zhang, Dust Li, D. Wythe, Sidraya Jayagond
Cc: Eric Dumazet, David S. Miller, Mahanta Jambigi, Wen Gu,
Simon Horman, netdev, Ursula Braun, Stefan Raspl, linux-s390,
Paolo Abeni, linux-kernel, linux-rdma, Jakub Kicinski, Tony Lu
A peer's CDC producer/consumer cursors are copied from the wire and used,
without an upper bound against the local buffers, as (a) a raw index into the
RMB on the urgent path, (b) the receive length in smc_rx_recvmsg(), and (c) the
send length in smc_tx_sendmsg() on the SMC-D DMB-merge path. A malicious or
buggy peer can forge a cursor so each of these runs past the relevant buffer:
an out-of-bounds read of adjacent kernel memory (disclosed to the peer) on the
receive/urgent side, and an out-of-bounds write of attacker-influenced length
and content on the send side.
This series bounds each wire-controlled value at its point of use against the
local buffer, enforcing invariants the code already documents
("0 <= bytes_to_rcv <= rmb_desc->len", "0 <= sndbuf_space <= sndbuf_desc->len").
Conforming peers always keep these values in range, so the bounds are no-ops in
normal operation.
1/3 bounds the producer cursor count to rmb_desc->len at the SMC-R/SMC-D
conversion boundary (the urgent-path raw index). The bound is applied to
the producer cursor only -- the consumer cursor indexes the peer's RMB and
is bounded by peer_rmbe_size, so clamping it to our rmb_desc->len would
under-credit peer_rmbe_space and stall transmit to a peer with a larger
RMB.
2/3 bounds the readable count in smc_rx_recvmsg() so the wrap-around copy
cannot read past the RMB.
3/3 bounds the write space in smc_tx_sendmsg() so the wrap-around copy cannot
write past the send buffer.
This supersedes two separately-posted patches and folds them into one series
together with the producer-cursor fix, after review feedback that they share a
root cause:
- net/smc: bound peer producer cursor and bytes_to_rcv on SMC-D CDC receive
https://lore.kernel.org/netdev/20260610084803.186516-1-hexlabsecurity@proton.me/
- net/smc: bound sndbuf_space on the SMC-D DMB-merge receive path
https://lore.kernel.org/netdev/20260610090928.192177-1-hexlabsecurity@proton.me/
Changes since those postings (addressing the review):
- The receive/send bounds were previously applied in the CDC receive tasklet,
after the atomic_add(). As the review noted, that read-then-set is not
atomic, and a recvmsg()/sendmsg() on another CPU can observe the inflated
value in the window between the atomic_add() and the clamp: recvmsg() runs
under lock_sock(), which leaves the slock free, so it is not serialized
against the bh_lock_sock() CDC tasklet. The bound now lives at the consumer,
where the value is used to size the copy, which is race-free.
- The bounds now also reject a negative value (if (x < 0 || x > len)): across
many forged CDC messages the signed accumulator can wrap negative, which a
plain "> len" check misses and min_t(size_t, ...) then turns into a huge
length.
- The SMC-R producer-cursor bound is applied only to the producer cursor at
the call site, not in the shared smc_cdc_cursor_to_host() helper, so the
consumer cursor (bounded by peer_rmbe_size) is no longer truncated.
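The negative-accumulator case can be seen in isolation. This minimal userspace sketch assumes a plain int in place of the atomic_t and a local SKETCH_MIN_SIZE_T macro in place of min_t(size_t, ...); both are hypothetical stand-ins. A negative value passes a one-sided "> len" check and then becomes a huge size_t length. The two-sided bound used in the series instead replaces it with the buffer length:

```c
#include <stddef.h>

/* Stand-in for the kernel's min_t(size_t, a, b) (assumption). */
#define SKETCH_MIN_SIZE_T(a, b) \
	((size_t)(a) < (size_t)(b) ? (size_t)(a) : (size_t)(b))

/* One-sided check: a negative (sign-overflowed) value slips through. */
static size_t sketch_copylen_one_sided(size_t requested, int avail, int len)
{
	if (avail > len)
		avail = len;
	return SKETCH_MIN_SIZE_T(requested, avail);
}

/* Two-sided bound, as in the series: a negative or oversized value is
 * replaced by the buffer length before the conversion to size_t. */
static size_t sketch_copylen_bounded(size_t requested, int avail, int len)
{
	if (avail < 0 || avail > len)
		avail = len;
	return SKETCH_MIN_SIZE_T(requested, avail);
}
```

With a 4 KiB buffer, a 1 MiB request, and an accumulator of -16, the one-sided version yields a copy length of 1 MiB. The bounded version yields 4096.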
Verified with an in-kernel KASAN A/B matrix on x86-64 (SMC-D loopback,
CONFIG_SMC_LO; no special hardware): each sink produces a slab-out-of-bounds
read/write for a forged cursor and is clean with the patch, and both the
cross-CPU race and the negative-accumulator case are reproduced and closed.
Logs available on request.
---
Bryam Vargas (3):
net/smc: bound the wire-controlled producer cursor to the RMB
net/smc: bound the receive length to the RMB in smc_rx_recvmsg()
net/smc: bound the send length to the send buffer in smc_tx_sendmsg()
net/smc/smc_cdc.h | 27 ++++++++++++++++++++++++---
net/smc/smc_rx.c | 12 ++++++++++++
net/smc/smc_tx.c | 13 +++++++++++++
3 files changed, 49 insertions(+), 3 deletions(-)
---
base-commit: 8e65320d91cdc3b241d4b94855c88459b91abf66
change-id: 20260614-b4-disp-edd64be9-b094cf67fded
Best regards,
--
Bryam Vargas <hexlabsecurity@proton.me>
* [PATCH v3 2/3] net/smc: bound the receive length to the RMB in smc_rx_recvmsg()
From: Bryam Vargas via B4 Relay @ 2026-06-14 8:23 UTC
To: Wenjia Zhang, Dust Li, D. Wythe, Sidraya Jayagond
Cc: Eric Dumazet, David S. Miller, Mahanta Jambigi, Wen Gu,
Simon Horman, netdev, Ursula Braun, Stefan Raspl, linux-s390,
Paolo Abeni, linux-kernel, linux-rdma, Jakub Kicinski, Tony Lu
In-Reply-To: <20260614-b4-disp-edd64be9-v3-0-551fa514257e@proton.me>
From: Bryam Vargas <hexlabsecurity@proton.me>
conn->bytes_to_rcv is accumulated in the receive tasklet from the peer's
producer cursor:
diff_prod = smc_curs_diff(rmb_desc->len, &prod_old, &prod_new);
atomic_add(diff_prod, &conn->bytes_to_rcv);
smc_curs_diff()'s differing-wrap branch returns (size - old.count) +
new.count, which exceeds rmb_desc->len for a forged producer cursor and
accumulates across CDC messages, so bytes_to_rcv can grow past the RMB
(and across many messages can overflow the signed counter negative).
smc_rx_recvmsg() reads it as the number of readable bytes and performs a
wrap-around copy whose second chunk (copylen - first_chunk, read from
ring offset 0) is never re-bounded to rmb_desc->len, reading past the
RMB into adjacent kernel memory and disclosing it to the peer.
Bound the readable count to rmb_desc->len where it is consumed, treating
a negative (sign-overflowed) value as out of range too, so the copy
length can never exceed the ring. This enforces the documented
0 <= bytes_to_rcv <= rmb_desc->len invariant at the consumer, where it
is race-free against the producer update that runs in the receive
tasklet.
Fixes: 952310ccf2d8 ("smc: receive data from RMBE")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
net/smc/smc_rx.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index c1d9b923938d..f461cf10b085 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -442,6 +442,18 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg,
/* initialize variables for 1st iteration of subsequent loop */
/* could be just 1 byte, even after waiting on data above */
readable = smc_rx_data_available(conn, peeked_bytes);
+ /* bytes_to_rcv is accumulated from the peer's wire-controlled
+ * producer cursor; a forged cursor can drive it past the RMB,
+ * or overflow the signed accumulator to a negative value across
+ * many CDC messages (which a plain "> len" check would miss
+ * before the size_t cast below turns it huge). Bound it to the
+ * RMB in either case so the wrap-around copy cannot run past
+ * rmb_desc->len. This enforces the documented
+ * 0 <= bytes_to_rcv <= rmb_desc->len invariant at the consumer,
+ * race-free against the producer update in the receive tasklet.
+ */
+ if (readable < 0 || readable > conn->rmb_desc->len)
+ readable = conn->rmb_desc->len;
splbytes = atomic_read(&conn->splice_pending);
if (!readable || (msg && splbytes)) {
if (splbytes)
--
2.43.0
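The accumulation described above can be sketched in userspace. The sketch re-implements only the cursor arithmetic quoted in the commit message: (size - old.count) + new.count when the wraps differ, and the plain difference otherwise. It is not the kernel's smc_curs_diff(), and all names are hypothetical. A single forged cursor pushes the diff past the RMB size, and the consumer-side bound from this patch brings the readable count back to rmb_desc->len:

```c
#include <stdint.h>

/* Simplified cursor (assumption). */
struct sketch_cursor {
	uint16_t wrap;
	uint32_t count;
};

/* Cursor difference using the arithmetic quoted in the commit message. */
static int sketch_curs_diff(unsigned int size, const struct sketch_cursor *old,
			    const struct sketch_cursor *new_curs)
{
	if (old->wrap != new_curs->wrap)
		return (int)((size - old->count) + new_curs->count);
	return (int)(new_curs->count - old->count);
}

/* Consumer-side bound from this patch, applied to the readable count. */
static int sketch_bound_readable(int readable, int rmb_len)
{
	if (readable < 0 || readable > rmb_len)
		readable = rmb_len;
	return readable;
}
```

For a 4096-byte RMB, moving from {wrap 0, count 1000} to a forged {wrap 1, count 60000} adds 63096 to bytes_to_rcv in a single step. The bound reduces this back to 4096.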
* [PATCH v3 3/3] net/smc: bound the send length to the send buffer in smc_tx_sendmsg()
From: Bryam Vargas via B4 Relay @ 2026-06-14 8:23 UTC
To: Wenjia Zhang, Dust Li, D. Wythe, Sidraya Jayagond
Cc: Eric Dumazet, David S. Miller, Mahanta Jambigi, Wen Gu,
Simon Horman, netdev, Ursula Braun, Stefan Raspl, linux-s390,
Paolo Abeni, linux-kernel, linux-rdma, Jakub Kicinski, Tony Lu
In-Reply-To: <20260614-b4-disp-edd64be9-v3-0-551fa514257e@proton.me>
From: Bryam Vargas <hexlabsecurity@proton.me>
On the SMC-D DMB-merge (nocopy) path, smc_cdc_msg_recv_action() advances
conn->sndbuf_space from the peer's consumer cursor:
diff_tx = smc_curs_diff(sndbuf_desc->len, &tx_curs_fin,
&local_rx_ctrl.cons);
atomic_add(diff_tx, &conn->sndbuf_space);
The consumer cursor is wire-controlled and unvalidated, and
smc_curs_diff()'s differing-wrap branch can return more than
sndbuf_desc->len, so a forged cursor drives sndbuf_space past the send
buffer (and across many CDC messages can overflow the signed counter
negative). smc_tx_sendmsg() reads it as the available write space and
performs a wrap-around copy whose second chunk (copylen - first_chunk,
written at ring offset 0) is never re-bounded to sndbuf_desc->len,
writing user data past the send buffer -- a heap out-of-bounds write of
attacker-influenced length and content.
Bound the write space to sndbuf_desc->len where it is consumed, treating
a negative (sign-overflowed) value as out of range too, so the copy
length can never exceed the ring. This enforces the documented
0 <= sndbuf_space <= sndbuf_desc->len invariant at the producer,
race-free against the CDC tasklet that advances sndbuf_space.
Fixes: cc0ab806fc52 ("net/smc: adapt cursor update when sndbuf and peer DMB are merged")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
net/smc/smc_tx.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 3144b4b1fe29..5916f02060fb 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -233,6 +233,19 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
/* initialize variables for 1st iteration of subsequent loop */
/* could be just 1 byte, even after smc_tx_wait above */
writespace = atomic_read(&conn->sndbuf_space);
+ /* sndbuf_space is advanced from the peer's wire-controlled
+ * consumer cursor on the SMC-D DMB-merge path; a forged cursor
+ * can inflate it past the send buffer, or overflow the signed
+ * accumulator to a negative value across many CDC messages
+ * (which a plain "> len" check would miss before the size_t
+ * cast below turns it huge). Bound it to the send buffer in
+ * either case so the wrap-around write cannot run past
+ * sndbuf_desc->len. This enforces the documented
+ * 0 <= sndbuf_space <= sndbuf_desc->len invariant at the
+ * producer, race-free against the CDC tasklet.
+ */
+ if (writespace < 0 || writespace > conn->sndbuf_desc->len)
+ writespace = conn->sndbuf_desc->len;
/* not more than what user space asked for */
copylen = min_t(size_t, send_remaining, writespace);
/* determine start of sndbuf */
--
2.43.0
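The wrap-around write can be modelled in userspace. This sketch uses a plain byte array as the ring; the kernel copies from the msghdr iterator instead. All names are hypothetical, and prod_off is assumed to be smaller than ring_len. The first chunk runs from the producer offset to the end of the ring, and the second chunk restarts at offset 0. Only the bound on writespace keeps that second chunk inside the buffer:

```c
#include <stddef.h>
#include <string.h>

/* Copy up to `writespace` bytes of src into a ring of ring_len bytes,
 * starting at prod_off and wrapping to offset 0. Returns bytes written. */
static size_t sketch_ring_write(unsigned char *ring, int ring_len, size_t prod_off,
				const unsigned char *src, size_t len, int writespace)
{
	size_t copylen, first_chunk;

	/* Bound from this patch: negative or oversized space becomes ring_len. */
	if (writespace < 0 || writespace > ring_len)
		writespace = ring_len;
	copylen = len < (size_t)writespace ? len : (size_t)writespace;

	first_chunk = (size_t)ring_len - prod_off;	/* bytes up to the end */
	if (first_chunk > copylen)
		first_chunk = copylen;
	memcpy(ring + prod_off, src, first_chunk);
	/* Second chunk restarts at offset 0. Because copylen <= ring_len,
	 * it is at most prod_off bytes and stays inside the ring. */
	memcpy(ring, src + first_chunk, copylen - first_chunk);
	return copylen;
}
```

For example, with a 16-byte ring, prod_off 10, and a forged negative writespace, a 32-byte write is cut to 16 bytes: 6 bytes go at the tail and 10 at the head, and nothing lands past the ring.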
* Re: [PATCH net] kcm: use WRITE_ONCE() when changing lower socket callbacks
From: patchwork-bot+netdevbpf @ 2026-06-14 8:30 UTC
To: Runyu Xiao
Cc: davem, edumazet, kuba, pabeni, netdev, horms, linux-kernel,
jianhao.xu
In-Reply-To: <20260611053543.2429462-1-runyu.xiao@seu.edu.cn>
Hello:
This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Thu, 11 Jun 2026 13:35:43 +0800 you wrote:
> kcm_attach() replaces a live lower TCP socket's sk_data_ready and
> sk_write_space callbacks with KCM handlers, and kcm_unattach() restores
> them later. Those callback-pointer updates are still plain stores even
> though the same fields can be read and invoked concurrently on other
> CPUs.
>
> If another CPU observes an older callback snapshot after the live field
> has already been restored, callback execution can run with a mismatched
> target and sk_user_data state, leading to stale or misdirected wakeups.
>
> [...]
Here is the summary with links:
- [net] kcm: use WRITE_ONCE() when changing lower socket callbacks
https://git.kernel.org/netdev/net/c/47186409c092
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
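In userspace C11 terms, WRITE_ONCE() and READ_ONCE() on a callback pointer roughly correspond to relaxed atomic stores and loads. Each access is a single, untorn access that the compiler may not split, fuse, or repeat. The sketch below is an analogy under that assumption, with hypothetical names; it is not kcm code. The attach and unattach paths publish the pointer with one store each. The reader loads the pointer once and calls that snapshot rather than re-reading the field:

```c
#include <stdatomic.h>

typedef void (*sketch_cb_t)(int *counter);

static void sketch_default_ready(int *counter) { *counter += 1; }
static void sketch_kcm_ready(int *counter)     { *counter += 100; }

/* Hypothetical stand-in for the lower socket's sk_data_ready field. */
struct sketch_sock {
	_Atomic(sketch_cb_t) data_ready;
	sketch_cb_t saved_data_ready;
};

/* Attach: save the old callback and publish the new one with a single
 * store (analogous to WRITE_ONCE()). */
static void sketch_attach(struct sketch_sock *sk)
{
	sk->saved_data_ready =
		atomic_load_explicit(&sk->data_ready, memory_order_relaxed);
	atomic_store_explicit(&sk->data_ready, sketch_kcm_ready,
			      memory_order_relaxed);
}

/* Unattach: restore the saved callback with a single store. */
static void sketch_unattach(struct sketch_sock *sk)
{
	atomic_store_explicit(&sk->data_ready, sk->saved_data_ready,
			      memory_order_relaxed);
}

/* Reader: load the pointer once (analogous to READ_ONCE()) and call it. */
static void sketch_fire(struct sketch_sock *sk, int *counter)
{
	sketch_cb_t cb = atomic_load_explicit(&sk->data_ready,
					      memory_order_relaxed);

	cb(counter);
}
```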
* Re: [PATCH] net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive
From: Bryam Vargas @ 2026-06-14 8:32 UTC
To: Jakub Kicinski
Cc: D . Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
Mahanta Jambigi, Tony Lu, Wen Gu, David S . Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Stefan Raspl, Ursula Braun, linux-rdma,
linux-s390, netdev, linux-kernel
In-Reply-To: <20260614003111.383195-1-kuba@kernel.org>
On Sat, 13 Jun 2026 17:31:11 -0700, Jakub Kicinski wrote:
> Is this clamp safe against a concurrent smc_rx_recvmsg() on another CPU?
Confirmed -- the tasklet read-then-set is racy: a recvmsg()/sendmsg() on
another CPU reads the inflated value in the window between the
atomic_add() and the clamp (recvmsg() runs under lock_sock(), which
leaves the slock free, so it is not serialized against the
bh_lock_sock() CDC tasklet). Reworked as a v3 series:
https://lore.kernel.org/netdev/20260614-b4-disp-edd64be9-v3-0-551fa514257e@proton.me/
The bound now lives at the consumer (smc_rx_recvmsg() / smc_tx_sendmsg()),
where it is race-free; it also rejects a sign-overflowed (negative)
accumulator (per the sashiko-bot review on the sndbuf_space patch); and
the producer-cursor clamp is applied to the producer cursor only, so the
consumer cursor stays bounded by peer_rmbe_size, not rmb_desc->len. The
sndbuf_space fix is folded in as patch 3/3.
Thanks for the review.
Bryam
* Re: [PATCH net-next] net: dsa: yt921x: Add limited ACL flow statistics support
From: David Yang @ 2026-06-14 8:39 UTC
To: Jakub Kicinski
Cc: netdev, andrew, olteanv, davem, edumazet, pabeni, linux-kernel
In-Reply-To: <20260614004530.402503-1-kuba@kernel.org>
On Sun, Jun 14, 2026 at 8:45 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email says "I thought this is a false positive and the register is
> Clear-on-Read but you seem to be writing 0 to it when allocating."
...
> [Low]
> On rule delete the priv->flowstats_map bit is cleared, but
> YT921X_FLOWSTATn_CTRL(statid) is not written to clear FLOWSTAT_CTRL_EN.
> The counter remains physically enabled with TYPE_FLOW until something
> else reprograms it.
>
> Today this is benign because acl_commit() zeros the action register so
> no rule references the counter, and reuse rewrites CTRL. However, the
> header file enumerates other types (TYPE_CPU_CODE, TYPE_DROP_CODE,
> TYPE_PORT) for the same slot pool. If a future caller picks one of
> these types without first disabling the slot, could it observe stale
> counts from the previous flow user?
It's an absolute hardware counter that requires manual resetting before use.
> [Medium]
> This block runs unconditionally for every ACL rule, regardless of
> whether the user requested statistics. When all 64 counters are in use
> (statid >= YT921X_FLOWSTAT_NUM), the rule is still installed but
> silently without stats; the user only learns by querying and getting
> -EOPNOTSUPP from yt921x_acl_stat().
>
> There is no NL_SET_ERR_MSG_MOD() at install time and no kernel log
> message. The same function explicitly emits
>
> NL_SET_ERR_MSG_MOD(extack, "No more meters available");
>
> for the meter pool right above this hunk.
>
> Should the flow-stat path either be gated on whether the rule actually
> needs stats (so unrelated rules do not consume counters), or surface a
> diagnostic via extack when the pool is exhausted? Otherwise an old set
> of un-queried rules can starve newer rules that do need stats.
As far as I know, there is no way to specify whether a rule needs a stat;
it always assumes the rule has one.
> [Low]
> After the YT921X_FLOWSTATn_CTRL write succeeds, the bare "return res;"
> on a failing YT921X_FLOWSTATn_STAT write leaves the counter enabled in
> hardware while priv->flowstats_map remains clear. Likewise, if any
> later step (kvmemdup, yt921x_acl_commit) fails, the err: label does not
> unwind the CTRL write or the meter configuration done in the prior
> hunk.
>
> The same asymmetric rollback pre-exists for meters and is implicitly
> healed by the next allocation reprogramming the slot, but the new
> flow-stat path's bare returns bypass the err: label entirely. Should
> this teardown path be made symmetric?
No traffic will be measured by the counter. Any subsequent calls that
pick the meter / counter should reinit and reset it before activating
it for themselves.
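The allocation discussed above can be modelled in userspace. This is a hypothetical model, not driver code. It assumes 64 flow-stat slots tracked in a bitmap and reduces the per-slot registers to two arrays. Because the counter is absolute, a slot is reset when it is allocated, before it is marked in use, and freeing a slot only clears its bitmap bit. When the pool is exhausted, the rule would still be installed but would have no counter:

```c
#include <stdint.h>

#define SKETCH_FLOWSTAT_NUM 64	/* assumption: the "64 counters" above */

struct sketch_flowstat {
	uint64_t map;				/* bit set = slot in use */
	uint32_t ctrl[SKETCH_FLOWSTAT_NUM];	/* models a per-slot CTRL register */
	uint64_t stat[SKETCH_FLOWSTAT_NUM];	/* models an absolute counter */
};

/* Allocate a slot: reprogram CTRL and zero the counter before marking the
 * slot in use, so counts from a previous user never leak through.
 * Returns the slot id, or -1 when every slot is taken. */
static int sketch_flowstat_alloc(struct sketch_flowstat *fs, uint32_t ctrl)
{
	for (int id = 0; id < SKETCH_FLOWSTAT_NUM; id++) {
		if (fs->map & (UINT64_C(1) << id))
			continue;
		fs->ctrl[id] = ctrl;
		fs->stat[id] = 0;	/* manual reset: the counter is absolute */
		fs->map |= UINT64_C(1) << id;
		return id;
	}
	return -1;
}

/* Free only clears the bitmap bit; the next allocation resets the slot. */
static void sketch_flowstat_free(struct sketch_flowstat *fs, int id)
{
	fs->map &= ~(UINT64_C(1) << id);
}
```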
* [PATCH net] atm: br2684: validate IP header length before filtering
From: Yizhou Zhao @ 2026-06-14 8:40 UTC
To: netdev
Cc: Yizhou Zhao, Chas Williams, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Kees Cook,
linux-atm-general, linux-kernel, Yuxiang Yang, Ao Wang,
Xuewei Feng, Qi Li, Ke Xu, stable
When CONFIG_ATM_BR2684_IPFILTER is enabled, packet_fails_filter()
treats skb->data as an IPv4 header whenever the packet protocol is
ETH_P_IP and then reads iph->daddr. That read is not protected by a
check that the pulled skb still contains a full IPv4 header.
This is reachable through the receive path. An LLC-routed IPv4 PDU can
contain only the 8-byte LLC/SNAP header; br2684_push() accepts it,
sets skb->protocol to ETH_P_IP, pulls the LLC header, and leaves
skb->len as 0 before the filter runs. The VC-routed path also reads
iph->version before checking that the skb contains an IPv4 header, so a
2-byte PDU starting with an IPv4 version nibble can reach the same
filter decision.
In both cases the filter can make its pass/drop decision from bytes
outside the packet data. A reproducer using a dummy ATM receive device
filled the skb tailroom with 0xa5 and showed that an 8-byte LLC-routed
PDU and a 2-byte VC-routed PDU were forwarded when the filter prefix was
0xa5a5a5a5, even though neither packet contained an IPv4 destination
address.
Drop IPv4 packets that are shorter than struct iphdr in
packet_fails_filter(), before reading iph->daddr. Also reject
VC-routed packets shorter than struct iphdr before br2684_push() reads
iph->version. Such packets cannot contain a valid IPv4 header, while
normal minimum-sized IPv4 packets continue through the existing filter
logic.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Assisted-by: GLM:GLM-5.1
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
---
net/atm/br2684.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 6580d67c3456..fa4b1852d72b 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -393,6 +393,7 @@ packet_fails_filter(__be16 type, struct br2684_vcc *brvcc, struct sk_buff *skb)
if (brvcc->filter.netmask == 0)
return 0; /* no filter in place */
if (type == htons(ETH_P_IP) &&
+ skb->len >= sizeof(struct iphdr) &&
(((struct iphdr *)(skb->data))->daddr & brvcc->filter.
netmask) == brvcc->filter.prefix)
return 0;
@@ -482,6 +483,8 @@ static void br2684_push(struct atm_vcc *atmvcc, struct sk_buff *skb)
skb_reset_network_header(skb);
iph = ip_hdr(skb);
+ if (skb->len < sizeof(struct iphdr))
+ goto error;
if (iph->version == 4)
skb->protocol = htons(ETH_P_IP);
else if (iph->version == 6)
--
2.43.0
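The guarded filter decision can be sketched in userspace. The sketch makes three assumptions: the payload has already been pulled to the IPv4 header, prefix and netmask use the same byte order as the header field (as in the original comparison), and only the ETH_P_IP branch is modelled. The function name is hypothetical. The sketch reads the destination address at its fixed 16-byte offset only after confirming that a full 20-byte header is present. It treats anything shorter as failing the filter, which is the outcome the patch produces for a short ETH_P_IP packet:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IPV4_HDR_LEN   20	/* sizeof(struct iphdr), no options */
#define SKETCH_IPV4_DADDR_OFF 16	/* offset of daddr in the header */

/* Returns 0 if the IPv4 packet passes the filter, 1 if it should be
 * dropped, mirroring packet_fails_filter()'s return convention. */
static int sketch_ipv4_fails_filter(const uint8_t *data, size_t len,
				    uint32_t prefix, uint32_t netmask)
{
	uint32_t daddr;

	if (netmask == 0)
		return 0;			/* no filter in place */
	if (len < SKETCH_IPV4_HDR_LEN)
		return 1;			/* too short to hold daddr: drop */
	memcpy(&daddr, data + SKETCH_IPV4_DADDR_OFF, sizeof(daddr));
	return (daddr & netmask) == prefix ? 0 : 1;
}
```

Under this model, both reproducer cases above (0 bytes left after the LLC pull, and a 2-byte VC-routed PDU) are dropped without reading past the data.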
* [PATCH net] appletalk: Hold socket reference in atalk_rcv()
From: Yizhou Zhao @ 2026-06-14 9:52 UTC
To: netdev
Cc: Yizhou Zhao, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Kees Cook, Kito Xu, linux-kernel,
Yuxiang Yang, Ao Wang, Xuewei Feng, Qi Li, Ke Xu, stable
atalk_search_socket() walks the global atalk_sockets list while holding
atalk_sockets_lock, but it returns the matching socket after dropping the
lock without taking a reference. atalk_rcv() then passes that pointer to
sock_queue_rcv_skb().
That leaves a race with close(). A concurrent atalk_release() can orphan
the socket, remove it from atalk_sockets, and drop the final reference via
atalk_destroy_socket(), freeing the socket before atalk_rcv() queues the
incoming skb.
On a KASAN-enabled kernel this can be reproduced by racing AppleTalk DDP
delivery on loopback against close/rebind of the destination DGRAM socket:
BUG: KASAN: slab-use-after-free in selinux_socket_sock_rcv_skb()
sk_filter_trim_cap()
sock_queue_rcv_skb_reason()
atalk_rcv()
snap_rcv()
llc_rcv()
Take a reference on the selected socket before dropping
atalk_sockets_lock, and put it after sock_queue_rcv_skb() has finished.
This keeps the socket alive for the receive path without changing socket
lookup semantics. A malformed or racing receive still drops the skb on
queueing failure as before.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Assisted-by: GLM:GLM-5.1
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
---
net/appletalk/ddp.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 30a6dc06291c..61ec5c569dc3 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -131,6 +131,8 @@ static struct sock *atalk_search_socket(struct sockaddr_at *to,
}
s = def_socket;
found:
+ if (s)
+ sock_hold(s);
read_unlock_bh(&atalk_sockets_lock);
return s;
}
@@ -1474,9 +1476,12 @@ static int atalk_rcv(struct sk_buff *skb, struct net_device *dev,
goto drop;
/* Queue packet (standard) */
- if (sock_queue_rcv_skb(sock, skb) < 0)
+ if (sock_queue_rcv_skb(sock, skb) < 0) {
+ sock_put(sock);
goto drop;
+ }
+ sock_put(sock);
return NET_RX_SUCCESS;
drop:
--
2.43.0
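The fix relies on a specific ordering: take the reference while the lookup lock is still held, and drop it only after the socket has been used. That ordering can be modelled in userspace with C11 atomics. This is a hypothetical sketch, not the ddp.c code: a spinning atomic_flag stands in for atalk_sockets_lock, an atomic counter for the socket refcount, and a `freed` flag for the destructor:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_sock {
	atomic_int refcnt;
	bool freed;	/* set instead of calling free() so tests can see it */
	int port;
};

static atomic_flag sketch_list_lock = ATOMIC_FLAG_INIT;

static void sketch_lock(void)
{
	while (atomic_flag_test_and_set(&sketch_list_lock)) {
		/* spin */
	}
}

static void sketch_unlock(void) { atomic_flag_clear(&sketch_list_lock); }

static void sketch_hold(struct sketch_sock *s) { atomic_fetch_add(&s->refcnt, 1); }

static void sketch_put(struct sketch_sock *s)
{
	if (atomic_fetch_sub(&s->refcnt, 1) == 1)
		s->freed = true;	/* last reference dropped: "destroy" */
}

/* Lookup: the reference is taken before the lock is released, so a
 * concurrent release cannot free the socket between lookup and use. */
static struct sketch_sock *sketch_search(struct sketch_sock **list, size_t n,
					 int port)
{
	struct sketch_sock *found = NULL;

	sketch_lock();
	for (size_t i = 0; i < n; i++) {
		if (list[i] && list[i]->port == port) {
			found = list[i];
			sketch_hold(found);
			break;
		}
	}
	sketch_unlock();
	return found;
}
```

In this model, if close() unlinks the socket and drops the list's reference while the receiver still holds its own, the object stays alive until the receiver's final sketch_put().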
* Re: [PATCH net-next 1/3] net: busy-poll: introduce sk_tx_busy_loop()
From: Menglong Dong @ 2026-06-14 10:12 UTC
To: menglong8.dong, Jakub Kicinski
Cc: jasowang, mst, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
pabeni, magnus.karlsson, maciej.fijalkowski, sdf, horms, ast,
daniel, hawk, john.fastabend, bjorn, kerneljasonxing, netdev,
virtualization, linux-kernel, bpf
In-Reply-To: <20260613112113.55d9313f@kernel.org>
On 2026/6/14 02:21, Jakub Kicinski wrote:
> On Thu, 11 Jun 2026 15:12:40 +0800 menglong8.dong@gmail.com wrote:
> > For now, we use sk_busy_loop() for both rx and tx path. The sk_busy_loop()
> > will call napi_busy_loop() for the specified napi_id. However, some
> > nic drivers have tx napi, such as virtio-net. In this case, sk_busy_loop()
> > doesn't work, as it can only schedule the NAPI for the rx queue.
> >
> > Therefore, introduce sk_tx_busy_loop() for the nic drivers that support tx
> > napi, which will schedule the tx napi if available.
>
> First, I thought the only difference with Tx NAPI is that it can't be
> busy polled. So if you want to poll an instance don't register it as
> a Tx one instead of adding all this "tx polling" stuff in the core?
I see. Registering the tx NAPI with netif_napi_add_config() allows us
to busy poll it. But we still have two NAPI instances, the rx NAPI and
the tx NAPI, and sk_busy_loop() can only busy poll one of them.
Before AF_XDP, we had no need to send packets via the tx NAPI, which
means that we didn't need to busy poll it.
I analyzed the AF_XDP implementation of some NIC drivers. Some of them,
such as mlx5, check the xsk tx ring of the current queue and send its
data from the rx NAPI. Others allocate an extra "rxtx" NAPI for the
AF_XDP zero-copy queue, which polls both data receiving and sending.
In both cases above, they do the data sending and receiving for
AF_XDP in a single NAPI instance.
However, some drivers receive data in the rx NAPI and send data in
the tx NAPI for AF_XDP. In this case, we can't use sk_busy_loop() for
both the rx path and the tx path, as we need to wake different NAPI
instances.
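A toy Python model of the two driver layouts described above may make the problem concrete. It is only an illustration under stated assumptions: the `Queue` class, the NAPI ids 1 and 2, and `busy_loop()` are hypothetical, and `busy_loop()` merely stands in for busy polling a single napi_id.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    rx_pending: list = field(default_factory=list)
    tx_pending: list = field(default_factory=list)
    received: list = field(default_factory=list)
    sent: list = field(default_factory=list)


def make_napis(q, split):
    """Return ({napi_id: poll_fn}, rx_napi_id, tx_napi_id) for one queue."""
    def poll_rx():
        q.received.extend(q.rx_pending)
        q.rx_pending.clear()

    def poll_tx():
        q.sent.extend(q.tx_pending)
        q.tx_pending.clear()

    def poll_rxtx():             # mlx5-style or "rxtx" NAPI: one instance does both
        poll_rx()
        poll_tx()

    if split:                    # virtio-net-style: separate rx and tx NAPIs
        return {1: poll_rx, 2: poll_tx}, 1, 2
    return {1: poll_rxtx}, 1, 1


def busy_loop(napis, napi_id):
    """Poll exactly one NAPI instance, like busy polling a single napi_id."""
    napis[napi_id]()
```

With the split layout, busy polling the rx id alone never drains the tx ring; that is the gap a separate tx busy-poll hook, or a driver change to a single NAPI, would close.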
>
> Second, can this problem happen for any other NIC or is it purely
> an artifact of virtio's delayed Tx completion handling?
According to my analysis, only the virtio-net and ICSSG drivers have
split NAPIs for AF_XDP. I don't have an ICSSG NIC, but Codex tells
me that it has the same problem.
I'm not sure whether introducing sk_tx_busy_loop() is a good idea.
Maybe we can modify the driver instead to use the same NAPI for both
data sending and receiving, just like the others do. The advantage of
introducing sk_tx_busy_loop() is that we can split data sending and
receiving, which may be more efficient.
>
> Third, this series does not apply.
Ah, I'll rebase this series if a V2 is acceptable.
Thanks!
Menglong Dong
>
>
^ permalink raw reply
* [PATCH net-next v3 0/4] Extend netkit io_uring ZC selftests
From: Daniel Borkmann @ 2026-06-14 10:26 UTC (permalink / raw)
To: kuba; +Cc: razor, bobbyeshleman, dw, netdev
Small follow-up to the HW net selftests, in particular adding a
selftest showing that large rx_buf_len for io_uring ZC is also
supported with netkit queue leasing.
v2 -> v3:
- rebase to latest net-next
v1 -> v2:
- Switch to cmd(..., ns=cfg.netns) (Bobby)
- Add mp_clear_wait for large rx_buf_len test (Bobby)
Daniel Borkmann (4):
selftests/net: Move netkit lease hw setup into per-test fixtures
selftests/net: Use public NetDrvContEnv API in nk_qlease fixtures
selftests/net: Add netkit io_uring ZC test for large rx_buf_len
selftests/net: Add hugepage kernel config dependency for zcrx
tools/testing/selftests/drivers/net/hw/config | 1 +
.../selftests/drivers/net/hw/nk_qlease.py | 300 +++++++++++++++---
.../selftests/drivers/net/lib/py/env.py | 10 +-
3 files changed, 259 insertions(+), 52 deletions(-)
--
2.43.0
^ permalink raw reply
* [PATCH net-next v3 1/4] selftests/net: Move netkit lease hw setup into per-test fixtures
From: Daniel Borkmann @ 2026-06-14 10:26 UTC (permalink / raw)
To: kuba; +Cc: razor, bobbyeshleman, dw, netdev
In-Reply-To: <20260614102607.863838-1-daniel@iogearbox.net>
The HW counterpart of nk_qlease.py was carrying its lease setup in main()
and stashing src_queue / nk_queue / nk_*_ifname on cfg, which had drawbacks
called out during the review at [0].
This is the deferred half of the cleanup that landed in commit e254ffb9502c
("selftests/net: Split netdevsim tests from HW tests in nk_qlease"), which
covered the SW counterpart of nk_qlease.py.
While at it, convert the open-coded "ip netns exec" prefixes in the test
bodies over to the ns= argument of cmd() / bkg().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://lore.kernel.org/netdev/20260408162238.16709090@kernel.org/ [0]
---
.../selftests/drivers/net/hw/nk_qlease.py | 198 +++++++++++++-----
1 file changed, 149 insertions(+), 49 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/nk_qlease.py b/tools/testing/selftests/drivers/net/hw/nk_qlease.py
index f5fd64775989..3723574dcd30 100755
--- a/tools/testing/selftests/drivers/net/hw/nk_qlease.py
+++ b/tools/testing/selftests/drivers/net/hw/nk_qlease.py
@@ -18,8 +18,10 @@ from lib.py import (
NetNSEnter,
EthtoolFamily,
NetdevFamily,
+ RtnlFamily,
)
from lib.py import (
+ Netlink,
bkg,
cmd,
defer,
@@ -31,9 +33,117 @@ from lib.py import (
from lib.py import KsftSkipEx, CmdExitFailure
-def set_flow_rule(cfg):
+def _create_netkit_pair(cfg, rxqueues=2):
+ if cfg.nk_host_ifname:
+ cmd(f"ip link del dev {cfg.nk_host_ifname}", fail=False)
+ cfg.nk_host_ifname = None
+ cfg.nk_guest_ifname = None
+ if getattr(cfg, "_tc_attached", False):
+ cmd(
+ f"tc filter del dev {cfg.ifname} ingress pref {cfg._bpf_prog_pref}",
+ fail=False,
+ )
+ cfg._tc_attached = False
+
+ all_links = ip("-d link show", json=True)
+ old_idxs = {
+ link["ifindex"]
+ for link in all_links
+ if link.get("linkinfo", {}).get("info_kind") == "netkit"
+ }
+
+ rtnl = RtnlFamily()
+ rtnl.newlink(
+ {
+ "linkinfo": {
+ "kind": "netkit",
+ "data": {
+ "mode": "l2",
+ "policy": "forward",
+ "peer-policy": "forward",
+ },
+ },
+ "num-rx-queues": rxqueues,
+ },
+ flags=[Netlink.NLM_F_CREATE, Netlink.NLM_F_EXCL],
+ )
+
+ all_links = ip("-d link show", json=True)
+ nk_links = [
+ link
+ for link in all_links
+ if link.get("linkinfo", {}).get("info_kind") == "netkit"
+ and link["ifindex"] not in old_idxs
+ ]
+ if len(nk_links) != 2:
+ raise KsftSkipEx("Failed to create netkit pair")
+
+ nk_links.sort(key=lambda x: x["ifindex"])
+ cfg.nk_host_ifname = nk_links[1]["ifname"]
+ cfg.nk_guest_ifname = nk_links[0]["ifname"]
+ cfg.nk_host_ifindex = nk_links[1]["ifindex"]
+ cfg.nk_guest_ifindex = nk_links[0]["ifindex"]
+
+ ip(f"link set dev {cfg.nk_guest_ifname} netns {cfg.netns.name}")
+ ip(f"link set dev {cfg.nk_host_ifname} up")
+ ip(f"-6 addr add fe80::1/64 dev {cfg.nk_host_ifname} nodad")
+ ip(
+ f"-6 route add {cfg.nk_guest_ipv6}/128 via fe80::2 "
+ f"dev {cfg.nk_host_ifname}"
+ )
+ ip(f"link set dev {cfg.nk_guest_ifname} up", ns=cfg.netns)
+ ip(f"-6 addr add fe80::2/64 dev {cfg.nk_guest_ifname}", ns=cfg.netns)
+ ip(
+ f"-6 addr add {cfg.nk_guest_ipv6}/64 dev {cfg.nk_guest_ifname} nodad",
+ ns=cfg.netns,
+ )
+ ip(
+ f"-6 route add default via fe80::1 dev {cfg.nk_guest_ifname}",
+ ns=cfg.netns,
+ )
+
+ cfg._attach_bpf()
+
+
+def _setup_lease(cfg, rxqueues=2):
+ _create_netkit_pair(cfg, rxqueues=rxqueues)
+
+ ethnl = EthtoolFamily()
+ channels = ethnl.channels_get({"header": {"dev-index": cfg.ifindex}})[
+ "combined-count"
+ ]
+ if channels < 2:
+ raise KsftSkipEx(
+ "Test requires NETIF with at least 2 combined channels"
+ )
+ src_queue = channels - 1
+
+ with NetNSEnter(str(cfg.netns)):
+ netdevnl = NetdevFamily()
+ bind_result = netdevnl.queue_create(
+ {
+ "ifindex": cfg.nk_guest_ifindex,
+ "type": "rx",
+ "lease": {
+ "ifindex": cfg.ifindex,
+ "queue": {"id": src_queue, "type": "rx"},
+ "netns-id": 0,
+ },
+ }
+ )
+ return src_queue, bind_result["id"]
+
+
+def _teardown_netkit(cfg):
+ if cfg.nk_host_ifname:
+ cmd(f"ip link del dev {cfg.nk_host_ifname}", fail=False)
+ cfg.nk_host_ifname = None
+ cfg.nk_guest_ifname = None
+
+
+def set_flow_rule(cfg, src_queue):
output = ethtool(
- f"-N {cfg.ifname} flow-type tcp6 dst-port {cfg.port} action {cfg.src_queue}"
+ f"-N {cfg.ifname} flow-type tcp6 dst-port {cfg.port} action {src_queue}"
).stdout
values = re.search(r"ID (\d+)", output).group(1)
return int(values)
@@ -41,6 +151,8 @@ def set_flow_rule(cfg):
def test_iou_zcrx(cfg) -> None:
cfg.require_ipver("6")
+ src_queue, nk_queue = _setup_lease(cfg)
+ defer(_teardown_netkit, cfg)
ethnl = EthtoolFamily()
rings = ethnl.rings_get({"header": {"dev-index": cfg.ifindex}})
@@ -65,40 +177,47 @@ def test_iou_zcrx(cfg) -> None:
},
)
- ethtool(f"-X {cfg.ifname} equal {cfg.src_queue}")
+ ethtool(f"-X {cfg.ifname} equal {src_queue}")
defer(ethtool, f"-X {cfg.ifname} default")
- flow_rule_id = set_flow_rule(cfg)
+ flow_rule_id = set_flow_rule(cfg, src_queue)
defer(ethtool, f"-N {cfg.ifname} delete {flow_rule_id}")
- rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg.nk_guest_ifname} -q {cfg.nk_queue}"
+ rx_cmd = (
+ f"{cfg.bin_local} -s -p {cfg.port} "
+ f"-i {cfg.nk_guest_ifname} -q {nk_queue}"
+ )
tx_cmd = f"{cfg.bin_remote} -c -h {cfg.nk_guest_ipv6} -p {cfg.port} -l 12840"
- with bkg(rx_cmd, exit_wait=True):
+ with bkg(rx_cmd, exit_wait=True, ns=cfg.netns):
wait_port_listen(cfg.port, proto="tcp", ns=cfg.netns)
cmd(tx_cmd, host=cfg.remote)
def test_attrs(cfg) -> None:
cfg.require_ipver("6")
+ src_queue, nk_queue = _setup_lease(cfg)
+ defer(_teardown_netkit, cfg)
netdevnl = NetdevFamily()
queue_info = netdevnl.queue_get(
- {"ifindex": cfg.ifindex, "id": cfg.src_queue, "type": "rx"}
+ {"ifindex": cfg.ifindex, "id": src_queue, "type": "rx"}
)
- ksft_eq(queue_info["id"], cfg.src_queue)
+ ksft_eq(queue_info["id"], src_queue)
ksft_eq(queue_info["type"], "rx")
ksft_eq(queue_info["ifindex"], cfg.ifindex)
ksft_in("lease", queue_info)
lease = queue_info["lease"]
ksft_eq(lease["ifindex"], cfg.nk_guest_ifindex)
- ksft_eq(lease["queue"]["id"], cfg.nk_queue)
+ ksft_eq(lease["queue"]["id"], nk_queue)
ksft_eq(lease["queue"]["type"], "rx")
ksft_in("netns-id", lease)
def test_attach_xdp_with_mp(cfg) -> None:
cfg.require_ipver("6")
+ src_queue, nk_queue = _setup_lease(cfg)
+ defer(_teardown_netkit, cfg)
ethnl = EthtoolFamily()
rings = ethnl.rings_get({"header": {"dev-index": cfg.ifindex}})
@@ -123,18 +242,21 @@ def test_attach_xdp_with_mp(cfg) -> None:
},
)
- ethtool(f"-X {cfg.ifname} equal {cfg.src_queue}")
+ ethtool(f"-X {cfg.ifname} equal {src_queue}")
defer(ethtool, f"-X {cfg.ifname} default")
netdevnl = NetdevFamily()
- rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg.nk_guest_ifname} -q {cfg.nk_queue}"
- with bkg(rx_cmd):
+ rx_cmd = (
+ f"{cfg.bin_local} -s -p {cfg.port} "
+ f"-i {cfg.nk_guest_ifname} -q {nk_queue}"
+ )
+ with bkg(rx_cmd, ns=cfg.netns):
wait_port_listen(cfg.port, proto="tcp", ns=cfg.netns)
time.sleep(0.1)
queue_info = netdevnl.queue_get(
- {"ifindex": cfg.ifindex, "id": cfg.src_queue, "type": "rx"}
+ {"ifindex": cfg.ifindex, "id": src_queue, "type": "rx"}
)
ksft_in("io-uring", queue_info)
@@ -144,13 +266,15 @@ def test_attach_xdp_with_mp(cfg) -> None:
time.sleep(0.1)
queue_info = netdevnl.queue_get(
- {"ifindex": cfg.ifindex, "id": cfg.src_queue, "type": "rx"}
+ {"ifindex": cfg.ifindex, "id": src_queue, "type": "rx"}
)
ksft_not_in("io-uring", queue_info)
def test_destroy(cfg) -> None:
cfg.require_ipver("6")
+ src_queue, nk_queue = _setup_lease(cfg)
+ defer(_teardown_netkit, cfg)
ethnl = EthtoolFamily()
rings = ethnl.rings_get({"header": {"dev-index": cfg.ifindex}})
@@ -175,16 +299,19 @@ def test_destroy(cfg) -> None:
},
)
- ethtool(f"-X {cfg.ifname} equal {cfg.src_queue}")
+ ethtool(f"-X {cfg.ifname} equal {src_queue}")
defer(ethtool, f"-X {cfg.ifname} default")
- rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg.nk_guest_ifname} -q {cfg.nk_queue}"
- rx_proc = cmd(rx_cmd, background=True)
+ rx_cmd = (
+ f"{cfg.bin_local} -s -p {cfg.port} "
+ f"-i {cfg.nk_guest_ifname} -q {nk_queue}"
+ )
+ rx_proc = cmd(rx_cmd, background=True, ns=cfg.netns)
wait_port_listen(cfg.port, proto="tcp", ns=cfg.netns)
netdevnl = NetdevFamily()
queue_info = netdevnl.queue_get(
- {"ifindex": cfg.ifindex, "id": cfg.src_queue, "type": "rx"}
+ {"ifindex": cfg.ifindex, "id": src_queue, "type": "rx"}
)
ksft_in("io-uring", queue_info)
@@ -199,17 +326,14 @@ def test_destroy(cfg) -> None:
cfg.nk_guest_ifname = None
queue_info = netdevnl.queue_get(
- {"ifindex": cfg.ifindex, "id": cfg.src_queue, "type": "rx"}
+ {"ifindex": cfg.ifindex, "id": src_queue, "type": "rx"}
)
ksft_not_in("io-uring", queue_info)
- cmd(f"tc filter del dev {cfg.ifname} ingress pref {cfg._bpf_prog_pref}")
- cfg._tc_attached = False
-
- flow_rule_id = set_flow_rule(cfg)
+ flow_rule_id = set_flow_rule(cfg, src_queue)
defer(ethtool, f"-N {cfg.ifname} delete {flow_rule_id}")
- rx_cmd = f"{cfg.bin_local} -s -p {cfg.port} -i {cfg.ifname} -q {cfg.src_queue}"
+ rx_cmd = f"{cfg.bin_local} -s -p {cfg.port} -i {cfg.ifname} -q {src_queue}"
tx_cmd = f"{cfg.bin_remote} -c -h {cfg.addr_v['6']} -p {cfg.port} -l 12840"
with bkg(rx_cmd, exit_wait=True):
wait_port_listen(cfg.port, proto="tcp")
@@ -217,7 +341,7 @@ def test_destroy(cfg) -> None:
# Short delay since iou cleanup is async and takes a bit of time.
time.sleep(0.1)
queue_info = netdevnl.queue_get(
- {"ifindex": cfg.ifindex, "id": cfg.src_queue, "type": "rx"}
+ {"ifindex": cfg.ifindex, "id": src_queue, "type": "rx"}
)
ksft_not_in("io-uring", queue_info)
@@ -230,30 +354,6 @@ def main() -> None:
cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
cfg.port = rand_port()
- ethnl = EthtoolFamily()
- channels = ethnl.channels_get({"header": {"dev-index": cfg.ifindex}})
- channels = channels["combined-count"]
- if channels < 2:
- raise KsftSkipEx("Test requires NETIF with at least 2 combined channels")
-
- cfg.src_queue = channels - 1
-
- with NetNSEnter(str(cfg.netns)):
- netdevnl = NetdevFamily()
- bind_result = netdevnl.queue_create(
- {
- "ifindex": cfg.nk_guest_ifindex,
- "type": "rx",
- "lease": {
- "ifindex": cfg.ifindex,
- "queue": {"id": cfg.src_queue, "type": "rx"},
- "netns-id": 0,
- },
- }
- )
- cfg.nk_queue = bind_result["id"]
-
- # test_destroy must be last because it destroys the netkit devices
ksft_run(
[test_iou_zcrx, test_attrs, test_attach_xdp_with_mp, test_destroy],
args=(cfg,),
--
2.43.0
^ permalink raw reply related
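In patch 1/4 above, `_create_netkit_pair()` finds the new devices by diffing the set of netkit ifindexes before and after `newlink`, then sorting the two new links by ifindex and treating the lower one as the guest-side device and the higher one as the host-side device. The sketch below replays that selection on canned `ip -d link show` style JSON. The device names and ifindex values are made up, and the ordering is simply the one the patch's sort relies on.

```python
def netkit_idxs(links):
    return {
        link["ifindex"]
        for link in links
        if link.get("linkinfo", {}).get("info_kind") == "netkit"
    }


def pick_new_pair(before, after):
    """Return (guest, host) among netkit links that appeared in `after`."""
    old = netkit_idxs(before)
    new = [
        link
        for link in after
        if link.get("linkinfo", {}).get("info_kind") == "netkit"
        and link["ifindex"] not in old
    ]
    if len(new) != 2:
        raise RuntimeError("failed to create netkit pair")
    new.sort(key=lambda link: link["ifindex"])
    return new[0], new[1]   # lower ifindex -> guest side, higher -> host side


NK = {"linkinfo": {"info_kind": "netkit"}}
before = [
    {"ifindex": 1, "ifname": "lo"},
    {"ifindex": 10, "ifname": "nk0", **NK},
    {"ifindex": 11, "ifname": "nk1", **NK},
]
after = before + [
    {"ifindex": 21, "ifname": "nk3", **NK},
    {"ifindex": 20, "ifname": "nk2", **NK},
]
guest, host = pick_new_pair(before, after)
```

Pre-existing netkit pairs (nk0/nk1 here) are ignored, which is what lets the fixture run repeatedly on the same host.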
* [PATCH net-next v3 2/4] selftests/net: Use public NetDrvContEnv API in nk_qlease fixtures
From: Daniel Borkmann @ 2026-06-14 10:26 UTC (permalink / raw)
To: kuba; +Cc: razor, bobbyeshleman, dw, netdev
In-Reply-To: <20260614102607.863838-1-daniel@iogearbox.net>
Expose the netkit host ifname as a public attribute nk_host_ifname
(symmetric with the already-public nk_guest_ifname), rename _attach_bpf
to a public attach_bpf, and add a public detach_bpf helper that
encapsulates the tc-filter teardown bookkeeping. Switch the fixtures
to this public API. No functional change; this also keeps pylint happy.
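The point of a public `detach_bpf()` helper is that callers no longer touch `_tc_attached` and `_bpf_prog_pref` themselves, and that calling it twice is harmless. A hypothetical model that records commands instead of running them; the class, the placeholder attach string and the preference value 1 are illustrative assumptions, and only the `tc filter del` form is taken from the patch.

```python
class FakeEnv:
    """Records tc invocations instead of running them (hypothetical)."""

    def __init__(self, ifname, pref=1):
        self.ifname = ifname
        self._bpf_prog_pref = pref
        self._tc_attached = False
        self.cmds = []

    def attach_bpf(self):
        self.cmds.append("attach nk_forward.bpf.o")   # placeholder, not a real tc call
        self._tc_attached = True

    def detach_bpf(self):
        # Idempotent: only tear down what was actually attached.
        if self._tc_attached:
            self.cmds.append(f"tc filter del dev {self.ifname} ingress pref "
                             f"{self._bpf_prog_pref}")
            self._tc_attached = False
```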
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/drivers/net/hw/nk_qlease.py | 9 ++-------
tools/testing/selftests/drivers/net/lib/py/env.py | 10 ++++++++--
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/nk_qlease.py b/tools/testing/selftests/drivers/net/hw/nk_qlease.py
index 3723574dcd30..b97663820ccf 100755
--- a/tools/testing/selftests/drivers/net/hw/nk_qlease.py
+++ b/tools/testing/selftests/drivers/net/hw/nk_qlease.py
@@ -38,12 +38,7 @@ def _create_netkit_pair(cfg, rxqueues=2):
cmd(f"ip link del dev {cfg.nk_host_ifname}", fail=False)
cfg.nk_host_ifname = None
cfg.nk_guest_ifname = None
- if getattr(cfg, "_tc_attached", False):
- cmd(
- f"tc filter del dev {cfg.ifname} ingress pref {cfg._bpf_prog_pref}",
- fail=False,
- )
- cfg._tc_attached = False
+ cfg.detach_bpf()
all_links = ip("-d link show", json=True)
old_idxs = {
@@ -102,7 +97,7 @@ def _create_netkit_pair(cfg, rxqueues=2):
ns=cfg.netns,
)
- cfg._attach_bpf()
+ cfg.attach_bpf()
def _setup_lease(cfg, rxqueues=2):
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index b188ee55c76b..e4ab99b905b1 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -401,7 +401,7 @@ class NetDrvContEnv(NetDrvEpEnv):
self.nk_guest_ifindex = netkit_links[0]['ifindex']
self._setup_ns()
- self._attach_bpf()
+ self.attach_bpf()
if primary_rx_redirect:
self._attach_primary_rx_redirect_bpf()
@@ -524,7 +524,13 @@ class NetDrvContEnv(NetDrvEpEnv):
return bpf_obj
return None
- def _attach_bpf(self):
+ def detach_bpf(self):
+ if self._tc_attached:
+ cmd(f"tc filter del dev {self.ifname} ingress pref "
+ f"{self._bpf_prog_pref}", fail=False)
+ self._tc_attached = False
+
+ def attach_bpf(self):
bpf_obj = self._find_bpf_obj("nk_forward.bpf.o")
if not bpf_obj:
raise KsftSkipEx("BPF prog nk_forward.bpf.o not found")
--
2.43.0
^ permalink raw reply related
* [PATCH net-next v3 4/4] selftests/net: Add hugepage kernel config dependency for zcrx
From: Daniel Borkmann @ 2026-06-14 10:26 UTC (permalink / raw)
To: kuba; +Cc: razor, bobbyeshleman, dw, netdev
In-Reply-To: <20260614102607.863838-1-daniel@iogearbox.net>
test_iou_zcrx_large_buf in drivers/net/hw/nk_qlease.py runs iou-zcrx
with rx_buf_len > page size, backed by a hugepage-mapped area. Thus
add CONFIG_HUGETLBFS to the config.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/drivers/net/hw/config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/drivers/net/hw/config b/tools/testing/selftests/drivers/net/hw/config
index cd20024218cd..ed8642b68094 100644
--- a/tools/testing/selftests/drivers/net/hw/config
+++ b/tools/testing/selftests/drivers/net/hw/config
@@ -3,6 +3,7 @@ CONFIG_FAIL_FUNCTION=y
CONFIG_FAULT_INJECTION=y
CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_FUNCTION_ERROR_INJECTION=y
+CONFIG_HUGETLBFS=y
CONFIG_INET6_ESP=y
CONFIG_INET6_ESP_OFFLOAD=y
CONFIG_INET_ESP=y
--
2.43.0
^ permalink raw reply related
* [PATCH net-next v3 3/4] selftests/net: Add netkit io_uring ZC test for large rx_buf_len
From: Daniel Borkmann @ 2026-06-14 10:26 UTC (permalink / raw)
To: kuba; +Cc: razor, bobbyeshleman, dw, netdev
In-Reply-To: <20260614102607.863838-1-daniel@iogearbox.net>
Add test_iou_zcrx_large_buf, which runs iou-zcrx with rx_buf_len >
page size (-x 2) through a netkit-leased RX queue. The netkit ifindex
is opaque to io_uring, but rx_page_size is honoured by the leased
physical qops via netif_mp_open_rxq()'s lease redirect.
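As a worked example of the sizes involved, assuming 4 KiB base pages and 2 MiB hugepages (the usual x86_64 defaults, matching the "8 KiB on x86_64" figure in the test's comment) and the 64 hugepages the test reserves. The total is only an upper bound if the whole reservation were used as the zcrx area; the test does not state how much of it iou-zcrx maps.

```python
PAGE_SIZE = 4 * 1024              # assumed base page size (x86_64 default)
HUGEPAGE_SIZE = 2 * 1024 * 1024   # assumed hugepage size
NR_HUGEPAGES = 64                 # the test raises nr_hugepages to at least this

rx_buf_len = 2 * PAGE_SIZE                        # what "-x 2" requests
chunks_per_hugepage = HUGEPAGE_SIZE // rx_buf_len
total_chunks = NR_HUGEPAGES * HUGEPAGE_SIZE // rx_buf_len

print(rx_buf_len, chunks_per_hugepage, total_chunks)
```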
Originally, I also added a BIG TCP variant on top, but dropped it
here as fbnic (and the QEMU fbnic model) has no BIG TCP support
to exercise it at this point.
Tested against the QEMU fbnic emulation. The new test exercises
the > page rx_buf_len path only when the leased NIC advertises
QCFG_RX_PAGE_SIZE; otherwise it skips.
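The skip decision boils down to mapping a dedicated exit code from a dry-run probe onto a skip. A hypothetical sketch: `run` stands in for `cmd(..., fail=False, ns=...)`, `SkipTest` for KsftSkipEx, and `Result` for the command result object. Only the exit code 42, the `-d` dry-run flag and the fallback message come from the patch; as there, any other exit code simply falls through to the real run.

```python
from dataclasses import dataclass

SKIP_CODE = 42   # iou-zcrx's "rx_buf_len not supported" exit code, per the patch


class SkipTest(Exception):
    """Stand-in for KsftSkipEx."""


@dataclass
class Result:
    ret: int
    stdout: str = ""


def probe_or_skip(run, rx_cmd):
    """Dry-run the receiver and turn SKIP_CODE into a skip."""
    probe = run(rx_cmd + " -d")
    if probe.ret == SKIP_CODE:
        msg = probe.stdout.strip() or "rx_buf_len not supported by leased NIC"
        raise SkipTest(msg)
    return probe   # other results fall through to the real run
```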
For fbnic, I used Bjorn's patches locally [0]:
# ./nk_qlease.py
TAP version 13
1..5
ok 1 nk_qlease.test_iou_zcrx
ok 2 nk_qlease.test_iou_zcrx_large_buf
ok 3 nk_qlease.test_attrs
ok 4 nk_qlease.test_attach_xdp_with_mp
ok 5 nk_qlease.test_destroy
# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
Without those patches (i.e. fbnic not advertising QCFG_RX_PAGE_SIZE):
# ./nk_qlease.py
TAP version 13
1..5
ok 1 nk_qlease.test_iou_zcrx
ok 2 nk_qlease.test_iou_zcrx_large_buf # SKIP Large chunks are not supported -95
ok 3 nk_qlease.test_attrs
ok 4 nk_qlease.test_attach_xdp_with_mp
ok 5 nk_qlease.test_destroy
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:1 error:0
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://lore.kernel.org/netdev/20260522113225.241337-1-bjorn@kernel.org/ [0]
---
.../selftests/drivers/net/hw/nk_qlease.py | 107 +++++++++++++++++-
1 file changed, 106 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/hw/nk_qlease.py b/tools/testing/selftests/drivers/net/hw/nk_qlease.py
index b97663820ccf..4f53034c9a50 100755
--- a/tools/testing/selftests/drivers/net/hw/nk_qlease.py
+++ b/tools/testing/selftests/drivers/net/hw/nk_qlease.py
@@ -32,6 +32,31 @@ from lib.py import (
)
from lib.py import KsftSkipEx, CmdExitFailure
+# iou-zcrx exits with 42 from setup_zcrx() when the NIC does not advertise
+# QCFG_RX_PAGE_SIZE (or otherwise rejects the requested rx_buf_len).
+SKIP_CODE = 42
+
+
+def _restore_hugepages(count):
+ with open("/proc/sys/vm/nr_hugepages", "w", encoding="utf-8") as f:
+ f.write(str(count))
+
+
+def _mp_clear_wait(cfg, src_queue):
+ """Wait for the io_uring memory provider to clear from the leased
+ physical queue; io_uring tears it down asynchronously after the
+ process holding the ifq exits."""
+ netdevnl = NetdevFamily()
+ deadline = time.time() + 5
+ while time.time() < deadline:
+ queue_info = netdevnl.queue_get(
+ {"ifindex": cfg.ifindex, "id": src_queue, "type": "rx"}
+ )
+ if "io-uring" not in queue_info:
+ return
+ time.sleep(0.1)
+ raise TimeoutError("Timed out waiting for memory provider to clear")
+
def _create_netkit_pair(cfg, rxqueues=2):
if cfg.nk_host_ifname:
@@ -188,6 +213,80 @@ def test_iou_zcrx(cfg) -> None:
cmd(tx_cmd, host=cfg.remote)
+def test_iou_zcrx_large_buf(cfg) -> None:
+ """iou-zcrx with rx_buf_len > page size, going through a netkit-leased
+ queue. Exercises the queue rx-buf-len path via netif_mp_open_rxq()'s
+ lease redirect: the netkit ifindex is opaque to io_uring, but
+ rx_page_size is honoured by the *physical* qops because the lease
+ pointer rewrites the request from netkit onto the leased physical
+ rxq before supported_params/validate_qcfg are consulted.
+ """
+ cfg.require_ipver("6")
+ src_queue, nk_queue = _setup_lease(cfg)
+ defer(_teardown_netkit, cfg)
+ ethnl = EthtoolFamily()
+
+ with open("/proc/sys/vm/nr_hugepages", "r+", encoding="utf-8") as f:
+ nr_hugepages = int(f.read().strip())
+ if nr_hugepages < 64:
+ f.seek(0)
+ f.write("64")
+ defer(_restore_hugepages, nr_hugepages)
+
+ rings = ethnl.rings_get({"header": {"dev-index": cfg.ifindex}})
+ rx_rings = rings["rx"]
+ hds_thresh = rings.get("hds-thresh", 0)
+
+ ethnl.rings_set(
+ {
+ "header": {"dev-index": cfg.ifindex},
+ "tcp-data-split": "enabled",
+ "hds-thresh": 0,
+ "rx": 64,
+ }
+ )
+ defer(
+ ethnl.rings_set,
+ {
+ "header": {"dev-index": cfg.ifindex},
+ "tcp-data-split": "unknown",
+ "hds-thresh": hds_thresh,
+ "rx": rx_rings,
+ },
+ )
+
+ ethtool(f"-X {cfg.ifname} equal {src_queue}")
+ defer(ethtool, f"-X {cfg.ifname} default")
+
+ flow_rule_id = set_flow_rule(cfg, src_queue)
+ defer(ethtool, f"-N {cfg.ifname} delete {flow_rule_id}")
+
+ # -x 2 asks iou-zcrx for rx_buf_len = 2 * page_size (8 KiB on x86_64),
+ # backed by a 2 MiB hugepage area so the chunks are physically
+ # contiguous, which is what zcrx requires for non-default rx_buf_len.
+ rx_cmd = (
+ f"{cfg.bin_local} -s -p {cfg.port} "
+ f"-i {cfg.nk_guest_ifname} -q {nk_queue} -x 2"
+ )
+ tx_cmd = f"{cfg.bin_remote} -c -h {cfg.nk_guest_ipv6} -p {cfg.port} -l 12840"
+
+ # Probe via -d (dry run): exits with SKIP_CODE if the leased physical
+ # qops doesn't advertise QCFG_RX_PAGE_SIZE (e.g. older bnxt FW/HW).
+ probe = cmd(rx_cmd + " -d", fail=False, ns=cfg.netns)
+ if probe.ret == SKIP_CODE:
+ msg = probe.stdout.strip() or "rx_buf_len not supported by leased NIC"
+ raise KsftSkipEx(msg)
+
+ # A successful dry run still registered the zcrx ifq on the leased
+ # physical queue; wait for its async teardown before the real server
+ # binds the same queue.
+ _mp_clear_wait(cfg, src_queue)
+
+ with bkg(rx_cmd, exit_wait=True, ns=cfg.netns):
+ wait_port_listen(cfg.port, proto="tcp", ns=cfg.netns)
+ cmd(tx_cmd, host=cfg.remote)
+
+
def test_attrs(cfg) -> None:
cfg.require_ipver("6")
src_queue, nk_queue = _setup_lease(cfg)
@@ -350,7 +449,13 @@ def main() -> None:
cfg.port = rand_port()
ksft_run(
- [test_iou_zcrx, test_attrs, test_attach_xdp_with_mp, test_destroy],
+ [
+ test_iou_zcrx,
+ test_iou_zcrx_large_buf,
+ test_attrs,
+ test_attach_xdp_with_mp,
+ test_destroy,
+ ],
args=(cfg,),
)
ksft_exit()
--
2.43.0
^ permalink raw reply related
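In patch 3/4 above, `_mp_clear_wait()` is a poll-until-condition-or-timeout loop. A generic version is sketched below. `wait_until()` and the fake `queue_get()` are hypothetical helpers, and the sketch uses `time.monotonic()` rather than `time.time()` so that a wall-clock adjustment during the test cannot stretch or cut short the deadline; that is a design choice of this sketch, not something the patch does.

```python
import time


def wait_until(pred, timeout=5.0, interval=0.1):
    """Poll pred() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if pred():
            return
        time.sleep(interval)
    raise TimeoutError("condition not met before deadline")


def make_fake_queue_get(clear_after):
    """Fake queue_get(): reports 'io-uring' until called `clear_after` times."""
    calls = {"n": 0}

    def queue_get():
        calls["n"] += 1
        return {} if calls["n"] > clear_after else {"io-uring": {}}

    return queue_get, calls
```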
* [PATCH v2] net: mvneta_bm: add suspend/resume support to prevent crash after resume
From: Yun Zhou @ 2026-06-14 10:38 UTC (permalink / raw)
To: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni
Cc: netdev, linux-kernel, yun.zhou
The mvneta driver uses the hardware Buffer Manager (BM) for RX buffer
allocation. During suspend, mvneta disables its clock, causing BM to
lose all buffer address state. On resume, mvneta_bm_port_init()
reattaches the BM pool to the NIC, but the BM hardware returns
stale/garbage buffer addresses. When NAPI poll processes these buffers,
the DMA cache sync hits an invalid virtual address, causing a kernel panic:
Unable to handle kernel paging request at virtual address b0000080
PC is at v7_dma_inv_range
Call trace:
v7_dma_inv_range from arch_sync_dma_for_cpu+0x94/0x158
arch_sync_dma_for_cpu from __dma_sync_single_for_cpu+0xc4/0x15c
__dma_sync_single_for_cpu from mvneta_rx_swbm+0x6c8/0xf48
mvneta_rx_swbm from mvneta_poll+0x6fc/0x70c
mvneta_poll from __napi_poll.constprop.0+0x2c/0x1e0
__napi_poll.constprop.0 from net_rx_action+0x160/0x2c4
net_rx_action from handle_softirqs+0xd8/0x2b8
handle_softirqs from run_ksoftirqd+0x30/0x94
run_ksoftirqd from smpboot_thread_fn+0x100/0x204
smpboot_thread_fn from kthread+0xf4/0x110
kthread from ret_from_fork+0x14/0x28
Fix by adding suspend/resume callbacks to the BM driver:
- suspend: drain all buffers (with DMA unmapping), free the BPPE
regions, and reset pool state to FREE before stopping BM and gating
the clock.
- resume: enable the clock and reinitialize BM defaults. Pool
allocation and buffer refill are handled by mvneta_resume() through
the normal mvneta_bm_port_init() path, which sees pools as FREE and
performs full initialization identical to probe.
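The suspend/resume contract in the list above can be modeled as a small state machine. In this hypothetical Python sketch, `Bm`, `Pool` and the generation counter are illustrative only. `suspend()`, `resume()` and `pool_use()` mirror the roles of mvneta_bm_suspend(), mvneta_bm_resume() and mvneta_bm_pool_use() as the commit message describes them, and the generation tag stands for "buffer address valid for the current power cycle".

```python
FREE, LONG = "free", "long"


class Pool:
    def __init__(self):
        self.type = FREE
        self.bufs = []          # (generation, index) tags stand in for DMA addresses
        self.bppe = None        # stands in for the DMA-coherent BPPE region


class Bm:
    def __init__(self, npools):
        self.pools = [Pool() for _ in range(npools)]
        self.clocked = True
        self.generation = 0     # bumped on every resume; older buffers are stale

    def suspend(self):
        assert self.clocked, "drain must happen while BM is still clocked"
        for pool in self.pools:
            if pool.type == FREE:
                continue
            pool.bufs.clear()   # like mvneta_bm_bufs_free(): unmap and free
            pool.bppe = None    # like dma_free_coherent() of the BPPE region
            pool.type = FREE
        self.clocked = False    # stop BM, gate the clock

    def resume(self):
        self.clocked = True     # enable clock, restore defaults; pools stay FREE
        self.generation += 1

    def pool_use(self, idx, pool_type, nbufs):
        """A FREE pool gets the full probe-time init (alloc + fill)."""
        assert self.clocked, "BM used while clock gated"
        pool = self.pools[idx]
        if pool.type == FREE:
            pool.bppe = [0] * nbufs
            pool.bufs = [(self.generation, i) for i in range(nbufs)]
            pool.type = pool_type
        return pool
```

If `suspend()` only stopped BM and gated the clock, as v1 did, the pool would still be marked in use, `pool_use()` would skip the refill after resume, and the buffers would still carry the old generation: the stale-address situation behind the panic above.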
Add a device_link (DL_FLAG_AUTOREMOVE_CONSUMER) in mvneta_probe to
guarantee BM resumes before mvneta and suspends after mvneta.
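The ordering guarantee from the device link can be pictured as a dependency sort: resume visits suppliers before consumers, and suspend visits them in reverse. This sketch uses Python's `graphlib.TopologicalSorter` (Python 3.9+) purely to illustrate the constraint; it is not how the PM core is implemented, and the device names, including a second port, are hypothetical.

```python
from graphlib import TopologicalSorter

# consumer -> set of suppliers it depends on (one link per mvneta port)
links = {
    "mvneta.0": {"mvneta_bm"},
    "mvneta.1": {"mvneta_bm"},
}

resume_order = list(TopologicalSorter(links).static_order())
suspend_order = list(reversed(resume_order))
print("resume:", resume_order)
print("suspend:", suspend_order)
```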
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v2:
- Drain buffers via mvneta_bm_bufs_free() in suspend instead of only
stopping BM and gating the clock. This ensures proper DMA unmapping
and avoids buffer leaks.
- Free the BPPE DMA-coherent region in suspend so that resume takes
the full probe-time initialization path (alloc + fill), eliminating
the need to modify mvneta_bm_pool_create().
- Reset pool type to MVNETA_BM_FREE in suspend so mvneta_bm_pool_use()
correctly re-creates and refills pools on resume.
- Check clk_prepare_enable() return value in resume.
- Add device_link between mvneta (consumer) and mvneta_bm (supplier)
to guarantee correct suspend/resume ordering.
drivers/net/ethernet/marvell/mvneta.c | 5 +++
drivers/net/ethernet/marvell/mvneta_bm.c | 47 ++++++++++++++++++++++++
2 files changed, 52 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0c061fb0ed07..cfaf5ea1db9e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5678,6 +5678,11 @@ static int mvneta_probe(struct platform_device *pdev)
"use SW buffer management\n");
mvneta_bm_put(pp->bm_priv);
pp->bm_priv = NULL;
+ } else {
+ /* Ensure BM suspends after us, resumes before us */
+ device_link_add(&pdev->dev,
+ &pp->bm_priv->pdev->dev,
+ DL_FLAG_AUTOREMOVE_CONSUMER);
}
}
/* Set RX packet offset correction for platforms, whose
diff --git a/drivers/net/ethernet/marvell/mvneta_bm.c b/drivers/net/ethernet/marvell/mvneta_bm.c
index 6bb380494919..ff2fe4020a45 100644
--- a/drivers/net/ethernet/marvell/mvneta_bm.c
+++ b/drivers/net/ethernet/marvell/mvneta_bm.c
@@ -477,6 +477,52 @@ static void mvneta_bm_remove(struct platform_device *pdev)
clk_disable_unprepare(priv->clk);
}
+#ifdef CONFIG_PM_SLEEP
+static int mvneta_bm_suspend(struct device *dev)
+{
+ struct mvneta_bm *priv = dev_get_drvdata(dev);
+ int i;
+
+ /* Drain buffers and free pool resources while BM is still clocked */
+ for (i = 0; i < MVNETA_BM_POOLS_NUM; i++) {
+ struct mvneta_bm_pool *bm_pool = &priv->bm_pools[i];
+ int size_bytes;
+
+ if (bm_pool->type == MVNETA_BM_FREE)
+ continue;
+
+ mvneta_bm_bufs_free(priv, bm_pool, bm_pool->port_map);
+
+ size_bytes = sizeof(u32) * bm_pool->hwbm_pool.size;
+ dma_free_coherent(&priv->pdev->dev, size_bytes,
+ bm_pool->virt_addr, bm_pool->phys_addr);
+ bm_pool->virt_addr = NULL;
+ bm_pool->type = MVNETA_BM_FREE;
+ }
+
+ mvneta_bm_write(priv, MVNETA_BM_COMMAND_REG, MVNETA_BM_STOP_MASK);
+ clk_disable_unprepare(priv->clk);
+ return 0;
+}
+
+static int mvneta_bm_resume(struct device *dev)
+{
+ struct mvneta_bm *priv = dev_get_drvdata(dev);
+ int err;
+
+ err = clk_prepare_enable(priv->clk);
+ if (err)
+ return err;
+
+ /* Reinitialize BM hardware; pools are refilled by mvneta_resume() */
+ mvneta_bm_default_set(priv);
+ mvneta_bm_write(priv, MVNETA_BM_COMMAND_REG, MVNETA_BM_START_MASK);
+ return 0;
+}
+#endif
+
+static SIMPLE_DEV_PM_OPS(mvneta_bm_pm_ops, mvneta_bm_suspend, mvneta_bm_resume);
+
static const struct of_device_id mvneta_bm_match[] = {
{ .compatible = "marvell,armada-380-neta-bm" },
{ }
@@ -489,6 +535,7 @@ static struct platform_driver mvneta_bm_driver = {
.driver = {
.name = MVNETA_BM_DRIVER_NAME,
.of_match_table = mvneta_bm_match,
+ .pm = &mvneta_bm_pm_ops,
},
};
--
2.43.0
^ permalink raw reply related
* Re: [PATCH RFC] net: bridge: mcast: don't clear L2 host_joined on port group deletion
From: Ido Schimmel @ 2026-06-14 10:54 UTC (permalink / raw)
To: cedric.jehasse
Cc: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, bridge, netdev,
linux-kernel, Cedric Jehasse
In-Reply-To: <20260610-mdb_l2_host_joined_fix-v1-1-19746b0b8a5d@luminex.be>
On Wed, Jun 10, 2026 at 10:31:23AM +0200, Cedric Jehasse via B4 Relay wrote:
> From: Cedric Jehasse <cedric.jehasse@luminex.be>
>
> For a static L2 multicast group that has both a host entry and a port
> entry, deleting the port entry also removes the host entry, and the
> whole group then disappears from "bridge mdb show".
>
> To reproduce:
> bridge mdb add dev br0 port br0 grp 01:02:03:04:05:06 permanent
> bridge mdb add dev br0 port swp1 grp 01:02:03:04:05:06 permanent
> bridge mdb del dev br0 port swp1 grp 01:02:03:04:05:06 permanent
> bridge mdb show # the "port br0" host entry is gone, too
Please show the output in the commit message and also show that this
differs from regular (*, G) entries where the host entry is not removed
following the deletion of the port entry.
>
> br_multicast_del_pg() processes every non-(*,G) entry through the S,G
> path, which removes the port group from br->sg_port_tbl and then calls
> br_multicast_sg_del_exclude_ports(). L2 entries are stored in
> sg_port_tbl as well, so they take this path too.
>
> When the last port is removed in br_multicast_sg_del_exclude_ports it
> sets "sgmp->host_joined = false", clearing the host membership directly
> and bypassing br_multicast_host_leave(). With host_joined now false and
> no ports left, br_multicast_del_pg() arms the group timer and
> br_multicast_group_expired() tears down the whole mdb entry -- even
> though the host membership was explicitly and permanently configured
> from user space.
>
> Keep removing L2 port groups from sg_port_tbl, but skip the S,G
> EXCLUDE-mode handling for them. The host membership of an L2 group is
> managed solely via br_multicast_host_join() / br_multicast_host_leave().
>
> Signed-off-by: Cedric Jehasse <cedric.jehasse@luminex.be>
The patch seems OK to me, but please add a test case in bridge_mdb.sh.
I checked the code and AFAICT this never worked, so target net-next
without a Fixes tag: support for L2 multicast groups was added in
955062b03fa62, but at that point the mode handling already existed in
br_multicast_del_pg().
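A simplified Python model of the lifetime rule described in the quoted commit message: an entry expires once it has no port groups and no host membership. The `Entry` class and `scenario()` are hypothetical. `del_port()` with `l2_fix=False` mimics the S,G EXCLUDE-mode path clearing host_joined when the last port goes away, and `l2_fix=True` skips that path for L2 entries as the patch proposes.

```python
class Entry:
    """Toy mdb entry: expires when no ports remain and the host is not joined."""

    def __init__(self, l2):
        self.l2 = l2
        self.host_joined = False
        self.ports = set()

    def host_join(self):              # like br_multicast_host_join()
        self.host_joined = True

    def add_port(self, port):
        self.ports.add(port)

    def del_port(self, port, l2_fix):
        self.ports.discard(port)
        if self.l2 and l2_fix:
            return                    # L2: host membership only via join/leave
        if not self.ports:
            self.host_joined = False  # the S,G EXCLUDE-mode side effect

    def expires(self):
        return not self.ports and not self.host_joined


def scenario(l2_fix):
    e = Entry(l2=True)
    e.host_join()                     # bridge mdb add ... port br0 ... permanent
    e.add_port("swp1")                # bridge mdb add ... port swp1 ... permanent
    e.del_port("swp1", l2_fix)        # bridge mdb del ... port swp1 ...
    return e.expires()
```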
^ permalink raw reply
* Re: [PATCH net-next v7 04/11] net: Enable BIG TCP with partial GSO
From: Paolo Abeni @ 2026-06-14 11:19 UTC (permalink / raw)
To: Alice Mikityanska, Daniel Borkmann, David S. Miller, Eric Dumazet,
Jakub Kicinski, Xin Long, Willem de Bruijn, Willem de Bruijn,
David Ahern, Nikolay Aleksandrov
Cc: Shuah Khan, Stanislav Fomichev, Andrew Lunn, Simon Horman,
Florian Westphal, netdev, Alice Mikityanska
In-Reply-To: <20260611192955.604661-5-alice.kernel@fastmail.im>
On 6/11/26 9:29 PM, Alice Mikityanska wrote:
> From: Alice Mikityanska <alice@isovalent.com>
>
> skb_segment is called for partial GSO, when netif_needs_gso returns true
> in validate_xmit_skb. Partial GSO is needed, for example, when
> segmentation of tunneled traffic is offloaded to a NIC that only
> supports inner checksum offload.
>
> Currently, skb_segment clamps the segment length to 65534 bytes, because
> gso_size == 65535 is a special value GSO_BY_FRAGS, and we don't want
> to accidentally assign mss = 65535, as it would fall into the
> GSO_BY_FRAGS check further in the function.
>
> This implementation, however, artificially blocks len > 65534, which is
> possible since the introduction of BIG TCP. To allow bigger lengths and
> avoid resegmentation of BIG TCP packets, store the gso_by_frags flag in
> the beginning and don't use a special value of mss for this purpose
> after mss was modified.
>
> Signed-off-by: Alice Mikityanska <alice@isovalent.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> ---
> drivers/net/netdevsim/psp.c | 2 +-
> net/core/skbuff.c | 10 +++++-----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/netdevsim/psp.c b/drivers/net/netdevsim/psp.c
> index d3e36c74be62..6b3532b5e360 100644
> --- a/drivers/net/netdevsim/psp.c
> +++ b/drivers/net/netdevsim/psp.c
> @@ -92,7 +92,7 @@ nsim_do_psp(struct sk_buff *skb, struct netdevsim *ns,
> * provide a valid checksum here, so the skb isn't dropped.
> */
> uh = udp_hdr(skb);
> - udplen = ntohs(uh->len) ?: skb->len - skb_transport_offset(skb);
> + udplen = udp_get_len(skb, uh, skb_transport_offset(skb));
> csum = skb_checksum(skb, skb_transport_offset(skb),
> udplen, 0);
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c64693fcb2d1..5dcee79df8cf 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4773,6 +4773,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> struct sk_buff *tail = NULL;
> struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
> unsigned int mss = skb_shinfo(head_skb)->gso_size;
> + bool gso_by_frags = mss == GSO_BY_FRAGS;
> unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
> unsigned int offset = doffset;
> unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
> @@ -4788,7 +4789,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> int nfrags, pos;
>
> if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
> - mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
> + !gso_by_frags && mss != skb_headlen(head_skb)) {
> struct sk_buff *check_skb;
>
> for (check_skb = list_skb; check_skb; check_skb = check_skb->next) {
> @@ -4816,7 +4817,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> sg = !!(features & NETIF_F_SG);
> csum = !!can_checksum_protocol(features, proto);
>
> - if (sg && csum && (mss != GSO_BY_FRAGS)) {
> + if (sg && csum && !gso_by_frags) {
> if (!(features & NETIF_F_GSO_PARTIAL)) {
> struct sk_buff *iter;
> unsigned int frag_len;
> @@ -4850,9 +4851,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> /* GSO partial only requires that we trim off any excess that
> * doesn't fit into an MSS sized block, so take care of that
> * now.
> - * Cap len to not accidentally hit GSO_BY_FRAGS.
> */
> - partial_segs = min(len, GSO_BY_FRAGS - 1) / mss;
> + partial_segs = len / mss;
Sashiko/gemini says the above can lead to hitting a BUG_ON() later.
I *think* it's not a false positive, as it looks like skb_segment()
assumes an skb can hold `mss` bytes without resorting to frag_list
usage, and mss > MAX_SKB_FRAGS * PAGE_SIZE breaks that assumption.
I think handling this case correctly will require some non-trivial
surgery to skb_segment(): both `while (pos < offset + len) {` loops must
be updated to feed data from `frags` as needed, instead of tripping the
BUG_ON()/net_warn_ratelimited() on
skb_shinfo(nskb)->nr_frags >= MAX_SKB_FRAGS.
/P
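The arithmetic behind both the patch and the review can be checked in isolation. This is a sketch, not skb_segment() itself: the helper names are hypothetical, GSO_BY_FRAGS is 0xFFFF as in skbuff.h, and the existing code after this hunk is assumed to scale mss by partial_segs (as in current trees). The MAX_SKB_FRAGS * PAGE_SIZE bound is the approximation used in the review, evaluated here with assumed defaults of 17 frags and 4096-byte pages.

```c
#include <assert.h>
#include <stdbool.h>

#define GSO_BY_FRAGS 0xFFFFu /* special gso_size value, as in skbuff.h */

/* Old computation: len is capped at GSO_BY_FRAGS - 1, so the scaled
 * mss (mss * partial_segs) can never equal GSO_BY_FRAGS. */
static unsigned int partial_segs_old(unsigned int len, unsigned int mss)
{
	unsigned int capped = len < GSO_BY_FRAGS - 1 ? len : GSO_BY_FRAGS - 1;

	return capped / mss;
}

/* New computation from the patch: no cap, so BIG TCP lengths survive. */
static unsigned int partial_segs_new(unsigned int len, unsigned int mss)
{
	return len / mss;
}

/* The review's concern with caller-supplied limits: can one output
 * segment of 'mss' bytes still fit in frags alone? */
static bool mss_exceeds_frag_capacity(unsigned int mss,
				      unsigned int max_frags,
				      unsigned int page_size)
{
	return mss > max_frags * page_size;
}
```

With len = 65535 and mss = 1285 (a divisor of 65535), the old cap yields 50 segments, while the uncapped division yields 51 and a scaled mss of exactly 65535, so a later `mss == GSO_BY_FRAGS` test would misfire; hence the gso_by_frags flag captured up front. With an illustrative BIG TCP length of 185000 and mss 1448, the scaled mss grows from 65160 to 183896 bytes, well above 17 * 4096 = 69632.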
^ permalink raw reply
* Re: [PATCH net-next v7 06/11] udp: Support gro_ipv4_max_size > 65536
From: Paolo Abeni @ 2026-06-14 11:30 UTC (permalink / raw)
To: Alice Mikityanska, Daniel Borkmann, David S. Miller, Eric Dumazet,
Jakub Kicinski, Xin Long, Willem de Bruijn, Willem de Bruijn,
David Ahern, Nikolay Aleksandrov
Cc: Shuah Khan, Stanislav Fomichev, Andrew Lunn, Simon Horman,
Florian Westphal, netdev, Alice Mikityanska
In-Reply-To: <20260611192955.604661-7-alice.kernel@fastmail.im>
On 6/11/26 9:29 PM, Alice Mikityanska wrote:
> From: Alice Mikityanska <alice@isovalent.com>
>
> Currently, gro_max_size and gro_ipv4_max_size can be set to values
> bigger than 65536, and GRO will happily aggregate UDP to the configured
> size (for example, with TCP traffic in VXLAN tunnels).
Sashiko/gemini claims IPv4 will not cope, suspecting that iph_totlen()
will not work properly, but that is wrong, because skb_is_gso_tcp() will
still be true for the relevant GSO packets.
It would be nice to mention that fact explicitly here.
/P
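The argument can be checked against a reduced model of iph_totlen(). The model assumes the helper behaves as in include/linux/ip.h of recent kernels: return the header's tot_len unless it is zero on a TCP GSO skb, in which case derive the length from skb->len. The struct and function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the sk_buff state iph_totlen() inspects. */
struct model_skb {
	unsigned int len;            /* skb->len */
	unsigned int network_offset; /* skb_network_offset(skb) */
	bool gso;                    /* skb_is_gso(skb) */
	bool gso_tcp;                /* skb_is_gso_tcp(skb) */
};

/* Mirrors the assumed iph_totlen() logic: a zero tot_len is replaced by
 * the real length only for TCP GSO packets. tot_len is in host byte
 * order here; the kernel applies ntohs() first. */
static unsigned int model_iph_totlen(const struct model_skb *skb,
				     uint16_t tot_len)
{
	if (tot_len || !skb->gso || !skb->gso_tcp)
		return tot_len;
	return skb->len - skb->network_offset;
}
```

For a >64K aggregate of TCP inside VXLAN, the TCP GSO bit is set, so a zero tot_len still resolves to the true length. A GSO skb without a TCP gso_type bit keeps the zero in this model, which is why the point is worth stating explicitly in the commit message.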
^ permalink raw reply
* [PATCH net-next 00/11] Netfilter/IPVS updates for net-next
From: Pablo Neira Ayuso @ 2026-06-14 11:45 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
Hi,
The following patchset contains Netfilter/IPVS updates for net-next.
More specifically, this contains a conncount rework to address AI-related
reports, assorted Netfilter updates and two small incremental updates for
IPVS:
1) Replace old obsolete workqueues (system_wq, system_unbound_wq)
in IPVS, from Marco Crivellari.
2) Replace WARN_ON{_ONCE} with DEBUG_NET_WARN_ON_ONCE in nf_tables.
   In recent years, reporters have pointed out that using WARN_ON{_ONCE}
   in conjunction with panic_on_warn=1 results in a DoS. Replace it
   with DEBUG_NET_WARN_ON_ONCE so these checks are only exercised by test
   infrastructure and fuzzers, while still providing context to AI
   agents. From Fernando F. Mancera.
Five patches from Florian Westphal to address AI reports in the conncount
infrastructure:
3) Fix missing rcu read lock section when calling
__ovs_ct_limit_get_zone_limit().
4) Add a dedicated lock per rbtree; this increases memory
   usage but it should improve scalability.
5) Add a helper function to find the rbtree node, no functional
   changes are intended.
6) Add sequence counter to detect concurrent tree modifications
and retry lookups.
7) Add locks to GC conncount walk and address other nitpicks.
Then, several assorted updates:
8) Defensive tree-wide addition of NULL checks for ct extensions.
9) Bail out if the flowtable bypass cannot be fully set up from the
   flow offload expression, instead of lazily building a likely
   incomplete one.
10) Fix documentation for the new conn_max sysctl toggle in IPVS.
11) Add nf_dev_xmit_recursion*() helpers and use them, to address
recent AI reports.
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git nf-next-26-06-14
Thanks.
----------------------------------------------------------------
The following changes since commit 4ed4f607e1cb6041db46ca5cd3200987d7d1eff2:
Merge tag 'batadv-next-pullrequest-20260605' of https://git.open-mesh.org/batadv (2026-06-08 15:40:55 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-26-06-14
for you to fetch changes up to 2354e975932dabb06fad239f07a3b68fd1809737:
netfilter: nf_dup_netdev: add nf_dev_xmit_recursion*() helpers and use them (2026-06-14 13:07:03 +0200)
----------------------------------------------------------------
netfilter pull request 26-06-14
----------------------------------------------------------------
Fernando Fernandez Mancera (1):
netfilter: nf_tables: use DEBUG_NET_WARN_ON_ONCE in packet and control paths
Florian Westphal (5):
netfilter: nf_conncount: callers must hold rcu read lock
netfilter: nf_conncount: use per nf_conncount_data spinlocks
netfilter: nf_conncount: split count_tree_node rbtree walk into helper
netfilter: nf_conncount: add sequence counter to detect tree modifications
netfilter: nf_conncount: gc and rcu fixes
Julian Anastasov (1):
ipvs: fix doc syntax for conn_max sysctl
Marco Crivellari (1):
ipvs: Replace use of system_unbound_wq with system_dfl_long_wq
Pablo Neira Ayuso (3):
netfilter: conntrack: check NULL when retrieving ct extension
netfilter: flowtable: bail out if forward path cannot be discovered
netfilter: nf_dup_netdev: add nf_dev_xmit_recursion*() helpers and use them
Documentation/networking/ipvs-sysctl.rst | 23 ++-
include/net/netfilter/nf_conntrack_helper.h | 2 +
include/net/netfilter/nf_dup_netdev.h | 34 +++-
net/ipv4/netfilter/nf_nat_h323.c | 12 ++
net/ipv4/netfilter/nf_nat_pptp.c | 14 +-
net/netfilter/ipvs/ip_vs_conn.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 10 +-
net/netfilter/nf_conncount.c | 230 +++++++++++++++++-----------
net/netfilter/nf_conntrack_broadcast.c | 3 +
net/netfilter/nf_conntrack_expect.c | 33 ++--
net/netfilter/nf_conntrack_ftp.c | 6 +
net/netfilter/nf_conntrack_h323_main.c | 18 +++
net/netfilter/nf_conntrack_pptp.c | 9 ++
net/netfilter/nf_conntrack_proto_gre.c | 9 ++
net/netfilter/nf_conntrack_sane.c | 3 +
net/netfilter/nf_conntrack_seqadj.c | 17 +-
net/netfilter/nf_conntrack_sip.c | 41 ++++-
net/netfilter/nf_dup_netdev.c | 15 +-
net/netfilter/nf_flow_table_path.c | 81 +++++-----
net/netfilter/nf_nat_sip.c | 12 ++
net/netfilter/nf_tables_api.c | 38 +++--
net/netfilter/nf_tables_core.c | 8 +-
net/netfilter/nf_tables_offload.c | 2 +-
net/netfilter/nf_tables_trace.c | 6 +-
net/netfilter/nfnetlink_cthelper.c | 6 +
net/netfilter/nft_ct.c | 2 +-
net/netfilter/nft_ct_fast.c | 2 +-
net/netfilter/nft_exthdr.c | 2 +-
net/netfilter/nft_fib.c | 2 +-
net/netfilter/nft_fwd_netdev.c | 17 +-
net/netfilter/nft_inner.c | 2 +-
net/netfilter/nft_lookup.c | 2 +-
net/netfilter/nft_masq.c | 2 +-
net/netfilter/nft_meta.c | 10 +-
net/netfilter/nft_payload.c | 6 +-
net/netfilter/nft_redir.c | 2 +-
net/netfilter/nft_reject.c | 8 +-
net/netfilter/nft_rt.c | 2 +-
net/netfilter/nft_set_hash.c | 2 +-
net/netfilter/nft_set_pipapo.c | 2 +-
net/netfilter/nft_set_rbtree.c | 6 +-
net/netfilter/nft_socket.c | 8 +-
net/netfilter/nft_tunnel.c | 2 +-
net/netfilter/nft_xfrm.c | 6 +-
net/openvswitch/conntrack.c | 2 +-
45 files changed, 494 insertions(+), 229 deletions(-)
^ permalink raw reply
* [PATCH net-next 01/11] ipvs: Replace use of system_unbound_wq with system_dfl_long_wq
From: Pablo Neira Ayuso @ 2026-06-14 11:45 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <20260614114605.474783-1-pablo@netfilter.org>
From: Marco Crivellari <marco.crivellari@suse.com>
This patch continues the effort to refactor workqueue APIs, which
began with the changes introducing new workqueues and a new
alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The point of the refactoring is to eventually alter the default behavior
of workqueues to become unbound by default so that their workload
placement is optimized by the scheduler.
Before that can happen, workqueue users must be converted to the
better-named new workqueues, with no intended behaviour changes:
system_wq -> system_percpu_wq
system_unbound_wq -> system_dfl_wq
This way the old obsolete workqueues (system_wq, system_unbound_wq) can
be removed in the future.
This specific work is considered long, so enqueue it using
system_dfl_long_wq instead of system_dfl_wq.
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/ipvs/ip_vs_conn.c | 4 ++--
net/netfilter/ipvs/ip_vs_ctl.c | 10 +++++-----
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index e76a73d183d5..cb36641f8d1c 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -285,7 +285,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
/* Schedule resizing if load increases */
if (atomic_read(&ipvs->conn_count) > t->u_thresh &&
!test_and_set_bit(IP_VS_WORK_CONN_RESIZE, &ipvs->work_flags))
- mod_delayed_work(system_unbound_wq, &ipvs->conn_resize_work, 0);
+ mod_delayed_work(system_dfl_long_wq, &ipvs->conn_resize_work, 0);
return ret;
}
@@ -916,7 +916,7 @@ static void conn_resize_work_handler(struct work_struct *work)
out:
/* Monitor if we need to shrink table */
- queue_delayed_work(system_unbound_wq, &ipvs->conn_resize_work,
+ queue_delayed_work(system_dfl_long_wq, &ipvs->conn_resize_work,
more_work ? 1 : 2 * HZ);
}
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index f765d1506839..bcf40b8c41cf 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -821,7 +821,7 @@ static void svc_resize_work_handler(struct work_struct *work)
if (!READ_ONCE(ipvs->enable) || !more_work ||
test_bit(IP_VS_WORK_SVC_NORESIZE, &ipvs->work_flags))
return;
- queue_delayed_work(system_unbound_wq, &ipvs->svc_resize_work, 1);
+ queue_delayed_work(system_dfl_long_wq, &ipvs->svc_resize_work, 1);
return;
unlock_m:
@@ -1869,7 +1869,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
/* Schedule resize work */
if (grow && !test_and_set_bit(IP_VS_WORK_SVC_RESIZE, &ipvs->work_flags))
- queue_delayed_work(system_unbound_wq, &ipvs->svc_resize_work,
+ queue_delayed_work(system_dfl_long_wq, &ipvs->svc_resize_work,
1);
*svc_p = svc;
@@ -2125,7 +2125,7 @@ static int ip_vs_del_service(struct ip_vs_service *svc)
rcu_read_unlock();
if (shrink && !test_and_set_bit(IP_VS_WORK_SVC_RESIZE,
&ipvs->work_flags))
- queue_delayed_work(system_unbound_wq,
+ queue_delayed_work(system_dfl_long_wq,
&ipvs->svc_resize_work, 1);
}
return 0;
@@ -2606,7 +2606,7 @@ static int ipvs_proc_conn_lfactor(const struct ctl_table *table, int write,
} else {
WRITE_ONCE(*valp, val);
if (rcu_access_pointer(ipvs->conn_tab))
- mod_delayed_work(system_unbound_wq,
+ mod_delayed_work(system_dfl_long_wq,
&ipvs->conn_resize_work, 0);
}
}
@@ -2638,7 +2638,7 @@ static int ipvs_proc_svc_lfactor(const struct ctl_table *table, int write,
READ_ONCE(ipvs->enable) &&
!test_bit(IP_VS_WORK_SVC_NORESIZE,
&ipvs->work_flags))
- mod_delayed_work(system_unbound_wq,
+ mod_delayed_work(system_dfl_long_wq,
&ipvs->svc_resize_work, 0);
mutex_unlock(&ipvs->service_mutex);
}
--
2.47.3
^ permalink raw reply related
* [PATCH net-next 02/11] netfilter: nf_tables: use DEBUG_NET_WARN_ON_ONCE in packet and control paths
From: Pablo Neira Ayuso @ 2026-06-14 11:45 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <20260614114605.474783-1-pablo@netfilter.org>
From: Fernando Fernandez Mancera <fmancera@suse.de>
Replace raw warning macros with DEBUG_NET_WARN_ON_ONCE across the
nf_tables API, core engine, and expression evaluations. This prevents
unnecessary system panics when panic_on_warn=1 is enabled in production
systems.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_tables_api.c | 38 +++++++++++++++++++++++--------
net/netfilter/nf_tables_core.c | 8 ++++---
net/netfilter/nf_tables_offload.c | 2 +-
net/netfilter/nf_tables_trace.c | 6 +++--
net/netfilter/nft_ct.c | 2 +-
net/netfilter/nft_ct_fast.c | 2 +-
net/netfilter/nft_exthdr.c | 2 +-
net/netfilter/nft_fib.c | 2 +-
net/netfilter/nft_inner.c | 2 +-
net/netfilter/nft_lookup.c | 2 +-
net/netfilter/nft_masq.c | 2 +-
net/netfilter/nft_meta.c | 10 ++++----
net/netfilter/nft_payload.c | 6 ++---
net/netfilter/nft_redir.c | 2 +-
net/netfilter/nft_reject.c | 8 +++++--
net/netfilter/nft_rt.c | 2 +-
net/netfilter/nft_set_hash.c | 2 +-
net/netfilter/nft_set_pipapo.c | 2 +-
net/netfilter/nft_set_rbtree.c | 6 +++--
net/netfilter/nft_socket.c | 8 ++++---
net/netfilter/nft_tunnel.c | 2 +-
net/netfilter/nft_xfrm.c | 6 ++---
22 files changed, 76 insertions(+), 46 deletions(-)
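The diff below rewrites every `if (WARN_ON_ONCE(cond))` as `if (unlikely(cond)) { DEBUG_NET_WARN_ON_ONCE(1); ... }`. The reason is that DEBUG_NET_WARN_ON_ONCE() (include/net/net_debug.h in current trees) is a statement with no usable value: it maps to WARN_ON_ONCE() only with CONFIG_DEBUG_NET and is otherwise a compile-time no-op, so the error branch has to be written out explicitly. The userspace sketch below mimics that shape; MODEL_DEBUG_NET, the macro and the lookup table are hypothetical, and the table values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Build-time switch standing in for CONFIG_DEBUG_NET (hypothetical). */
#ifndef MODEL_DEBUG_NET
#define MODEL_DEBUG_NET 0
#endif

static int model_warnings; /* counts reports instead of printing a splat */

/* A void statement, like DEBUG_NET_WARN_ON_ONCE(): reports at most once
 * per call site when debugging is enabled and yields no value, so it
 * cannot serve as an if () condition the way WARN_ON_ONCE() can. */
#define MODEL_DEBUG_NET_WARN_ON_ONCE(cond)                      \
	do {                                                    \
		static bool warned_;                            \
		if (MODEL_DEBUG_NET && (cond) && !warned_) {    \
			warned_ = true;                         \
			model_warnings++;                       \
		}                                               \
	} while (0)

#define MODEL_ICMPX_MAX 3

/* Illustrative four-entry table standing in for icmp_code_v4[]. */
static const unsigned char model_icmp_code[MODEL_ICMPX_MAX + 1] = {
	0, 3, 1, 13
};

/* Same shape as the converted nft_reject_icmp_code(): the error path is
 * an explicit branch; the warning is only a debugging aid. */
static int model_reject_icmp_code(unsigned char code)
{
	if (code > MODEL_ICMPX_MAX) {
		MODEL_DEBUG_NET_WARN_ON_ONCE(1);
		return 0; /* fallback, as ICMP_NET_UNREACH (0) in the patch */
	}
	return model_icmp_code[code];
}
```

Built without MODEL_DEBUG_NET, an out-of-range code still takes the fallback but never warns, which is what prevents a panic_on_warn=1 system from going down; building with -DMODEL_DEBUG_NET=1 reports the first hit only.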
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 87387adbca65..4884f7f7aaee 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3378,8 +3378,10 @@ static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info,
*/
int nft_register_expr(struct nft_expr_type *type)
{
- if (WARN_ON_ONCE(type->maxattr > NFT_EXPR_MAXATTR))
+ if (unlikely(type->maxattr > NFT_EXPR_MAXATTR)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -ENOMEM;
+ }
nfnl_lock(NFNL_SUBSYS_NFTABLES);
if (type->family == NFPROTO_UNSPEC)
@@ -3691,8 +3693,10 @@ int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src, gfp_t gfp)
{
int err;
- if (WARN_ON_ONCE(!src->ops->clone))
+ if (unlikely(!src->ops->clone)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -EINVAL;
+ }
dst->ops = src->ops;
err = src->ops->clone(dst, src, gfp);
@@ -8327,8 +8331,10 @@ static int nf_tables_newobj(struct sk_buff *skb, const struct nfnl_info *info,
return 0;
type = nft_obj_type_get(net, objtype, family);
- if (WARN_ON_ONCE(IS_ERR(type)))
+ if (IS_ERR(type)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return PTR_ERR(type);
+ }
nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla);
@@ -10306,19 +10312,25 @@ static int nf_tables_commit_chain_prepare(struct net *net, struct nft_chain *cha
prule = (struct nft_rule_dp *)data;
data += offsetof(struct nft_rule_dp, data);
- if (WARN_ON_ONCE(data > data_boundary))
+ if (unlikely(data > data_boundary)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -ENOMEM;
+ }
size = 0;
nft_rule_for_each_expr(expr, last, rule) {
- if (WARN_ON_ONCE(data + size + expr->ops->size > data_boundary))
+ if (unlikely(data + size + expr->ops->size > data_boundary)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -ENOMEM;
+ }
memcpy(data + size, expr, expr->ops->size);
size += expr->ops->size;
}
- if (WARN_ON_ONCE(size >= 1 << 12))
+ if (unlikely(size >= 1 << 12)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -ENOMEM;
+ }
prule->handle = rule->handle;
prule->dlen = size;
@@ -10329,8 +10341,10 @@ static int nf_tables_commit_chain_prepare(struct net *net, struct nft_chain *cha
chain->blob_next->size += (unsigned long)(data - (void *)prule);
}
- if (WARN_ON_ONCE(data > data_boundary))
+ if (unlikely(data > data_boundary)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -ENOMEM;
+ }
prule = (struct nft_rule_dp *)data;
nft_last_rule(chain, prule);
@@ -11636,8 +11650,10 @@ int nft_parse_register_load(const struct nft_ctx *ctx,
next_register = DIV_ROUND_UP(len, NFT_REG32_SIZE) + reg;
/* Can't happen: nft_validate_register_load() should have failed */
- if (WARN_ON_ONCE(next_register > NFT_REG32_NUM))
+ if (unlikely(next_register > NFT_REG32_NUM)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -EINVAL;
+ }
/* find first register that did not see an earlier store. */
invalid_reg = find_next_zero_bit(ctx->reg_inited, NFT_REG32_NUM, reg);
@@ -11884,8 +11900,10 @@ int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data,
struct nlattr *tb[NFTA_DATA_MAX + 1];
int err;
- if (WARN_ON_ONCE(!desc->size))
+ if (unlikely(!desc->size)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -EINVAL;
+ }
err = nla_parse_nested_deprecated(tb, NFTA_DATA_MAX, nla,
nft_data_policy, NULL);
@@ -11950,7 +11968,7 @@ int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data,
break;
default:
err = -EINVAL;
- WARN_ON(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
}
nla_nest_end(skb, nest);
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 8ab186f86dd4..01a72f334dc6 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -314,8 +314,10 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
switch (regs.verdict.code) {
case NFT_JUMP:
- if (WARN_ON_ONCE(stackptr >= NFT_JUMP_STACK_SIZE))
- return NF_DROP;
+ if (unlikely(stackptr >= NFT_JUMP_STACK_SIZE)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
+ return NF_DROP_REASON(pkt->skb, SKB_DROP_REASON_NETFILTER_DROP, ELOOP);
+ }
jumpstack[stackptr].rule = nft_rule_next(rule);
stackptr++;
fallthrough;
@@ -326,7 +328,7 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
case NFT_RETURN:
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
}
if (stackptr > 0) {
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 9101b1703b52..8998a24651ff 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -361,7 +361,7 @@ static int nft_block_setup(struct nft_base_chain *basechain,
err = nft_flow_offload_unbind(bo, basechain);
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
err = -EOPNOTSUPP;
}
diff --git a/net/netfilter/nf_tables_trace.c b/net/netfilter/nf_tables_trace.c
index a88abae5a9de..d85b6a2fb43c 100644
--- a/net/netfilter/nf_tables_trace.c
+++ b/net/netfilter/nf_tables_trace.c
@@ -227,8 +227,10 @@ static const struct nft_chain *nft_trace_get_chain(const struct nft_rule_dp *rul
last = (const struct nft_rule_dp_last *)rule;
- if (WARN_ON_ONCE(!last->chain))
+ if (unlikely(!last->chain)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return &info->basechain->chain;
+ }
return last->chain;
}
@@ -354,7 +356,7 @@ void nft_trace_notify(const struct nft_pktinfo *pkt,
return;
nla_put_failure:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
kfree_skb(skb);
}
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 9fe179d688da..25934c6f01fb 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -1132,7 +1132,7 @@ static void nft_ct_helper_obj_eval(struct nft_object *obj,
to_assign = priv->helper6;
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
return;
}
diff --git a/net/netfilter/nft_ct_fast.c b/net/netfilter/nft_ct_fast.c
index ecf7b3a404be..a44524c4fe63 100644
--- a/net/netfilter/nft_ct_fast.c
+++ b/net/netfilter/nft_ct_fast.c
@@ -53,7 +53,7 @@ void nft_ct_get_fast_eval(const struct nft_expr *expr,
return;
#endif
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
regs->verdict.code = NFT_BREAK;
break;
}
diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index e6a07c0df207..8861b4d191d1 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -298,7 +298,7 @@ static void nft_exthdr_tcp_set_eval(const struct nft_expr *expr,
old.v32, new.v32, false);
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
break;
}
diff --git a/net/netfilter/nft_fib.c b/net/netfilter/nft_fib.c
index 327a5f33659c..1d0d815c8745 100644
--- a/net/netfilter/nft_fib.c
+++ b/net/netfilter/nft_fib.c
@@ -155,7 +155,7 @@ void nft_fib_store_result(void *reg, const struct nft_fib *priv,
strscpy_pad(reg, dev ? dev->name : "", IFNAMSIZ);
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
*dreg = 0;
break;
}
diff --git a/net/netfilter/nft_inner.c b/net/netfilter/nft_inner.c
index d14ca157910b..97fb4eea2d66 100644
--- a/net/netfilter/nft_inner.c
+++ b/net/netfilter/nft_inner.c
@@ -308,7 +308,7 @@ static void nft_inner_eval(const struct nft_expr *expr, struct nft_regs *regs,
nft_meta_inner_eval((struct nft_expr *)&priv->expr, regs, pkt, &tun_ctx);
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
goto err;
}
nft_inner_save_tun_ctx(pkt, &tun_ctx);
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 9fafe5afc490..ba512e94b402 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -50,7 +50,7 @@ __nft_set_do_lookup(const struct net *net, const struct nft_set *set,
if (set->ops == &nft_set_rbtree_type.ops)
return nft_rbtree_lookup(net, set, key);
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
#endif
return set->ops->lookup(net, set, key);
}
diff --git a/net/netfilter/nft_masq.c b/net/netfilter/nft_masq.c
index 2b01128737a3..841efd981e20 100644
--- a/net/netfilter/nft_masq.c
+++ b/net/netfilter/nft_masq.c
@@ -123,7 +123,7 @@ static void nft_masq_eval(const struct nft_expr *expr,
break;
#endif
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
break;
}
}
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 5b25851381e5..9b5821c64442 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -116,12 +116,12 @@ nft_meta_get_eval_pkttype_lo(const struct nft_pktinfo *pkt,
nft_reg_store8(dest, PACKET_MULTICAST);
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
return false;
}
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
return false;
}
@@ -460,7 +460,7 @@ void nft_meta_get_eval(const struct nft_expr *expr,
nft_meta_get_eval_sdifname(dest, pkt);
break;
default:
- WARN_ON(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
goto err;
}
return;
@@ -506,7 +506,7 @@ void nft_meta_set_eval(const struct nft_expr *expr,
break;
#endif
default:
- WARN_ON(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
}
}
EXPORT_SYMBOL_GPL(nft_meta_set_eval);
@@ -886,7 +886,7 @@ void nft_meta_inner_eval(const struct nft_expr *expr,
nft_reg_store8(dest, tun_ctx->l4proto);
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
goto err;
}
return;
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 484a5490832e..ef2a80dfc68f 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -196,7 +196,7 @@ void nft_payload_eval(const struct nft_expr *expr,
goto err;
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
goto err;
}
offset += priv->offset;
@@ -603,7 +603,7 @@ void nft_payload_inner_eval(const struct nft_expr *expr, struct nft_regs *regs,
offset = tun_ctx->inner_thoff;
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
goto err;
}
offset += priv->offset;
@@ -866,7 +866,7 @@ static void nft_payload_set_eval(const struct nft_expr *expr,
goto err;
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
goto err;
}
diff --git a/net/netfilter/nft_redir.c b/net/netfilter/nft_redir.c
index 58ae802db8f5..a98aa28180fb 100644
--- a/net/netfilter/nft_redir.c
+++ b/net/netfilter/nft_redir.c
@@ -126,7 +126,7 @@ static void nft_redir_eval(const struct nft_expr *expr,
break;
#endif
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
break;
}
}
diff --git a/net/netfilter/nft_reject.c b/net/netfilter/nft_reject.c
index 196a92c7ea09..e3972e904cf0 100644
--- a/net/netfilter/nft_reject.c
+++ b/net/netfilter/nft_reject.c
@@ -102,8 +102,10 @@ static u8 icmp_code_v4[NFT_REJECT_ICMPX_MAX + 1] = {
int nft_reject_icmp_code(u8 code)
{
- if (WARN_ON_ONCE(code > NFT_REJECT_ICMPX_MAX))
+ if (unlikely(code > NFT_REJECT_ICMPX_MAX)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return ICMP_NET_UNREACH;
+ }
return icmp_code_v4[code];
}
@@ -120,8 +122,10 @@ static u8 icmp_code_v6[NFT_REJECT_ICMPX_MAX + 1] = {
int nft_reject_icmpv6_code(u8 code)
{
- if (WARN_ON_ONCE(code > NFT_REJECT_ICMPX_MAX))
+ if (unlikely(code > NFT_REJECT_ICMPX_MAX)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return ICMPV6_NOROUTE;
+ }
return icmp_code_v6[code];
}
diff --git a/net/netfilter/nft_rt.c b/net/netfilter/nft_rt.c
index e23cd4759851..aeb0094eafd8 100644
--- a/net/netfilter/nft_rt.c
+++ b/net/netfilter/nft_rt.c
@@ -93,7 +93,7 @@ void nft_rt_get_eval(const struct nft_expr *expr,
break;
#endif
default:
- WARN_ON(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
goto err;
}
return;
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index b0e571c8e3f3..eb4e382119d4 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -385,7 +385,7 @@ static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
break;
default:
iter->err = -EINVAL;
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
break;
}
}
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 50d4a4f04309..706c78853f24 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -2199,7 +2199,7 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
break;
default:
iter->err = -EINVAL;
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
break;
}
}
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index b4f0b5fdf1f2..018bbb6df4ce 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -654,8 +654,10 @@ static int nft_array_may_resize(const struct nft_set *set, bool flush)
}
realloc_array:
- if (WARN_ON_ONCE(nelems > new_max_intervals))
+ if (unlikely(nelems > new_max_intervals)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -ENOMEM;
+ }
if (priv->array_next) {
if (max_intervals == new_max_intervals)
@@ -878,7 +880,7 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
break;
default:
iter->err = -EINVAL;
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
break;
}
}
diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c
index a146a45d7531..52d892e04261 100644
--- a/net/netfilter/nft_socket.c
+++ b/net/netfilter/nft_socket.c
@@ -71,8 +71,10 @@ static noinline int nft_socket_cgroup_subtree_level(void)
if (level > 255)
return -ERANGE;
- if (WARN_ON_ONCE(level < 0))
+ if (unlikely(level < 0)) {
+ DEBUG_NET_WARN_ON_ONCE(1);
return -EINVAL;
+ }
return level;
}
@@ -97,7 +99,7 @@ static struct sock *nft_socket_do_lookup(const struct nft_pktinfo *pkt)
break;
#endif
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
break;
}
@@ -152,7 +154,7 @@ static void nft_socket_eval(const struct nft_expr *expr,
break;
#endif
default:
- WARN_ON(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
regs->verdict.code = NFT_BREAK;
}
diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c
index 68f7cfbbee06..0a018d4706a9 100644
--- a/net/netfilter/nft_tunnel.c
+++ b/net/netfilter/nft_tunnel.c
@@ -60,7 +60,7 @@ static void nft_tunnel_get_eval(const struct nft_expr *expr,
regs->verdict.code = NFT_BREAK;
break;
default:
- WARN_ON(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
regs->verdict.code = NFT_BREAK;
}
}
diff --git a/net/netfilter/nft_xfrm.c b/net/netfilter/nft_xfrm.c
index 65a75d88e5f0..8cec43064319 100644
--- a/net/netfilter/nft_xfrm.c
+++ b/net/netfilter/nft_xfrm.c
@@ -132,7 +132,7 @@ static void nft_xfrm_state_get_key(const struct nft_xfrm *priv,
switch (priv->key) {
case NFT_XFRM_KEY_UNSPEC:
case __NFT_XFRM_KEY_MAX:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
break;
case NFT_XFRM_KEY_DADDR_IP4:
*dest = (__force __u32)state->id.daddr.a4;
@@ -206,7 +206,7 @@ static void nft_xfrm_get_eval(const struct nft_expr *expr,
nft_xfrm_get_eval_out(priv, regs, pkt);
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
regs->verdict.code = NFT_BREAK;
break;
}
@@ -252,7 +252,7 @@ static int nft_xfrm_validate(const struct nft_ctx *ctx, const struct nft_expr *e
(1 << NF_INET_POST_ROUTING);
break;
default:
- WARN_ON_ONCE(1);
+ DEBUG_NET_WARN_ON_ONCE(1);
return -EINVAL;
}
--
2.47.3
^ permalink raw reply related