* Re: [ANN] netdev development stats for 7.2
From: Jacob Keller @ 2026-06-17 23:19 UTC
To: Jakub Kicinski, netdev
In-Reply-To: <20260617115319.43a5942d@kernel.org>
On 6/17/2026 11:53 AM, Jakub Kicinski wrote:
> Top scores (positive):                Top scores (negative):
>  1 (   ) [768] Jakub Kicinski          1 ( +1) [91] Tariq Toukan
>  2 (   ) [376] Simon Horman            2 ( +8) [86] Wei Fang
>  3 (   ) [346] Andrew Lunn             3 ( +4) [67] Ratheesh Kannoth
>  4 (   ) [265] Paolo Abeni             4 (***) [54] javen
>  5 ( +4) [ 91] Ido Schimmel            5 ( +6) [49] Lorenzo Bianconi
>  6 (+14) [ 74] David Laight            6 (***) [48] Luiz Angelo Daros de Luca
>  7 (   ) [ 62] Krzysztof Kozlowski     7 (***) [43] Simon Wunderlich
>  8 ( +2) [ 57] Aleksandr Loktionov     8 (***) [38] Chuck Lever
>  9 (+12) [ 50] Nikolay Aleksandrov     9 (+18) [38] Grzegorz Nitka
> 10 ( -4) [ 49] Willem de Bruijn       10 (***) [35] Pablo Neira Ayuso
> 11 ( +3) [ 49] Sabrina Dubroca        11 (***) [35] Markus Stockhausen
> 12 (+41) [ 47] Alexander Lobakin      12 (***) [34] Selvamani Rajagopal
> 13 (+24) [ 47] Maxime Chevallier      13 (***) [34] Jason Xing
> 14 ( -6) [ 46] David Ahern            14 ( -8) [33] Illusion Wang
> 15 (***) [ 43] Jiayuan Chen           15 (***) [30] Minxi Hou
>
> One process note on the reviewer score. Tariq tops the negative list.
> I've been returning to the question of whether it's fair since
> he has to handle submissions of most of nVidia's patches.
> Still, I don't understand why reading thru the list and reviewing
> one patchset from another company a day is too much to ask.
>
This is a difficult question. When I've covered for Tony in a similar
position, I've found it hard enough to keep an eye on our own list, let
alone find time to review elsewhere.
A positive note here is that nVidia is now green overall, so at least
there is some participation from the company as a whole. On the other
hand, Tony isn't in the top negatives despite performing a somewhat
similar role.
I know I was lacking myself in the last cycle due to a bunch of
unrelated work and issues. I've been working to get review back into my
daily flow.
* Re: [PATCH net v3] net: airoha: Fix skb->priority underflow in airoha_dev_select_queue()
From: Jakub Kicinski @ 2026-06-17 23:19 UTC
To: lorenzo
Cc: Wayen Yan, netdev, horms, pabeni, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <178161373805.2167512.2544164327472822616@gmail.com>
On Sun, 14 Jun 2026 07:30:54 +0800 Wayen Yan wrote:
> In airoha_dev_select_queue(), the expression:
>
> queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES;
>
> implicitly converts to unsigned arithmetic: when skb->priority is 0
> (the default for unclassified traffic), (0u - 1u) wraps to UINT_MAX,
> and UINT_MAX % 8 = 7, routing default best-effort packets to the
> highest-priority QoS queue. This causes QoS inversion where the
> majority of traffic on a PON gateway starves actual high-priority
> flows (VoIP, gaming, etc.).
>
> Fix by guarding the subtraction: when priority is 0, map to queue 0
> (lowest priority), otherwise apply the original (priority - 1) % 8
> mapping.
>
> Fixes: 2b288b81560b ("net: airoha: Introduce ndo_select_queue callback")
> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Reviewed-by: Joe Damato <joe@dama.to>
> Signed-off-by: Wayen Yan <win847@gmail.com>
> ---
> drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 31cdb11cd7..d476ef83c3 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1933,7 +1933,7 @@ static u16 airoha_dev_select_queue(struct net_device *dev, struct sk_buff *skb,
> */
> channel = netdev_uses_dsa(dev) ? skb_get_queue_mapping(skb) : port->id;
> channel = channel % AIROHA_NUM_QOS_CHANNELS;
> - queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES; /* QoS queue */
> + queue = skb->priority ? (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES : 0;
Hi Lorenzo, is there a reason we're subtracting 1 here in the first
place? Could be just me, but may be worth adding a comment here.
Intuitively if we are "narrowing" 16 prios to 8 queues it'd make most
sense to group the adjacent ones -- divide by two.
Please respin with some sort of explanation.
> queue = channel * AIROHA_NUM_QOS_QUEUES + queue;
>
> return queue < dev->num_tx_queues ? queue : 0;
--
pw-bot: cr
* Re: [PATCH v3 0/3] net/smc: bound wire-controlled CDC cursors against the local buffers
From: Jakub Kicinski @ 2026-06-17 23:24 UTC
To: Bryam Vargas via B4 Relay
Cc: hexlabsecurity, Wenjia Zhang, Dust Li, D. Wythe, Sidraya Jayagond,
Eric Dumazet, David S. Miller, Mahanta Jambigi, Wen Gu,
Simon Horman, netdev, Ursula Braun, Stefan Raspl, linux-s390,
Paolo Abeni, linux-kernel, linux-rdma, Tony Lu
In-Reply-To: <20260614-b4-disp-edd64be9-v3-0-551fa514257e@proton.me>
On Sun, 14 Jun 2026 03:23:29 -0500 Bryam Vargas via B4 Relay wrote:
> A peer's CDC producer/consumer cursors are copied from the wire and used,
> without an upper bound against the local buffers, as (a) a raw index into the
> RMB on the urgent path, (b) the receive length in smc_rx_recvmsg(), and (c) the
> send length in smc_tx_sendmsg() on the SMC-D DMB-merge path. A malicious or
> buggy peer can forge a cursor so each of these runs past the relevant buffer:
> an out-of-bounds read of adjacent kernel memory (disclosed to the peer) on the
> receive/urgent side, and an out-of-bounds write of attacker-influenced length
> and content on the send side.
Once again, SMC maintainers -- please review.
--
mping: SHARED MEMORY COMMUNICATIONS (SMC) SOCKETS
* Re: [PATCH] net: tn40xx: fix netdev and NAPI leak in probe error paths
From: Jakub Kicinski @ 2026-06-17 23:33 UTC
To: ZhaoJinming
Cc: FUJITA Tomonori, Andrew Lunn, David S . Miller, Eric Dumazet,
Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260615064256.1068059-1-zhaojinming@uniontech.com>
On Mon, 15 Jun 2026 14:42:56 +0800 ZhaoJinming wrote:
> In tn40_probe(), after tn40_netdev_alloc() and netif_napi_add() succeed,
> none of the subsequent error paths call netif_napi_del() or free_netdev()
> to undo these operations. On any probe failure after netif_napi_add() the
> NAPI structure (embedded in the netdev private data) remains on the
> per-netdev napi_list while the backing memory is never freed, causing:
it's devm_ allocated:
ndev = devm_alloc_etherdev(&pdev->dev, sizeof(struct tn40_priv));
you're introducing a bug instead of fixing one..
--
pw-bot: reject
pv-bot: slop
* Re: [PATCH] rocker: Fix memory leak in ofdpa_port_fdb()
From: Jakub Kicinski @ 2026-06-17 23:44 UTC
To: Andrew Lunn, Jiri Pirko
Cc: Jacob Keller, Ziran Zhang, Andrew Lunn, David S . Miller,
Eric Dumazet, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <61892bd4-7368-4cd8-b360-0267e5c47156@lunn.ch>
On Wed, 17 Jun 2026 11:26:46 +0200 Andrew Lunn wrote:
> On Tue, Jun 16, 2026 at 04:29:59PM -0700, Jacob Keller wrote:
> > On 6/15/2026 6:32 PM, Ziran Zhang wrote:
> > > In ofdpa_port_fdb(), the hash_del() only unlinks the node from
> > > hash table, but does not free it.
> > >
> > > Fix this by adding kfree(found) after the !found == removing check,
> > > where the pointer value is no longer needed.
> > >
> > > Found by Coccinelle kfree script.
>
> Is rocker actually used any more? I'm not too sure of the history, but
> was it not added as a way to develop the early switchdev code? There
> was a qemu implementation of the 'hardware'?
>
> Is it still useful? Should we actually just remove the driver?
I think it came up before but I don't remember the conclusion :S
We should either add rocker to NIPA or delete it. Jiri, WDYT?
* Re: [PATCH] net: airoha: Stop TX queues on error path in airoha_dev_open
From: Jakub Kicinski @ 2026-06-17 23:44 UTC
To: Wayen Yan
Cc: netdev, lorenzo, horms, pabeni, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <178160729880.2156257.7978513589649053826@gmail.com>
On Tue, 16 Jun 2026 18:50:39 +0800 Wayen Yan wrote:
> In airoha_dev_open(), if airoha_set_vip_for_gdm_port() fails after
> netif_tx_start_all_queues() has been called, the TX queues remain
> started while the device configuration is incomplete. This leaves
> the device in an inconsistent state where packets could be
> transmitted before the VIP/IFC port configuration is complete.
Not sure if this was superseded by another posting but FWIW
this posting did not apply.
* Re: [PATCH net-next] ionic: Change list definition method
From: Jakub Kicinski @ 2026-06-17 23:47 UTC
To: Lei Zhu; +Cc: brett.creeley, andrew+netdev, davem, edumazet, netdev
In-Reply-To: <20260617023243.61595-1-zhulei_szu@163.com>
On Wed, 17 Jun 2026 10:32:43 +0800 Lei Zhu wrote:
> The LIST_HEAD macro can both define a linked list and initialize
> it in one step. To simplify code, we replace the separate operations
> of linked list definition and manual initialization with the LIST_HEAD
> macro.
## Form letter - net-next-closed
We have already submitted our pull request with net-next material for v7.2,
and therefore net-next is closed for new drivers, features, code refactoring
and optimizations. We are currently accepting bug fixes only.
Please repost when net-next reopens after June 29th.
RFC patches sent for review only are obviously welcome at any time.
See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
--
pw-bot: defer
pv-bot: closed
* Re: [PATCH net v6 0/7] net: require CAP_NET_ADMIN in the device netns for tunnel changelink
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: Maoyi Xie
Cc: davem, edumazet, kuba, pabeni, dsahern, steffen.klassert, herbert,
horms, kuniyu, shaw.leon, netdev, linux-kernel, stable
In-Reply-To: <20260612085941.3158249-1-maoyixie.tju@gmail.com>
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Fri, 12 Jun 2026 16:59:34 +0800 you wrote:
> A tunnel changelink() operates on at most two netns, dev_net(dev) and
> the tunnel link netns t->net. They differ once the device is created in
> or moved to a netns other than the one the request runs in. The rtnl
> changelink path checks CAP_NET_ADMIN only against dev_net(dev), so a
> caller privileged there but not in the link netns can rewrite a tunnel
> that lives in the link netns. Commit 8b484efd5cb4 ("ip6: vti: Use
> ip6_tnl.net in vti6_siocdevprivate().") added the same check on the
> ioctl path. This series adds it on the RTM_NEWLINK path.
>
> [...]
Here is the summary with links:
- [net,v6,1/7] net: ip_gre: require CAP_NET_ADMIN in the device netns for changelink
https://git.kernel.org/netdev/net/c/8165f7ff57d9
- [net,v6,2/7] net: ipip: require CAP_NET_ADMIN in the device netns for changelink
https://git.kernel.org/netdev/net/c/8211a2632466
- [net,v6,3/7] net: ip_vti: require CAP_NET_ADMIN in the device netns for changelink
https://git.kernel.org/netdev/net/c/95cceadbfd52
- [net,v6,4/7] net: ip6_tunnel: require CAP_NET_ADMIN in the device netns for changelink
https://git.kernel.org/netdev/net/c/2496fa0b7d18
- [net,v6,5/7] net: ip6_gre: require CAP_NET_ADMIN in the device netns for changelink
https://git.kernel.org/netdev/net/c/f00a50876d28
- [net,v6,6/7] net: ip6_vti: require CAP_NET_ADMIN in the device netns for changelink
https://git.kernel.org/netdev/net/c/e2ac3b242c37
- [net,v6,7/7] xfrm: xfrm_interface: require CAP_NET_ADMIN in the device netns for changelink
https://git.kernel.org/netdev/net/c/095515d89b19
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net] octeontx2-af: npc: Log successful MCAM drop-on-non-hit install at debug level
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: Ratheesh Kannoth
Cc: kuba, linux-kernel, netdev, andrew+netdev, davem, edumazet,
pabeni, sgoutham
In-Reply-To: <20260615033157.535237-1-rkannoth@marvell.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 09:01:57 +0530 you wrote:
> npc_install_mcam_drop_rule() used dev_err() after a successful
> rvu_mbox_handler_npc_mcam_write_entry() call, so normal installs appeared
> as errors in dmesg. Use dev_dbg() for the success path and keep dev_err()
> for real failures.
>
> Fixes: 3571fe07a090 ("octeontx2-af: Drop rules for NPC MCAM")
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
>
> [...]
Here is the summary with links:
- [net] octeontx2-af: npc: Log successful MCAM drop-on-non-hit install at debug level
https://git.kernel.org/netdev/net/c/4f6ac65e8162
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net] net: ethernet: mtk_eth_soc: fix supported_interface set after phylink_create
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC (permalink / raw)
To: Christian Marangi
Cc: nbd, lorenzo, andrew+netdev, davem, edumazet, kuba, pabeni,
matthias.bgg, angelogioacchino.delregno, linux, daniel, netdev,
linux-kernel, linux-arm-kernel, linux-mediatek
In-Reply-To: <20260615151106.15438-1-ansuelsmth@gmail.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 17:11:00 +0200 you wrote:
> Everything configured in phylink_config is assumed to be set before
> calling phylink_create() to permit correct parsing of all the different
> modes and capabilities.
>
> Commit 51cf06ddafc9 ("net: ethernet: mtk_eth_soc: add support for MT7988
> internal 2.5G PHY") while introducing support for 2.5G phy for MT7988,
> probably due to an auto-rebase, placed the configuration of the INTERNAL
> interface mode for the supported_interfaces for phylink_config right after
> phylink_create() introducing a possible problem with supported interfaces
> parsing.
>
> [...]
Here is the summary with links:
- [net] net: ethernet: mtk_eth_soc: fix supported_interface set after phylink_create
https://git.kernel.org/netdev/net/c/e4b4d8410c7c
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH][net-next] net/mlx5: Remove broken and unused mlx5_query_mtppse()
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: lirongqing
Cc: saeedm, leon, tariqt, mbloch, andrew+netdev, davem, edumazet,
kuba, pabeni, netdev, gal, linux-rdma, linux-kernel
In-Reply-To: <20260615140406.1828-1-lirongqing@baidu.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 22:04:06 +0800 you wrote:
> From: Li RongQing <lirongqing@baidu.com>
>
> mlx5_query_mtppse() reads the Event Trigger Pin (MTPPSE) register but
> reads the returned arm and mode values from the input buffer 'in'
> instead of the output buffer 'out', so it always returns the values
> that were written rather than the actual hardware state, making the
> query useless.
>
> [...]
Here is the summary with links:
- [net-next] net/mlx5: Remove broken and unused mlx5_query_mtppse()
https://git.kernel.org/netdev/net/c/b50fa1e07cf8
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net] netdev-genl: report NAPI thread PID in the caller's pid namespace
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: Maoyi Xie
Cc: davem, edumazet, kuba, pabeni, horms, daniel, razor, dw, sdf,
dtatulea, skhawaja, netdev, linux-kernel, stable
In-Reply-To: <20260615171736.1709318-1-maoyixie.tju@gmail.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 16 Jun 2026 01:17:36 +0800 you wrote:
> netdev_nl_napi_fill_one() reports the NAPI kthread PID in NETDEV_A_NAPI_PID
> using task_pid_nr(), which returns the PID in the initial pid namespace.
>
> NETDEV_CMD_NAPI_GET does not have GENL_ADMIN_PERM and the netdev genl family
> is netnsok, so a caller in a child pid namespace can issue it. That caller
> then sees the kthread's global PID, even though the kthread is not visible
> in its pid namespace, where the value should be 0.
>
> [...]
Here is the summary with links:
- [net] netdev-genl: report NAPI thread PID in the caller's pid namespace
https://git.kernel.org/netdev/net/c/1f24c0d01db2
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net] net: psample: fix info leak in PSAMPLE_ATTR_DATA
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, bestswngs,
yotam.gi, jhs, jiri
In-Reply-To: <20260616003046.1099490-1-kuba@kernel.org>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 17:30:46 -0700 you wrote:
> psample open codes nla_put(), presumably to avoid wiping
> the data with 0s just to overwrite it with packet data.
> However, this open coding does not clear the pad: each
> netlink attr is padded to 4B, and data_len may not be
> divisible by 4.
>
> Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> [...]
Here is the summary with links:
- [net] net: psample: fix info leak in PSAMPLE_ATTR_DATA
https://git.kernel.org/netdev/net/c/aedd02af1f8b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net-next] net: pse-pd: set user byte command SUB2 field
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: Robert Marko
Cc: o.rempel, kory.maincent, andrew+netdev, davem, edumazet, kuba,
pabeni, netdev, linux-kernel, luka.perkov
In-Reply-To: <20260611102517.445549-1-robert.marko@sartura.hr>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 11 Jun 2026 12:24:49 +0200 you wrote:
> The Set User Byte to Save command has three subject bytes.
> The PD692x0 protocol guide defines SUB2 with value 0x4e, while SUB1
> carries the NVM user byte.
>
> The template only initialized SUB and SUB1.
> Fill SUB2 explicitly so the command matches the documented layout.
>
> [...]
Here is the summary with links:
- [net-next] net: pse-pd: set user byte command SUB2 field
https://git.kernel.org/netdev/net/c/e586644d0a89
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH] net: ehea: unwind probe_port sysfs file on failure
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: Pengpeng Hou
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, kees, netdev,
linux-kernel
In-Reply-To: <20260615070033.43461-1-pengpeng@iscas.ac.cn>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 15:00:31 +0800 you wrote:
> ehea_create_device_sysfs() creates probe_port and then remove_port. If
> the second device_create_file() fails, the helper returns the error but
> leaves probe_port installed even though probe treats the sysfs setup as
> failed.
>
> Remove probe_port on the remove_port creation failure path so the helper
> leaves no partial sysfs state behind.
>
> [...]
Here is the summary with links:
- net: ehea: unwind probe_port sysfs file on failure
https://git.kernel.org/netdev/net/c/1c4b39746c4b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net v2] sctp: hold socket lock when dumping endpoints in sctp_diag
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: Xin Long
Cc: netdev, linux-sctp, davem, kuba, edumazet, pabeni, horms,
marcelo.leitner, w, zdi-disclosures
In-Reply-To: <4c1b49ab87e0f7d552ebd8172b364b1994e913c9.1781552190.git.lucien.xin@gmail.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 15:36:30 -0400 you wrote:
> SCTP_DIAG endpoint dumping was traversing endpoint address lists without
> holding lock_sock(), while those lists could change concurrently via
> socket operations (e.g., bindx changes). This creates a race where
> nla_reserve() counts addresses under RCU protection, but the subsequent
> copy may see fewer entries, potentially leaking uninitialized memory to
> userspace.
>
> [...]
Here is the summary with links:
- [net,v2] sctp: hold socket lock when dumping endpoints in sctp_diag
https://git.kernel.org/netdev/net/c/7d8297e26b4e
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net] octeontx2-pf: Fix leak of SQ timestamp buffer on teardown
From: patchwork-bot+netdevbpf @ 2026-06-18 0:20 UTC
To: Ratheesh Kannoth
Cc: amakarov, davem, jesse.brandeburg, kuba, linux-kernel, netdev,
richardcochran, andrew+netdev, edumazet, pabeni, sgoutham
In-Reply-To: <20260615030704.504536-1-rkannoth@marvell.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 08:37:04 +0530 you wrote:
> The send-queue timestamp ring is allocated with qmem_alloc() when
> timestamping is used, but otx2_free_sq_res() never freed sq->timestamps,
> leaking that memory across ifdown and device removal. Add the missing
> qmem_free() alongside the other SQ companion buffers.
>
> Fixes: c9c12d339d93 ("octeontx2-pf: Add support for PTP clock")
> Cc: Aleksey Makarov <amakarov@marvell.com>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
>
> [...]
Here is the summary with links:
- [net] octeontx2-pf: Fix leak of SQ timestamp buffer on teardown
https://git.kernel.org/netdev/net/c/a056db30de92
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net v2 1/1] net: ipv4: bound TCP reordering sysctl writes and MTU probe sizes
From: patchwork-bot+netdevbpf @ 2026-06-18 0:21 UTC
To: Ren Wei
Cc: netdev, edumazet, kuniyu, david.laight.linux, ncardwell, pabeni,
chia-yu.chang, ij, yuuchihsu, idosch, fmancera, herbert,
yuantan098, zcliangcn, bird, bronzed_45_vested
In-Reply-To: <1a5b7e1ef4d70fbad8c8ee0b82d8405f3c964a3d.1781395200.git.bronzed_45_vested@icloud.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 18:31:18 +0800 you wrote:
> From: Wyatt Feng <bronzed_45_vested@icloud.com>
>
> Reject invalid `net.ipv4.tcp_reordering` values before they reach TCP
> socket state. The sysctl is stored as an `int` but copied into the
> `u32` `tp->reordering` field for new sockets, so negative writes wrap
> to large values.
>
> [...]
Here is the summary with links:
- [net,v2,1/1] net: ipv4: bound TCP reordering sysctl writes and MTU probe sizes
https://git.kernel.org/netdev/net/c/efb8763d7bbb
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH bpf v2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: Kuniyuki Iwashima @ 2026-06-18 0:25 UTC
To: rhkrqnwk98
Cc: bobbyeshleman, bpf, davem, edumazet, horms, jakub, john.fastabend,
kuba, linux-kernel, netdev, pabeni
In-Reply-To: <20260612123553.2724240-1-rhkrqnwk98@gmail.com>
From: Sechang Lim <rhkrqnwk98@gmail.com>
Date: Fri, 12 Jun 2026 12:35:51 +0000
> sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program
> to find the length of the next message. strparser assembles a message out
> of several received skbs by chaining them onto the head's frag_list and
> recording where to append the next one in strp->skb_nextp:
>
> *strp->skb_nextp = skb;
> strp->skb_nextp = &skb->next;
>
> and then calls the parser on the head:
>
> len = (*strp->cb.parse_msg)(strp, head);
>
> The parser is only meant to inspect the skb, but the program may call
> bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
> bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB.
It's the BPF program's responsibility not to abuse them.
Even setting that aside, why not simply block such BPF programs?
It cannot be done at load time, but it is doable at attach time.
---8<---
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 630d530782fe..4d60b77da8ef 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4556,6 +4556,12 @@ static int bpf_prog_attach(const union bpf_attr *attr)
switch (ptype) {
case BPF_PROG_TYPE_SK_SKB:
+ if (attr->attach_type == BPF_SK_SKB_STREAM_PARSER &&
+ prog->aux->changes_pkt_data) {
+ ret = -EINVAL;
+ goto out;
+ }
+ fallthrough;
case BPF_PROG_TYPE_SK_MSG:
ret = sock_map_get_from_fd(attr, prog);
break;
---8<---
> Once the head carries a frag_list these go
>
> ... -> skb_ensure_writable -> pskb_may_pull -> __pskb_pull_tail
>
> and __pskb_pull_tail() frees the frag_list skbs that strparser still
> tracks through skb_nextp:
>
> while ((list = skb_shinfo(skb)->frag_list) != insp) {
> skb_shinfo(skb)->frag_list = list->next;
> consume_skb(list);
> }
>
> strp->skb_nextp now points into a freed sk_buff. The next segment of
> the same message arrives in __strp_recv(), which links it with
> *strp->skb_nextp = skb, an 8-byte write into the freed skb. The free
> and the write happen in different __strp_recv() calls, so the message
> has to span at least three segments before it triggers.
>
> BUG: KASAN: slab-use-after-free in __strp_recv+0x447/0xda0
> Write of size 8 at addr ffff88810db86140 by task repro/349
>
> Call Trace:
> <IRQ>
> __strp_recv+0x447/0xda0
> __tcp_read_sock+0x13d/0x590
> tcp_bpf_strp_read_sock+0x195/0x320
> strp_data_ready+0x267/0x340
> sk_psock_strp_data_ready+0x1ce/0x350
> tcp_data_queue+0x1364/0x2fd0
> tcp_rcv_established+0xe07/0x1640
> [...]
>
> Allocated by task 349:
> skb_clone+0x17b/0x210
> __strp_recv+0x2c3/0xda0
> __tcp_read_sock+0x13d/0x590
> [...]
>
> Freed by task 349:
> kmem_cache_free+0x150/0x570
> __pskb_pull_tail+0x57b/0xc20
> skb_ensure_writable+0x236/0x260
> __bpf_skb_change_tail+0x1d4/0x590
> sk_skb_change_tail+0x2a/0x40
> bpf_prog_1b285dcd6c41373e+0x27/0x30
> bpf_prog_run_pin_on_cpu+0xf3/0x260
> sk_psock_strp_parse+0x118/0x1e0
> __strp_recv+0x4f6/0xda0
> [...]
>
> The same resize also leaves the head's length inconsistent with its
> frags, so a later __pskb_pull_tail() can instead hit the
> BUG_ON(skb_copy_bits(...)) in net/core/skbuff.c.
>
> Run the parser on a private clone of the head when the message spans more
> than one skb and the program can modify the packet
> (prog->aux->changes_pkt_data), so a resizing helper can only touch the
> clone and strparser's head and skb_nextp stay valid. Single-skb messages
> have no frag_list and read-only parsers cannot resize, so both are still
> parsed in place. If the clone cannot be allocated, return 0 so the caller
> retries on the next read rather than failing the parser.
>
> Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
> Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
> ---
> v2:
> - clone only when prog->aux->changes_pkt_data (Bobby Eshleman)
> - return 0 on clone failure instead of -ENOMEM (Bobby Eshleman)
> - free the clone with consume_skb() instead of kfree_skb()
> - drop the unrelated guard(rcu)() change (Bobby Eshleman)
>
> v1:
> - https://lore.kernel.org/all/20260609112316.3685738-1-rhkrqnwk98@gmail.com/
>
> net/core/skmsg.c | 26 +++++++++++++++++++++++---
> 1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index e1850caf1a71..97e5bc5f38c3 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1149,9 +1149,29 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
> rcu_read_lock();
> prog = READ_ONCE(psock->progs.stream_parser);
> if (likely(prog)) {
> - skb->sk = psock->sk;
> - ret = bpf_prog_run_pin_on_cpu(prog, skb);
> - skb->sk = NULL;
> + struct sk_buff *parse_skb = skb;
> +
> + /*
> + * strparser chains the message skbs through skb->frag_list and
> + * keeps a pointer into that list in strp->skb_nextp. The parser
> + * program may call bpf_skb_change_tail() and friends, which go
> + * through __pskb_pull_tail() and free the frag_list skbs that
> + * strparser still tracks. Run the program on a clone when the head
> + * has a frag_list and the program can modify the packet, so it
> + * cannot drop frags strparser owns.
> + */
> + if (skb_has_frag_list(skb) && prog->aux->changes_pkt_data) {
> + parse_skb = skb_clone(skb, GFP_ATOMIC);
> + if (!parse_skb) {
> + rcu_read_unlock();
> + return 0;
> + }
> + }
> + parse_skb->sk = psock->sk;
> + ret = bpf_prog_run_pin_on_cpu(prog, parse_skb);
> + parse_skb->sk = NULL;
> + if (parse_skb != skb)
> + consume_skb(parse_skb);
> }
> rcu_read_unlock();
> return ret;
> --
> 2.43.0
>
* Re: [PATCH net-next 0/2] appletalk: move the protocol out of tree
From: Finn Thain @ 2026-06-18 0:55 UTC
To: Carsten Strotmann
Cc: Jakub Kicinski, Carsten Strotmann, John Paul Adrian Glaubitz,
davem, netdev, edumazet, pabeni, andrew+netdev, horms, geert,
chleroy, npiggin, mpe, maddy, linux-mips, linux-m68k,
linuxppc-dev
In-Reply-To: <1781694488854.956546368.818588236@strotmann.de>
On Wed, 17 Jun 2026, Carsten Strotmann wrote:
> > _Someone_ has to handle the reports and patches. And since nobody is
> > doing that the code is going to GitHub, where it can continue to "just
> > be left" or whatever, without racking up CVEs for the Linux kernel and
> > leading to maintainer burn out :/
> >
>
> That's a good point. The large influx of reports is a problem, and burn
> out of maintainers is a too high cost.
>
Carsten, if, as a maintainer, you want to avoid burnout then
1) don't promise what you can't deliver (that is, decline sponsorship)
2) delegate (that is, leverage AI as an ally not as a lame excuse)
So the question remains: what is it which _can_ be delivered by and for
the "community" (by which I mean, that group of people which includes
actual end users -- not merely paying customers and sponsored developers).
This question has precious little to do with burnout, but it's the
question we need to address.
* Re: [RFC PATCH 1/2] landlock: fix TCP Fast Open connection bypass
From: Bryam Vargas @ 2026-06-18 1:25 UTC
To: Matthieu Buffet
Cc: Mickaël Salaün, Günther Noack, Mikhail Ivanov,
Paul Moore, Eric Dumazet, Neal Cardwell, linux-security-module,
netdev, linux-kernel
In-Reply-To: <20260617180526.15627-2-matthieu@buffet.re>
Thanks Matthieu, your #41, so no competing patch from me. I built your v0
(Landlock + MPTCP) and ran an A/B: without it, a confined task with CONNECT_TCP
denied still reaches the port via sendto(MSG_FASTOPEN); with it, that path is now
denied too, on IPv4 and IPv6.
Tested-by: Bryam Vargas <hexlabsecurity@proton.me>
One scope note, since you mention MPTCP: an MPTCP socket isn't covered.
sk_is_tcp() is false for the mptcp parent (sk_protocol is IPPROTO_MPTCP), so
neither the new sendmsg hook nor the existing socket_connect one mediates it. On
the patched kernel my MPTCP arm still reaches the blocked port via both connect()
and MSG_FASTOPEN. If MPTCP is meant to be in scope for CONNECT_TCP, the guard
wants `|| sk->sk_protocol == IPPROTO_MPTCP` (not sk_is_mptcp(), which is the
subflow flag).
Bryam
* [PATCH net-next v9 00/10] enic: SR-IOV V2 admin channel and MBOX protocol
From: Satish Kharat @ 2026-06-18 1:53 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel, Sesidhar Baddela, Satish Kharat,
Breno Leitao
This series adds the admin channel infrastructure and mailbox (MBOX)
protocol needed for V2 SR-IOV support in the enic driver.
The V2 SR-IOV design uses a direct PF-VF communication channel built on
dedicated WQ/RQ/CQ hardware resources and an MSI-X interrupt.
Firmware capability and admin channel infrastructure (patches 1-4):
- Probe-time firmware feature check for V2 SR-IOV support
- Admin channel open/close, RQ buffer management, CQ service
with MSI-X interrupt and workqueue-based polling
MBOX protocol and VF enable (patches 5-10):
- MBOX message types, core send/receive, PF and VF handlers
- V2 SR-IOV enable wiring with admin channel setup
- V2 VF probe with admin channel and PF registration
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
Changes in v9:
- Use dma_rmb() instead of rmb() when reading admin RQ completion
descriptors written by DMA (patch 4) [Sashiko]
- Use GFP_KERNEL instead of GFP_ATOMIC for admin RQ refill and for
received-message allocation; both run in workqueue (process)
context after the v8 NAPI-to-workqueue switch (patch 4) [Sashiko]
- Correct the enic_admin_msg comment to describe the workqueue
enqueue path rather than NAPI (patch 4) [Sashiko]
- Set mbox_send_disabled in enic_admin_channel_close() so an MBOX
send cannot race with channel teardown (patch 6) [Sashiko]
- Send the actual PF carrier state to a VF on registration instead
of unconditionally reporting link up (patch 7) [Sashiko]
- Call reinit_completion() before setting mbox_expected_reply so a
reply arriving between the two is not missed (patch 8) [Sashiko]
- Defer PF->VF link state notification to a workqueue and gate it on
carrier transitions; enic_link_check() runs in the notify (atomic)
context while the MBOX send sleeps on a mutex/completion (patch 9)
[Sashiko]
- Clear ENIC_SRIOV_ENABLED and cancel the link-notify work before
freeing per-VF state in the SR-IOV disable path, closing a
use-after-free window against a concurrent link notification
(patch 9) [Sashiko]
- Link to v8: https://patch.msgid.link/20260609-enic-sriov-v2-admin-channel-v2-v8-0-8ad8babbb826@cisco.com
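The v9 link-notification rework (record the carrier transition in atomic context, send the MBOX message from a work item) can be pictured with the single-threaded model below. It is a sketch under assumptions, not the enic code: struct link_notify_model and the model_* functions are hypothetical, work_pending stands in for schedule_work(), and the send counter stands in for enic_mbox_send_link_state().

```c
#include <stdbool.h>

/* Hypothetical model; a real driver needs proper synchronization
 * between the atomic check and the work item. */
struct link_notify_model {
	bool last_carrier;  /* carrier state last seen by the link check */
	bool work_pending;  /* stands in for a queued work_struct */
	bool sent_state;    /* last state "sent" to VFs by the work item */
	int sends;          /* number of notifications "sent" */
};

/* Atomic (notify) context: must not sleep, so only record and defer. */
static void model_link_check(struct link_notify_model *m, bool carrier)
{
	if (carrier == m->last_carrier)
		return;             /* no transition: nothing to tell VFs */
	m->last_carrier = carrier;
	m->work_pending = true;     /* schedule_work() in a real driver */
}

/* Work item (process context): free to take a mutex or wait. */
static void model_link_work(struct link_notify_model *m)
{
	if (!m->work_pending)
		return;
	m->work_pending = false;
	m->sent_state = m->last_carrier; /* the MBOX send would go here */
	m->sends++;
}
```

One consequence of this shape: if the carrier flaps several times before the work runs, only the latest state is sent.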
Changes in v8:
- Replace NAPI polling with workqueue for admin CQ service — admin
channel is low-frequency control traffic, not data path (patch 4)
[Jakub Kicinski]
- Use explicit enum value (= 4) for VIC_FEATURE_SRIOV instead of
placeholder VIC_FEATURE_PTP entry (patch 1) [Breno Leitao]
- Remove unnecessary rmb() in WQ CQ service (patch 4) [Jakub Kicinski]
- Remove admin_msg_drop_cnt counter (patch 4) [Simon Horman]
- Drop NAPI reschedule on RQ refill failure — the NAPI-to-workqueue
switch removes the livelock and budget issues (patch 4) [Simon Horman]
- Remove unnecessary READ_ONCE/WRITE_ONCE on admin_rq_handler — all
access is serialized by probe/remove (patch 6) [Jakub Kicinski]
- Fix checkpatch line-length warnings (patches 3, 5, 6)
- Rate-limit link state send failure and ACK error warnings (patch 7)
[Jakub Kicinski]
- Correct enic_link_check comment to describe actual PF link state
notification flow (patch 7) [Simon Horman]
- Correct mbox_expected_reply comment — serialization is by
RTNL/probe, not mbox_lock (patch 8) [Jakub Kicinski]
- Wire enic_mbox_send_link_state() from enic_link_check() so PF
notifies VFs on carrier change (patch 9) [Simon Horman]
- Fix commit message wording about MSI-X reservation (patch 10)
[Simon Horman]
- Link to v7: https://patch.msgid.link/20260513-enic-sriov-v2-admin-channel-v2-v7-0-68b9f4141f4c@cisco.com
Changes in v7:
- Replace magic numbers in admin channel init with named macros
and inline comments for MBOX descriptor encoding
(patches 2, 6) [Paolo Abeni]
- Add defense-in-depth bounds check on admin RQ bytes_written (patch 4)
- Force NAPI reschedule on admin RQ refill failure (patch 4)
- Always unmask admin interrupt even with zero credits (patch 4)
- Reorder NAPI init before request_irq in admin channel open (patch 4)
- Remove redundant netdev_warn on admin msg enqueue kmalloc failure
(patch 4) [Paolo Abeni]
- Add netdev_warn on admin WQ/RQ disable failure in close path
(patch 2)
- Remove incorrect RES_TYPE_SRIOV_INTR interrupt allocation from
admin channel open (patch 2); interrupt setup handled entirely
in patch 4 using RES_TYPE_INTR_CTRL
- Rate-limit VF register/unregister log messages (patch 7) [Paolo Abeni]
- Add __aligned(8) to admin message data[] for strict-alignment
safety (patch 4)
- Rate-limit MBOX handler error warnings (patch 7)
- Pre-allocate port profile array before pci_disable_sriov in V1
disable path to avoid half-torn-down state on alloc failure (patch 9)
- Account for admin channel interrupt reservation in
enic_set_intr_mode() and enic_adjust_resources() (patch 9) [Paolo Abeni]
- Clear admin_rq_handler in enic_admin_channel_close (patch 9)
- Quiesce admin channel (mask interrupt, disable NAPI, block MBOX
sends) around soft reset (patch 9)
- Use WRITE_ONCE/READ_ONCE for mbox_send_disabled and
admin_rq_handler across data-path/reset boundaries
(patches 4, 6, 9)
- Fix commit message: reference enic_adjust_resources() alongside
enic_set_intr_mode() (patch 10)
Investigated findings from automated review (Simon Horman / Sashiko):
- Race between probe-time feature check and VF proxy: false positive;
detection runs at probe, enable runs from sriov_configure
- Struct alignment of __le32 after 2-byte mbox_hdr_embed: compiler
inserts correct padding, no manual alignment needed
- Stale MBOX reply matching / reinit_completion race: single-flight
design with mutex serialization prevents this
- cancel_work_sync vs MBOX unregister race: work cannot be
re-triggered during the close window
- Link to v6: https://patch.msgid.link/20260503-enic-sriov-v2-admin-channel-v2-v6-0-0af4fbc2d86d@cisco.com
Changes in v6:
- Add explanatory comments documenting admin_cq[0] (WQ CQE size) and
admin_cq[1] (RQ CQE size matching firmware enic_ext_cq() programming)
allocations (patch 2)
- Enforce bytes_written from CQ descriptor when enqueuing admin RQ
message; previously buf->len (allocation size) was passed, exposing
uninitialized buffer memory beyond the real payload (patch 4)
- Drop admin RQ messages with TRUNCATED set or FCS_OK clear, gated by
netdev_warn_once() (patch 4)
- Disable interrupt_enable on admin_cq[0]: WQ completions are polled
synchronously inside enic_mbox_send_msg() and never raise an
interrupt; matches admin_cq[1] (RQ) which does NAPI polling (patch 4)
- Add mbox_expected_reply gating in VF reply handlers (capability,
register, unregister): drop replies whose type does not match the
current waiter's expected type, avoiding spurious wakeup of an
unrelated waiter from a stale reply that arrives after timeout
(patch 8)
- Distinguish error returns in enic_mbox_vf_unregister(): -ETIMEDOUT
(no reply received), -EACCES (PF rejected the unregister), 0 on
success. Previously all paths collapsed to a single -ETIMEDOUT
(patch 8)
- Reserve one extra MSI-X slot in enic_set_intr_mode() when
has_admin_channel is set so enic_admin_setup_intr() always has room
to allocate at intr_count without exceeding intr_avail bounds when
data queue count is maxed out (patch 10)
- Clarify in commit messages that .sriov_configure is intentionally
not yet wired in this series and will be added in a follow-up after
the necessary devcmd hardening lands (patch 9)
- Link to v5: https://patch.msgid.link/20260423-enic-sriov-v2-admin-channel-v2-v5-0-caa9f504a3dc@cisco.com
Changes in v5:
- Fix DMA-into-freed-memory race: call enic_admin_qp_type_set() before
disabling RQ/WQ in both error and close paths (patch 3)
- Fix DMA mapping leak: enic_admin_wq_buf_clean() now unmaps and frees
WQ buffers still held at close time after a send timeout (patch 3)
- Log rate-limited warning on admin RQ refill failure (patch 4)
- Add missing linux/types.h and linux/bits.h includes to enic_mbox.h
(patch 5)
- Guard mbox_lock/mbox_comp init with mbox_initialized flag to prevent
re-initialization on sriov_configure re-entry (patch 7)
- Clear VF registered state before sending unregister reply so PF does
not treat a dead VF as still registered (patch 8)
- Gate VF-facing log messages with net_ratelimit() to prevent malicious
VF from flooding PF dmesg (patch 8)
- Reject VF port profile requests when V2 SR-IOV is active since
enic->pp is not reallocated for V2 VFs (patch 9)
- Move enic_sriov_detect_vf_type() before auto-enable check; skip
probe-time auto-enable for V2 VFs (patch 9)
- Move admin channel close and VF unregister before unregister_netdev()
in enic_remove() to prevent use-after-free on netdev (patch 10)
- Add comment in enic_reset() documenting that admin channel is not
recovered after soft reset (patch 10)
- Bypass RES_TYPE_SRIOV_INTR check for V2 VFs in admin channel
capability detection (patch 10)
- Link to v4: https://patch.msgid.link/20260411-enic-sriov-v2-admin-channel-v2-v4-0-f052326c2a57@cisco.com
Changes in v4:
- Fix reverse xmas tree variable ordering (patches 1, 6)
- Use kzalloc_obj instead of kzalloc with sizeof (patch 9)
- Add NULL check for pp allocation in V1 SR-IOV disable path (patch 9)
- Link to v3: https://lore.kernel.org/r/20260408-enic-sriov-v2-admin-channel-v2-v3-0-1d4999a03cec@cisco.com
Changes in v3:
- Use early-return pattern in enic_sriov_detect_vf_type to reduce
nesting (patch 1) [Breno Leitao]
- Link to v2: https://lore.kernel.org/r/20260408-enic-sriov-v2-admin-channel-v2-v2-0-d05dd3623fd3@cisco.com
Changes in v2:
- Fix lines exceeding 80 columns (patches 4, 6, 7, 8)
- Add __maybe_unused to enic_sriov_configure and enic_sriov_v2_enable;
.sriov_configure wiring deferred to a later series after devcmd
hardening is in place (patch 9)
- Guard probe-time auto-enable to skip V2 VFs (patch 9)
- Link to v1: https://lore.kernel.org/r/20260406-enic-sriov-v2-admin-channel-v2-v1-0-82cc47636a78@cisco.com
---
Satish Kharat (10):
enic: verify firmware supports V2 SR-IOV at probe time
enic: add admin channel open and close for SR-IOV
enic: add admin RQ buffer management
enic: add admin CQ service with MSI-X interrupt and workqueue polling
enic: define MBOX message types and header structures
enic: add MBOX core send and receive for admin channel
enic: add MBOX PF handlers for VF register and capability
enic: add MBOX VF handlers for capability, register and link state
enic: wire V2 SR-IOV enable with admin channel and MBOX
enic: add V2 VF probe with admin channel and PF registration
drivers/net/ethernet/cisco/enic/Makefile | 3 +-
drivers/net/ethernet/cisco/enic/enic.h | 34 +-
drivers/net/ethernet/cisco/enic/enic_admin.c | 586 ++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_admin.h | 27 ++
drivers/net/ethernet/cisco/enic/enic_main.c | 349 +++++++++++++-
drivers/net/ethernet/cisco/enic/enic_mbox.c | 630 ++++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_mbox.h | 95 ++++
drivers/net/ethernet/cisco/enic/enic_pp.c | 5 +
drivers/net/ethernet/cisco/enic/enic_res.c | 4 +-
drivers/net/ethernet/cisco/enic/vnic_cq.h | 9 +
drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 13 +
drivers/net/ethernet/cisco/enic/vnic_enet.h | 4 +-
12 files changed, 1739 insertions(+), 20 deletions(-)
---
base-commit: 2319688890d97c63da423a3c57c23b4ab5952dfc
change-id: 20260404-enic-sriov-v2-admin-channel-v2-c0aa3e988833
Best regards,
--
Satish Kharat <satishkh@cisco.com>
* [PATCH net-next v9 03/10] enic: add admin RQ buffer management
From: Satish Kharat @ 2026-06-18 1:53 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel, Sesidhar Baddela, Satish Kharat
In-Reply-To: <20260617-enic-sriov-v2-admin-channel-v2-v9-0-37f5f5af4c93@cisco.com>
The admin receive queue needs pre-posted DMA buffers for incoming
mailbox messages from VFs. Each buffer is a kmalloc'd region mapped
for DMA (2048 bytes, sufficient for any MBOX message).
Add enic_admin_rq_fill(gfp) to post buffers at open time, and
enic_admin_rq_drain() to unmap and free them at close time.
Wire both into the admin channel open/close paths. The gfp_t
parameter lets the caller pass the allocation context; both current
callers -- channel open and the CQ-poll work handler that refills
after draining (added in the next patch) -- run in process context
and use GFP_KERNEL.
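The fill/drain pairing described above follows a common RX-ring pattern. The sketch below models it in userspace with malloc() in place of kmalloc() + dma_map_single(); struct rx_ring_model and the model_* helpers are hypothetical, and the ring is simplified to a count of posted slots.

```c
#include <errno.h>
#include <stdlib.h>

#define MODEL_RING_SIZE 4
#define MODEL_BUF_SIZE  2048 /* mirrors ENIC_ADMIN_BUF_SIZE */

/* Hypothetical RX ring: a "posted" slot just holds a heap buffer. */
struct rx_ring_model {
	void *bufs[MODEL_RING_SIZE];
	unsigned int posted;
};

static unsigned int model_desc_avail(const struct rx_ring_model *r)
{
	return MODEL_RING_SIZE - r->posted;
}

static int model_post_one(struct rx_ring_model *r)
{
	void *buf = malloc(MODEL_BUF_SIZE);

	if (!buf)
		return -ENOMEM;
	r->bufs[r->posted++] = buf;
	return 0;
}

/* Same shape as enic_admin_rq_fill(): post until the ring is full and
 * stop at the first failure, leaving already-posted buffers in place. */
static int model_fill(struct rx_ring_model *r)
{
	int err;

	while (model_desc_avail(r) > 0) {
		err = model_post_one(r);
		if (err)
			return err;
	}
	return 0;
}

/* Counterpart of enic_admin_rq_drain(): release every posted buffer. */
static void model_drain(struct rx_ring_model *r)
{
	while (r->posted > 0) {
		r->posted--;
		free(r->bufs[r->posted]);
		r->bufs[r->posted] = NULL;
	}
}
```

Because fill does not undo earlier posts on failure, the caller has to drain on its error path; the open path in this patch does that via enic_admin_rq_drain() before freeing the queue resources.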
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic_admin.c | 66 +++++++++++++++++++++++++++-
1 file changed, 64 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
index aa21868a9209..b28fc6c656cc 100644
--- a/drivers/net/ethernet/cisco/enic/enic_admin.c
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
@@ -3,6 +3,7 @@
#include <linux/kernel.h>
#include <linux/netdevice.h>
+#include <linux/dma-mapping.h>
#include "vnic_dev.h"
#include "vnic_wq.h"
@@ -34,10 +35,63 @@ static void enic_admin_wq_buf_clean(struct vnic_wq *wq,
}
}
-/* No-op: admin RQ buffer teardown is handled in enic_admin_channel_close */
static void enic_admin_rq_buf_clean(struct vnic_rq *rq,
struct vnic_rq_buf *buf)
{
+ struct enic *enic = vnic_dev_priv(rq->vdev);
+
+ if (!buf->os_buf)
+ return;
+
+ dma_unmap_single(&enic->pdev->dev, buf->dma_addr, buf->len,
+ DMA_FROM_DEVICE);
+ kfree(buf->os_buf);
+ buf->os_buf = NULL;
+}
+
+static int enic_admin_rq_post_one(struct enic *enic, gfp_t gfp)
+{
+ struct vnic_rq *rq = &enic->admin_rq;
+ struct rq_enet_desc *desc;
+ dma_addr_t dma_addr;
+ void *buf;
+
+ buf = kmalloc(ENIC_ADMIN_BUF_SIZE, gfp);
+ if (!buf)
+ return -ENOMEM;
+
+ dma_addr = dma_map_single(&enic->pdev->dev, buf, ENIC_ADMIN_BUF_SIZE,
+ DMA_FROM_DEVICE);
+ if (dma_mapping_error(&enic->pdev->dev, dma_addr)) {
+ kfree(buf);
+ return -ENOMEM;
+ }
+
+ desc = vnic_rq_next_desc(rq);
+ rq_enet_desc_enc(desc, (u64)dma_addr | VNIC_PADDR_TARGET,
+ RQ_ENET_TYPE_ONLY_SOP, ENIC_ADMIN_BUF_SIZE);
+ vnic_rq_post(rq, buf, 0, dma_addr, ENIC_ADMIN_BUF_SIZE, 0);
+
+ return 0;
+}
+
+static int enic_admin_rq_fill(struct enic *enic, gfp_t gfp)
+{
+ struct vnic_rq *rq = &enic->admin_rq;
+ int err;
+
+ while (vnic_rq_desc_avail(rq) > 0) {
+ err = enic_admin_rq_post_one(enic, gfp);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static void enic_admin_rq_drain(struct enic *enic)
+{
+ vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
}
static int enic_admin_qp_type_set(struct enic *enic, u32 enable)
@@ -171,6 +225,13 @@ int enic_admin_channel_open(struct enic *enic)
vnic_wq_enable(&enic->admin_wq);
vnic_rq_enable(&enic->admin_rq);
+ err = enic_admin_rq_fill(enic, GFP_KERNEL);
+ if (err) {
+ netdev_err(enic->netdev,
+ "Failed to fill admin RQ buffers: %d\n", err);
+ goto disable_queues;
+ }
+
err = enic_admin_qp_type_set(enic, QP_ENABLE);
if (err) {
netdev_err(enic->netdev,
@@ -186,6 +247,7 @@ int enic_admin_channel_open(struct enic *enic)
netdev_warn(enic->netdev, "Failed to disable admin WQ\n");
if (vnic_rq_disable(&enic->admin_rq))
netdev_warn(enic->netdev, "Failed to disable admin RQ\n");
+ enic_admin_rq_drain(enic);
enic_admin_free_resources(enic);
return err;
}
@@ -209,7 +271,7 @@ void enic_admin_channel_close(struct enic *enic)
"Failed to disable admin RQ: %d\n", err);
vnic_wq_clean(&enic->admin_wq, enic_admin_wq_buf_clean);
- vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
+ enic_admin_rq_drain(enic);
vnic_cq_clean(&enic->admin_cq[0]);
vnic_cq_clean(&enic->admin_cq[1]);
enic_admin_free_resources(enic);
--
2.43.0
* [PATCH net-next v9 02/10] enic: add admin channel open and close for SR-IOV
From: Satish Kharat @ 2026-06-18 1:53 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel, Sesidhar Baddela, Satish Kharat
In-Reply-To: <20260617-enic-sriov-v2-admin-channel-v2-v9-0-37f5f5af4c93@cisco.com>
The V2 SR-IOV design uses a dedicated admin channel (WQ/RQ/CQ/INTR
on separate BAR resources) for PF-VF mailbox communication rather
than firmware-proxied devcmds.
Introduce enic_admin_channel_open() and enic_admin_channel_close().
Open allocates and initialises the admin WQ, RQ, and two CQs (one per
direction), then issues CMD_QP_TYPE_SET to tell firmware the queues are
admin-type. Close reverses the sequence.
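The allocate-in-order / unwind-in-reverse structure of open and close can be shown with a small userspace model of the goto ladder used in enic_admin_alloc_resources(). All names here (struct admin_model, MODEL_RES_*, model_*) are hypothetical, and fail_at exists only to inject a failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical steps standing in for the admin WQ, RQ and two CQs. */
enum { MODEL_RES_WQ, MODEL_RES_RQ, MODEL_RES_CQ0, MODEL_RES_CQ1,
       MODEL_RES_COUNT };

struct admin_model {
	bool held[MODEL_RES_COUNT];
	int fail_at;  /* step index to fail, or -1 for none */
};

static int model_alloc(struct admin_model *a, int res)
{
	if (res == a->fail_at)
		return -ENOMEM;
	a->held[res] = true;
	return 0;
}

static void model_free(struct admin_model *a, int res)
{
	a->held[res] = false;
}

/* Allocate in order; on failure release only what was already taken,
 * newest first, by falling through the labels below. */
static int model_alloc_all(struct admin_model *a)
{
	int err;

	err = model_alloc(a, MODEL_RES_WQ);
	if (err)
		return err;
	err = model_alloc(a, MODEL_RES_RQ);
	if (err)
		goto free_wq;
	err = model_alloc(a, MODEL_RES_CQ0);
	if (err)
		goto free_rq;
	err = model_alloc(a, MODEL_RES_CQ1);
	if (err)
		goto free_cq0;
	return 0;

free_cq0:
	model_free(a, MODEL_RES_CQ0);
free_rq:
	model_free(a, MODEL_RES_RQ);
free_wq:
	model_free(a, MODEL_RES_WQ);
	return err;
}

/* Close path: release everything in reverse allocation order. */
static void model_free_all(struct admin_model *a)
{
	for (int res = MODEL_RES_COUNT - 1; res >= 0; res--)
		model_free(a, res);
}
```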
enic_admin_wq_buf_clean() unmaps and frees any WQ buffers still held
at close time, so a send that times out does not leak its DMA mapping.
Add CMD_QP_TYPE_SET (97), QP_TYPE_ADMIN/DATA, and QP_ENABLE/QP_DISABLE
defines to vnic_devcmd.h. Add VNIC_CQ_* named constants to vnic_cq.h
so CQ initialisation parameters are self-documenting from their first
introduction.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/Makefile | 3 +-
drivers/net/ethernet/cisco/enic/enic_admin.c | 216 ++++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_admin.h | 15 ++
drivers/net/ethernet/cisco/enic/vnic_cq.h | 9 ++
drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 11 ++
5 files changed, 253 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cisco/enic/Makefile b/drivers/net/ethernet/cisco/enic/Makefile
index a96b8332e6e2..7ae72fefc99a 100644
--- a/drivers/net/ethernet/cisco/enic/Makefile
+++ b/drivers/net/ethernet/cisco/enic/Makefile
@@ -3,5 +3,6 @@ obj-$(CONFIG_ENIC) := enic.o
enic-y := enic_main.o vnic_cq.o vnic_intr.o vnic_wq.o \
enic_res.o enic_dev.o enic_pp.o vnic_dev.o vnic_rq.o vnic_vic.o \
- enic_ethtool.o enic_api.o enic_clsf.o enic_rq.o enic_wq.o
+ enic_ethtool.o enic_api.o enic_clsf.o enic_rq.o enic_wq.o \
+ enic_admin.o
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
new file mode 100644
index 000000000000..aa21868a9209
--- /dev/null
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2025 Cisco Systems, Inc. All rights reserved.
+
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+
+#include "vnic_dev.h"
+#include "vnic_wq.h"
+#include "vnic_rq.h"
+#include "vnic_cq.h"
+#include "vnic_intr.h"
+#include "vnic_resource.h"
+#include "vnic_devcmd.h"
+#include "enic.h"
+#include "enic_admin.h"
+#include "cq_desc.h"
+#include "wq_enet_desc.h"
+#include "rq_enet_desc.h"
+
+/* Clean up any admin WQ buffers still held by hardware at close time.
+ * Normally buffers are freed inline after send completion, but a timed-out
+ * send intentionally leaves the buffer live until the queue is stopped.
+ */
+static void enic_admin_wq_buf_clean(struct vnic_wq *wq,
+ struct vnic_wq_buf *buf)
+{
+ struct enic *enic = vnic_dev_priv(wq->vdev);
+
+ if (buf->os_buf) {
+ dma_unmap_single(&enic->pdev->dev, buf->dma_addr,
+ buf->len, DMA_TO_DEVICE);
+ kfree(buf->os_buf);
+ buf->os_buf = NULL;
+ }
+}
+
+/* No-op: admin RQ buffer teardown is handled in enic_admin_channel_close */
+static void enic_admin_rq_buf_clean(struct vnic_rq *rq,
+ struct vnic_rq_buf *buf)
+{
+}
+
+static int enic_admin_qp_type_set(struct enic *enic, u32 enable)
+{
+ u64 a0 = QP_TYPE_ADMIN, a1 = enable;
+ int wait = 1000;
+ int err;
+
+ spin_lock_bh(&enic->devcmd_lock);
+ err = vnic_dev_cmd(enic->vdev, CMD_QP_TYPE_SET, &a0, &a1, wait);
+ spin_unlock_bh(&enic->devcmd_lock);
+
+ return err;
+}
+
+static int enic_admin_alloc_resources(struct enic *enic)
+{
+ int err;
+
+ err = vnic_wq_alloc_with_type(enic->vdev, &enic->admin_wq, 0,
+ ENIC_ADMIN_DESC_COUNT,
+ sizeof(struct wq_enet_desc),
+ RES_TYPE_ADMIN_WQ);
+ if (err)
+ return err;
+
+ err = vnic_rq_alloc_with_type(enic->vdev, &enic->admin_rq, 0,
+ ENIC_ADMIN_DESC_COUNT,
+ sizeof(struct rq_enet_desc),
+ RES_TYPE_ADMIN_RQ);
+ if (err)
+ goto free_wq;
+
+ /* admin_cq[0] is the WQ completion queue. WQ CQEs are always
+ * 16 bytes wide; firmware always writes 16-byte CQEs for WQ
+ * completions on every WQ, including the admin channel WQ.
+ * Use sizeof(struct cq_desc) accordingly.
+ */
+ err = vnic_cq_alloc_with_type(enic->vdev, &enic->admin_cq[0], 0,
+ ENIC_ADMIN_DESC_COUNT,
+ sizeof(struct cq_desc),
+ RES_TYPE_ADMIN_CQ);
+ if (err)
+ goto free_rq;
+
+ /* admin_cq[1] is the RQ completion queue. Its descriptor size
+ * must match what firmware writes. enic_ext_cq() called earlier
+ * in probe issues CMD_CQ_ENTRY_SIZE_SET for VNIC_RQ_ALL,
+ * programming firmware to write CQ entries of (16 << enic->ext_cq)
+ * bytes for every RQ CQ on the vNIC, including the admin RQ CQ.
+ * Allocating with the same size keeps the host poller and
+ * firmware in lockstep:
+ *
+ * - The color/valid bit lives at byte (desc_size - 1) of every
+ * cq_enet_rq_desc[_32|_64] variant, so enic_admin_cq_color()
+ * reads it from the correct offset.
+ * - Only the first 15 bytes of the descriptor (vlan,
+ * bytes_written_flags, ...) are accessed by the admin path;
+ * these fields are identical across all three variants (see
+ * comment in enic_rq.c above cq_enet_rq_desc_dec()).
+ */
+ err = vnic_cq_alloc_with_type(enic->vdev, &enic->admin_cq[1], 1,
+ ENIC_ADMIN_DESC_COUNT,
+ 16 << enic->ext_cq,
+ RES_TYPE_ADMIN_CQ);
+ if (err)
+ goto free_cq0;
+
+ return 0;
+
+free_cq0:
+ vnic_cq_free(&enic->admin_cq[0]);
+free_rq:
+ vnic_rq_free(&enic->admin_rq);
+free_wq:
+ vnic_wq_free(&enic->admin_wq);
+ return err;
+}
+
+static void enic_admin_free_resources(struct enic *enic)
+{
+ vnic_cq_free(&enic->admin_cq[1]);
+ vnic_cq_free(&enic->admin_cq[0]);
+ vnic_rq_free(&enic->admin_rq);
+ vnic_wq_free(&enic->admin_wq);
+}
+
+static void enic_admin_init_resources(struct enic *enic)
+{
+ vnic_wq_init(&enic->admin_wq,
+ 0, 0, 0); /* cq_index, err_intr_enable, err_intr_offset */
+ vnic_rq_init(&enic->admin_rq,
+ 1, 0, 0); /* cq_index, err_intr_enable, err_intr_offset */
+ vnic_cq_init(&enic->admin_cq[0],
+ VNIC_CQ_FC_DISABLE,
+ VNIC_CQ_COLOR_ENABLE,
+ 0, 0, 1, /* cq_head, cq_tail, cq_tail_color */
+ VNIC_CQ_INTR_DISABLE,
+ VNIC_CQ_ENTRY_ENABLE,
+ VNIC_CQ_MSG_DISABLE,
+ 0, /* interrupt_offset */
+ 0 /* cq_message_addr */);
+ vnic_cq_init(&enic->admin_cq[1],
+ VNIC_CQ_FC_DISABLE,
+ VNIC_CQ_COLOR_ENABLE,
+ 0, 0, 1, /* cq_head, cq_tail, cq_tail_color */
+ VNIC_CQ_INTR_DISABLE,
+ VNIC_CQ_ENTRY_ENABLE,
+ VNIC_CQ_MSG_DISABLE,
+ 0, /* interrupt_offset */
+ 0 /* cq_message_addr */);
+}
+
+int enic_admin_channel_open(struct enic *enic)
+{
+ int err;
+
+ if (!enic->has_admin_channel)
+ return -ENODEV;
+
+ err = enic_admin_alloc_resources(enic);
+ if (err) {
+ netdev_err(enic->netdev,
+ "Failed to alloc admin channel resources: %d\n",
+ err);
+ return err;
+ }
+
+ enic_admin_init_resources(enic);
+
+ vnic_wq_enable(&enic->admin_wq);
+ vnic_rq_enable(&enic->admin_rq);
+
+ err = enic_admin_qp_type_set(enic, QP_ENABLE);
+ if (err) {
+ netdev_err(enic->netdev,
+ "Failed to set admin QP type: %d\n", err);
+ goto disable_queues;
+ }
+
+ return 0;
+
+disable_queues:
+ enic_admin_qp_type_set(enic, QP_DISABLE);
+ if (vnic_wq_disable(&enic->admin_wq))
+ netdev_warn(enic->netdev, "Failed to disable admin WQ\n");
+ if (vnic_rq_disable(&enic->admin_rq))
+ netdev_warn(enic->netdev, "Failed to disable admin RQ\n");
+ enic_admin_free_resources(enic);
+ return err;
+}
+
+void enic_admin_channel_close(struct enic *enic)
+{
+ int err;
+
+ if (!enic->has_admin_channel)
+ return;
+
+ enic_admin_qp_type_set(enic, QP_DISABLE);
+
+ err = vnic_wq_disable(&enic->admin_wq);
+ if (err)
+ netdev_warn(enic->netdev,
+ "Failed to disable admin WQ: %d\n", err);
+ err = vnic_rq_disable(&enic->admin_rq);
+ if (err)
+ netdev_warn(enic->netdev,
+ "Failed to disable admin RQ: %d\n", err);
+
+ vnic_wq_clean(&enic->admin_wq, enic_admin_wq_buf_clean);
+ vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
+ vnic_cq_clean(&enic->admin_cq[0]);
+ vnic_cq_clean(&enic->admin_cq[1]);
+ enic_admin_free_resources(enic);
+}
diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.h b/drivers/net/ethernet/cisco/enic/enic_admin.h
new file mode 100644
index 000000000000..569aadeb9312
--- /dev/null
+++ b/drivers/net/ethernet/cisco/enic/enic_admin.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2025 Cisco Systems, Inc. All rights reserved. */
+
+#ifndef _ENIC_ADMIN_H_
+#define _ENIC_ADMIN_H_
+
+#define ENIC_ADMIN_DESC_COUNT 64
+#define ENIC_ADMIN_BUF_SIZE 2048
+
+struct enic;
+
+int enic_admin_channel_open(struct enic *enic);
+void enic_admin_channel_close(struct enic *enic);
+
+#endif /* _ENIC_ADMIN_H_ */
diff --git a/drivers/net/ethernet/cisco/enic/vnic_cq.h b/drivers/net/ethernet/cisco/enic/vnic_cq.h
index d46d4d2ef6bb..35ffa3230713 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_cq.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_cq.h
@@ -76,6 +76,15 @@ int vnic_cq_alloc(struct vnic_dev *vdev, struct vnic_cq *cq, unsigned int index,
int vnic_cq_alloc_with_type(struct vnic_dev *vdev, struct vnic_cq *cq,
unsigned int index, unsigned int desc_count,
unsigned int desc_size, unsigned int res_type);
+#define VNIC_CQ_FC_ENABLE 1
+#define VNIC_CQ_FC_DISABLE 0
+#define VNIC_CQ_COLOR_ENABLE 1
+#define VNIC_CQ_INTR_ENABLE 1
+#define VNIC_CQ_INTR_DISABLE 0
+#define VNIC_CQ_ENTRY_ENABLE 1
+#define VNIC_CQ_MSG_ENABLE 1
+#define VNIC_CQ_MSG_DISABLE 0
+
void vnic_cq_init(struct vnic_cq *cq, unsigned int flow_control_enable,
unsigned int color_enable, unsigned int cq_head, unsigned int cq_tail,
unsigned int cq_tail_color, unsigned int interrupt_enable,
diff --git a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
index 3b6efa743dba..90ca06691ebd 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
@@ -455,8 +455,19 @@ enum vnic_devcmd_cmd {
*/
CMD_CQ_ENTRY_SIZE_SET = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 90),
+ /*
+ * Set queue pair type (admin or data)
+ * in: (u32) a0 = queue pair type (0 = admin, 1 = data)
+ * in: (u32) a1 = enable (1) / disable (0)
+ */
+ CMD_QP_TYPE_SET = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 97),
};
+#define QP_TYPE_ADMIN 0
+#define QP_TYPE_DATA 1
+#define QP_ENABLE 1
+#define QP_DISABLE 0
+
/* CMD_ENABLE2 flags */
#define CMD_ENABLE2_STANDBY 0x0
#define CMD_ENABLE2_ACTIVE 0x1
--
2.43.0
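The admin_cq[1] allocation comment in patch 02 relies on the CQ color bit sitting in the last byte of each descriptor, whatever the entry size (16 << ext_cq). A rough userspace model of color-bit polling follows. It assumes the color is bit 7 of that last byte (as with the type_color field in enic's cq_desc.h), that hardware writes color 1 on the first lap (presumably what cq_tail_color = 1 in enic_admin_init_resources() selects) and flips it on every wrap. struct cq_model and the model_* helpers are hypothetical.

```c
#include <stdint.h>

#define MODEL_CQ_ENTRIES  4
#define MODEL_COLOR_SHIFT 7  /* assumed position of the color bit */

struct cq_model {
	uint8_t *ring;           /* MODEL_CQ_ENTRIES * desc_size bytes */
	unsigned int desc_size;  /* 16 << ext_cq */
	unsigned int to_clean;   /* next entry the host inspects */
	unsigned int last_color; /* color of entries already consumed */
};

/* Read the color bit from the last byte of entry idx. */
static unsigned int model_color(const struct cq_model *cq, unsigned int idx)
{
	const uint8_t *desc = cq->ring + idx * cq->desc_size;

	return (desc[cq->desc_size - 1] >> MODEL_COLOR_SHIFT) & 1u;
}

/* Consume every entry whose color differs from last_color. */
static unsigned int model_cq_service(struct cq_model *cq)
{
	unsigned int work = 0;

	while (model_color(cq, cq->to_clean) != cq->last_color) {
		work++;  /* a real poller would process the entry here */
		if (++cq->to_clean == MODEL_CQ_ENTRIES) {
			cq->to_clean = 0;
			cq->last_color ^= 1u;  /* next lap uses the other color */
		}
	}
	return work;
}

/* Simulated hardware write: stamp entry idx with the given color. */
static void model_hw_write(struct cq_model *cq, unsigned int idx,
			   unsigned int color)
{
	uint8_t *last = cq->ring + idx * cq->desc_size + (cq->desc_size - 1);

	*last = (uint8_t)((*last & ~(1u << MODEL_COLOR_SHIFT)) |
			  ((color & 1u) << MODEL_COLOR_SHIFT));
}
```

A zeroed ring reads as "nothing new" because every entry's color equals the initial last_color of 0, which is why the first hardware lap has to use color 1.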
* [PATCH net-next v9 08/10] enic: add MBOX VF handlers for capability, register and link state
From: Satish Kharat @ 2026-06-18 1:53 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel, Sesidhar Baddela, Satish Kharat
In-Reply-To: <20260617-enic-sriov-v2-admin-channel-v2-v9-0-37f5f5af4c93@cisco.com>
Implement VF-side mailbox message processing for SR-IOV V2
admin channel communication.
VF receive handlers:
- VF_CAPABILITY_REPLY: store PF protocol version, signal
completion
- VF_REGISTER_REPLY: mark VF as registered, signal completion
- VF_UNREGISTER_REPLY: mark VF as unregistered, signal
completion
- PF_LINK_STATE_NOTIF: update carrier state via
netif_carrier_on/off, send ACK back to PF
VF initiation functions for the probe-time handshake:
- enic_mbox_vf_capability_check: send capability request,
wait for PF reply via completion
- enic_mbox_vf_register: send register request, wait for
PF confirmation via completion
- enic_mbox_vf_unregister: send unregister request, wait
for PF confirmation
The wait helper (enic_mbox_wait_reply) uses
wait_for_completion_timeout, signaled when the admin ISR and
CQ-poll/dispatch workqueue pipeline delivers the reply message.
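The request/reply handshake above, with the mbox_expected_reply gating and the reinit_completion() ordering noted in the v9 cover letter, can be modelled in userspace with a mutex/condvar pair standing in for struct completion. This is a hypothetical sketch: struct completion_model, struct mbox_model and the *_model helpers are not enic or kernel API, the message type values are arbitrary, and unlike a kernel completion the model tracks a single done flag rather than a count.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Minimal stand-in for a kernel struct completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

struct mbox_model {
	struct completion_model comp;
	uint8_t expected_reply;  /* like enic->mbox_expected_reply */
	uint32_t pf_version;     /* filled in by the reply handler */
};

static void reinit_model(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = false;
	pthread_mutex_unlock(&c->lock);
}

static void complete_model(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* 0 if completed, -ETIMEDOUT otherwise (cf. enic_mbox_wait_reply()). */
static int wait_model(struct completion_model *c, long timeout_ms)
{
	struct timespec ts;
	bool done;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += timeout_ms / 1000;
	ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	pthread_mutex_lock(&c->lock);
	while (!c->done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &ts);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done ? 0 : -ETIMEDOUT;
}

/* Requester: reset the completion before publishing the expected type,
 * so a reply arriving in between is not wiped out by a late reinit. */
static void begin_request_model(struct mbox_model *m, uint8_t expected)
{
	reinit_model(&m->comp);
	m->expected_reply = expected;
	/* ...send the request message here... */
}

/* Receive handler: drop replies of any other type, so a stale reply to
 * an earlier (timed-out) request cannot wake the current waiter. */
static void handle_reply_model(struct mbox_model *m, uint8_t type,
			       uint32_t version)
{
	if (type != m->expected_reply)
		return;
	m->pf_version = version;
	complete_model(&m->comp);
}
```

Swapping the first two lines of begin_request_model() reopens the window in which a reply that lands between publishing the expected type and the reinit has its completion wiped, which is the race the v9 change closes.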
Signed-off-by: Satish Kharat <satishkh@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic.h | 11 ++
drivers/net/ethernet/cisco/enic/enic_mbox.c | 265 ++++++++++++++++++++++++++++
drivers/net/ethernet/cisco/enic/enic_mbox.h | 3 +
3 files changed, 279 insertions(+)
diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index cace8e04e9ce..294b751b7cb6 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -258,6 +258,8 @@ struct enic {
u32 tx_coalesce_usecs;
u16 num_vfs;
enum enic_vf_type vf_type;
+ bool vf_registered;
+ u32 pf_cap_version;
unsigned int enable_count;
spinlock_t enic_api_lock;
bool enic_api_busy;
@@ -307,6 +309,15 @@ struct enic {
/* MBOX protocol state — mbox_lock serializes admin WQ sends */
struct mutex mbox_lock;
u64 mbox_msg_num;
+ /* MBOX request-reply state. Written by the process-context request
+ * helpers (capability/register/unregister) and read/cleared by the
+ * admin_msg_work receive handlers. No explicit lock is needed because
+ * only one request is in flight at a time: requesters run under RTNL or
+ * single-threaded probe/remove, so each request is serialized and its
+ * reply completes mbox_comp before the next request is issued.
+ */
+ struct completion mbox_comp;
+ u8 mbox_expected_reply;
/* PF: per-VF MBOX state, allocated when SRIOV V2 is enabled */
struct enic_vf_state {
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.c b/drivers/net/ethernet/cisco/enic/enic_mbox.c
index b6f05b03ae26..eb084adae810 100644
--- a/drivers/net/ethernet/cisco/enic/enic_mbox.c
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.c
@@ -5,6 +5,7 @@
#include <linux/netdevice.h>
#include <linux/dma-mapping.h>
#include <linux/delay.h>
+#include <linux/completion.h>
#include "vnic_dev.h"
#include "vnic_wq.h"
@@ -135,6 +136,16 @@ int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
return err;
}
+static int enic_mbox_wait_reply(struct enic *enic, unsigned long timeout_ms)
+{
+ unsigned long left;
+
+ left = wait_for_completion_timeout(&enic->mbox_comp,
+ msecs_to_jiffies(timeout_ms));
+
+ return left ? 0 : -ETIMEDOUT;
+}
+
int enic_mbox_send_link_state(struct enic *enic, u16 vf_id, u32 link_state)
{
struct enic_mbox_pf_link_state_notif_msg notif = {};
@@ -306,6 +317,166 @@ static void enic_mbox_pf_process_msg(struct enic *enic,
hdr->msg_type, vf_id, err);
}
+static void enic_mbox_vf_handle_capability_reply(struct enic *enic,
+ void *payload)
+{
+ struct enic_mbox_vf_capability_reply_msg *reply = payload;
+
+ if (enic->mbox_expected_reply != ENIC_MBOX_VF_CAPABILITY_REPLY) {
+ netdev_warn(enic->netdev,
+ "MBOX: stale capability reply (expected %u), drop\n",
+ enic->mbox_expected_reply);
+ return;
+ }
+
+ if (le16_to_cpu(reply->reply.ret_major) == 0)
+ enic->pf_cap_version = le32_to_cpu(reply->version);
+ else
+ netdev_warn(enic->netdev,
+ "MBOX: PF rejected capability request: %u/%u\n",
+ le16_to_cpu(reply->reply.ret_major),
+ le16_to_cpu(reply->reply.ret_minor));
+ complete(&enic->mbox_comp);
+}
+
+static void enic_mbox_vf_handle_register_reply(struct enic *enic,
+ void *payload)
+{
+ struct enic_mbox_vf_register_reply_msg *reply = payload;
+
+ if (enic->mbox_expected_reply != ENIC_MBOX_VF_REGISTER_REPLY) {
+ netdev_warn(enic->netdev,
+ "MBOX: stale register reply (expected %u), drop\n",
+ enic->mbox_expected_reply);
+ return;
+ }
+
+ if (le16_to_cpu(reply->reply.ret_major)) {
+ netdev_warn(enic->netdev,
+ "MBOX: VF register rejected by PF: %u/%u\n",
+ le16_to_cpu(reply->reply.ret_major),
+ le16_to_cpu(reply->reply.ret_minor));
+ } else {
+ enic->vf_registered = true;
+ }
+ complete(&enic->mbox_comp);
+}
+
+static void enic_mbox_vf_handle_unregister_reply(struct enic *enic,
+ void *payload)
+{
+ struct enic_mbox_vf_register_reply_msg *reply = payload;
+
+ if (enic->mbox_expected_reply != ENIC_MBOX_VF_UNREGISTER_REPLY) {
+ netdev_warn(enic->netdev,
+ "MBOX: stale unregister reply (expected %u), drop\n",
+ enic->mbox_expected_reply);
+ return;
+ }
+
+ if (le16_to_cpu(reply->reply.ret_major)) {
+ netdev_warn(enic->netdev,
+ "MBOX: VF unregister rejected by PF: %u/%u\n",
+ le16_to_cpu(reply->reply.ret_major),
+ le16_to_cpu(reply->reply.ret_minor));
+ } else {
+ enic->vf_registered = false;
+ }
+ complete(&enic->mbox_comp);
+}
+
+static void enic_mbox_vf_handle_link_state(struct enic *enic, void *payload)
+{
+ struct enic_mbox_pf_link_state_notif_msg *notif = payload;
+ struct enic_mbox_pf_link_state_ack_msg ack = {};
+ int err;
+
+ switch (le32_to_cpu(notif->link_state)) {
+ case ENIC_MBOX_LINK_STATE_ENABLE:
+ if (!netif_carrier_ok(enic->netdev))
+ netif_carrier_on(enic->netdev);
+ netdev_dbg(enic->netdev, "MBOX: link state -> UP\n");
+ break;
+ case ENIC_MBOX_LINK_STATE_DISABLE:
+ if (netif_carrier_ok(enic->netdev))
+ netif_carrier_off(enic->netdev);
+ netdev_dbg(enic->netdev, "MBOX: link state -> DOWN\n");
+ break;
+ default:
+ netdev_warn(enic->netdev, "MBOX: unknown link state %u\n",
+ le32_to_cpu(notif->link_state));
+ ack.ack.ret_major = cpu_to_le16(ENIC_MBOX_ERR_GENERIC);
+ break;
+ }
+
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_PF_LINK_STATE_ACK,
+ ENIC_MBOX_DST_PF, &ack, sizeof(ack));
+ if (err && net_ratelimit())
+ netdev_warn(enic->netdev,
+ "MBOX: failed to send link state ACK: %d\n", err);
+}
+
+static bool enic_mbox_vf_payload_ok(struct enic *enic, u8 msg_type,
+ u16 payload_len, size_t min_len)
+{
+ if (payload_len < min_len) {
+ netdev_warn(enic->netdev,
+ "MBOX: short payload for type %u (%u < %zu)\n",
+ msg_type, payload_len, min_len);
+ return false;
+ }
+ return true;
+}
+
+static void enic_mbox_vf_process_msg(struct enic *enic,
+ struct enic_mbox_hdr *hdr, void *payload,
+ u16 payload_len)
+{
+ switch (hdr->msg_type) {
+ case ENIC_MBOX_VF_CAPABILITY_REPLY: {
+ size_t exp = sizeof(struct enic_mbox_vf_capability_reply_msg);
+
+ if (!enic_mbox_vf_payload_ok(enic, hdr->msg_type,
+ payload_len, exp))
+ return;
+ enic_mbox_vf_handle_capability_reply(enic, payload);
+ break;
+ }
+ case ENIC_MBOX_VF_REGISTER_REPLY: {
+ size_t exp = sizeof(struct enic_mbox_vf_register_reply_msg);
+
+ if (!enic_mbox_vf_payload_ok(enic, hdr->msg_type,
+ payload_len, exp))
+ return;
+ enic_mbox_vf_handle_register_reply(enic, payload);
+ break;
+ }
+ case ENIC_MBOX_VF_UNREGISTER_REPLY: {
+ size_t exp = sizeof(struct enic_mbox_vf_register_reply_msg);
+
+ if (!enic_mbox_vf_payload_ok(enic, hdr->msg_type,
+ payload_len, exp))
+ return;
+ enic_mbox_vf_handle_unregister_reply(enic, payload);
+ break;
+ }
+ case ENIC_MBOX_PF_LINK_STATE_NOTIF: {
+ size_t exp = sizeof(struct enic_mbox_pf_link_state_notif_msg);
+
+ if (!enic_mbox_vf_payload_ok(enic, hdr->msg_type,
+ payload_len, exp))
+ return;
+ enic_mbox_vf_handle_link_state(enic, payload);
+ break;
+ }
+ default:
+ netdev_dbg(enic->netdev,
+ "MBOX: VF unhandled msg type %u\n",
+ hdr->msg_type);
+ break;
+ }
+}
+
static void enic_mbox_recv_handler(struct enic *enic, void *buf,
unsigned int len)
{
@@ -346,11 +517,105 @@ static void enic_mbox_recv_handler(struct enic *enic, void *buf,
if (enic->vf_state)
enic_mbox_pf_process_msg(enic, hdr, payload);
+ else
+ enic_mbox_vf_process_msg(enic, hdr, payload,
+ msg_len - (u16)sizeof(*hdr));
+}
+
+int enic_mbox_vf_capability_check(struct enic *enic)
+{
+ struct enic_mbox_vf_capability_msg req = {};
+ int err;
+
+ enic->pf_cap_version = 0;
+ reinit_completion(&enic->mbox_comp);
+ enic->mbox_expected_reply = ENIC_MBOX_VF_CAPABILITY_REPLY;
+ req.version = cpu_to_le32(ENIC_MBOX_CAP_VERSION_1);
+
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_CAPABILITY_REQUEST,
+ ENIC_MBOX_DST_PF, &req, sizeof(req));
+ if (err) {
+ enic->mbox_expected_reply = 0;
+ return err;
+ }
+
+ err = enic_mbox_wait_reply(enic, 3000);
+ enic->mbox_expected_reply = 0;
+ if (err) {
+ netdev_warn(enic->netdev,
+ "MBOX: no capability reply from PF\n");
+ return err;
+ }
+
+ if (enic->pf_cap_version < ENIC_MBOX_CAP_VERSION_1) {
+ netdev_warn(enic->netdev,
+ "MBOX: PF version %u too old\n",
+ enic->pf_cap_version);
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+int enic_mbox_vf_register(struct enic *enic)
+{
+ int err;
+
+ enic->vf_registered = false;
+ reinit_completion(&enic->mbox_comp);
+ enic->mbox_expected_reply = ENIC_MBOX_VF_REGISTER_REPLY;
+
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_REGISTER_REQUEST,
+ ENIC_MBOX_DST_PF, NULL, 0);
+ if (err) {
+ enic->mbox_expected_reply = 0;
+ return err;
+ }
+
+ err = enic_mbox_wait_reply(enic, 3000);
+ enic->mbox_expected_reply = 0;
+ if (err) {
+ netdev_warn(enic->netdev,
+ "MBOX: VF registration with PF timed out\n");
+ return err;
+ }
+
+ if (!enic->vf_registered)
+ return -ENODEV;
+
+ return 0;
+}
+
+int enic_mbox_vf_unregister(struct enic *enic)
+{
+ int err;
+
+ if (!enic->vf_registered)
+ return 0;
+
+ reinit_completion(&enic->mbox_comp);
+ enic->mbox_expected_reply = ENIC_MBOX_VF_UNREGISTER_REPLY;
+
+ err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_UNREGISTER_REQUEST,
+ ENIC_MBOX_DST_PF, NULL, 0);
+ if (err) {
+ enic->mbox_expected_reply = 0;
+ return err;
+ }
+
+ err = enic_mbox_wait_reply(enic, 3000);
+ enic->mbox_expected_reply = 0;
+ if (err)
+ return err;
+ if (enic->vf_registered)
+ return -EACCES;
+ return 0;
}
void enic_mbox_init(struct enic *enic)
{
enic->mbox_msg_num = 0;
mutex_init(&enic->mbox_lock);
+ init_completion(&enic->mbox_comp);
enic->admin_rq_handler = enic_mbox_recv_handler;
}
diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.h b/drivers/net/ethernet/cisco/enic/enic_mbox.h
index f1de67db1273..15e30ee2b0ed 100644
--- a/drivers/net/ethernet/cisco/enic/enic_mbox.h
+++ b/drivers/net/ethernet/cisco/enic/enic_mbox.h
@@ -88,5 +88,8 @@ void enic_mbox_init(struct enic *enic);
int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
void *payload, u16 payload_len);
int enic_mbox_send_link_state(struct enic *enic, u16 vf_id, u32 link_state);
+int enic_mbox_vf_capability_check(struct enic *enic);
+int enic_mbox_vf_register(struct enic *enic);
+int enic_mbox_vf_unregister(struct enic *enic);
#endif /* _ENIC_MBOX_H_ */
--
2.43.0
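The VF side of this patch uses a synchronous request/reply handshake. The requester clears its result, re-arms the completion and records which reply type it expects. It then sends the request and waits. The receive handler drops any reply whose type does not match; otherwise it records the result and signals the waiter. Clearing the expected type after the wait is what makes a late PF answer harmless.

Below is a minimal userspace sketch of that gating logic under stated assumptions. It is single-threaded. A hypothetical fake_pf struct delivers the reply synchronously instead of the admin RQ, and a plain bool stands in for struct completion. All names (vf_state, fake_pf, vf_handle_reply, vf_register) and the MSG_* values are illustrative, not the driver's. The sketch also ignores that the real receive handler runs asynchronously to the waiter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reply types; the driver's real ENIC_MBOX_* values differ. */
enum { MSG_NONE = 0, MSG_REGISTER_REPLY = 1, MSG_UNREGISTER_REPLY = 2 };

/* Userspace stand-in for the struct enic fields used by the VF side. */
struct vf_state {
	uint8_t expected_reply;   /* models enic->mbox_expected_reply */
	bool completed;           /* models enic->mbox_comp */
	bool registered;          /* models enic->vf_registered */
	unsigned int stale_drops; /* replies dropped as stale */
};

/* Hypothetical PF behaviour used to drive the simulation. */
struct fake_pf {
	bool responds;        /* false models a PF that never answers */
	uint8_t reply_type;   /* reply type the PF sends back */
	uint16_t ret_major;   /* 0 means the PF accepted the request */
};

/* Receive path: accept only the reply type the VF is waiting for. */
static void vf_handle_reply(struct vf_state *vf, uint8_t type,
			    uint16_t ret_major)
{
	if (vf->expected_reply != type) {
		vf->stale_drops++;
		printf("stale reply %u (expected %u), drop\n",
		       (unsigned int)type, (unsigned int)vf->expected_reply);
		return;
	}
	if (ret_major == 0)
		vf->registered = (type == MSG_REGISTER_REPLY);
	vf->completed = true;     /* models complete() */
}

/* Request path: arm the expected reply, "send", then check the outcome. */
static int vf_register(struct vf_state *vf, const struct fake_pf *pf)
{
	assert(vf->expected_reply == MSG_NONE); /* one request in flight */

	vf->registered = false;
	vf->completed = false;    /* models reinit_completion() */
	vf->expected_reply = MSG_REGISTER_REPLY;

	/* The driver sends a message and sleeps on the completion with a
	 * timeout; here the fake PF answers synchronously, or not at all. */
	if (pf->responds)
		vf_handle_reply(vf, pf->reply_type, pf->ret_major);

	vf->expected_reply = MSG_NONE; /* anything arriving later is stale */
	if (!vf->completed)
		return -ETIMEDOUT;
	return vf->registered ? 0 : -ENODEV;
}
```

In this model, a reply of the wrong type never completes the wait. The request therefore ends in -ETIMEDOUT, much as the driver's 3000 ms wait would presumably expire. A rejected request completes, but it leaves the registered flag false and maps to -ENODEV, mirroring enic_mbox_vf_register().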