* Re: [PATCH v5] net: caif: fix stack out-of-bounds write in cfctrl_link_setup()
From: Simon Horman @ 2026-04-12 13:57 UTC
To: Kangzheng Gu
Cc: pabeni, davem, edumazet, kuba, kees, thorsten.blum, arnd,
sjur.brandeland, netdev, linux-kernel, stable
In-Reply-To: <20260408125333.38489-1-xiaoguai0992@gmail.com>
On Wed, Apr 08, 2026 at 12:53:33PM +0000, Kangzheng Gu wrote:
> cfctrl_link_setup() copies the RFM volume name from a received control
> packet into linkparam.u.rfm.volume until a '\0' is found. A malformed
> packet can omit the terminator and make the copy run past the 20-byte
> stack buffer.
>
> Stop copying once the buffer is full and mark the frame as failed by
> setting CFCTRL_ERR_BIT so the link setup is rejected.
>
> Fixes: b482cd2053e3 ("net-caif: add CAIF core protocol stack")
> Cc: stable@vger.kernel.org
> Signed-off-by: Kangzheng Gu <xiaoguai0992@gmail.com>
> ---
> v5:
> - remove the Reported-by.
> - print a warn message and reject link setup by setting CFCTRL_ERR_BIT.
> - using %zu to adapt the compilation of 32-bit kernel.
> - add rate limit to error message
>
> net/caif/cfctrl.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c
> index c6cc2bfed65d..373ab1dc67a7 100644
> --- a/net/caif/cfctrl.c
> +++ b/net/caif/cfctrl.c
> @@ -416,8 +416,16 @@ static int cfctrl_link_setup(struct cfctrl *cfctrl, struct cfpkt *pkt, u8 cmdrsp
> cp = (u8 *) linkparam.u.rfm.volume;
> for (tmp = cfpkt_extr_head_u8(pkt);
> cfpkt_more(pkt) && tmp != '\0';
> - tmp = cfpkt_extr_head_u8(pkt))
> + tmp = cfpkt_extr_head_u8(pkt)) {
> + if (cp >= (u8 *)linkparam.u.rfm.volume +
> + sizeof(linkparam.u.rfm.volume) - 1) {
> + pr_warn_ratelimited("Request reject, volume name length exceeds %zu\n",
> + sizeof(linkparam.u.rfm.volume));
> + cmdrsp |= CFCTRL_ERR_BIT;
> + break;
> + }
> *cp++ = tmp;
> + }
> *cp = '\0';
>
> if (CFCTRL_ERR_BIT & cmdrsp)
I am wondering if it would be best to follow the pattern for
writing linkparam.u.utility.name elsewhere in this function.
That:
1. Uses a somewhat more succinct loop control structure
2. Silently truncates input without updating cmdrsp if overrun would occur
Something like this (compile tested only!):
diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c
index c6cc2bfed65d..ba184c11386e 100644
--- a/net/caif/cfctrl.c
+++ b/net/caif/cfctrl.c
@@ -15,6 +15,7 @@
#include <net/caif/cfctrl.h>
#define container_obj(layr) container_of(layr, struct cfctrl, serv.layer)
+#define RFM_VOLUME_LEN 20
#define UTILITY_NAME_LENGTH 16
#define CFPKT_CTRL_PKT_LEN 20
@@ -414,10 +415,11 @@ static int cfctrl_link_setup(struct cfctrl *cfctrl, struct cfpkt *pkt, u8 cmdrsp
*/
linkparam.u.rfm.connid = cfpkt_extr_head_u32(pkt);
cp = (u8 *) linkparam.u.rfm.volume;
- for (tmp = cfpkt_extr_head_u8(pkt);
- cfpkt_more(pkt) && tmp != '\0';
- tmp = cfpkt_extr_head_u8(pkt))
+ caif_assert(sizeof(linkparam.u.rfm.volume) >= RFM_VOLUME_LEN);
+ for (i = 0; i < RFM_VOLUME_LEN - 1 && cfpkt_more(pkt); i++) {
+ tmp = cfpkt_extr_head_u8(pkt);
*cp++ = tmp;
+ }
*cp = '\0';
if (CFCTRL_ERR_BIT & cmdrsp)
Also, it seems that writing linkparam.u.utility.paramlen elsewhere
in this function has a potential buffer overrun (by one byte).
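For illustration, the bounded-copy idea can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: struct pkt, pkt_more() and pkt_extr_u8() are hypothetical stand-ins for the cfpkt helpers, and copy_volume() is an invented name. Unlike the diff above, the sketch keeps the original loop's stop-at-'\0' check.

```c
#include <stddef.h>
#include <stdint.h>

#define RFM_VOLUME_LEN 20

/* Hypothetical stand-in for struct cfpkt: a byte buffer with a read cursor. */
struct pkt {
	const uint8_t *data;
	size_t len;
	size_t pos;
};

/* True while unread bytes remain in the packet. */
static int pkt_more(const struct pkt *p)
{
	return p->pos < p->len;
}

/* Extract one byte from the head of the packet. */
static uint8_t pkt_extr_u8(struct pkt *p)
{
	return p->data[p->pos++];
}

/*
 * Copy a volume name into dst, stopping at the first '\0', at the end
 * of the packet, or once RFM_VOLUME_LEN - 1 bytes have been stored.
 * dst is always terminated; over-long input is silently truncated.
 * Returns the number of name bytes stored.
 */
static size_t copy_volume(struct pkt *p, char dst[RFM_VOLUME_LEN])
{
	size_t i;

	for (i = 0; i < RFM_VOLUME_LEN - 1 && pkt_more(p); i++) {
		uint8_t tmp = pkt_extr_u8(p);

		if (tmp == '\0')
			break;
		dst[i] = (char)tmp;
	}
	dst[i] = '\0';
	return i;
}
```

Because the index is checked before each extraction, the write position can never pass the last byte of dst, whatever the packet contains.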
* Re: [PATCH net-next v2] r8169: Use napi_schedule_irqoff()
From: Matt Vollrath @ 2026-04-12 13:51 UTC
To: Heiner Kallweit, netdev; +Cc: edumazet, pabeni, kuba, andrew+netdev, nic_swsd
In-Reply-To: <d0e1bab5-3476-4b58-834c-6e742e89539d@gmail.com>
On 4/12/26 07:30, Heiner Kallweit wrote:
> On 12.04.2026 03:40, Matt Vollrath wrote:
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index 791277e750ba..4c0ad0de3410 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -4873,7 +4873,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>> phy_mac_interrupt(tp->phydev);
>>
>> rtl_irq_disable(tp);
>> - napi_schedule(&tp->napi);
>> + napi_schedule_irqoff(&tp->napi);
>> out:
>> rtl_ack_events(tp, status);
>>
>
> Not using napi_schedule_irqoff() here is intentional,
> see 2734a24e6e5d18522fbf599135c59b82ec9b2c9e.
It looks like forced threading was fixed after your fix
to mitigate the issue of forced threading not masking
interrupts.
see 81e2073c175b887398e5bca6c004efa89983f58d
If I understand correctly, this should make
napi_schedule_irqoff() safe in any interrupt handler.
* RE: [PATCH v5 net-next 0/8] dpll/ice: Add TXC DPLL type and full TX reference clock control for E825
From: Nitka, Grzegorz @ 2026-04-12 13:50 UTC
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
intel-wired-lan@lists.osuosl.org, Oros, Petr,
richardcochran@gmail.com, andrew+netdev@lunn.ch,
Kitszel, Przemyslaw, Nguyen, Anthony L,
Prathosh.Satish@microchip.com, Vecera, Ivan, jiri@resnulli.us,
Kubalewski, Arkadiusz, vadim.fedorenko@linux.dev,
donald.hunter@gmail.com, horms@kernel.org, pabeni@redhat.com,
davem@davemloft.net, edumazet@google.com
In-Reply-To: <20260410133812.4cf9b090@kernel.org>
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, April 10, 2026 10:38 PM
> To: Nitka, Grzegorz <grzegorz.nitka@intel.com>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; intel-wired-
> lan@lists.osuosl.org; Oros, Petr <poros@redhat.com>;
> richardcochran@gmail.com; andrew+netdev@lunn.ch; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Prathosh.Satish@microchip.com; Vecera,
> Ivan <ivecera@redhat.com>; jiri@resnulli.us; Kubalewski, Arkadiusz
> <arkadiusz.kubalewski@intel.com>; vadim.fedorenko@linux.dev;
> donald.hunter@gmail.com; horms@kernel.org; pabeni@redhat.com;
> davem@davemloft.net; edumazet@google.com
> Subject: Re: [PATCH v5 net-next 0/8] dpll/ice: Add TXC DPLL type and full TX
> reference clock control for E825
>
> On Fri, 10 Apr 2026 14:23:58 +0000 Nitka, Grzegorz wrote:
> > Here is the high-level connection diagram for E825 device. I hope you find it
> > helpful:
> > [..]
>
> It does thanks a lot.
>
> > Before this series, we tried different approaches.
> > One of them was to create MUX pin associated with netdev interface.
> > EXT_REF and SYNCE pins were registered with this MUX pin.
> > However I recall there were at least two issues with this solution:
> > - when using DPLL subsystem not all the connections/relations were visible
> > from DPLL pin-get perspective. RT netlink was required
> > - due to mixing pins from different modules (like fwnode based pin from zl
> > driver and the pins from ice), we were not able to safely clean the
> > references between pins and dpll (basically .. we observed crashes)
> >
> > Proposed solution just seems to be clean and fully reflects current
> > connection topology.
>
> Do you have the link to the old proposal that was adding stuff to
> rtnetlink? I remember some discussion long-ish ago, maybe I was wrong.
>
Hello Jakub,
This is the patch from the discussion that I linked in the cover letter:
https://lore.kernel.org/netdev/20250828164345.116097-1-arkadiusz.kubalewski@intel.com/
Regards
Grzegorz
> > What's actually your biggest concern?
> > The fact we introduce a new DPLL type? Or multiply DPLL instances? Or both?
> > Do you prefer to see "one big" DPLL with 16 pins in our case (8 ports x 2
> > tx-clk pins)?
> > Each pin with the name like, for example, PF0-SyncE/PF0-eRef etc.?
>
> My concern is that I think this is a pretty run of the mill SyncE
> design. If we need to pretend we have two DPLLs here if we really
> only have one and a mux - then our APIs are mis-designed :(
* Re: [patch 23/38] alpha: Select ARCH_HAS_RANDOM_ENTROPY
From: Magnus Lindholm @ 2026-04-12 13:22 UTC
To: Thomas Gleixner
Cc: LKML, Richard Henderson, linux-alpha, Arnd Bergmann, x86,
Lu Baolu, iommu, Michael Grzeschik, netdev, linux-wireless,
Herbert Xu, linux-crypto, Vlastimil Babka, linux-mm,
David Woodhouse, Bernie Thompson, linux-fbdev, Theodore Tso,
linux-ext4, Andrew Morton, Uladzislau Rezki, Marco Elver,
Dmitry Vyukov, kasan-dev, Andrey Ryabinin, Thomas Sailer,
linux-hams, Jason A. Donenfeld, Russell King, linux-arm-kernel,
Catalin Marinas, Huacai Chen, loongarch, Geert Uytterhoeven,
linux-m68k, Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
sparclinux
In-Reply-To: <20260410120319.131582521@kernel.org>
On Fri, Apr 10, 2026 at 2:36 PM Thomas Gleixner <tglx@kernel.org> wrote:
>
> The only remaining usage of get_cycles() is to provide
> random_get_entropy().
>
> Switch alpha over to the new scheme of selecting ARCH_HAS_RANDOM_ENTROPY
> and providing random_get_entropy() in asm/random.h.
>
> Remove asm/timex.h as it has no functionality anymore.
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: linux-alpha@vger.kernel.org
> ---
> arch/alpha/Kconfig | 1 +
> arch/alpha/include/asm/random.h | 14 ++++++++++++++
> arch/alpha/include/asm/timex.h | 26 --------------------------
> 3 files changed, 15 insertions(+), 26 deletions(-)
Hi,
The Alpha side looks fine to me.
I've applied this patch on top of v7.0-rc7, built a kernel successfully,
and boot-tested it on an Alpha UP2000+ (SMP) without issues.
Acked-by: Magnus Lindholm <linmag7@gmail.com>
Tested-by: Magnus Lindholm <linmag7@gmail.com>
* Re: [PATCH net-next v2] r8169: Use napi_schedule_irqoff()
From: Matt Vollrath @ 2026-04-12 13:11 UTC
To: Heiner Kallweit, netdev; +Cc: edumazet, pabeni, kuba, andrew+netdev, nic_swsd
In-Reply-To: <d0e1bab5-3476-4b58-834c-6e742e89539d@gmail.com>
On 4/12/26 07:30, Heiner Kallweit wrote:
> On 12.04.2026 03:40, Matt Vollrath wrote:
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index 791277e750ba..4c0ad0de3410 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -4873,7 +4873,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>> phy_mac_interrupt(tp->phydev);
>>
>> rtl_irq_disable(tp);
>> - napi_schedule(&tp->napi);
>> + napi_schedule_irqoff(&tp->napi);
>> out:
>> rtl_ack_events(tp, status);
>>
>
> Not using napi_schedule_irqoff() here is intentional,
> see 2734a24e6e5d18522fbf599135c59b82ec9b2c9e.
I see, thank you Heiner. Seems like it could work if
legacy interrupts went to a separate handler, but not
worth the complexity for this small optimization.
mv
* Re: [PATCH v11 net-next 09/11] net/nebula-matrix: add Dispatch layer implementation
From: Simon Horman @ 2026-04-12 12:59 UTC
To: illusion.wang
Cc: Simon Horman, alvin.wang, sam.chen, netdev, skhan, lorenzo,
open list, edumazet, linux-doc, enelsonmoore, andrew+netdev,
pabeni, lukas.bulwahn, kuba, hkallweit1, dimon.zhao, corbet,
vadim.fedorenko
In-Reply-To: <20260408093739.56001-10-illusion.wang@nebula-matrix.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/nebula-matrix: add Dispatch layer implementation
This patch introduces a control-level routing mechanism for the Dispatch
layer with two routing paths: a direct path (Dispatch -> Resource -> HW)
and a channel path (Dispatch -> Channel). The implementation uses
interface-declared control levels and upper-layer configured control
levels to determine the routing.
> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_core/nbl_dispatch.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_core/nbl_dispatch.c
> index 8b031f6e963f..eee41d10ad63 100644
> --- a/drivers/net/ethernet/nebula-matrix/nbl/nbl_core/nbl_dispatch.c
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_core/nbl_dispatch.c
> @@ -6,6 +6,396 @@
> #include <linux/pci.h>
> #include "nbl_dispatch.h"
>
> +static u16 nbl_disp_chan_get_vsi_id_req(struct nbl_dispatch_mgt *disp_mgt,
> + u16 func_id, u16 type)
> +{
> + struct nbl_channel_ops *chan_ops = disp_mgt->chan_ops_tbl->ops;
> + struct nbl_common_info *common = disp_mgt->common;
> + struct nbl_chan_param_get_vsi_id result = { 0 };
> + struct nbl_chan_param_get_vsi_id param = { 0 };
> + struct nbl_chan_send_info chan_send;
> +
> + param.type = type;
> +
> + NBL_CHAN_SEND(chan_send, common->mgt_pf, NBL_CHAN_MSG_GET_VSI_ID,
> + &param, sizeof(param), &result, sizeof(result), 1);
> + chan_ops->send_msg(disp_mgt->chan_ops_tbl->priv, &chan_send);
> +
> + return result.vsi_id;
> +}
Does nbl_disp_chan_get_vsi_id_req() ignore the func_id parameter?
The function accepts func_id but only sends param.type in the channel
message. The nbl_chan_param_get_vsi_id struct appears to have only
{vsi_id, type} fields, with no func_id field.
On the responder side:
> +static void nbl_disp_chan_get_vsi_id_resp(void *priv, u16 src_id, u16 msg_id,
> + void *data, u32 data_len)
> +{
> + struct nbl_dispatch_mgt *disp_mgt = (struct nbl_dispatch_mgt *)priv;
> + struct nbl_channel_ops *chan_ops = disp_mgt->chan_ops_tbl->ops;
> + struct nbl_resource_ops *res_ops = disp_mgt->res_ops_tbl->ops;
> + struct nbl_resource_mgt *p = disp_mgt->res_ops_tbl->priv;
> + struct device *dev = disp_mgt->common->dev;
> + struct nbl_chan_param_get_vsi_id *param;
> + struct nbl_chan_param_get_vsi_id result = { 0 };
> + struct nbl_chan_ack_info chan_ack;
> + int err = NBL_CHAN_RESP_OK;
> + int ret;
> +
> + param = (struct nbl_chan_param_get_vsi_id *)data;
> +
> + result.vsi_id =
> + NBL_OPS_CALL_RET(res_ops->get_vsi_id, (p, src_id, param->type));
nbl_disp_chan_get_vsi_id_resp() uses src_id (the channel source) instead
of func_id. This creates a behavioral inconsistency with the direct path:
> +static u16 nbl_disp_get_vsi_id(struct nbl_dispatch_mgt *disp_mgt, u16 func_id,
> + u16 type)
> +{
> + struct nbl_resource_ops *res_ops = disp_mgt->res_ops_tbl->ops;
> + struct nbl_resource_mgt *p = disp_mgt->res_ops_tbl->priv;
> +
> + return NBL_OPS_CALL_RET(res_ops->get_vsi_id, (p, func_id, type));
> +}
The direct path correctly passes func_id to res_ops->get_vsi_id(p, func_id,
type).
When the channel path is active, callers cannot query VSI IDs for
functions other than the sender. Is the func_id parameter intended to be
used in nbl_disp_chan_get_vsi_id_req()?
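The mismatch between the two paths can be made concrete with a small userspace sketch. Everything here is hypothetical and only illustrates the question above: get_vsi_id() is an invented lookup standing in for res_ops->get_vsi_id(), and struct req_with_func shows one possible request layout that carries func_id. It is not a claim about how the driver should be fixed.

```c
#include <stdint.h>

/* Hypothetical lookup standing in for res_ops->get_vsi_id(). */
static uint16_t get_vsi_id(uint16_t func_id, uint16_t type)
{
	return (uint16_t)(func_id * 16 + type);
}

/* Request layout as described in the review: no func_id field. */
struct req_without_func {
	uint16_t vsi_id;
	uint16_t type;
};

/* Hypothetical alternative layout that carries the requested function. */
struct req_with_func {
	uint16_t vsi_id;
	uint16_t type;
	uint16_t func_id;
};

/* Responder keyed on the channel source, as in the quoted patch. */
static uint16_t resp_by_src(uint16_t src_id, const struct req_without_func *r)
{
	return get_vsi_id(src_id, r->type);
}

/* Responder variant that honours the function named in the request. */
static uint16_t resp_by_func(uint16_t src_id, const struct req_with_func *r)
{
	(void)src_id;	/* the sender identity is not used for the lookup */
	return get_vsi_id(r->func_id, r->type);
}
```

With sender 0 asking about function 3, resp_by_src() answers for function 0 while the direct path and resp_by_func() answer for function 3.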
[ ... ]
* Re: [PATCH v11 net-next 05/11] net/nebula-matrix: add channel layer
From: Simon Horman @ 2026-04-12 12:59 UTC
To: illusion.wang
Cc: Simon Horman, dimon.zhao, andrew+netdev, alvin.wang, corbet,
enelsonmoore, netdev, kuba, sam.chen, vadim.fedorenko, edumazet,
skhan, hkallweit1, lukas.bulwahn, lorenzo, open list, linux-doc,
pabeni
In-Reply-To: <20260408093739.56001-6-illusion.wang@nebula-matrix.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/nebula-matrix: add channel layer
This commit adds a channel management layer for the nebula-matrix
network driver, providing mailbox-based communication between PF0
and other PFs with message send/receive, ACK handling, and
hash-based message handler lookup.
> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_channel/nbl_channel.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_channel/nbl_channel.c
> --- a/drivers/net/ethernet/nebula-matrix/nbl/nbl_channel/nbl_channel.c
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_channel/nbl_channel.c
[ ... ]
> +static int nbl_chan_send_msg(struct nbl_channel_mgt *chan_mgt,
> + struct nbl_chan_send_info *chan_send)
> +{
[ ... ]
> + if (test_bit(NBL_CHAN_INTERRUPT_READY, chan_info->state)) {
> + ret = wait_event_timeout(wait_head->wait_queue,
> + wait_head->acked,
> + NBL_CHAN_ACK_WAIT_TIME);
> + if (!ret) {
> + wait_head->status = NBL_MBX_STATUS_TIMEOUT;
> + dev_err(dev,
> + "Channel waiting ack failed, message type: %d, msg id: %u\n",
> + chan_send->msg_type, msgid.id);
> + return -ETIMEDOUT;
> + }
[ ... ]
> + /*polling wait mailbox ack*/
> + while (i--) {
> + nbl_chan_clean_queue(chan_mgt, chan_info);
> +
> + if (wait_head->acked) {
> + chan_send->ack_len = wait_head->ack_data_len;
> + wait_head->status = NBL_MBX_STATUS_IDLE;
> + return wait_head->ack_err;
> + }
> + usleep_range(NBL_CHAN_TX_WAIT_ACK_US_MIN,
> + NBL_CHAN_TX_WAIT_ACK_US_MAX);
> + }
> +
> + wait_head->status = NBL_MBX_STATUS_TIMEOUT;
> + dev_err(dev,
> + "Channel polling ack failed, message type: %d msg id: %u\n",
> + chan_send->msg_type, msgid.id);
> +
> + return -EFAULT;
^^^^^^
The interrupt-based ACK wait path above returns -ETIMEDOUT on timeout,
but this polling-based ACK wait path returns -EFAULT. -EFAULT ("bad
address") is not the right error code for a timeout condition.
Should this also return -ETIMEDOUT for consistency? Any caller that
checks specifically for -ETIMEDOUT to handle timeouts would miss
this polling timeout case.
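A minimal userspace sketch of a polling wait that reports a timeout the same way as the interrupt path. The names struct wait_head_sketch, poll_fn, poll_never(), poll_ack() and wait_ack_polling() are hypothetical and exist only for this illustration; the real driver sleeps between polls and cleans the queue, which is omitted here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical wait state; only the fields needed for the sketch. */
struct wait_head_sketch {
	bool acked;
	int ack_err;
};

/* Hypothetical poll hook: returns true once the ACK has arrived. */
typedef bool (*poll_fn)(struct wait_head_sketch *wh);

/* Example hook that never sees an ACK. */
static bool poll_never(struct wait_head_sketch *wh)
{
	(void)wh;
	return false;
}

/* Example hook that sees a successful ACK on the first poll. */
static bool poll_ack(struct wait_head_sketch *wh)
{
	wh->acked = true;
	wh->ack_err = 0;
	return true;
}

/*
 * Poll up to max_polls times. Returns the ACK's own error code when an
 * ACK arrives and -ETIMEDOUT when none does, so callers see a single
 * error code for "timed out" regardless of the wait mechanism.
 */
static int wait_ack_polling(struct wait_head_sketch *wh, poll_fn poll,
			    int max_polls)
{
	while (max_polls-- > 0) {
		if (poll(wh))
			return wh->ack_err;
		/* A real driver would sleep between polls here. */
	}
	return -ETIMEDOUT;
}
```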
> +}
[ ... ]
> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_common/nbl_common.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_common/nbl_common.c
> --- /dev/null
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_common/nbl_common.c
[ ... ]
> +int nbl_common_create_wq(void)
> +{
> + wq_mgt = kzalloc_obj(*wq_mgt);
> + if (!wq_mgt)
> + return -ENOMEM;
> +
> + wq_mgt->ctrl_dev_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM | WQ_UNBOUND,
> + 0, "nbl_ctrldev_wq");
> + if (!wq_mgt->ctrl_dev_wq) {
> + pr_err("Failed to create workqueue nbl_ctrldev_wq\n");
> + goto alloc_ctrl_dev_wq_failed;
> + }
> +
> + return 0;
> +alloc_ctrl_dev_wq_failed:
> + kfree(wq_mgt);
> + return -ENOMEM;
> +}
When alloc_workqueue() fails, wq_mgt is freed but the static pointer
is not set to NULL. After this error path, wq_mgt becomes a dangling
pointer.
If nbl_common_queue_work() or nbl_common_destroy_wq() is called
afterwards, they dereference wq_mgt unconditionally:
nbl_common_queue_work():
queue_work(wq_mgt->ctrl_dev_wq, task);
nbl_common_destroy_wq():
destroy_workqueue(wq_mgt->ctrl_dev_wq);
kfree(wq_mgt);
Would adding "wq_mgt = NULL;" after kfree(wq_mgt) on the error path
prevent this?
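A userspace sketch of the pattern, for illustration only. It assumes malloc()/free() in place of the kernel allocators, and struct wq_mgt_sketch, create_wq() and destroy_wq() are hypothetical names. Note that clearing the pointer only turns a use-after-free into a NULL pointer, so the sketch also assumes the users check for NULL before dereferencing.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the workqueue management struct. */
struct wq_mgt_sketch {
	void *ctrl_dev_wq;
};

static struct wq_mgt_sketch *wq_mgt;

/* fail_alloc simulates the workqueue allocation failing. */
static int create_wq(int fail_alloc)
{
	wq_mgt = calloc(1, sizeof(*wq_mgt));
	if (!wq_mgt)
		return -ENOMEM;

	wq_mgt->ctrl_dev_wq = fail_alloc ? NULL : malloc(1);
	if (!wq_mgt->ctrl_dev_wq) {
		free(wq_mgt);
		wq_mgt = NULL;	/* do not leave a dangling pointer behind */
		return -ENOMEM;
	}
	return 0;
}

/* Safe to call after a failed create_wq() because of the NULL check. */
static void destroy_wq(void)
{
	if (!wq_mgt)
		return;
	free(wq_mgt->ctrl_dev_wq);
	free(wq_mgt);
	wq_mgt = NULL;
}
```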
[ ... ]
> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_common/nbl_common.h b/drivers/net/ethernet/nebula-matrix/nbl/nbl_common/nbl_common.h
> --- /dev/null
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_common/nbl_common.h
[ ... ]
> +/* it used for y_axis no necessay */
> +struct nbl_hash_entry_node {
> + struct hlist_node node;
> + void *key;
> + void *data;
> +};
This isn't a bug, but the comment "it used for y_axis no necessay"
appears garbled and "necessay" is misspelled. Could the comment be
rewritten to explain the purpose of this struct, or removed if it
is no longer relevant?
* Re: [PATCH v11 net-next 02/11] net/nebula-matrix: add our driver architecture
From: Simon Horman @ 2026-04-12 12:57 UTC
To: illusion.wang
Cc: Simon Horman, enelsonmoore, alvin.wang, linux-doc, hkallweit1,
edumazet, lukas.bulwahn, kuba, open list, pabeni, netdev,
sam.chen, corbet, vadim.fedorenko, lorenzo, dimon.zhao, skhan,
andrew+netdev
In-Reply-To: <20260408093739.56001-3-illusion.wang@nebula-matrix.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/nebula-matrix: add our driver architecture
This commit introduces a layered architecture for the nebula-matrix
driver, consisting of Device, Dispatch, Resource, Hardware, Channel, and
Common layers. It adds initialization and cleanup functions for each layer
with struct-based operation registration.
Signed-off-by: illusion.wang <illusion.wang@nebula-matrix.com>
> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_main.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_main.c
> index 10c3536b327b..db035136e09c 100644
> --- a/drivers/net/ethernet/nebula-matrix/nbl/nbl_main.c
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_main.c
[ ... ]
> +struct nbl_adapter *nbl_core_init(struct pci_dev *pdev,
> + struct nbl_init_param *param)
> +{
> + struct nbl_product_base_ops *product_base_ops;
> + struct nbl_common_info *common;
> + struct nbl_adapter *adapter;
> + int ret;
> +
> + adapter = devm_kzalloc(&pdev->dev, sizeof(*adapter), GFP_KERNEL);
> + if (!adapter)
> + return NULL;
> +
> + adapter->pdev = pdev;
> + common = &adapter->common;
> +
> + common->pdev = pdev;
> + common->dev = &pdev->dev;
> + common->has_ctrl = param->caps.has_ctrl;
> + common->has_net = param->caps.has_net;
> + common->pci_using_dac = param->pci_using_dac;
> + common->function = PCI_FUNC(pdev->devfn);
> + common->devid = PCI_SLOT(pdev->devfn);
> + common->bus = pdev->bus->number;
> + common->product_type = param->product_type;
> +
> + product_base_ops = nbl_core_setup_product_ops(adapter, param);
> +
> + /*
> + *every product's hw/chan/res layer has a great difference,
> + *so call their own init ops
> + */
> + ret = product_base_ops->hw_init(adapter);
> + if (ret)
> + goto hw_init_fail;
> +
> + ret = product_base_ops->chan_init(adapter);
> + if (ret)
> + goto chan_init_fail;
> +
> + ret = product_base_ops->res_init(adapter);
> + if (ret)
> + goto res_init_fail;
> +
> + ret = nbl_disp_init(adapter);
> + if (ret)
> + goto res_init_fail;
> +
> + ret = nbl_dev_init(adapter);
> + if (ret)
> + goto res_init_fail;
Does the error path correctly clean up all initialized layers?
The initialization order is:
1. hw_init
2. chan_init
3. res_init
4. nbl_disp_init
5. nbl_dev_init
When nbl_disp_init() or nbl_dev_init() fails, control jumps to the
'res_init_fail' label, which only tears down the channel and hardware
layers:
> +res_init_fail:
> + product_base_ops->chan_remove(adapter);
> +chan_init_fail:
> + product_base_ops->hw_remove(adapter);
However, res_init has already succeeded at this point and is never cleaned
up. Looking at nbl_product_base_ops, there is no res_remove callback
defined, and nbl_core_remove() also shows the same asymmetry:
> +void nbl_core_remove(struct nbl_adapter *adapter)
> +{
> + struct nbl_product_base_ops *product_base_ops;
> +
> + product_base_ops = adapter->product_base_ops;
> + nbl_dev_remove(adapter);
> + product_base_ops->chan_remove(adapter);
> + product_base_ops->hw_remove(adapter);
> +}
This function calls nbl_dev_remove(), chan_remove(), and hw_remove(), but
never tears down the resource layer or the dispatch layer.
Currently, all allocations in res_init (nbl_res_init_leonis) and
nbl_disp_init use devm_kzalloc, so resources are auto-freed. However, the
error path structure is concretely wrong: if res_init or disp_init ever
needs non-devm resources (as nbl_hw_init_leonis already uses ioremap and
pci_request_selected_regions), the missing cleanup will cause resource
leaks.
Should the init/remove paths be symmetric, with every successful init
having a corresponding remove in both the error cleanup path and
nbl_core_remove()?
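One way to picture a symmetric layout is the sketch below, written as plain userspace C. It is an illustration under assumptions, not the driver's code: layer_init(), layer_remove(), core_init() and core_remove() are hypothetical, and fail_at simply injects a failure at a chosen layer. Each init step has its own unwind label, and the remove path tears the layers down in reverse order.

```c
#include <errno.h>

enum layer { HW, CHAN, RES, DISP, DEV, NLAYERS };

static int layer_up[NLAYERS];

/* Bring one layer up, or fail if it is the layer selected by fail_at. */
static int layer_init(enum layer l, int fail_at)
{
	if ((int)l == fail_at)
		return -EIO;
	layer_up[l] = 1;
	return 0;
}

static void layer_remove(enum layer l)
{
	layer_up[l] = 0;
}

/* Number of layers currently initialized. */
static int layers_active(void)
{
	int i, n = 0;

	for (i = 0; i < NLAYERS; i++)
		n += layer_up[i];
	return n;
}

/* Pass fail_at = -1 for a run without injected failures. */
static int core_init(int fail_at)
{
	int ret;

	ret = layer_init(HW, fail_at);
	if (ret)
		goto hw_init_fail;
	ret = layer_init(CHAN, fail_at);
	if (ret)
		goto chan_init_fail;
	ret = layer_init(RES, fail_at);
	if (ret)
		goto res_init_fail;
	ret = layer_init(DISP, fail_at);
	if (ret)
		goto disp_init_fail;
	ret = layer_init(DEV, fail_at);
	if (ret)
		goto dev_init_fail;
	return 0;

	/* Each label undoes the step that succeeded just before it. */
dev_init_fail:
	layer_remove(DISP);
disp_init_fail:
	layer_remove(RES);
res_init_fail:
	layer_remove(CHAN);
chan_init_fail:
	layer_remove(HW);
hw_init_fail:
	return ret;
}

/* Mirror image of core_init(): tear down in reverse order. */
static void core_remove(void)
{
	layer_remove(DEV);
	layer_remove(DISP);
	layer_remove(RES);
	layer_remove(CHAN);
	layer_remove(HW);
}
```

With this shape, a failure at any layer leaves nothing initialized, and adding a non-devm resource to a layer later only requires touching that layer's remove step.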
* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Chris Maness @ 2026-04-12 12:56 UTC
To: hugh
Cc: Greg KH, Kuniyuki Iwashima, kuba, davem, edumazet, horms,
linux-hams, linux-kernel, netdev, pabeni, stable, workflows,
yizhe
In-Reply-To: <3cd91fbc-d3a9-431e-b915-58e851c7df9f@blemings.org>
Thanks for your work, Hugh.
-73 de Chris KQ6UP
On Sat, Apr 11, 2026 at 7:33 PM Hugh Blemings <hugh@blemings.org> wrote:
>
>
> On 11/4/2026 18:58, Greg KH wrote:
> > On Sat, Apr 11, 2026 at 05:24:17PM +1000, Hugh Blemings wrote:
> >> On 11/4/2026 15:50, Greg KH wrote:
> >>> On Sat, Apr 11, 2026 at 08:25:19AM +1000, Hugh Blemings wrote:
> >>>> On 11/4/2026 08:11, Kuniyuki Iwashima wrote:
> >>>>> From: Jakub Kicinski <kuba@kernel.org>
> >>>>> Date: Fri, 10 Apr 2026 14:54:48 -0700
> >>>>>> On Fri, 10 Apr 2026 14:30:42 -0700 Jakub Kicinski wrote:
> >>>>>>> On Fri, 10 Apr 2026 07:24:36 +0200 Greg Kroah-Hartman wrote:
> >>>>>>>> On Thu, Apr 09, 2026 at 08:32:35PM -0700, Jakub Kicinski wrote:
> >>>>>>>>> Or for simplicity we could also be testing against skb_headlen()
> >>>>>>>>> since we don't expect any legit non-linear frames here? Dunno.
> >>>>>>>> I'll be glad to change this either way, your call. Given that this is
> >>>>>>>> an obsolete protocol that seems to only be a target for drive-by fuzzers
> >>>>>>>> to attack, whatever the simplest thing to do to quiet them up I'll be
> >>>>>>>> glad to implement.
> >>>>>>>>
> >>>>>>>> Or can we just delete this stuff entirely? :)
> >>>>>>> Yes.
> >>>>>>>
> >>>>>>> My thinking is to delete hamradio, nfc, atm, caif.. [more to come]
> >>>>>>> Create GH repos which provide them as OOT modules.
> >>>>>>> Hopefully we can convince any existing users to switch to that.
> >>>>>>>
> >>>>>>> The only thing stopping me is the concern that this is just the softest
> >>>>>>> target and the LLMs will find something else to focus on which we can't
> >>>>>>> delete. I suspect any PCIe driver can be flooded with "aren't you
> >>>>>>> trusting the HW to provide valid responses here?" bullshit.
> >>>>>>>
> >>>>>>> But hey, let's try. I'll post a patch nuking all of hamradio later
> >>>>>>> today.
> >>>>>> Well, either we "expunge" this code to OOT repos, or we mark it
> >>>>>> as broken and tell everyone that we don't take security fixes
> >>>>>> for anything that depends on BROKEN. I'd personally rather expunge.
> >>>>> +1 for "expunge" to prevent LLM-based patch flood.
> >>>>>
> >>>>> IIRC, we did that recently for one driver only used by OpenWRT ?
> >>>>>
> >>>>>
> >>>> If the main concern here is ongoing maintenance of these Ham Radio related
> >>>> protocols/drivers, can we pause for a moment on anything as dramatic as
> >>>> removing from the tree entirely ?
> >>> Sure, but:
> >>>
> >>>> There is a good cohort of capable kernel folks that either are or were ham
> >>>> radio operators who I believe, upon realising that things have got to this
> >>>> point, will be happy to redouble efforts to ensure this code is maintained and
> >>>> tested to a satisfactory standard.
> >>> We need this code to be maintained, because as is being shown, there are
> >>> reported problems with it that will affect these devices/networks that
> >>> you all are using. So all we need is a maintainer for this to be able
> >>> to take reports that we get and fix things up as needed. I know you
> >>> have that experience, want to come back to kernel development, we've
> >>> missed you :)
> >> That's most kind Greg, thank you, have missed all you cool kids too :)
> >>
> >> More seriously though - I'd be up for doing it, but I think there may be
> >> others better placed than I who haven't yet realised we have this conundrum.
> >> I'm nudging a few folks offline on this front.
> > The main "conundrum" is that this protocol completely trusts the
> > hardware to give the kernel the "correct" data. So if you trust the
> > hardware to work properly, it will be fine, but as the fuzzing tools are
> > finding, if the data from the hardware modems is a bit out-of-spec,
> > "bad" things can happen.
> >
> > I don't know how well controlled the data is from these devices, if it's
> > just a "pass through" from what they get off the "wire" or if the
> > devices always ensure the protocol packets are sane before passing them
> > off to the kernel. That's going to be something you all with the
> > hardware is going to have to determine in order to keep this a working
> > system over time. Especially given that this is a wireless protocol
> > where you "have" to trust the remote end.
>
> Thanks for the thoughts Greg - and ya, I guess on balance I come back to
> being generally skeptical of both hardware and software to Do The Right
> Thing (TM)
>
> So bounds checking and the like seems prudent irrespective of whether
> the kernel is getting the data from real hardware, software modems etc.
>
> I've done some initial digging around that confirms my suspicion that
> this in kernel code remains quite widely used, if somewhat out of view.
> Accordingly I lean then towards working to get these various mitigations
> in place with some revised patches etc. as needed and into the main tree.
>
> Once this is done I think that'll give me a good sense of whether I or
> someone else is well positioned to keep the code maintained longer term
> and thus justify it remaining in tree or not.
>
> More to follow once I finish remembering this kernel thing!
>
> Cheers,
> Hugh
>
>
>
>
--
Thanks,
Chris Maness
* Re: [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
From: Russell King (Oracle) @ 2026-04-12 12:55 UTC
To: Biju Das
Cc: Andrew Lunn, biju.das.au, Heiner Kallweit, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Ovidiu Panait,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Geert Uytterhoeven, Prabhakar Mahadev Lad,
linux-renesas-soc@vger.kernel.org
In-Reply-To: <TY3PR01MB11346F78B929EA7F377BB5B4A86272@TY3PR01MB11346.jpnprd01.prod.outlook.com>
On Sun, Apr 12, 2026 at 12:05:06PM +0000, Biju Das wrote:
> Hi Russell King,
>
> > -----Original Message-----
> > From: Russell King <linux@armlinux.org.uk>
> > Sent: 11 April 2026 17:47
> > Subject: Re: [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
> >
> > On Sat, Apr 11, 2026 at 03:50:13PM +0200, Andrew Lunn wrote:
> > > > So, I question whether any of the functions in this driver actually
> > > > have a valid reason to take phydev->lock - looks to me like a not
> > > > very well written driver.
> > > >
> > > > In cases like this, I don't think we should make things more
> > > > difficult in the core just because we have a lockdep splat when that
> > > > can be avoided by killing off unnecessary locking.
> > >
> > > Agreed. This patchset should cleanup these locks.
> > >
> > > We also need to look at lan937x_dsp_workaround(). I also don't see
> > > what that mutex lock/unlock is protecting. Accessing bank registers
> > > need to be protected, so doing one additional access within that
> > > should not need additional protection.
> >
> > Looking at access_ereg(), shouldn't it be taking the MDIO bus lock and using the __phy_* accessors
> > anyway because it's writing various registers which determine what is being read via the
> > LAN87XX_EXT_REG_RD_DATA register or the value written via the LAN87XX_EXT_REG_WR_DATA register.
> >
> > Also, as it has access_ereg_modify_changed(), that entire sequence needs to take the MDIO bus lock to
> > safely do the read-modify-write.
> >
> > Then there's lan87xx_config_rgmii_delay() which is a large open coded read-modify-write for the
> > PHYACC_ATTR_BANK_MISC, LAN87XX_CTRL_1 register.
> >
> > To me, this looks like a racy driver, and it also looks like it's using the wrong lock to try and
> > protect hardware accesses.
>
> OK, will replace it with MDIO bus lock.
Remember that the phy_* accessors will take the MDIO bus lock, so will
need to be changed to their __phy_* counterparts.
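The locked versus unlocked accessor split can be sketched in userspace C as follows. This is only an illustration under assumptions: struct bus_sketch models the bus lock as a flag so that double locking trips an assert, and reg_read(), reg_read_unlocked(), reg_write_unlocked() and reg_modify_changed() are hypothetical names that mirror the phy_* / __phy_* naming split, not the kernel functions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bus model: a lock flag plus a small register file. */
struct bus_sketch {
	int locked;
	uint16_t regs[32];
};

static void bus_lock(struct bus_sketch *b)
{
	assert(!b->locked);	/* taking the lock twice would deadlock */
	b->locked = 1;
}

static void bus_unlock(struct bus_sketch *b)
{
	assert(b->locked);
	b->locked = 0;
}

/* Unlocked accessors: the caller must already hold the bus lock. */
static uint16_t reg_read_unlocked(struct bus_sketch *b, int reg)
{
	assert(b->locked);
	return b->regs[reg];
}

static void reg_write_unlocked(struct bus_sketch *b, int reg, uint16_t val)
{
	assert(b->locked);
	b->regs[reg] = val;
}

/* Locked accessor: takes the bus lock itself for a single access. */
static uint16_t reg_read(struct bus_sketch *b, int reg)
{
	uint16_t val;

	bus_lock(b);
	val = reg_read_unlocked(b, reg);
	bus_unlock(b);
	return val;
}

/*
 * Read-modify-write under one lock hold, using only the unlocked
 * accessors inside. Returns 1 if the register changed, 0 otherwise.
 */
static int reg_modify_changed(struct bus_sketch *b, int reg,
			      uint16_t mask, uint16_t set)
{
	uint16_t old, new;

	bus_lock(b);
	old = reg_read_unlocked(b, reg);
	new = (uint16_t)((old & ~mask) | set);
	if (new != old)
		reg_write_unlocked(b, reg, new);
	bus_unlock(b);
	return new != old;
}
```

Calling reg_read() from inside reg_modify_changed() would trip the assert in bus_lock(), which is the userspace analogue of the self-deadlock the advice above avoids.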
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH net 2/2] net: mana: Move current_speed debugfs file to mana_init_port()
From: Simon Horman @ 2026-04-12 12:52 UTC
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
shradhagupta, kees, kotaranov, yury.norov, linux-hyperv, netdev,
linux-kernel
In-Reply-To: <20260408081224.302308-3-ernis@linux.microsoft.com>
On Wed, Apr 08, 2026 at 01:12:20AM -0700, Erni Sri Satya Vennela wrote:
> Move the current_speed debugfs file creation from mana_probe_port() to
> mana_init_port(). The file was previously created only during initial
> probe, but mana_cleanup_port_context() removes the entire vPort debugfs
> directory during detach/attach cycles. Since mana_init_port() recreates
> the directory on re-attach, moving current_speed here ensures it survives
> these cycles.
>
> Fixes: 75cabb46935b ("net: mana: Add support for net_shaper_ops")
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* Re: [PATCH net 1/2] net: mana: Use pci_name() for debugfs directory naming
From: Simon Horman @ 2026-04-12 12:52 UTC (permalink / raw)
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
shradhagupta, kees, kotaranov, yury.norov, linux-hyperv, netdev,
linux-kernel
In-Reply-To: <20260408081224.302308-2-ernis@linux.microsoft.com>
On Wed, Apr 08, 2026 at 01:12:19AM -0700, Erni Sri Satya Vennela wrote:
> Use pci_name(pdev) for the per-device debugfs directory instead of
> hardcoded "0" for PFs and pci_slot_name(pdev->slot) for VFs. The
> previous approach had two issues:
>
> 1. pci_slot_name() dereferences pdev->slot, which can be NULL for VFs
> in environments like generic VFIO passthrough or nested KVM,
> causing a NULL pointer dereference.
>
> 2. Multiple PFs would all use "0", and VFs across different PCI
> domains or buses could share the same slot name, leading to
> -EEXIST errors from debugfs_create_dir().
>
> pci_name(pdev) returns the unique BDF address, is always valid, and is
> unique across the system.
>
> Fixes: 6607c17c6c5e ("net: mana: Enable debugfs files for MANA device")
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* Re: [PATCH net-next v6] net: mana: Expose hardware diagnostic info via debugfs
From: Simon Horman @ 2026-04-12 12:49 UTC (permalink / raw)
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, kotaranov, shradhagupta, shirazsaleem,
yury.norov, kees, ssengar, dipayanroy, gargaditya, linux-hyperv,
netdev, linux-kernel, linux-rdma
In-Reply-To: <20260408081555.302620-1-ernis@linux.microsoft.com>
On Wed, Apr 08, 2026 at 01:15:46AM -0700, Erni Sri Satya Vennela wrote:
> Add debugfs entries to expose hardware configuration and diagnostic
> information that aids in debugging driver initialization and runtime
> operations without adding noise to dmesg.
>
> The debugfs directory for each PCI device is named using pci_name()
> (the unique BDF address), and its creation and removal is integrated
> into mana_gd_setup() and mana_gd_cleanup_device() respectively, so
> that all callers (probe, remove, suspend, resume, shutdown) share a
> single code path.
>
> Device-level entries (under /sys/kernel/debug/mana/<BDF>/):
> - num_msix_usable, max_num_queues: Max resources from hardware
> - gdma_protocol_ver, pf_cap_flags1: VF version negotiation results
> - num_vports, bm_hostmode: Device configuration
>
> Per-vPort entries (under /sys/kernel/debug/mana/<BDF>/vportN/):
> - port_handle: Hardware vPort handle
> - max_sq, max_rq: Max queues from vPort config
> - indir_table_sz: Indirection table size
> - steer_rx, steer_rss, steer_update_tab, steer_cqe_coalescing:
> Last applied steering configuration parameters
>
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
> This patch depends on the following fixes submitted to net:
> - "net: mana: Use pci_name() for debugfs directory naming"
> - "net: mana: Move current_speed debugfs file to mana_init_port()"
> Conflict resolution may be needed when net merges into net-next.
Unfortunately this patch doesn't apply to net-next,
which is a requirement for our workflow.
--
pw-bot: changes-requested
^ permalink raw reply
* Re: [PATCH nf] netfilter: arp_tables: fix IEEE1394 ARP payload parsing in arp_packet_match()
From: Simon Horman @ 2026-04-12 12:47 UTC (permalink / raw)
To: Weiming Shi
Cc: Pablo Neira Ayuso, Florian Westphal, David S . Miller,
David Ahern, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Phil Sutter, netfilter-devel, coreteam, netdev, Xiang Mei
In-Reply-To: <20260408073515.79296-2-bestswngs@gmail.com>
On Wed, Apr 08, 2026 at 03:35:16PM +0800, Weiming Shi wrote:
> arp_packet_match() unconditionally parses the ARP payload assuming two
> hardware addresses are present (source and target). However,
> IPv4-over-IEEE1394 ARP (RFC 2734) omits the target hardware address
> field, and arp_hdr_len() already accounts for this by returning a
> shorter length for ARPHRD_IEEE1394 devices.
>
> As a result, on IEEE1394 interfaces arp_packet_match() advances past a
> nonexistent target hardware address and reads the wrong bytes for both
> the target device address comparison and the target IP address. This
> causes arptables rules to match against garbage data, leading to
> incorrect filtering decisions: packets that should be accepted may be
> dropped and vice versa.
>
> The ARP stack in net/ipv4/arp.c (arp_create and arp_process) already
> handles this correctly by skipping the target hardware address for
> ARPHRD_IEEE1394. Apply the same pattern to arp_packet_match().
>
> Fixes: 6752c8db8e0c ("firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection.")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> ---
> net/ipv4/netfilter/arp_tables.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
> index 1cdd9c28ab2da..4b2392bdcd0a6 100644
> --- a/net/ipv4/netfilter/arp_tables.c
> +++ b/net/ipv4/netfilter/arp_tables.c
> @@ -86,7 +86,7 @@ static inline int arp_packet_match(const struct arphdr *arphdr,
> const struct arpt_arp *arpinfo)
> {
> const char *arpptr = (char *)(arphdr + 1);
> - const char *src_devaddr, *tgt_devaddr;
> + const char *src_devaddr, *tgt_devaddr = NULL;
I think that it's more in keeping with Kernel code practices
to set tgt_devaddr conditionally.
> __be32 src_ipaddr, tgt_ipaddr;
> long ret;
>
> @@ -110,13 +110,23 @@ static inline int arp_packet_match(const struct arphdr *arphdr,
> arpptr += dev->addr_len;
> memcpy(&src_ipaddr, arpptr, sizeof(u32));
> arpptr += sizeof(u32);
> - tgt_devaddr = arpptr;
> - arpptr += dev->addr_len;
> + switch (dev->type) {
> +#if IS_ENABLED(CONFIG_FIREWIRE_NET)
> + case ARPHRD_IEEE1394:
> + break;
> +#endif
> + default:
> + tgt_devaddr = arpptr;
> + arpptr += dev->addr_len;
> + break;
> + }
While I acknowledge this isn't the approach taken in arp_hdr_len()
I think it would be nicer to use the following construction
which will give build coverage to all paths regardless of if
CONFIG_FIREWIRE_NET is set or not.
	if (IS_ENABLED(CONFIG_FIREWIRE_NET) && dev->type == ARPHRD_IEEE1394) {
		tgt_devaddr = NULL;
	} else {
		tgt_devaddr = arpptr;
		arpptr += dev->addr_len;
	}
Also, I would include a blank line before the if condition.
> memcpy(&tgt_ipaddr, arpptr, sizeof(u32));
>
> if (NF_INVF(arpinfo, ARPT_INV_SRCDEVADDR,
> arp_devaddr_compare(&arpinfo->src_devaddr, src_devaddr,
> - dev->addr_len)) ||
> + dev->addr_len)))
> + return 0;
> + if (tgt_devaddr &&
> NF_INVF(arpinfo, ARPT_INV_TGTDEVADDR,
> arp_devaddr_compare(&arpinfo->tgt_devaddr, tgt_devaddr,
> dev->addr_len)))
> --
> 2.43.0
>
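
The payload layout under discussion can be sketched in userspace as follows. This is an illustration under stated assumptions, not the netfilter code. The `toy_*` names and `TOY_FIREWIRE_ENABLED` (a stand-in for `IS_ENABLED(CONFIG_FIREWIRE_NET)`) are hypothetical. The point is that an ordinary `if` on a compile-time constant keeps both branches visible to the compiler, whereas `#if` hides one of them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_FIREWIRE_NET). */
#define TOY_FIREWIRE_ENABLED 1

struct toy_arp_fields {
	const uint8_t *src_hw;	/* source hardware address */
	const uint8_t *tgt_hw;	/* NULL when the layout omits it */
	uint32_t src_ip;	/* raw bytes, network byte order */
	uint32_t tgt_ip;	/* raw bytes, network byte order */
};

/*
 * Walk an ARP payload laid out as:
 *   src hw addr | src IP | [tgt hw addr] | tgt IP
 * The target hardware address is skipped for the IEEE 1394 layout.
 */
static void toy_arp_parse(const uint8_t *payload, size_t addr_len,
			  int is_ieee1394, struct toy_arp_fields *out)
{
	const uint8_t *p = payload;

	assert(payload != NULL && out != NULL);

	out->src_hw = p;
	p += addr_len;
	memcpy(&out->src_ip, p, sizeof(uint32_t));
	p += sizeof(uint32_t);

	if (TOY_FIREWIRE_ENABLED && is_ieee1394) {
		out->tgt_hw = NULL;
	} else {
		out->tgt_hw = p;
		p += addr_len;
	}

	memcpy(&out->tgt_ip, p, sizeof(uint32_t));
}
```

A caller then compares the target hardware address only when `tgt_hw` is non-NULL, which corresponds to the `tgt_devaddr &&` guard in the quoted patch.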
^ permalink raw reply
* [PATCH] RDS: Fix memory leak in rds_rdma_extra_size()
From: Xiaobo Liu @ 2026-04-12 12:44 UTC (permalink / raw)
To: Allison Henderson, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, netdev, linux-rdma, rds-devel, linux-kernel,
Xiaobo Liu
Free iov->iov when copy_from_user() or page count validation fails in rds_rdma_extra_size().
This preserves the existing success path and avoids leaking the allocated iovec array on error.
---
net/rds/rdma.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index aa6465dc7..91a20c1e2 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -560,6 +560,7 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
struct rds_iovec *vec;
struct rds_iovec __user *local_vec;
int tot_pages = 0;
+ int ret = 0;
unsigned int nr_pages;
unsigned int i;
@@ -578,16 +579,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
vec = &iov->iov[0];
if (copy_from_user(vec, local_vec, args->nr_local *
- sizeof(struct rds_iovec)))
- return -EFAULT;
+ sizeof(struct rds_iovec))) {
+ ret = -EFAULT;
+ goto out;
+ }
iov->len = args->nr_local;
/* figure out the number of pages in the vector */
for (i = 0; i < args->nr_local; i++, vec++) {
nr_pages = rds_pages_in_vec(vec);
- if (nr_pages == 0)
- return -EINVAL;
+ if (nr_pages == 0) {
+ ret = -EINVAL;
+ goto out;
+ }
tot_pages += nr_pages;
@@ -595,11 +600,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
* nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1,
* so tot_pages cannot overflow without first going negative.
*/
- if (tot_pages < 0)
- return -EINVAL;
+ if (tot_pages < 0) {
+ ret = -EINVAL;
+ goto out;
+ }
}
- return tot_pages * sizeof(struct scatterlist);
+ ret = tot_pages * sizeof(struct scatterlist);
+
+out:
+ if (ret < 0) {
+ kfree(iov->iov);
+ iov->iov = NULL;
+ }
+ return ret;
}
/*
--
2.34.1
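
For readers unfamiliar with the single-exit cleanup idiom the patch uses, a minimal userspace sketch follows. It is not the RDS code: `toy_iov`, `toy_extra_size()` and the plain page-count array are hypothetical, and `malloc()`/`free()` stand in for the kernel allocator. Every failure after the allocation jumps to one label, and that label releases the buffer only on the error path:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container that owns a heap-allocated vector. */
struct toy_iov {
	unsigned int *vec;
	size_t len;
};

/*
 * Copy n per-entry page counts into iov and return their sum.
 * On any failure after the allocation, iov->vec is freed and set to NULL.
 */
static int toy_extra_size(struct toy_iov *iov, const unsigned int *pages,
			  size_t n)
{
	int tot_pages = 0;
	int ret;
	size_t i;

	assert(iov != NULL);
	iov->vec = NULL;
	if (n == 0)
		return -EINVAL;

	iov->vec = malloc(n * sizeof(*iov->vec));
	if (!iov->vec)
		return -ENOMEM;

	memcpy(iov->vec, pages, n * sizeof(*iov->vec));
	iov->len = n;

	for (i = 0; i < n; i++) {
		unsigned int nr = iov->vec[i];

		/* Reject empty entries and sums that would overflow int. */
		if (nr == 0 || nr > (unsigned int)(INT_MAX - tot_pages)) {
			ret = -EINVAL;
			goto out;
		}
		tot_pages += (int)nr;
	}

	ret = tot_pages;

out:
	if (ret < 0) {
		free(iov->vec);
		iov->vec = NULL;
	}
	return ret;
}
```

Resetting the pointer to NULL after freeing means a later cleanup pass by the caller cannot free it a second time.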
^ permalink raw reply related
* RE: [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
From: Biju Das @ 2026-04-12 12:05 UTC (permalink / raw)
To: Russell King, Andrew Lunn
Cc: biju.das.au, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Ovidiu Panait,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Geert Uytterhoeven, Prabhakar Mahadev Lad,
linux-renesas-soc@vger.kernel.org
In-Reply-To: <adp6-wElGOOijZRG@shell.armlinux.org.uk>
Hi Russell King,
> -----Original Message-----
> From: Russell King <linux@armlinux.org.uk>
> Sent: 11 April 2026 17:47
> Subject: Re: [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
>
> On Sat, Apr 11, 2026 at 03:50:13PM +0200, Andrew Lunn wrote:
> > > So, I question whether any of the functions in this driver actually
> > > have a valid reason to take phydev->lock - looks to me like a not
> > > very well written driver.
> > >
> > > In cases like this, I don't think we should make things more
> > > difficult in the core just because we have a lockdep splat when that
> > > can be avoided by killing off unnecessary locking.
> >
> > Agreed. This patchset should cleanup these locks.
> >
> > We also need to look at lan937x_dsp_workaround(). I also don't see
> > what that mutex lock/unlock is protecting. Accessing bank registers
> > need to be protected, so doing one additional access within that
> > should not need additional protection.
>
> Looking at access_ereg(), shouldn't it be taking the MDIO bus lock and using the __phy_* accessors
> anyway because it's writing various registers which determine what is being read via the
> LAN87XX_EXT_REG_RD_DATA register or the value written via the LAN87XX_EXT_REG_WR_DATA register.
>
> Also, as it has access_ereg_modify_changed(), that entire sequence needs to take the MDIO bus lock to
> safely do the read-modify-write.
>
> Then there's lan87xx_config_rgmii_delay() which is a large open coded read-modify-write for the
> PHYACC_ATTR_BANK_MISC, LAN87XX_CTRL_1 register.
>
> To me, this looks like a racy driver, and it also looks like it's using the wrong lock to try and
> protect hardware accesses.
OK, will replace it with MDIO bus lock.
Cheers,
Biju
^ permalink raw reply
* RE: [PATCH next-next] net: phy: mscc: Drop redundant phydev->lock
From: Biju Das @ 2026-04-12 12:04 UTC (permalink / raw)
To: Andrew Lunn, biju.das.au
Cc: Heiner Kallweit, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Russell King, Prabhakar Mahadev Lad, Horatiu Vultur,
Vladimir Oltean, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Geert Uytterhoeven,
linux-renesas-soc@vger.kernel.org
In-Reply-To: <bcacb8ae-f35d-49fc-91a8-995387dc07ca@lunn.ch>
Hi Andrew,
> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 11 April 2026 21:45
> Subject: Re: [PATCH next-next] net: phy: mscc: Drop redundant phydev->lock
>
> On Sat, Apr 11, 2026 at 04:49:56PM +0100, Biju wrote:
> > From: Biju Das <biju.das.jz@bp.renesas.com>
> >
> > Remove manual mutex_lock/unlock(&phydev->lock) calls from several
> > functions in the MSCC PHY driver, as the PHY core already holds this
> > lock when invoking these callbacks.
> >
> > The affected functions are:
> >
> > vsc85xx_edge_rate_cntl_set() — lock/unlock around phy_modify_paged()
> > vsc85xx_mac_if_set() — lock/unlock with a goto out_unlock error path
> > vsc8531_pre_init_seq_set() — lock/unlock around
> > phy_select/restore_page()
> > vsc85xx_eee_init_seq_set() — lock/unlock around
> > phy_select/restore_page()
> >
> > Along with dropping the locks, error-path labels are renamed from
> > out_unlock to err or restore_oldpage to better reflect their purpose
> > now that no unlocking is performed. In vsc8531_pre_init_seq_set() and
> > vsc85xx_eee_init_seq_set(), the redundant intermediate assignment of
> > oldpage before returning is also eliminated.
> >
> > No functional change intended.
>
> This patch needs to be sent as part of the patchset with your other change. The order they get merged
> matters, otherwise a git bisect could land on a deadlock.
OK.
Cheers,
Biju
^ permalink raw reply
* Re: [RFC net-next v5 0/3] Add RSS and LRO support
From: Frank Wunderlich @ 2026-04-12 11:57 UTC (permalink / raw)
To: linux, nbd, sean.wang, lorenzo, andrew+netdev, davem, edumazet,
kuba, pabeni, matthias.bgg, angelogioacchino.delregno, linux
Cc: daniel, netdev, linux-kernel, linux-arm-kernel, linux-mediatek
In-Reply-To: <20251219151219.77115-1-linux@fw-web.de>
Hi,
Some time has passed without a single comment, so I am sending a friendly reminder ;)
regards Frank
> Sent: Friday, 19 December 2025 at 16:12
> From: "Frank Wunderlich" <linux@fw-web.de>
> To: "Felix Fietkau" <nbd@nbd.name>, "Sean Wang" <sean.wang@mediatek.com>, "Lorenzo Bianconi" <lorenzo@kernel.org>, "Andrew Lunn" <andrew+netdev@lunn.ch>, "David S. Miller" <davem@davemloft.net>, "Eric Dumazet" <edumazet@google.com>, "Jakub Kicinski" <kuba@kernel.org>, "Paolo Abeni" <pabeni@redhat.com>, "Matthias Brugger" <matthias.bgg@gmail.com>, "AngeloGioacchino Del Regno" <angelogioacchino.delregno@collabora.com>, "Russell King" <linux@armlinux.org.uk>
> CC: "Frank Wunderlich" <frank-w@public-files.de>, "Daniel Golle" <daniel@makrotopia.org>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org
> Subject: [RFC net-next v5 0/3] Add RSS and LRO support
>
> From: Frank Wunderlich <frank-w@public-files.de>
>
> This series adds RSS and LRO hardware acceleration for terminating
> traffic on MT798x.
>
> It is currently only for discussion to get the upported SDK driver
> changes in a good shape.
>
> patches are upported from mtk SDK:
> - https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/master/files/target/linux/mediatek/patches-6.12/999-eth-08-mtk_eth_soc-add-register-definitions-for-rss-lro-reg.patch
> - https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/master/files/target/linux/mediatek/patches-6.12/999-eth-09-mtk_eth_soc-add-rss-support.patch
> - https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/master/files/target/linux/mediatek/patches-6.12/999-eth-10-mtk_eth_soc-add-hw-lro-support.patch
> with additional fixes
>
> changes:
> v5:
> - fix too long lines after macro changes reported by checkpatch
>
> v4:
> - drop unrelated file
> - rss-changes suggested by andrew
> - fix MTK_HW_LRO_RING_NUM macro (add eth)
> - fix MTK_LRO_CTRL_DW[123]_CFG (add reg_map param)
> - fix MTK_RX_DONE_INT (add eth param)
> - fix lro reverse christmas tree and LRO params suggested by andrew
> - drop mtk_hwlro_stats_ebl and unused IS_HW_LRO_RING (only used in
> proprietary debugfs)
>
> v3:
> - re-added the change dropped in v2 because it was a fix
> for getting RSS working on mt7986
> - changes requested by jakub
> - reworked cover letter (dropped instructions for configuration)
> - name all PDMA-IRQ the same way
> - retested on
> - BPI-R3/mt7986 (RSS needs to be enabled)
> - BPI-R4/mt7988
> - BPI-R64/mt7622 and BPI-R2/mt7623 for not breaking network functionality
>
> v2:
> - drop wrong change (MTK_CDMP_IG_CTRL is only netsys v1)
> - Fix immutable string IRQ setup (thx to Emilia Schotte)
> - drop links to 6.6 patches/commits in sdk in comments
>
> Mason Chang (3):
> net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO
> net: ethernet: mtk_eth_soc: Add RSS support
> net: ethernet: mtk_eth_soc: Add LRO support
>
> drivers/net/ethernet/mediatek/mtk_eth_soc.c | 812 ++++++++++++++++----
> drivers/net/ethernet/mediatek/mtk_eth_soc.h | 173 +++--
> 2 files changed, 778 insertions(+), 207 deletions(-)
>
> --
> 2.43.0
^ permalink raw reply
* Re: [PATCH net-next v2] r8169: Use napi_schedule_irqoff()
From: Heiner Kallweit @ 2026-04-12 11:30 UTC (permalink / raw)
To: Matt Vollrath, netdev; +Cc: edumazet, pabeni, kuba, andrew+netdev, nic_swsd
In-Reply-To: <20260412014031.525061-1-tactii@gmail.com>
On 12.04.2026 03:40, Matt Vollrath wrote:
> napi_schedule() masks hard interrupts while doing its work, which is
> redundant when called from an interrupt handler where hard interrupts
> are already masked. Use napi_schedule_irqoff() instead to bypass this
> redundant masking. This is an optimization.
>
> Tested on a Lenovo RTL8168h/8111h.
>
> Signed-off-by: Matt Vollrath <tactii@gmail.com>
> ---
> drivers/net/ethernet/realtek/r8169_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 791277e750ba..4c0ad0de3410 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4873,7 +4873,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> phy_mac_interrupt(tp->phydev);
>
> rtl_irq_disable(tp);
> - napi_schedule(&tp->napi);
> + napi_schedule_irqoff(&tp->napi);
> out:
> rtl_ack_events(tp, status);
>
Not using napi_schedule_irqoff() here is intentional,
see 2734a24e6e5d18522fbf599135c59b82ec9b2c9e.
^ permalink raw reply
* [PATCH ipsec-next v7 00/14] xfrm: XFRM_MSG_MIGRATE_STATE new netlink message
From: Antony Antony @ 2026-04-12 11:13 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
The current XFRM_MSG_MIGRATE interface is tightly coupled to policy and
SA migration, and it lacks the information required to reliably migrate
individual SAs. This makes it unsuitable for IKEv2 deployments,
dual-stack setups (IPv4/IPv6), and scenarios where policies are managed
externally (e.g., by daemons other than the IKE daemon).
Mandatory SA selector list
The current API requires a non-empty SA selector list, which does not
reflect the IKEv2 use case.
A single Child SA may correspond to multiple policies,
and SA discovery already occurs via address and reqid matching. With
dual-stack Child SAs this leads to excessive churn: the current method
would have to be called up to six times (in/out/fwd × v4/v6) per SA,
while the new method only requires two calls.
Selectors lack SPI (and marks)
XFRM_MSG_MIGRATE cannot uniquely identify an SA when multiple SAs share
the same policies (per-CPU SAs, SELinux label-based SAs, etc.). Without
the SPI, the kernel may update the wrong SA instance.
Reqid cannot be changed
Some implementations allocate reqids based on traffic selectors. In
host-to-host or selector-changing scenarios, the reqid must change,
which the current API cannot express.
Because strongSwan and other implementations manage policies
independently of the kernel, an interface that updates only a specific
SA - with complete and unambiguous identification - is required.
The SA selector, x->sel, cannot be changed either, which matters especially in transport mode.
XFRM_MSG_MIGRATE_STATE provides that interface. It supports migration
of a single SA identified via xfrm_usersa_id (including SPI), covering
reqid updates, address changes, and other SA-specific parameters; this
patch set also fixes encap removal. It avoids the structural limitations
of XFRM_MSG_MIGRATE and provides a simpler, extensible mechanism for
precise per-SA migration without involving policies.
This method also allows migrating SA selectors, as typically used with
host-to-host setups in transport mode.
New migration steps: first install a block policy, remove the old policy,
call XFRM_MSG_MIGRATE_STATE for each state, then re-install the
policies and remove the block policy.
If the target SA tuple (daddr, SPI, proto, family) is already
occupied, the operation returns -EEXIST. In this case the original
SA is not preserved. Userspace must handle -EEXIST by
re-establishing the SA at the IKE level and managing policies.
---
v6->v7: - add SA selector migration
- fixes to commit messages
- white space removal
Link to v6: https://lore.kernel.org/r/migrate-state-v6-0-9df9764ddb9e@secunet.com
v5->v6: - add mark to look up SA.
- restrict netlink attributes in new method
- address review feedback from Sabrina
- add new patch to fix existing inter-family address comparison
- add extack xfrm_state_init()
- Feedback from Yan : omit-to-inherit add migrating marks
- Drop missing __rcu annotation on nlsk, Sabrina has a better patch
Link to v5: https://lore.kernel.org/all/cover.1769509130.git.antony.antony@secunet.com/
v4->v5: add synchronize after migrate and delete it inside a lock
- split xfrm_state_migrate into create and install functions
Link to v4: https://lore.kernel.org/all/cover.1768811736.git.antony.antony@secunet.com/
v3->v4: add patch to fix pre-existing missing __rcu annotation on nlsk
v2->v3: - fix commit message formatting
v1->v2: dropped 6/6. That check is already there where the func is called
- merged patch 4/6 and 5/6, to fix use uninitialized value
- fix commit messages
---
Antony Antony (14):
xfrm: remove redundant assignments
xfrm: add extack to xfrm_init_state
xfrm: allow migration from UDP encapsulated to non-encapsulated ESP
xfrm: fix NAT-related field inheritance in SA migration
xfrm: rename reqid in xfrm_migrate
xfrm: split xfrm_state_migrate into create and install functions
xfrm: check family before comparing addresses in migrate
xfrm: add state synchronization after migration
xfrm: add error messages to state migration
xfrm: move encap and xuo into struct xfrm_migrate
xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper
xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration
xfrm: restrict netlink attributes for XFRM_MSG_MIGRATE_STATE
xfrm: add documentation for XFRM_MSG_MIGRATE_STATE
Documentation/networking/xfrm/index.rst | 1 +
.../networking/xfrm/xfrm_migrate_state.rst | 230 ++++++++++++++
include/net/xfrm.h | 78 ++++-
include/uapi/linux/xfrm.h | 21 ++
net/ipv4/ipcomp.c | 2 +-
net/ipv6/ipcomp6.c | 2 +-
net/key/af_key.c | 12 +-
net/xfrm/xfrm_device.c | 2 +-
net/xfrm/xfrm_policy.c | 27 +-
net/xfrm/xfrm_state.c | 144 +++++----
net/xfrm/xfrm_user.c | 344 ++++++++++++++++++++-
security/selinux/nlmsgtab.c | 3 +-
12 files changed, 769 insertions(+), 97 deletions(-)
---
base-commit: be14d13625c9b070c33c423026b598ed65695225
change-id: migrate-state-063ee0342680
Best regards,
--
Antony Antony <antony.antony@secunet.com>
^ permalink raw reply
* [PATCH ipsec-next v7 14/14] xfrm: add documentation for XFRM_MSG_MIGRATE_STATE
From: Antony Antony @ 2026-04-12 11:16 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Add documentation for the new XFRM_MSG_MIGRATE_STATE netlink message,
which migrates a single SA identified by SPI and mark without involving
policies.
The document covers the motivation and design differences from the
existing XFRM_MSG_MIGRATE, the SA lookup mechanism, supported attributes
with their omit-to-inherit semantics, and usage examples.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v6->v7: update docs to reflect the flags
v5->v6: added this patch
---
Documentation/networking/xfrm/index.rst | 1 +
.../networking/xfrm/xfrm_migrate_state.rst | 230 +++++++++++++++++++++
2 files changed, 231 insertions(+)
diff --git a/Documentation/networking/xfrm/index.rst b/Documentation/networking/xfrm/index.rst
index 7d866da836fe..90191848f8db 100644
--- a/Documentation/networking/xfrm/index.rst
+++ b/Documentation/networking/xfrm/index.rst
@@ -9,5 +9,6 @@ XFRM Framework
xfrm_device
xfrm_proc
+ xfrm_migrate_state
xfrm_sync
xfrm_sysctl
diff --git a/Documentation/networking/xfrm/xfrm_migrate_state.rst b/Documentation/networking/xfrm/xfrm_migrate_state.rst
new file mode 100644
index 000000000000..1e0d77f0e043
--- /dev/null
+++ b/Documentation/networking/xfrm/xfrm_migrate_state.rst
@@ -0,0 +1,230 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+XFRM SA Migrate State
+=====================
+
+Overview
+========
+
+``XFRM_MSG_MIGRATE_STATE`` migrates a single SA, looked up using SPI and
+mark, without involving policies. Unlike ``XFRM_MSG_MIGRATE``, which couples
+SA and policy migration and allows migrating multiple SAs in one call, this
+interface identifies the SA unambiguously via SPI and supports changing
+the reqid, addresses, encapsulation, selector, and offload.
+
+Because IKE daemons such as *wan manage policies independently of
+the kernel, this interface allows precise per-SA migration without
+requiring policy involvement. Optional XFRM attributes follow an
+omit-to-inherit model: omitting an attribute preserves the value from
+the old SA. Hardware offload is an exception. It is inherited by default
+but can be disabled with the ``XFRM_MIGRATE_STATE_NO_OFFLOAD``
+flag or set to a new offload configuration with the
+``XFRMA_OFFLOAD_DEV`` attribute.
+
+SA Identification
+=================
+
+The struct is defined in ``include/uapi/linux/xfrm.h``. The SA is looked
+up using ``xfrm_state_lookup()`` with ``id.spi``,
+``id.daddr``, ``id.proto``, ``id.family``, and
+``old_mark.v & old_mark.m`` as the mark key::
+
+ struct xfrm_user_migrate_state {
+ struct xfrm_usersa_id id; /* spi, daddr, proto, family */
+ xfrm_address_t new_daddr;
+ xfrm_address_t new_saddr;
+ struct xfrm_mark old_mark; /* SA lookup: key = v & m */
+ struct xfrm_selector new_sel; /* new selector (see Flags) */
+ __u32 new_reqid;
+ __u32 flags; /* XFRM_MIGRATE_STATE_* */
+ __u16 new_family;
+ __u16 reserved;
+ };
+
+Supported Attributes
+====================
+
+The following fields in ``xfrm_user_migrate_state`` are always explicit
+and are not inherited from the existing SA. Passing zero is not equivalent
+to "keep unchanged" — zero is used as-is:
+
+- ``new_daddr`` - new destination address
+- ``new_saddr`` - new source address
+- ``new_family`` - new address family
+- ``new_reqid`` - new reqid (0 = no reqid)
+- ``new_sel`` - new selector; used when ``XFRM_MIGRATE_STATE_UPDATE_SEL`` is
+ not set (see `Flags`_ below)
+- ``flags`` - bitmask of ``XFRM_MIGRATE_STATE_*`` flags (see `Flags`_ below)
+
+The following netlink attributes are also accepted. Omitting an attribute
+inherits the value from the existing SA (omit-to-inherit).
+
+.. list-table::
+ :widths: 30 70
+ :header-rows: 1
+
+ * - Attribute
+ - Description
+ * - ``XFRMA_MARK``
+ - Mark on the migrated SA (``struct xfrm_mark``). Absent inherits
+ ``old_mark``. To use no mark on the new SA, send ``XFRMA_MARK``
+ with ``{0, 0}``.
+ * - ``XFRMA_ENCAP``
+ - UDP encapsulation template; only ``UDP_ENCAP_ESPINUDP`` is supported.
+ Set ``encap_type=0`` to remove encap.
+ * - ``XFRMA_OFFLOAD_DEV``
+ - Hardware offload configuration (``struct xfrm_user_offload``). Absent
+ copies offload from the existing SA. When
+ ``XFRM_MIGRATE_STATE_NO_OFFLOAD`` is set in ``flags``, the new SA has
+ no offload; this flag is mutually exclusive with ``XFRMA_OFFLOAD_DEV``
+ and sending both returns ``-EINVAL``.
+ * - ``XFRMA_SET_MARK``
+ - Output mark on the migrated SA; pair with ``XFRMA_SET_MARK_MASK``.
+ Send 0 to clear.
+ * - ``XFRMA_NAT_KEEPALIVE_INTERVAL``
+ - NAT keepalive interval in seconds. Requires encap. Send 0 to clear.
+ Automatically cleared when encap is removed; setting a non-zero
+ value without encap returns ``-EINVAL``.
+ * - ``XFRMA_MTIMER_THRESH``
+ - Mapping maxage threshold. Requires encap. Send 0 to clear.
+ Automatically cleared when encap is removed; setting a non-zero
+ value without encap returns ``-EINVAL``.
+
+The following SA properties are immutable and cannot be changed via
+``XFRM_MSG_MIGRATE_STATE``: algorithms (``XFRMA_ALG_*``), replay state,
+direction (``XFRMA_SA_DIR``), and security context (``XFRMA_SEC_CTX``).
+
+Flags
+=====
+
+The ``flags`` field in ``xfrm_user_migrate_state`` controls optional
+migration behaviour. Unknown flag bits are rejected with ``-EINVAL``.
+
+.. list-table::
+ :widths: 40 60
+ :header-rows: 1
+
+ * - Flag
+ - Description
+ * - ``XFRM_MIGRATE_STATE_NO_OFFLOAD``
+ - When set, the new SA has no hardware offload even when
+ ``XFRMA_OFFLOAD_DEV`` is absent. Without this flag, omitting
+ ``XFRMA_OFFLOAD_DEV`` copies the existing offload to the new SA.
+ Mutually exclusive with ``XFRMA_OFFLOAD_DEV``; sending both
+ returns ``-EINVAL``.
+ * - ``XFRM_MIGRATE_STATE_UPDATE_SEL``
+ - When set, the kernel validates that the existing SA selector is a
+ single-host entry matching the SA addresses (``prefixlen_s ==
+ prefixlen_d`` equal to 32 for IPv4 or 128 for IPv6, and addresses
+ matching ``id.daddr`` and ``props.saddr``). If the check passes,
+ the new selector is derived from ``new_daddr`` and ``new_saddr``
+ with the single-host mask for ``new_family``. A mismatch returns
+ ``-EINVAL``. When this flag is not set, ``new_sel`` is used as-is
+ for the migrated SA.
+
+Migration Steps
+===============
+
+#. Install a block policy to drop traffic on the affected selector.
+#. Remove the old policy.
+#. Call ``XFRM_MSG_MIGRATE_STATE`` for each SA.
+#. Reinstall the policies.
+#. Remove the block policy.
+
+Block Policy and IV Safety
+--------------------------
+
+Installing a block policy before migration is required to prevent
+traffic leaks and IV reuse.
+
+AES-GCM IV uniqueness is critical: reusing a (key, IV) pair allows
+an attacker to recover the authentication subkey and forge
+authentication tags, breaking both confidentiality and integrity.
+
+``XFRM_MSG_MIGRATE_STATE`` atomically copies the sequence number and
+replay window from the old SA to the new SA and deletes the old SA.
+The block policy ensures no outgoing packets are sent in the migration
+window, preventing IV reuse under the same key.
+
+Feature Detection
+=================
+
+Userspace can probe for kernel support by sending a minimal
+``XFRM_MSG_MIGRATE_STATE`` message with a non-existent SPI:
+
+- ``-ENOPROTOOPT``: not supported (``CONFIG_XFRM_MIGRATE`` not enabled)
+- any other error: supported
+
+Userspace Notification on Success
+=================================
+
+On successful migration the kernel multicasts an
+``XFRM_MSG_MIGRATE_STATE`` message to the ``XFRMNLGRP_MIGRATE`` group.
+The fixed header is ``struct xfrm_user_migrate_state`` copied from the
+request, followed by the same set of netlink attributes that are
+accepted as input, with the differences noted below.
+
+Differences from the request
+-----------------------------
+
+.. list-table::
+ :widths: 25 75
+ :header-rows: 1
+
+ * - Field / Attribute
+ - Difference
+ * - ``new_sel``
+ - Contains the actual selector of the newly installed SA, not the
+ ``new_sel`` from the request. When
+ ``XFRM_MIGRATE_STATE_UPDATE_SEL`` is set the kernel derives the
+ selector from ``new_daddr`` / ``new_saddr``; the caller's
+ ``new_sel`` field is ignored in that case. The notification
+ always carries the real selector of the new SA.
+ * - ``XFRMA_SA_DIR``
+ - Present in the notification (set from the direction of the new
+ SA) but **not accepted as input** — direction is immutable.
+ * - ``flags``
+ - Echoed back as-is. ``XFRM_MIGRATE_STATE_NO_OFFLOAD`` and
+ ``XFRM_MIGRATE_STATE_UPDATE_SEL`` describe the request that was
+ made, not a property of the resulting SA.
+
+Attributes in the notification
+-------------------------------
+
+.. list-table::
+ :widths: 30 70
+ :header-rows: 1
+
+ * - Attribute
+ - Description
+ * - ``XFRMA_ENCAP``
+ - UDP encapsulation template, if configured on the new SA.
+ * - ``XFRMA_OFFLOAD_DEV``
+ - Hardware offload configuration, if active on the new SA.
+ * - ``XFRMA_MARK``
+ - Mark on the new SA, if set.
+ * - ``XFRMA_SET_MARK``
+ - Output mark on the new SA, if set.
+ * - ``XFRMA_SET_MARK_MASK``
+ - Output mark mask, present together with ``XFRMA_SET_MARK``.
+ * - ``XFRMA_MTIMER_THRESH``
+ - Mapping maxage threshold, if non-zero.
+ * - ``XFRMA_NAT_KEEPALIVE_INTERVAL``
+ - NAT keepalive interval, if non-zero.
+ * - ``XFRMA_SA_DIR``
+ - Direction of the new SA.
+
+Error Handling
+==============
+
+If the target SA tuple (daddr, SPI, proto, family) is occupied by an existing
+unrelated SA, the operation returns ``-EEXIST``. In this case both the old and
+the new SA are gone. The old SA cannot be restored, as doing so would risk
+duplicate sequence numbers and IV reuse, which must not occur. Userspace should
+handle ``-EEXIST``, for example by re-establishing the SA at the IKE level.
+
+If the multicast notification (``XFRMNLGRP_MIGRATE``) fails to send,
+the migration itself has already completed successfully and the new SA
+is installed. The operation returns success (0) with an extack warning,
+but listeners will not receive the migration event.
--
2.47.3
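The Feature Detection section added by this documentation patch maps the probe result onto "supported" or "not supported". A minimal sketch of that mapping follows, assuming the caller has already extracted the negative errno from the netlink ACK. The helper name `migrate_state_supported` is hypothetical and not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the patch): interpret the errno returned
 * for a probe XFRM_MSG_MIGRATE_STATE request sent with a non-existent SPI.
 * Per the documentation, -ENOPROTOOPT means CONFIG_XFRM_MIGRATE is not
 * enabled; any other error is treated as "supported".
 */
bool migrate_state_supported(int probe_err)
{
	return probe_err != -ENOPROTOOPT;
}

/* Example: -ESRCH is what the handler returns when the SA lookup fails. */
void migrate_state_probe_example(void)
{
	assert(!migrate_state_supported(-ENOPROTOOPT));
	assert(migrate_state_supported(-ESRCH));
}
```

The sketch only encodes the rule as the documentation states it; it does not send the probe itself.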
* [PATCH ipsec-next v7 13/14] xfrm: restrict netlink attributes for XFRM_MSG_MIGRATE_STATE
From: Antony Antony @ 2026-04-12 11:16 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Accept only the XFRMA attributes used by this method; reject the rest.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: added this patch
---
net/xfrm/xfrm_user.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 46e506548122..441e6b1fed10 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3721,6 +3721,30 @@ static int xfrm_reject_unused_attr(int type, struct nlattr **attrs,
}
}
+ if (type == XFRM_MSG_MIGRATE_STATE) {
+ int i;
+
+ for (i = 0; i <= XFRMA_MAX; i++) {
+ if (!attrs[i])
+ continue;
+
+ switch (i) {
+ case XFRMA_MARK:
+ case XFRMA_ENCAP:
+ case XFRMA_OFFLOAD_DEV:
+ case XFRMA_SET_MARK:
+ case XFRMA_SET_MARK_MASK:
+ case XFRMA_MTIMER_THRESH:
+ case XFRMA_NAT_KEEPALIVE_INTERVAL:
+ break;
+ default:
+ NL_SET_ERR_MSG_ATTR(extack, attrs[i],
+ "Unsupported attribute in XFRM_MSG_MIGRATE_STATE");
+ return -EINVAL;
+ }
+ }
+ }
+
return 0;
}
--
2.47.3
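The allow-list pattern in this patch can be tried outside the kernel. The following is a userspace sketch only: the `DEMO_*` attribute IDs are stand-ins with made-up numeric values (they do not match include/uapi/linux/xfrm.h), and `demo_reject_unused_attr` is a hypothetical name that mirrors the shape of the new loop rather than reproducing it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in attribute IDs; values are illustrative, not the real XFRMA_* ones. */
enum demo_attr {
	DEMO_UNSPEC,
	DEMO_SA_DIR,		/* an attribute the handler does not accept */
	DEMO_ENCAP,
	DEMO_MARK,
	DEMO_OFFLOAD_DEV,
	DEMO_SET_MARK,
	DEMO_SET_MARK_MASK,
	DEMO_MTIMER_THRESH,
	DEMO_NAT_KEEPALIVE_INTERVAL,
	__DEMO_MAX
};
#define DEMO_MAX (__DEMO_MAX - 1)

/* attrs[i] is non-NULL when attribute i was present in the request. */
int demo_reject_unused_attr(const void *const attrs[DEMO_MAX + 1])
{
	for (int i = 0; i <= DEMO_MAX; i++) {
		if (!attrs[i])
			continue;

		switch (i) {
		case DEMO_MARK:
		case DEMO_ENCAP:
		case DEMO_OFFLOAD_DEV:
		case DEMO_SET_MARK:
		case DEMO_SET_MARK_MASK:
		case DEMO_MTIMER_THRESH:
		case DEMO_NAT_KEEPALIVE_INTERVAL:
			break;		/* on the allow-list */
		default:
			return -EINVAL;	/* anything else is rejected */
		}
	}
	return 0;
}

void demo_reject_unused_attr_example(void)
{
	static const int present = 1;
	const void *ok[DEMO_MAX + 1] = { [DEMO_ENCAP] = &present,
					 [DEMO_MARK] = &present };
	const void *bad[DEMO_MAX + 1] = { [DEMO_ENCAP] = &present,
					  [DEMO_SA_DIR] = &present };

	assert(demo_reject_unused_attr(ok) == 0);
	assert(demo_reject_unused_attr(bad) == -EINVAL);
}
```

Because the switch has a default case, any attribute not explicitly listed is rejected, including ones added to the attribute space later.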
* [PATCH ipsec-next v7 12/14] xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration
From: Antony Antony @ 2026-04-12 11:16 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Add a new netlink method to migrate a single xfrm_state.
Unlike the existing migration mechanism (SA + policy), this
supports migrating only the SA and allows changing the reqid,
which is invariant in the existing XFRM_MSG_MIGRATE path.
The SA is looked up via xfrm_usersa_id, which uniquely
identifies it, so old_saddr is not needed. old_daddr is carried in
xfrm_usersa_id.daddr.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v6->v7: - add flags field to xfrm_user_migrate_state (based on Sabrina's feedback)
- add XFRM_MIGRATE_STATE_NO_OFFLOAD (bit 0): suppresses offload
- omit-to-inherit; mutually exclusive with XFRMA_OFFLOAD_DEV
- zero-initialize struct xfrm_migrate m[XFRM_MAX_DEPTH]
- add struct xfrm_selector new_sel to xfrm_user_migrate_state
- add XFRM_MIGRATE_STATE_UPDATE_SEL: derive new selector
from SA addresses when old selector is a single-host match
v5->v6: - (Feedback from Sabrina's review)
- reqid change: use xfrm_state_add, not xfrm_state_insert
- encap and xuo: use nla_data() directly, no kmemdup needed
- notification failure is non-fatal: set extack warning, return 0
- drop state direction, x->dir, check, not required
- reverse xmas tree local variable ordering
- use NL_SET_ERR_MSG_WEAK for clone failure message
- fix implicit padding in xfrm_user_migrate_state uapi struct
- support XFRMA_SET_MARK/XFRMA_SET_MARK_MASK in XFRM_MSG_MIGRATE_STATE
v4->v5: - set portid, seq in XFRM_MSG_MIGRATE_STATE netlink notification
- rename error label to out for clarity
- add locking and synchronize after cloning
- change some if(x) to if(!x) for clarity
- call __xfrm_state_delete() inside the lock
- return error from xfrm_send_migrate_state() instead of always returning 0
v3->v4: preserve reqid invariant for each state migrated
v2->v3: free the skb on the error path
v1->v2: merged next patch here to fix use uninitialized value
- removed unnecessary inline
- added const when possible
---
include/net/xfrm.h | 16 ++-
include/uapi/linux/xfrm.h | 21 ++++
net/xfrm/xfrm_device.c | 2 +-
net/xfrm/xfrm_policy.c | 19 +++
net/xfrm/xfrm_state.c | 29 +++--
net/xfrm/xfrm_user.c | 287 +++++++++++++++++++++++++++++++++++++++++++-
security/selinux/nlmsgtab.c | 3 +-
7 files changed, 363 insertions(+), 14 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 4b29ab92c2a7..e33e524cd909 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -684,12 +684,20 @@ struct xfrm_migrate {
xfrm_address_t new_saddr;
struct xfrm_encap_tmpl *encap;
struct xfrm_user_offload *xuo;
+ struct xfrm_mark old_mark;
+ struct xfrm_mark *new_mark;
+ struct xfrm_mark smark;
u8 proto;
u8 mode;
- u16 reserved;
+ u16 msg_type; /* XFRM_MSG_MIGRATE or XFRM_MSG_MIGRATE_STATE */
+ u32 flags;
u32 old_reqid;
+ u32 new_reqid;
+ u32 nat_keepalive_interval;
+ u32 mapping_maxage;
u16 old_family;
u16 new_family;
+ const struct xfrm_selector *new_sel;
};
#define XFRM_KM_TIMEOUT 30
@@ -2104,7 +2112,7 @@ void xfrm_dev_resume(struct sk_buff *skb);
void xfrm_dev_backlog(struct softnet_data *sd);
struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t features, bool *again);
int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
- struct xfrm_user_offload *xuo,
+ const struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack);
int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
struct xfrm_user_offload *xuo, u8 dir,
@@ -2175,7 +2183,9 @@ static inline struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_fea
return skb;
}
-static inline int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, struct xfrm_user_offload *xuo, struct netlink_ext_ack *extack)
+static inline int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
+ const struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack)
{
return 0;
}
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index a23495c0e0a1..34d8ad5c4818 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -227,6 +227,9 @@ enum {
#define XFRM_MSG_SETDEFAULT XFRM_MSG_SETDEFAULT
XFRM_MSG_GETDEFAULT,
#define XFRM_MSG_GETDEFAULT XFRM_MSG_GETDEFAULT
+
+ XFRM_MSG_MIGRATE_STATE,
+#define XFRM_MSG_MIGRATE_STATE XFRM_MSG_MIGRATE_STATE
__XFRM_MSG_MAX
};
#define XFRM_MSG_MAX (__XFRM_MSG_MAX - 1)
@@ -507,6 +510,24 @@ struct xfrm_user_migrate {
__u16 new_family;
};
+struct xfrm_user_migrate_state {
+ struct xfrm_usersa_id id;
+ xfrm_address_t new_daddr;
+ xfrm_address_t new_saddr;
+ struct xfrm_mark old_mark;
+ struct xfrm_selector new_sel;
+ __u32 new_reqid;
+ __u32 flags;
+ __u16 new_family;
+ __u16 reserved;
+};
+
+/* Flags for xfrm_user_migrate_state.flags */
+enum xfrm_migrate_state_flags {
+ XFRM_MIGRATE_STATE_NO_OFFLOAD = 1, /* do not inherit offload from existing SA */
+ XFRM_MIGRATE_STATE_UPDATE_SEL = 2, /* update host-to-host selector from saddr and daddr */
+};
+
struct xfrm_user_mapping {
struct xfrm_usersa_id id;
__u32 reqid;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 52ae0e034d29..9d4c1addb98f 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -229,7 +229,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
EXPORT_SYMBOL_GPL(validate_xmit_xfrm);
int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
- struct xfrm_user_offload *xuo,
+ const struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
int err;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 0b5c7b51183a..3d6c778d8645 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -4635,6 +4635,22 @@ static int xfrm_migrate_check(const struct xfrm_migrate *m, int num_migrate,
return 0;
}
+/*
+ * Fill migrate fields that are invariant in XFRM_MSG_MIGRATE: inherited
+ * from the existing SA unchanged. XFRM_MSG_MIGRATE_STATE can update these.
+ */
+static void xfrm_migrate_copy_old(struct xfrm_migrate *mp,
+ const struct xfrm_state *x,
+ struct xfrm_mark *new_mark_buf)
+{
+ mp->smark = x->props.smark;
+ mp->new_reqid = x->props.reqid;
+ mp->nat_keepalive_interval = x->nat_keepalive_interval;
+ mp->mapping_maxage = x->mapping_maxage;
+ *new_mark_buf = x->mark;
+ mp->new_mark = new_mark_buf;
+}
+
int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
struct xfrm_migrate *m, int num_migrate,
struct xfrm_kmaddress *k, struct net *net,
@@ -4642,6 +4658,7 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
struct netlink_ext_ack *extack, struct xfrm_user_offload *xuo)
{
int i, err, nx_cur = 0, nx_new = 0;
+ struct xfrm_mark new_marks[XFRM_MAX_DEPTH] = {};
struct xfrm_policy *pol = NULL;
struct xfrm_state *x, *xc;
struct xfrm_state *x_cur[XFRM_MAX_DEPTH];
@@ -4674,6 +4691,8 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
nx_cur++;
mp->encap = encap;
mp->xuo = xuo;
+ xfrm_migrate_copy_old(mp, x, &new_marks[i]);
+
xc = xfrm_state_migrate(x, mp, net, extack);
if (xc) {
x_new[nx_new] = xc;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1ee114f8515d..25d54c44fd94 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1974,11 +1974,25 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
goto out;
memcpy(&x->id, &orig->id, sizeof(x->id));
- memcpy(&x->sel, &orig->sel, sizeof(x->sel));
+ if (m->msg_type == XFRM_MSG_MIGRATE_STATE) {
+ if (m->flags & XFRM_MIGRATE_STATE_UPDATE_SEL) {
+ u8 prefixlen = (m->new_family == AF_INET6) ? 128 : 32;
+
+ memcpy(&x->sel, &orig->sel, sizeof(x->sel));
+ x->sel.family = m->new_family;
+ x->sel.prefixlen_d = prefixlen;
+ x->sel.prefixlen_s = prefixlen;
+ memcpy(&x->sel.daddr, &m->new_daddr, sizeof(x->sel.daddr));
+ memcpy(&x->sel.saddr, &m->new_saddr, sizeof(x->sel.saddr));
+ } else {
+ x->sel = *m->new_sel;
+ }
+ } else {
+ memcpy(&x->sel, &orig->sel, sizeof(x->sel));
+ }
memcpy(&x->lft, &orig->lft, sizeof(x->lft));
x->props.mode = orig->props.mode;
x->props.replay_window = orig->props.replay_window;
- x->props.reqid = orig->props.reqid;
if (orig->aalg) {
x->aalg = xfrm_algo_auth_clone(orig->aalg);
@@ -2011,8 +2025,8 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
x->encap = kmemdup(m->encap, sizeof(*x->encap), GFP_KERNEL);
if (!x->encap)
goto error;
- x->mapping_maxage = orig->mapping_maxage;
- x->nat_keepalive_interval = orig->nat_keepalive_interval;
+ x->mapping_maxage = m->mapping_maxage;
+ x->nat_keepalive_interval = m->nat_keepalive_interval;
}
if (orig->security)
@@ -2029,8 +2043,9 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
if (xfrm_replay_clone(x, orig))
goto error;
- memcpy(&x->mark, &orig->mark, sizeof(x->mark));
- memcpy(&x->props.smark, &orig->props.smark, sizeof(x->props.smark));
+ x->mark = m->new_mark ? *m->new_mark : m->old_mark;
+
+ x->props.smark = m->smark;
x->props.flags = orig->props.flags;
x->props.extra_flags = orig->props.extra_flags;
@@ -2053,7 +2068,7 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
goto error;
}
-
+ x->props.reqid = m->new_reqid;
x->props.family = m->new_family;
memcpy(&x->id.daddr, &m->new_daddr, sizeof(x->id.daddr));
memcpy(&x->props.saddr, &m->new_saddr, sizeof(x->props.saddr));
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index fe0cf824f072..46e506548122 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1318,7 +1318,7 @@ static int copy_to_user_encap(struct xfrm_encap_tmpl *ep, struct sk_buff *skb)
return 0;
}
-static int xfrm_smark_put(struct sk_buff *skb, struct xfrm_mark *m)
+static int xfrm_smark_put(struct sk_buff *skb, const struct xfrm_mark *m)
{
int ret = 0;
@@ -3059,6 +3059,25 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
}
#ifdef CONFIG_XFRM_MIGRATE
+static void copy_from_user_migrate_state(struct xfrm_migrate *ma,
+ const struct xfrm_user_migrate_state *um)
+{
+ memcpy(&ma->old_daddr, &um->id.daddr, sizeof(ma->old_daddr));
+ memcpy(&ma->new_daddr, &um->new_daddr, sizeof(ma->new_daddr));
+ memcpy(&ma->new_saddr, &um->new_saddr, sizeof(ma->new_saddr));
+
+ ma->proto = um->id.proto;
+ ma->new_reqid = um->new_reqid;
+
+ ma->old_family = um->id.family;
+ ma->new_family = um->new_family;
+
+ ma->old_mark = um->old_mark;
+ ma->flags = um->flags;
+ ma->new_sel = &um->new_sel;
+ ma->msg_type = XFRM_MSG_MIGRATE_STATE;
+}
+
static int copy_from_user_migrate(struct xfrm_migrate *ma,
struct xfrm_kmaddress *k,
struct nlattr **attrs, int *num,
@@ -3098,6 +3117,7 @@ static int copy_from_user_migrate(struct xfrm_migrate *ma,
ma->old_family = um->old_family;
ma->new_family = um->new_family;
+ ma->msg_type = XFRM_MSG_MIGRATE;
}
*num = i;
@@ -3108,7 +3128,7 @@ static int xfrm_do_migrate(struct sk_buff *skb, struct nlmsghdr *nlh,
struct nlattr **attrs, struct netlink_ext_ack *extack)
{
struct xfrm_userpolicy_id *pi = nlmsg_data(nlh);
- struct xfrm_migrate m[XFRM_MAX_DEPTH];
+ struct xfrm_migrate m[XFRM_MAX_DEPTH] = {};
struct xfrm_kmaddress km, *kmp;
u8 type;
int err;
@@ -3161,7 +3181,268 @@ static int xfrm_do_migrate(struct sk_buff *skb, struct nlmsghdr *nlh,
kfree(xuo);
return err;
}
+
+static int build_migrate_state(struct sk_buff *skb,
+ const struct xfrm_user_migrate_state *um,
+ const struct xfrm_migrate *m,
+ u8 dir, u32 portid, u32 seq)
+{
+ int err;
+ struct nlmsghdr *nlh;
+ struct xfrm_user_migrate_state *hdr;
+
+ nlh = nlmsg_put(skb, portid, seq, XFRM_MSG_MIGRATE_STATE,
+ sizeof(struct xfrm_user_migrate_state), 0);
+ if (!nlh)
+ return -EMSGSIZE;
+
+ hdr = nlmsg_data(nlh);
+ *hdr = *um;
+ hdr->new_sel = *m->new_sel;
+
+ if (m->encap) {
+ err = nla_put(skb, XFRMA_ENCAP, sizeof(*m->encap), m->encap);
+ if (err)
+ goto out_cancel;
+ }
+
+ if (m->xuo) {
+ err = nla_put(skb, XFRMA_OFFLOAD_DEV, sizeof(*m->xuo), m->xuo);
+ if (err)
+ goto out_cancel;
+ }
+
+ if (m->new_mark) {
+ err = nla_put(skb, XFRMA_MARK, sizeof(*m->new_mark),
+ m->new_mark);
+ if (err)
+ goto out_cancel;
+ }
+
+ err = xfrm_smark_put(skb, &m->smark);
+ if (err)
+ goto out_cancel;
+
+ if (m->mapping_maxage) {
+ err = nla_put_u32(skb, XFRMA_MTIMER_THRESH, m->mapping_maxage);
+ if (err)
+ goto out_cancel;
+ }
+
+ if (m->nat_keepalive_interval) {
+ err = nla_put_u32(skb, XFRMA_NAT_KEEPALIVE_INTERVAL,
+ m->nat_keepalive_interval);
+ if (err)
+ goto out_cancel;
+ }
+
+ if (dir) {
+ err = nla_put_u8(skb, XFRMA_SA_DIR, dir);
+ if (err)
+ goto out_cancel;
+ }
+
+ nlmsg_end(skb, nlh);
+ return 0;
+
+out_cancel:
+ nlmsg_cancel(skb, nlh);
+ return err;
+}
+
+static unsigned int xfrm_migrate_state_msgsize(const struct xfrm_migrate *m,
+ u8 dir)
+{
+ return NLMSG_ALIGN(sizeof(struct xfrm_user_migrate_state)) +
+ (m->encap ? nla_total_size(sizeof(struct xfrm_encap_tmpl)) : 0) +
+ (m->xuo ? nla_total_size(sizeof(struct xfrm_user_offload)) : 0) +
+ (m->new_mark ? nla_total_size(sizeof(struct xfrm_mark)) : 0) +
+ (m->smark.v ? nla_total_size(sizeof(u32)) * 2 : 0) + /* SET_MARK + SET_MARK_MASK */
+ (m->mapping_maxage ? nla_total_size(sizeof(u32)) : 0) +
+ (m->nat_keepalive_interval ? nla_total_size(sizeof(u32)) : 0) +
+ (dir ? nla_total_size(sizeof(u8)) : 0); /* XFRMA_SA_DIR */
+}
+
+static int xfrm_send_migrate_state(const struct xfrm_user_migrate_state *um,
+ const struct xfrm_migrate *m,
+ u8 dir, u32 portid, u32 seq)
+{
+ int err;
+ struct sk_buff *skb;
+ struct net *net = &init_net;
+
+ skb = nlmsg_new(xfrm_migrate_state_msgsize(m, dir), GFP_ATOMIC);
+ if (!skb)
+ return -ENOMEM;
+
+ err = build_migrate_state(skb, um, m, dir, portid, seq);
+ if (err < 0) {
+ kfree_skb(skb);
+ return err;
+ }
+
+ return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_MIGRATE);
+}
+
+static int xfrm_do_migrate_state(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct nlattr **attrs, struct netlink_ext_ack *extack)
+{
+ struct xfrm_user_migrate_state *um = nlmsg_data(nlh);
+ struct net *net = sock_net(skb->sk);
+ struct xfrm_user_offload xuo = {};
+ struct xfrm_migrate m = {};
+ struct xfrm_state *xc;
+ struct xfrm_state *x;
+ int err;
+
+ if (!um->id.spi) {
+ NL_SET_ERR_MSG(extack, "Invalid SPI 0x0");
+ return -EINVAL;
+ }
+
+ if (um->reserved) {
+ NL_SET_ERR_MSG(extack, "Reserved field must be zero");
+ return -EINVAL;
+ }
+
+ if (um->flags & ~(XFRM_MIGRATE_STATE_NO_OFFLOAD |
+ XFRM_MIGRATE_STATE_UPDATE_SEL)) {
+ NL_SET_ERR_MSG(extack, "Unknown flags in XFRM_MSG_MIGRATE_STATE");
+ return -EINVAL;
+ }
+
+ if ((um->flags & XFRM_MIGRATE_STATE_NO_OFFLOAD) &&
+ attrs[XFRMA_OFFLOAD_DEV]) {
+ NL_SET_ERR_MSG(extack,
+ "XFRM_MIGRATE_STATE_NO_OFFLOAD and XFRMA_OFFLOAD_DEV are mutually exclusive");
+ return -EINVAL;
+ }
+
+ copy_from_user_migrate_state(&m, um);
+
+ x = xfrm_state_lookup(net, m.old_mark.v & m.old_mark.m,
+ &um->id.daddr, um->id.spi,
+ um->id.proto, um->id.family);
+ if (!x) {
+ NL_SET_ERR_MSG(extack, "Can not find state");
+ return -ESRCH;
+ }
+
+ if (um->flags & XFRM_MIGRATE_STATE_UPDATE_SEL) {
+ u8 prefixlen = (x->sel.family == AF_INET6) ? 128 : 32;
+
+ if (x->sel.prefixlen_s != x->sel.prefixlen_d ||
+ x->sel.prefixlen_d != prefixlen ||
+ !xfrm_addr_equal(&x->sel.daddr, &x->id.daddr, x->sel.family) ||
+ !xfrm_addr_equal(&x->sel.saddr, &x->props.saddr, x->sel.family)) {
+ NL_SET_ERR_MSG(extack,
+ "SA selector is not a single-host match for SA addresses");
+ err = -EINVAL;
+ goto out;
+ }
+ }
+
+ if (attrs[XFRMA_ENCAP]) {
+ m.encap = nla_data(attrs[XFRMA_ENCAP]);
+ if (m.encap->encap_type == 0) {
+ m.encap = NULL; /* sentinel: remove encap */
+ } else if (m.encap->encap_type != UDP_ENCAP_ESPINUDP) {
+ NL_SET_ERR_MSG(extack, "Unsupported encapsulation type");
+ err = -EINVAL;
+ goto out;
+ }
+ } else {
+ m.encap = x->encap; /* omit-to-inherit */
+ }
+
+ if (attrs[XFRMA_MTIMER_THRESH]) {
+ err = verify_mtimer_thresh(!!m.encap, x->dir, extack);
+ if (err)
+ goto out;
+ }
+
+ if (attrs[XFRMA_NAT_KEEPALIVE_INTERVAL] &&
+ nla_get_u32(attrs[XFRMA_NAT_KEEPALIVE_INTERVAL]) && !m.encap) {
+ NL_SET_ERR_MSG(extack,
+ "NAT_KEEPALIVE_INTERVAL requires encapsulation");
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (attrs[XFRMA_OFFLOAD_DEV]) {
+ m.xuo = nla_data(attrs[XFRMA_OFFLOAD_DEV]);
+ } else if (!(um->flags & XFRM_MIGRATE_STATE_NO_OFFLOAD) && x->xso.dev) {
+ xuo.ifindex = x->xso.dev->ifindex;
+ if (x->xso.dir == XFRM_DEV_OFFLOAD_IN)
+ xuo.flags = XFRM_OFFLOAD_INBOUND;
+ if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET)
+ xuo.flags |= XFRM_OFFLOAD_PACKET;
+ m.xuo = &xuo;
+ }
+
+ if (attrs[XFRMA_MARK])
+ m.new_mark = nla_data(attrs[XFRMA_MARK]);
+
+ if (attrs[XFRMA_SET_MARK])
+ xfrm_smark_init(attrs, &m.smark);
+ else
+ m.smark = x->props.smark;
+
+ m.mapping_maxage = attrs[XFRMA_MTIMER_THRESH] ?
+ nla_get_u32(attrs[XFRMA_MTIMER_THRESH]) : x->mapping_maxage;
+ m.nat_keepalive_interval = attrs[XFRMA_NAT_KEEPALIVE_INTERVAL] ?
+ nla_get_u32(attrs[XFRMA_NAT_KEEPALIVE_INTERVAL]) :
+ x->nat_keepalive_interval;
+
+ xc = xfrm_state_migrate_create(x, &m, net, extack);
+ if (!xc) {
+ NL_SET_ERR_MSG_WEAK(extack, "State migration clone failed");
+ err = -EINVAL;
+ goto out;
+ }
+
+ spin_lock_bh(&x->lock);
+ xfrm_migrate_sync(xc, x); /* to prevent SN/IV reuse */
+ __xfrm_state_delete(x);
+ spin_unlock_bh(&x->lock);
+
+ err = xfrm_state_migrate_install(x, xc, &m, extack);
+ if (err < 0) {
+ /*
+ * In this rare case both the old SA and the new SA
+ * will disappear.
+ * Alternatives risk duplicate SN/IV usage which must not occur.
+ * Userspace must handle this error, -EEXIST.
+ */
+ goto out;
+ }
+
+ /* Restore encap cleared by sentinel (type=0) during migration. */
+ if (attrs[XFRMA_ENCAP])
+ m.encap = nla_data(attrs[XFRMA_ENCAP]);
+
+ m.new_sel = &xc->sel;
+
+ err = xfrm_send_migrate_state(um, &m, xc->dir,
+ nlh->nlmsg_pid, nlh->nlmsg_seq);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack, "Failed to send migration notification");
+ err = 0;
+ }
+
+out:
+ xfrm_state_put(x);
+ return err;
+}
+
#else
+static int xfrm_do_migrate_state(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct nlattr **attrs, struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG(extack, "XFRM_MSG_MIGRATE_STATE is not supported");
+ return -ENOPROTOOPT;
+}
+
static int xfrm_do_migrate(struct sk_buff *skb, struct nlmsghdr *nlh,
struct nlattr **attrs, struct netlink_ext_ack *extack)
{
@@ -3314,6 +3595,7 @@ const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
[XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = sizeof(u32),
[XFRM_MSG_SETDEFAULT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_default),
[XFRM_MSG_GETDEFAULT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_default),
+ [XFRM_MSG_MIGRATE_STATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_migrate_state),
};
EXPORT_SYMBOL_GPL(xfrm_msg_min);
@@ -3407,6 +3689,7 @@ static const struct xfrm_link {
[XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = { .doit = xfrm_get_spdinfo },
[XFRM_MSG_SETDEFAULT - XFRM_MSG_BASE] = { .doit = xfrm_set_default },
[XFRM_MSG_GETDEFAULT - XFRM_MSG_BASE] = { .doit = xfrm_get_default },
+ [XFRM_MSG_MIGRATE_STATE - XFRM_MSG_BASE] = { .doit = xfrm_do_migrate_state },
};
static int xfrm_reject_unused_attr(int type, struct nlattr **attrs,
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 2c0b07f9fbbd..655d2616c9d2 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -128,6 +128,7 @@ static const struct nlmsg_perm nlmsg_xfrm_perms[] = {
{ XFRM_MSG_MAPPING, NETLINK_XFRM_SOCKET__NLMSG_READ },
{ XFRM_MSG_SETDEFAULT, NETLINK_XFRM_SOCKET__NLMSG_WRITE },
{ XFRM_MSG_GETDEFAULT, NETLINK_XFRM_SOCKET__NLMSG_READ },
+ { XFRM_MSG_MIGRATE_STATE, NETLINK_XFRM_SOCKET__NLMSG_WRITE },
};
static const struct nlmsg_perm nlmsg_audit_perms[] = {
@@ -203,7 +204,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
* structures at the top of this file with the new mappings
* before updating the BUILD_BUG_ON() macro!
*/
- BUILD_BUG_ON(XFRM_MSG_MAX != XFRM_MSG_GETDEFAULT);
+ BUILD_BUG_ON(XFRM_MSG_MAX != XFRM_MSG_MIGRATE_STATE);
if (selinux_policycap_netlink_xperm()) {
*perm = NETLINK_XFRM_SOCKET__NLMSG;
--
2.47.3
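Two small rules from this patch are easy to check in isolation: the flag validation at the top of xfrm_do_migrate_state() and the single-host prefix length used with XFRM_MIGRATE_STATE_UPDATE_SEL. The sketch below is a userspace approximation under stated assumptions: the `DEMO_*` macros reuse the flag values 1 and 2 from the new uapi enum, while the function names `demo_check_flags` and `demo_host_prefixlen` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Same values as enum xfrm_migrate_state_flags in the patch. */
#define DEMO_NO_OFFLOAD	1u
#define DEMO_UPDATE_SEL	2u

/*
 * Reject unknown flag bits, and reject NO_OFFLOAD combined with an explicit
 * offload attribute, since the two are mutually exclusive.
 */
int demo_check_flags(uint32_t flags, bool has_offload_attr)
{
	if (flags & ~(DEMO_NO_OFFLOAD | DEMO_UPDATE_SEL))
		return -EINVAL;
	if ((flags & DEMO_NO_OFFLOAD) && has_offload_attr)
		return -EINVAL;
	return 0;
}

/* Single-host prefix length: 128 for IPv6, 32 otherwise. */
uint8_t demo_host_prefixlen(uint16_t family)
{
	return family == AF_INET6 ? 128 : 32;
}

void demo_flags_example(void)
{
	assert(demo_check_flags(DEMO_UPDATE_SEL, true) == 0);
	assert(demo_check_flags(DEMO_NO_OFFLOAD, true) == -EINVAL);
	assert(demo_check_flags(4u, false) == -EINVAL);
	assert(demo_host_prefixlen(AF_INET) == 32);
	assert(demo_host_prefixlen(AF_INET6) == 128);
}
```

Rejecting unknown bits up front keeps the remaining bits of `flags` available for later extensions, because no existing caller can rely on them being ignored.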
* [PATCH ipsec-next v7 11/14] xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper
From: Antony Antony @ 2026-04-12 11:15 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Extract verify_mtimer_thresh() to consolidate the XFRMA_MTIMER_THRESH
validation logic shared between the add_sa path and the
XFRM_MSG_MIGRATE_STATE handler added in an upcoming patch.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: added this patch
---
net/xfrm/xfrm_user.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 26b82d94acc1..fe0cf824f072 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -239,6 +239,22 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
return 0;
}
+static int verify_mtimer_thresh(bool has_encap, u8 dir,
+ struct netlink_ext_ack *extack)
+{
+ if (!has_encap) {
+ NL_SET_ERR_MSG(extack,
+ "MTIMER_THRESH requires encapsulation");
+ return -EINVAL;
+ }
+ if (dir == XFRM_SA_DIR_OUT) {
+ NL_SET_ERR_MSG(extack,
+ "MTIMER_THRESH should not be set on output SA");
+ return -EINVAL;
+ }
+ return 0;
+}
+
static int verify_newsa_info(struct xfrm_usersa_info *p,
struct nlattr **attrs,
struct netlink_ext_ack *extack)
@@ -446,18 +462,9 @@ static int verify_newsa_info(struct xfrm_usersa_info *p,
err = 0;
if (attrs[XFRMA_MTIMER_THRESH]) {
- if (!attrs[XFRMA_ENCAP]) {
- NL_SET_ERR_MSG(extack, "MTIMER_THRESH attribute can only be set on ENCAP states");
- err = -EINVAL;
- goto out;
- }
-
- if (sa_dir == XFRM_SA_DIR_OUT) {
- NL_SET_ERR_MSG(extack,
- "MTIMER_THRESH attribute should not be set on output SA");
- err = -EINVAL;
+ err = verify_mtimer_thresh(!!attrs[XFRMA_ENCAP], sa_dir, extack);
+ if (err)
goto out;
- }
}
if (sa_dir == XFRM_SA_DIR_OUT) {
--
2.47.3
* [PATCH ipsec-next v7 10/14] xfrm: move encap and xuo into struct xfrm_migrate
From: Antony Antony @ 2026-04-12 11:15 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
In preparation for an upcoming patch, move the xfrm_encap_tmpl and
xfrm_user_offload pointers from separate parameters into struct
xfrm_migrate, reducing the parameter count of
xfrm_state_migrate_create(), xfrm_state_migrate_install(), and
xfrm_state_migrate().
The fields are placed after the four xfrm_address_t members where
the struct is naturally 8-byte aligned, avoiding padding.
No functional change.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: added this patch.
---
include/net/xfrm.h | 7 ++-----
net/xfrm/xfrm_policy.c | 4 +++-
net/xfrm/xfrm_state.c | 20 +++++++-------------
3 files changed, 12 insertions(+), 19 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index be22c26e4661..4b29ab92c2a7 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -682,6 +682,8 @@ struct xfrm_migrate {
xfrm_address_t old_saddr;
xfrm_address_t new_daddr;
xfrm_address_t new_saddr;
+ struct xfrm_encap_tmpl *encap;
+ struct xfrm_user_offload *xuo;
u8 proto;
u8 mode;
u16 reserved;
@@ -1897,20 +1899,15 @@ struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *n
u32 if_id);
struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
const struct xfrm_migrate *m,
- const struct xfrm_encap_tmpl *encap,
struct net *net,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack);
int xfrm_state_migrate_install(const struct xfrm_state *x,
struct xfrm_state *xc,
const struct xfrm_migrate *m,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack);
struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
struct xfrm_migrate *m,
- struct xfrm_encap_tmpl *encap,
struct net *net,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack);
int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
struct xfrm_migrate *m, int num_bundles,
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 62218b52fd35..0b5c7b51183a 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -4672,7 +4672,9 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
if ((x = xfrm_migrate_state_find(mp, net, if_id))) {
x_cur[nx_cur] = x;
nx_cur++;
- xc = xfrm_state_migrate(x, mp, encap, net, xuo, extack);
+ mp->encap = encap;
+ mp->xuo = xuo;
+ xc = xfrm_state_migrate(x, mp, net, extack);
if (xc) {
x_new[nx_new] = xc;
nx_new++;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 06ba8f03eab3..1ee114f8515d 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1966,7 +1966,6 @@ static inline int clone_security(struct xfrm_state *x, struct xfrm_sec_ctx *secu
}
static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
- const struct xfrm_encap_tmpl *encap,
const struct xfrm_migrate *m)
{
struct net *net = xs_net(orig);
@@ -2008,8 +2007,8 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
}
x->props.calgo = orig->props.calgo;
- if (encap) {
- x->encap = kmemdup(encap, sizeof(*x->encap), GFP_KERNEL);
+ if (m->encap) {
+ x->encap = kmemdup(m->encap, sizeof(*x->encap), GFP_KERNEL);
if (!x->encap)
goto error;
x->mapping_maxage = orig->mapping_maxage;
@@ -2122,14 +2121,12 @@ EXPORT_SYMBOL(xfrm_migrate_state_find);
struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
const struct xfrm_migrate *m,
- const struct xfrm_encap_tmpl *encap,
struct net *net,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
struct xfrm_state *xc;
- xc = xfrm_state_clone_and_setup(x, encap, m);
+ xc = xfrm_state_clone_and_setup(x, m);
if (!xc) {
NL_SET_ERR_MSG(extack, "Failed to clone and setup state");
return NULL;
@@ -2141,7 +2138,7 @@ struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
}
/* configure the hardware if offload is requested */
- if (xuo && xfrm_dev_state_add(net, xc, xuo, extack))
+ if (m->xuo && xfrm_dev_state_add(net, xc, m->xuo, extack))
goto error;
return xc;
@@ -2155,7 +2152,6 @@ EXPORT_SYMBOL(xfrm_state_migrate_create);
int xfrm_state_migrate_install(const struct xfrm_state *x,
struct xfrm_state *xc,
const struct xfrm_migrate *m,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
if (m->new_family == m->old_family &&
@@ -2168,7 +2164,7 @@ int xfrm_state_migrate_install(const struct xfrm_state *x,
} else {
if (xfrm_state_add(xc) < 0) {
NL_SET_ERR_MSG(extack, "Failed to add migrated state");
- if (xuo)
+ if (m->xuo)
xfrm_dev_state_delete(xc);
xc->km.state = XFRM_STATE_DEAD;
xfrm_state_put(xc);
@@ -2182,20 +2178,18 @@ EXPORT_SYMBOL(xfrm_state_migrate_install);
struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
struct xfrm_migrate *m,
- struct xfrm_encap_tmpl *encap,
struct net *net,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
struct xfrm_state *xc;
- xc = xfrm_state_migrate_create(x, m, encap, net, xuo, extack);
+ xc = xfrm_state_migrate_create(x, m, net, extack);
if (!xc)
return NULL;
xfrm_migrate_sync(xc, x);
- if (xfrm_state_migrate_install(x, xc, m, xuo, extack) < 0)
+ if (xfrm_state_migrate_install(x, xc, m, extack) < 0)
return NULL;
return xc;
--
2.47.3