* [PATCH AUTOSEL 6.12 013/486] SUNRPC: Don't allow waiting for exiting tasks
From: Sasha Levin @ 2025-05-05 22:31 UTC
To: linux-kernel, stable
Cc: Trond Myklebust, Jeff Layton, Sasha Levin, chuck.lever, trondmy,
anna, davem, edumazet, kuba, pabeni, linux-nfs, netdev
From: Trond Myklebust <trond.myklebust@hammerspace.com>
[ Upstream commit 14e41b16e8cb677bb440dca2edba8b041646c742 ]
Once a task calls exit_signals() it can no longer be signalled. So do
not allow it to do killable waits.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/sched.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9b45fbdc90cab..73bc39281ef5f 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);

 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
+	if (unlikely(current->flags & PF_EXITING))
+		return -EINTR;
 	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
--
2.39.5
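For readers who want to see the ordering of the checks in isolation, here is a minimal userspace sketch of the logic this patch gives rpc_wait_bit_killable(). It is only a model: the model_* names, the flag value and the boolean signal_pending parameter are assumptions made for illustration, and the real function sleeps in schedule() between the two checks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PF_EXITING task flag. */
#define MODEL_PF_EXITING 0x4u

/* ERESTARTSYS is kernel-internal and absent from userspace <errno.h>. */
#define MODEL_ERESTARTSYS 512

/*
 * Model of the wait action: returns 0 if the wait may continue,
 * or a negative error code if it must be abandoned.
 */
static int model_wait_bit_killable(unsigned int task_flags, bool signal_pending)
{
	/* An exiting task can no longer be signalled, so never let it sleep. */
	if (task_flags & MODEL_PF_EXITING)
		return -EINTR;

	/* The real function calls schedule() here; the model does not sleep. */

	if (signal_pending)
		return -MODEL_ERESTARTSYS;
	return 0;
}
```

The point of the ordering is that the exiting check comes before any sleep, so a task that has already passed exit_signals() never enters a wait that only a signal could end.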
* [PATCH AUTOSEL 6.12 029/486] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
From: Sasha Levin @ 2025-05-05 22:31 UTC
To: linux-kernel, stable
Cc: Trond Myklebust, Jeff Layton, Benjamin Coddington, Sasha Levin,
trondmy, anna, chuck.lever, davem, edumazet, kuba, pabeni,
linux-nfs, netdev
From: Trond Myklebust <trond.myklebust@hammerspace.com>
[ Upstream commit bf9be373b830a3e48117da5d89bb6145a575f880 ]
The autobind setting was supposed to be determined in rpc_create(),
since commit c2866763b402 ("SUNRPC: use sockaddr + size when creating
remote transport endpoints").
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/clnt.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0090162ee8c35..17a4de75bfaf6 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -270,9 +270,6 @@ static struct rpc_xprt *rpc_clnt_set_transport(struct rpc_clnt *clnt,
 	old = rcu_dereference_protected(clnt->cl_xprt,
 			lockdep_is_held(&clnt->cl_lock));

-	if (!xprt_bound(xprt))
-		clnt->cl_autobind = 1;
-
 	clnt->cl_timeout = timeout;
 	rcu_assign_pointer(clnt->cl_xprt, xprt);
 	spin_unlock(&clnt->cl_lock);
--
2.39.5
* [PATCH AUTOSEL 6.12 030/486] SUNRPC: rpcbind should never reset the port to the value '0'
From: Sasha Levin @ 2025-05-05 22:31 UTC
To: linux-kernel, stable
Cc: Trond Myklebust, Jeff Layton, Benjamin Coddington, Sasha Levin,
chuck.lever, trondmy, anna, davem, edumazet, kuba, pabeni,
linux-nfs, netdev
From: Trond Myklebust <trond.myklebust@hammerspace.com>
[ Upstream commit 214c13e380ad7636631279f426387f9c4e3c14d9 ]
If we already had a valid port number for the RPC service, then we
should not allow the rpcbind client to set it to the invalid value '0'.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/rpcb_clnt.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 102c3818bc54d..53bcca365fb1c 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -820,9 +820,10 @@ static void rpcb_getport_done(struct rpc_task *child, void *data)
 	}

 	trace_rpcb_setport(child, map->r_status, map->r_port);
-	xprt->ops->set_port(xprt, map->r_port);
-	if (map->r_port)
+	if (map->r_port) {
+		xprt->ops->set_port(xprt, map->r_port);
 		xprt_set_bound(xprt);
+	}
 }

 /*
--
2.39.5
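The behavioural change in rpcb_getport_done() can be sketched outside the kernel as follows. This is a simplified model under stated assumptions: struct model_xprt and model_getport_done() are hypothetical names, and the real code goes through xprt->ops->set_port() and xprt_set_bound().

```c
#include <stdbool.h>

/* Hypothetical, simplified transport; not the kernel's struct rpc_xprt. */
struct model_xprt {
	unsigned short port;	/* last known port of the RPC service */
	bool bound;		/* true once a valid port has been recorded */
};

/* Apply an rpcbind reply: only a nonzero port may replace the old one. */
static void model_getport_done(struct model_xprt *xprt, unsigned short r_port)
{
	if (r_port) {
		xprt->port = r_port;
		xprt->bound = true;
	}
	/* r_port == 0: keep whatever port was already recorded. */
}
```

With the old ordering, a reply carrying port 0 would have overwritten a previously valid port before the nonzero test was made.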
* [PATCH AUTOSEL 6.12 034/486] mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
From: Sasha Levin @ 2025-05-05 22:31 UTC
To: linux-kernel, stable
Cc: Daniel Hsu, Daniel Hsu, Jeremy Kerr, David S . Miller,
Sasha Levin, matt, andrew+netdev, edumazet, kuba, pabeni, netdev
From: Daniel Hsu <d486250@gmail.com>
[ Upstream commit 70facbf978ac90c6da17a3de2a8dd111b06f1bac ]
Previously, the condition for invalidating the tx flow in
mctp_i2c_invalidate_tx_flow() checked if `rc` was nonzero.
However, this could incorrectly trigger the invalidation
even when `rc > 0` was returned as a success status.
This patch updates the condition to explicitly check for `rc < 0`,
ensuring that only error cases trigger the invalidation.
Signed-off-by: Daniel Hsu <Daniel-Hsu@quantatw.com>
Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/mctp/mctp-i2c.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/mctp/mctp-i2c.c b/drivers/net/mctp/mctp-i2c.c
index 6622de48fc9e7..503a9174321c6 100644
--- a/drivers/net/mctp/mctp-i2c.c
+++ b/drivers/net/mctp/mctp-i2c.c
@@ -538,7 +538,7 @@ static void mctp_i2c_xmit(struct mctp_i2c_dev *midev, struct sk_buff *skb)
 		rc = __i2c_transfer(midev->adapter, &msg, 1);

 		/* on tx errors, the flow can no longer be considered valid */
-		if (rc)
+		if (rc < 0)
 			mctp_i2c_invalidate_tx_flow(midev, skb);

 		break;
--
2.39.5
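The difference between the two conditions is easiest to see side by side. The sketch below is illustrative only: invalidate_old() and invalidate_new() are hypothetical helper names, and it relies on the I2C transfer call returning a positive message count on success and a negative errno on failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Old condition: any nonzero return value invalidated the tx flow. */
static bool invalidate_old(int rc)
{
	return rc != 0;
}

/* New condition: only negative (error) return values invalidate it. */
static bool invalidate_new(int rc)
{
	return rc < 0;
}
```

For a successful single-message transfer (rc == 1), the old test invalidates the flow and the new one does not; for an error such as -EIO both agree.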
* [PATCH AUTOSEL 6.12 035/486] net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
From: Sasha Levin @ 2025-05-05 22:31 UTC
To: linux-kernel, stable
Cc: Hans-Frieder Vogt, Andrew Lunn, Jakub Kicinski, Sasha Levin,
fujita.tomonori, andrew+netdev, davem, edumazet, pabeni, netdev
From: Hans-Frieder Vogt <hfdevel@gmx.net>
[ Upstream commit 53377b5c2952097527b01ce2f1d9a9332f042f70 ]
Add the PCI-ID of the AQR105-based Tehuti TN4010 cards to allow loading
of the tn40xx driver on these cards. Here, I chose the detailed definition
with the subvendor ID similar to the QT2025 cards with the PCI-ID
TEHUTI:0x4022, because there is a card with an AQ2104 hiding amongst the
AQR105 cards, and they all come with the same PCI-ID (TEHUTI:0x4025). But
the AQ2104 is currently not supported.
Signed-off-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322-tn9510-v3a-v7-7-672a9a3d8628@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/tehuti/tn40.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/tehuti/tn40.c b/drivers/net/ethernet/tehuti/tn40.c
index 259bdac24cf21..a6965258441c4 100644
--- a/drivers/net/ethernet/tehuti/tn40.c
+++ b/drivers/net/ethernet/tehuti/tn40.c
@@ -1832,6 +1832,10 @@ static const struct pci_device_id tn40_id_table[] = {
PCI_VENDOR_ID_ASUSTEK, 0x8709) },
{ PCI_DEVICE_SUB(PCI_VENDOR_ID_TEHUTI, 0x4022,
PCI_VENDOR_ID_EDIMAX, 0x8103) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_TEHUTI, PCI_DEVICE_ID_TEHUTI_TN9510,
+ PCI_VENDOR_ID_TEHUTI, 0x3015) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_TEHUTI, PCI_DEVICE_ID_TEHUTI_TN9510,
+ PCI_VENDOR_ID_EDIMAX, 0x8102) },
{ }
};
--
2.39.5
* [PATCH AUTOSEL 6.12 036/486] net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
From: Sasha Levin @ 2025-05-05 22:31 UTC
To: linux-kernel, stable
Cc: Hans-Frieder Vogt, Andrew Lunn, Jakub Kicinski, Sasha Levin,
fujita.tomonori, andrew+netdev, davem, edumazet, pabeni, netdev
From: Hans-Frieder Vogt <hfdevel@gmx.net>
[ Upstream commit 25b6a6d29d4082f6ac231c056ac321a996eb55c9 ]
In case of an AQR105-based device, create a software node for the mdio
function, with a child node for the Aquantia AQR105 PHY, providing a
firmware-name (and a bit more, which may be used for future checks) to
allow the PHY to load a MAC specific firmware from the file system.
The name of the PHY software node follows the naming convention suggested
in the patch for the mdiobus_scan function (in the same patch series).
Signed-off-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322-tn9510-v3a-v7-5-672a9a3d8628@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/tehuti/tn40.c | 5 +-
drivers/net/ethernet/tehuti/tn40.h | 33 ++++++++++
drivers/net/ethernet/tehuti/tn40_mdio.c | 82 ++++++++++++++++++++++++-
3 files changed, 117 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/tehuti/tn40.c b/drivers/net/ethernet/tehuti/tn40.c
index a6965258441c4..558b791a97edd 100644
--- a/drivers/net/ethernet/tehuti/tn40.c
+++ b/drivers/net/ethernet/tehuti/tn40.c
@@ -1778,7 +1778,7 @@ static int tn40_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
ret = tn40_phy_register(priv);
if (ret) {
dev_err(&pdev->dev, "failed to set up PHY.\n");
- goto err_free_irq;
+ goto err_cleanup_swnodes;
}
ret = tn40_priv_init(priv);
@@ -1795,6 +1795,8 @@ static int tn40_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;
err_unregister_phydev:
tn40_phy_unregister(priv);
+err_cleanup_swnodes:
+ tn40_swnodes_cleanup(priv);
err_free_irq:
pci_free_irq_vectors(pdev);
err_unset_drvdata:
@@ -1816,6 +1818,7 @@ static void tn40_remove(struct pci_dev *pdev)
unregister_netdev(ndev);
tn40_phy_unregister(priv);
+ tn40_swnodes_cleanup(priv);
pci_free_irq_vectors(priv->pdev);
pci_set_drvdata(pdev, NULL);
iounmap(priv->regs);
diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h
index 490781fe51205..25da8686d4691 100644
--- a/drivers/net/ethernet/tehuti/tn40.h
+++ b/drivers/net/ethernet/tehuti/tn40.h
@@ -4,10 +4,13 @@
#ifndef _TN40_H_
#define _TN40_H_
+#include <linux/property.h>
#include "tn40_regs.h"
#define TN40_DRV_NAME "tn40xx"
+#define PCI_DEVICE_ID_TEHUTI_TN9510 0x4025
+
#define TN40_MDIO_SPEED_1MHZ (1)
#define TN40_MDIO_SPEED_6MHZ (6)
@@ -102,10 +105,39 @@ struct tn40_txdb {
int size; /* Number of elements in the db */
};
+#define NODE_PROP(_NAME, _PROP) ( \
+ (const struct software_node) { \
+ .name = _NAME, \
+ .properties = _PROP, \
+ })
+
+#define NODE_PAR_PROP(_NAME, _PAR, _PROP) ( \
+ (const struct software_node) { \
+ .name = _NAME, \
+ .parent = _PAR, \
+ .properties = _PROP, \
+ })
+
+enum tn40_swnodes {
+ SWNODE_MDIO,
+ SWNODE_PHY,
+ SWNODE_MAX
+};
+
+struct tn40_nodes {
+ char phy_name[32];
+ char mdio_name[32];
+ struct property_entry phy_props[3];
+ struct software_node swnodes[SWNODE_MAX];
+ const struct software_node *group[SWNODE_MAX + 1];
+};
+
struct tn40_priv {
struct net_device *ndev;
struct pci_dev *pdev;
+ struct tn40_nodes nodes;
+
struct napi_struct napi;
/* RX FIFOs: 1 for data (full) descs, and 2 for free descs */
struct tn40_rxd_fifo rxd_fifo0;
@@ -225,6 +257,7 @@ static inline void tn40_write_reg(struct tn40_priv *priv, u32 reg, u32 val)
int tn40_set_link_speed(struct tn40_priv *priv, u32 speed);
+void tn40_swnodes_cleanup(struct tn40_priv *priv);
int tn40_mdiobus_init(struct tn40_priv *priv);
int tn40_phy_register(struct tn40_priv *priv);
diff --git a/drivers/net/ethernet/tehuti/tn40_mdio.c b/drivers/net/ethernet/tehuti/tn40_mdio.c
index af18615d64a8a..5bb0cbc87d064 100644
--- a/drivers/net/ethernet/tehuti/tn40_mdio.c
+++ b/drivers/net/ethernet/tehuti/tn40_mdio.c
@@ -14,6 +14,8 @@
(FIELD_PREP(TN40_MDIO_PRTAD_MASK, (port))))
#define TN40_MDIO_CMD_READ BIT(15)
+#define AQR105_FIRMWARE "tehuti/aqr105-tn40xx.cld"
+
static void tn40_mdio_set_speed(struct tn40_priv *priv, u32 speed)
{
void __iomem *regs = priv->regs;
@@ -111,6 +113,56 @@ static int tn40_mdio_write_c45(struct mii_bus *mii_bus, int addr, int devnum,
return tn40_mdio_write(mii_bus->priv, addr, devnum, regnum, val);
}
+/* registers an mdio node and an aqr105 PHY at address 1
+ * tn40_mdio-%id {
+ * ethernet-phy@1 {
+ * compatible = "ethernet-phy-id03a1.b4a3";
+ * reg = <1>;
+ * firmware-name = AQR105_FIRMWARE;
+ * };
+ * };
+ */
+static int tn40_swnodes_register(struct tn40_priv *priv)
+{
+ struct tn40_nodes *nodes = &priv->nodes;
+ struct pci_dev *pdev = priv->pdev;
+ struct software_node *swnodes;
+ u32 id;
+
+ id = pci_dev_id(pdev);
+
+ snprintf(nodes->phy_name, sizeof(nodes->phy_name), "ethernet-phy@1");
+ snprintf(nodes->mdio_name, sizeof(nodes->mdio_name), "tn40_mdio-%x",
+ id);
+
+ swnodes = nodes->swnodes;
+
+ swnodes[SWNODE_MDIO] = NODE_PROP(nodes->mdio_name, NULL);
+
+ nodes->phy_props[0] = PROPERTY_ENTRY_STRING("compatible",
+ "ethernet-phy-id03a1.b4a3");
+ nodes->phy_props[1] = PROPERTY_ENTRY_U32("reg", 1);
+ nodes->phy_props[2] = PROPERTY_ENTRY_STRING("firmware-name",
+ AQR105_FIRMWARE);
+ swnodes[SWNODE_PHY] = NODE_PAR_PROP(nodes->phy_name,
+ &swnodes[SWNODE_MDIO],
+ nodes->phy_props);
+
+ nodes->group[SWNODE_PHY] = &swnodes[SWNODE_PHY];
+ nodes->group[SWNODE_MDIO] = &swnodes[SWNODE_MDIO];
+ return software_node_register_node_group(nodes->group);
+}
+
+void tn40_swnodes_cleanup(struct tn40_priv *priv)
+{
+ /* cleanup of swnodes is only needed for AQR105-based cards */
+ if (priv->pdev->device == PCI_DEVICE_ID_TEHUTI_TN9510) {
+ fwnode_handle_put(dev_fwnode(&priv->mdio->dev));
+ device_remove_software_node(&priv->mdio->dev);
+ software_node_unregister_node_group(priv->nodes.group);
+ }
+}
+
int tn40_mdiobus_init(struct tn40_priv *priv)
{
struct pci_dev *pdev = priv->pdev;
@@ -129,14 +181,40 @@ int tn40_mdiobus_init(struct tn40_priv *priv)
bus->read_c45 = tn40_mdio_read_c45;
bus->write_c45 = tn40_mdio_write_c45;
+ priv->mdio = bus;
+
+ /* provide swnodes for AQR105-based cards only */
+ if (pdev->device == PCI_DEVICE_ID_TEHUTI_TN9510) {
+ ret = tn40_swnodes_register(priv);
+ if (ret) {
+ pr_err("swnodes failed\n");
+ return ret;
+ }
+
+ ret = device_add_software_node(&bus->dev,
+ priv->nodes.group[SWNODE_MDIO]);
+ if (ret) {
+ dev_err(&pdev->dev,
+ "device_add_software_node failed: %d\n", ret);
+ goto err_swnodes_unregister;
+ }
+ }
ret = devm_mdiobus_register(&pdev->dev, bus);
if (ret) {
dev_err(&pdev->dev, "failed to register mdiobus %d %u %u\n",
ret, bus->state, MDIOBUS_UNREGISTERED);
- return ret;
+ goto err_swnodes_cleanup;
}
tn40_mdio_set_speed(priv, TN40_MDIO_SPEED_6MHZ);
- priv->mdio = bus;
return 0;
+
+err_swnodes_unregister:
+ software_node_unregister_node_group(priv->nodes.group);
+ return ret;
+err_swnodes_cleanup:
+ tn40_swnodes_cleanup(priv);
+ return ret;
}
+
+MODULE_FIRMWARE(AQR105_FIRMWARE);
--
2.39.5
* [PATCH AUTOSEL 6.12 046/486] r8169: disable RTL8126 ZRX-DC timeout
From: Sasha Levin @ 2025-05-05 22:32 UTC
To: linux-kernel, stable
Cc: ChunHao Lin, Heiner Kallweit, Jakub Kicinski, Sasha Levin,
nic_swsd, andrew+netdev, davem, edumazet, pabeni, netdev
From: ChunHao Lin <hau@realtek.com>
[ Upstream commit b48688ea3c9ac8d5d910c6e91fb7f80d846581f0 ]
Disable it because it does not meet the ZRX-DC specification. If it is
enabled, the device will exit the L1 substate every 100ms. Disabling it
saves more power in the L1 substate.
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-3-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/realtek/r8169_main.c | 27 +++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 5ed2818bac257..3420b6cf8189f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2850,6 +2850,32 @@ static u32 rtl_csi_read(struct rtl8169_private *tp, int addr)
RTL_R32(tp, CSIDR) : ~0;
}
+static void rtl_disable_zrxdc_timeout(struct rtl8169_private *tp)
+{
+ struct pci_dev *pdev = tp->pci_dev;
+ u32 csi;
+ int rc;
+ u8 val;
+
+#define RTL_GEN3_RELATED_OFF 0x0890
+#define RTL_GEN3_ZRXDC_NONCOMPL 0x1
+ if (pdev->cfg_size > RTL_GEN3_RELATED_OFF) {
+ rc = pci_read_config_byte(pdev, RTL_GEN3_RELATED_OFF, &val);
+ if (rc == PCIBIOS_SUCCESSFUL) {
+ val &= ~RTL_GEN3_ZRXDC_NONCOMPL;
+ rc = pci_write_config_byte(pdev, RTL_GEN3_RELATED_OFF,
+ val);
+ if (rc == PCIBIOS_SUCCESSFUL)
+ return;
+ }
+ }
+
+ netdev_notice_once(tp->dev,
+ "No native access to PCI extended config space, falling back to CSI\n");
+ csi = rtl_csi_read(tp, RTL_GEN3_RELATED_OFF);
+ rtl_csi_write(tp, RTL_GEN3_RELATED_OFF, csi & ~RTL_GEN3_ZRXDC_NONCOMPL);
+}
+
static void rtl_set_aspm_entry_latency(struct rtl8169_private *tp, u8 val)
{
struct pci_dev *pdev = tp->pci_dev;
@@ -3816,6 +3842,7 @@ static void rtl_hw_start_8125b(struct rtl8169_private *tp)
static void rtl_hw_start_8126a(struct rtl8169_private *tp)
{
+ rtl_disable_zrxdc_timeout(tp);
rtl_set_def_aspm_entry_latency(tp);
rtl_hw_start_8125_common(tp);
}
--
2.39.5
* [PATCH AUTOSEL 6.12 092/486] bnxt_en: Query FW parameters when the CAPS_CHANGE bit is set
From: Sasha Levin @ 2025-05-05 22:32 UTC
To: linux-kernel, stable
Cc: shantiprasad shettar, Somnath Kotur, Pavan Chebbi, Michael Chan,
Jacob Keller, Paolo Abeni, Sasha Levin, andrew+netdev, davem,
edumazet, kuba, netdev
From: shantiprasad shettar <shantiprasad.shettar@broadcom.com>
[ Upstream commit a6c81e32aeacbfd530d576fa401edd506ec966ef ]
Newer FW can set the CAPS_CHANGE flag during ifup if some capabilities
or configurations have changed. For example, the CoS queue
configurations may have changed. Support this new flag by treating it
almost like FW reset. The driver will essentially rediscover all
features and capabilities, reconfigure all backing store context memory,
reset everything to default, and reserve all resources.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: shantiprasad shettar <shantiprasad.shettar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-5-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 016dcfec8d496..63b674cae892c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -11776,6 +11776,7 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
struct hwrm_func_drv_if_change_input *req;
bool fw_reset = !bp->irq_tbl;
bool resc_reinit = false;
+ bool caps_change = false;
int rc, retry = 0;
u32 flags = 0;
@@ -11831,8 +11832,11 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 		set_bit(BNXT_STATE_ABORT_ERR, &bp->state);
 		return -ENODEV;
 	}
-	if (resc_reinit || fw_reset) {
-		if (fw_reset) {
+	if (flags & FUNC_DRV_IF_CHANGE_RESP_FLAGS_CAPS_CHANGE)
+		caps_change = true;
+
+	if (resc_reinit || fw_reset || caps_change) {
+		if (fw_reset || caps_change) {
 			set_bit(BNXT_STATE_FW_RESET_DET, &bp->state);
 			if (!test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
 				bnxt_ulp_irq_stop(bp);
--
2.39.5
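The two nested conditions changed by this patch can be summarised in a small model. This is a sketch under assumptions: struct model_if_change and model_if_change() are hypothetical names, and the three booleans stand in for the driver's resc_reinit, fw_reset and the new caps_change locals.

```c
#include <stdbool.h>

/* Which of the two branches in bnxt_hwrm_if_change() would be taken. */
struct model_if_change {
	bool reinit;		/* outer branch: resources are re-reserved */
	bool treat_as_reset;	/* inner branch: handled like a FW reset */
};

static struct model_if_change model_if_change(bool resc_reinit, bool fw_reset,
					       bool caps_change)
{
	struct model_if_change out = { false, false };

	if (resc_reinit || fw_reset || caps_change) {
		out.reinit = true;
		/* CAPS_CHANGE now takes the same inner path as a FW reset. */
		if (fw_reset || caps_change)
			out.treat_as_reset = true;
	}
	return out;
}
```

A capability change on its own is therefore enough to trigger the full rediscovery path, while a plain resource change still takes only the outer branch.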
* [PATCH AUTOSEL 6.12 103/486] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered()
From: Sasha Levin @ 2025-05-05 22:32 UTC
To: linux-kernel, stable
Cc: Ilpo Järvinen, Chia-Yu Chang, David S . Miller, Sasha Levin,
edumazet, ncardwell, dsahern, kuba, pabeni, netdev
From: Ilpo Järvinen <ij@kernel.org>
[ Upstream commit 149dfb31615e22271d2525f078c95ea49bc4db24 ]
- Move tcp_count_delivered() earlier and split tcp_count_delivered_ce()
  out of it
- Move tcp_in_ack_event() later
- While at it, remove the inline from tcp_in_ack_event() and let
  the compiler decide
Accurate ECN's heuristics do not know whether there is going
to be an ACE field based CE counter increase until after the
rtx queue has been processed. Only then is the number of ACKed
bytes/pkts available. As CE or not affects the presence of
FLAG_ECE, that information is not yet available to
tcp_in_ack_event() at the old location of the call.
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/tcp_input.c | 56 +++++++++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 24 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d93a5a89c5692..d29219e067b7f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -419,6 +419,20 @@ static bool tcp_ecn_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr
return false;
}
+static void tcp_count_delivered_ce(struct tcp_sock *tp, u32 ecn_count)
+{
+ tp->delivered_ce += ecn_count;
+}
+
+/* Updates the delivered and delivered_ce counts */
+static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
+ bool ece_ack)
+{
+ tp->delivered += delivered;
+ if (ece_ack)
+ tcp_count_delivered_ce(tp, delivered);
+}
+
/* Buffer size and advertised window tuning.
*
* 1. Tuning sk->sk_sndbuf, when connection enters established state.
@@ -1154,15 +1168,6 @@ void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
}
}
-/* Updates the delivered and delivered_ce counts */
-static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
- bool ece_ack)
-{
- tp->delivered += delivered;
- if (ece_ack)
- tp->delivered_ce += delivered;
-}
-
/* This procedure tags the retransmission queue when SACKs arrive.
*
* We have three tag bits: SACKED(S), RETRANS(R) and LOST(L).
@@ -3862,12 +3867,23 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
}
}
-static inline void tcp_in_ack_event(struct sock *sk, u32 flags)
+static void tcp_in_ack_event(struct sock *sk, int flag)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
- if (icsk->icsk_ca_ops->in_ack_event)
- icsk->icsk_ca_ops->in_ack_event(sk, flags);
+ if (icsk->icsk_ca_ops->in_ack_event) {
+ u32 ack_ev_flags = 0;
+
+ if (flag & FLAG_WIN_UPDATE)
+ ack_ev_flags |= CA_ACK_WIN_UPDATE;
+ if (flag & FLAG_SLOWPATH) {
+ ack_ev_flags |= CA_ACK_SLOWPATH;
+ if (flag & FLAG_ECE)
+ ack_ev_flags |= CA_ACK_ECE;
+ }
+
+ icsk->icsk_ca_ops->in_ack_event(sk, ack_ev_flags);
+ }
}
/* Congestion control has updated the cwnd already. So if we're in
@@ -3984,12 +4000,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tcp_snd_una_update(tp, ack);
flag |= FLAG_WIN_UPDATE;
- tcp_in_ack_event(sk, CA_ACK_WIN_UPDATE);
-
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPACKS);
} else {
- u32 ack_ev_flags = CA_ACK_SLOWPATH;
-
if (ack_seq != TCP_SKB_CB(skb)->end_seq)
flag |= FLAG_DATA;
else
@@ -4001,19 +4013,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
&sack_state);
- if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb))) {
+ if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb)))
flag |= FLAG_ECE;
- ack_ev_flags |= CA_ACK_ECE;
- }
if (sack_state.sack_delivered)
tcp_count_delivered(tp, sack_state.sack_delivered,
flag & FLAG_ECE);
-
- if (flag & FLAG_WIN_UPDATE)
- ack_ev_flags |= CA_ACK_WIN_UPDATE;
-
- tcp_in_ack_event(sk, ack_ev_flags);
}
/* This is a deviation from RFC3168 since it states that:
@@ -4040,6 +4045,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tcp_rack_update_reo_wnd(sk, &rs);
+ tcp_in_ack_event(sk, flag);
+
if (tp->tlp_high_seq)
tcp_process_tlp_ack(sk, ack, flag);
@@ -4071,6 +4078,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
return 1;
no_queue:
+ tcp_in_ack_event(sk, flag);
/* If data was DSACKed, see if we can undo a cwnd reduction. */
if (flag & FLAG_DSACKING_ACK) {
tcp_fastretrans_alert(sk, prior_snd_una, num_dupack, &flag,
--
2.39.5
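After this patch, tcp_in_ack_event() derives the congestion-control event flags from the accumulated ACK processing flags in one place. The following userspace sketch mirrors that translation; the MODEL_* names and their numeric values are assumptions chosen for illustration and should not be read as the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative stand-ins for the FLAG_* bits used by tcp_ack(). */
#define MODEL_FLAG_WIN_UPDATE	0x02
#define MODEL_FLAG_ECE		0x40
#define MODEL_FLAG_SLOWPATH	0x100

/* Illustrative stand-ins for the CA_ACK_* bits seen by congestion control. */
#define MODEL_CA_ACK_SLOWPATH	(1u << 0)
#define MODEL_CA_ACK_WIN_UPDATE	(1u << 1)
#define MODEL_CA_ACK_ECE	(1u << 2)

/* Translate ACK processing flags into congestion-control event flags. */
static uint32_t model_ack_ev_flags(int flag)
{
	uint32_t ack_ev_flags = 0;

	if (flag & MODEL_FLAG_WIN_UPDATE)
		ack_ev_flags |= MODEL_CA_ACK_WIN_UPDATE;
	if (flag & MODEL_FLAG_SLOWPATH) {
		ack_ev_flags |= MODEL_CA_ACK_SLOWPATH;
		/* ECE is only reported for ACKs handled on the slow path. */
		if (flag & MODEL_FLAG_ECE)
			ack_ev_flags |= MODEL_CA_ACK_ECE;
	}
	return ack_ev_flags;
}
```

Because the translation reads the final flag word, it can run after the rtx queue has been processed, which is what allows the call to move later in tcp_ack().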
* [PATCH AUTOSEL 6.12 116/486] net/smc: use the correct ndev to find pnetid by pnetid table
From: Sasha Levin @ 2025-05-05 22:33 UTC
To: linux-kernel, stable
Cc: Guangguan Wang, Wenjia Zhang, Halil Pasic, David S . Miller,
Sasha Levin, jaka, edumazet, kuba, pabeni, linux-rdma, linux-s390,
netdev
From: Guangguan Wang <guangguan.wang@linux.alibaba.com>
[ Upstream commit bfc6c67ec2d64d0ca4e5cc3e1ac84298a10b8d62 ]
When smc_pnet is used in SMC, the pnetid (both the HW PNETID and the
user-defined sw pnetid) is searched for only in the base_ndev of the
netdev hierarchy. This does not work in some scenarios where SMC is
used in containers in a cloud environment.
Containers can choose among different container networks, such as
using the host network directly, or virtual networks such as IPVLAN,
veth, etc. Different container network choices result in different
netdev hierarchies. Examples of netdev hierarchies are shown below.
(eth0 and eth1 in the host below are the netdevs directly related to
the physical device.)
 _______________________________
|   _________________           |
|  |POD              |          |
|  |                 |          |
|  | eth0_________   |          |
|  |____|         |__|          |
|       |         |             |
|       |         |             |
|   eth1|base_ndev| eth0_______ |
|       |         |    | RDMA  ||
| host  |_________|    |_______||
---------------------------------
 netdev hierarchy if directly using host network
 _______________________________
|   _________________           |
|  |POD  __________  |          |
|  |    |upper_ndev| |          |
|  |eth0|__________| |          |
|  |_______|_________|          |
|          |lower netdev        |
|        __|______              |
|   eth1|         | eth0_______ |
|       |base_ndev|    | RDMA  ||
| host  |_________|    |_______||
---------------------------------
 netdev hierarchy if using IPVLAN
 _______________________________
|   _____________________       |
|  |POD        _________ |      |
|  |          |base_ndev||      |
|  |eth0(veth)|_________||      |
|  |____________|________|      |
|               |pairs          |
|        _______|_              |
|       |         | eth0_______ |
|   veth|base_ndev|    | RDMA  ||
|       |_________|    |_______||
|        _________              |
|   eth1|base_ndev|             |
| host  |_________|             |
---------------------------------
 netdev hierarchy if using veth
Due to some reasons, the eth1 in host is not RDMA attached netdevice,
pnetid is needed to map the eth1(in host) with RDMA device so that POD
can do SMC-R. Because the eth1(in host) is managed by CNI plugin(such
as Terway, network management plugin in container environment), and in
cloud environment the eth(in host) can dynamically be inserted by CNI
when POD create and dynamically be removed by CNI when POD destroy and
no POD related to the eth(in host) anymore. It is hard to config the
pnetid to the eth1(in host). But it is easy to config the pnetid to the
netdevice which can be seen in POD. When do SMC-R, both the container
directly using host network and the container using veth network can
successfully match the RDMA device, because the configured pnetid netdev
is a base_ndev. But the container using IPVLAN can not successfully
match the RDMA device and 0x03030000 fallback happens, because the
configured pnetid netdev is not a base_ndev. Additionally, if config
pnetid to the eth1(in host) also can not work for matching RDMA device
when using veth network and doing SMC-R in POD.
To resolve the problems listed above, this patch extends the search to the
user-defined sw pnetid of the clc handshake ndev when no pnetid can be found
in the base_ndev. The base_ndev takes precedence over the ndev for backward
compatibility. This patch also unifies the pnetid setup for the different
container network choices listed above (configure the user-defined sw
pnetid on the netdevice that is visible in the POD).
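The lookup order described above can be sketched in userspace C. This is
only an illustrative model, not the kernel code: struct ndev_model,
find_base(), find_pnetid() and PNETID_LEN are hypothetical stand-ins for
the netdevice, pnet_find_base_ndev(), the combined lookup and
SMC_MAX_PNETID_LEN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PNETID_LEN 16 /* hypothetical; stands in for SMC_MAX_PNETID_LEN */

/* Hypothetical model of a netdevice and the pnetids attached to it. */
struct ndev_model {
	const char *hw_pnetid;    /* pnetid reported by the device, or NULL */
	const char *table_pnetid; /* user-defined sw pnetid, or NULL */
	struct ndev_model *lower; /* lower device, NULL for a base_ndev */
};

/* Walk down to the lowest device, in the spirit of pnet_find_base_ndev(). */
static struct ndev_model *find_base(struct ndev_model *ndev)
{
	while (ndev->lower)
		ndev = ndev->lower;
	return ndev;
}

/*
 * Return 0 and fill 'out' if a pnetid is found, -1 otherwise.
 * Order: base_ndev hardware pnetid, base_ndev table entry, then the
 * handshake ndev table entry, so the base_ndev keeps precedence.
 */
static int find_pnetid(struct ndev_model *ndev, char out[PNETID_LEN])
{
	struct ndev_model *base;
	const char *found = NULL;

	assert(ndev && out);
	base = find_base(ndev);

	if (base->hw_pnetid)
		found = base->hw_pnetid;
	else if (base->table_pnetid)
		found = base->table_pnetid;
	else if (ndev->table_pnetid)
		found = ndev->table_pnetid;

	if (!found)
		return -1;
	strncpy(out, found, PNETID_LEN - 1);
	out[PNETID_LEN - 1] = '\0';
	return 0;
}
```

In this model an IPVLAN device stacked on eth1 resolves its own table
entry only when eth1 provides nothing, which matches the backward
compatibility argument in the text.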
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/smc/smc_pnet.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index 716808f374a8d..b391c2ef463f2 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -1079,14 +1079,16 @@ static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev,
struct smc_init_info *ini)
{
u8 ndev_pnetid[SMC_MAX_PNETID_LEN];
+ struct net_device *base_ndev;
struct net *net;
- ndev = pnet_find_base_ndev(ndev);
+ base_ndev = pnet_find_base_ndev(ndev);
net = dev_net(ndev);
- if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port,
+ if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port,
ndev_pnetid) &&
+ smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid) &&
smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) {
- smc_pnet_find_rdma_dev(ndev, ini);
+ smc_pnet_find_rdma_dev(base_ndev, ini);
return; /* pnetid could not be determined */
}
_smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);
--
2.39.5
* [PATCH AUTOSEL 6.12 131/486] net: stmmac: dwmac-rk: Validate GRF and peripheral GRF during probe
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (9 preceding siblings ...)
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 116/486] net/smc: use the correct ndev to find pnetid by pnetid table Sasha Levin
@ 2025-05-05 22:33 ` Sasha Levin
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 132/486] net: hsr: Fix PRP duplicate detection Sasha Levin
` (56 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:33 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jonas Karlman, Simon Horman, Sebastian Reichel, Paolo Abeni,
Sasha Levin, andrew+netdev, davem, edumazet, kuba,
mcoquelin.stm32, alexandre.torgue, rmk+kernel, david.wu,
jan.petrous, wens, netdev, linux-stm32, linux-arm-kernel
From: Jonas Karlman <jonas@kwiboo.se>
[ Upstream commit 247e84f66a3d1946193d739fec5dc3d69833fd00 ]
All Rockchip GMAC variants typically write to GRF regs to control e.g.
interface mode, speed and MAC rx/tx delay. Newer SoCs such as RK3576 and
RK3588 use a mix of GRF and peripheral GRF regs. These syscon regmaps are
located with the help of a rockchip,grf and rockchip,php-grf phandle.
However, validating the rockchip,grf and rockchip,php-grf syscon regmap
is deferred until e.g. interface mode or speed is configured, inside the
individual SoC specific operations.
Change to validate the rockchip,grf and rockchip,php-grf syscon regmap
at probe time to simplify all SoC specific operations.
This should not introduce any backward compatibility issues as all
GMAC nodes have been added together with a rockchip,grf phandle (and
rockchip,php-grf where required) in their initial commit.
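The idea of validating both regmaps once, up front, might be modeled in
userspace C as below. All names here (struct soc_ops_model, struct
priv_model, setup_validate()) are hypothetical, a NULL pointer stands in
for a failed syscon lookup, and -ENODEV is an assumed error code chosen
only for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-SoC ops table. */
struct soc_ops_model {
	bool php_grf_required;
};

/* Hypothetical stand-in for the driver private data. */
struct priv_model {
	const void *grf;     /* always required */
	const void *php_grf; /* required only on some SoCs */
};

/*
 * Validate the regmaps once, at setup time. Returns 0 on success or a
 * negative errno, so later per-SoC helpers can use the pointers without
 * re-checking them.
 */
static int setup_validate(const struct soc_ops_model *ops,
			  struct priv_model *priv,
			  const void *grf, const void *php_grf)
{
	assert(ops && priv);

	if (!grf)
		return -ENODEV;
	priv->grf = grf;

	/* The peripheral GRF is only looked at when the SoC needs it. */
	priv->php_grf = NULL;
	if (ops->php_grf_required) {
		if (!php_grf)
			return -ENODEV;
		priv->php_grf = php_grf;
	}
	return 0;
}
```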
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250308213720.2517944-3-jonas@kwiboo.se
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../net/ethernet/stmicro/stmmac/dwmac-rk.c | 21 +++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 50073bdade46e..8f90eae937741 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -33,6 +33,7 @@ struct rk_gmac_ops {
void (*set_clock_selection)(struct rk_priv_data *bsp_priv, bool input,
bool enable);
void (*integrated_phy_powerup)(struct rk_priv_data *bsp_priv);
+ bool php_grf_required;
bool regs_valid;
u32 regs[];
};
@@ -1263,6 +1264,7 @@ static const struct rk_gmac_ops rk3576_ops = {
.set_rgmii_speed = rk3576_set_gmac_speed,
.set_rmii_speed = rk3576_set_gmac_speed,
.set_clock_selection = rk3576_set_clock_selection,
+ .php_grf_required = true,
.regs_valid = true,
.regs = {
0x2a220000, /* gmac0 */
@@ -1410,6 +1412,7 @@ static const struct rk_gmac_ops rk3588_ops = {
.set_rgmii_speed = rk3588_set_gmac_speed,
.set_rmii_speed = rk3588_set_gmac_speed,
.set_clock_selection = rk3588_set_clock_selection,
+ .php_grf_required = true,
.regs_valid = true,
.regs = {
0xfe1b0000, /* gmac0 */
@@ -1830,8 +1833,22 @@ static struct rk_priv_data *rk_gmac_setup(struct platform_device *pdev,
bsp_priv->grf = syscon_regmap_lookup_by_phandle(dev->of_node,
"rockchip,grf");
- bsp_priv->php_grf = syscon_regmap_lookup_by_phandle(dev->of_node,
- "rockchip,php-grf");
+ if (IS_ERR(bsp_priv->grf)) {
+ dev_err_probe(dev, PTR_ERR(bsp_priv->grf),
+ "failed to lookup rockchip,grf\n");
+ return ERR_CAST(bsp_priv->grf);
+ }
+
+ if (ops->php_grf_required) {
+ bsp_priv->php_grf =
+ syscon_regmap_lookup_by_phandle(dev->of_node,
+ "rockchip,php-grf");
+ if (IS_ERR(bsp_priv->php_grf)) {
+ dev_err_probe(dev, PTR_ERR(bsp_priv->php_grf),
+ "failed to lookup rockchip,php-grf\n");
+ return ERR_CAST(bsp_priv->php_grf);
+ }
+ }
if (plat->phy_node) {
bsp_priv->integrated_phy = of_property_read_bool(plat->phy_node,
--
2.39.5
* [PATCH AUTOSEL 6.12 132/486] net: hsr: Fix PRP duplicate detection
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (10 preceding siblings ...)
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 131/486] net: stmmac: dwmac-rk: Validate GRF and peripheral GRF during probe Sasha Levin
@ 2025-05-05 22:33 ` Sasha Levin
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 135/486] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
` (55 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:33 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jaakko Karrenpalo, Simon Horman, Paolo Abeni, Sasha Levin, davem,
edumazet, kuba, lukma, aleksander.lobakin, sdf, w-kwok2,
m-karicheri2, danishanwar, wojciech.drewek, netdev
From: Jaakko Karrenpalo <jkarrenpalo@gmail.com>
[ Upstream commit 05fd00e5e7b1ac60d264f72423fba38cc382b447 ]
Add PRP specific function for handling duplicate
packets. This is needed because of potential
L2 802.1p prioritization done by network switches.
The L2 prioritization can re-order the PRP packets
from a node causing the existing implementation to
discard the frame(s) that have been received 'late'
because the sequence number is before the previous
received packet. This can happen if the node is
sending multiple frames back-to-back with different
priority.
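A simplified userspace model of the drop-window check may help when
reading the diff below. It follows the structure of
prp_register_frame_out() but leaves out the locking, the
HSR_ENTRY_FORGET_TIME timeout and the master/slave port handling. The
names struct prp_node_model, seq_after() and prp_is_duplicate() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DROP_WINDOW_LEN 32768 /* mirrors PRP_DROP_WINDOW_LEN in the patch */

enum lan { LAN_A = 0, LAN_B = 1 };

/* Per-sender state: one drop window [seq_start, seq_expected) per LAN. */
struct prp_node_model {
	uint16_t seq_start[2];
	uint16_t seq_expected[2];
};

/* Wraparound-aware "a is after b" for 16-bit sequence numbers. */
static bool seq_after(uint16_t a, uint16_t b)
{
	if ((int)b - (int)a == 32768)
		return false;
	return (int16_t)(uint16_t)(b - a) < 0;
}

/* Returns true if the frame is a duplicate and should be dropped. */
static bool prp_is_duplicate(struct prp_node_model *n, enum lan rcv,
			     uint16_t seq)
{
	enum lan other = (rcv == LAN_A) ? LAN_B : LAN_A;
	uint16_t exp = (uint16_t)(seq + 1);
	bool both_empty;

	assert(rcv == LAN_A || rcv == LAN_B);
	both_empty = n->seq_start[rcv] == n->seq_expected[rcv] &&
		     n->seq_start[other] == n->seq_expected[other];

	if (both_empty) {
		/* Both windows are empty: start a new window on this LAN. */
		n->seq_start[rcv] = seq;
	} else if (seq_after(n->seq_expected[other], seq) &&
		   !seq_after(n->seq_start[other], seq)) {
		/* Inside the other LAN's window: drop, then shrink that
		 * window and reset ours.
		 */
		n->seq_start[other] = exp;
		n->seq_expected[rcv] = exp;
		n->seq_start[rcv] = exp;
		return true;
	}

	/* Accept: update our window and clear the other LAN's window. */
	n->seq_start[other] = n->seq_expected[other];
	n->seq_expected[rcv] = exp;
	if ((uint16_t)(exp - n->seq_start[rcv]) > DROP_WINDOW_LEN)
		n->seq_start[rcv] = (uint16_t)(exp - DROP_WINDOW_LEN);
	return false;
}
```

With both windows starting empty at 100, frame 100 is accepted on LAN A
and its copy on LAN B is dropped. A frame 102 that arrives on LAN A after
103 is still accepted, which is the re-ordering case the message
describes.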
Signed-off-by: Jaakko Karrenpalo <jkarrenpalo@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250307161700.1045-1-jkarrenpalo@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/hsr/hsr_device.c | 2 +
net/hsr/hsr_forward.c | 4 +-
net/hsr/hsr_framereg.c | 95 ++++++++++++++++++++++++++++++++++++++++--
net/hsr/hsr_framereg.h | 8 +++-
net/hsr/hsr_main.h | 2 +
5 files changed, 104 insertions(+), 7 deletions(-)
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index 44048d7538ddc..9d0754b3642fd 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -543,6 +543,7 @@ static struct hsr_proto_ops hsr_ops = {
.drop_frame = hsr_drop_frame,
.fill_frame_info = hsr_fill_frame_info,
.invalid_dan_ingress_frame = hsr_invalid_dan_ingress_frame,
+ .register_frame_out = hsr_register_frame_out,
};
static struct hsr_proto_ops prp_ops = {
@@ -553,6 +554,7 @@ static struct hsr_proto_ops prp_ops = {
.fill_frame_info = prp_fill_frame_info,
.handle_san_frame = prp_handle_san_frame,
.update_san_info = prp_update_san_info,
+ .register_frame_out = prp_register_frame_out,
};
void hsr_dev_setup(struct net_device *dev)
diff --git a/net/hsr/hsr_forward.c b/net/hsr/hsr_forward.c
index c0217476eb17f..ace4e355d1647 100644
--- a/net/hsr/hsr_forward.c
+++ b/net/hsr/hsr_forward.c
@@ -524,8 +524,8 @@ static void hsr_forward_do(struct hsr_frame_info *frame)
* Also for SAN, this shouldn't be done.
*/
if (!frame->is_from_san &&
- hsr_register_frame_out(port, frame->node_src,
- frame->sequence_nr))
+ hsr->proto_ops->register_frame_out &&
+ hsr->proto_ops->register_frame_out(port, frame))
continue;
if (frame->is_supervision && port->type == HSR_PT_MASTER &&
diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 73bc6f659812f..85991fab7db58 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -35,6 +35,7 @@ static bool seq_nr_after(u16 a, u16 b)
#define seq_nr_before(a, b) seq_nr_after((b), (a))
#define seq_nr_before_or_eq(a, b) (!seq_nr_after((a), (b)))
+#define PRP_DROP_WINDOW_LEN 32768
bool hsr_addr_is_redbox(struct hsr_priv *hsr, unsigned char *addr)
{
@@ -176,8 +177,11 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
new_node->time_in[i] = now;
new_node->time_out[i] = now;
}
- for (i = 0; i < HSR_PT_PORTS; i++)
+ for (i = 0; i < HSR_PT_PORTS; i++) {
new_node->seq_out[i] = seq_out;
+ new_node->seq_expected[i] = seq_out + 1;
+ new_node->seq_start[i] = seq_out + 1;
+ }
if (san && hsr->proto_ops->handle_san_frame)
hsr->proto_ops->handle_san_frame(san, rx_port, new_node);
@@ -482,9 +486,11 @@ void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
* 0 otherwise, or
* negative error code on error
*/
-int hsr_register_frame_out(struct hsr_port *port, struct hsr_node *node,
- u16 sequence_nr)
+int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
{
+ struct hsr_node *node = frame->node_src;
+ u16 sequence_nr = frame->sequence_nr;
+
spin_lock_bh(&node->seq_out_lock);
if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]) &&
time_is_after_jiffies(node->time_out[port->type] +
@@ -499,6 +505,89 @@ int hsr_register_frame_out(struct hsr_port *port, struct hsr_node *node,
return 0;
}
+/* Adaptation of the PRP duplicate discard algorithm described in wireshark
+ * wiki (https://wiki.wireshark.org/PRP)
+ *
+ * A drop window is maintained for both LANs with start sequence set to the
+ * first sequence accepted on the LAN that has not been seen on the other LAN,
+ * and expected sequence set to the latest received sequence number plus one.
+ *
+ * When a frame is received on either LAN it is compared against the received
+ * frames on the other LAN. If it is outside the drop window of the other LAN
+ * the frame is accepted and the drop window is updated.
+ * The drop window for the other LAN is reset.
+ *
+ * 'port' is the outgoing interface
+ * 'frame' is the frame to be sent
+ *
+ * Return:
+ * 1 if frame can be shown to have been sent recently on this interface,
+ * 0 otherwise
+ */
+int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
+{
+ enum hsr_port_type other_port;
+ enum hsr_port_type rcv_port;
+ struct hsr_node *node;
+ u16 sequence_diff;
+ u16 sequence_exp;
+ u16 sequence_nr;
+
+ /* out-going frames are always in order
+ * and can be checked the same way as for HSR
+ */
+ if (frame->port_rcv->type == HSR_PT_MASTER)
+ return hsr_register_frame_out(port, frame);
+
+ /* for PRP we should only forward frames from the slave ports
+ * to the master port
+ */
+ if (port->type != HSR_PT_MASTER)
+ return 1;
+
+ node = frame->node_src;
+ sequence_nr = frame->sequence_nr;
+ sequence_exp = sequence_nr + 1;
+ rcv_port = frame->port_rcv->type;
+ other_port = rcv_port == HSR_PT_SLAVE_A ? HSR_PT_SLAVE_B :
+ HSR_PT_SLAVE_A;
+
+ spin_lock_bh(&node->seq_out_lock);
+ if (time_is_before_jiffies(node->time_out[port->type] +
+ msecs_to_jiffies(HSR_ENTRY_FORGET_TIME)) ||
+ (node->seq_start[rcv_port] == node->seq_expected[rcv_port] &&
+ node->seq_start[other_port] == node->seq_expected[other_port])) {
+ /* the node hasn't been sending for a while
+ * or both drop windows are empty, forward the frame
+ */
+ node->seq_start[rcv_port] = sequence_nr;
+ } else if (seq_nr_before(sequence_nr, node->seq_expected[other_port]) &&
+ seq_nr_before_or_eq(node->seq_start[other_port], sequence_nr)) {
+ /* drop the frame, update the drop window for the other port
+ * and reset our drop window
+ */
+ node->seq_start[other_port] = sequence_exp;
+ node->seq_expected[rcv_port] = sequence_exp;
+ node->seq_start[rcv_port] = node->seq_expected[rcv_port];
+ spin_unlock_bh(&node->seq_out_lock);
+ return 1;
+ }
+
+ /* update the drop window for the port where this frame was received
+ * and clear the drop window for the other port
+ */
+ node->seq_start[other_port] = node->seq_expected[other_port];
+ node->seq_expected[rcv_port] = sequence_exp;
+ sequence_diff = sequence_exp - node->seq_start[rcv_port];
+ if (sequence_diff > PRP_DROP_WINDOW_LEN)
+ node->seq_start[rcv_port] = sequence_exp - PRP_DROP_WINDOW_LEN;
+
+ node->time_out[port->type] = jiffies;
+ node->seq_out[port->type] = sequence_nr;
+ spin_unlock_bh(&node->seq_out_lock);
+ return 0;
+}
+
static struct hsr_port *get_late_port(struct hsr_priv *hsr,
struct hsr_node *node)
{
diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h
index 993fa950d8144..b04948659d84d 100644
--- a/net/hsr/hsr_framereg.h
+++ b/net/hsr/hsr_framereg.h
@@ -44,8 +44,7 @@ void hsr_addr_subst_dest(struct hsr_node *node_src, struct sk_buff *skb,
void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
u16 sequence_nr);
-int hsr_register_frame_out(struct hsr_port *port, struct hsr_node *node,
- u16 sequence_nr);
+int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
void hsr_prune_nodes(struct timer_list *t);
void hsr_prune_proxy_nodes(struct timer_list *t);
@@ -73,6 +72,8 @@ void prp_update_san_info(struct hsr_node *node, bool is_sup);
bool hsr_is_node_in_db(struct list_head *node_db,
const unsigned char addr[ETH_ALEN]);
+int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
+
struct hsr_node {
struct list_head mac_list;
/* Protect R/W access to seq_out */
@@ -89,6 +90,9 @@ struct hsr_node {
bool san_b;
u16 seq_out[HSR_PT_PORTS];
bool removed;
+ /* PRP specific duplicate handling */
+ u16 seq_expected[HSR_PT_PORTS];
+ u16 seq_start[HSR_PT_PORTS];
struct rcu_head rcu_head;
};
diff --git a/net/hsr/hsr_main.h b/net/hsr/hsr_main.h
index fcfeb79bb0401..e26244456f639 100644
--- a/net/hsr/hsr_main.h
+++ b/net/hsr/hsr_main.h
@@ -183,6 +183,8 @@ struct hsr_proto_ops {
struct hsr_frame_info *frame);
bool (*invalid_dan_ingress_frame)(__be16 protocol);
void (*update_san_info)(struct hsr_node *node, bool is_sup);
+ int (*register_frame_out)(struct hsr_port *port,
+ struct hsr_frame_info *frame);
};
struct hsr_self_node {
--
2.39.5
* [PATCH AUTOSEL 6.12 135/486] netfilter: conntrack: Bound nf_conntrack sysctl writes
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (11 preceding siblings ...)
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 132/486] net: hsr: Fix PRP duplicate detection Sasha Levin
@ 2025-05-05 22:33 ` Sasha Levin
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 155/486] ipv6: save dontfrag in cork Sasha Levin
` (54 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:33 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Nicolas Bouchinet, Pablo Neira Ayuso, Sasha Levin, kadlec, davem,
edumazet, kuba, pabeni, netfilter-devel, coreteam, netdev
From: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
[ Upstream commit 8b6861390ffee6b8ed78b9395e3776c16fec6579 ]
The nf_conntrack_max and nf_conntrack_expect_max sysctls accepted any
negative value, which was then stored in the unsigned int variables
nf_conntrack_max and nf_ct_expect_max.
The do_proc_dointvec_conv function is supposed to limit writes handled
by the proc_dointvec proc_handler to INT_MAX, but a negative value
stored in an unsigned int becomes a very high value that exceeds this
limit.
Moreover, the nf_conntrack_expect_max sysctl documentation specifies the
minimum value is 1.
The proc_handlers have thus been updated to proc_dointvec_minmax in
order to specify the following write bounds:
* Bound nf_conntrack_max sysctl writes between SYSCTL_ZERO
and SYSCTL_INT_MAX.
* Bound nf_conntrack_expect_max sysctl writes between SYSCTL_ONE
and SYSCTL_INT_MAX, as defined in the sysctl documentation.
With this patch applied, sysctl writes outside the defined bounds
will thus lead to a write error:
```
sysctl -w net.netfilter.nf_conntrack_expect_max=-1
sysctl: setting key "net.netfilter.nf_conntrack_expect_max": Invalid argument
```
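The difference between the two handlers can be illustrated with a small
userspace model. store_unbounded() and store_bounded() are hypothetical
helpers written for this sketch; they only imitate the observable effect
of proc_dointvec and proc_dointvec_minmax on an unsigned int target.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Model of an unbounded write: the parsed int is stored straight into an
 * unsigned int, so -1 turns into UINT_MAX.
 */
static unsigned int store_unbounded(int parsed)
{
	return (unsigned int)parsed;
}

/*
 * Model of a bounded write: reject values outside [min, max] and leave
 * the target untouched on error.
 */
static int store_bounded(unsigned int *target, int parsed, int min, int max)
{
	assert(target);
	if (parsed < min || parsed > max)
		return -EINVAL;
	*target = (unsigned int)parsed;
	return 0;
}
```

The -EINVAL return corresponds to the "Invalid argument" message shown in
the sysctl output above.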
Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_conntrack_standalone.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 7d4f0fa8b609d..3ea60ff7a6a49 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -619,7 +619,9 @@ static struct ctl_table nf_ct_sysctl_table[] = {
.data = &nf_conntrack_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_INT_MAX,
},
[NF_SYSCTL_CT_COUNT] = {
.procname = "nf_conntrack_count",
@@ -655,7 +657,9 @@ static struct ctl_table nf_ct_sysctl_table[] = {
.data = &nf_ct_expect_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ONE,
+ .extra2 = SYSCTL_INT_MAX,
},
[NF_SYSCTL_CT_ACCT] = {
.procname = "nf_conntrack_acct",
@@ -948,7 +952,9 @@ static struct ctl_table nf_ct_netfilter_table[] = {
.data = &nf_conntrack_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_INT_MAX,
},
};
--
2.39.5
* [PATCH AUTOSEL 6.12 155/486] ipv6: save dontfrag in cork
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (12 preceding siblings ...)
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 135/486] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
@ 2025-05-05 22:33 ` Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 180/486] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
` (53 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:33 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Willem de Bruijn, Eric Dumazet, Jakub Kicinski, Sasha Levin,
davem, dsahern, pabeni, netdev
From: Willem de Bruijn <willemb@google.com>
[ Upstream commit a18dfa9925b9ef6107ea3aa5814ca3c704d34a8a ]
When spanning datagram construction over multiple send calls using
MSG_MORE, per datagram settings are configured on the first send.
That is when ip(6)_setup_cork stores these settings for subsequent use
in __ip(6)_append_data and others.
The only flag that escaped this was dontfrag. As a result, a datagram
could be constructed with df=0 on the first sendmsg, but df=1 on a
next. Which is what cmsg_ip.sh does in an upcoming MSG_MORE test in
the "diff" scenario.
Changing datagram conditions in the middle of constructing an skb
makes this already complex code path even more convoluted. It is here
unintentional. Bring this flag in line with expected sockopt/cmsg
behavior.
And stop passing ipc6 to __ip6_append_data, to avoid such issues
in the future. This is already the case for __ip_append_data.
inet6_cork had a 6 byte hole, so the 1B flag has no impact.
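The corking behavior described above might be modeled as follows. This is
a userspace sketch under simplifying assumptions: struct call_opts and
struct cork_model are hypothetical stand-ins for struct ipcm6_cookie and
struct inet6_cork, header sizes are ignored, and -EMSGSIZE is used as the
error for an oversized dontfrag datagram.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-call settings, standing in for struct ipcm6_cookie. */
struct call_opts {
	bool dontfrag;
};

/* Hypothetical corked state, standing in for struct inet6_cork. */
struct cork_model {
	bool active;
	bool dontfrag;
	size_t length;
};

/* First send of a datagram: latch the per-datagram settings. */
static void setup_cork(struct cork_model *cork, const struct call_opts *opts)
{
	cork->active = true;
	cork->dontfrag = opts->dontfrag;
	cork->length = 0;
}

/*
 * Append data to the corked datagram. Only the corked dontfrag value is
 * consulted, so the options of later calls cannot change it mid-datagram.
 */
static int append_data(struct cork_model *cork, const struct call_opts *opts,
		       size_t len, size_t mtu)
{
	assert(cork && opts);
	if (!cork->active)
		setup_cork(cork, opts);
	if (cork->length + len > mtu && cork->dontfrag)
		return -EMSGSIZE;
	cork->length += len;
	return 0;
}
```

A datagram started with dontfrag clear keeps growing past the MTU even if
a later call sets the flag, and one started with dontfrag set is rejected
even if a later call clears it.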
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250307033620.411611-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/ipv6.h | 1 +
net/ipv6/ip6_output.c | 9 +++++----
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index a6e2aadbb91bd..5aeeed22f35bf 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -207,6 +207,7 @@ struct inet6_cork {
struct ipv6_txoptions *opt;
u8 hop_limit;
u8 tclass;
+ u8 dontfrag:1;
};
/* struct ipv6_pinfo - ipv6 private area */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 434ddf263b88a..89a61e040e6a1 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1386,6 +1386,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
}
v6_cork->hop_limit = ipc6->hlimit;
v6_cork->tclass = ipc6->tclass;
+ v6_cork->dontfrag = ipc6->dontfrag;
if (rt->dst.flags & DST_XFRM_TUNNEL)
mtu = READ_ONCE(np->pmtudisc) >= IPV6_PMTUDISC_PROBE ?
READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
@@ -1417,7 +1418,7 @@ static int __ip6_append_data(struct sock *sk,
int getfrag(void *from, char *to, int offset,
int len, int odd, struct sk_buff *skb),
void *from, size_t length, int transhdrlen,
- unsigned int flags, struct ipcm6_cookie *ipc6)
+ unsigned int flags)
{
struct sk_buff *skb, *skb_prev = NULL;
struct inet_cork *cork = &cork_full->base;
@@ -1471,7 +1472,7 @@ static int __ip6_append_data(struct sock *sk,
if (headersize + transhdrlen > mtu)
goto emsgsize;
- if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
+ if (cork->length + length > mtu - headersize && v6_cork->dontfrag &&
(sk->sk_protocol == IPPROTO_UDP ||
sk->sk_protocol == IPPROTO_ICMPV6 ||
sk->sk_protocol == IPPROTO_RAW)) {
@@ -1843,7 +1844,7 @@ int ip6_append_data(struct sock *sk,
return __ip6_append_data(sk, &sk->sk_write_queue, &inet->cork,
&np->cork, sk_page_frag(sk), getfrag,
- from, length, transhdrlen, flags, ipc6);
+ from, length, transhdrlen, flags);
}
EXPORT_SYMBOL_GPL(ip6_append_data);
@@ -2048,7 +2049,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
err = __ip6_append_data(sk, &queue, cork, &v6_cork,
&current->task_frag, getfrag, from,
length + exthdrlen, transhdrlen + exthdrlen,
- flags, ipc6);
+ flags);
if (err) {
__ip6_flush_pending_frames(sk, &queue, cork, &v6_cork);
return ERR_PTR(err);
--
2.39.5
* [PATCH AUTOSEL 6.12 180/486] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (13 preceding siblings ...)
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 155/486] ipv6: save dontfrag in cork Sasha Levin
@ 2025-05-05 22:34 ` Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 182/486] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
` (52 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:34 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Eric Dumazet, Jason Xing, Kuniyuki Iwashima, Jakub Kicinski,
Sasha Levin, ncardwell, davem, dsahern, pabeni, netdev
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit f8ece40786c9342249aa0a1b55e148ee23b2a746 ]
We have platforms with 6 NUMA nodes and 480 cpus.
inet_ehash_locks_alloc() currently allocates a single 64KB page
to hold all ehash spinlocks. This adds more pressure on a single node.
Change inet_ehash_locks_alloc() to use vmalloc() to spread
the spinlocks on all online nodes, driven by NUMA policies.
At boot time, NUMA policy is interleave=all, meaning that
tcp_hashinfo.ehash_locks gets hash dispersion on all nodes.
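The lock count computed by the patched function can be reproduced with a
small userspace model. ehash_nblocks() and its helpers are hypothetical
names, and every platform value is passed in explicitly. The message does
not state sizeof(spinlock_t), L1_CACHE_BYTES or PAGE_SIZE, so the values
4, 64 and 4096 used in the note below are assumptions.

```c
#include <assert.h>

/* Round up to the next power of two (v must be non-zero). */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	assert(v != 0);
	while (p < v)
		p <<= 1;
	return p;
}

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Userspace model of the number of locks the patched code allocates. */
static unsigned int ehash_nblocks(unsigned int locksz, unsigned int l1_bytes,
				  unsigned int cpus, unsigned int nodes,
				  unsigned int page_size,
				  unsigned int ehash_buckets)
{
	unsigned int nblocks;

	if (locksz == 0)
		return 1;

	/* Two cache lines of locks, or at least one lock, per cpu. */
	nblocks = max_u(2U * l1_bytes / locksz, 1U) * cpus;
	/* At least one page per NUMA node. */
	nblocks = max_u(nblocks, nodes * page_size / locksz);
	nblocks = roundup_pow2(nblocks);
	/* No more locks than hash buckets. */
	return min_u(nblocks, ehash_buckets);
}
```

Under those assumptions, 480 cpus and 6 nodes give 16384 locks (64 KiB,
16 pages) for a large table, 8192 locks (8 pages) for 8192 buckets and
1024 locks (1 page) for 1024 buckets, which is consistent with the pages=
values in the test output below. The byte sizes listed there appear to
include one extra page per area.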
Tested:
lack5:~# grep inet_ehash_locks_alloc /proc/vmallocinfo
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# echo 8192 >/proc/sys/net/ipv4/tcp_child_ehash_entries
lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x000000004e99d30c-0x00000000763f3279 36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=1 N1=2 N2=2 N3=1 N4=1 N5=1
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# numactl --interleave=0,5 unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x00000000fd73a33e-0x0000000004b9a177 36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=4 N5=4
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# echo 1024 >/proc/sys/net/ipv4/tcp_child_ehash_entries
lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x00000000db07d7a2-0x00000000ad697d29 8192 inet_ehash_locks_alloc+0x90/0x100 pages=1 vmalloc N2=1
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250305130550.1865988-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/inet_hashtables.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e182..2b4a588247639 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -1230,22 +1230,37 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
{
unsigned int locksz = sizeof(spinlock_t);
unsigned int i, nblocks = 1;
+ spinlock_t *ptr = NULL;
- if (locksz != 0) {
- /* allocate 2 cache lines or at least one spinlock per cpu */
- nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U);
- nblocks = roundup_pow_of_two(nblocks * num_possible_cpus());
+ if (locksz == 0)
+ goto set_mask;
- /* no more locks than number of hash buckets */
- nblocks = min(nblocks, hashinfo->ehash_mask + 1);
+ /* Allocate 2 cache lines or at least one spinlock per cpu. */
+ nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U) * num_possible_cpus();
- hashinfo->ehash_locks = kvmalloc_array(nblocks, locksz, GFP_KERNEL);
- if (!hashinfo->ehash_locks)
- return -ENOMEM;
+ /* At least one page per NUMA node. */
+ nblocks = max(nblocks, num_online_nodes() * PAGE_SIZE / locksz);
+
+ nblocks = roundup_pow_of_two(nblocks);
+
+ /* No more locks than number of hash buckets. */
+ nblocks = min(nblocks, hashinfo->ehash_mask + 1);
- for (i = 0; i < nblocks; i++)
- spin_lock_init(&hashinfo->ehash_locks[i]);
+ if (num_online_nodes() > 1) {
+ /* Use vmalloc() to allow NUMA policy to spread pages
+ * on all available nodes if desired.
+ */
+ ptr = vmalloc_array(nblocks, locksz);
+ }
+ if (!ptr) {
+ ptr = kvmalloc_array(nblocks, locksz, GFP_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
}
+ for (i = 0; i < nblocks; i++)
+ spin_lock_init(&ptr[i]);
+ hashinfo->ehash_locks = ptr;
+set_mask:
hashinfo->ehash_locks_mask = nblocks - 1;
return 0;
}
--
2.39.5
* [PATCH AUTOSEL 6.12 182/486] ieee802154: ca8210: Use proper setters and getters for bitwise types
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (14 preceding siblings ...)
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 180/486] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
@ 2025-05-05 22:34 ` Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 193/486] net: phylink: use pl->link_interface in phylink_expects_phy() Sasha Levin
` (51 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:34 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Andy Shevchenko, Miquel Raynal, Linus Walleij, Stefan Schmidt,
Sasha Levin, alex.aring, andrew+netdev, davem, edumazet, kuba,
pabeni, linux-wpan, netdev
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[ Upstream commit 169b2262205836a5d1213ff44dca2962276bece1 ]
Sparse complains that the driver doesn't respect the bitwise types:
drivers/net/ieee802154/ca8210.c:1796:27: warning: incorrect type in assignment (different base types)
drivers/net/ieee802154/ca8210.c:1796:27: expected restricted __le16 [addressable] [assigned] [usertype] pan_id
drivers/net/ieee802154/ca8210.c:1796:27: got unsigned short [usertype]
drivers/net/ieee802154/ca8210.c:1801:25: warning: incorrect type in assignment (different base types)
drivers/net/ieee802154/ca8210.c:1801:25: expected restricted __le16 [addressable] [assigned] [usertype] pan_id
drivers/net/ieee802154/ca8210.c:1801:25: got unsigned short [usertype]
drivers/net/ieee802154/ca8210.c:1928:28: warning: incorrect type in argument 3 (different base types)
drivers/net/ieee802154/ca8210.c:1928:28: expected unsigned short [usertype] dst_pan_id
drivers/net/ieee802154/ca8210.c:1928:28: got restricted __le16 [addressable] [usertype] pan_id
Use proper setters and getters for bitwise types.
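For readers unfamiliar with these helpers, the following userspace sketch
shows what a little-endian 16-bit setter and getter do at the byte level.
put_le16() and get_le16() are hypothetical stand-ins written for this
example; they are not the kernel's put_unaligned_le16() and
get_unaligned_le16(), only an approximation of their effect.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit value low byte first, at any alignment. */
static void put_le16(uint16_t val, uint8_t *p)
{
	assert(p);
	p[0] = (uint8_t)(val & 0xff);
	p[1] = (uint8_t)(val >> 8);
}

/* Load a 16-bit little-endian value from any alignment. */
static uint16_t get_le16(const uint8_t *p)
{
	assert(p);
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

Unlike a cast such as *(u16 *)&data_ind[1], byte-wise access does not
depend on the host byte order or on the alignment of the buffer.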
Note, in accordance with [1] the protocol is little endian.
Link: https://www.cascoda.com/wp-content/uploads/2018/11/CA-8210_datasheet_0418.pdf [1]
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/20250305105656.2133487-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ieee802154/ca8210.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index 753215ebc67c7..a036910f60828 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -1446,8 +1446,7 @@ static u8 mcps_data_request(
command.pdata.data_req.src_addr_mode = src_addr_mode;
command.pdata.data_req.dst.mode = dst_address_mode;
if (dst_address_mode != MAC_MODE_NO_ADDR) {
- command.pdata.data_req.dst.pan_id[0] = LS_BYTE(dst_pan_id);
- command.pdata.data_req.dst.pan_id[1] = MS_BYTE(dst_pan_id);
+ put_unaligned_le16(dst_pan_id, command.pdata.data_req.dst.pan_id);
if (dst_address_mode == MAC_MODE_SHORT_ADDR) {
command.pdata.data_req.dst.address[0] = LS_BYTE(
dst_addr->short_address
@@ -1795,12 +1794,12 @@ static int ca8210_skb_rx(
}
hdr.source.mode = data_ind[0];
dev_dbg(&priv->spi->dev, "srcAddrMode: %#03x\n", hdr.source.mode);
- hdr.source.pan_id = *(u16 *)&data_ind[1];
+ hdr.source.pan_id = cpu_to_le16(get_unaligned_le16(&data_ind[1]));
dev_dbg(&priv->spi->dev, "srcPanId: %#06x\n", hdr.source.pan_id);
memcpy(&hdr.source.extended_addr, &data_ind[3], 8);
hdr.dest.mode = data_ind[11];
dev_dbg(&priv->spi->dev, "dstAddrMode: %#03x\n", hdr.dest.mode);
- hdr.dest.pan_id = *(u16 *)&data_ind[12];
+ hdr.dest.pan_id = cpu_to_le16(get_unaligned_le16(&data_ind[12]));
dev_dbg(&priv->spi->dev, "dstPanId: %#06x\n", hdr.dest.pan_id);
memcpy(&hdr.dest.extended_addr, &data_ind[14], 8);
@@ -1927,7 +1926,7 @@ static int ca8210_skb_tx(
status = mcps_data_request(
header.source.mode,
header.dest.mode,
- header.dest.pan_id,
+ le16_to_cpu(header.dest.pan_id),
(union macaddr *)&header.dest.extended_addr,
skb->len - mac_len,
&skb->data[mac_len],
--
2.39.5
* [PATCH AUTOSEL 6.12 193/486] net: phylink: use pl->link_interface in phylink_expects_phy()
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (15 preceding siblings ...)
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 182/486] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
@ 2025-05-05 22:34 ` Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 206/486] net: ethernet: ti: cpsw_new: populate netdev of_node Sasha Levin
` (50 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:34 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Choong Yong Liang, Russell King, Jakub Kicinski, Sasha Levin,
linux, andrew, hkallweit1, davem, edumazet, pabeni, netdev
From: Choong Yong Liang <yong.liang.choong@linux.intel.com>
[ Upstream commit b63263555eaafbf9ab1a82f2020bbee872d83759 ]
The phylink_expects_phy() function allows MAC drivers to check if they are
expecting a PHY to attach. The checking condition in phylink_expects_phy()
aims to achieve the same result as the checking condition in
phylink_attach_phy().
However, the checking condition in phylink_expects_phy() uses
pl->link_config.interface, while phylink_attach_phy() uses
pl->link_interface.
Initially, both pl->link_interface and pl->link_config.interface are set
to SGMII, and pl->cfg_link_an_mode is set to MLO_AN_INBAND.
When the interface switches from SGMII to 2500BASE-X,
pl->link_config.interface is updated by phylink_major_config().
At this point, pl->cfg_link_an_mode remains MLO_AN_INBAND, and
pl->link_config.interface is set to 2500BASE-X.
Subsequently, when the STMMAC interface is taken down
administratively and brought back up, it is blocked by
phylink_expects_phy().
Since phylink_expects_phy() and phylink_attach_phy() aim to achieve the
same result, phylink_expects_phy() should check pl->link_interface,
which never changes, instead of pl->link_config.interface, which is
updated by phylink_major_config().
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Link: https://patch.msgid.link/20250227121522.1802832-2-yong.liang.choong@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/phy/phylink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 3e9957b6aa148..b78dfcbec936c 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1811,7 +1811,7 @@ bool phylink_expects_phy(struct phylink *pl)
{
if (pl->cfg_link_an_mode == MLO_AN_FIXED ||
(pl->cfg_link_an_mode == MLO_AN_INBAND &&
- phy_interface_mode_is_8023z(pl->link_config.interface)))
+ phy_interface_mode_is_8023z(pl->link_interface)))
return false;
return true;
}
--
2.39.5
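The scenario from the commit message can be sketched in user space as follows. This is a simplified model under stated assumptions: the demo_* types and functions are hypothetical, and only the two interface fields and the AN mode from the description are modelled.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of the phylink state discussed above. */
enum demo_an_mode { DEMO_AN_PHY, DEMO_AN_FIXED, DEMO_AN_INBAND };
enum demo_iface { DEMO_SGMII, DEMO_1000BASEX, DEMO_2500BASEX };

struct demo_phylink {
	enum demo_an_mode cfg_link_an_mode;
	enum demo_iface link_interface;        /* fixed after creation */
	enum demo_iface link_config_interface; /* updated on major config */
};

/* Assumption: 1000BASE-X and 2500BASE-X count as 802.3z modes. */
static bool demo_is_8023z(enum demo_iface i)
{
	return i == DEMO_1000BASEX || i == DEMO_2500BASEX;
}

/* Same shape as the check in the diff; 'checked' selects which field
 * is consulted so both variants can be compared. */
static bool demo_expects_phy(const struct demo_phylink *pl,
			     enum demo_iface checked)
{
	if (pl->cfg_link_an_mode == DEMO_AN_FIXED ||
	    (pl->cfg_link_an_mode == DEMO_AN_INBAND && demo_is_8023z(checked)))
		return false;
	return true;
}
```

With link_interface left at SGMII and the runtime interface switched to 2500BASE-X, only the check against link_interface still reports that a PHY is expected.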
* [PATCH AUTOSEL 6.12 206/486] net: ethernet: ti: cpsw_new: populate netdev of_node
From: Sasha Levin @ 2025-05-05 22:34 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexander Sverdlin, Siddharth Vadapalli, Andrew Lunn,
Jakub Kicinski, Sasha Levin, andrew+netdev, davem, edumazet,
pabeni, alexander.sverdlin, hkallweit1, u.kleine-koenig, lorenzo,
aleksander.lobakin, nicolas.dichtel, linux-omap, netdev
From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
[ Upstream commit 7ff1c88fc89688c27f773ba956f65f0c11367269 ]
So that of_find_net_device_by_node() can find CPSW ports and other DSA
switches can be stacked downstream. Tested in conjunction with KSZ8873.
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/20250303074703.1758297-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/ti/cpsw_new.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 557cc71b9dd22..0eee1a0527b5c 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1417,6 +1417,7 @@ static int cpsw_create_ports(struct cpsw_common *cpsw)
ndev->netdev_ops = &cpsw_netdev_ops;
ndev->ethtool_ops = &cpsw_ethtool_ops;
SET_NETDEV_DEV(ndev, dev);
+ ndev->dev.of_node = slave_data->slave_node;
if (!napi_ndev) {
/* CPSW Host port CPDMA interface is shared between
--
2.39.5
* [PATCH AUTOSEL 6.12 207/486] net: phy: nxp-c45-tja11xx: add match_phy_device to TJA1103/TJA1104
From: Sasha Levin @ 2025-05-05 22:34 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Andrei Botila, Andrew Lunn, Jakub Kicinski, Sasha Levin,
hkallweit1, davem, edumazet, pabeni, sd, netdev
From: Andrei Botila <andrei.botila@oss.nxp.com>
[ Upstream commit a06a868a0cd96bc51401cdea897313a3f6ad01a0 ]
Add .match_phy_device for the existing TJAs to differentiate between
TJA1103 and TJA1104.
TJA1103 and TJA1104 share the same PHY_ID but TJA1104 has MACsec
capabilities while TJA1103 doesn't.
Signed-off-by: Andrei Botila <andrei.botila@oss.nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250228154320.2979000-2-andrei.botila@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/phy/nxp-c45-tja11xx.c | 54 +++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/nxp-c45-tja11xx.c b/drivers/net/phy/nxp-c45-tja11xx.c
index 9788b820c6be7..99a5eee77bec1 100644
--- a/drivers/net/phy/nxp-c45-tja11xx.c
+++ b/drivers/net/phy/nxp-c45-tja11xx.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/* NXP C45 PHY driver
- * Copyright 2021-2023 NXP
+ * Copyright 2021-2025 NXP
* Author: Radu Pirea <radu-nicolae.pirea@oss.nxp.com>
*/
@@ -18,6 +18,8 @@
#include "nxp-c45-tja11xx.h"
+#define PHY_ID_MASK GENMASK(31, 4)
+/* Same id: TJA1103, TJA1104 */
#define PHY_ID_TJA_1103 0x001BB010
#define PHY_ID_TJA_1120 0x001BB031
@@ -1930,6 +1932,30 @@ static void tja1120_nmi_handler(struct phy_device *phydev,
}
}
+static int nxp_c45_macsec_ability(struct phy_device *phydev)
+{
+ bool macsec_ability;
+ int phy_abilities;
+
+ phy_abilities = phy_read_mmd(phydev, MDIO_MMD_VEND1,
+ VEND1_PORT_ABILITIES);
+ macsec_ability = !!(phy_abilities & MACSEC_ABILITY);
+
+ return macsec_ability;
+}
+
+static int tja1103_match_phy_device(struct phy_device *phydev)
+{
+ return phy_id_compare(phydev->phy_id, PHY_ID_TJA_1103, PHY_ID_MASK) &&
+ !nxp_c45_macsec_ability(phydev);
+}
+
+static int tja1104_match_phy_device(struct phy_device *phydev)
+{
+ return phy_id_compare(phydev->phy_id, PHY_ID_TJA_1103, PHY_ID_MASK) &&
+ nxp_c45_macsec_ability(phydev);
+}
+
static const struct nxp_c45_regmap tja1120_regmap = {
.vend1_ptp_clk_period = 0x1020,
.vend1_event_msg_filt = 0x9010,
@@ -2000,7 +2026,6 @@ static const struct nxp_c45_phy_data tja1120_phy_data = {
static struct phy_driver nxp_c45_driver[] = {
{
- PHY_ID_MATCH_MODEL(PHY_ID_TJA_1103),
.name = "NXP C45 TJA1103",
.get_features = nxp_c45_get_features,
.driver_data = &tja1103_phy_data,
@@ -2022,6 +2047,31 @@ static struct phy_driver nxp_c45_driver[] = {
.get_sqi = nxp_c45_get_sqi,
.get_sqi_max = nxp_c45_get_sqi_max,
.remove = nxp_c45_remove,
+ .match_phy_device = tja1103_match_phy_device,
+ },
+ {
+ .name = "NXP C45 TJA1104",
+ .get_features = nxp_c45_get_features,
+ .driver_data = &tja1103_phy_data,
+ .probe = nxp_c45_probe,
+ .soft_reset = nxp_c45_soft_reset,
+ .config_aneg = genphy_c45_config_aneg,
+ .config_init = nxp_c45_config_init,
+ .config_intr = tja1103_config_intr,
+ .handle_interrupt = nxp_c45_handle_interrupt,
+ .read_status = genphy_c45_read_status,
+ .suspend = genphy_c45_pma_suspend,
+ .resume = genphy_c45_pma_resume,
+ .get_sset_count = nxp_c45_get_sset_count,
+ .get_strings = nxp_c45_get_strings,
+ .get_stats = nxp_c45_get_stats,
+ .cable_test_start = nxp_c45_cable_test_start,
+ .cable_test_get_status = nxp_c45_cable_test_get_status,
+ .set_loopback = genphy_c45_loopback,
+ .get_sqi = nxp_c45_get_sqi,
+ .get_sqi_max = nxp_c45_get_sqi_max,
+ .remove = nxp_c45_remove,
+ .match_phy_device = tja1104_match_phy_device,
},
{
PHY_ID_MATCH_MODEL(PHY_ID_TJA_1120),
--
2.39.5
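The matching idea in this patch (same PHY ID, distinguished by a capability bit) can be sketched in user space like this. The demo_* names and the struct are hypothetical; the real driver reads the ability bit from a vendor MMD register, which is replaced here by a plain boolean field.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PHY_ID_MASK    0xfffffff0u /* bits 31..4, as GENMASK(31, 4) */
#define DEMO_PHY_ID_TJA1103 0x001BB010u /* shared by TJA1103 and TJA1104 */

/* Hypothetical device model: ID plus the MACsec ability bit. */
struct demo_phy {
	uint32_t phy_id;
	bool macsec_ability;
};

static bool demo_id_compare(uint32_t a, uint32_t b, uint32_t mask)
{
	return (a & mask) == (b & mask);
}

static bool demo_match_tja1103(const struct demo_phy *phy)
{
	return demo_id_compare(phy->phy_id, DEMO_PHY_ID_TJA1103,
			       DEMO_PHY_ID_MASK) && !phy->macsec_ability;
}

static bool demo_match_tja1104(const struct demo_phy *phy)
{
	return demo_id_compare(phy->phy_id, DEMO_PHY_ID_TJA1103,
			       DEMO_PHY_ID_MASK) && phy->macsec_ability;
}
```

Exactly one of the two match functions accepts a device carrying the shared ID, so the two driver entries never both claim the same PHY.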
* [PATCH AUTOSEL 6.12 208/486] dpll: Add an assertion to check freq_supported_num
From: Sasha Levin @ 2025-05-05 22:34 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jiasheng Jiang, Jiri Pirko, Vadim Fedorenko, Arkadiusz Kubalewski,
Jakub Kicinski, Sasha Levin, jiri, netdev
From: Jiasheng Jiang <jiashengjiangcool@gmail.com>
[ Upstream commit 39e912a959c19338855b768eaaee2917d7841f71 ]
Since the driver is broken in the case that src->freq_supported is not
NULL but src->freq_supported_num is 0, add an assertion for it.
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250228150210.34404-1-jiashengjiangcool@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/dpll/dpll_core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
index 1877201d1aa9f..20bdc52f63a50 100644
--- a/drivers/dpll/dpll_core.c
+++ b/drivers/dpll/dpll_core.c
@@ -443,8 +443,11 @@ static void dpll_pin_prop_free(struct dpll_pin_properties *prop)
static int dpll_pin_prop_dup(const struct dpll_pin_properties *src,
struct dpll_pin_properties *dst)
{
+ if (WARN_ON(src->freq_supported && !src->freq_supported_num))
+ return -EINVAL;
+
memcpy(dst, src, sizeof(*dst));
- if (src->freq_supported && src->freq_supported_num) {
+ if (src->freq_supported) {
size_t freq_size = src->freq_supported_num *
sizeof(*src->freq_supported);
dst->freq_supported = kmemdup(src->freq_supported,
--
2.39.5
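A rough user-space sketch of the duplication logic after this change, assuming malloc() in place of kmemdup() and a silent -EINVAL in place of WARN_ON(). The struct and function names are hypothetical and only the two fields named in the commit message are modelled.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced version of the pin properties. */
struct demo_pin_props {
	unsigned int freq_supported_num;
	unsigned long long *freq_supported;
};

static int demo_prop_dup(const struct demo_pin_props *src,
			 struct demo_pin_props *dst)
{
	/* Reject the inconsistent case up front: array set, count zero. */
	if (src->freq_supported && !src->freq_supported_num)
		return -EINVAL;

	memcpy(dst, src, sizeof(*dst));
	if (src->freq_supported) {
		size_t size = src->freq_supported_num *
			      sizeof(*src->freq_supported);

		dst->freq_supported = malloc(size);
		if (!dst->freq_supported)
			return -ENOMEM;
		memcpy(dst->freq_supported, src->freq_supported, size);
	}
	return 0;
}
```

Without the early check, the inconsistent input would leave dst pointing at the caller's array instead of a private copy.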
* [PATCH AUTOSEL 6.12 212/486] net: pktgen: fix mpls maximum labels list parsing
From: Sasha Levin @ 2025-05-05 22:34 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Peter Seiderer, Simon Horman, Paolo Abeni, Sasha Levin, davem,
edumazet, kuba, netdev
From: Peter Seiderer <ps.report@gmx.net>
[ Upstream commit 2b15a0693f70d1e8119743ee89edbfb1271b3ea8 ]
Fix mpls maximum labels list parsing up to MAX_MPLS_LABELS entries (instead
of up to MAX_MPLS_LABELS - 1).
Addresses the following:
$ echo "mpls 00000f00,00000f01,00000f02,00000f03,00000f04,00000f05,00000f06,00000f07,00000f08,00000f09,00000f0a,00000f0b,00000f0c,00000f0d,00000f0e,00000f0f" > /proc/net/pktgen/lo\@0
-bash: echo: write error: Argument list too long
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/pktgen.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index b6db4910359bb..4d87da56c56a0 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -898,6 +898,10 @@ static ssize_t get_labels(const char __user *buffer, struct pktgen_dev *pkt_dev)
pkt_dev->nr_labels = 0;
do {
__u32 tmp;
+
+ if (n >= MAX_MPLS_LABELS)
+ return -E2BIG;
+
len = hex32_arg(&buffer[i], 8, &tmp);
if (len <= 0)
return len;
@@ -909,8 +913,6 @@ static ssize_t get_labels(const char __user *buffer, struct pktgen_dev *pkt_dev)
return -EFAULT;
i++;
n++;
- if (n >= MAX_MPLS_LABELS)
- return -E2BIG;
} while (c == ',');
pkt_dev->nr_labels = n;
--
2.39.5
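The off-by-one can be reproduced with a small user-space sketch. It is an assumption-laden model: the demo_* names are hypothetical, tokens are only counted rather than parsed as hex, and the limit is taken to be 16 as suggested by the 16-label example in the commit message.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_LABELS 16 /* assumed value of MAX_MPLS_LABELS */

/* Count comma-separated labels in 's'. With check_first != 0 the limit is
 * tested before each label (new behaviour); otherwise it is tested after
 * each label (old behaviour). */
static int demo_count_labels(const char *s, int check_first)
{
	size_t i = 0;
	int n = 0;
	char c;

	do {
		if (check_first && n >= DEMO_MAX_LABELS)
			return -E2BIG;

		while (s[i] && s[i] != ',') /* skip over one label */
			i++;
		c = s[i];
		if (c)
			i++;
		n++;

		if (!check_first && n >= DEMO_MAX_LABELS)
			return -E2BIG;
	} while (c == ',');

	return n;
}
```

With the check at the end of the loop body, a list of exactly DEMO_MAX_LABELS entries is rejected even though it fits; with the check at the top, only a 17th entry triggers -E2BIG.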
* [PATCH AUTOSEL 6.12 216/486] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config().
From: Sasha Levin @ 2025-05-05 22:34 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuniyuki Iwashima, Eric Dumazet, David Ahern, Jakub Kicinski,
Sasha Levin, davem, pabeni, netdev
From: Kuniyuki Iwashima <kuniyu@amazon.com>
[ Upstream commit 254ba7e6032d3fc738050d500b0c1d8197af90ca ]
fib_valid_key_len() is called in the beginning of fib_table_insert()
or fib_table_delete() to check if the prefix length is valid.
fib_table_insert() and fib_table_delete() are called from 3 paths:
- ip_rt_ioctl()
- inet_rtm_newroute() / inet_rtm_delroute()
- fib_magic()
In the first ioctl() path, rtentry_to_fib_config() checks the prefix
length with bad_mask(). Also, fib_magic() always passes the correct
prefix: 32 or ifa->ifa_prefixlen, which is already validated.
Let's move fib_valid_key_len() to the rtnetlink path, rtm_to_fib_config().
While at it, 2 direct returns in rtm_to_fib_config() are changed to
goto errout to match other places in the same function.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250228042328.96624-12-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/fib_frontend.c | 18 ++++++++++++++++--
net/ipv4/fib_trie.c | 22 ----------------------
2 files changed, 16 insertions(+), 24 deletions(-)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 793e6781399a4..5b7c41333d6fc 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -829,19 +829,33 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
}
}
+ if (cfg->fc_dst_len > 32) {
+ NL_SET_ERR_MSG(extack, "Invalid prefix length");
+ err = -EINVAL;
+ goto errout;
+ }
+
+ if (cfg->fc_dst_len < 32 && (ntohl(cfg->fc_dst) << cfg->fc_dst_len)) {
+ NL_SET_ERR_MSG(extack, "Invalid prefix for given prefix length");
+ err = -EINVAL;
+ goto errout;
+ }
+
if (cfg->fc_nh_id) {
if (cfg->fc_oif || cfg->fc_gw_family ||
cfg->fc_encap || cfg->fc_mp) {
NL_SET_ERR_MSG(extack,
"Nexthop specification and nexthop id are mutually exclusive");
- return -EINVAL;
+ err = -EINVAL;
+ goto errout;
}
}
if (has_gw && has_via) {
NL_SET_ERR_MSG(extack,
"Nexthop configuration can not contain both GATEWAY and VIA");
- return -EINVAL;
+ err = -EINVAL;
+ goto errout;
}
if (!cfg->fc_table)
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 09e31757e96c7..cc86031d2050f 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1193,22 +1193,6 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
return 0;
}
-static bool fib_valid_key_len(u32 key, u8 plen, struct netlink_ext_ack *extack)
-{
- if (plen > KEYLENGTH) {
- NL_SET_ERR_MSG(extack, "Invalid prefix length");
- return false;
- }
-
- if ((plen < KEYLENGTH) && (key << plen)) {
- NL_SET_ERR_MSG(extack,
- "Invalid prefix for given prefix length");
- return false;
- }
-
- return true;
-}
-
static void fib_remove_alias(struct trie *t, struct key_vector *tp,
struct key_vector *l, struct fib_alias *old);
@@ -1229,9 +1213,6 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
key = ntohl(cfg->fc_dst);
- if (!fib_valid_key_len(key, plen, extack))
- return -EINVAL;
-
pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
fi = fib_create_info(cfg, extack);
@@ -1723,9 +1704,6 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
key = ntohl(cfg->fc_dst);
- if (!fib_valid_key_len(key, plen, extack))
- return -EINVAL;
-
l = fib_find_node(t, &tp, key);
if (!l)
return -ESRCH;
--
2.39.5
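The two checks that move into rtm_to_fib_config() can be illustrated with a stand-alone sketch. The function name is hypothetical, and the key is assumed to be the destination address already converted to host byte order, as ntohl(cfg->fc_dst) is in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 'key' is an IPv4 destination in host byte order,
 * 'plen' the prefix length. */
static bool demo_valid_key_len(uint32_t key, uint8_t plen)
{
	if (plen > 32)
		return false; /* "Invalid prefix length" */

	/* Shifting out the prefix bits must leave no host bits set.
	 * The plen < 32 guard avoids an undefined 32-bit shift. */
	if (plen < 32 && (uint32_t)(key << plen))
		return false; /* "Invalid prefix for given prefix length" */

	return true;
}
```

For example, 192.168.1.0/24 passes, while 192.168.1.1/24 fails because the low 8 bits are not zero.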
* [PATCH AUTOSEL 6.12 238/486] net/mlx5: Avoid report two health errors on same syndrome
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Moshe Shemesh, Shahar Shitrit, Tariq Toukan, Kalesh AP,
David S . Miller, Sasha Levin, saeedm, andrew+netdev, edumazet,
kuba, pabeni, netdev, linux-rdma
From: Moshe Shemesh <moshe@nvidia.com>
[ Upstream commit b5d7b2f04ebcff740f44ef4d295b3401aeb029f4 ]
If the health counter has not increased for a few polling intervals, the
miss counter reaches the max misses threshold and a health report is
triggered for the FW health reporter. If a syndrome is also found on the
same health poll, another health report is triggered.
Avoid two health reports on the same syndrome by marking this syndrome as
already known.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/health.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index a6329ca2d9bff..52c8035547be5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -799,6 +799,7 @@ static void poll_health(struct timer_list *t)
health->prev = count;
if (health->miss_counter == MAX_MISSES) {
mlx5_core_err(dev, "device's health compromised - reached miss count\n");
+ health->synd = ioread8(&h->synd);
print_health_info(dev);
queue_work(health->wq, &health->report_work);
}
--
2.39.5
* [PATCH AUTOSEL 6.12 239/486] selftests/net: have `gro.sh -t` return a correct exit code
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kevin Krakauer, Willem de Bruijn, Jakub Kicinski, Sasha Levin,
davem, edumazet, pabeni, shuah, netdev, linux-kselftest
From: Kevin Krakauer <krakauer@google.com>
[ Upstream commit 784e6abd99f24024a8998b5916795f0bec9d2fd9 ]
Modify gro.sh to return a useful exit code when the -t flag is used. It
formerly returned 0 no matter what.
Tested: Ran `gro.sh -t large` and verified that test failures return 1.
Signed-off-by: Kevin Krakauer <krakauer@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250226192725.621969-2-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/testing/selftests/net/gro.sh | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
index 02c21ff4ca81f..aabd6e5480b8e 100755
--- a/tools/testing/selftests/net/gro.sh
+++ b/tools/testing/selftests/net/gro.sh
@@ -100,5 +100,6 @@ trap cleanup EXIT
if [[ "${test}" == "all" ]]; then
run_all_tests
else
- run_test "${proto}" "${test}"
+ exit_code=$(run_test "${proto}" "${test}")
+ exit $exit_code
fi;
--
2.39.5
* [PATCH AUTOSEL 6.12 244/486] net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Eric Woudstra, Paolo Abeni, Sasha Levin, nbd, sean.wang, lorenzo,
andrew+netdev, davem, edumazet, kuba, matthias.bgg,
angelogioacchino.delregno, netdev, linux-arm-kernel,
linux-mediatek
From: Eric Woudstra <ericwouds@gmail.com>
[ Upstream commit 7fe0353606d77a32c4c7f2814833dd1c043ebdd2 ]
mtk_foe_entry_set_vlan() in mtk_ppe.c already supports double vlan
tagging, but mtk_flow_offload_replace() in mtk_ppe_offload.c only allows
for 1 vlan tag, optionally in combination with pppoe and dsa tags.
However, mtk_foe_entry_set_vlan() only allows for setting the vlan id.
The protocol cannot be set, it is always ETH_P_8021Q, for inner and outer
tag. This patch adds QinQ support to mtk_flow_offload_replace(), only in
the case that both inner and outer tags are ETH_P_8021Q.
Only PPPoE-in-Q (as before) and Q-in-Q are allowed. A combination
of PPPoE and Q-in-Q is not allowed.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Link: https://patch.msgid.link/20250225201509.20843-1-ericwouds@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../net/ethernet/mediatek/mtk_ppe_offload.c | 22 +++++++++----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
index f20bb390df3ad..c855fb799ce14 100644
--- a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
@@ -34,8 +34,10 @@ struct mtk_flow_data {
u16 vlan_in;
struct {
- u16 id;
- __be16 proto;
+ struct {
+ u16 id;
+ __be16 proto;
+ } vlans[2];
u8 num;
} vlan;
struct {
@@ -349,18 +351,19 @@ mtk_flow_offload_replace(struct mtk_eth *eth, struct flow_cls_offload *f,
case FLOW_ACTION_CSUM:
break;
case FLOW_ACTION_VLAN_PUSH:
- if (data.vlan.num == 1 ||
+ if (data.vlan.num + data.pppoe.num == 2 ||
act->vlan.proto != htons(ETH_P_8021Q))
return -EOPNOTSUPP;
- data.vlan.id = act->vlan.vid;
- data.vlan.proto = act->vlan.proto;
+ data.vlan.vlans[data.vlan.num].id = act->vlan.vid;
+ data.vlan.vlans[data.vlan.num].proto = act->vlan.proto;
data.vlan.num++;
break;
case FLOW_ACTION_VLAN_POP:
break;
case FLOW_ACTION_PPPOE_PUSH:
- if (data.pppoe.num == 1)
+ if (data.pppoe.num == 1 ||
+ data.vlan.num == 2)
return -EOPNOTSUPP;
data.pppoe.sid = act->pppoe.sid;
@@ -450,12 +453,9 @@ mtk_flow_offload_replace(struct mtk_eth *eth, struct flow_cls_offload *f,
if (offload_type == MTK_PPE_PKT_TYPE_BRIDGE)
foe.bridge.vlan = data.vlan_in;
- if (data.vlan.num == 1) {
- if (data.vlan.proto != htons(ETH_P_8021Q))
- return -EOPNOTSUPP;
+ for (i = 0; i < data.vlan.num; i++)
+ mtk_foe_entry_set_vlan(eth, &foe, data.vlan.vlans[i].id);
- mtk_foe_entry_set_vlan(eth, &foe, data.vlan.id);
- }
if (data.pppoe.num == 1)
mtk_foe_entry_set_pppoe(eth, &foe, data.pppoe.sid);
--
2.39.5
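The tag-combination rules described in the commit message can be sketched as a small user-space model. The demo_* names are hypothetical and only the two counters from the diff are tracked; VLAN IDs and the PPPoE session ID are left out.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_ETH_P_8021Q  0x8100
#define DEMO_ETH_P_8021AD 0x88A8

/* Hypothetical reduced flow state: how many tags were pushed so far. */
struct demo_flow {
	int vlan_num;
	int pppoe_num;
};

/* At most two tags in total, and every VLAN tag must be 802.1Q. */
static int demo_vlan_push(struct demo_flow *f, uint16_t proto)
{
	if (f->vlan_num + f->pppoe_num == 2 || proto != DEMO_ETH_P_8021Q)
		return -EOPNOTSUPP;
	f->vlan_num++;
	return 0;
}

/* One PPPoE tag at most, and never on top of Q-in-Q. */
static int demo_pppoe_push(struct demo_flow *f)
{
	if (f->pppoe_num == 1 || f->vlan_num == 2)
		return -EOPNOTSUPP;
	f->pppoe_num++;
	return 0;
}
```

This accepts Q-in-Q and PPPoE-in-Q, and rejects 802.1ad tags, a third VLAN tag, and PPPoE combined with Q-in-Q.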
* [PATCH AUTOSEL 6.12 245/486] net: xgene-v2: remove incorrect ACPI_PTR annotation
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Arnd Bergmann, Paolo Abeni, Sasha Levin, iyappan, keyur,
andrew+netdev, davem, edumazet, kuba, netdev
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit 01358e8fe922f716c05d7864ac2213b2440026e7 ]
Building with W=1 shows a warning about xge_acpi_match being unused when
CONFIG_ACPI is disabled:
drivers/net/ethernet/apm/xgene-v2/main.c:723:36: error: unused variable 'xge_acpi_match' [-Werror,-Wunused-const-variable]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250225163341.4168238-2-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/apm/xgene-v2/main.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/apm/xgene-v2/main.c b/drivers/net/ethernet/apm/xgene-v2/main.c
index 9e90c23814910..68335935cea77 100644
--- a/drivers/net/ethernet/apm/xgene-v2/main.c
+++ b/drivers/net/ethernet/apm/xgene-v2/main.c
@@ -9,8 +9,6 @@
#include "main.h"
-static const struct acpi_device_id xge_acpi_match[];
-
static int xge_get_resources(struct xge_pdata *pdata)
{
struct platform_device *pdev;
@@ -731,7 +729,7 @@ MODULE_DEVICE_TABLE(acpi, xge_acpi_match);
static struct platform_driver xge_driver = {
.driver = {
.name = "xgene-enet-v2",
- .acpi_match_table = ACPI_PTR(xge_acpi_match),
+ .acpi_match_table = xge_acpi_match,
},
.probe = xge_probe,
.remove_new = xge_remove,
--
2.39.5
* [PATCH AUTOSEL 6.12 246/486] bonding: report duplicate MAC address in all situations
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Hangbin Liu, Nikolay Aleksandrov, Jakub Kicinski, Sasha Levin, jv,
andrew+netdev, davem, edumazet, pabeni, netdev
From: Hangbin Liu <liuhangbin@gmail.com>
[ Upstream commit 28d68d396a1cd21591e8c6d74afbde33a7ea107e ]
Normally, a bond uses the MAC address of the first added slave as the bond’s
MAC address. And the bond will set active slave’s MAC address to bond’s
address if fail_over_mac is set to none (0) or follow (2).
When the first slave is removed, the bond will still use the removed slave’s
MAC address, which can lead to a duplicate MAC address and potentially cause
issues with the switch. To avoid confusion, let's warn the user in all
situations, including when fail_over_mac is set to 2 or not in active-backup
mode.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250225033914.18617-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/bonding/bond_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4d73abae503d1..4d2e30f4ee250 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2542,7 +2542,7 @@ static int __bond_release_one(struct net_device *bond_dev,
RCU_INIT_POINTER(bond->current_arp_slave, NULL);
- if (!all && (!bond->params.fail_over_mac ||
+ if (!all && (bond->params.fail_over_mac != BOND_FOM_ACTIVE ||
BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)) {
if (ether_addr_equal_64bits(bond_dev->dev_addr, slave->perm_hwaddr) &&
bond_has_slaves(bond))
--
2.39.5
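To see which configurations newly trigger the warning, the old and new conditions can be compared in a small sketch. The enum is hypothetical; it assumes the values none = 0 and follow = 2 given in the commit message, with active = 1 in between, and reduces the bond mode to a single boolean.

```c
#include <stdbool.h>

/* Assumed fail_over_mac values: none (0), active (1), follow (2). */
enum demo_fom { DEMO_FOM_NONE = 0, DEMO_FOM_ACTIVE = 1, DEMO_FOM_FOLLOW = 2 };

/* Old condition: warn only if fail_over_mac is unset or the bond is
 * not in active-backup mode. */
static bool demo_warn_old(enum demo_fom fom, bool active_backup)
{
	return !fom || !active_backup;
}

/* New condition: warn unless fail_over_mac=active in active-backup mode. */
static bool demo_warn_new(enum demo_fom fom, bool active_backup)
{
	return fom != DEMO_FOM_ACTIVE || !active_backup;
}
```

The only combination whose result changes is fail_over_mac=follow in active-backup mode, which now also reaches the duplicate-MAC check.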
* [PATCH AUTOSEL 6.12 250/486] Octeontx2-af: RPM: Register driver with PCI subsys IDs
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Hariprasad Kelam, Jakub Kicinski, Sasha Levin, sgoutham, lcherian,
gakula, jerinj, sbhatta, andrew+netdev, davem, edumazet, pabeni,
netdev
From: Hariprasad Kelam <hkelam@marvell.com>
[ Upstream commit fc9167192f29485be5621e2e9c8208b717b65753 ]
Although the PCI device ID and Vendor ID for the RPM (MAC) block
have remained the same across Octeon CN10K and the next-generation
CN20K silicon, the hardware architecture has changed (NIX-mapped RPMs
and RFOE-mapped RPMs).
Add PCI Subsystem IDs to the device table to ensure that this driver
can be probed from NIX mapped RPM devices only.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250224035603.1220913-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/marvell/octeontx2/af/cgx.c | 14 ++++++++++++--
drivers/net/ethernet/marvell/octeontx2/af/rvu.h | 2 ++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
index 8216f843a7cd5..0b27a695008bd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
@@ -66,8 +66,18 @@ static int cgx_fwi_link_change(struct cgx *cgx, int lmac_id, bool en);
/* Supported devices */
static const struct pci_device_id cgx_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_CGX) },
- { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RPM) },
- { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10KB_RPM) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CN10K_A) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CNF10K_A) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CNF10K_B) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10KB_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CN10K_B) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10KB_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CN20KA) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10KB_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CNF20KA) },
{ 0, } /* end of table */
};
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 8555edbb1c8f9..f94bf04788e98 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -30,6 +30,8 @@
#define PCI_SUBSYS_DEVID_CNF10K_A 0xBA00
#define PCI_SUBSYS_DEVID_CNF10K_B 0xBC00
#define PCI_SUBSYS_DEVID_CN10K_B 0xBD00
+#define PCI_SUBSYS_DEVID_CN20KA 0xC220
+#define PCI_SUBSYS_DEVID_CNF20KA 0xC320
/* PCI BAR nos */
#define PCI_AF_REG_BAR_NUM 0
--
2.39.5
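For readers who want to see why the extra subsystem IDs narrow the probe,
here is a minimal user-space sketch of ID-table matching. It is a model, not
the kernel's matching code: DEMO_VENDOR and DEMO_DEVID_RPM are hypothetical
placeholder values, and only the subsystem device IDs are taken from the
rvu.h hunk above. A plain PCI_DEVICE() entry leaves both subsystem fields as
wildcards, which is why the old table matched every RPM regardless of its
subsystem ID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu /* wildcard, plays the role of PCI_ANY_ID */

/* Subsystem device IDs as defined in the rvu.h hunk of this patch. */
#define SUBSYS_DEVID_CN10K_B  0xBD00
#define SUBSYS_DEVID_CN20KA   0xC220
#define SUBSYS_DEVID_CNF20KA  0xC320

/* Hypothetical placeholder IDs, for illustration only. */
#define DEMO_VENDOR     0x1234
#define DEMO_DEVID_RPM  0x5678

struct demo_id {
	uint32_t vendor, device, subvendor, subdevice;
};

/* Same shape as the new cgx_id_table[] entries: any subvendor, fixed subdevice. */
static const struct demo_id demo_table[] = {
	{ DEMO_VENDOR, DEMO_DEVID_RPM, ANY_ID, SUBSYS_DEVID_CN10K_B },
	{ DEMO_VENDOR, DEMO_DEVID_RPM, ANY_ID, SUBSYS_DEVID_CN20KA },
	{ DEMO_VENDOR, DEMO_DEVID_RPM, ANY_ID, SUBSYS_DEVID_CNF20KA },
};

/* A table field matches if it is the wildcard or equals the device's value. */
static bool field_matches(uint32_t want, uint32_t have)
{
	return want == ANY_ID || want == have;
}

/* Returns true if any table entry matches all four ID fields of the device. */
static bool demo_match(const struct demo_id *dev)
{
	for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		const struct demo_id *id = &demo_table[i];

		if (field_matches(id->vendor, dev->vendor) &&
		    field_matches(id->device, dev->device) &&
		    field_matches(id->subvendor, dev->subvendor) &&
		    field_matches(id->subdevice, dev->subdevice))
			return true;
	}
	return false;
}
```

A device that reports the same vendor and device ID but a subsystem device
ID missing from the table is no longer matched, which is the effect the
commit message describes.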
* [PATCH AUTOSEL 6.12 258/486] vhost-scsi: Return queue full for page alloc failures during copy
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (27 preceding siblings ...)
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 250/486] Octeontx2-af: RPM: Register driver with PCI subsys IDs Sasha Levin
@ 2025-05-05 22:35 ` Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 263/486] net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode Sasha Levin
` (38 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Mike Christie, Michael S . Tsirkin, Stefan Hajnoczi, Sasha Levin,
jasowang, virtualization, kvm, netdev
From: Mike Christie <michael.christie@oracle.com>
[ Upstream commit 891b99eab0f89dbe08d216f4ab71acbeaf7a3102 ]
This has us return queue full if we can't allocate a page during the
copy operation so the initiator can retry.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20241203191705.19431-5-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/vhost/scsi.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 35a03306d1345..f9a106bbe8ee1 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -757,7 +757,7 @@ vhost_scsi_copy_iov_to_sgl(struct vhost_scsi_cmd *cmd, struct iov_iter *iter,
size_t len = iov_iter_count(iter);
unsigned int nbytes = 0;
struct page *page;
- int i;
+ int i, ret;
if (cmd->tvc_data_direction == DMA_FROM_DEVICE) {
cmd->saved_iter_addr = dup_iter(&cmd->saved_iter, iter,
@@ -770,6 +770,7 @@ vhost_scsi_copy_iov_to_sgl(struct vhost_scsi_cmd *cmd, struct iov_iter *iter,
page = alloc_page(GFP_KERNEL);
if (!page) {
i--;
+ ret = -ENOMEM;
goto err;
}
@@ -777,8 +778,10 @@ vhost_scsi_copy_iov_to_sgl(struct vhost_scsi_cmd *cmd, struct iov_iter *iter,
sg_set_page(&sg[i], page, nbytes, 0);
if (cmd->tvc_data_direction == DMA_TO_DEVICE &&
- copy_page_from_iter(page, 0, nbytes, iter) != nbytes)
+ copy_page_from_iter(page, 0, nbytes, iter) != nbytes) {
+ ret = -EFAULT;
goto err;
+ }
len -= nbytes;
}
@@ -793,7 +796,7 @@ vhost_scsi_copy_iov_to_sgl(struct vhost_scsi_cmd *cmd, struct iov_iter *iter,
for (; i >= 0; i--)
__free_page(sg_page(&sg[i]));
kfree(cmd->saved_iter_addr);
- return -ENOMEM;
+ return ret;
}
static int
@@ -1277,9 +1280,9 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
" %d\n", cmd, exp_data_len, prot_bytes, data_direction);
if (data_direction != DMA_NONE) {
- if (unlikely(vhost_scsi_mapal(cmd, prot_bytes,
- &prot_iter, exp_data_len,
- &data_iter))) {
+ ret = vhost_scsi_mapal(cmd, prot_bytes, &prot_iter,
+ exp_data_len, &data_iter);
+ if (unlikely(ret)) {
vq_err(vq, "Failed to map iov to sgl\n");
vhost_scsi_release_cmd_res(&cmd->tvc_se_cmd);
goto err;
--
2.39.5
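The error-handling change above can be sketched in user space as follows.
This is an illustrative model under stated assumptions, not the vhost-scsi
code: demo_copy() is a hypothetical stand-in for copy_page_from_iter(),
malloc() stands in for alloc_page(), and the fail_alloc_at / fail_copy_at
parameters exist only to inject failures. What it shows is the point of the
patch: the unwind path is shared, but the caller can now tell an allocation
failure (-ENOMEM) from a copy failure (-EFAULT). How the caller turns
-ENOMEM into a queue-full response is outside the hunks shown here.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* Hypothetical stand-in for copy_page_from_iter(): returns bytes copied. */
static size_t demo_copy(void *dst, const void *src, size_t n, int fail)
{
	if (fail)
		return 0;
	memcpy(dst, src, n);
	return n;
}

/*
 * Copy len bytes from src into freshly allocated pages.
 * fail_alloc_at / fail_copy_at select the page index at which to inject
 * a failure (-1 for none). The caller must size pages[] for len.
 */
static int demo_copy_to_pages(void **pages, int npages, const char *src,
			      size_t len, int fail_alloc_at, int fail_copy_at)
{
	int i, ret;

	for (i = 0; i < npages && len; i++) {
		size_t nbytes = len < DEMO_PAGE_SIZE ? len : DEMO_PAGE_SIZE;

		pages[i] = (i == fail_alloc_at) ? NULL : malloc(DEMO_PAGE_SIZE);
		if (!pages[i]) {
			i--;		/* this slot holds nothing to free */
			ret = -ENOMEM;
			goto err;
		}

		if (demo_copy(pages[i], src, nbytes, i == fail_copy_at) != nbytes) {
			ret = -EFAULT;
			goto err;
		}

		src += nbytes;
		len -= nbytes;
	}
	return 0;

err:
	/* Free every page allocated so far, newest first. */
	for (; i >= 0; i--) {
		free(pages[i]);
		pages[i] = NULL;
	}
	return ret;	/* distinct code per failure, not a blanket -ENOMEM */
}
```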
* [PATCH AUTOSEL 6.12 263/486] net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (28 preceding siblings ...)
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 258/486] vhost-scsi: Return queue full for page alloc failures during copy Sasha Levin
@ 2025-05-05 22:35 ` Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 270/486] net/mlx5: Change POOL_NEXT_SIZE define value and make it global Sasha Levin
` (37 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jianbo Liu, Leon Romanovsky, Patrisious Haddad, Tariq Toukan,
Jakub Kicinski, Sasha Levin, saeedm, andrew+netdev, davem,
edumazet, pabeni, rrameshbabu, moshe, netdev, linux-rdma
From: Jianbo Liu <jianbol@nvidia.com>
[ Upstream commit 85e4a808af2545fefaf18c8fe50071b06fcbdabc ]
In commit dddb49b63d86 ("net/mlx5e: Add IPsec and ASO syndromes check
in HW"), IPSec and ASO syndromes checks after decryption for the
specified ASO object were added. But they are correct only for eswitch
in legacy mode. For switchdev mode, metadata register c1 is used to
save the mapped id (not ASO object id). So the match for the check
rules in the status table needs to change accordingly.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../mellanox/mlx5/core/en_accel/ipsec_fs.c | 28 ++++++++++++++-----
.../mellanox/mlx5/core/esw/ipsec_fs.c | 13 +++++++++
.../mellanox/mlx5/core/esw/ipsec_fs.h | 5 ++++
include/linux/mlx5/eswitch.h | 2 ++
4 files changed, 41 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
index 57861d34d46f8..59b9653f573c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
@@ -165,6 +165,25 @@ static void ipsec_rx_status_pass_destroy(struct mlx5e_ipsec *ipsec,
#endif
}
+static void ipsec_rx_rule_add_match_obj(struct mlx5e_ipsec_sa_entry *sa_entry,
+ struct mlx5e_ipsec_rx *rx,
+ struct mlx5_flow_spec *spec)
+{
+ struct mlx5e_ipsec *ipsec = sa_entry->ipsec;
+
+ if (rx == ipsec->rx_esw) {
+ mlx5_esw_ipsec_rx_rule_add_match_obj(sa_entry, spec);
+ } else {
+ MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria,
+ misc_parameters_2.metadata_reg_c_2);
+ MLX5_SET(fte_match_param, spec->match_value,
+ misc_parameters_2.metadata_reg_c_2,
+ sa_entry->ipsec_obj_id | BIT(31));
+
+ spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2;
+ }
+}
+
static int rx_add_rule_drop_auth_trailer(struct mlx5e_ipsec_sa_entry *sa_entry,
struct mlx5e_ipsec_rx *rx)
{
@@ -200,11 +219,8 @@ static int rx_add_rule_drop_auth_trailer(struct mlx5e_ipsec_sa_entry *sa_entry,
MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, misc_parameters_2.ipsec_syndrome);
MLX5_SET(fte_match_param, spec->match_value, misc_parameters_2.ipsec_syndrome, 1);
- MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, misc_parameters_2.metadata_reg_c_2);
- MLX5_SET(fte_match_param, spec->match_value,
- misc_parameters_2.metadata_reg_c_2,
- sa_entry->ipsec_obj_id | BIT(31));
spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;
+ ipsec_rx_rule_add_match_obj(sa_entry, rx, spec);
rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);
if (IS_ERR(rule)) {
err = PTR_ERR(rule);
@@ -281,10 +297,8 @@ static int rx_add_rule_drop_replay(struct mlx5e_ipsec_sa_entry *sa_entry, struct
MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, misc_parameters_2.metadata_reg_c_4);
MLX5_SET(fte_match_param, spec->match_value, misc_parameters_2.metadata_reg_c_4, 1);
- MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, misc_parameters_2.metadata_reg_c_2);
- MLX5_SET(fte_match_param, spec->match_value, misc_parameters_2.metadata_reg_c_2,
- sa_entry->ipsec_obj_id | BIT(31));
spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;
+ ipsec_rx_rule_add_match_obj(sa_entry, rx, spec);
rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);
if (IS_ERR(rule)) {
err = PTR_ERR(rule);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
index ed977ae75fab8..4bba2884c1c05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
@@ -85,6 +85,19 @@ int mlx5_esw_ipsec_rx_setup_modify_header(struct mlx5e_ipsec_sa_entry *sa_entry,
return err;
}
+void mlx5_esw_ipsec_rx_rule_add_match_obj(struct mlx5e_ipsec_sa_entry *sa_entry,
+ struct mlx5_flow_spec *spec)
+{
+ MLX5_SET(fte_match_param, spec->match_criteria,
+ misc_parameters_2.metadata_reg_c_1,
+ ESW_IPSEC_RX_MAPPED_ID_MATCH_MASK);
+ MLX5_SET(fte_match_param, spec->match_value,
+ misc_parameters_2.metadata_reg_c_1,
+ sa_entry->rx_mapped_id << ESW_ZONE_ID_BITS);
+
+ spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2;
+}
+
void mlx5_esw_ipsec_rx_id_mapping_remove(struct mlx5e_ipsec_sa_entry *sa_entry)
{
struct mlx5e_ipsec *ipsec = sa_entry->ipsec;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.h
index ac9c65b89166e..514c15258b1d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.h
@@ -20,6 +20,8 @@ int mlx5_esw_ipsec_rx_ipsec_obj_id_search(struct mlx5e_priv *priv, u32 id,
void mlx5_esw_ipsec_tx_create_attr_set(struct mlx5e_ipsec *ipsec,
struct mlx5e_ipsec_tx_create_attr *attr);
void mlx5_esw_ipsec_restore_dest_uplink(struct mlx5_core_dev *mdev);
+void mlx5_esw_ipsec_rx_rule_add_match_obj(struct mlx5e_ipsec_sa_entry *sa_entry,
+ struct mlx5_flow_spec *spec);
#else
static inline void mlx5_esw_ipsec_rx_create_attr_set(struct mlx5e_ipsec *ipsec,
struct mlx5e_ipsec_rx_create_attr *attr) {}
@@ -48,5 +50,8 @@ static inline void mlx5_esw_ipsec_tx_create_attr_set(struct mlx5e_ipsec *ipsec,
struct mlx5e_ipsec_tx_create_attr *attr) {}
static inline void mlx5_esw_ipsec_restore_dest_uplink(struct mlx5_core_dev *mdev) {}
+static inline void
+mlx5_esw_ipsec_rx_rule_add_match_obj(struct mlx5e_ipsec_sa_entry *sa_entry,
+ struct mlx5_flow_spec *spec) {}
#endif /* CONFIG_MLX5_ESWITCH */
#endif /* __MLX5_ESW_IPSEC_FS_H__ */
diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h
index df73a2ccc9af3..67256e776566c 100644
--- a/include/linux/mlx5/eswitch.h
+++ b/include/linux/mlx5/eswitch.h
@@ -147,6 +147,8 @@ u32 mlx5_eswitch_get_vport_metadata_for_set(struct mlx5_eswitch *esw,
/* reuse tun_opts for the mapped ipsec obj id when tun_id is 0 (invalid) */
#define ESW_IPSEC_RX_MAPPED_ID_MASK GENMASK(ESW_TUN_OPTS_BITS - 1, 0)
+#define ESW_IPSEC_RX_MAPPED_ID_MATCH_MASK \
+ GENMASK(31 - ESW_RESERVED_BITS, ESW_ZONE_ID_BITS)
u8 mlx5_eswitch_mode(const struct mlx5_core_dev *dev);
u16 mlx5_eswitch_get_total_vports(const struct mlx5_core_dev *dev);
--
2.39.5
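To make the new switchdev match concrete, here is a small user-space sketch
of the mask and value arithmetic. The bit widths are assumptions: the real
ESW_RESERVED_BITS and ESW_ZONE_ID_BITS live in include/linux/mlx5/eswitch.h
and are not shown in this patch, so DEMO_RESERVED_BITS and DEMO_ZONE_ID_BITS
below are placeholders. demo_genmask() mirrors what the kernel's
GENMASK(h, l) computes for 32-bit values, and demo_rule_hits() models the
usual "masked register equals value" comparison of a flow-table match.

```c
#include <stdint.h>

/* Assumed widths, for illustration only; see the lead-in. */
#define DEMO_RESERVED_BITS 1
#define DEMO_ZONE_ID_BITS  8

/* 32-bit mask with bits l..h (inclusive) set, like GENMASK(h, l). */
static uint32_t demo_genmask(unsigned int h, unsigned int l)
{
	return (~UINT32_C(0) >> (31 - h)) & (~UINT32_C(0) << l);
}

/* Counterpart of ESW_IPSEC_RX_MAPPED_ID_MATCH_MASK under the assumed widths. */
static uint32_t demo_match_mask(void)
{
	return demo_genmask(31 - DEMO_RESERVED_BITS, DEMO_ZONE_ID_BITS);
}

/* Counterpart of rx_mapped_id << ESW_ZONE_ID_BITS. */
static uint32_t demo_match_value(uint32_t mapped_id)
{
	return mapped_id << DEMO_ZONE_ID_BITS;
}

/* A rule hits when the masked register contents equal the match value. */
static int demo_rule_hits(uint32_t reg_c1, uint32_t mapped_id)
{
	return (reg_c1 & demo_match_mask()) == demo_match_value(mapped_id);
}
```

Under these assumed widths, the low zone-id bits and the top reserved bit of
the register are ignored by the match, so only the mapped id field decides
whether the syndrome check rule applies.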
* [PATCH AUTOSEL 6.12 270/486] net/mlx5: Change POOL_NEXT_SIZE define value and make it global
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (29 preceding siblings ...)
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 263/486] net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode Sasha Levin
@ 2025-05-05 22:35 ` Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 274/486] net: ipv6: Init tunnel link-netns before registering dev Sasha Levin
` (36 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Patrisious Haddad, Maor Gottlieb, Mark Bloch, Tariq Toukan,
Leon Romanovsky, Sasha Levin, saeedm, andrew+netdev, davem,
edumazet, kuba, pabeni, cratiu, bpoirier, michal.swiatkowski,
vulab, horms, netdev, linux-rdma
From: Patrisious Haddad <phaddad@nvidia.com>
[ Upstream commit 80df31f384b4146a62a01b3d4beb376cc7b9a89e ]
Change the value of the POOL_NEXT_SIZE define from 0 to BIT(30). The
define is used to request the largest available flow table, and zero
makes no sense for that: some places in the driver pass zero explicitly,
expecting the smallest possible table size, but because of this define
they unknowingly end up allocating the biggest one.
In addition move the definition to "include/linux/mlx5/fs.h" to expose the
define to IB driver as well, while appropriately renaming it.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219085808.349923-3-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c | 6 ++++--
drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h | 2 --
drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c | 3 ++-
include/linux/mlx5/fs.h | 2 ++
5 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c
index 8587cd572da53..bdb825aa87268 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c
@@ -96,7 +96,7 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw)
if (!flow_group_in)
return -ENOMEM;
- ft_attr.max_fte = POOL_NEXT_SIZE;
+ ft_attr.max_fte = MLX5_FS_MAX_POOL_SIZE;
ft_attr.prio = LEGACY_FDB_PRIO;
fdb = mlx5_create_flow_table(root_ns, &ft_attr);
if (IS_ERR(fdb)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c
index c14590acc7726..f6abfd00d7e68 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c
@@ -50,10 +50,12 @@ mlx5_ft_pool_get_avail_sz(struct mlx5_core_dev *dev, enum fs_flow_table_type tab
int i, found_i = -1;
for (i = ARRAY_SIZE(FT_POOLS) - 1; i >= 0; i--) {
- if (dev->priv.ft_pool->ft_left[i] && FT_POOLS[i] >= desired_size &&
+ if (dev->priv.ft_pool->ft_left[i] &&
+ (FT_POOLS[i] >= desired_size ||
+ desired_size == MLX5_FS_MAX_POOL_SIZE) &&
FT_POOLS[i] <= max_ft_size) {
found_i = i;
- if (desired_size != POOL_NEXT_SIZE)
+ if (desired_size != MLX5_FS_MAX_POOL_SIZE)
break;
}
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h
index 25f4274b372b5..173e312db7204 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h
@@ -7,8 +7,6 @@
#include <linux/mlx5/driver.h>
#include "fs_core.h"
-#define POOL_NEXT_SIZE 0
-
int mlx5_ft_pool_init(struct mlx5_core_dev *dev);
void mlx5_ft_pool_destroy(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
index 711d14dea2485..d313cb7f0ed88 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
@@ -161,7 +161,8 @@ mlx5_chains_create_table(struct mlx5_fs_chains *chains,
ft_attr.flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
- sz = (chain == mlx5_chains_get_nf_ft_chain(chains)) ? FT_TBL_SZ : POOL_NEXT_SIZE;
+ sz = (chain == mlx5_chains_get_nf_ft_chain(chains)) ?
+ FT_TBL_SZ : MLX5_FS_MAX_POOL_SIZE;
ft_attr.max_fte = sz;
/* We use chains_default_ft(chains) as the table's next_ft till
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index b744e554f014d..db5c9ddef1702 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -40,6 +40,8 @@
#define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
+#define MLX5_FS_MAX_POOL_SIZE BIT(30)
+
enum mlx5_flow_destination_type {
MLX5_FLOW_DESTINATION_TYPE_NONE,
MLX5_FLOW_DESTINATION_TYPE_VPORT,
--
2.39.5
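The selection loop in the fs_ft_pool.c hunk can be modelled in user space to
see how the new sentinel behaves. This is a sketch under assumptions:
demo_pools[] holds hypothetical pool sizes (the real FT_POOLS[] is not part
of this patch), the left[] array plays the role of ft_left[], and the model
only returns the chosen size without consuming a pool entry. The loop body
follows the patched condition.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_POOL_SIZE (UINT32_C(1) << 30)	/* BIT(30), as in the patch */

/* Hypothetical pool sizes, largest first. */
static const uint32_t demo_pools[] = { 4u << 20, 1u << 20, 64u << 10, 128, 1 };
#define DEMO_NPOOLS (sizeof(demo_pools) / sizeof(demo_pools[0]))

/*
 * Walk the pools from smallest to largest. A normal request stops at the
 * first pool that is big enough; the sentinel keeps going so that the
 * largest pool within max_ft_size wins. Returns 0 if nothing fits.
 */
static uint32_t demo_pick(const unsigned int *left, uint32_t desired,
			  uint32_t max_ft_size)
{
	int i, found = -1;

	for (i = (int)DEMO_NPOOLS - 1; i >= 0; i--) {
		if (left[i] &&
		    (demo_pools[i] >= desired ||
		     desired == DEMO_MAX_POOL_SIZE) &&
		    demo_pools[i] <= max_ft_size) {
			found = i;
			if (desired != DEMO_MAX_POOL_SIZE)
				break;
		}
	}
	return found < 0 ? 0 : demo_pools[found];
}
```

With the old sentinel value of 0, a caller asking for size 0 would have
taken the "keep going" path and ended up with the largest pool; in this
model a request for 0 stops at the smallest pool, and only
DEMO_MAX_POOL_SIZE walks up to the largest one.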
* [PATCH AUTOSEL 6.12 274/486] net: ipv6: Init tunnel link-netns before registering dev
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (30 preceding siblings ...)
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 270/486] net/mlx5: Change POOL_NEXT_SIZE define value and make it global Sasha Levin
@ 2025-05-05 22:35 ` Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 291/486] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
` (35 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:35 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Xiao Liang, Kuniyuki Iwashima, Jakub Kicinski, Sasha Levin, davem,
dsahern, edumazet, pabeni, steffen.klassert, netdev
From: Xiao Liang <shaw.leon@gmail.com>
[ Upstream commit db014522f35606031d8ac58b4aed6b1ed84f03d1 ]
Currently some IPv6 tunnel drivers set tnl->net to dev_net(dev) in
ndo_init(), which is called in register_netdevice(). However, it lacks
the context of link-netns when we enable cross-net tunnels at device
registration time.
Let's move the init of tunnel link-netns before register_netdevice().
ip6_gre has already initialized netns, so just remove the redundant
assignment.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-8-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv6/ip6_gre.c | 2 --
net/ipv6/ip6_tunnel.c | 3 ++-
net/ipv6/ip6_vti.c | 3 ++-
net/ipv6/sit.c | 8 +++++---
4 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 235808cfec705..68e9a41eed491 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1498,7 +1498,6 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
tunnel = netdev_priv(dev);
tunnel->dev = dev;
- tunnel->net = dev_net(dev);
strcpy(tunnel->parms.name, dev->name);
ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL);
@@ -1882,7 +1881,6 @@ static int ip6erspan_tap_init(struct net_device *dev)
tunnel = netdev_priv(dev);
tunnel->dev = dev;
- tunnel->net = dev_net(dev);
strcpy(tunnel->parms.name, dev->name);
ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 48fd53b989726..5350c9bb2319b 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1878,7 +1878,6 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
int t_hlen;
t->dev = dev;
- t->net = dev_net(dev);
ret = dst_cache_init(&t->dst_cache, GFP_KERNEL);
if (ret)
@@ -1940,6 +1939,7 @@ static int __net_init ip6_fb_tnl_dev_init(struct net_device *dev)
struct net *net = dev_net(dev);
struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
+ t->net = net;
t->parms.proto = IPPROTO_IPV6;
rcu_assign_pointer(ip6n->tnls_wc[0], t);
@@ -2013,6 +2013,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev,
int err;
nt = netdev_priv(dev);
+ nt->net = net;
if (ip_tunnel_netlink_encap_parms(data, &ipencap)) {
err = ip6_tnl_encap_setup(nt, &ipencap);
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 590737c275379..0123504691443 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -925,7 +925,6 @@ static inline int vti6_dev_init_gen(struct net_device *dev)
struct ip6_tnl *t = netdev_priv(dev);
t->dev = dev;
- t->net = dev_net(dev);
netdev_hold(dev, &t->dev_tracker, GFP_KERNEL);
netdev_lockdep_set_classes(dev);
return 0;
@@ -958,6 +957,7 @@ static int __net_init vti6_fb_tnl_dev_init(struct net_device *dev)
struct net *net = dev_net(dev);
struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+ t->net = net;
t->parms.proto = IPPROTO_IPV6;
rcu_assign_pointer(ip6n->tnls_wc[0], t);
@@ -1008,6 +1008,7 @@ static int vti6_newlink(struct net *src_net, struct net_device *dev,
vti6_netlink_parms(data, &nt->parms);
nt->parms.proto = IPPROTO_IPV6;
+ nt->net = net;
if (vti6_locate(net, &nt->parms, 0))
return -EEXIST;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 39bd8951bfca1..3c15a0ae228e2 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -269,6 +269,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
nt = netdev_priv(dev);
+ nt->net = net;
nt->parms = *parms;
if (ipip6_tunnel_create(dev) < 0)
goto failed_free;
@@ -1449,7 +1450,6 @@ static int ipip6_tunnel_init(struct net_device *dev)
int err;
tunnel->dev = dev;
- tunnel->net = dev_net(dev);
strcpy(tunnel->parms.name, dev->name);
ipip6_tunnel_bind_dev(dev);
@@ -1563,6 +1563,7 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
int err;
nt = netdev_priv(dev);
+ nt->net = net;
if (ip_tunnel_netlink_encap_parms(data, &ipencap)) {
err = ip_tunnel_encap_setup(nt, &ipencap);
@@ -1858,6 +1859,9 @@ static int __net_init sit_init_net(struct net *net)
*/
sitn->fb_tunnel_dev->netns_local = true;
+ t = netdev_priv(sitn->fb_tunnel_dev);
+ t->net = net;
+
err = register_netdev(sitn->fb_tunnel_dev);
if (err)
goto err_reg_dev;
@@ -1865,8 +1869,6 @@ static int __net_init sit_init_net(struct net *net)
ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn);
ipip6_fb_tunnel_init(sitn->fb_tunnel_dev);
- t = netdev_priv(sitn->fb_tunnel_dev);
-
strcpy(t->parms.name, sitn->fb_tunnel_dev->name);
return 0;
--
2.39.5
* [PATCH AUTOSEL 6.12 291/486] net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (31 preceding siblings ...)
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 274/486] net: ipv6: Init tunnel link-netns before registering dev Sasha Levin
@ 2025-05-05 22:36 ` Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 294/486] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
` (34 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Peter Seiderer, Simon Horman, Jakub Kicinski, Sasha Levin, davem,
edumazet, pabeni, netdev
From: Peter Seiderer <ps.report@gmx.net>
[ Upstream commit 425e64440ad0a2f03bdaf04be0ae53dededbaa77 ]
Honour the user given buffer size for the strn_len() calls (otherwise
strn_len() will access memory outside of the user given buffer).
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-8-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/pktgen.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 4d87da56c56a0..762ede0278990 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1898,8 +1898,8 @@ static ssize_t pktgen_thread_write(struct file *file,
i = len;
/* Read variable name */
-
- len = strn_len(&user_buffer[i], sizeof(name) - 1);
+ max = min(sizeof(name) - 1, count - i);
+ len = strn_len(&user_buffer[i], max);
if (len < 0)
return len;
@@ -1929,7 +1929,8 @@ static ssize_t pktgen_thread_write(struct file *file,
if (!strcmp(name, "add_device")) {
char f[32];
memset(f, 0, 32);
- len = strn_len(&user_buffer[i], sizeof(f) - 1);
+ max = min(sizeof(f) - 1, count - i);
+ len = strn_len(&user_buffer[i], max);
if (len < 0) {
ret = len;
goto out;
--
2.39.5
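The clamping added above can be illustrated with a small user-space model.
Assumptions: demo_token_len() is a hypothetical stand-in for pktgen's
strn_len() (it scans a plain buffer for a delimiter instead of reading user
memory and cannot fail), and demo_bounded_token_len() requires i <= count
and dest_size >= 1. The scan limit is the smaller of the destination
capacity and the bytes the caller actually supplied, so the scan never runs
past count.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for strn_len(): number of bytes before the first
 * delimiter, looking at no more than maxlen bytes.
 */
static size_t demo_token_len(const char *buf, size_t maxlen)
{
	size_t i;

	for (i = 0; i < maxlen; i++) {
		switch (buf[i]) {
		case '"':
		case '\n':
		case '\r':
		case '\t':
		case ' ':
		case '\0':
			return i;
		default:
			break;
		}
	}
	return i;
}

static size_t demo_min(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Token length at offset i of a buffer holding count valid bytes, limited
 * both by the destination size and by what is left of the buffer.
 */
static size_t demo_bounded_token_len(const char *buf, size_t count,
				     size_t i, size_t dest_size)
{
	size_t max = demo_min(dest_size - 1, count - i);

	return demo_token_len(&buf[i], max);
}
```

If only the first 10 bytes of a buffer are valid and the token starts at
offset 4, the bounded variant looks at 6 bytes at most, whereas a limit
derived from the destination size alone would let the scan continue beyond
what the caller handed in.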
* [PATCH AUTOSEL 6.12 294/486] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (32 preceding siblings ...)
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 291/486] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
@ 2025-05-05 22:36 ` Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 312/486] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
` (33 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jason Xing, Martin KaFai Lau, Sasha Levin, ast, daniel, andrii,
edumazet, ncardwell, davem, kuba, pabeni, martin.lau, dsahern,
bpf, netdev
From: Jason Xing <kerneljasonxing@gmail.com>
[ Upstream commit fd93eaffb3f977b23bc0a48d4c8616e654fcf133 ]
The subsequent patch will implement BPF TX timestamping. It will
call the sockops BPF program without holding the sock lock.
This breaks the current assumption that all sock ops programs will
hold the sock lock. The sock fields of the uapi's bpf_sock_ops
rely on this assumption.
To address this, a new "u8 is_locked_tcp_sock;" field is added. This
patch sets it in the current sock_ops callbacks. The "is_fullsock"
test is then replaced by the "is_locked_tcp_sock" test during
sock_ops_convert_ctx_access().
The new TX timestamping callbacks added in the subsequent patch will
not have this set. This will prevent unsafe access from the new
timestamping callbacks.
Potentially, we could allow read-only access. However, this would
require identifying which callback is read-safe-only and would also require
additional BPF instruction rewrites in the convert_ctx. Since the BPF
program can always read everything from a socket (e.g., by using
bpf_core_cast), this patch keeps it simple and disables all read
and write access to any socket fields through the bpf_sock_ops
UAPI from the new TX timestamping callback.
Moreover, note that some of the fields in bpf_sock_ops are specific
to tcp_sock, and sock_ops currently only supports tcp_sock. In
the future, UDP timestamping will be added, which will also break
this assumption. The same idea used in this patch will be reused.
Considering that the current sock_ops only supports tcp_sock, the
variable is named is_locked_"tcp"_sock.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250220072940.99994-4-kerneljasonxing@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/filter.h | 1 +
include/net/tcp.h | 1 +
net/core/filter.c | 8 ++++----
net/ipv4/tcp_input.c | 2 ++
net/ipv4/tcp_output.c | 2 ++
5 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5118caf8aa1c7..2b1029aeb36ae 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1506,6 +1506,7 @@ struct bpf_sock_ops_kern {
void *skb_data_end;
u8 op;
u8 is_fullsock;
+ u8 is_locked_tcp_sock;
u8 remaining_opt_len;
u64 temp; /* temp and everything after is not
* initialized to 0 before calling
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3255a199ef60d..c4820759ee0c3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2667,6 +2667,7 @@ static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
if (sk_fullsock(sk)) {
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_owned_by_me(sk);
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 790345c2546b7..b5ede32ba3b14 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10378,10 +10378,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
} \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
- is_fullsock), \
+ is_locked_tcp_sock), \
fullsock_reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
+ is_locked_tcp_sock)); \
*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp); \
if (si->dst_reg == si->src_reg) \
*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
@@ -10466,10 +10466,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
temp)); \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
- is_fullsock), \
+ is_locked_tcp_sock), \
reg, si->dst_reg, \
offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
+ is_locked_tcp_sock)); \
*insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2); \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, sk),\
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d29219e067b7f..f5690085a2ac5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -169,6 +169,7 @@ static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb)
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));
@@ -185,6 +186,7 @@ static void bpf_skops_established(struct sock *sk, int bpf_op,
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
sock_ops.op = bpf_op;
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
/* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */
if (skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6d5387811c32a..ca1e52036d4d2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -525,6 +525,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
sock_owned_by_me(sk);
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
}
@@ -570,6 +571,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
sock_owned_by_me(sk);
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
}
--
2.39.5
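As a rough picture of what the flag swap in sock_ops_convert_ctx_access()
achieves, here is a user-space model of a gated context read. Everything in
it is hypothetical (demo_sock, demo_sock_ops_kern, demo_read_snd_cwnd); the
real code emits BPF instructions rather than C. The assumption, based on
the BPF_JEQ against 0 visible in the hunk, is that a read through the
context yields 0 when the flag is clear instead of dereferencing the socket.

```c
#include <stdint.h>

/* Hypothetical miniature of a socket with one field of interest. */
struct demo_sock {
	uint32_t snd_cwnd;
};

/* Hypothetical miniature of the kernel-side sock_ops context. */
struct demo_sock_ops_kern {
	struct demo_sock *sk;
	uint8_t is_fullsock;
	uint8_t is_locked_tcp_sock;
};

/*
 * Gated read: the socket is only dereferenced when the callback runs with
 * the sock lock held on a TCP socket. Being a full socket is not enough.
 */
static uint32_t demo_read_snd_cwnd(const struct demo_sock_ops_kern *ctx)
{
	if (!ctx->is_locked_tcp_sock)
		return 0;
	return ctx->sk->snd_cwnd;
}
```

In this model an existing callback sets both flags and reads the field as
before, while a callback that sets only is_fullsock, as the commit message
says the upcoming TX timestamping callbacks will, reads 0.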
* [PATCH AUTOSEL 6.12 312/486] eth: mlx4: don't try to complete XDP frames in netpoll
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (33 preceding siblings ...)
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 294/486] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
@ 2025-05-05 22:36 ` Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 315/486] vxlan: Join / leave MC group after remote changes Sasha Levin
` (32 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Kicinski, Tariq Toukan, Sasha Levin, andrew+netdev, davem,
edumazet, pabeni, ast, daniel, hawk, john.fastabend, netdev,
linux-rdma, bpf
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 8fdeafd66edaf420ea0063a1f13442fe3470fe70 ]
mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't
using page pool until now, so it could run XDP completions
in netpoll (NAPI budget == 0) just fine. Page pool has calling
context requirements, so make sure we don't try to call it from
what is potentially HW IRQ context.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 1ddb11cb25f91..6e077d202827a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -450,6 +450,8 @@ int mlx4_en_process_tx_cq(struct net_device *dev,
if (unlikely(!priv->port_up))
return 0;
+ if (unlikely(!napi_budget) && cq->type == TX_XDP)
+ return 0;
netdev_txq_bql_complete_prefetchw(ring->tx_queue);
--
2.39.5
* [PATCH AUTOSEL 6.12 315/486] vxlan: Join / leave MC group after remote changes
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (34 preceding siblings ...)
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 312/486] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
@ 2025-05-05 22:36 ` Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 321/486] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
` (31 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Petr Machata, Ido Schimmel, Nikolay Aleksandrov, Paolo Abeni,
Sasha Levin, andrew+netdev, davem, edumazet, kuba, menglong8.dong,
gnault, netdev
From: Petr Machata <petrm@nvidia.com>
[ Upstream commit d42d543368343c0449a4e433b5f02e063a86209c ]
When a vxlan netdevice is brought up, if its default remote is a multicast
address, the device joins the indicated group.
Therefore when the multicast remote address changes, the device should
leave the current group and subscribe to the new one. Similarly when the
interface used for endpoint communication is changed in a situation when
multicast remote is configured. This is currently not done.
Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
possible that with such fix, the netdevice will end up in an inconsistent
situation where the old group is not joined anymore, but joining the new
group fails. Should we join the new group first, and leave the old one
second, we might end up in the opposite situation, where both groups are
joined. Undoing any of this during rollback is going to be similarly
problematic.
One solution would be to just forbid the change when the netdevice is up.
However in vnifilter mode, changing the group address is allowed, and these
problems are simply ignored (see vxlan_vni_update_group()):
# ip link add name br up type bridge vlan_filtering 1
# ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
# bridge vni add dev vx1 vni 200 group 224.0.0.1
# tcpdump -i lo &
# bridge vni add dev vx1 vni 200 group 224.0.0.2
18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
# bridge vni
dev vni group/remote
vx1 200 224.0.0.2
Having two different modes of operation for conceptually the same interface
is silly, so in this patch, just do what the vnifilter code does and deal
with the errors by crossing fingers real hard.
The vnifilter code leaves old before joining new, and in case of join /
leave failures does not roll back the configuration changes that have
already been applied, but bails out of joining if it could not leave. Do
the same here: leave before join, apply changes unconditionally and do not
attempt to join if we couldn't leave.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/vxlan/vxlan_core.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 5e7cdd1b806fb..01f66760e1328 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -4340,6 +4340,7 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
struct netlink_ext_ack *extack)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
+ bool rem_ip_changed, change_igmp;
struct net_device *lowerdev;
struct vxlan_config conf;
struct vxlan_rdst *dst;
@@ -4363,8 +4364,13 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
if (err)
return err;
+ rem_ip_changed = !vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip);
+ change_igmp = vxlan->dev->flags & IFF_UP &&
+ (rem_ip_changed ||
+ dst->remote_ifindex != conf.remote_ifindex);
+
/* handle default dst entry */
- if (!vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip)) {
+ if (rem_ip_changed) {
u32 hash_index = fdb_head_index(vxlan, all_zeros_mac, conf.vni);
spin_lock_bh(&vxlan->hash_lock[hash_index]);
@@ -4408,6 +4414,9 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
}
}
+ if (change_igmp && vxlan_addr_multicast(&dst->remote_ip))
+ err = vxlan_multicast_leave(vxlan);
+
if (conf.age_interval != vxlan->cfg.age_interval)
mod_timer(&vxlan->age_timer, jiffies);
@@ -4415,7 +4424,12 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
if (lowerdev && lowerdev != dst->remote_dev)
dst->remote_dev = lowerdev;
vxlan_config_apply(dev, &conf, lowerdev, vxlan->net, true);
- return 0;
+
+ if (!err && change_igmp &&
+ vxlan_addr_multicast(&dst->remote_ip))
+ err = vxlan_multicast_join(vxlan);
+
+ return err;
}
static void vxlan_dellink(struct net_device *dev, struct list_head *head)
--
2.39.5
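The ordering described in the vxlan commit message above (leave before join, apply the change unconditionally, skip the join if the leave failed) can be sketched in plain C as below. All names here are hypothetical stand-ins, not the vxlan API, and the sketch assumes the device is up and both the old and the new remote are multicast groups.

```c
#include <stdbool.h>

/* Hypothetical device state, not the real struct vxlan_dev. */
struct mc_dev {
	int group;		/* configured multicast group */
	bool joined;		/* whether a group is currently joined */
	bool leave_fails;	/* knob to simulate a failing leave */
	bool join_fails;	/* knob to simulate a failing join */
};

static int mc_leave(struct mc_dev *d)
{
	if (d->leave_fails)
		return -1;
	d->joined = false;
	return 0;
}

static int mc_join(struct mc_dev *d)
{
	if (d->join_fails)
		return -1;
	d->joined = true;
	return 0;
}

/* Leave the old group, apply the new config, then join the new group. */
static int change_group(struct mc_dev *d, int new_group)
{
	int err = mc_leave(d);

	d->group = new_group;	/* applied unconditionally, never rolled back */

	if (!err)		/* do not attempt to join if we could not leave */
		err = mc_join(d);

	return err;
}
```

Note that when the leave fails, the configuration still switches to the new group and the error is returned to the caller, which is the "crossing fingers" behaviour the commit message describes.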
* [PATCH AUTOSEL 6.12 321/486] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (35 preceding siblings ...)
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 315/486] vxlan: Join / leave MC group after remote changes Sasha Levin
@ 2025-05-05 22:36 ` Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 322/486] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
` (30 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Shahar Shitrit, Tariq Toukan, Mateusz Polchlopek, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Shahar Shitrit <shshitrit@nvidia.com>
[ Upstream commit 633f16d7e07c129a36b882c05379e01ce5bdb542 ]
In the sensor_count field of the MTEWE register, bits 1-62 are
supported only for unmanaged switches, not for NICs, and bit 63
is reserved for internal use.
To prevent confusing output that may include set bits that are
not relevant to NIC sensors, we update the bitmask to retain only
the first bit, which corresponds to the sensor ASIC.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20250213094641.226501-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/events.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index d91ea53eb394d..cd8d107f7d9e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -163,6 +163,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
u64 value_msb;
value_lsb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_lsb);
+ /* bit 1-63 are not supported for NICs,
+ * hence read only bit 0 (asic) from lsb.
+ */
+ value_lsb &= 0x1;
value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
mlx5_core_warn(events->dev,
--
2.39.5
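The effect of the `value_lsb &= 0x1` line in the patch above can be checked with a small standalone helper. The function name is hypothetical; only the mask itself comes from the patch.

```c
#include <stdint.h>

/*
 * Keep only bit 0 (the ASIC sensor) of the low 64-bit sensor word.
 * Bits 1-63 are dropped, as they are not meaningful for NICs.
 */
static uint64_t nic_sensor_lsb(uint64_t raw_lsb)
{
	return raw_lsb & 0x1;
}
```

A raw value with every bit set therefore reports only the ASIC sensor, and a raw value with only higher bits set reports nothing.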
* [PATCH AUTOSEL 6.12 322/486] net/mlx5: Apply rate-limiting to high temperature warning
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (36 preceding siblings ...)
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 321/486] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
@ 2025-05-05 22:36 ` Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 342/486] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
` (29 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Shahar Shitrit, Tariq Toukan, Mateusz Polchlopek, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Shahar Shitrit <shshitrit@nvidia.com>
[ Upstream commit 9dd3d5d258aceb37bdf09c8b91fa448f58ea81f0 ]
Wrap the high temperature warning in a temperature event with
a call to net_ratelimit() to prevent flooding the kernel log
with repeated warning messages when temperature exceeds the
threshold multiple times within a short duration.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20250213094641.226501-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/events.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index cd8d107f7d9e3..fc6e56305cbbc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -169,9 +169,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
value_lsb &= 0x1;
value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
- mlx5_core_warn(events->dev,
- "High temperature on sensors with bit set %llx %llx",
- value_msb, value_lsb);
+ if (net_ratelimit())
+ mlx5_core_warn(events->dev,
+ "High temperature on sensors with bit set %llx %llx",
+ value_msb, value_lsb);
return NOTIFY_OK;
}
--
2.39.5
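To illustrate what wrapping a warning in a rate limiter buys, here is a simplified userspace model of window/burst limiting. It is not the implementation of net_ratelimit(); the struct, the function name and the use of an explicit "now" argument are assumptions made so that the example is deterministic.

```c
#include <stdbool.h>

/* Hypothetical limiter state: at most 'burst' messages per 'interval'. */
struct ratelimit {
	long interval;	/* window length, in arbitrary time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages already emitted in this window */
};

/* Returns true if a message may be emitted at time 'now'. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		/* A new window starts: reset the counter. */
		rl->begin = now;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	return false;	/* over the burst: suppress the message */
}
```

A caller would guard the log statement with `if (ratelimit_ok(&rl, now))`, in the same shape as the `if (net_ratelimit())` guard in the patch, so repeated temperature events inside one window produce a bounded number of log lines.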
* [PATCH AUTOSEL 6.12 342/486] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (37 preceding siblings ...)
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 322/486] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
@ 2025-05-05 22:36 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 359/486] net: stmmac: dwmac-loongson: Set correct {tx,rx}_fifo_size Sasha Levin
` (28 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kees Cook, Jakub Kicinski, Sasha Levin, tariqt, andrew+netdev,
davem, edumazet, pabeni, yishaih, netdev, linux-rdma
From: Kees Cook <kees@kernel.org>
[ Upstream commit 4a6f18f28627e121bd1f74b5fcc9f945d6dbeb1e ]
GCC can see that the value range for "order" is capped, but this leads
it to consider that it might be negative, leading to a false positive
warning (with GCC 15 with -Warray-bounds -fdiagnostics-details):
../drivers/net/ethernet/mellanox/mlx4/alloc.c:691:47: error: array subscript -1 is below array bounds of 'long unsigned int *[2]' [-Werror=array-bounds=]
691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o);
| ~~~~~~~~~~~^~~
'mlx4_alloc_db_from_pgdir': events 1-2
691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (2) out of array bounds here
| (1) when the condition is evaluated to true
In file included from ../drivers/net/ethernet/mellanox/mlx4/mlx4.h:53,
from ../drivers/net/ethernet/mellanox/mlx4/alloc.c:42:
../include/linux/mlx4/device.h:664:33: note: while referencing 'bits'
664 | unsigned long *bits[2];
| ^~~~
Switch the argument to unsigned int, which removes the compiler needing
to consider negative values.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250210174504.work.075-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/alloc.c | 6 +++---
include/linux/mlx4/device.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c
index b330020dc0d67..f2bded847e61d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -682,9 +682,9 @@ static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device)
}
static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir,
- struct mlx4_db *db, int order)
+ struct mlx4_db *db, unsigned int order)
{
- int o;
+ unsigned int o;
int i;
for (o = order; o <= 1; ++o) {
@@ -712,7 +712,7 @@ static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir,
return 0;
}
-int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order)
+int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, unsigned int order)
{
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_db_pgdir *pgdir;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 27f42f713c891..86f0f2a25a3d6 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1135,7 +1135,7 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
struct mlx4_buf *buf);
-int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order);
+int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, unsigned int order);
void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db);
int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres,
--
2.39.5
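The reasoning in the mlx4_db_alloc() patch above, that an unsigned loop variable bounded by `o <= 1` can only index elements 0 and 1 of a two-element array, can be seen in a reduced example. The function and the boolean array are hypothetical stand-ins for the bitmap search in mlx4_alloc_db_from_pgdir().

```c
#include <stdbool.h>

/*
 * Return the first order >= 'order' (at most 1) that has a free slot,
 * or -1 if there is none. With an unsigned 'o', the loop condition
 * o <= 1 restricts the index to 0 or 1, so free_at[o] is always in bounds.
 */
static int first_free_order(const bool free_at[2], unsigned int order)
{
	unsigned int o;

	for (o = order; o <= 1; ++o) {
		if (free_at[o])
			return (int)o;
	}

	return -1;
}
```

With a signed `order`, a negative argument would also satisfy `o <= 1`, which is the case the compiler warned about; with an unsigned type any out-of-range value simply fails the loop condition.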
* [PATCH AUTOSEL 6.12 359/486] net: stmmac: dwmac-loongson: Set correct {tx,rx}_fifo_size
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (38 preceding siblings ...)
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 342/486] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 379/486] net/mlx5: XDP, Enable TX side XDP multi-buffer support Sasha Levin
` (27 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Huacai Chen, Yanteng Si, Simon Horman, Chong Qiao, Jakub Kicinski,
Sasha Levin, andrew+netdev, davem, edumazet, pabeni,
mcoquelin.stm32, alexandre.torgue, chenhuacai, fancer.lancer,
chenfeiyang, phasta, zhaoqunqin, rmk+kernel, netdev, linux-stm32,
linux-arm-kernel
From: Huacai Chen <chenhuacai@loongson.cn>
[ Upstream commit 8dbf0c7556454b52af91bae305ca71500c31495c ]
Currently {tx,rx}_fifo_size are uninitialised for dwmac-loongson, which means
they are zero. As a result dwmac-loongson doesn't support changing the MTU,
because stmmac_change_mtu() requires the FIFO size to be no less than the MTU.
Thus, set the correct tx_fifo_size and rx_fifo_size for it (16KB multiplied by
the queue counts).
Here {tx,rx}_fifo_size is initialised with the initial value (also the
maximum value) of {tx,rx}_queues_to_use. So the per-queue FIFO size stays at
16KB if we don't change the queue count, and becomes larger than 16KB if we
change (decrease) the queue count. However, stmmac_change_mtu() still works
well with the current logic (the MTU cannot be larger than 16KB for stmmac).
Note: the Fixes tag picked here is the oldest commit and key commit of
the dwmac-loongson series "stmmac: Add Loongson platform support".
Acked-by: Yanteng Si <si.yanteng@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://patch.msgid.link/20250210134328.2755328-1-chenhuacai@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
index ab7c2750c1042..702ea5a00b56d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
@@ -590,6 +590,9 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id
if (ret)
goto err_disable_device;
+ plat->tx_fifo_size = SZ_16K * plat->tx_queues_to_use;
+ plat->rx_fifo_size = SZ_16K * plat->rx_queues_to_use;
+
if (dev_of_node(&pdev->dev))
ret = loongson_dwmac_dt_config(pdev, plat, &res);
else
--
2.39.5
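The sizing rule from the dwmac-loongson patch above can be worked through numerically. In this sketch, fifo_size() follows the patch (16KB per queue), while mtu_allowed() is only an assumed model of the kind of check the commit message attributes to stmmac_change_mtu(): it compares the per-queue share of the FIFO with the requested MTU. Both function names are hypothetical.

```c
#include <stdbool.h>

#define SZ_16K 0x4000u	/* 16384 bytes, same value as the kernel's SZ_16K */

/* Total FIFO size as set by the patch: 16KB per queue. */
static unsigned int fifo_size(unsigned int queues)
{
	return SZ_16K * queues;
}

/*
 * Assumed model of the MTU check: the share of the FIFO available to
 * each queue in use must be no less than the requested MTU.
 */
static bool mtu_allowed(unsigned int total_fifo, unsigned int queues_in_use,
			unsigned int mtu)
{
	if (queues_in_use == 0)
		return false;

	return total_fifo / queues_in_use >= mtu;
}
```

Under this model a zero FIFO size rejects every MTU, which matches the problem described in the commit message, and reducing the number of queues in use only increases the per-queue share.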
* [PATCH AUTOSEL 6.12 379/486] net/mlx5: XDP, Enable TX side XDP multi-buffer support
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (39 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 359/486] net: stmmac: dwmac-loongson: Set correct {tx,rx}_fifo_size Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 380/486] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
` (26 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexei Lazar, Tariq Toukan, Jakub Kicinski, Sasha Levin, saeedm,
andrew+netdev, davem, edumazet, pabeni, ast, daniel, hawk,
john.fastabend, dtatulea, michal.swiatkowski, yorayz, lkayal,
witu, leitao, cratiu, netdev, linux-rdma, bpf
From: Alexei Lazar <alazar@nvidia.com>
[ Upstream commit 1a9304859b3a4119579524c293b902a8927180f3 ]
In XDP scenarios, fragmented packets can occur if the MTU is larger
than the page size, even when the packet size fits within the linear
part.
If XDP multi-buffer support is disabled, the fragmented part won't be
handled in the TX flow, leading to packet drops.
Since XDP multi-buffer support is always available, this commit removes
the conditional check for enabling it.
This ensures that XDP multi-buffer support is always enabled,
regardless of the `is_xdp_mb` parameter, and guarantees the handling of
fragmented packets in such scenarios.
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-16-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 -
.../ethernet/mellanox/mlx5/core/en/params.c | 1 -
.../ethernet/mellanox/mlx5/core/en/params.h | 1 -
.../mellanox/mlx5/core/en/reporter_tx.c | 1 -
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 49 ++++++++-----------
.../net/ethernet/mellanox/mlx5/core/en_main.c | 29 -----------
6 files changed, 21 insertions(+), 61 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 57b7298a0e793..d6266f6a96d6e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -385,7 +385,6 @@ enum {
MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE,
MLX5E_SQ_STATE_PENDING_XSK_TX,
MLX5E_SQ_STATE_PENDING_TLS_RX_RESYNC,
- MLX5E_SQ_STATE_XDP_MULTIBUF,
MLX5E_NUM_SQ_STATES, /* Must be kept last */
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 31eb99f09c63c..8c4d710e85675 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -1242,7 +1242,6 @@ void mlx5e_build_xdpsq_param(struct mlx5_core_dev *mdev,
mlx5e_build_sq_param_common(mdev, param);
MLX5_SET(wq, wq, log_wq_sz, params->log_sq_size);
param->is_mpw = MLX5E_GET_PFLAG(params, MLX5E_PFLAG_XDP_TX_MPWQE);
- param->is_xdp_mb = !mlx5e_rx_is_linear_skb(mdev, params, xsk);
mlx5e_build_tx_cq_param(mdev, params, &param->cqp);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index 3f8986f9d8629..bd5877acc5b1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -33,7 +33,6 @@ struct mlx5e_sq_param {
struct mlx5_wq_param wq;
bool is_mpw;
bool is_tls;
- bool is_xdp_mb;
u16 stop_room;
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 09433b91be176..532c7fa94d172 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -16,7 +16,6 @@ static const char * const sq_sw_state_type_name[] = {
[MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE] = "vlan_need_l2_inline",
[MLX5E_SQ_STATE_PENDING_XSK_TX] = "pending_xsk_tx",
[MLX5E_SQ_STATE_PENDING_TLS_RX_RESYNC] = "pending_tls_rx_resync",
- [MLX5E_SQ_STATE_XDP_MULTIBUF] = "xdp_multibuf",
};
static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 4610621a340e5..08ab0999f7b31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -546,6 +546,7 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
bool inline_ok;
bool linear;
u16 pi;
+ int i;
struct mlx5e_xdpsq_stats *stats = sq->stats;
@@ -612,41 +613,33 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | MLX5_OPCODE_SEND);
- if (test_bit(MLX5E_SQ_STATE_XDP_MULTIBUF, &sq->state)) {
- int i;
-
- memset(&cseg->trailer, 0, sizeof(cseg->trailer));
- memset(eseg, 0, sizeof(*eseg) - sizeof(eseg->trailer));
-
- eseg->inline_hdr.sz = cpu_to_be16(inline_hdr_sz);
+ memset(&cseg->trailer, 0, sizeof(cseg->trailer));
+ memset(eseg, 0, sizeof(*eseg) - sizeof(eseg->trailer));
- for (i = 0; i < num_frags; i++) {
- skb_frag_t *frag = &xdptxdf->sinfo->frags[i];
- dma_addr_t addr;
+ eseg->inline_hdr.sz = cpu_to_be16(inline_hdr_sz);
- addr = xdptxdf->dma_arr ? xdptxdf->dma_arr[i] :
- page_pool_get_dma_addr(skb_frag_page(frag)) +
- skb_frag_off(frag);
+ for (i = 0; i < num_frags; i++) {
+ skb_frag_t *frag = &xdptxdf->sinfo->frags[i];
+ dma_addr_t addr;
- dseg->addr = cpu_to_be64(addr);
- dseg->byte_count = cpu_to_be32(skb_frag_size(frag));
- dseg->lkey = sq->mkey_be;
- dseg++;
- }
+ addr = xdptxdf->dma_arr ? xdptxdf->dma_arr[i] :
+ page_pool_get_dma_addr(skb_frag_page(frag)) +
+ skb_frag_off(frag);
- cseg->qpn_ds = cpu_to_be32((sq->sqn << 8) | ds_cnt);
+ dseg->addr = cpu_to_be64(addr);
+ dseg->byte_count = cpu_to_be32(skb_frag_size(frag));
+ dseg->lkey = sq->mkey_be;
+ dseg++;
+ }
- sq->db.wqe_info[pi] = (struct mlx5e_xdp_wqe_info) {
- .num_wqebbs = num_wqebbs,
- .num_pkts = 1,
- };
+ cseg->qpn_ds = cpu_to_be32((sq->sqn << 8) | ds_cnt);
- sq->pc += num_wqebbs;
- } else {
- cseg->fm_ce_se = 0;
+ sq->db.wqe_info[pi] = (struct mlx5e_xdp_wqe_info) {
+ .num_wqebbs = num_wqebbs,
+ .num_pkts = 1,
+ };
- sq->pc++;
- }
+ sq->pc += num_wqebbs;
xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, eseg);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1c087fa1ca269..15ec9750d4be0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2030,41 +2030,12 @@ int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
csp.min_inline_mode = sq->min_inline_mode;
set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
- if (param->is_xdp_mb)
- set_bit(MLX5E_SQ_STATE_XDP_MULTIBUF, &sq->state);
-
err = mlx5e_create_sq_rdy(c->mdev, param, &csp, 0, &sq->sqn);
if (err)
goto err_free_xdpsq;
mlx5e_set_xmit_fp(sq, param->is_mpw);
- if (!param->is_mpw && !test_bit(MLX5E_SQ_STATE_XDP_MULTIBUF, &sq->state)) {
- unsigned int ds_cnt = MLX5E_TX_WQE_EMPTY_DS_COUNT + 1;
- unsigned int inline_hdr_sz = 0;
- int i;
-
- if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {
- inline_hdr_sz = MLX5E_XDP_MIN_INLINE;
- ds_cnt++;
- }
-
- /* Pre initialize fixed WQE fields */
- for (i = 0; i < mlx5_wq_cyc_get_size(&sq->wq); i++) {
- struct mlx5e_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(&sq->wq, i);
- struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
- struct mlx5_wqe_eth_seg *eseg = &wqe->eth;
-
- sq->db.wqe_info[i] = (struct mlx5e_xdp_wqe_info) {
- .num_wqebbs = 1,
- .num_pkts = 1,
- };
-
- cseg->qpn_ds = cpu_to_be32((sq->sqn << 8) | ds_cnt);
- eseg->inline_hdr.sz = cpu_to_be16(inline_hdr_sz);
- }
- }
-
return 0;
err_free_xdpsq:
--
2.39.5
* [PATCH AUTOSEL 6.12 380/486] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (40 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 379/486] net/mlx5: XDP, Enable TX side XDP multi-buffer support Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 381/486] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
` (25 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexei Lazar, Dragos Tatulea, Tariq Toukan, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Alexei Lazar <alazar@nvidia.com>
[ Upstream commit 95b9606b15bb3ce1198d28d2393dd0e1f0a5f3e9 ]
The current loopback test validation ignores the non-linear SKB case
when accessing the SKB, which can lead to failures in scenarios such as
when HW GRO is enabled.
Linearize the SKB so both cases will be handled.
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-15-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 1d60465cc2ca4..2f7a543feca62 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -166,6 +166,9 @@ mlx5e_test_loopback_validate(struct sk_buff *skb,
struct udphdr *udph;
struct iphdr *iph;
+ if (skb_linearize(skb))
+ goto out;
+
/* We are only going to peek, no need to clone the SKB */
if (MLX5E_TEST_PKT_SIZE - ETH_HLEN > skb_headlen(skb))
goto out;
--
2.39.5
* [PATCH AUTOSEL 6.12 381/486] net/mlx5e: set the tx_queue_len for pfifo_fast
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (41 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 380/486] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 382/486] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
` (24 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: William Tu, Daniel Jurgens, Tariq Toukan, Michal Swiatkowski,
Jakub Kicinski, Sasha Levin, saeedm, andrew+netdev, davem,
edumazet, pabeni, netdev, linux-rdma
From: William Tu <witu@nvidia.com>
[ Upstream commit a38cc5706fb9f7dc4ee3a443f61de13ce1e410ed ]
By default, the mq netdev creates a pfifo_fast qdisc. On a
system with 16 cores, the pfifo_fast with 3 bands consumes
16 * 3 * 8 (size of pointer) * 1024 (default tx queue len)
= 393KB. The patch sets the tx qlen to representor default
value, 128 (1<<MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE), which
consumes 16 * 3 * 8 * 128 = 49KB, saving 344KB for each
representor at ECPF.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 0657d10765357..fd1f460b7be65 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -885,6 +885,8 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev,
netdev->ethtool_ops = &mlx5e_rep_ethtool_ops;
netdev->watchdog_timeo = 15 * HZ;
+ if (mlx5_core_is_ecpf(mdev))
+ netdev->tx_queue_len = 1 << MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE;
#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
netdev->hw_features |= NETIF_F_HW_TC;
--
2.39.5
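The memory figures quoted in the tx_queue_len commit message above can be reproduced with a few lines of C. The helper name is hypothetical; the factors (3 bands, 8-byte pointers, one queue per core on a 16-core system) are taken from the commit message.

```c
#include <stddef.h>

/*
 * Bytes used by the pfifo_fast pointer rings: one ring of tx_queue_len
 * pointers per band, three bands per queue.
 */
static size_t pfifo_fast_bytes(size_t queues, size_t tx_queue_len)
{
	const size_t bands = 3;
	const size_t ptr_size = 8;	/* pointer size assumed in the commit message */

	return queues * bands * ptr_size * tx_queue_len;
}
```

For 16 queues this gives 393216 bytes with the default length of 1024 and 49152 bytes with a length of 128, a difference of 344064 bytes, which corresponds to the 393KB, 49KB and 344KB figures above when KB is read as 1000 bytes.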
* [PATCH AUTOSEL 6.12 382/486] net/mlx5e: reduce rep rxq depth to 256 for ECPF
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (42 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 381/486] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 383/486] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps Sasha Levin
` (23 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: William Tu, Bodong Wang, Saeed Mahameed, Tariq Toukan,
Michal Swiatkowski, Jakub Kicinski, Sasha Levin, andrew+netdev,
davem, edumazet, pabeni, netdev, linux-rdma
From: William Tu <witu@nvidia.com>
[ Upstream commit b9cc8f9d700867aaa77aedddfea85e53d5e5d584 ]
Experiments show that a single-queue representor netdev consumes around
2.8MB of kernel memory, and 1.8MB of that 2.8MB is due to the page
pool for the RXQ. Scaling to a thousand representors consumes 2.8GB,
which becomes a memory pressure issue for embedded devices such as
BlueField-2 (16GB) / BlueField-3 (32GB).
Since representor netdevs mostly handle miss traffic, and ideally
most of the traffic will be offloaded, reduce the default RXQ depth of
non-uplink rep netdevs from 1024 to 256 if mdev is the ecpf eswitch
manager. This saves around 1MB of memory per regular RQ,
(1024 - 256) * 2KB, allocated from page pool.
With rxq depth of 256, the netlink page pool tool reports
$./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump page-pool-get
{'id': 277,
'ifindex': 9,
'inflight': 128,
'inflight-mem': 786432,
'napi-id': 775}]
This is because mtu 1500 + headroom consumes half a page, so 256 rxq
entries consume around 128 pages (thus creating a page pool of
size 128), shown above as inflight.
Note that each netdev has multiple types of RQs, including
Regular RQ, XSK, PTP, Drop, Trap RQ. Since non-uplink representors
only support the regular RQ, this patch only changes the regular RQ's
default depth.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index fd1f460b7be65..18ec392d17404 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -65,6 +65,7 @@
#define MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE \
max(0x7, MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE)
#define MLX5E_REP_PARAMS_DEF_NUM_CHANNELS 1
+#define MLX5E_REP_PARAMS_DEF_LOG_RQ_SIZE 0x8
static const char mlx5e_rep_driver_name[] = "mlx5e_rep";
@@ -854,6 +855,8 @@ static void mlx5e_build_rep_params(struct net_device *netdev)
/* RQ */
mlx5e_build_rq_params(mdev, params);
+ if (!mlx5e_is_uplink_rep(priv) && mlx5_core_is_ecpf(mdev))
+ params->log_rq_mtu_frames = MLX5E_REP_PARAMS_DEF_LOG_RQ_SIZE;
/* If netdev is already registered (e.g. move from nic profile to uplink,
* RTNL lock must be held before triggering netdev notifiers.
--
2.39.5
* [PATCH AUTOSEL 6.12 383/486] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (43 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 382/486] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 385/486] xfrm: prevent high SEQ input in non-ESN mode Sasha Levin
` (22 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: William Tu, Tariq Toukan, Michal Swiatkowski, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
dtatulea, alazar, yorayz, lkayal, netdev, linux-rdma
From: William Tu <witu@nvidia.com>
[ Upstream commit e1d68ea58c7e9ebacd9ad7a99b25a3578fa62182 ]
For the ECPF and representors, reduce the max MPWRQ size from 256KB (18)
to 128KB (17). This prepares for the later patch that saves representor
memory.
With Striding RQ, there is a minimum of 4 MPWQEs. So with a max MPWRQ
size of 128KB, the minimal memory is 4 * 128KB = 512KB. When creating the
page pool with a 1500 MTU, the minimal page pool size will be 512KB/4KB =
128 pages = 256 rx ring entries (2 entries per page).
Before this patch, setting the RX ring size (ethtool -G rx) to 256 causes
the driver to allocate a larger page pool than it needs because the max
MPWRQ size is 256KB (18). E.g.: 4 * 256KB = 1MB, 1MB/4KB = 256 pages, but
128 pages are actually enough. Reducing the max MPWRQ size to 128KB removes
this limitation.
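The sizing argument above can be written out as a small standalone C sketch. The minimum of 4 MPWQEs and the 4KB page size are taken from the message; the macro and function names are hypothetical and do not exist in the driver.

```c
#include <stdint.h>

/* Minimum number of MPWQEs with Striding RQ, as stated in the message. */
#define MIN_MPWQES_SKETCH 4

/* Hypothetical helper: minimal page pool size, in pages, for a given
 * log2 MPWQE size and page shift (12 for 4KB pages).
 */
static uint32_t mpwrq_min_pool_pages(uint8_t log_wqe_sz, uint8_t page_shift)
{
	uint64_t bytes = (uint64_t)MIN_MPWQES_SKETCH << log_wqe_sz;

	return (uint32_t)(bytes >> page_shift);
}
```

With a log size of 17 this yields 128 pages, and with 18 it yields 256 pages, which is the doubling the message describes.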
Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 --
.../net/ethernet/mellanox/mlx5/core/en/params.c | 15 +++++++++++----
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index d6266f6a96d6e..e048a667e0758 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -94,8 +94,6 @@ struct page_pool;
#define MLX5_MPWRQ_DEF_LOG_STRIDE_SZ(mdev) \
MLX5_MPWRQ_LOG_STRIDE_SZ(mdev, order_base_2(MLX5E_RX_MAX_HEAD))
-#define MLX5_MPWRQ_MAX_LOG_WQE_SZ 18
-
/* Keep in sync with mlx5e_mpwrq_log_wqe_sz.
* These are theoretical maximums, which can be further restricted by
* capabilities. These values are used for static resource allocations and
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 8c4d710e85675..58ec5e44aa7ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -10,6 +10,9 @@
#include <net/page_pool/types.h>
#include <net/xdp_sock_drv.h>
+#define MLX5_MPWRQ_MAX_LOG_WQE_SZ 18
+#define MLX5_REP_MPWRQ_MAX_LOG_WQE_SZ 17
+
static u8 mlx5e_mpwrq_min_page_shift(struct mlx5_core_dev *mdev)
{
u8 min_page_shift = MLX5_CAP_GEN_2(mdev, log_min_mkey_entity_size);
@@ -103,18 +106,22 @@ u8 mlx5e_mpwrq_log_wqe_sz(struct mlx5_core_dev *mdev, u8 page_shift,
enum mlx5e_mpwrq_umr_mode umr_mode)
{
u8 umr_entry_size = mlx5e_mpwrq_umr_entry_size(umr_mode);
- u8 max_pages_per_wqe, max_log_mpwqe_size;
+ u8 max_pages_per_wqe, max_log_wqe_size_calc;
+ u8 max_log_wqe_size_cap;
u16 max_wqe_size;
/* Keep in sync with MLX5_MPWRQ_MAX_PAGES_PER_WQE. */
max_wqe_size = mlx5e_get_max_sq_aligned_wqebbs(mdev) * MLX5_SEND_WQE_BB;
max_pages_per_wqe = ALIGN_DOWN(max_wqe_size - sizeof(struct mlx5e_umr_wqe),
MLX5_UMR_FLEX_ALIGNMENT) / umr_entry_size;
- max_log_mpwqe_size = ilog2(max_pages_per_wqe) + page_shift;
+ max_log_wqe_size_calc = ilog2(max_pages_per_wqe) + page_shift;
+
+ WARN_ON_ONCE(max_log_wqe_size_calc < MLX5E_ORDER2_MAX_PACKET_MTU);
- WARN_ON_ONCE(max_log_mpwqe_size < MLX5E_ORDER2_MAX_PACKET_MTU);
+ max_log_wqe_size_cap = mlx5_core_is_ecpf(mdev) ?
+ MLX5_REP_MPWRQ_MAX_LOG_WQE_SZ : MLX5_MPWRQ_MAX_LOG_WQE_SZ;
- return min_t(u8, max_log_mpwqe_size, MLX5_MPWRQ_MAX_LOG_WQE_SZ);
+ return min_t(u8, max_log_wqe_size_calc, max_log_wqe_size_cap);
}
u8 mlx5e_mpwrq_pages_per_wqe(struct mlx5_core_dev *mdev, u8 page_shift,
--
2.39.5
* [PATCH AUTOSEL 6.12 385/486] xfrm: prevent high SEQ input in non-ESN mode
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (44 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 383/486] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 387/486] mptcp: pm: userspace: flags: clearer msg if no remote addr Sasha Levin
` (21 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Leon Romanovsky, Steffen Klassert, Sasha Levin, davem, edumazet,
kuba, pabeni, netdev
From: Leon Romanovsky <leonro@nvidia.com>
[ Upstream commit e3aa43a50a6455831e3c32dabc7ece38d9cd9d05 ]
In non-ESN mode, the SEQ numbers are limited to 32 bits and seq_hi/oseq_hi
are not used. So make sure that the user gets a proper error message in
case such an assignment occurs.
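The rule described above can be illustrated with a simplified, standalone C sketch. The struct and function below are hypothetical stand-ins for illustration only; they are not the kernel's xfrm types, and the real check also reports a message through extack.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the replay state fields. */
struct replay_state_sketch {
	uint32_t seq, seq_hi;
	uint32_t oseq, oseq_hi;
};

/* Reject a non-zero high half of the sequence number unless ESN is on.
 * Output SAs are checked against oseq_hi, input SAs against seq_hi.
 */
static int check_non_esn_seq(const struct replay_state_sketch *rs,
			     bool esn, bool output_sa)
{
	if (esn)
		return 0;
	if (output_sa && rs->oseq_hi)
		return -EINVAL;
	if (!output_sa && rs->seq_hi)
		return -EINVAL;
	return 0;
}
```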
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/xfrm/xfrm_user.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 87013623773a2..da2a1c00ca8a6 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -178,6 +178,12 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
"Replay seq and seq_hi should be 0 for output SA");
return -EINVAL;
}
+ if (rs->oseq_hi && !(p->flags & XFRM_STATE_ESN)) {
+ NL_SET_ERR_MSG(
+ extack,
+ "Replay oseq_hi should be 0 in non-ESN mode for output SA");
+ return -EINVAL;
+ }
if (rs->bmp_len) {
NL_SET_ERR_MSG(extack, "Replay bmp_len should 0 for output SA");
return -EINVAL;
@@ -190,6 +196,12 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
"Replay oseq and oseq_hi should be 0 for input SA");
return -EINVAL;
}
+ if (rs->seq_hi && !(p->flags & XFRM_STATE_ESN)) {
+ NL_SET_ERR_MSG(
+ extack,
+ "Replay seq_hi should be 0 in non-ESN mode for input SA");
+ return -EINVAL;
+ }
}
return 0;
--
2.39.5
* [PATCH AUTOSEL 6.12 387/486] mptcp: pm: userspace: flags: clearer msg if no remote addr
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (45 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 385/486] xfrm: prevent high SEQ input in non-ESN mode Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 393/486] net: fec: Refactor MAC reset to function Sasha Levin
` (20 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Matthieu Baerts (NGI0), Geliang Tang, Simon Horman, Paolo Abeni,
Sasha Levin, martineau, davem, edumazet, kuba, netdev, mptcp
From: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>
[ Upstream commit 58b21309f97b08b6b9814d1ee1419249eba9ef08 ]
Since its introduction in commit 892f396c8e68 ("mptcp: netlink: issue
MP_PRIO signals from userspace PMs"), it was mandatory to specify the
remote address, because of the 'if (rem->addr.family == AF_UNSPEC)'
check done later on.
In theory, this attribute can be optional, but it sounds better to be
precise to avoid sending the MP_PRIO on the wrong subflow, e.g. if there
are multiple subflows attached to the same local ID. This can be relaxed
later on if there is a need to act on multiple subflows with one
command.
For the moment, the check to see if attr_rem is NULL can be removed,
because mptcp_pm_parse_entry() will do this check as well, no need to do
that differently here.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/mptcp/pm_userspace.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c
index e35178f5205fa..bb76295d04c56 100644
--- a/net/mptcp/pm_userspace.c
+++ b/net/mptcp/pm_userspace.c
@@ -589,11 +589,9 @@ int mptcp_userspace_pm_set_flags(struct sk_buff *skb, struct genl_info *info)
if (ret < 0)
goto set_flags_err;
- if (attr_rem) {
- ret = mptcp_pm_parse_entry(attr_rem, info, false, &rem);
- if (ret < 0)
- goto set_flags_err;
- }
+ ret = mptcp_pm_parse_entry(attr_rem, info, false, &rem);
+ if (ret < 0)
+ goto set_flags_err;
if (loc.addr.family == AF_UNSPEC ||
rem.addr.family == AF_UNSPEC) {
--
2.39.5
* [PATCH AUTOSEL 6.12 393/486] net: fec: Refactor MAC reset to function
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (46 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 387/486] mptcp: pm: userspace: flags: clearer msg if no remote addr Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 397/486] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
` (19 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Csókás, Bence, Michal Swiatkowski, Jacob Keller,
Simon Horman, Paolo Abeni, Sasha Levin, wei.fang, andrew+netdev,
davem, edumazet, kuba, imx, netdev
From: Csókás, Bence <csokas.bence@prolan.hu>
[ Upstream commit 67800d296191d0a9bde0a7776f99ca1ddfa0fc26 ]
The core is reset both in `fec_restart()` (called on link-up) and
`fec_stop()` (going to sleep, driver remove etc.). These two functions
had their separate implementations, which were at first only a register
write and a `udelay()` (and the accompanying block comment). However,
since then we got soft-reset (MAC disable) and Wake-on-LAN support, which
meant that these implementations diverged, often causing bugs.
For instance, as of now, `fec_stop()` does not check for
`FEC_QUIRK_NO_HARD_RESET`, meaning the MII/RMII mode is cleared on e.g.
a PM power-down event; and `fec_restart()` missed the refactor renaming
the "magic" constant `1` to `FEC_ECR_RESET`.
To harmonize the current implementations and eliminate this source of
potential future bugs, refactor the implementation into a common function.
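The branch structure of the common helper can be sketched in standalone C, with the register writes replaced by a returned action. This mirrors the conditions in the diff below; the enum, the function name and the boolean parameters are hypothetical and only stand in for the quirk and WoL flags.

```c
#include <stdbool.h>

/* Hypothetical actions standing in for the three register sequences. */
enum fec_reset_action_sketch {
	FEC_SKETCH_DISABLE_MAC,	/* write 0 to the control register */
	FEC_SKETCH_HARD_RESET,	/* write the reset bit, then delay */
	FEC_SKETCH_ENTER_SLEEP,	/* set the magic-packet and sleep bits */
};

/* Pick the action the common reset helper would take. */
static enum fec_reset_action_sketch
fec_pick_reset_action(bool allow_wol, bool wol_sleep_on,
		      bool multi_queues, bool no_hard_reset, bool link_up)
{
	/* Sleep mode is only entered when the caller allows WoL. */
	if (allow_wol && wol_sleep_on)
		return FEC_SKETCH_ENTER_SLEEP;

	/* Soft reset: disable the MAC instead of resetting it. */
	if (multi_queues || (no_hard_reset && link_up))
		return FEC_SKETCH_DISABLE_MAC;

	return FEC_SKETCH_HARD_RESET;
}
```

In this sketch, the restart path corresponds to allow_wol being false and the stop path to allow_wol being true, so both paths now honor the no-hard-reset condition.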
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
Link: https://patch.msgid.link/20250207121255.161146-2-csokas.bence@prolan.hu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/freescale/fec_main.c | 52 +++++++++++------------
1 file changed, 25 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2b05d9c6c21a4..00f31d5ea4fca 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1093,6 +1093,29 @@ static void fec_enet_enable_ring(struct net_device *ndev)
}
}
+/* Whack a reset. We should wait for this.
+ * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
+ * instead of reset MAC itself.
+ */
+static void fec_ctrl_reset(struct fec_enet_private *fep, bool allow_wol)
+{
+ u32 val;
+
+ if (!allow_wol || !(fep->wol_flag & FEC_WOL_FLAG_SLEEP_ON)) {
+ if (fep->quirks & FEC_QUIRK_HAS_MULTI_QUEUES ||
+ ((fep->quirks & FEC_QUIRK_NO_HARD_RESET) && fep->link)) {
+ writel(0, fep->hwp + FEC_ECNTRL);
+ } else {
+ writel(FEC_ECR_RESET, fep->hwp + FEC_ECNTRL);
+ udelay(10);
+ }
+ } else {
+ val = readl(fep->hwp + FEC_ECNTRL);
+ val |= (FEC_ECR_MAGICEN | FEC_ECR_SLEEP);
+ writel(val, fep->hwp + FEC_ECNTRL);
+ }
+}
+
/*
* This function is called to start or restart the FEC during a link
* change, transmit timeout, or to reconfigure the FEC. The network
@@ -1109,17 +1132,7 @@ fec_restart(struct net_device *ndev)
if (fep->bufdesc_ex)
fec_ptp_save_state(fep);
- /* Whack a reset. We should wait for this.
- * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
- * instead of reset MAC itself.
- */
- if (fep->quirks & FEC_QUIRK_HAS_MULTI_QUEUES ||
- ((fep->quirks & FEC_QUIRK_NO_HARD_RESET) && fep->link)) {
- writel(0, fep->hwp + FEC_ECNTRL);
- } else {
- writel(1, fep->hwp + FEC_ECNTRL);
- udelay(10);
- }
+ fec_ctrl_reset(fep, false);
/*
* enet-mac reset will reset mac address registers too,
@@ -1373,22 +1386,7 @@ fec_stop(struct net_device *ndev)
if (fep->bufdesc_ex)
fec_ptp_save_state(fep);
- /* Whack a reset. We should wait for this.
- * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
- * instead of reset MAC itself.
- */
- if (!(fep->wol_flag & FEC_WOL_FLAG_SLEEP_ON)) {
- if (fep->quirks & FEC_QUIRK_HAS_MULTI_QUEUES) {
- writel(0, fep->hwp + FEC_ECNTRL);
- } else {
- writel(FEC_ECR_RESET, fep->hwp + FEC_ECNTRL);
- udelay(10);
- }
- } else {
- val = readl(fep->hwp + FEC_ECNTRL);
- val |= (FEC_ECR_MAGICEN | FEC_ECR_SLEEP);
- writel(val, fep->hwp + FEC_ECNTRL);
- }
+ fec_ctrl_reset(fep, true);
writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
--
2.39.5
* [PATCH AUTOSEL 6.12 397/486] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (47 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 393/486] net: fec: Refactor MAC reset to function Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 398/486] r8152: add vendor/device ID pair for Dell Alienware AW1022z Sasha Levin
` (18 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuniyuki Iwashima, Eric Dumazet, Ido Schimmel, Jakub Kicinski,
Sasha Levin, davem, dsahern, pabeni, netdev
From: Kuniyuki Iwashima <kuniyu@amazon.com>
[ Upstream commit 5a1ccffd30a08f5a2428cd5fbb3ab03e8eb6c66d ]
The following patch will not set skb->sk from the VRF path.
Let's fetch net from fib_rule->fr_net instead of sock_net(skb->sk)
in fib[46]_rule_configure().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/fib_rules.c | 4 ++--
net/ipv6/fib6_rules.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index b07292d50ee76..4563e5303c1a8 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -245,9 +245,9 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct nlattr **tb,
struct netlink_ext_ack *extack)
{
- struct net *net = sock_net(skb->sk);
+ struct fib4_rule *rule4 = (struct fib4_rule *)rule;
+ struct net *net = rule->fr_net;
int err = -EINVAL;
- struct fib4_rule *rule4 = (struct fib4_rule *) rule;
if (!inet_validate_dscp(frh->tos)) {
NL_SET_ERR_MSG(extack,
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index 04a9ed5e8310f..29185c9ebd020 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -365,9 +365,9 @@ static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct nlattr **tb,
struct netlink_ext_ack *extack)
{
+ struct fib6_rule *rule6 = (struct fib6_rule *)rule;
+ struct net *net = rule->fr_net;
int err = -EINVAL;
- struct net *net = sock_net(skb->sk);
- struct fib6_rule *rule6 = (struct fib6_rule *) rule;
if (!inet_validate_dscp(frh->tos)) {
NL_SET_ERR_MSG(extack,
--
2.39.5
* [PATCH AUTOSEL 6.12 398/486] r8152: add vendor/device ID pair for Dell Alienware AW1022z
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (48 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 397/486] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 402/486] net: ethtool: prevent flow steering to RSS contexts which don't exist Sasha Levin
` (17 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Aleksander Jan Bajkowski, Jakub Kicinski, Sasha Levin,
andrew+netdev, davem, edumazet, pabeni, gregkh, hayeswang, horms,
dianders, phahn-oss, linux-usb, netdev
From: Aleksander Jan Bajkowski <olek2@wp.pl>
[ Upstream commit 848b09d53d923b4caee5491f57a5c5b22d81febc ]
The Dell AW1022z is an RTL8156B based 2.5G Ethernet controller.
Add the vendor and product ID values to the driver. This makes Ethernet
work with the adapter.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20250206224033.980115-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/usb/r8152.c | 1 +
include/linux/usb/r8152.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 96fa3857d8e25..2cab046749a92 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -10085,6 +10085,7 @@ static const struct usb_device_id rtl8152_table[] = {
{ USB_DEVICE(VENDOR_ID_NVIDIA, 0x09ff) },
{ USB_DEVICE(VENDOR_ID_TPLINK, 0x0601) },
{ USB_DEVICE(VENDOR_ID_DLINK, 0xb301) },
+ { USB_DEVICE(VENDOR_ID_DELL, 0xb097) },
{ USB_DEVICE(VENDOR_ID_ASUS, 0x1976) },
{}
};
diff --git a/include/linux/usb/r8152.h b/include/linux/usb/r8152.h
index 33a4c146dc19c..2ca60828f28bb 100644
--- a/include/linux/usb/r8152.h
+++ b/include/linux/usb/r8152.h
@@ -30,6 +30,7 @@
#define VENDOR_ID_NVIDIA 0x0955
#define VENDOR_ID_TPLINK 0x2357
#define VENDOR_ID_DLINK 0x2001
+#define VENDOR_ID_DELL 0x413c
#define VENDOR_ID_ASUS 0x0b05
#if IS_REACHABLE(CONFIG_USB_RTL8152)
--
2.39.5
* [PATCH AUTOSEL 6.12 402/486] net: ethtool: prevent flow steering to RSS contexts which don't exist
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (49 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 398/486] r8152: add vendor/device ID pair for Dell Alienware AW1022z Sasha Levin
@ 2025-05-05 22:37 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 412/486] net: page_pool: avoid false positive warning if NAPI was never added Sasha Levin
` (16 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Kicinski, Joe Damato, Sasha Levin, andrew, davem, edumazet,
pabeni, ecree.xilinx, przemyslaw.kitszel, gal, daniel.zahka,
almasrymina, netdev
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit de7f7582dff292832fbdeaeff34e6b2ee6f9f95f ]
Since commit 42dc431f5d0e ("ethtool: rss: prevent rss ctx deletion
when in use") we prevent removal of RSS contexts pointed to by
existing flow rules. Core should also prevent creation of rules
which point to RSS contexts which don't exist in the first place.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250206235334.1425329-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ethtool/ioctl.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 8b9692c35e706..6ed01cec97a8e 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -993,10 +993,14 @@ static noinline_for_stack int ethtool_set_rxnfc(struct net_device *dev,
return rc;
/* Nonzero ring with RSS only makes sense if NIC adds them together */
- if (cmd == ETHTOOL_SRXCLSRLINS && info.fs.flow_type & FLOW_RSS &&
- !ops->cap_rss_rxnfc_adds &&
- ethtool_get_flow_spec_ring(info.fs.ring_cookie))
- return -EINVAL;
+ if (cmd == ETHTOOL_SRXCLSRLINS && info.fs.flow_type & FLOW_RSS) {
+ if (!ops->cap_rss_rxnfc_adds &&
+ ethtool_get_flow_spec_ring(info.fs.ring_cookie))
+ return -EINVAL;
+
+ if (!xa_load(&dev->ethtool->rss_ctx, info.rss_context))
+ return -EINVAL;
+ }
if (cmd == ETHTOOL_SRXFH && ops->get_rxfh) {
struct ethtool_rxfh_param rxfh = {};
--
2.39.5
* [PATCH AUTOSEL 6.12 412/486] net: page_pool: avoid false positive warning if NAPI was never added
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (50 preceding siblings ...)
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 402/486] net: ethtool: prevent flow steering to RSS contexts which don't exist Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 420/486] eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs Sasha Levin
` (15 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Kicinski, Mina Almasry, Sasha Levin, davem, edumazet,
pabeni, hawk, ilias.apalodimas, jdamato, sdf, kuniyu,
kory.maincent, mkarsten, bigeasy, netdev
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit c1e00bc4be06cacee6307cedb9b55bbaddb5044d ]
We expect NAPI to be in disabled state when page pool is torn down.
But it is also legal if the NAPI is completely uninitialized.
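The condition described above can be expressed as a pure predicate in standalone C. This is only a sketch that follows the checks in the diff below; the struct, the bit position and the function name are hypothetical, and the kernel helper warns rather than returning a value.

```c
#include <stdbool.h>
#include <stddef.h>

#define NAPI_SKETCH_SCHED_BIT 0	/* hypothetical bit position */

/* Hypothetical, reduced stand-in for the NAPI fields that are checked. */
struct napi_sketch {
	void *poll_list_next;	/* NULL if the instance was never added */
	unsigned long state;
	int list_owner;
};

/* True if tearing down the page pool cannot race with this NAPI. */
static bool napi_sketch_cannot_race(const struct napi_sketch *napi)
{
	/* Uninitialized instance: nothing can schedule it. */
	if (napi->poll_list_next == NULL)
		return true;

	/* A disabled instance has the SCHED bit set and no owner. */
	return (napi->state & (1UL << NAPI_SKETCH_SCHED_BIT)) != 0 &&
	       napi->list_owner == -1;
}
```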
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/dev.h | 12 ++++++++++++
net/core/page_pool.c | 7 ++-----
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.h b/net/core/dev.h
index 2e3bb7669984a..764e0097ccf22 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -148,6 +148,18 @@ void xdp_do_check_flushed(struct napi_struct *napi);
static inline void xdp_do_check_flushed(struct napi_struct *napi) { }
#endif
+/* Best effort check that NAPI is not idle (can't be scheduled to run) */
+static inline void napi_assert_will_not_race(const struct napi_struct *napi)
+{
+ /* uninitialized instance, can't race */
+ if (!napi->poll_list.next)
+ return;
+
+ /* SCHED bit is set on disabled instances */
+ WARN_ON(!test_bit(NAPI_STATE_SCHED, &napi->state));
+ WARN_ON(READ_ONCE(napi->list_owner) != -1);
+}
+
void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
#define XMIT_RECURSION_LIMIT 8
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7b20f6fcb82c0..c8ce069605c42 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -25,6 +25,7 @@
#include <trace/events/page_pool.h>
+#include "dev.h"
#include "mp_dmabuf_devmem.h"
#include "netmem_priv.h"
#include "page_pool_priv.h"
@@ -1108,11 +1109,7 @@ void page_pool_disable_direct_recycling(struct page_pool *pool)
if (!pool->p.napi)
return;
- /* To avoid races with recycling and additional barriers make sure
- * pool and NAPI are unlinked when NAPI is disabled.
- */
- WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
- WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);
+ napi_assert_will_not_race(pool->p.napi);
WRITE_ONCE(pool->p.napi, NULL);
}
--
2.39.5
* [PATCH AUTOSEL 6.12 420/486] eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (51 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 412/486] net: page_pool: avoid false positive warning if NAPI was never added Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 421/486] tools: ynl-gen: don't output external constants Sasha Levin
` (14 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexander Duyck, Jakub Kicinski, Simon Horman, Paolo Abeni,
Sasha Levin, alexanderduyck, andrew+netdev, davem, edumazet,
jdamato, mohsin.bashr, vadim.fedorenko, sdf, netdev
From: Alexander Duyck <alexanderduyck@meta.com>
[ Upstream commit 09717c28b76c30b1dc8c261c855ffb2406abab2e ]
I realized when we were adding unicast addresses we were enabling
promiscuous mode. I did a bit of digging and realized we had overlooked
setting the driver private flag to indicate we supported unicast filtering.
Example below shows the table with 00deadbeef01 as the main NIC address,
and 5 additional addresses in the 00deadbeefX0 format.
# cat $dbgfs/mac_addr
Idx S TCAM Bitmap       Addr/Mask
----------------------------------
00  0 00000000,00000000 000000000000
                        000000000000
01  0 00000000,00000000 000000000000
                        000000000000
02  0 00000000,00000000 000000000000
                        000000000000
...
24  0 00000000,00000000 000000000000
                        000000000000
25  1 00100000,00000000 00deadbeef50
                        000000000000
26  1 00100000,00000000 00deadbeef40
                        000000000000
27  1 00100000,00000000 00deadbeef30
                        000000000000
28  1 00100000,00000000 00deadbeef20
                        000000000000
29  1 00100000,00000000 00deadbeef10
                        000000000000
30  1 00100000,00000000 00deadbeef01
                        000000000000
31  0 00000000,00000000 000000000000
                        000000000000
Before, rule 31 would be active. With this change it correctly sticks
to just the unicast filters.
Signed-off-by: Alexander Duyck <alexanderduyck@meta.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250204010038.1404268-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/meta/fbnic/fbnic_netdev.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index a400616a24d41..79e94632533c8 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -544,6 +544,8 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
fbnic_rss_key_fill(fbn->rss_key);
fbnic_rss_init_en_mask(fbn);
+ netdev->priv_flags |= IFF_UNICAST_FLT;
+
netdev->features |=
NETIF_F_RXHASH |
NETIF_F_SG |
--
2.39.5
* [PATCH AUTOSEL 6.12 421/486] tools: ynl-gen: don't output external constants
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (52 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 420/486] eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 422/486] net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled Sasha Levin
` (13 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Kicinski, Paolo Abeni, Sasha Levin, donald.hunter, davem,
edumazet, sdf, antonio, jstancek, johannes.berg, netdev
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 7e8b24e24ac46038e48c9a042e7d9b31855cbca5 ]
A definition with a "header" property is an "external" definition
for C code, as in it is defined already in another C header file.
Other languages will need the exact value but C codegen should
not recreate it. So don't output those definitions in the uAPI
header.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250203215510.1288728-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/net/ynl/ynl-gen-c.py | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py
index 463f1394ab971..c78f1c1bca75c 100755
--- a/tools/net/ynl/ynl-gen-c.py
+++ b/tools/net/ynl/ynl-gen-c.py
@@ -2417,6 +2417,9 @@ def render_uapi(family, cw):
defines = []
for const in family['definitions']:
+ if const.get('header'):
+ continue
+
if const['type'] != 'const':
cw.writes_defines(defines)
defines = []
--
2.39.5
* [PATCH AUTOSEL 6.12 422/486] net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (53 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 421/486] tools: ynl-gen: don't output external constants Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 424/486] vxlan: Annotate FDB data races Sasha Levin
` (12 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Carolina Jubran, Yael Chemla, Cosmin Ratiu, Tariq Toukan,
Kalesh AP, Paolo Abeni, Sasha Levin, saeedm, andrew+netdev, davem,
edumazet, kuba, netdev, linux-rdma
From: Carolina Jubran <cjubran@nvidia.com>
[ Upstream commit 689805dcc474c2accb5cffbbcea1c06ee4a54570 ]
When attempting to enable MQPRIO while HTB offload is already
configured, the driver currently returns `-EINVAL` and triggers a
`WARN_ON`, leading to an unnecessary call trace.
Update the code to handle this case more gracefully by returning
`-EOPNOTSUPP` instead, while also providing a helpful user message.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 15ec9750d4be0..36a2c935267b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3759,8 +3759,11 @@ static int mlx5e_setup_tc_mqprio(struct mlx5e_priv *priv,
/* MQPRIO is another toplevel qdisc that can't be attached
* simultaneously with the offloaded HTB.
*/
- if (WARN_ON(mlx5e_selq_is_htb_enabled(&priv->selq)))
- return -EINVAL;
+ if (mlx5e_selq_is_htb_enabled(&priv->selq)) {
+ NL_SET_ERR_MSG_MOD(mqprio->extack,
+ "MQPRIO cannot be configured when HTB offload is enabled.");
+ return -EOPNOTSUPP;
+ }
switch (mqprio->mode) {
case TC_MQPRIO_MODE_DCB:
--
2.39.5
* [PATCH AUTOSEL 6.12 424/486] vxlan: Annotate FDB data races
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (54 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 422/486] net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 425/486] ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only Sasha Levin
` (11 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ido Schimmel, Petr Machata, Eric Dumazet, Nikolay Aleksandrov,
Jakub Kicinski, Sasha Levin, andrew+netdev, davem, pabeni,
menglong8.dong, gnault, netdev
From: Ido Schimmel <idosch@nvidia.com>
[ Upstream commit f6205f8215f12a96518ac9469ff76294ae7bd612 ]
The 'used' and 'updated' fields in the FDB entry structure can be
accessed concurrently by multiple threads, leading to reports such as
[1]. The race can be reproduced using [2].
Suppress these reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().
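
As a rough user-space illustration of the annotation pattern, the sketch below defines simplified stand-ins for the two macros (an assumption: the kernel versions are more elaborate) and applies them to a hypothetical `struct fdb_entry`. The helper names `fdb_touch()` and `fdb_expired()` are made up for this example.

```c
#include <stdbool.h>

/*
 * Simplified user-space stand-ins for the kernel macros. The volatile
 * access asks the compiler for exactly one load or store of the field.
 * __typeof__ is a GCC/Clang extension.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical FDB entry holding only the locklessly accessed timestamp. */
struct fdb_entry {
	unsigned long used;
};

/* Refresh the timestamp only when it changed; true if a store happened. */
static bool fdb_touch(struct fdb_entry *f, unsigned long now)
{
	if (READ_ONCE(f->used) != now) {
		WRITE_ONCE(f->used, now);
		return true;
	}
	return false;
}

/* Age check in the style of a cleanup timer: true when the entry expired. */
static bool fdb_expired(struct fdb_entry *f, unsigned long now,
			unsigned long age_interval)
{
	unsigned long timeout = READ_ONCE(f->used) + age_interval;

	/* Wrap-safe "timeout <= now" comparison on tick counters. */
	return (long)(now - timeout) >= 0;
}
```

The annotations do not add ordering or locking. They document that the access is intentionally lockless and keep the compiler from tearing or repeating it.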
[1]
BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit
write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0:
vxlan_xmit+0xb29/0x2380
dev_hard_start_xmit+0x84/0x2f0
__dev_queue_xmit+0x45a/0x1650
packet_xmit+0x100/0x150
packet_sendmsg+0x2114/0x2ac0
__sys_sendto+0x318/0x330
__x64_sys_sendto+0x76/0x90
x64_sys_call+0x14e8/0x1c00
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2:
vxlan_xmit+0xadf/0x2380
dev_hard_start_xmit+0x84/0x2f0
__dev_queue_xmit+0x45a/0x1650
packet_xmit+0x100/0x150
packet_sendmsg+0x2114/0x2ac0
__sys_sendto+0x318/0x330
__x64_sys_sendto+0x76/0x90
x64_sys_call+0x14e8/0x1c00
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f
Reported by Kernel Concurrency Sanitizer on:
CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[2]
#!/bin/bash
set +H
echo whitelist > /sys/kernel/debug/kcsan
echo !vxlan_xmit > /sys/kernel/debug/kcsan
ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1
bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1
taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/vxlan/vxlan_core.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 01f66760e1328..474faccf75fd9 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -227,9 +227,9 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
be32_to_cpu(fdb->vni)))
goto nla_put_failure;
- ci.ndm_used = jiffies_to_clock_t(now - fdb->used);
+ ci.ndm_used = jiffies_to_clock_t(now - READ_ONCE(fdb->used));
ci.ndm_confirmed = 0;
- ci.ndm_updated = jiffies_to_clock_t(now - fdb->updated);
+ ci.ndm_updated = jiffies_to_clock_t(now - READ_ONCE(fdb->updated));
ci.ndm_refcnt = 0;
if (nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
@@ -434,8 +434,8 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
struct vxlan_fdb *f;
f = __vxlan_find_mac(vxlan, mac, vni);
- if (f && f->used != jiffies)
- f->used = jiffies;
+ if (f && READ_ONCE(f->used) != jiffies)
+ WRITE_ONCE(f->used, jiffies);
return f;
}
@@ -1009,12 +1009,12 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
!(f->flags & NTF_VXLAN_ADDED_BY_USER)) {
if (f->state != state) {
f->state = state;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
notify = 1;
}
if (f->flags != fdb_flags) {
f->flags = fdb_flags;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
notify = 1;
}
}
@@ -1048,7 +1048,7 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
}
if (ndm_flags & NTF_USE)
- f->used = jiffies;
+ WRITE_ONCE(f->used, jiffies);
if (notify) {
if (rd == NULL)
@@ -1477,7 +1477,7 @@ static bool vxlan_snoop(struct net_device *dev,
src_mac, &rdst->remote_ip.sa, &src_ip->sa);
rdst->remote_ip = *src_ip;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
vxlan_fdb_notify(vxlan, f, rdst, RTM_NEWNEIGH, true, NULL);
} else {
u32 hash_index = fdb_head_index(vxlan, src_mac, vni);
@@ -2825,7 +2825,7 @@ static void vxlan_cleanup(struct timer_list *t)
if (f->flags & NTF_EXT_LEARNED)
continue;
- timeout = f->used + vxlan->cfg.age_interval * HZ;
+ timeout = READ_ONCE(f->used) + vxlan->cfg.age_interval * HZ;
if (time_before_eq(timeout, jiffies)) {
netdev_dbg(vxlan->dev,
"garbage collect %pM\n",
--
2.39.5
* [PATCH AUTOSEL 6.12 425/486] ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (55 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 424/486] vxlan: Annotate FDB data races Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 426/486] r8169: don't scan PHY addresses > 0 Sasha Levin
` (10 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Geert Uytterhoeven, kernel test robot, Simon Horman,
Jakub Kicinski, Sasha Levin, davem, dsahern, edumazet, pabeni,
netdev
From: Geert Uytterhoeven <geert@linux-m68k.org>
[ Upstream commit 50f37fc2a39c4a8cc4813629b4cf239b71c6097d ]
If CONFIG_NET_IPGRE is enabled, but CONFIG_IPV6 is disabled:
net/ipv4/ip_gre.c: In function ‘ipgre_err’:
net/ipv4/ip_gre.c:144:22: error: variable ‘data_len’ set but not used [-Werror=unused-but-set-variable]
144 | unsigned int data_len = 0;
| ^~~~~~~~
Fix this by moving all data_len processing inside the IPV6-only section
that uses its result.
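
The same restructuring can be shown in a small stand-alone sketch. `HAVE_IPV6`, `MSG_TIME_EXCEEDED` and `handle_err()` are hypothetical names chosen for illustration. The point is that the variable is declared and assigned only inside the conditionally compiled block that reads it.

```c
#include <stdbool.h>

/* Hypothetical feature switch standing in for IS_ENABLED(CONFIG_IPV6). */
#ifndef HAVE_IPV6
#define HAVE_IPV6 1
#endif

/* Hypothetical message type standing in for ICMP_TIME_EXCEEDED. */
enum { MSG_TIME_EXCEEDED = 11 };

/*
 * Returns the payload length handed to the IPv6-only path, or 0 when
 * that path is not taken. data_len lives inside the #if block, so a
 * build without HAVE_IPV6 has no variable that is set but never read.
 */
static unsigned int handle_err(int type, unsigned char len_words, bool is_ipv6)
{
#if HAVE_IPV6
	if (is_ipv6) {
		unsigned int data_len = 0;

		if (type == MSG_TIME_EXCEEDED)
			data_len = len_words * 4u;	/* field counts 32-bit words */

		return data_len;
	}
#else
	(void)type;
	(void)len_words;
	(void)is_ipv6;
#endif
	return 0;
}
```

Building this with `-DHAVE_IPV6=0 -Wall -Werror` should compile cleanly, whereas a function-scope `data_len` assigned outside the `#if` block would trigger the warning quoted above.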
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501121007.2GofXmh5-lkp@intel.com/
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/d09113cfe2bfaca02f3dddf832fb5f48dd20958b.1738704881.git.geert@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/ip_gre.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index f1f31ebfc7934..9667f27740258 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -141,7 +141,6 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
const struct iphdr *iph;
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
- unsigned int data_len = 0;
struct ip_tunnel *t;
if (tpi->proto == htons(ETH_P_TEB))
@@ -182,7 +181,6 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
case ICMP_TIME_EXCEEDED:
if (code != ICMP_EXC_TTL)
return 0;
- data_len = icmp_hdr(skb)->un.reserved[1] * 4; /* RFC 4884 4.1 */
break;
case ICMP_REDIRECT:
@@ -190,10 +188,16 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
}
#if IS_ENABLED(CONFIG_IPV6)
- if (tpi->proto == htons(ETH_P_IPV6) &&
- !ip6_err_gen_icmpv6_unreach(skb, iph->ihl * 4 + tpi->hdr_len,
- type, data_len))
- return 0;
+ if (tpi->proto == htons(ETH_P_IPV6)) {
+ unsigned int data_len = 0;
+
+ if (type == ICMP_TIME_EXCEEDED)
+ data_len = icmp_hdr(skb)->un.reserved[1] * 4; /* RFC 4884 4.1 */
+
+ if (!ip6_err_gen_icmpv6_unreach(skb, iph->ihl * 4 + tpi->hdr_len,
+ type, data_len))
+ return 0;
+ }
#endif
if (t->parms.iph.daddr == 0 ||
--
2.39.5
* [PATCH AUTOSEL 6.12 426/486] r8169: don't scan PHY addresses > 0
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (56 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 425/486] ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 427/486] net: flush_backlog() small changes Sasha Levin
` (9 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Heiner Kallweit, Andrew Lunn, Jakub Kicinski, Sasha Levin,
nic_swsd, andrew+netdev, davem, edumazet, pabeni, netdev
From: Heiner Kallweit <hkallweit1@gmail.com>
[ Upstream commit faac69a4ae5abb49e62c79c66b51bb905c9aa5ec ]
The PHY address is a dummy, because r8169 PHY access registers
don't support a PHY address. Therefore scan address 0 only.
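
To make the effect of the mask concrete, here is a minimal user-space sketch. `GENMASK32()` is a local stand-in for the kernel's GENMASK(), and the two helpers are hypothetical. The sketch assumes, as with the MDIO bus `phy_mask` field, that a set bit means the address is skipped during probing.

```c
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK(h, l) on 32-bit values. */
#define GENMASK32(h, l) \
	((uint32_t)(((~UINT32_C(0)) << (l)) & ((~UINT32_C(0)) >> (31 - (h)))))

/* A set bit in the mask means "do not probe this PHY address". */
static bool phy_addr_is_scanned(uint32_t phy_mask, unsigned int addr)
{
	return addr < 32 && !(phy_mask & (UINT32_C(1) << addr));
}

/* Count how many of the 32 possible addresses would be probed. */
static unsigned int phy_scan_count(uint32_t phy_mask)
{
	unsigned int addr, n = 0;

	for (addr = 0; addr < 32; addr++)
		n += phy_addr_is_scanned(phy_mask, addr);
	return n;
}
```

With a mask of `GENMASK32(31, 1)`, bits 1 through 31 are set, so only address 0 remains eligible for scanning.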
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/830637dd-4016-4a68-92b3-618fcac6589d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/realtek/r8169_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 3420b6cf8189f..85bb5121cd245 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5258,6 +5258,7 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
new_bus->priv = tp;
new_bus->parent = &pdev->dev;
new_bus->irq[0] = PHY_MAC_INTERRUPT;
+ new_bus->phy_mask = GENMASK(31, 1);
snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x-%x",
pci_domain_nr(pdev->bus), pci_dev_id(pdev));
--
2.39.5
* [PATCH AUTOSEL 6.12 427/486] net: flush_backlog() small changes
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (57 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 426/486] r8169: don't scan PHY addresses > 0 Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 428/486] bridge: mdb: Allow replace of a host-joined group Sasha Levin
` (8 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Eric Dumazet, Jason Xing, Jakub Kicinski, Sasha Levin, davem,
pabeni, kuniyu, sdf, ahmed.zaki, aleksander.lobakin, netdev
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit cbe08724c18078564abefbf6591078a7c98e5e0f ]
Add READ_ONCE() around reads of skb->dev->reg_state, because
this field can be changed from other threads/cpus.
Instead of calling dev_kfree_skb_irq() and kfree_skb()
while interrupts are masked and locks are held, collect the skbs
on a temporary list and free them afterwards with
__skb_queue_purge_reason().
Use the SKB_DROP_REASON_DEV_READY drop reason to better
describe why these skbs are dropped.
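
The deferred-free idea can be sketched outside the kernel as below. Everything here is hypothetical (`struct pkt`, `struct pkt_queue`, the `locked` flag that merely marks where a real lock would be held). It is a single-threaded model of the pattern, not the networking code itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet node; 'dying' stands in for the device state check. */
struct pkt {
	struct pkt *next;
	bool dying;
};

/* Hypothetical queue; 'locked' only marks the critical section. */
struct pkt_queue {
	struct pkt *head;
	bool locked;
};

/* Push a new packet at the head; returns false on allocation failure. */
static bool queue_push(struct pkt_queue *q, bool dying)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return false;
	p->dying = dying;
	p->next = q->head;
	q->head = p;
	return true;
}

/*
 * Move every dying packet to a private list while the queue is
 * "locked", then free them only after the lock is dropped.
 * Returns the number of packets freed.
 */
static size_t flush_dying(struct pkt_queue *q)
{
	struct pkt *drop = NULL, **pp, *p;
	size_t freed = 0;

	q->locked = true;		/* enter critical section */
	pp = &q->head;
	while ((p = *pp) != NULL) {
		if (p->dying) {
			*pp = p->next;	/* unlink: cheap under the lock */
			p->next = drop;
			drop = p;
		} else {
			pp = &p->next;
		}
	}
	q->locked = false;		/* leave critical section */

	while (drop) {			/* freeing runs unlocked */
		p = drop;
		drop = p->next;
		free(p);
		freed++;
	}
	return freed;
}
```

Only pointer updates happen inside the critical section, which keeps the time spent with the lock held short and independent of how costly freeing is.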
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250204144825.316785-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/dev.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 7b7b36c43c82c..2ba2160dd093a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6034,16 +6034,18 @@ static DEFINE_PER_CPU(struct work_struct, flush_works);
static void flush_backlog(struct work_struct *work)
{
struct sk_buff *skb, *tmp;
+ struct sk_buff_head list;
struct softnet_data *sd;
+ __skb_queue_head_init(&list);
local_bh_disable();
sd = this_cpu_ptr(&softnet_data);
backlog_lock_irq_disable(sd);
skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
- if (skb->dev->reg_state == NETREG_UNREGISTERING) {
+ if (READ_ONCE(skb->dev->reg_state) == NETREG_UNREGISTERING) {
__skb_unlink(skb, &sd->input_pkt_queue);
- dev_kfree_skb_irq(skb);
+ __skb_queue_tail(&list, skb);
rps_input_queue_head_incr(sd);
}
}
@@ -6051,14 +6053,16 @@ static void flush_backlog(struct work_struct *work)
local_lock_nested_bh(&softnet_data.process_queue_bh_lock);
skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
- if (skb->dev->reg_state == NETREG_UNREGISTERING) {
+ if (READ_ONCE(skb->dev->reg_state) == NETREG_UNREGISTERING) {
__skb_unlink(skb, &sd->process_queue);
- kfree_skb(skb);
+ __skb_queue_tail(&list, skb);
rps_input_queue_head_incr(sd);
}
}
local_unlock_nested_bh(&softnet_data.process_queue_bh_lock);
local_bh_enable();
+
+ __skb_queue_purge_reason(&list, SKB_DROP_REASON_DEV_READY);
}
static bool flush_required(int cpu)
--
2.39.5
* [PATCH AUTOSEL 6.12 428/486] bridge: mdb: Allow replace of a host-joined group
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (58 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 427/486] net: flush_backlog() small changes Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 429/486] net-sysfs: remove rtnl_trylock from queue attributes Sasha Levin
` (7 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Petr Machata, Ido Schimmel, Nikolay Aleksandrov, Jakub Kicinski,
Sasha Levin, davem, edumazet, pabeni, shuah, bridge, netdev,
linux-kselftest
From: Petr Machata <petrm@nvidia.com>
[ Upstream commit d9e9f6d7b7d0c520bb87f19d2cbc57aeeb2091d5 ]
Attempts to replace an MDB group membership of the host itself are
currently bounced:
# ip link add name br up type bridge vlan_filtering 1
# bridge mdb replace dev br port br grp 239.0.0.1 vid 2
# bridge mdb replace dev br port br grp 239.0.0.1 vid 2
Error: bridge: Group is already joined by host.
A similar operation done on a member port would succeed. Ignore the check
for replacement of host group memberships as well.
The code that this enables is br_multicast_host_join(), which for
already-joined groups only refreshes the MC group expiration timer
(desirable), plus a userspace notification (also desirable).
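
A minimal model of the relaxed check might look like the following. `REQ_F_REPLACE`, `struct mdb_group` and `host_join()` are hypothetical stand-ins for NLM_F_REPLACE, the bridge MDB entry and the host-join path. The `refreshes` counter only models the timer refresh and notification side effects.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for NLM_F_REPLACE. */
#define REQ_F_REPLACE 0x100u

/* Hypothetical group entry; 'refreshes' counts successful joins. */
struct mdb_group {
	bool host_joined;
	unsigned int refreshes;
};

/*
 * Join the host to a group. A second plain add is rejected with
 * -EEXIST, but a replace request is allowed through and refreshes
 * the existing membership.
 */
static int host_join(struct mdb_group *grp, unsigned int nlflags)
{
	if (grp->host_joined && !(nlflags & REQ_F_REPLACE))
		return -EEXIST;

	grp->host_joined = true;
	grp->refreshes++;	/* models timer refresh and notification */
	return 0;
}
```

A plain add of an already-joined group still fails, so only callers that explicitly ask for replace semantics see the new behavior.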
Change a selftest that exercises this code path from expecting a rejection
to expecting a pass. The rest of MDB selftests pass without modification.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/e5c5188b9787ae806609e7ca3aa2a0a501b9b5c4.1738685648.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/bridge/br_mdb.c | 2 +-
tools/testing/selftests/net/forwarding/bridge_mdb.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 1a52a0bca086d..7e1ad229e1330 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -1040,7 +1040,7 @@ static int br_mdb_add_group(const struct br_mdb_config *cfg,
/* host join */
if (!port) {
- if (mp->host_joined) {
+ if (mp->host_joined && !(cfg->nlflags & NLM_F_REPLACE)) {
NL_SET_ERR_MSG_MOD(extack, "Group is already joined by host");
return -EEXIST;
}
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index d9d587454d207..8c1597ebc2d38 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -149,7 +149,7 @@ cfg_test_host_common()
check_err $? "Failed to add $name host entry"
bridge mdb replace dev br0 port br0 grp $grp $state vid 10 &> /dev/null
- check_fail $? "Managed to replace $name host entry"
+ check_err $? "Failed to replace $name host entry"
bridge mdb del dev br0 port br0 grp $grp $state vid 10
bridge mdb get dev br0 grp $grp vid 10 &> /dev/null
--
2.39.5
* [PATCH AUTOSEL 6.12 429/486] net-sysfs: remove rtnl_trylock from queue attributes
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (59 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 428/486] bridge: mdb: Allow replace of a host-joined group Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 430/486] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
` (6 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Antoine Tenart, Jakub Kicinski, Sasha Levin, davem, edumazet,
pabeni, sdf, jdamato, aleksander.lobakin, netdev
From: Antoine Tenart <atenart@kernel.org>
[ Upstream commit b0b6fcfa6ad8433e22b050c72cfbeec2548744b9 ]
Similar to the commit removing rtnl_trylock from device attributes,
apply the same technique to networking queue attributes.
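
Most of the diff below is a mechanical signature change: the per-attribute callbacks now also receive the kobject and attribute that the generic dispatcher was called with. The user-space sketch that follows models that dispatch. The struct definitions are pared-down hypothetical versions, and the extra `len` parameter is an addition for safe formatting outside the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Pared-down, hypothetical versions of the kernel structures. */
struct attribute { const char *name; };
struct kobject { const char *name; };

struct queue {
	struct kobject kobj;
	unsigned long tx_maxrate;
};

/* Callbacks get kobj and attr in addition to the queue itself. */
struct queue_attribute {
	struct attribute attr;
	long (*show)(struct kobject *kobj, struct attribute *attr,
		     struct queue *queue, char *buf, size_t len);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Example callback; it ignores the new arguments, a locking one would not. */
static long tx_maxrate_show(struct kobject *kobj, struct attribute *attr,
			    struct queue *queue, char *buf, size_t len)
{
	(void)kobj;
	(void)attr;
	return snprintf(buf, len, "%lu\n", queue->tx_maxrate);
}

/* Generic dispatcher: recover the wrappers and forward all context. */
static long queue_attr_show(struct kobject *kobj, struct attribute *attr,
			    char *buf, size_t len)
{
	struct queue_attribute *qa =
		container_of(attr, struct queue_attribute, attr);
	struct queue *queue = container_of(kobj, struct queue, kobj);

	if (!qa->show)
		return -EIO;
	return qa->show(kobj, attr, queue, buf, len);
}
```

Callbacks that need to take a lock can then hand `kobj` and `attr` to a locking helper, which is what the converted functions in the diff do with sysfs_rtnl_lock().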
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-5-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/net-sysfs.c | 147 ++++++++++++++++++++++++++-----------------
1 file changed, 89 insertions(+), 58 deletions(-)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 05cf5347f25e8..b398a2e0c243d 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1213,9 +1213,11 @@ static int net_rx_queue_change_owner(struct net_device *dev, int num,
*/
struct netdev_queue_attribute {
struct attribute attr;
- ssize_t (*show)(struct netdev_queue *queue, char *buf);
- ssize_t (*store)(struct netdev_queue *queue,
- const char *buf, size_t len);
+ ssize_t (*show)(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf);
+ ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, const char *buf,
+ size_t len);
};
#define to_netdev_queue_attr(_attr) \
container_of(_attr, struct netdev_queue_attribute, attr)
@@ -1232,7 +1234,7 @@ static ssize_t netdev_queue_attr_show(struct kobject *kobj,
if (!attribute->show)
return -EIO;
- return attribute->show(queue, buf);
+ return attribute->show(kobj, attr, queue, buf);
}
static ssize_t netdev_queue_attr_store(struct kobject *kobj,
@@ -1246,7 +1248,7 @@ static ssize_t netdev_queue_attr_store(struct kobject *kobj,
if (!attribute->store)
return -EIO;
- return attribute->store(queue, buf, count);
+ return attribute->store(kobj, attr, queue, buf, count);
}
static const struct sysfs_ops netdev_queue_sysfs_ops = {
@@ -1254,7 +1256,8 @@ static const struct sysfs_ops netdev_queue_sysfs_ops = {
.store = netdev_queue_attr_store,
};
-static ssize_t tx_timeout_show(struct netdev_queue *queue, char *buf)
+static ssize_t tx_timeout_show(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
unsigned long trans_timeout = atomic_long_read(&queue->trans_timeout);
@@ -1272,18 +1275,18 @@ static unsigned int get_netdev_queue_index(struct netdev_queue *queue)
return i;
}
-static ssize_t traffic_class_show(struct netdev_queue *queue,
- char *buf)
+static ssize_t traffic_class_show(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
struct net_device *dev = queue->dev;
- int num_tc, tc;
- int index;
+ int num_tc, tc, index, ret;
if (!netif_is_multiqueue(dev))
return -ENOENT;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(kobj, attr, queue->dev);
+ if (ret)
+ return ret;
index = get_netdev_queue_index(queue);
@@ -1310,24 +1313,25 @@ static ssize_t traffic_class_show(struct netdev_queue *queue,
}
#ifdef CONFIG_XPS
-static ssize_t tx_maxrate_show(struct netdev_queue *queue,
- char *buf)
+static ssize_t tx_maxrate_show(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
return sysfs_emit(buf, "%lu\n", queue->tx_maxrate);
}
-static ssize_t tx_maxrate_store(struct netdev_queue *queue,
- const char *buf, size_t len)
+static ssize_t tx_maxrate_store(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, const char *buf,
+ size_t len)
{
- struct net_device *dev = queue->dev;
int err, index = get_netdev_queue_index(queue);
+ struct net_device *dev = queue->dev;
u32 rate = 0;
if (!capable(CAP_NET_ADMIN))
return -EPERM;
/* The check is also done later; this helps returning early without
- * hitting the trylock/restart below.
+ * hitting the locking section below.
*/
if (!dev->netdev_ops->ndo_set_tx_maxrate)
return -EOPNOTSUPP;
@@ -1336,18 +1340,21 @@ static ssize_t tx_maxrate_store(struct netdev_queue *queue,
if (err < 0)
return err;
- if (!rtnl_trylock())
- return restart_syscall();
+ err = sysfs_rtnl_lock(kobj, attr, dev);
+ if (err)
+ return err;
err = -EOPNOTSUPP;
if (dev->netdev_ops->ndo_set_tx_maxrate)
err = dev->netdev_ops->ndo_set_tx_maxrate(dev, index, rate);
- rtnl_unlock();
if (!err) {
queue->tx_maxrate = rate;
+ rtnl_unlock();
return len;
}
+
+ rtnl_unlock();
return err;
}
@@ -1391,16 +1398,17 @@ static ssize_t bql_set(const char *buf, const size_t count,
return count;
}
-static ssize_t bql_show_hold_time(struct netdev_queue *queue,
- char *buf)
+static ssize_t bql_show_hold_time(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
struct dql *dql = &queue->dql;
return sysfs_emit(buf, "%u\n", jiffies_to_msecs(dql->slack_hold_time));
}
-static ssize_t bql_set_hold_time(struct netdev_queue *queue,
- const char *buf, size_t len)
+static ssize_t bql_set_hold_time(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, const char *buf,
+ size_t len)
{
struct dql *dql = &queue->dql;
unsigned int value;
@@ -1419,15 +1427,17 @@ static struct netdev_queue_attribute bql_hold_time_attribute __ro_after_init
= __ATTR(hold_time, 0644,
bql_show_hold_time, bql_set_hold_time);
-static ssize_t bql_show_stall_thrs(struct netdev_queue *queue, char *buf)
+static ssize_t bql_show_stall_thrs(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
struct dql *dql = &queue->dql;
return sysfs_emit(buf, "%u\n", jiffies_to_msecs(dql->stall_thrs));
}
-static ssize_t bql_set_stall_thrs(struct netdev_queue *queue,
- const char *buf, size_t len)
+static ssize_t bql_set_stall_thrs(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, const char *buf,
+ size_t len)
{
struct dql *dql = &queue->dql;
unsigned int value;
@@ -1453,13 +1463,15 @@ static ssize_t bql_set_stall_thrs(struct netdev_queue *queue,
static struct netdev_queue_attribute bql_stall_thrs_attribute __ro_after_init =
__ATTR(stall_thrs, 0644, bql_show_stall_thrs, bql_set_stall_thrs);
-static ssize_t bql_show_stall_max(struct netdev_queue *queue, char *buf)
+static ssize_t bql_show_stall_max(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
return sysfs_emit(buf, "%u\n", READ_ONCE(queue->dql.stall_max));
}
-static ssize_t bql_set_stall_max(struct netdev_queue *queue,
- const char *buf, size_t len)
+static ssize_t bql_set_stall_max(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, const char *buf,
+ size_t len)
{
WRITE_ONCE(queue->dql.stall_max, 0);
return len;
@@ -1468,7 +1480,8 @@ static ssize_t bql_set_stall_max(struct netdev_queue *queue,
static struct netdev_queue_attribute bql_stall_max_attribute __ro_after_init =
__ATTR(stall_max, 0644, bql_show_stall_max, bql_set_stall_max);
-static ssize_t bql_show_stall_cnt(struct netdev_queue *queue, char *buf)
+static ssize_t bql_show_stall_cnt(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
struct dql *dql = &queue->dql;
@@ -1478,8 +1491,8 @@ static ssize_t bql_show_stall_cnt(struct netdev_queue *queue, char *buf)
static struct netdev_queue_attribute bql_stall_cnt_attribute __ro_after_init =
__ATTR(stall_cnt, 0444, bql_show_stall_cnt, NULL);
-static ssize_t bql_show_inflight(struct netdev_queue *queue,
- char *buf)
+static ssize_t bql_show_inflight(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
struct dql *dql = &queue->dql;
@@ -1490,13 +1503,16 @@ static struct netdev_queue_attribute bql_inflight_attribute __ro_after_init =
__ATTR(inflight, 0444, bql_show_inflight, NULL);
#define BQL_ATTR(NAME, FIELD) \
-static ssize_t bql_show_ ## NAME(struct netdev_queue *queue, \
- char *buf) \
+static ssize_t bql_show_ ## NAME(struct kobject *kobj, \
+ struct attribute *attr, \
+ struct netdev_queue *queue, char *buf) \
{ \
return bql_show(buf, queue->dql.FIELD); \
} \
\
-static ssize_t bql_set_ ## NAME(struct netdev_queue *queue, \
+static ssize_t bql_set_ ## NAME(struct kobject *kobj, \
+ struct attribute *attr, \
+ struct netdev_queue *queue, \
const char *buf, size_t len) \
{ \
return bql_set(buf, len, &queue->dql.FIELD); \
@@ -1582,19 +1598,21 @@ static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
return len < PAGE_SIZE ? len : -EINVAL;
}
-static ssize_t xps_cpus_show(struct netdev_queue *queue, char *buf)
+static ssize_t xps_cpus_show(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
struct net_device *dev = queue->dev;
unsigned int index;
- int len, tc;
+ int len, tc, ret;
if (!netif_is_multiqueue(dev))
return -ENOENT;
index = get_netdev_queue_index(queue);
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(kobj, attr, queue->dev);
+ if (ret)
+ return ret;
/* If queue belongs to subordinate dev use its map */
dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
@@ -1605,18 +1623,21 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue, char *buf)
return -EINVAL;
}
- /* Make sure the subordinate device can't be freed */
- get_device(&dev->dev);
+ /* Increase the net device refcnt to make sure it won't be freed while
+ * xps_queue_show is running.
+ */
+ dev_hold(dev);
rtnl_unlock();
len = xps_queue_show(dev, index, tc, buf, XPS_CPUS);
- put_device(&dev->dev);
+ dev_put(dev);
return len;
}
-static ssize_t xps_cpus_store(struct netdev_queue *queue,
- const char *buf, size_t len)
+static ssize_t xps_cpus_store(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, const char *buf,
+ size_t len)
{
struct net_device *dev = queue->dev;
unsigned int index;
@@ -1640,9 +1661,10 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
return err;
}
- if (!rtnl_trylock()) {
+ err = sysfs_rtnl_lock(kobj, attr, dev);
+ if (err) {
free_cpumask_var(mask);
- return restart_syscall();
+ return err;
}
err = netif_set_xps_queue(dev, mask, index);
@@ -1656,26 +1678,34 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
= __ATTR_RW(xps_cpus);
-static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
+static ssize_t xps_rxqs_show(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, char *buf)
{
struct net_device *dev = queue->dev;
unsigned int index;
- int tc;
+ int tc, ret;
index = get_netdev_queue_index(queue);
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(kobj, attr, dev);
+ if (ret)
+ return ret;
tc = netdev_txq_to_tc(dev, index);
+
+ /* Increase the net device refcnt to make sure it won't be freed while
+ * xps_queue_show is running.
+ */
+ dev_hold(dev);
rtnl_unlock();
- if (tc < 0)
- return -EINVAL;
- return xps_queue_show(dev, index, tc, buf, XPS_RXQS);
+ ret = tc >= 0 ? xps_queue_show(dev, index, tc, buf, XPS_RXQS) : -EINVAL;
+ dev_put(dev);
+ return ret;
}
-static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
+static ssize_t xps_rxqs_store(struct kobject *kobj, struct attribute *attr,
+ struct netdev_queue *queue, const char *buf,
size_t len)
{
struct net_device *dev = queue->dev;
@@ -1699,9 +1729,10 @@ static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
return err;
}
- if (!rtnl_trylock()) {
+ err = sysfs_rtnl_lock(kobj, attr, dev);
+ if (err) {
bitmap_free(mask);
- return restart_syscall();
+ return err;
}
cpus_read_lock();
--
2.39.5
* [PATCH AUTOSEL 6.12 430/486] net-sysfs: prevent uncleared queues from being re-added
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (60 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 429/486] net-sysfs: remove rtnl_trylock from queue attributes Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 431/486] net-sysfs: remove rtnl_trylock from device attributes Sasha Levin
` (5 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Antoine Tenart, Jakub Kicinski, Sasha Levin, davem, edumazet,
pabeni, sdf, jdamato, aleksander.lobakin, netdev
From: Antoine Tenart <atenart@kernel.org>
[ Upstream commit 7e54f85c60828842be27e0149f3533357225090e ]
With the (upcoming) removal of the rtnl_trylock/restart_syscall logic
and because of how Tx/Rx queues are implemented (and their
requirements), it might happen that a queue is re-added before having
had the chance to be cleared. In such a rare case, do not complete the
queue addition operation.
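
The scenario can be modelled with a tiny reference-counted object. This is a sketch under assumptions: `struct queue_obj` and its helpers are hypothetical and only mimic the two kobject properties involved, a reference count and an initialized flag that the release step clears.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical object with the two properties the check relies on. */
struct queue_obj {
	bool state_initialized;
	int refcount;
};

/* Register the queue; refuse if the previous instance was never released. */
static int queue_add(struct queue_obj *q)
{
	if (q->state_initialized)
		return -EAGAIN;
	q->state_initialized = true;
	q->refcount = 1;
	return 0;
}

/* Take an extra reference, e.g. on behalf of a pending sysfs read. */
static void queue_get(struct queue_obj *q)
{
	q->refcount++;
}

/* Drop a reference; the last put clears the object so it can be re-added. */
static void queue_put(struct queue_obj *q)
{
	if (q->refcount > 0 && --q->refcount == 0)
		q->state_initialized = false;
}
```

If a reader still holds a reference when the queue is removed, the object stays initialized and a re-add returns -EAGAIN. Once the reader drops its reference, the object is cleared and the add succeeds.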
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-4-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/net-sysfs.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b398a2e0c243d..d6d0c3082b82b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1100,6 +1100,22 @@ static int rx_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
+ /* Rx queues are cleared in rx_queue_release to allow later
+ * re-registration. This is triggered when their kobj refcount is
+ * dropped.
+ *
+ * If a queue is removed while both a read (or write) operation and a
+ * the re-addition of the same queue are pending (waiting on rntl_lock)
+ * it might happen that the re-addition will execute before the read,
+ * making the initial removal to never happen (queue's kobj refcount
+ * won't drop enough because of the pending read). In such rare case,
+ * return to allow the removal operation to complete.
+ */
+ if (unlikely(kobj->state_initialized)) {
+ netdev_warn_once(dev, "Cannot re-add rx queues before their removal completed");
+ return -EAGAIN;
+ }
+
/* Kobject_put later will trigger rx_queue_release call which
* decreases dev refcount: Take that reference here
*/
@@ -1811,6 +1827,22 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
+ /* Tx queues are cleared in netdev_queue_release to allow later
+ * re-registration. This is triggered when their kobj refcount is
+ * dropped.
+ *
+ * If a queue is removed while both a read (or write) operation and a
+ * the re-addition of the same queue are pending (waiting on rntl_lock)
+ * it might happen that the re-addition will execute before the read,
+ * making the initial removal to never happen (queue's kobj refcount
+ * won't drop enough because of the pending read). In such rare case,
+ * return to allow the removal operation to complete.
+ */
+ if (unlikely(kobj->state_initialized)) {
+ netdev_warn_once(dev, "Cannot re-add tx queues before their removal completed");
+ return -EAGAIN;
+ }
+
/* Kobject_put later will trigger netdev_queue_release call
* which decreases dev refcount: Take that reference here
*/
--
2.39.5
* [PATCH AUTOSEL 6.12 431/486] net-sysfs: remove rtnl_trylock from device attributes
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (61 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 430/486] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 432/486] ice: init flow director before RDMA Sasha Levin
` (4 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Antoine Tenart, Jakub Kicinski, Sasha Levin, davem, edumazet,
pabeni, sdf, jdamato, kuniyu, shaw.leon, netdev
From: Antoine Tenart <atenart@kernel.org>
[ Upstream commit 79c61899b5eee317907efd1b0d06a1ada0cc00d8 ]
There is an ABBA deadlock between net device unregistration and sysfs
files being accessed[1][2]. To prevent this from happening all paths
taking the rtnl lock after the sysfs one (actually kn->active refcount)
use rtnl_trylock and return early (using restart_syscall)[3], which can
make syscalls spin for a long time when there is contention on the
rtnl lock[4].
There are not many possibilities to improve the above:
- Rework the entire net/ locking logic.
- Invert two locks in one of the paths — not possible.
But here it's actually possible to drop one of the locks safely: the
kernfs_node refcount. More details in the code itself, which comes with
lots of comments.
Note that we check the device is alive in the added sysfs_rtnl_lock
helper to prevent sysfs operations from running after device dismantle
has started. This also helps keep the same behavior as before. Because
of this, calls to dev_isalive in sysfs ops were removed.
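For readers following along, the ordering that the added sysfs_rtnl_lock
helper enforces can be modelled in plain userspace C. This is only a
sketch under stated assumptions: struct model_dev,
model_sysfs_rtnl_lock() and MODEL_ERESTARTSYS are hypothetical names
that are not part of the patch, the rtnl mutex is reduced to a boolean,
and there is no real concurrency. It only illustrates that the temporary
references are balanced on every exit path and that the lock is held
only on success.

```c
#include <errno.h>
#include <stdbool.h>

/* ERESTARTSYS is kernel-internal and not exposed by userspace errno.h;
 * the value is mirrored here purely for illustration.
 */
#define MODEL_ERESTARTSYS 512

/* Hypothetical userspace model; this is not a kernel structure. */
struct model_dev {
	int dev_refs;      /* models dev_hold()/dev_put() */
	int active_refs;   /* models kn->active held by the sysfs accessor */
	bool rtnl_held;    /* models the rtnl mutex (single-threaded model) */
	bool alive;        /* models the result of dev_isalive() */
	bool interrupted;  /* models a signal arriving while waiting */
};

static int model_sysfs_rtnl_lock(struct model_dev *d)
{
	int ret = 0;

	d->dev_refs++;     /* keep the device around while we wait */
	d->active_refs--;  /* break active protection so a drain can finish */

	if (d->interrupted) {
		/* the interruptible lock attempt failed */
		ret = -MODEL_ERESTARTSYS;
		goto unbreak;
	}
	d->rtnl_held = true;

	if (!d->alive) {
		/* dismantle already started: drop the lock and deny */
		d->rtnl_held = false;
		ret = -ENODEV;
		goto unbreak;
	}

unbreak:
	/* Both temporary references are dropped on every path. */
	d->active_refs++;
	d->dev_refs--;
	return ret;
}
```

On success the caller still has to release the lock itself, which
mirrors the rtnl_unlock() calls kept in the sysfs handlers in the diff
below.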
[1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
[2] https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
[3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
[4] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-2-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/rtnetlink.h | 1 +
net/core/net-sysfs.c | 186 +++++++++++++++++++++++++++-----------
net/core/rtnetlink.c | 5 +
3 files changed, 139 insertions(+), 53 deletions(-)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index a7da7dfc06a2a..84f7818d50024 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -44,6 +44,7 @@ extern void rtnl_lock(void);
extern void rtnl_unlock(void);
extern int rtnl_trylock(void);
extern int rtnl_is_locked(void);
+extern int rtnl_lock_interruptible(void);
extern int rtnl_lock_killable(void);
extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index d6d0c3082b82b..749c4745086b7 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -42,6 +42,87 @@ static inline int dev_isalive(const struct net_device *dev)
return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
}
+/* There is a possible ABBA deadlock between rtnl_lock and kernfs_node->active,
+ * when unregistering a net device and accessing associated sysfs files. The
+ * potential deadlock is as follow:
+ *
+ * CPU 0 CPU 1
+ *
+ * rtnl_lock vfs_read
+ * unregister_netdevice_many kernfs_seq_start
+ * device_del / kobject_put kernfs_get_active (kn->active++)
+ * kernfs_drain sysfs_kf_seq_show
+ * wait_event( rtnl_lock
+ * kn->active == KN_DEACTIVATED_BIAS) -> waits on CPU 0 to release
+ * -> waits on CPU 1 to decrease kn->active the rtnl lock.
+ *
+ * The historical fix was to use rtnl_trylock with restart_syscall to bail out
+ * of sysfs operations when the lock couldn't be taken. This fixed the above
+ * issue as it allowed CPU 1 to bail out of the ABBA situation.
+ *
+ * But it came with performances issues, as syscalls are being restarted in
+ * loops when there was contention on the rtnl lock, with huge slow downs in
+ * specific scenarios (e.g. lots of virtual interfaces created and userspace
+ * daemons querying their attributes).
+ *
+ * The idea below is to bail out of the active kernfs_node protection
+ * (kn->active) while trying to take the rtnl lock.
+ *
+ * This replaces rtnl_lock() and still has to be used with rtnl_unlock(). The
+ * net device is guaranteed to be alive if this returns successfully.
+ */
+static int sysfs_rtnl_lock(struct kobject *kobj, struct attribute *attr,
+ struct net_device *ndev)
+{
+ struct kernfs_node *kn;
+ int ret = 0;
+
+ /* First, we hold a reference to the net device as the unregistration
+ * path might run in parallel. This will ensure the net device and the
+ * associated sysfs objects won't be freed while we try to take the rtnl
+ * lock.
+ */
+ dev_hold(ndev);
+ /* sysfs_break_active_protection was introduced to allow self-removal of
+ * devices and their associated sysfs files by bailing out of the
+ * sysfs/kernfs protection. We do this here to allow the unregistration
+ * path to complete in parallel. The following takes a reference on the
+ * kobject and the kernfs_node being accessed.
+ *
+ * This works because we hold a reference onto the net device and the
+ * unregistration path will wait for us eventually in netdev_run_todo
+ * (outside an rtnl lock section).
+ */
+ kn = sysfs_break_active_protection(kobj, attr);
+ /* We can now try to take the rtnl lock. This can't deadlock us as the
+ * unregistration path is able to drain sysfs files (kernfs_node) thanks
+ * to the above dance.
+ */
+ if (rtnl_lock_interruptible()) {
+ ret = -ERESTARTSYS;
+ goto unbreak;
+ }
+ /* Check dismantle on the device hasn't started, otherwise deny the
+ * operation.
+ */
+ if (!dev_isalive(ndev)) {
+ rtnl_unlock();
+ ret = -ENODEV;
+ goto unbreak;
+ }
+ /* We are now sure the device dismantle hasn't started nor that it can
+ * start before we exit the locking section as we hold the rtnl lock.
+ * There's no need to keep unbreaking the sysfs protection nor to hold
+ * a net device reference from that point; that was only needed to take
+ * the rtnl lock.
+ */
+unbreak:
+ sysfs_unbreak_active_protection(kn);
+ dev_put(ndev);
+
+ return ret;
+}
+
/* use same locking rules as GIF* ioctl's */
static ssize_t netdev_show(const struct device *dev,
struct device_attribute *attr, char *buf,
@@ -95,14 +176,14 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
if (ret)
goto err;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
+ if (ret)
+ goto err;
+
+ ret = (*set)(netdev, new);
+ if (ret == 0)
+ ret = len;
- if (dev_isalive(netdev)) {
- ret = (*set)(netdev, new);
- if (ret == 0)
- ret = len;
- }
rtnl_unlock();
err:
return ret;
@@ -190,7 +271,7 @@ static ssize_t carrier_store(struct device *dev, struct device_attribute *attr,
struct net_device *netdev = to_net_dev(dev);
/* The check is also done in change_carrier; this helps returning early
- * without hitting the trylock/restart in netdev_store.
+ * without hitting the locking section in netdev_store.
*/
if (!netdev->netdev_ops->ndo_change_carrier)
return -EOPNOTSUPP;
@@ -204,8 +285,9 @@ static ssize_t carrier_show(struct device *dev,
struct net_device *netdev = to_net_dev(dev);
int ret = -EINVAL;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
+ if (ret)
+ return ret;
if (netif_running(netdev)) {
/* Synchronize carrier state with link watch,
@@ -215,8 +297,8 @@ static ssize_t carrier_show(struct device *dev,
ret = sysfs_emit(buf, fmt_dec, !!netif_carrier_ok(netdev));
}
- rtnl_unlock();
+ rtnl_unlock();
return ret;
}
static DEVICE_ATTR_RW(carrier);
@@ -228,13 +310,14 @@ static ssize_t speed_show(struct device *dev,
int ret = -EINVAL;
/* The check is also done in __ethtool_get_link_ksettings; this helps
- * returning early without hitting the trylock/restart below.
+ * returning early without hitting the locking section below.
*/
if (!netdev->ethtool_ops->get_link_ksettings)
return ret;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
+ if (ret)
+ return ret;
if (netif_running(netdev)) {
struct ethtool_link_ksettings cmd;
@@ -254,13 +337,14 @@ static ssize_t duplex_show(struct device *dev,
int ret = -EINVAL;
/* The check is also done in __ethtool_get_link_ksettings; this helps
- * returning early without hitting the trylock/restart below.
+ * returning early without hitting the locking section below.
*/
if (!netdev->ethtool_ops->get_link_ksettings)
return ret;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
+ if (ret)
+ return ret;
if (netif_running(netdev)) {
struct ethtool_link_ksettings cmd;
@@ -459,16 +543,15 @@ static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
if (len > 0 && buf[len - 1] == '\n')
--count;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
+ if (ret)
+ return ret;
- if (dev_isalive(netdev)) {
- ret = dev_set_alias(netdev, buf, count);
- if (ret < 0)
- goto err;
- ret = len;
- netdev_state_change(netdev);
- }
+ ret = dev_set_alias(netdev, buf, count);
+ if (ret < 0)
+ goto err;
+ ret = len;
+ netdev_state_change(netdev);
err:
rtnl_unlock();
@@ -520,24 +603,23 @@ static ssize_t phys_port_id_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct net_device *netdev = to_net_dev(dev);
+ struct netdev_phys_item_id ppid;
ssize_t ret = -EINVAL;
/* The check is also done in dev_get_phys_port_id; this helps returning
- * early without hitting the trylock/restart below.
+ * early without hitting the locking section below.
*/
if (!netdev->netdev_ops->ndo_get_phys_port_id)
return -EOPNOTSUPP;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
+ if (ret)
+ return ret;
- if (dev_isalive(netdev)) {
- struct netdev_phys_item_id ppid;
+ ret = dev_get_phys_port_id(netdev, &ppid);
+ if (!ret)
+ ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id);
- ret = dev_get_phys_port_id(netdev, &ppid);
- if (!ret)
- ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id);
- }
rtnl_unlock();
return ret;
@@ -549,24 +631,23 @@ static ssize_t phys_port_name_show(struct device *dev,
{
struct net_device *netdev = to_net_dev(dev);
ssize_t ret = -EINVAL;
+ char name[IFNAMSIZ];
/* The checks are also done in dev_get_phys_port_name; this helps
- * returning early without hitting the trylock/restart below.
+ * returning early without hitting the locking section below.
*/
if (!netdev->netdev_ops->ndo_get_phys_port_name &&
!netdev->devlink_port)
return -EOPNOTSUPP;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
+ if (ret)
+ return ret;
- if (dev_isalive(netdev)) {
- char name[IFNAMSIZ];
+ ret = dev_get_phys_port_name(netdev, name, sizeof(name));
+ if (!ret)
+ ret = sysfs_emit(buf, "%s\n", name);
- ret = dev_get_phys_port_name(netdev, name, sizeof(name));
- if (!ret)
- ret = sysfs_emit(buf, "%s\n", name);
- }
rtnl_unlock();
return ret;
@@ -577,26 +658,25 @@ static ssize_t phys_switch_id_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct net_device *netdev = to_net_dev(dev);
+ struct netdev_phys_item_id ppid = { };
ssize_t ret = -EINVAL;
/* The checks are also done in dev_get_phys_port_name; this helps
- * returning early without hitting the trylock/restart below. This works
+ * returning early without hitting the locking section below. This works
* because recurse is false when calling dev_get_port_parent_id.
*/
if (!netdev->netdev_ops->ndo_get_port_parent_id &&
!netdev->devlink_port)
return -EOPNOTSUPP;
- if (!rtnl_trylock())
- return restart_syscall();
+ ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
+ if (ret)
+ return ret;
- if (dev_isalive(netdev)) {
- struct netdev_phys_item_id ppid = { };
+ ret = dev_get_port_parent_id(netdev, &ppid, false);
+ if (!ret)
+ ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id);
- ret = dev_get_port_parent_id(netdev, &ppid, false);
- if (!ret)
- ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id);
- }
rtnl_unlock();
return ret;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4d0ee1c9002aa..1252275001fcc 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -80,6 +80,11 @@ void rtnl_lock(void)
}
EXPORT_SYMBOL(rtnl_lock);
+int rtnl_lock_interruptible(void)
+{
+ return mutex_lock_interruptible(&rtnl_mutex);
+}
+
int rtnl_lock_killable(void)
{
return mutex_lock_killable(&rtnl_mutex);
--
2.39.5
* [PATCH AUTOSEL 6.12 432/486] ice: init flow director before RDMA
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (62 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 431/486] net-sysfs: remove rtnl_trylock from device attributes Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 433/486] ice: treat dyn_allowed only as suggestion Sasha Levin
` (3 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Michal Swiatkowski, Jacob Keller, Pucha Himasekhar Reddy,
Tony Nguyen, Sasha Levin, przemyslaw.kitszel, andrew+netdev,
davem, edumazet, kuba, pabeni, intel-wired-lan, netdev
From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
[ Upstream commit d67627e7b53203ca150e54723abbed81a0716286 ]
Flow director needs only one MSI-X vector. Initialize it before RDMA so
that a vector is still available for it.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ice/ice_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index ca707dfcb286e..63d2105fce933 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5175,11 +5175,12 @@ int ice_load(struct ice_pf *pf)
ice_napi_add(vsi);
+ ice_init_features(pf);
+
err = ice_init_rdma(pf);
if (err)
goto err_init_rdma;
- ice_init_features(pf);
ice_service_task_restart(pf);
clear_bit(ICE_DOWN, pf->state);
@@ -5187,6 +5188,7 @@ int ice_load(struct ice_pf *pf)
return 0;
err_init_rdma:
+ ice_deinit_features(pf);
ice_tc_indir_block_unregister(vsi);
err_tc_indir_block_register:
ice_unregister_netdev(vsi);
@@ -5210,8 +5212,8 @@ void ice_unload(struct ice_pf *pf)
devl_assert_locked(priv_to_devlink(pf));
- ice_deinit_features(pf);
ice_deinit_rdma(pf);
+ ice_deinit_features(pf);
ice_tc_indir_block_unregister(vsi);
ice_unregister_netdev(vsi);
ice_devlink_destroy_pf_port(pf);
--
2.39.5
* [PATCH AUTOSEL 6.12 433/486] ice: treat dyn_allowed only as suggestion
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (63 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 432/486] ice: init flow director before RDMA Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 438/486] ice: count combined queues using Rx/Tx count Sasha Levin
` (2 subsequent siblings)
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Michal Swiatkowski, Jacob Keller, Wojciech Drewek,
Pucha Himasekhar Reddy, Tony Nguyen, Sasha Levin,
przemyslaw.kitszel, andrew+netdev, davem, edumazet, kuba, pabeni,
intel-wired-lan, netdev
From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
[ Upstream commit a8c2d3932c1106af2764cc6869b29bcf3cb5bc47 ]
Some MSI-X vectors may need to be allocated as static and the rest as
dynamic, for example on the PF VSI. It must always have at least one
MSI-X vector, so that one is allocated as static; the rest can be
dynamic if dynamic allocation is supported.
Change ice_get_irq_res() to allow using static entries if they are
free even if the caller wants a dynamic one.
Adjust the limit values to the new approach. Min and max in the limit
are the valid (inclusive) bounds, so decrease max and num_static by one.
Set vsi::irq_dyn_alloc if dynamic allocation is supported.
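The effect of the inclusive bounds can be illustrated with a small
userspace model. This is a hedged sketch, not driver code: struct
model_tracker, model_alloc() and model_get_irq_res() are hypothetical
stand-ins, a fixed array replaces the xarray, and both num_entries and
num_static are assumed to be at least 1.

```c
#include <stdbool.h>

#define MODEL_MAX_ENTRIES 8

/* Hypothetical stand-in for the driver's IRQ tracker. */
struct model_tracker {
	bool used[MODEL_MAX_ENTRIES];
	unsigned int num_entries; /* total slots, static plus dynamic */
	unsigned int num_static;  /* slots [0, num_static) are static */
};

/* Return the lowest free index in the inclusive range [min, max], or -1. */
static int model_alloc(struct model_tracker *t, unsigned int min,
		       unsigned int max)
{
	for (unsigned int i = min; i <= max && i < MODEL_MAX_ENTRIES; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return (int)i;
		}
	}
	return -1;
}

/* Static slots are always eligible; dynamic ones only if dyn_allowed. */
static int model_get_irq_res(struct model_tracker *t, bool dyn_allowed,
			     bool *dynamic)
{
	unsigned int max = t->num_entries - 1;        /* last valid index */
	unsigned int last_static = t->num_static - 1; /* last static index */
	int index;

	if (!dyn_allowed)
		max = last_static;

	index = model_alloc(t, 0, max);
	if (index >= 0)
		*dynamic = (unsigned int)index > last_static;
	return index;
}
```

With four entries of which two are static, a caller that allows dynamic
allocation is first handed the free static slots and only then a dynamic
one, while a caller that does not allow it fails once the static slots
are taken.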
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ice/ice_irq.c | 25 ++++++++++++------------
drivers/net/ethernet/intel/ice/ice_lib.c | 2 ++
2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index ad82ff7d19957..09f9c7ba52795 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -45,7 +45,7 @@ static void ice_free_irq_res(struct ice_pf *pf, u16 index)
/**
* ice_get_irq_res - get an interrupt resource
* @pf: board private structure
- * @dyn_only: force entry to be dynamically allocated
+ * @dyn_allowed: allow entry to be dynamically allocated
*
* Allocate new irq entry in the free slot of the tracker. Since xarray
* is used, always allocate new entry at the lowest possible index. Set
@@ -53,11 +53,12 @@ static void ice_free_irq_res(struct ice_pf *pf, u16 index)
*
* Returns allocated irq entry or NULL on failure.
*/
-static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
+static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf,
+ bool dyn_allowed)
{
- struct xa_limit limit = { .max = pf->irq_tracker.num_entries,
+ struct xa_limit limit = { .max = pf->irq_tracker.num_entries - 1,
.min = 0 };
- unsigned int num_static = pf->irq_tracker.num_static;
+ unsigned int num_static = pf->irq_tracker.num_static - 1;
struct ice_irq_entry *entry;
unsigned int index;
int ret;
@@ -66,9 +67,9 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
if (!entry)
return NULL;
- /* skip preallocated entries if the caller says so */
- if (dyn_only)
- limit.min = num_static;
+ /* only already allocated if the caller says so */
+ if (!dyn_allowed)
+ limit.max = num_static;
ret = xa_alloc(&pf->irq_tracker.entries, &index, entry, limit,
GFP_KERNEL);
@@ -78,7 +79,7 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
entry = NULL;
} else {
entry->index = index;
- entry->dynamic = index >= num_static;
+ entry->dynamic = index > num_static;
}
return entry;
@@ -272,7 +273,7 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
/**
* ice_alloc_irq - Allocate new interrupt vector
* @pf: board private structure
- * @dyn_only: force dynamic allocation of the interrupt
+ * @dyn_allowed: allow dynamic allocation of the interrupt
*
* Allocate new interrupt vector for a given owner id.
* return struct msi_map with interrupt details and track
@@ -285,20 +286,20 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
* interrupt will be allocated with pci_msix_alloc_irq_at.
*
* Some callers may only support dynamically allocated interrupts.
- * This is indicated with dyn_only flag.
+ * This is indicated with dyn_allowed flag.
*
* On failure, return map with negative .index. The caller
* is expected to check returned map index.
*
*/
-struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_only)
+struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_allowed)
{
int sriov_base_vector = pf->sriov_base_vector;
struct msi_map map = { .index = -ENOENT };
struct device *dev = ice_pf_to_dev(pf);
struct ice_irq_entry *entry;
- entry = ice_get_irq_res(pf, dyn_only);
+ entry = ice_get_irq_res(pf, dyn_allowed);
if (!entry)
return map;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 121a5ad5c8e10..8961eebe67aa2 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -567,6 +567,8 @@ ice_vsi_alloc_def(struct ice_vsi *vsi, struct ice_channel *ch)
return -ENOMEM;
}
+ vsi->irq_dyn_alloc = pci_msix_can_alloc_dyn(vsi->back->pdev);
+
switch (vsi->type) {
case ICE_VSI_PF:
case ICE_VSI_SF:
--
2.39.5
* [PATCH AUTOSEL 6.12 438/486] ice: count combined queues using Rx/Tx count
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (64 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 433/486] ice: treat dyn_allowed only as suggestion Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 440/486] net/mana: fix warning in the writer of client oob Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 456/486] bpf: Use kallsyms to find the function name of a struct_ops's stub function Sasha Levin
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Michal Swiatkowski, Jacob Keller, Pucha Himasekhar Reddy,
Tony Nguyen, Sasha Levin, przemyslaw.kitszel, andrew+netdev,
davem, edumazet, kuba, pabeni, intel-wired-lan, netdev
From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
[ Upstream commit c3a392bdd31adc474f1009ee85c13fdd01fe800d ]
The previous implementation assumed a 1:1 mapping between vectors and
queues, which is not always true.
Use the minimum of the Rx and Tx queue counts on each vector to
determine the number of combined queues.
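The difference between the two counting rules can be seen in a small
standalone sketch. The names below (struct model_q_vector,
model_combined_cnt(), model_combined_cnt_old()) are hypothetical and
only mirror the two fields used in the diff; they are not the driver's
types.

```c
#include <stddef.h>

/* Hypothetical model of a queue vector: only the ring counts matter. */
struct model_q_vector {
	unsigned int num_ring_tx;
	unsigned int num_ring_rx;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* New rule: each vector contributes min(tx, rx) combined queues. */
static unsigned int model_combined_cnt(const struct model_q_vector *v,
				       size_t n)
{
	unsigned int combined = 0;

	for (size_t i = 0; i < n; i++)
		combined += min_uint(v[i].num_ring_tx, v[i].num_ring_rx);
	return combined;
}

/* Old rule: each vector with both ring types counts exactly once. */
static unsigned int model_combined_cnt_old(const struct model_q_vector *v,
					   size_t n)
{
	unsigned int combined = 0;

	for (size_t i = 0; i < n; i++)
		if (v[i].num_ring_tx && v[i].num_ring_rx)
			combined++;
	return combined;
}
```

The two rules agree when every vector has one Rx and one Tx ring and
diverge as soon as a vector serves several queue pairs.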
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ice/ice_ethtool.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 7d1feeb317be3..2a2acbeb57221 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3817,8 +3817,7 @@ static u32 ice_get_combined_cnt(struct ice_vsi *vsi)
ice_for_each_q_vector(vsi, q_idx) {
struct ice_q_vector *q_vector = vsi->q_vectors[q_idx];
- if (q_vector->rx.rx_ring && q_vector->tx.tx_ring)
- combined++;
+ combined += min(q_vector->num_ring_tx, q_vector->num_ring_rx);
}
return combined;
--
2.39.5
* [PATCH AUTOSEL 6.12 440/486] net/mana: fix warning in the writer of client oob
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (65 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 438/486] ice: count combined queues using Rx/Tx count Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 456/486] bpf: Use kallsyms to find the function name of a struct_ops's stub function Sasha Levin
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Konstantin Taranov, Shiraz Saleem, Long Li, Leon Romanovsky,
Sasha Levin, kys, haiyangz, wei.liu, decui, andrew+netdev, davem,
edumazet, kuba, pabeni, shradhagupta, mlevitsk, peterz, ernis,
linux-hyperv, netdev
From: Konstantin Taranov <kotaranov@microsoft.com>
[ Upstream commit 5ec7e1c86c441c46a374577bccd9488abea30037 ]
Do not warn on missing pad_data when oob is in sgl.
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1737394039-28772-9-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/microsoft/mana/gdma_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 36802e0a8b570..9bac4083d8a09 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1044,7 +1044,7 @@ static u32 mana_gd_write_client_oob(const struct gdma_wqe_request *wqe_req,
header->inline_oob_size_div4 = client_oob_size / sizeof(u32);
if (oob_in_sgl) {
- WARN_ON_ONCE(!pad_data || wqe_req->num_sge < 2);
+ WARN_ON_ONCE(wqe_req->num_sge < 2);
header->client_oob_in_sgl = 1;
--
2.39.5
* [PATCH AUTOSEL 6.12 456/486] bpf: Use kallsyms to find the function name of a struct_ops's stub function
[not found] <20250505223922.2682012-1-sashal@kernel.org>
` (66 preceding siblings ...)
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 440/486] net/mana: fix warning in the writer of client oob Sasha Levin
@ 2025-05-05 22:38 ` Sasha Levin
67 siblings, 0 replies; 68+ messages in thread
From: Sasha Levin @ 2025-05-05 22:38 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Martin KaFai Lau, Tejun Heo, Benjamin Tissoires, Yonghong Song,
Amery Hung, Alexei Starovoitov, Sasha Levin, martin.lau, daniel,
andrii, bpf, netdev
From: Martin KaFai Lau <martin.lau@kernel.org>
[ Upstream commit 12fdd29d5d71d2987a1aec434b704d850a4d7fcb ]
Commit 1611603537a4 ("bpf: Create argument information for nullable arguments.")
introduced "__nullable" tagging on the argument name of a
stub function. Some background on that commit:
it requires tagging the stub function instead of directly tagging
the "ops" of a struct. This is because the btf func_proto of the "ops"
does not have the argument name and the "__nullable" is tagged at
the argument name.
To find the stub function of an "ops", it currently relies on a naming
convention for the stub function: "st_ops__ops_name",
e.g. tcp_congestion_ops__ssthresh. However, the new kernel
subsystems implementing bpf_struct_ops have missed this and
have been surprised that the "__nullable" and the to-be-landed
"__ref" tagging was not effective.
One option would be to give a warning whenever the stub function does
not follow the naming convention, regardless of whether it requires arg
tagging or not.
Instead, this patch uses the kallsyms_lookup approach and removes
the requirement on the naming convention. The st_ops->cfi_stubs has
all the stub function kernel addresses. kallsyms_lookup() is used to
look up the function name. With the function name, BTF can be used to
find the BTF func_proto. The existing "__nullable" arg name searching
logic will then fall through.
One notable change:
if kallsyms_lookup fails, or the stub function name cannot be found in
the BTF, the bpf_struct_ops registration will fail.
This differs from the previous behavior, which silently ignored
a missing "st_ops__ops_name" function.
The "tcp_congestion_ops", "sched_ext_ops", and "hid_bpf_ops" can still be
registered successfully after this patch. The struct_ops_maybe_null
selftest covers the "__nullable" tagging.
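The address-based lookup described above can be sketched in userspace C.
This is an illustration under stated assumptions only: a small static
table stands in for kallsyms, and model_sym, model_lookup(),
model_resolve_stub() and the stub functions are hypothetical names that
do not exist in the kernel. The point is that a stub is found by its
address whatever it is called, and that a failed lookup is now an error
rather than being silently skipped.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*stub_fn)(void);

/* Hypothetical symbol table entry: address to name. */
struct model_sym {
	stub_fn addr;
	const char *name;
};

/* Stubs return distinct values so they stay distinct functions. */
static int stub_ssthresh(void) { return 0; }
static int oddly_named_stub(void) { return 1; }
static int unlisted_stub(void) { return 2; }

static const struct model_sym model_symtab[] = {
	{ stub_ssthresh, "tcp_congestion_ops__ssthresh" },
	{ oddly_named_stub, "oddly_named_stub" },
};

/* Stand-in for an address-to-name lookup; returns NULL if not found. */
static const char *model_lookup(stub_fn addr)
{
	size_t n = sizeof(model_symtab) / sizeof(model_symtab[0]);

	for (size_t i = 0; i < n; i++)
		if (model_symtab[i].addr == addr)
			return model_symtab[i].name;
	return NULL;
}

/* A missing name is reported as -ENOENT instead of being ignored. */
static int model_resolve_stub(stub_fn addr, const char **name)
{
	*name = model_lookup(addr);
	return *name ? 0 : -ENOENT;
}
```

A stub that does not follow the "st_ops__ops_name" convention resolves
just like one that does, because only its address is used.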
Other minor changes:
1. Removed the "%s__%s" format from the pr_warn because the naming
convention is removed.
2. The existing bpf_struct_ops_supported() is also moved earlier
in the file because it is needed to decide whether the
stub function is NULL before calling prepare_arg_info.
Cc: Tejun Heo <tj@kernel.org>
Cc: Benjamin Tissoires <bentiss@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20250127222719.2544255-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/bpf_struct_ops.c | 98 +++++++++++++++++--------------------
1 file changed, 44 insertions(+), 54 deletions(-)
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index b70d0eef8a284..477947456371a 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -147,39 +147,6 @@ void bpf_struct_ops_image_free(void *image)
}
#define MAYBE_NULL_SUFFIX "__nullable"
-#define MAX_STUB_NAME 128
-
-/* Return the type info of a stub function, if it exists.
- *
- * The name of a stub function is made up of the name of the struct_ops and
- * the name of the function pointer member, separated by "__". For example,
- * if the struct_ops type is named "foo_ops" and the function pointer
- * member is named "bar", the stub function name would be "foo_ops__bar".
- */
-static const struct btf_type *
-find_stub_func_proto(const struct btf *btf, const char *st_op_name,
- const char *member_name)
-{
- char stub_func_name[MAX_STUB_NAME];
- const struct btf_type *func_type;
- s32 btf_id;
- int cp;
-
- cp = snprintf(stub_func_name, MAX_STUB_NAME, "%s__%s",
- st_op_name, member_name);
- if (cp >= MAX_STUB_NAME) {
- pr_warn("Stub function name too long\n");
- return NULL;
- }
- btf_id = btf_find_by_name_kind(btf, stub_func_name, BTF_KIND_FUNC);
- if (btf_id < 0)
- return NULL;
- func_type = btf_type_by_id(btf, btf_id);
- if (!func_type)
- return NULL;
-
- return btf_type_by_id(btf, func_type->type); /* FUNC_PROTO */
-}
/* Prepare argument info for every nullable argument of a member of a
* struct_ops type.
@@ -204,27 +171,42 @@ find_stub_func_proto(const struct btf *btf, const char *st_op_name,
static int prepare_arg_info(struct btf *btf,
const char *st_ops_name,
const char *member_name,
- const struct btf_type *func_proto,
+ const struct btf_type *func_proto, void *stub_func_addr,
struct bpf_struct_ops_arg_info *arg_info)
{
const struct btf_type *stub_func_proto, *pointed_type;
const struct btf_param *stub_args, *args;
struct bpf_ctx_arg_aux *info, *info_buf;
u32 nargs, arg_no, info_cnt = 0;
+ char ksym[KSYM_SYMBOL_LEN];
+ const char *stub_fname;
+ s32 stub_func_id;
u32 arg_btf_id;
int offset;
- stub_func_proto = find_stub_func_proto(btf, st_ops_name, member_name);
- if (!stub_func_proto)
- return 0;
+ stub_fname = kallsyms_lookup((unsigned long)stub_func_addr, NULL, NULL, NULL, ksym);
+ if (!stub_fname) {
+ pr_warn("Cannot find the stub function name for the %s in struct %s\n",
+ member_name, st_ops_name);
+ return -ENOENT;
+ }
+
+ stub_func_id = btf_find_by_name_kind(btf, stub_fname, BTF_KIND_FUNC);
+ if (stub_func_id < 0) {
+ pr_warn("Cannot find the stub function %s in btf\n", stub_fname);
+ return -ENOENT;
+ }
+
+ stub_func_proto = btf_type_by_id(btf, stub_func_id);
+ stub_func_proto = btf_type_by_id(btf, stub_func_proto->type);
/* Check if the number of arguments of the stub function is the same
* as the number of arguments of the function pointer.
*/
nargs = btf_type_vlen(func_proto);
if (nargs != btf_type_vlen(stub_func_proto)) {
- pr_warn("the number of arguments of the stub function %s__%s does not match the number of arguments of the member %s of struct %s\n",
- st_ops_name, member_name, member_name, st_ops_name);
+ pr_warn("the number of arguments of the stub function %s does not match the number of arguments of the member %s of struct %s\n",
+ stub_fname, member_name, st_ops_name);
return -EINVAL;
}
@@ -254,21 +236,21 @@ static int prepare_arg_info(struct btf *btf,
&arg_btf_id);
if (!pointed_type ||
!btf_type_is_struct(pointed_type)) {
- pr_warn("stub function %s__%s has %s tagging to an unsupported type\n",
- st_ops_name, member_name, MAYBE_NULL_SUFFIX);
+ pr_warn("stub function %s has %s tagging to an unsupported type\n",
+ stub_fname, MAYBE_NULL_SUFFIX);
goto err_out;
}
offset = btf_ctx_arg_offset(btf, func_proto, arg_no);
if (offset < 0) {
- pr_warn("stub function %s__%s has an invalid trampoline ctx offset for arg#%u\n",
- st_ops_name, member_name, arg_no);
+ pr_warn("stub function %s has an invalid trampoline ctx offset for arg#%u\n",
+ stub_fname, arg_no);
goto err_out;
}
if (args[arg_no].type != stub_args[arg_no].type) {
- pr_warn("arg#%u type in stub function %s__%s does not match with its original func_proto\n",
- arg_no, st_ops_name, member_name);
+ pr_warn("arg#%u type in stub function %s does not match with its original func_proto\n",
+ arg_no, stub_fname);
goto err_out;
}
@@ -325,6 +307,13 @@ static bool is_module_member(const struct btf *btf, u32 id)
return !strcmp(btf_name_by_offset(btf, t->name_off), "module");
}
+int bpf_struct_ops_supported(const struct bpf_struct_ops *st_ops, u32 moff)
+{
+ void *func_ptr = *(void **)(st_ops->cfi_stubs + moff);
+
+ return func_ptr ? 0 : -ENOTSUPP;
+}
+
int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc,
struct btf *btf,
struct bpf_verifier_log *log)
@@ -388,7 +377,10 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc,
for_each_member(i, t, member) {
const struct btf_type *func_proto;
+ void **stub_func_addr;
+ u32 moff;
+ moff = __btf_member_bit_offset(t, member) / 8;
mname = btf_name_by_offset(btf, member->name_off);
if (!*mname) {
pr_warn("anon member in struct %s is not supported\n",
@@ -414,7 +406,11 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc,
func_proto = btf_type_resolve_func_ptr(btf,
member->type,
NULL);
- if (!func_proto)
+
+ /* The member is not a function pointer or
+ * the function pointer is not supported.
+ */
+ if (!func_proto || bpf_struct_ops_supported(st_ops, moff))
continue;
if (btf_distill_func_proto(log, btf,
@@ -426,8 +422,9 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc,
goto errout;
}
+ stub_func_addr = *(void **)(st_ops->cfi_stubs + moff);
err = prepare_arg_info(btf, st_ops->name, mname,
- func_proto,
+ func_proto, stub_func_addr,
arg_info + i);
if (err)
goto errout;
@@ -1153,13 +1150,6 @@ void bpf_struct_ops_put(const void *kdata)
bpf_map_put(&st_map->map);
}
-int bpf_struct_ops_supported(const struct bpf_struct_ops *st_ops, u32 moff)
-{
- void *func_ptr = *(void **)(st_ops->cfi_stubs + moff);
-
- return func_ptr ? 0 : -ENOTSUPP;
-}
-
static bool bpf_struct_ops_valid_to_reg(struct bpf_map *map)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
--
2.39.5