* [PATCH iwl-next v3 0/3] igc: add support for forcing link speed without autonegotiation
From: KhaiWenTan @ 2026-04-22 15:56 UTC
To: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, khai.wen.tan, Faizal Rahim
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
This series adds support for forcing 10/100 Mb/s link speed via ethtool
when autonegotiation is disabled on the igc driver.
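With autonegotiation off, a Clause 22 PHY takes its speed and duplex directly from the basic control register (register 0), and the MII_CR_* definitions added in patch 3 follow that layout. Below is a minimal userspace sketch of the encoding for the two forceable speeds. It assumes the standard IEEE 802.3 Clause 22 bit positions: speed select LSB = bit 13, autoneg enable = bit 12, restart autoneg = bit 9, full duplex = bit 8, speed select MSB = bit 6. The BMCR_SKETCH_* names and bmcr_force() are hypothetical and are not igc symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed Clause 22 basic control register bits (hypothetical names). */
#define BMCR_SKETCH_SPEED_LSB   (1u << 13) /* speed select LSB */
#define BMCR_SKETCH_AN_ENABLE   (1u << 12) /* autonegotiation enable */
#define BMCR_SKETCH_AN_RESTART  (1u << 9)  /* restart autonegotiation */
#define BMCR_SKETCH_FULL_DUPLEX (1u << 8)  /* 1 = full, 0 = half */
#define BMCR_SKETCH_SPEED_MSB   (1u << 6)  /* speed select MSB */

/*
 * Clear autoneg and the speed/duplex fields, then encode a forced
 * speed/duplex. Only 10 and 100 Mb/s are handled; any other speed
 * returns false and leaves *ctrl untouched.
 */
static bool bmcr_force(uint16_t *ctrl, unsigned int speed, bool full_duplex)
{
	uint16_t val = *ctrl;

	val &= (uint16_t)~(BMCR_SKETCH_SPEED_LSB | BMCR_SKETCH_SPEED_MSB |
			   BMCR_SKETCH_FULL_DUPLEX | BMCR_SKETCH_AN_ENABLE |
			   BMCR_SKETCH_AN_RESTART);

	switch (speed) {
	case 10:
		break;                          /* MSB = 0, LSB = 0 */
	case 100:
		val |= BMCR_SKETCH_SPEED_LSB;   /* MSB = 0, LSB = 1 */
		break;
	default:
		return false;
	}

	if (full_duplex)
		val |= BMCR_SKETCH_FULL_DUPLEX;

	*ctrl = val;
	return true;
}
```

For example, a register holding 0x1140 (autoneg on, 1000 Mb/s full duplex) becomes 0x2100 when it is forced to 100 Mb/s full duplex.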
Changes in v3:
- Modify condition from "if (duplex == DUPLEX_HALF)" to
"if (duplex != DUPLEX_FULL)". (Simon Horman)
Changes in v2:
- When forcing half-duplex, set hw->fc.requested_mode = igc_fc_none,
since half-duplex cannot support flow control per IEEE 802.3.
(Simon Horman)
- Split the original single patch into three patches for clarity:
patches 1 and 2 are preparatory cleanups; patch 3 carries the
functional change.
v2 at:
https://patchwork.kernel.org/project/netdevbpf/patch/20260416015520.6090-4-khai.wen.tan@linux.intel.com/
v1 at:
https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20260409072747.217836-1-khai.wen.tan@linux.intel.com/
Faizal Rahim (3):
igc: remove unused autoneg_failed field
igc: move autoneg-enabled settings into igc_handle_autoneg_enabled()
igc: add support for forcing link speed without autonegotiation
drivers/net/ethernet/intel/igc/igc_base.c | 35 +++-
drivers/net/ethernet/intel/igc/igc_defines.h | 9 +-
drivers/net/ethernet/intel/igc/igc_ethtool.c | 203 +++++++++++++------
drivers/net/ethernet/intel/igc/igc_hw.h | 10 +-
drivers/net/ethernet/intel/igc/igc_mac.c | 16 +-
drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
drivers/net/ethernet/intel/igc/igc_phy.c | 65 +++++-
drivers/net/ethernet/intel/igc/igc_phy.h | 1 +
8 files changed, 251 insertions(+), 90 deletions(-)
--
2.43.0
* [PATCH iwl-next v3 1/3] igc: remove unused autoneg_failed field
From: KhaiWenTan @ 2026-04-22 15:56 UTC
To: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, khai.wen.tan, Faizal Rahim, Looi,
Aleksandr Loktionov, KhaiWenTan
In-Reply-To: <20260422155701.7420-1-khai.wen.tan@linux.intel.com>
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
autoneg_failed in struct igc_mac_info is never set in the igc driver.
Remove the field and the dead code checking it in
igc_config_fc_after_link_up().
Reviewed-by: Looi, Hong Aun <hong.aun.looi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
---
drivers/net/ethernet/intel/igc/igc_hw.h | 1 -
drivers/net/ethernet/intel/igc/igc_mac.c | 16 +---------------
2 files changed, 1 insertion(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/ethernet/intel/igc/igc_hw.h
index be8a49a86d09..86ab8f566f44 100644
--- a/drivers/net/ethernet/intel/igc/igc_hw.h
+++ b/drivers/net/ethernet/intel/igc/igc_hw.h
@@ -92,7 +92,6 @@ struct igc_mac_info {
bool asf_firmware_present;
bool arc_subsystem_valid;
- bool autoneg_failed;
bool get_link_status;
};
diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
index 7ac6637f8db7..142beb9ae557 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -438,28 +438,14 @@ void igc_config_collision_dist(struct igc_hw *hw)
* Checks the status of auto-negotiation after link up to ensure that the
* speed and duplex were not forced. If the link needed to be forced, then
* flow control needs to be forced also. If auto-negotiation is enabled
- * and did not fail, then we configure flow control based on our link
- * partner.
+ * then we configure flow control based on our link partner.
*/
s32 igc_config_fc_after_link_up(struct igc_hw *hw)
{
u16 mii_status_reg, mii_nway_adv_reg, mii_nway_lp_ability_reg;
- struct igc_mac_info *mac = &hw->mac;
u16 speed, duplex;
s32 ret_val = 0;
- /* Check for the case where we have fiber media and auto-neg failed
- * so we had to force link. In this case, we need to force the
- * configuration of the MAC to match the "fc" parameter.
- */
- if (mac->autoneg_failed)
- ret_val = igc_force_mac_fc(hw);
-
- if (ret_val) {
- hw_dbg("Error forcing flow control settings\n");
- goto out;
- }
-
/* In auto-neg, we need to check and see if Auto-Neg has completed,
* and if so, how the PHY and link partner has flow control
* configured.
--
2.43.0
* [PATCH iwl-next v3 2/3] igc: move autoneg-enabled settings into igc_handle_autoneg_enabled()
From: KhaiWenTan @ 2026-04-22 15:57 UTC
To: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, khai.wen.tan, Faizal Rahim, Looi,
Aleksandr Loktionov, KhaiWenTan
In-Reply-To: <20260422155701.7420-1-khai.wen.tan@linux.intel.com>
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Move the advertised link modes and flow control configuration from
igc_ethtool_set_link_ksettings() into igc_handle_autoneg_enabled().
No functional change.
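igc_handle_autoneg_enabled() is essentially a mapping from the requested ethtool link modes onto the driver's ADVERTISE_* bitmask. The same mapping can be sketched in a table-driven form in userspace. The enum, the table and build_advertised() are hypothetical. The bit values are assumed to match the igc ADVERTISE_* definitions, under which the probe-time default of 0xaf advertises all six supported modes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethtool link modes the driver tests. */
enum sketch_link_mode {
	MODE_10_HALF,
	MODE_10_FULL,
	MODE_100_HALF,
	MODE_100_FULL,
	MODE_1000_FULL,
	MODE_2500_FULL,
	MODE_COUNT,
};

/* Assumed ADVERTISE_* bit values, indexed by enum sketch_link_mode. */
static const uint16_t advertise_bit[MODE_COUNT] = {
	[MODE_10_HALF]   = 0x0001,
	[MODE_10_FULL]   = 0x0002,
	[MODE_100_HALF]  = 0x0004,
	[MODE_100_FULL]  = 0x0008,
	[MODE_1000_FULL] = 0x0020,
	[MODE_2500_FULL] = 0x0080,
};

/* OR together the advertise bits for every requested link mode. */
static uint16_t build_advertised(const enum sketch_link_mode *modes, size_t n)
{
	uint16_t advertised = 0;

	for (size_t i = 0; i < n; i++)
		if (modes[i] < MODE_COUNT)
			advertised |= advertise_bit[modes[i]];

	return advertised;
}
```

A table keeps the mode-to-bit correspondence in one place. The driver's explicit chain of ethtool_link_ksettings_test_link_mode() calls is equivalent for a fixed, short list of modes.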
Reviewed-by: Looi, Hong Aun <hong.aun.looi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
---
drivers/net/ethernet/intel/igc/igc_ethtool.c | 72 ++++++++++++--------
1 file changed, 44 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 0122009bedd0..cfcbf2fdad6e 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -2000,6 +2000,49 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
return 0;
}
+/**
+ * igc_handle_autoneg_enabled - Configure autonegotiation advertisement
+ * @adapter: private driver structure
+ * @cmd: ethtool link ksettings from user
+ *
+ * Records advertised speeds and flow control settings when autoneg
+ * is enabled.
+ */
+static void igc_handle_autoneg_enabled(struct igc_adapter *adapter,
+ const struct ethtool_link_ksettings *cmd)
+{
+ struct igc_hw *hw = &adapter->hw;
+ u16 advertised = 0;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 2500baseT_Full))
+ advertised |= ADVERTISE_2500_FULL;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 1000baseT_Full))
+ advertised |= ADVERTISE_1000_FULL;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 100baseT_Full))
+ advertised |= ADVERTISE_100_FULL;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 100baseT_Half))
+ advertised |= ADVERTISE_100_HALF;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 10baseT_Full))
+ advertised |= ADVERTISE_10_FULL;
+
+ if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
+ 10baseT_Half))
+ advertised |= ADVERTISE_10_HALF;
+
+ hw->phy.autoneg_advertised = advertised;
+ if (adapter->fc_autoneg)
+ hw->fc.requested_mode = igc_fc_default;
+}
+
static int
igc_ethtool_set_link_ksettings(struct net_device *netdev,
const struct ethtool_link_ksettings *cmd)
@@ -2007,7 +2050,6 @@ igc_ethtool_set_link_ksettings(struct net_device *netdev,
struct igc_adapter *adapter = netdev_priv(netdev);
struct net_device *dev = adapter->netdev;
struct igc_hw *hw = &adapter->hw;
- u16 advertised = 0;
/* When adapter in resetting mode, autoneg/speed/duplex
* cannot be changed
@@ -2032,34 +2074,8 @@ igc_ethtool_set_link_ksettings(struct net_device *netdev,
while (test_and_set_bit(__IGC_RESETTING, &adapter->state))
usleep_range(1000, 2000);
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 2500baseT_Full))
- advertised |= ADVERTISE_2500_FULL;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 1000baseT_Full))
- advertised |= ADVERTISE_1000_FULL;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 100baseT_Full))
- advertised |= ADVERTISE_100_FULL;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 100baseT_Half))
- advertised |= ADVERTISE_100_HALF;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 10baseT_Full))
- advertised |= ADVERTISE_10_FULL;
-
- if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
- 10baseT_Half))
- advertised |= ADVERTISE_10_HALF;
-
if (cmd->base.autoneg == AUTONEG_ENABLE) {
- hw->phy.autoneg_advertised = advertised;
- if (adapter->fc_autoneg)
- hw->fc.requested_mode = igc_fc_default;
+ igc_handle_autoneg_enabled(adapter, cmd);
} else {
netdev_info(dev, "Force mode currently not supported\n");
}
--
2.43.0
* [PATCH iwl-next v3 3/3] igc: add support for forcing link speed without autonegotiation
From: KhaiWenTan @ 2026-04-22 15:57 UTC
To: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, khai.wen.tan, Faizal Rahim, Looi, KhaiWenTan
In-Reply-To: <20260422155701.7420-1-khai.wen.tan@linux.intel.com>
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Allow users to force 10/100 Mb/s link speed and duplex via ethtool
when autonegotiation is disabled. Previously, the driver ignored these
requests and only logged "Force mode currently not supported".
Forcing at 1000 Mb/s and 2500 Mb/s is not supported and is rejected
with -EINVAL.
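Combined with the half-duplex flow-control rule from the v2 changelog, the request handling in this patch amounts to three steps. First, reject any speed other than 10 or 100 Mb/s before changing state. Second, map the speed/duplex pair onto one of four forced modes. Third, disable flow control whenever the result is not full duplex. The self-contained sketch below shows that decision logic. struct link_cfg, force_link() and the SK_* constants are hypothetical simplifications of igc_handle_autoneg_disabled() plus the -EINVAL check in igc_ethtool_set_link_ksettings(). They are not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum forced_mode { FORCED_10H, FORCED_10F, FORCED_100H, FORCED_100F };
enum fc_mode { FC_NONE, FC_DEFAULT };

struct link_cfg {
	bool autoneg_enabled;
	enum forced_mode forced;
	enum fc_mode fc;
};

/* Hypothetical duplex values modelled on ethtool's DUPLEX_* constants. */
enum { SK_DUPLEX_HALF = 0, SK_DUPLEX_FULL = 1, SK_DUPLEX_UNKNOWN = 0xff };

/*
 * Record a forced speed/duplex request. Returns 0 on success, or
 * -EINVAL (with *cfg unchanged) for speeds that cannot be forced.
 * Anything other than full duplex is treated as half duplex.
 */
static int force_link(struct link_cfg *cfg, unsigned int speed_mbps,
		      unsigned char duplex)
{
	bool full = (duplex == SK_DUPLEX_FULL);

	switch (speed_mbps) {
	case 10:
		cfg->forced = full ? FORCED_10F : FORCED_10H;
		break;
	case 100:
		cfg->forced = full ? FORCED_100F : FORCED_100H;
		break;
	default:
		return -EINVAL;
	}

	cfg->autoneg_enabled = false;

	/* Half duplex cannot use flow control, so turn it off. */
	if (!full)
		cfg->fc = FC_NONE;

	return 0;
}
```

Testing `duplex == full` rather than `duplex == half` mirrors the v3 change: an unknown duplex value falls back to the half-duplex mode and disables flow control.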
Reviewed-by: Looi, Hong Aun <hong.aun.looi@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
---
drivers/net/ethernet/intel/igc/igc_base.c | 35 ++++-
drivers/net/ethernet/intel/igc/igc_defines.h | 9 +-
drivers/net/ethernet/intel/igc/igc_ethtool.c | 131 +++++++++++++------
drivers/net/ethernet/intel/igc/igc_hw.h | 9 ++
drivers/net/ethernet/intel/igc/igc_mac.c | 10 ++
drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
drivers/net/ethernet/intel/igc/igc_phy.c | 65 ++++++++-
drivers/net/ethernet/intel/igc/igc_phy.h | 1 +
8 files changed, 211 insertions(+), 51 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_base.c b/drivers/net/ethernet/intel/igc/igc_base.c
index 1613b562d17c..ab9120a3127f 100644
--- a/drivers/net/ethernet/intel/igc/igc_base.c
+++ b/drivers/net/ethernet/intel/igc/igc_base.c
@@ -114,11 +114,35 @@ static s32 igc_setup_copper_link_base(struct igc_hw *hw)
u32 ctrl;
ctrl = rd32(IGC_CTRL);
- ctrl |= IGC_CTRL_SLU;
- ctrl &= ~(IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX);
- wr32(IGC_CTRL, ctrl);
-
- ret_val = igc_setup_copper_link(hw);
+ ctrl &= ~(IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX |
+ IGC_CTRL_SPEED_MASK | IGC_CTRL_FD);
+
+ if (hw->mac.autoneg_enabled) {
+ ctrl |= IGC_CTRL_SLU;
+ wr32(IGC_CTRL, ctrl);
+ ret_val = igc_setup_copper_link(hw);
+ } else {
+ ctrl |= IGC_CTRL_SLU | IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX;
+
+ switch (hw->mac.forced_speed_duplex) {
+ case IGC_FORCED_10H:
+ ctrl |= IGC_CTRL_SPEED_10;
+ break;
+ case IGC_FORCED_10F:
+ ctrl |= IGC_CTRL_SPEED_10 | IGC_CTRL_FD;
+ break;
+ case IGC_FORCED_100H:
+ ctrl |= IGC_CTRL_SPEED_100;
+ break;
+ case IGC_FORCED_100F:
+ ctrl |= IGC_CTRL_SPEED_100 | IGC_CTRL_FD;
+ break;
+ default:
+ return -IGC_ERR_CONFIG;
+ }
+ wr32(IGC_CTRL, ctrl);
+ ret_val = igc_setup_copper_link(hw);
+ }
return ret_val;
}
@@ -443,6 +467,7 @@ static const struct igc_phy_operations igc_phy_ops_base = {
.reset = igc_phy_hw_reset,
.read_reg = igc_read_phy_reg_gpy,
.write_reg = igc_write_phy_reg_gpy,
+ .force_speed_duplex = igc_force_speed_duplex,
};
const struct igc_info igc_base_info = {
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 9482ab11f050..3f504751c2d9 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -129,10 +129,13 @@
#define IGC_ERR_SWFW_SYNC 13
/* Device Control */
+#define IGC_CTRL_FD BIT(0) /* Full Duplex */
#define IGC_CTRL_RST 0x04000000 /* Global reset */
-
#define IGC_CTRL_PHY_RST 0x80000000 /* PHY Reset */
#define IGC_CTRL_SLU 0x00000040 /* Set link up (Force Link) */
+#define IGC_CTRL_SPEED_MASK GENMASK(10, 8)
+#define IGC_CTRL_SPEED_10 FIELD_PREP(IGC_CTRL_SPEED_MASK, 0)
+#define IGC_CTRL_SPEED_100 FIELD_PREP(IGC_CTRL_SPEED_MASK, 1)
#define IGC_CTRL_FRCSPD 0x00000800 /* Force Speed */
#define IGC_CTRL_FRCDPX 0x00001000 /* Force Duplex */
#define IGC_CTRL_VME 0x40000000 /* IEEE VLAN mode enable */
@@ -673,6 +676,10 @@
#define IGC_GEN_POLL_TIMEOUT 1920
/* PHY Control Register */
+#define MII_CR_SPEED_MASK (BIT(6) | BIT(13))
+#define MII_CR_SPEED_10 0x0000 /* SSM=0, SSL=0: 10 Mb/s */
+#define MII_CR_SPEED_100 BIT(13) /* SSM=0, SSL=1: 100 Mb/s */
+#define MII_CR_DUPLEX_EN BIT(8) /* 0 = Half Duplex, 1 = Full Duplex */
#define MII_CR_RESTART_AUTO_NEG 0x0200 /* Restart auto negotiation */
#define MII_CR_POWER_DOWN 0x0800 /* Power down */
#define MII_CR_AUTO_NEG_EN 0x1000 /* Auto Neg Enable */
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index cfcbf2fdad6e..6a54c7a98f39 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1914,44 +1914,58 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
ethtool_link_ksettings_add_link_mode(cmd, supported, TP);
ethtool_link_ksettings_add_link_mode(cmd, advertising, TP);
- /* advertising link modes */
- if (hw->phy.autoneg_advertised & ADVERTISE_10_HALF)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 10baseT_Half);
- if (hw->phy.autoneg_advertised & ADVERTISE_10_FULL)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 10baseT_Full);
- if (hw->phy.autoneg_advertised & ADVERTISE_100_HALF)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 100baseT_Half);
- if (hw->phy.autoneg_advertised & ADVERTISE_100_FULL)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 100baseT_Full);
- if (hw->phy.autoneg_advertised & ADVERTISE_1000_FULL)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 1000baseT_Full);
- if (hw->phy.autoneg_advertised & ADVERTISE_2500_FULL)
- ethtool_link_ksettings_add_link_mode(cmd, advertising, 2500baseT_Full);
-
/* set autoneg settings */
ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
- ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
+ if (hw->mac.autoneg_enabled) {
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
+ cmd->base.autoneg = AUTONEG_ENABLE;
+
+ /* advertising link modes only apply when autoneg is on */
+ if (hw->phy.autoneg_advertised & ADVERTISE_10_HALF)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 10baseT_Half);
+ if (hw->phy.autoneg_advertised & ADVERTISE_10_FULL)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 10baseT_Full);
+ if (hw->phy.autoneg_advertised & ADVERTISE_100_HALF)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 100baseT_Half);
+ if (hw->phy.autoneg_advertised & ADVERTISE_100_FULL)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 100baseT_Full);
+ if (hw->phy.autoneg_advertised & ADVERTISE_1000_FULL)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 1000baseT_Full);
+ if (hw->phy.autoneg_advertised & ADVERTISE_2500_FULL)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ 2500baseT_Full);
+
+ /* Set pause flow control advertising */
+ switch (hw->fc.requested_mode) {
+ case igc_fc_full:
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ Pause);
+ break;
+ case igc_fc_rx_pause:
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ Pause);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ Asym_Pause);
+ break;
+ case igc_fc_tx_pause:
+ ethtool_link_ksettings_add_link_mode(cmd, advertising,
+ Asym_Pause);
+ break;
+ default:
+ break;
+ }
+ } else {
+ cmd->base.autoneg = AUTONEG_DISABLE;
+ }
- /* Set pause flow control settings */
+ /* Pause is always supported */
ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
- switch (hw->fc.requested_mode) {
- case igc_fc_full:
- ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
- break;
- case igc_fc_rx_pause:
- ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
- ethtool_link_ksettings_add_link_mode(cmd, advertising,
- Asym_Pause);
- break;
- case igc_fc_tx_pause:
- ethtool_link_ksettings_add_link_mode(cmd, advertising,
- Asym_Pause);
- break;
- default:
- break;
- }
-
status = pm_runtime_suspended(&adapter->pdev->dev) ?
0 : rd32(IGC_STATUS);
@@ -1983,7 +1997,6 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
cmd->base.duplex = DUPLEX_UNKNOWN;
}
cmd->base.speed = speed;
- cmd->base.autoneg = AUTONEG_ENABLE;
/* MDI-X => 2; MDI =>1; Invalid =>0 */
if (hw->phy.media_type == igc_media_type_copper)
@@ -2000,6 +2013,41 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
return 0;
}
+/**
+ * igc_handle_autoneg_disabled - Configure forced speed/duplex settings
+ * @adapter: private driver structure
+ * @speed: requested speed (must be SPEED_10 or SPEED_100)
+ * @duplex: requested duplex
+ *
+ * Records forced speed/duplex when autoneg is disabled.
+ * Caller must validate speed before calling this function.
+ */
+static void igc_handle_autoneg_disabled(struct igc_adapter *adapter, u32 speed,
+ u8 duplex)
+{
+ struct igc_mac_info *mac = &adapter->hw.mac;
+
+ switch (speed) {
+ case SPEED_10:
+ mac->forced_speed_duplex = (duplex == DUPLEX_FULL) ?
+ IGC_FORCED_10F : IGC_FORCED_10H;
+ break;
+ case SPEED_100:
+ mac->forced_speed_duplex = (duplex == DUPLEX_FULL) ?
+ IGC_FORCED_100F : IGC_FORCED_100H;
+ break;
+ default:
+ WARN_ONCE(1, "Unsupported speed %u\n", speed);
+ return;
+ }
+
+ mac->autoneg_enabled = false;
+
+ /* Half-duplex cannot support flow control per IEEE 802.3 */
+ if (duplex != DUPLEX_FULL)
+ adapter->hw.fc.requested_mode = igc_fc_none;
+}
+
/**
* igc_handle_autoneg_enabled - Configure autonegotiation advertisement
* @adapter: private driver structure
@@ -2038,6 +2086,7 @@ static void igc_handle_autoneg_enabled(struct igc_adapter *adapter,
10baseT_Half))
advertised |= ADVERTISE_10_HALF;
+ hw->mac.autoneg_enabled = true;
hw->phy.autoneg_advertised = advertised;
if (adapter->fc_autoneg)
hw->fc.requested_mode = igc_fc_default;
@@ -2071,14 +2120,20 @@ igc_ethtool_set_link_ksettings(struct net_device *netdev,
}
}
+ if (cmd->base.autoneg == AUTONEG_DISABLE &&
+ cmd->base.speed != SPEED_10 && cmd->base.speed != SPEED_100) {
+ netdev_info(dev, "Unsupported speed for forced link\n");
+ return -EINVAL;
+ }
+
while (test_and_set_bit(__IGC_RESETTING, &adapter->state))
usleep_range(1000, 2000);
- if (cmd->base.autoneg == AUTONEG_ENABLE) {
+ if (cmd->base.autoneg == AUTONEG_ENABLE)
igc_handle_autoneg_enabled(adapter, cmd);
- } else {
- netdev_info(dev, "Force mode currently not supported\n");
- }
+ else
+ igc_handle_autoneg_disabled(adapter, cmd->base.speed,
+ cmd->base.duplex);
/* MDI-X => 2; MDI => 1; Auto => 3 */
if (cmd->base.eth_tp_mdix_ctrl) {
diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/ethernet/intel/igc/igc_hw.h
index 86ab8f566f44..62aaee55668a 100644
--- a/drivers/net/ethernet/intel/igc/igc_hw.h
+++ b/drivers/net/ethernet/intel/igc/igc_hw.h
@@ -73,6 +73,13 @@ struct igc_info {
extern const struct igc_info igc_base_info;
+enum igc_forced_speed_duplex {
+ IGC_FORCED_10H,
+ IGC_FORCED_10F,
+ IGC_FORCED_100H,
+ IGC_FORCED_100F,
+};
+
struct igc_mac_info {
struct igc_mac_operations ops;
@@ -93,6 +100,8 @@ struct igc_mac_info {
bool arc_subsystem_valid;
bool get_link_status;
+ bool autoneg_enabled;
+ enum igc_forced_speed_duplex forced_speed_duplex;
};
struct igc_nvm_operations {
diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
index 142beb9ae557..8bbb6d5581c7 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -446,6 +446,16 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
u16 speed, duplex;
s32 ret_val = 0;
+ /* When autoneg is disabled, force the MAC flow control settings
+ * to match the "fc" parameter.
+ */
+ if (!hw->mac.autoneg_enabled) {
+ ret_val = igc_force_mac_fc(hw);
+ if (ret_val)
+ hw_dbg("Error forcing flow control settings\n");
+ goto out;
+ }
+
/* In auto-neg, we need to check and see if Auto-Neg has completed,
* and if so, how the PHY and link partner has flow control
* configured.
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 72bc5128d8b8..437e1d1ef1e4 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -7298,7 +7298,7 @@ static int igc_probe(struct pci_dev *pdev,
/* Initialize link properties that are user-changeable */
adapter->fc_autoneg = true;
hw->phy.autoneg_advertised = 0xaf;
-
+ hw->mac.autoneg_enabled = true;
hw->fc.requested_mode = igc_fc_default;
hw->fc.current_mode = igc_fc_default;
diff --git a/drivers/net/ethernet/intel/igc/igc_phy.c b/drivers/net/ethernet/intel/igc/igc_phy.c
index 6c4d204aecfa..4cf737fb3b21 100644
--- a/drivers/net/ethernet/intel/igc/igc_phy.c
+++ b/drivers/net/ethernet/intel/igc/igc_phy.c
@@ -494,12 +494,20 @@ s32 igc_setup_copper_link(struct igc_hw *hw)
s32 ret_val = 0;
bool link;
- /* Setup autoneg and flow control advertisement and perform
- * autonegotiation.
- */
- ret_val = igc_copper_link_autoneg(hw);
- if (ret_val)
- goto out;
+ if (hw->mac.autoneg_enabled) {
+ /* Setup autoneg and flow control advertisement and perform
+ * autonegotiation.
+ */
+ ret_val = igc_copper_link_autoneg(hw);
+ if (ret_val)
+ goto out;
+ } else {
+ ret_val = hw->phy.ops.force_speed_duplex(hw);
+ if (ret_val) {
+ hw_dbg("Error Forcing Speed/Duplex\n");
+ goto out;
+ }
+ }
/* Check link status. Wait up to 100 microseconds for link to become
* valid.
@@ -778,3 +786,48 @@ u16 igc_read_phy_fw_version(struct igc_hw *hw)
return gphy_version;
}
+
+/**
+ * igc_force_speed_duplex - Force PHY speed and duplex settings
+ * @hw: pointer to the HW structure
+ *
+ * Programs the GPY PHY control register to disable autonegotiation
+ * and force the speed/duplex indicated by hw->mac.forced_speed_duplex.
+ */
+s32 igc_force_speed_duplex(struct igc_hw *hw)
+{
+ struct igc_phy_info *phy = &hw->phy;
+ u16 phy_ctrl;
+ s32 ret_val;
+
+ ret_val = phy->ops.read_reg(hw, PHY_CONTROL, &phy_ctrl);
+ if (ret_val)
+ return ret_val;
+
+ phy_ctrl &= ~(MII_CR_SPEED_MASK | MII_CR_DUPLEX_EN |
+ MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);
+
+ switch (hw->mac.forced_speed_duplex) {
+ case IGC_FORCED_10H:
+ phy_ctrl |= MII_CR_SPEED_10;
+ break;
+ case IGC_FORCED_10F:
+ phy_ctrl |= MII_CR_SPEED_10 | MII_CR_DUPLEX_EN;
+ break;
+ case IGC_FORCED_100H:
+ phy_ctrl |= MII_CR_SPEED_100;
+ break;
+ case IGC_FORCED_100F:
+ phy_ctrl |= MII_CR_SPEED_100 | MII_CR_DUPLEX_EN;
+ break;
+ default:
+ return -IGC_ERR_CONFIG;
+ }
+
+ ret_val = phy->ops.write_reg(hw, PHY_CONTROL, phy_ctrl);
+ if (ret_val)
+ return ret_val;
+
+ hw->mac.get_link_status = true;
+ return 0;
+}
diff --git a/drivers/net/ethernet/intel/igc/igc_phy.h b/drivers/net/ethernet/intel/igc/igc_phy.h
index 832a7e359f18..d37a89174826 100644
--- a/drivers/net/ethernet/intel/igc/igc_phy.h
+++ b/drivers/net/ethernet/intel/igc/igc_phy.h
@@ -18,5 +18,6 @@ void igc_power_down_phy_copper(struct igc_hw *hw);
s32 igc_write_phy_reg_gpy(struct igc_hw *hw, u32 offset, u16 data);
s32 igc_read_phy_reg_gpy(struct igc_hw *hw, u32 offset, u16 *data);
u16 igc_read_phy_fw_version(struct igc_hw *hw);
+s32 igc_force_speed_duplex(struct igc_hw *hw);
#endif
--
2.43.0
* [PATCH net v5 0/2] net/sched: taprio: fix NULL pointer dereference in class dump
From: Weiming Shi @ 2026-04-22 16:19 UTC
To: vinicius.gomes, jhs, jiri
Cc: davem, edumazet, kuba, pabeni, horms, vladimir.oltean, shuah,
xmei5, netdev, linux-kselftest, Weiming Shi
Fix a NULL pointer dereference in taprio_dump_class() reachable by an
unprivileged local user on kernels with unprivileged user namespaces
enabled and CONFIG_NET_SCH_TAPRIO=y. The bug allows a local DoS via a
crafted sequence of taprio child-qdisc graft, delete, and class dump.
Patch 1/2 is the fix: store the global &noop_qdisc singleton instead of
NULL in q->qdiscs[], so that control-plane dump paths can never
dereference a NULL child qdisc, and convert the existing NULL guards in
the data-plane enqueue/dequeue paths into &noop_qdisc checks.
Patch 2/2 is a tdc regression test that drives the graft + delete +
class-dump sequence on a multi-queue netdevsim device. It panics the
vulnerable kernel and passes on the fixed one.
v5: only call qdisc_put(*old) when *old is non-NULL and not
&noop_qdisc (Paolo).
v4: https://lore.kernel.org/netdev/20260416185501.647884-3-bestswngs@gmail.com/
add selftests/tc-testing regression test (patch 2/2) (Jamal).
add Assisted-by tag.
v3: https://lore.kernel.org/netdev/20260414104311.74115-2-bestswngs@gmail.com/
fix broken patch
v2: https://lore.kernel.org/netdev/20260410153902.955227-2-bestswngs@gmail.com/
also update NULL guards in taprio_enqueue() and
taprio_dequeue_from_txq() to avoid qlen/backlog inflation (Paolo).
v1: https://lore.kernel.org/netdev/20260330102904.2677818-5-bestswngs@gmail.com/
Weiming Shi (2):
net/sched: taprio: fix NULL pointer dereference in class dump
selftests/tc-testing: add taprio test for class dump after child
delete
net/sched/sch_taprio.c | 13 ++++++----
.../tc-testing/tc-tests/qdiscs/taprio.json | 26 +++++++++++++++++++
2 files changed, 34 insertions(+), 5 deletions(-)
--
2.43.0
* [PATCH net v5 1/2] net/sched: taprio: fix NULL pointer dereference in class dump
From: Weiming Shi @ 2026-04-22 16:19 UTC
To: vinicius.gomes, jhs, jiri
Cc: davem, edumazet, kuba, pabeni, horms, vladimir.oltean, shuah,
xmei5, netdev, linux-kselftest, Weiming Shi
In-Reply-To: <20260422161958.2517539-2-bestswngs@gmail.com>
When a taprio child qdisc is deleted via RTM_DELQDISC, taprio_graft()
is called with new == NULL and stores NULL into q->qdiscs[cl - 1].
Subsequent RTM_GETTCLASS dump operations walk all classes via
taprio_walk() and call taprio_dump_class(), which gets the NULL pointer
back from taprio_leaf() and then dereferences it to read child->handle,
causing a kernel NULL pointer dereference.
The bug is reachable with namespace-scoped CAP_NET_ADMIN on any kernel
with CONFIG_NET_SCH_TAPRIO enabled. On systems with unprivileged user
namespaces enabled, an unprivileged local user can trigger a kernel
panic by creating a taprio qdisc inside a new network namespace,
grafting an explicit child qdisc, deleting it, and requesting a class
dump. The RTM_GETTCLASS dump itself requires no capability.
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000007: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
RIP: 0010:taprio_dump_class (net/sched/sch_taprio.c:2478)
Call Trace:
<TASK>
tc_fill_tclass (net/sched/sch_api.c:1966)
qdisc_class_dump (net/sched/sch_api.c:2326)
taprio_walk (net/sched/sch_taprio.c:2514)
tc_dump_tclass_qdisc (net/sched/sch_api.c:2352)
tc_dump_tclass_root (net/sched/sch_api.c:2370)
tc_dump_tclass (net/sched/sch_api.c:2431)
rtnl_dumpit (net/core/rtnetlink.c:6864)
netlink_dump (net/netlink/af_netlink.c:2325)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6959)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
</TASK>
Fix this by substituting &noop_qdisc when new is NULL in
taprio_graft(), a common pattern used by other qdiscs (e.g.,
multiq_graft()) to ensure the q->qdiscs[] slots are never NULL.
This makes control-plane dump paths safe without requiring individual
NULL checks.
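Substituting a shared sentinel for NULL is an instance of the null-object pattern. Every slot always points at a valid object, so read-only paths such as the class dump can dereference it unconditionally, and paths that must behave differently compare against the sentinel's address. The minimal userspace sketch below illustrates the idea. struct child, noop_child, graft() and dump_handle() are hypothetical stand-ins for struct Qdisc, noop_qdisc, taprio_graft() and taprio_dump_class(). For brevity, every slot starts at the sentinel here, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 8

struct child {
	uint32_t handle;
};

/* Shared, statically allocated sentinel; never freed. */
static struct child noop_child = { .handle = 0 };

struct parent {
	struct child *slots[NUM_SLOTS];
};

static void parent_init(struct parent *p)
{
	for (size_t i = 0; i < NUM_SLOTS; i++)
		p->slots[i] = &noop_child;
}

/* Graft a child and return the previous one; NULL becomes the sentinel. */
static struct child *graft(struct parent *p, size_t slot, struct child *new)
{
	struct child *old;

	assert(slot < NUM_SLOTS);
	old = p->slots[slot];
	if (!new)
		new = &noop_child;
	p->slots[slot] = new;
	return old;
}

/* Dump path: no NULL check needed because slots are never NULL. */
static uint32_t dump_handle(const struct parent *p, size_t slot)
{
	assert(slot < NUM_SLOTS);
	return p->slots[slot]->handle;
}
```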
Since the data-plane paths (taprio_enqueue and taprio_dequeue_from_txq)
previously had explicit NULL guards that would drop/skip the packet
cleanly, update those checks to test for &noop_qdisc instead. Without
this, packets would reach taprio_enqueue_one() which increments the root
qdisc's qlen and backlog before calling the child's enqueue; noop_qdisc
drops the packet but those counters are never rolled back, permanently
inflating the root qdisc's statistics.
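The counter problem comes down to ordering. The parent accounts for the packet before calling the child, so a child that silently drops the packet leaves the parent's counters permanently too high. The hedged sketch below uses hypothetical types: struct root_stats and struct child stand in for the root qdisc's qlen/backlog and its child, and noop_enqueue() models noop_qdisc dropping everything. It shows that checking for the sentinel before the counters are touched keeps them consistent.

```c
#include <assert.h>
#include <stdbool.h>

struct root_stats {
	unsigned int qlen;
	unsigned int backlog;
};

struct child {
	/* Returns true if the packet was queued, false if it was dropped. */
	bool (*enqueue)(unsigned int len);
};

static bool noop_enqueue(unsigned int len) { (void)len; return false; }
static bool fifo_enqueue(unsigned int len) { (void)len; return true; }

static struct child noop_child = { .enqueue = noop_enqueue };

/* Parent enqueue: counters are bumped before the child is called. */
static bool parent_enqueue(struct root_stats *st, struct child *c,
			   unsigned int len, bool check_sentinel)
{
	if (check_sentinel && c == &noop_child)
		return false;        /* drop before touching the counters */

	st->qlen++;
	st->backlog += len;
	return c->enqueue(len);  /* a dropping child leaves them inflated */
}
```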
After this change, *old can be a valid qdisc, NULL, or &noop_qdisc.
Only call qdisc_put(*old) in the first case, to avoid decrementing
noop_qdisc's refcount, which was never incremented.
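The release rule for the displaced slot value reduces to a small predicate: only a real child owns a reference that the graft path may drop. The userspace sketch below uses hypothetical names and models the sentinel as a static object whose refcount must never change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct child {
	int refcnt;
};

/* Sentinel: its reference count must never be touched. */
static struct child noop_child = { .refcnt = 1 };

/* Drop one reference; returns true when the last one went away. */
static bool child_put(struct child *c)
{
	return --c->refcnt == 0;
}

/*
 * Release the previous slot occupant after a graft. old may be a real
 * child, NULL, or the sentinel; only a real child holds a reference
 * that this code owns. Returns true if the child should be freed.
 */
static bool release_old(struct child *old)
{
	if (!old || old == &noop_child)
		return false;
	return child_put(old);
}
```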
Fixes: 665338b2a7a0 ("net/sched: taprio: dump class stats for the actual q->qdiscs[]")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
net/sched/sch_taprio.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 8e37528119506..a7daf34593e07 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -634,7 +634,7 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
queue = skb_get_queue_mapping(skb);
child = q->qdiscs[queue];
- if (unlikely(!child))
+ if (unlikely(child == &noop_qdisc))
return qdisc_drop(skb, sch, to_free);
if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) {
@@ -717,7 +717,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
int len;
u8 tc;
- if (unlikely(!child))
+ if (unlikely(child == &noop_qdisc))
return NULL;
if (TXTIME_ASSIST_IS_ENABLED(q->flags))
@@ -2183,6 +2183,9 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
if (!dev_queue)
return -EINVAL;
+ if (!new)
+ new = &noop_qdisc;
+
if (dev->flags & IFF_UP)
dev_deactivate(dev, false);
@@ -2196,14 +2199,14 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
*old = q->qdiscs[cl - 1];
if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
- if (new)
+ if (new != &noop_qdisc)
qdisc_refcount_inc(new);
- if (*old)
+ if (*old && *old != &noop_qdisc)
qdisc_put(*old);
}
q->qdiscs[cl - 1] = new;
- if (new)
+ if (new != &noop_qdisc)
new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
if (dev->flags & IFF_UP)
--
2.43.0
* [PATCH net v5 2/2] selftests/tc-testing: add taprio test for class dump after child delete
From: Weiming Shi @ 2026-04-22 16:19 UTC
To: vinicius.gomes, jhs, jiri
Cc: davem, edumazet, kuba, pabeni, horms, vladimir.oltean, shuah,
xmei5, netdev, linux-kselftest, Weiming Shi
In-Reply-To: <20260422161958.2517539-2-bestswngs@gmail.com>
Add a regression test for the NULL pointer dereference fixed in the
previous commit. Before the fix, taprio_graft() stored NULL into
q->qdiscs[cl - 1] when an explicitly grafted child qdisc was deleted
via RTM_DELQDISC; the next RTM_GETTCLASS dump then crashed the kernel
in taprio_dump_class() while reading child->handle.
The test installs a taprio root qdisc on a multi-queue netdevsim
device, grafts a pfifo child onto class 8001:1, deletes that child,
and then performs a class dump. On a fixed kernel the dump succeeds
and all eight taprio classes are listed; on an unpatched kernel the
class dump crashes, which surfaces as a test failure.
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
.../tc-testing/tc-tests/qdiscs/taprio.json | 26 +++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
index 557fb074acf0c..cd19d05925e40 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
@@ -302,5 +302,31 @@
"$TC qdisc del dev $ETH root",
"echo \"1\" > /sys/bus/netdevsim/del_device"
]
+ },
+ {
+ "id": "c7e1",
+ "name": "Class dump after graft and delete of explicit child qdisc",
+ "category": [
+ "qdisc",
+ "taprio"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [
+ "echo \"1 1 8\" > /sys/bus/netdevsim/new_device",
+ "$TC qdisc replace dev $ETH handle 8001: parent root taprio num_tc 8 map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 sched-entry S ff 20000000 clockid CLOCK_TAI",
+ "$TC qdisc add dev $ETH parent 8001:1 handle 8002: pfifo",
+ "$TC qdisc del dev $ETH parent 8001:1 handle 8002:"
+ ],
+ "cmdUnderTest": "$TC class show dev $ETH",
+ "expExitCode": "0",
+ "verifyCmd": "$TC class show dev $ETH",
+ "matchPattern": "class taprio 8001:[0-9]+ root",
+ "matchCount": "8",
+ "teardown": [
+ "$TC qdisc del dev $ETH root",
+ "echo \"1\" > /sys/bus/netdevsim/del_device"
+ ]
}
]
--
2.43.0
^ permalink raw reply related
* [PATCH bpf-next v2 0/2] selftests/bpf: drop xdping tool
From: Alexis Lothoré (eBPF Foundation) @ 2026-04-22 16:20 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, David S. Miller,
Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev
Cc: ebpf, Bastien Curutchet, Thomas Petazzoni, bpf, linux-kselftest,
linux-kernel, netdev, Alexis Lothoré (eBPF Foundation),
Alan Maguire
Hello,
this is v2 of the small series dropping the xdping tool. This removal is
part of the larger effort aiming to tidy the bpf selftests directory.
This new revision updates the btf_dump test to make it use another bpf
program rather than xdping_kern, so that we can drop it as well.
---
Changes in v2:
- make btf_dump use xdp_dummy rather than xdping_kern
- and so, drop xdping_kern at the same time as the xdping tool
- collect Alan's RB
- Link to v1: https://patch.msgid.link/20260417-xdping-v1-1-9b0ce0e7adf8@bootlin.com
To: Alexei Starovoitov <ast@kernel.org>
To: Daniel Borkmann <daniel@iogearbox.net>
To: Andrii Nakryiko <andrii@kernel.org>
To: Martin KaFai Lau <martin.lau@linux.dev>
To: Eduard Zingerman <eddyz87@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Song Liu <song@kernel.org>
To: Yonghong Song <yonghong.song@linux.dev>
To: Jiri Olsa <jolsa@kernel.org>
To: Shuah Khan <shuah@kernel.org>
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
To: Jesper Dangaard Brouer <hawk@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>
To: Stanislav Fomichev <sdf@fomichev.me>
Cc: ebpf@linuxfoundation.org
Cc: Bastien Curutchet <bastien.curutchet@bootlin.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: bpf@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
---
Alexis Lothoré (eBPF Foundation) (2):
selftests/bpf: make btf_dump use xdp_dummy rather than xdping_kern
selftests/bpf: drop xdping tool
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 3 -
tools/testing/selftests/bpf/prog_tests/btf_dump.c | 4 +-
tools/testing/selftests/bpf/progs/xdping_kern.c | 183 ----------------
tools/testing/selftests/bpf/test_xdping.sh | 103 ---------
tools/testing/selftests/bpf/xdping.c | 254 ----------------------
tools/testing/selftests/bpf/xdping.h | 13 --
7 files changed, 2 insertions(+), 559 deletions(-)
---
base-commit: b7fb68124aa80db90394236a9a4a6add12f4425d
change-id: 20260417-xdping-5c2ef5a63899
Best regards,
--
Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
^ permalink raw reply
* [PATCH bpf-next v2 1/2] selftests/bpf: make btf_dump use xdp_dummy rather than xdping_kern
From: Alexis Lothoré (eBPF Foundation) @ 2026-04-22 16:20 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, David S. Miller,
Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev
Cc: ebpf, Bastien Curutchet, Thomas Petazzoni, bpf, linux-kselftest,
linux-kernel, netdev, Alexis Lothoré (eBPF Foundation)
In-Reply-To: <20260422-xdping-v2-0-c0f8ccedcf91@bootlin.com>
To prepare for the removal of the xdping tool from the BPF selftests
directory, make the btf_dump test use another BPF program for the BTF
datasec dump test. Use xdp_dummy.bpf.o, as it is already used by various
other tests.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
tools/testing/selftests/bpf/prog_tests/btf_dump.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
index f1642794f70e..9f1b50e07a29 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
@@ -1027,8 +1027,8 @@ static void test_btf_dump_datasec_data(char *str)
char license[4] = "GPL";
struct btf_dump *d;
- btf = btf__parse("xdping_kern.bpf.o", NULL);
- if (!ASSERT_OK_PTR(btf, "xdping_kern.bpf.o BTF not found"))
+ btf = btf__parse("xdp_dummy.bpf.o", NULL);
+ if (!ASSERT_OK_PTR(btf, "xdp_dummy.bpf.o BTF not found"))
return;
d = btf_dump__new(btf, btf_dump_snprintf, str, NULL);
--
2.53.0
^ permalink raw reply related
* [PATCH bpf-next v2 2/2] selftests/bpf: drop xdping tool
From: Alexis Lothoré (eBPF Foundation) @ 2026-04-22 16:20 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, David S. Miller,
Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev
Cc: ebpf, Bastien Curutchet, Thomas Petazzoni, bpf, linux-kselftest,
linux-kernel, netdev, Alexis Lothoré (eBPF Foundation),
Alan Maguire
In-Reply-To: <20260422-xdping-v2-0-c0f8ccedcf91@bootlin.com>
As part of a larger cleanup effort in the bpf selftests directory,
tests and scripts are either being converted to the test_progs framework
(so they are executed automatically in bpf CI), or removed if not
relevant for such integration.
The test_xdping.sh script (with the associated xdping.c) acts as an RTT
measurement tool by attaching two small XDP programs to two interfaces.
Converting this test to test_progs may not make much sense:
- RTT measurement does not really fit in the scope of a functional test;
it is rather about measuring performance.
- there are other existing tests in test_progs that actively validate
XDP features like program attachment, return value processing, packet
modification, etc.
Drop test_xdping.sh, the corresponding xdping.c userspace part, the
xdping_kern.c program, and the shared header, xdping.h.
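The RTT bookkeeping that the removed xdping_client program performed can be summarised in plain C. This is a userspace sketch, not the BPF program. `struct rtt_state` and `rtt_on_reply()` are hypothetical names, and the BPF hash map is replaced by a plain struct. A single timestamp serves as both the receive time and the resend time, whereas the original called bpf_ktime_get_ns() twice.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors XDPING_MAX_COUNT from the removed xdping.h. */
#define MAX_SAMPLES 10

/* Hypothetical userspace analogue of struct pinginfo. */
struct rtt_state {
	uint64_t start;              /* send time of in-flight request, 0 if none */
	uint16_t count;              /* number of RTT samples wanted */
	uint64_t times[MAX_SAMPLES]; /* recorded RTTs in ns; 0 marks a free slot */
};

/*
 * Called for each echo reply seen at time now_ns. Records an RTT if a
 * request was in flight, then decides whether to bounce another request.
 * Returns 1 to send the next request, 0 once enough samples are collected.
 */
static int rtt_on_reply(struct rtt_state *s, uint64_t now_ns)
{
	assert(s->count >= 1 && s->count <= MAX_SAMPLES);

	if (s->start) {
		int i;

		/* Find the first free slot. */
		for (i = 0; i < MAX_SAMPLES; i++)
			if (s->times[i] == 0)
				break;
		if (i < MAX_SAMPLES) {
			s->times[i] = now_ns - s->start;
			s->start = 0;
			i++;
		}
		/* Stop once the requested number of samples is reached. */
		if (i == s->count || i == MAX_SAMPLES)
			return 0;
	}

	/* Turn the reply into the next request and timestamp it. */
	s->start = now_ns;
	return 1;
}
```

The first reply only arms the timer, because no request is in flight yet. Each later reply yields one sample, until `count` samples are stored.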
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 3 -
tools/testing/selftests/bpf/progs/xdping_kern.c | 183 -----------------
tools/testing/selftests/bpf/test_xdping.sh | 103 ----------
tools/testing/selftests/bpf/xdping.c | 254 ------------------------
tools/testing/selftests/bpf/xdping.h | 13 --
6 files changed, 557 deletions(-)
diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index bfdc5518ecc8..986a6389186b 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -21,7 +21,6 @@ test_lirc_mode2_user
flow_dissector_load
test_tcpnotify_user
test_libbpf
-xdping
test_cpp
*.d
*.subskel.h
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 78e60040811e..00a986a7d088 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -111,7 +111,6 @@ TEST_FILES = xsk_prereqs.sh $(wildcard progs/btf_dump_test_case_*.c)
# Order correspond to 'make run_tests' order
TEST_PROGS := test_kmod.sh \
test_lirc_mode2.sh \
- test_xdping.sh \
test_bpftool_build.sh \
test_doc_build.sh \
test_xsk.sh \
@@ -134,7 +133,6 @@ TEST_GEN_PROGS_EXTENDED = \
xdp_features \
xdp_hw_metadata \
xdp_synproxy \
- xdping \
xskxceiver
TEST_GEN_FILES += $(TEST_KMODS) liburandom_read.so urandom_read sign-file uprobe_multi
@@ -320,7 +318,6 @@ $(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELP
$(OUTPUT)/test_sock_fields: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_tag: $(TESTING_HELPERS)
$(OUTPUT)/test_lirc_mode2_user: $(TESTING_HELPERS)
-$(OUTPUT)/xdping: $(TESTING_HELPERS)
$(OUTPUT)/flow_dissector_load: $(TESTING_HELPERS)
$(OUTPUT)/test_maps: $(TESTING_HELPERS)
$(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS) $(UNPRIV_HELPERS)
diff --git a/tools/testing/selftests/bpf/progs/xdping_kern.c b/tools/testing/selftests/bpf/progs/xdping_kern.c
deleted file mode 100644
index 44e2b0ef23ae..000000000000
--- a/tools/testing/selftests/bpf/progs/xdping_kern.c
+++ /dev/null
@@ -1,183 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved. */
-
-#define KBUILD_MODNAME "foo"
-#include <stddef.h>
-#include <string.h>
-#include <linux/bpf.h>
-#include <linux/icmp.h>
-#include <linux/in.h>
-#include <linux/if_ether.h>
-#include <linux/if_packet.h>
-#include <linux/if_vlan.h>
-#include <linux/ip.h>
-
-#include <bpf/bpf_helpers.h>
-#include <bpf/bpf_endian.h>
-
-#include "bpf_compiler.h"
-#include "xdping.h"
-
-struct {
- __uint(type, BPF_MAP_TYPE_HASH);
- __uint(max_entries, 256);
- __type(key, __u32);
- __type(value, struct pinginfo);
-} ping_map SEC(".maps");
-
-static __always_inline void swap_src_dst_mac(void *data)
-{
- unsigned short *p = data;
- unsigned short dst[3];
-
- dst[0] = p[0];
- dst[1] = p[1];
- dst[2] = p[2];
- p[0] = p[3];
- p[1] = p[4];
- p[2] = p[5];
- p[3] = dst[0];
- p[4] = dst[1];
- p[5] = dst[2];
-}
-
-static __always_inline __u16 csum_fold_helper(__wsum sum)
-{
- sum = (sum & 0xffff) + (sum >> 16);
- return ~((sum & 0xffff) + (sum >> 16));
-}
-
-static __always_inline __u16 ipv4_csum(void *data_start, int data_size)
-{
- __wsum sum;
-
- sum = bpf_csum_diff(0, 0, data_start, data_size, 0);
- return csum_fold_helper(sum);
-}
-
-#define ICMP_ECHO_LEN 64
-
-static __always_inline int icmp_check(struct xdp_md *ctx, int type)
-{
- void *data_end = (void *)(long)ctx->data_end;
- void *data = (void *)(long)ctx->data;
- struct ethhdr *eth = data;
- struct icmphdr *icmph;
- struct iphdr *iph;
-
- if (data + sizeof(*eth) + sizeof(*iph) + ICMP_ECHO_LEN > data_end)
- return XDP_PASS;
-
- if (eth->h_proto != bpf_htons(ETH_P_IP))
- return XDP_PASS;
-
- iph = data + sizeof(*eth);
-
- if (iph->protocol != IPPROTO_ICMP)
- return XDP_PASS;
-
- if (bpf_ntohs(iph->tot_len) - sizeof(*iph) != ICMP_ECHO_LEN)
- return XDP_PASS;
-
- icmph = data + sizeof(*eth) + sizeof(*iph);
-
- if (icmph->type != type)
- return XDP_PASS;
-
- return XDP_TX;
-}
-
-SEC("xdp")
-int xdping_client(struct xdp_md *ctx)
-{
- void *data = (void *)(long)ctx->data;
- struct pinginfo *pinginfo = NULL;
- struct ethhdr *eth = data;
- struct icmphdr *icmph;
- struct iphdr *iph;
- __u64 recvtime;
- __be32 raddr;
- __be16 seq;
- int ret;
- __u8 i;
-
- ret = icmp_check(ctx, ICMP_ECHOREPLY);
-
- if (ret != XDP_TX)
- return ret;
-
- iph = data + sizeof(*eth);
- icmph = data + sizeof(*eth) + sizeof(*iph);
- raddr = iph->saddr;
-
- /* Record time reply received. */
- recvtime = bpf_ktime_get_ns();
- pinginfo = bpf_map_lookup_elem(&ping_map, &raddr);
- if (!pinginfo || pinginfo->seq != icmph->un.echo.sequence)
- return XDP_PASS;
-
- if (pinginfo->start) {
- __pragma_loop_unroll_full
- for (i = 0; i < XDPING_MAX_COUNT; i++) {
- if (pinginfo->times[i] == 0)
- break;
- }
- /* verifier is fussy here... */
- if (i < XDPING_MAX_COUNT) {
- pinginfo->times[i] = recvtime -
- pinginfo->start;
- pinginfo->start = 0;
- i++;
- }
- /* No more space for values? */
- if (i == pinginfo->count || i == XDPING_MAX_COUNT)
- return XDP_PASS;
- }
-
- /* Now convert reply back into echo request. */
- swap_src_dst_mac(data);
- iph->saddr = iph->daddr;
- iph->daddr = raddr;
- icmph->type = ICMP_ECHO;
- seq = bpf_htons(bpf_ntohs(icmph->un.echo.sequence) + 1);
- icmph->un.echo.sequence = seq;
- icmph->checksum = 0;
- icmph->checksum = ipv4_csum(icmph, ICMP_ECHO_LEN);
-
- pinginfo->seq = seq;
- pinginfo->start = bpf_ktime_get_ns();
-
- return XDP_TX;
-}
-
-SEC("xdp")
-int xdping_server(struct xdp_md *ctx)
-{
- void *data = (void *)(long)ctx->data;
- struct ethhdr *eth = data;
- struct icmphdr *icmph;
- struct iphdr *iph;
- __be32 raddr;
- int ret;
-
- ret = icmp_check(ctx, ICMP_ECHO);
-
- if (ret != XDP_TX)
- return ret;
-
- iph = data + sizeof(*eth);
- icmph = data + sizeof(*eth) + sizeof(*iph);
- raddr = iph->saddr;
-
- /* Now convert request into echo reply. */
- swap_src_dst_mac(data);
- iph->saddr = iph->daddr;
- iph->daddr = raddr;
- icmph->type = ICMP_ECHOREPLY;
- icmph->checksum = 0;
- icmph->checksum = ipv4_csum(icmph, ICMP_ECHO_LEN);
-
- return XDP_TX;
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_xdping.sh b/tools/testing/selftests/bpf/test_xdping.sh
deleted file mode 100755
index c3d82e0a7378..000000000000
--- a/tools/testing/selftests/bpf/test_xdping.sh
+++ /dev/null
@@ -1,103 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-
-# xdping tests
-# Here we setup and teardown configuration required to run
-# xdping, exercising its options.
-#
-# Setup is similar to test_tunnel tests but without the tunnel.
-#
-# Topology:
-# ---------
-# root namespace | tc_ns0 namespace
-# |
-# ---------- | ----------
-# | veth1 | --------- | veth0 |
-# ---------- peer ----------
-#
-# Device Configuration
-# --------------------
-# Root namespace with BPF
-# Device names and addresses:
-# veth1 IP: 10.1.1.200
-# xdp added to veth1, xdpings originate from here.
-#
-# Namespace tc_ns0 with BPF
-# Device names and addresses:
-# veth0 IPv4: 10.1.1.100
-# For some tests xdping run in server mode here.
-#
-
-readonly TARGET_IP="10.1.1.100"
-readonly TARGET_NS="xdp_ns0"
-
-readonly LOCAL_IP="10.1.1.200"
-
-setup()
-{
- ip netns add $TARGET_NS
- ip link add veth0 type veth peer name veth1
- ip link set veth0 netns $TARGET_NS
- ip netns exec $TARGET_NS ip addr add ${TARGET_IP}/24 dev veth0
- ip addr add ${LOCAL_IP}/24 dev veth1
- ip netns exec $TARGET_NS ip link set veth0 up
- ip link set veth1 up
-}
-
-cleanup()
-{
- set +e
- ip netns delete $TARGET_NS 2>/dev/null
- ip link del veth1 2>/dev/null
- if [[ $server_pid -ne 0 ]]; then
- kill -TERM $server_pid
- fi
-}
-
-test()
-{
- client_args="$1"
- server_args="$2"
-
- echo "Test client args '$client_args'; server args '$server_args'"
-
- server_pid=0
- if [[ -n "$server_args" ]]; then
- ip netns exec $TARGET_NS ./xdping $server_args &
- server_pid=$!
- sleep 10
- fi
- ./xdping $client_args $TARGET_IP
-
- if [[ $server_pid -ne 0 ]]; then
- kill -TERM $server_pid
- server_pid=0
- fi
-
- echo "Test client args '$client_args'; server args '$server_args': PASS"
-}
-
-set -e
-
-server_pid=0
-
-trap cleanup EXIT
-
-setup
-
-for server_args in "" "-I veth0 -s -S" ; do
- # client in skb mode
- client_args="-I veth1 -S"
- test "$client_args" "$server_args"
-
- # client with count of 10 RTT measurements.
- client_args="-I veth1 -S -c 10"
- test "$client_args" "$server_args"
-done
-
-# Test drv mode
-test "-I veth1 -N" "-I veth0 -s -N"
-test "-I veth1 -N -c 10" "-I veth0 -s -N"
-
-echo "OK. All tests passed"
-exit 0
diff --git a/tools/testing/selftests/bpf/xdping.c b/tools/testing/selftests/bpf/xdping.c
deleted file mode 100644
index 9ed8c796645d..000000000000
--- a/tools/testing/selftests/bpf/xdping.c
+++ /dev/null
@@ -1,254 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved. */
-
-#include <linux/bpf.h>
-#include <linux/if_link.h>
-#include <arpa/inet.h>
-#include <assert.h>
-#include <errno.h>
-#include <signal.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <unistd.h>
-#include <libgen.h>
-#include <net/if.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <netdb.h>
-
-#include "bpf/bpf.h"
-#include "bpf/libbpf.h"
-
-#include "xdping.h"
-#include "testing_helpers.h"
-
-static int ifindex;
-static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
-
-static void cleanup(int sig)
-{
- bpf_xdp_detach(ifindex, xdp_flags, NULL);
- if (sig)
- exit(1);
-}
-
-static int get_stats(int fd, __u16 count, __u32 raddr)
-{
- struct pinginfo pinginfo = { 0 };
- char inaddrbuf[INET_ADDRSTRLEN];
- struct in_addr inaddr;
- __u16 i;
-
- inaddr.s_addr = raddr;
-
- printf("\nXDP RTT data:\n");
-
- if (bpf_map_lookup_elem(fd, &raddr, &pinginfo)) {
- perror("bpf_map_lookup elem");
- return 1;
- }
-
- for (i = 0; i < count; i++) {
- if (pinginfo.times[i] == 0)
- break;
-
- printf("64 bytes from %s: icmp_seq=%d ttl=64 time=%#.5f ms\n",
- inet_ntop(AF_INET, &inaddr, inaddrbuf,
- sizeof(inaddrbuf)),
- count + i + 1,
- (double)pinginfo.times[i]/1000000);
- }
-
- if (i < count) {
- fprintf(stderr, "Expected %d samples, got %d.\n", count, i);
- return 1;
- }
-
- bpf_map_delete_elem(fd, &raddr);
-
- return 0;
-}
-
-static void show_usage(const char *prog)
-{
- fprintf(stderr,
- "usage: %s [OPTS] -I interface destination\n\n"
- "OPTS:\n"
- " -c count Stop after sending count requests\n"
- " (default %d, max %d)\n"
- " -I interface interface name\n"
- " -N Run in driver mode\n"
- " -s Server mode\n"
- " -S Run in skb mode\n",
- prog, XDPING_DEFAULT_COUNT, XDPING_MAX_COUNT);
-}
-
-int main(int argc, char **argv)
-{
- __u32 mode_flags = XDP_FLAGS_DRV_MODE | XDP_FLAGS_SKB_MODE;
- struct addrinfo *a, hints = { .ai_family = AF_INET };
- __u16 count = XDPING_DEFAULT_COUNT;
- struct pinginfo pinginfo = { 0 };
- const char *optstr = "c:I:NsS";
- struct bpf_program *main_prog;
- int prog_fd = -1, map_fd = -1;
- struct sockaddr_in rin;
- struct bpf_object *obj;
- struct bpf_map *map;
- char *ifname = NULL;
- char filename[256];
- int opt, ret = 1;
- __u32 raddr = 0;
- int server = 0;
- char cmd[256];
-
- while ((opt = getopt(argc, argv, optstr)) != -1) {
- switch (opt) {
- case 'c':
- count = atoi(optarg);
- if (count < 1 || count > XDPING_MAX_COUNT) {
- fprintf(stderr,
- "min count is 1, max count is %d\n",
- XDPING_MAX_COUNT);
- return 1;
- }
- break;
- case 'I':
- ifname = optarg;
- ifindex = if_nametoindex(ifname);
- if (!ifindex) {
- fprintf(stderr, "Could not get interface %s\n",
- ifname);
- return 1;
- }
- break;
- case 'N':
- xdp_flags |= XDP_FLAGS_DRV_MODE;
- break;
- case 's':
- /* use server program */
- server = 1;
- break;
- case 'S':
- xdp_flags |= XDP_FLAGS_SKB_MODE;
- break;
- default:
- show_usage(basename(argv[0]));
- return 1;
- }
- }
-
- if (!ifname) {
- show_usage(basename(argv[0]));
- return 1;
- }
- if (!server && optind == argc) {
- show_usage(basename(argv[0]));
- return 1;
- }
-
- if ((xdp_flags & mode_flags) == mode_flags) {
- fprintf(stderr, "-N or -S can be specified, not both.\n");
- show_usage(basename(argv[0]));
- return 1;
- }
-
- if (!server) {
- /* Only supports IPv4; see hints initialization above. */
- if (getaddrinfo(argv[optind], NULL, &hints, &a) || !a) {
- fprintf(stderr, "Could not resolve %s\n", argv[optind]);
- return 1;
- }
- memcpy(&rin, a->ai_addr, sizeof(rin));
- raddr = rin.sin_addr.s_addr;
- freeaddrinfo(a);
- }
-
- /* Use libbpf 1.0 API mode */
- libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
-
- snprintf(filename, sizeof(filename), "%s_kern.bpf.o", argv[0]);
-
- if (bpf_prog_test_load(filename, BPF_PROG_TYPE_XDP, &obj, &prog_fd)) {
- fprintf(stderr, "load of %s failed\n", filename);
- return 1;
- }
-
- main_prog = bpf_object__find_program_by_name(obj,
- server ? "xdping_server" : "xdping_client");
- if (main_prog)
- prog_fd = bpf_program__fd(main_prog);
- if (!main_prog || prog_fd < 0) {
- fprintf(stderr, "could not find xdping program");
- return 1;
- }
-
- map = bpf_object__next_map(obj, NULL);
- if (map)
- map_fd = bpf_map__fd(map);
- if (!map || map_fd < 0) {
- fprintf(stderr, "Could not find ping map");
- goto done;
- }
-
- signal(SIGINT, cleanup);
- signal(SIGTERM, cleanup);
-
- printf("Setting up XDP for %s, please wait...\n", ifname);
-
- printf("XDP setup disrupts network connectivity, hit Ctrl+C to quit\n");
-
- if (bpf_xdp_attach(ifindex, prog_fd, xdp_flags, NULL) < 0) {
- fprintf(stderr, "Link set xdp fd failed for %s\n", ifname);
- goto done;
- }
-
- if (server) {
- close(prog_fd);
- close(map_fd);
- printf("Running server on %s; press Ctrl+C to exit...\n",
- ifname);
- do { } while (1);
- }
-
- /* Start xdping-ing from last regular ping reply, e.g. for a count
- * of 10 ICMP requests, we start xdping-ing using reply with seq number
- * 10. The reason the last "real" ping RTT is much higher is that
- * the ping program sees the ICMP reply associated with the last
- * XDP-generated packet, so ping doesn't get a reply until XDP is done.
- */
- pinginfo.seq = htons(count);
- pinginfo.count = count;
-
- if (bpf_map_update_elem(map_fd, &raddr, &pinginfo, BPF_ANY)) {
- fprintf(stderr, "could not communicate with BPF map: %s\n",
- strerror(errno));
- cleanup(0);
- goto done;
- }
-
- /* We need to wait for XDP setup to complete. */
- sleep(10);
-
- snprintf(cmd, sizeof(cmd), "ping -c %d -I %s %s",
- count, ifname, argv[optind]);
-
- printf("\nNormal ping RTT data\n");
- printf("[Ignore final RTT; it is distorted by XDP using the reply]\n");
-
- ret = system(cmd);
-
- if (!ret)
- ret = get_stats(map_fd, count, raddr);
-
- cleanup(0);
-
-done:
- if (prog_fd > 0)
- close(prog_fd);
- if (map_fd > 0)
- close(map_fd);
-
- return ret;
-}
diff --git a/tools/testing/selftests/bpf/xdping.h b/tools/testing/selftests/bpf/xdping.h
deleted file mode 100644
index afc578df77be..000000000000
--- a/tools/testing/selftests/bpf/xdping.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved. */
-
-#define XDPING_MAX_COUNT 10
-#define XDPING_DEFAULT_COUNT 4
-
-struct pinginfo {
- __u64 start;
- __be16 seq;
- __u16 count;
- __u32 pad;
- __u64 times[XDPING_MAX_COUNT];
-};
--
2.53.0
^ permalink raw reply related
* Re: [PATCH net v2 2/2] 8021q: delete cleared egress QoS mappings
From: Simon Horman @ 2026-04-22 16:20 UTC (permalink / raw)
To: Ren Wei
Cc: netdev, edumazet, andrew+netdev, davem, kuba, pabeni, kees,
yuantan098, ylong030, yifanwucs, tomapufckgml, bird
In-Reply-To: <ecfa6f6ce2467a42647ff4c5221238ae85b79a59.1776647968.git.yuantan098@gmail.com>
On Mon, Apr 20, 2026 at 11:18:46AM +0800, Ren Wei wrote:
> From: Longxuan Yu <ylong030@ucr.edu>
>
> vlan_dev_set_egress_priority() currently keeps cleared egress
> priority mappings in the hash as tombstones. Repeated set/clear cycles
> with distinct skb priorities therefore accumulate mapping nodes until
> device teardown and leak memory.
>
> Delete mappings when vlan_prio is cleared instead of keeping tombstones.
> Now that the egress mapping lists are RCU protected, the node can be
> unlinked safely and freed after a grace period.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@kernel.org
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
> net/8021q/vlan_dev.c | 20 ++++++++++++++------
> net/8021q/vlan_netlink.c | 4 ----
> 2 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index a5340932b657..7aa3af8b10ea 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -172,26 +172,34 @@ int vlan_dev_set_egress_priority(const struct net_device *dev,
> u32 skb_prio, u16 vlan_prio)
> {
> struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
> + struct vlan_priority_tci_mapping __rcu **mpp;
> struct vlan_priority_tci_mapping *mp;
> struct vlan_priority_tci_mapping *np;
> u32 bucket = skb_prio & 0xF;
> u32 vlan_qos = (vlan_prio << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK;
>
> /* See if a priority mapping exists.. */
> - mp = rtnl_dereference(vlan->egress_priority_map[bucket]);
> + mpp = &vlan->egress_priority_map[bucket];
> + mp = rtnl_dereference(*mpp);
> while (mp) {
> if (mp->priority == skb_prio) {
> - if (mp->vlan_qos && !vlan_qos)
> + if (!vlan_qos) {
> + rcu_assign_pointer(*mpp, rtnl_dereference(mp->next));
> vlan->nr_egress_mappings--;
> - else if (!mp->vlan_qos && vlan_qos)
> - vlan->nr_egress_mappings++;
> - WRITE_ONCE(mp->vlan_qos, vlan_qos);
> + kfree_rcu(mp, rcu);
> + } else {
> + WRITE_ONCE(mp->vlan_qos, vlan_qos);
> + }
> return 0;
> }
> - mp = rtnl_dereference(mp->next);
> + mpp = &mp->next;
> + mp = rtnl_dereference(*mpp);
> }
Hi Ren,
Thanks for splitting up the patchset; it is very helpful to me.
It seems to me that the mpp/mp construct used is a bit complex and
stems from the use of a hand-rolled list centred on the next field of
struct vlan_priority_tci_mapping.
I wonder if things can be simplified by moving to a standardised
list construct, such as an hlist, and the helpers available for it.
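The mpp/mp construct is the classic pointer-to-pointer unlink on a singly linked list. The following is a minimal userspace sketch that assumes no concurrent readers. `struct mapping`, `set_mapping()` and `bucket_len()` are hypothetical stand-ins for struct vlan_priority_tci_mapping and vlan_dev_set_egress_priority(). Plain pointer stores and free() replace rcu_assign_pointer() and kfree_rcu(), and new nodes are pushed at the bucket head. Keeping a pointer to the previous link (`pp`) is what allows a cleared mapping to be unlinked instead of left behind as a tombstone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vlan_priority_tci_mapping (no RCU). */
struct mapping {
	uint32_t priority;
	uint16_t vlan_qos;
	struct mapping *next;
};

/* Set or clear one mapping in a bucket; clearing unlinks and frees it. */
static int set_mapping(struct mapping **head, uint32_t prio, uint16_t qos)
{
	struct mapping **pp = head; /* address of the link pointing at m */
	struct mapping *m;

	assert(head != NULL);
	for (m = *pp; m; pp = &m->next, m = *pp) {
		if (m->priority != prio)
			continue;
		if (!qos) {
			*pp = m->next; /* unlink: no tombstone left behind */
			free(m);
		} else {
			m->vlan_qos = qos;
		}
		return 0;
	}

	if (!qos)
		return 0; /* clearing a mapping that does not exist */

	m = malloc(sizeof(*m));
	if (!m)
		return -1;
	m->priority = prio;
	m->vlan_qos = qos;
	m->next = *head;
	*head = m;
	return 0;
}

/* Number of nodes currently in a bucket. */
static size_t bucket_len(const struct mapping *m)
{
	size_t n = 0;

	for (; m; m = m->next)
		n++;
	return n;
}
```

An hlist, as suggested, would hide the `pp` bookkeeping behind helpers such as hlist_del_rcu().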
>
> /* Create a new mapping then. */
> + if (!vlan_qos)
> + return 0;
> +
> np = kmalloc_obj(struct vlan_priority_tci_mapping);
> if (!np)
> return -ENOBUFS;
...
^ permalink raw reply
* Re: [PATCH iproute2] ss: fix vsock port filter
From: Stephen Hemminger @ 2026-04-22 16:21 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Luigi Leonardi, stefanha, netdev, Mathieu Schroeter, David Ahern
In-Reply-To: <CAGxU2F4C4JbqBEWehBhhOehGaXZJOWLkQ09hjaaSAh8-J5W50w@mail.gmail.com>
On Wed, 22 Apr 2026 10:03:49 +0200
Stefano Garzarella <sgarzare@redhat.com> wrote:
> On Wed, 22 Apr 2026 at 01:38, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >
> > On Tue, 21 Apr 2026 14:35:12 +0200
> > Luigi Leonardi <leonardi@redhat.com> wrote:
> >
> > > parse_hostcond() uses get_u32() to parse the vsock port into the
> > > aafilter.port field, which is a long. On 64-bit systems, get_u32()
> > > only writes the lower 32 bits, leaving the upper 32 bits set from
> > > the -1 initialization. This causes the port comparison
> > > "a->port != s->rport" in run_ssfilter() to always fail, since the
> > > corrupted long value never matches the int rport.
> > >
> > > Fix by using get_long() instead, consistent with how AF_PACKET and
> > > AF_NETLINK handle the same field.
> > >
> > > Fixes: c759116a0b2b ("ss: add AF_VSOCK support")
> > > Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
> > > ---
> > > misc/ss.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/misc/ss.c b/misc/ss.c
> > > index 14e9f27a..6e3321ac 100644
> > > --- a/misc/ss.c
> > > +++ b/misc/ss.c
> > > @@ -2323,7 +2323,7 @@ void *parse_hostcond(char *addr, bool is_port)
> > > port = find_port(addr, is_port);
> > >
> > > if (port && strcmp(port, "*") &&
> > > - get_u32((__u32 *)&a.port, port, 0))
> > > + get_long(&a.port, port, 0))
> > > return NULL;
> >
> > If you use get_long() then the code could get negative values.
> > Actually, having port in ss as a signed value seems like a mistake in the original design.
> >
> > The port for a unix domain socket is the inode number.
> > Originally it was int, but got changed to long back in 6.6.
> >
> > The port in ss cache is int.
>
> Yeah, as I mentioned I think the issue was introduced by commit
> 012cb515 ("ss: change aafilter port from int to long (inode support)").
What about this, which avoids the cast but keeps the same semantics?
diff --git a/misc/ss.c b/misc/ss.c
index 14e9f27a..e830e146 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2317,14 +2317,16 @@ void *parse_hostcond(char *addr, bool is_port)
if (fam == AF_VSOCK) {
__u32 cid = ~(__u32)0;
+ __u32 vport = 0;
a.addr.family = AF_VSOCK;
port = find_port(addr, is_port);
-
- if (port && strcmp(port, "*") &&
- get_u32((__u32 *)&a.port, port, 0))
- return NULL;
+ if (port && strcmp(port, "*")) {
+ if (get_u32(&vport, port, 0))
+ return NULL;
+ }
+ a.port = vport;
if (!is_port && addr[0] && strcmp(addr, "*")) {
a.addr.bitlen = 32;
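The difference between the original cast and parsing into a local __u32 can be shown in plain C. This sketch assumes a 64-bit long (LP64). A four-byte memcpy() stands in for the `get_u32((__u32 *)&a.port, ...)` store, which also raises strict-aliasing concerns. The -1 starting value matches the initialization described in the patch. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old behaviour: a 32-bit store into the first 4 bytes of a long.
 * On a 64-bit long the other 4 bytes keep their previous contents. */
static long store_via_u32_cast(long initial, uint32_t val)
{
	long port = initial;

	assert(sizeof(port) >= sizeof(val));
	memcpy(&port, &val, sizeof(val));
	return port;
}

/* Proposed behaviour: parse into a __u32 local, then assign by value.
 * The conversion is value-preserving when long is 64 bits. */
static long store_via_local(uint32_t val)
{
	long port = val;

	return port;
}

/* Mirrors the "a->port != s->rport" comparison against an int rport. */
static int port_matches(long filter_port, int rport)
{
	return filter_port == rport;
}
```

With an initial value of -1 and a 64-bit long, the cast version keeps the upper 32 bits set, so a filter for port 5000 never matches. The local-variable version compares equal.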
^ permalink raw reply related
* Re: [PATCH net-next v2 1/3] net/ethernet: add ZTE network driver support
From: Andrew Lunn @ 2026-04-22 16:24 UTC (permalink / raw)
To: Junyang Han
Cc: netdev, davem, andrew+netdev, edumazet, kuba, pabeni, ran.ming,
han.chengfei, zhang.yanze
In-Reply-To: <20260422144901.2403456-2-han.junyang@zte.com.cn>
> +MODULE_AUTHOR("Junyang Han <han.junyang@zte.com.cn>");
> +MODULE_DESCRIPTION("ZTE Corporation network adapters (DingHai series)
> Ethernet driver");
> +err_pci_save_state:
> + pci_release_selected_regions(dev->pdev, pci_select_bars(dev->
> pdev, IORESOURCE_MEM));
The email is still getting its white space changed. Please don't post
any more versions until you have sent it to yourself and got it back
again in a good state.
https://b4.docs.kernel.org/en/latest/contributor/send.html might help.
Andrew
^ permalink raw reply
* Re: [PATCH v6 3/3] dts: s32g: Add GPR syscon region
From: Jared Kangas @ 2026-04-22 16:25 UTC (permalink / raw)
To: Dan Carpenter
Cc: Chester Lin, Matthias Brugger, Ghennadi Procopciuc,
NXP S32 Linux Team, Frank Li, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, linux-arm-kernel, imx,
devicetree, linux-kernel, linaro-s32, netdev
In-Reply-To: <aeZYQp9b5aoV7Ihv@stanley.mountain>
On Mon, Apr 20, 2026 at 07:45:54PM +0300, Dan Carpenter wrote:
> On Mon, Apr 20, 2026 at 09:04:00AM -0700, Jared Kangas wrote:
> > Fixing Dan's address based on mailmap update, sorry for the noise.
> >
> > On Fri, Apr 17, 2026 at 02:36:25PM -0700, Jared Kangas wrote:
> > > Hi Dan,
> > >
> > > [snip]
> > >
> > > I gave this a test on an S32G-VNP-RDB3 and didn't see any issues on the
> > > dwmac-s32 side, but this appears to trigger a panic when reading the new
> > > debugfs regmap/*/registers file for the syscon node:
> > >
> > > [snip]
>
> Oh, ugh... I didn't realize that this wasn't merged. I don't have a
> way to test this any more. The simplest fix would be to change the
> 0x3000 to 0x100. The GPR63 register is at 0xFC.
>
> reg = <0x4007c000 0x100>;
>
> That's probably the best fix as well. The later register areas would
> be their own syscons.
Tried that out and it looks good to me. With the write routed through
syscon:
# xxd -g4 /proc/device-tree/soc@0/syscon@4007c000/reg
00000000: 4007c000 00000100 @.......
# cat /sys/kernel/debug/regmap/dummy-syscon@0x000000004007c000/registers
00: 00000000
04: 00000002
08: 000000e7
0c: 00000001
10: ffffffff
14: 1fffffff
18: 00007fff
1c: 00000000
20: 00000000
...
f4: 00000000
f8: 00000000
fc: 00000000
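A dump that stops at offset fc is consistent with Dan's sizing. As a quick arithmetic check, assume 64 consecutive 32-bit GPRs starting at offset 0. The last register then sits at 0xFC and needs a window of 0xFC + 4 = 0x100 bytes. The macro and function names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GPRs are consecutive 32-bit registers from offset 0,
 * consistent with GPR63 sitting at 0xFC. */
#define GPR_WIDTH 4u
#define GPR_LAST  63u

/* Byte offset of GPRn within the syscon window. */
static uint32_t gpr_offset(unsigned int n)
{
	assert(n <= GPR_LAST);
	return n * GPR_WIDTH;
}

/* Smallest reg size covering GPR0 through GPR_LAST inclusive. */
static uint32_t syscon_size(void)
{
	return gpr_offset(GPR_LAST) + GPR_WIDTH;
}
```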
No more crashes and 04's value lines up with the S32_PHY_INTF_SEL_RGMII
(0x2) write, so if you're able to post a revision, feel free to add my
T-b:
Tested-by: Jared Kangas <jkangas@redhat.com>
^ permalink raw reply
* Re: [BUG] rxrpc: Client connection leak and BUG() call during kernel IO thread exit
From: Anderson Nascimento @ 2026-04-22 16:25 UTC (permalink / raw)
To: David Howells
Cc: netdev, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, linux-kernel, Jeffrey Altman,
Simon Horman
In-Reply-To: <2593154.1776874118@warthog.procyon.org.uk>
Hi David,
On Wed, Apr 22, 2026 at 1:08 PM David Howells <dhowells@redhat.com> wrote:
>
> Do you by any chance have a reproducer program for this?
>
Sorry, I mixed things up and ended up sending a different reproducer.
I do have it; I will test it now and send it here in a few minutes.
> David
>
--
Anderson Nascimento
Allele Security Intelligence
https://www.allelesecurity.com
^ permalink raw reply
* Re: [PATCH v1 1/2] vfio: add callback to get tph info for dma-buf
From: Jason Gunthorpe @ 2026-04-22 16:29 UTC (permalink / raw)
To: Alex Williamson
Cc: Zhiping Zhang, Stanislav Fomichev, Keith Busch, Leon Romanovsky,
Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
Yochai Cohen, Yishai Hadas
In-Reply-To: <20260422092327.3f629ad6@shazbot.org>
On Wed, Apr 22, 2026 at 09:23:27AM -0600, Alex Williamson wrote:
> In general though, I'm really hoping that someone interested in
> enabling TPH as an interface through vfio actually decides to take
> resource targeting and revocation seriously. There's no validation of
> the steering tag here relative to what the user has access to and no
> mechanism to revoke those tags if access changes. In fact, there's not
> even a proposed mechanism allowing the user to derive valid steering
> tags. Does the user implicitly know the value and the kernel just
> allows it because... yolo?
This is the steering tag that remote devices will send *INTO* the VFIO
device.
IMHO it is entirely appropriate that the driver controlling the device
decide what tags are sent into it and when, so that's the VFIO
userspace.
There is no concept of access here since the entire device is captured
by VFIO.
If the VFIO device catastrophically malfunctions when receiving
certain steering tags then it is incompatible with VFIO and we should
at least block this new API.
The only requirement is that the device limit the TPH to only the
function that is perceiving them. If a device is really broken and
doesn't meet that then it should be blocked off and it is probably not
safe to be used with VMs at all.
Jason
^ permalink raw reply
* Re: [PATCH AUTOSEL 7.0-6.18] net: mana: hardening: Validate adapter_mtu from MANA_QUERY_DEV_CONFIG
From: Erni Sri Satya Vennela @ 2026-04-22 16:31 UTC (permalink / raw)
To: Sasha Levin
Cc: patches, stable, Jakub Kicinski, kys, haiyangz, wei.liu, decui,
longli, andrew+netdev, davem, edumazet, pabeni, linux-hyperv,
netdev, linux-kernel
In-Reply-To: <20260420132314.1023554-44-sashal@kernel.org>
On Mon, Apr 20, 2026 at 09:17:18AM -0400, Sasha Levin wrote:
> From: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
>
> [ Upstream commit d7709812e13d06132ddae3d21540472ea5cb11c5 ]
>
> As a part of MANA hardening for CVM, validate the adapter_mtu value
> returned from the MANA_QUERY_DEV_CONFIG HWC command.
>
> The adapter_mtu value is used to compute ndev->max_mtu via:
> gc->adapter_mtu - ETH_HLEN. If hardware returns a bogus adapter_mtu
> smaller than ETH_HLEN (e.g. 0), the unsigned subtraction wraps to a
> huge value, silently allowing oversized MTU settings.
>
> Add a validation check to reject adapter_mtu values below
> ETH_MIN_MTU + ETH_HLEN, returning -EPROTO to fail the device
> configuration early with a clear error message.
>
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> Link: https://patch.msgid.link/20260326173101.2010514-1-ernis@linux.microsoft.com
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>
> FOR backporting:
> - Fixes a concrete integer underflow bug (adapter_mtu - ETH_HLEN wraps
> to ~4GB)
> - Small, surgical fix (6 lines of logic)
> - Obviously correct bounds check
> - No regression risk
> - Accepted by netdev maintainer
> - Author is regular driver contributor
> - Affects widely-used Azure MANA driver
> - Security-relevant in CVM environments
>
> 2.53.0
Thanks Sasha, this is good for stable.
This should also carry a Fixes tag:
Fixes: 80f6215b450e ("net: mana: Add support for jumbo frame")
The code was introduced in v6.4-rc1, so the backport applies
to 6.6.y and later stable trees.
Acked-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
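The underflow described in the quoted commit message can be shown with a small user-space C sketch. It is not the MANA driver code: max_mtu_unchecked() and max_mtu_checked() are hypothetical helpers, ETH_HLEN (14) and ETH_MIN_MTU (68) are redefined locally with their usual Linux values, and 32-bit unsigned arithmetic is assumed for the result.

```c
#include <assert.h> /* assert(), for checking the helpers */
#include <errno.h>
#include <stdint.h>

/* Local copies of the usual Linux values; this is a user-space sketch. */
#define ETH_HLEN    14u
#define ETH_MIN_MTU 68u

/* Unchecked: mirrors "adapter_mtu - ETH_HLEN"; wraps modulo 2^32 when
 * adapter_mtu < ETH_HLEN. */
static uint32_t max_mtu_unchecked(uint32_t adapter_mtu)
{
	return adapter_mtu - ETH_HLEN;
}

/* Checked: reject adapter_mtu below ETH_MIN_MTU + ETH_HLEN with -EPROTO,
 * as the commit message describes; otherwise store the max MTU. */
static int max_mtu_checked(uint32_t adapter_mtu, uint32_t *max_mtu)
{
	if (adapter_mtu < ETH_MIN_MTU + ETH_HLEN)
		return -EPROTO;
	*max_mtu = adapter_mtu - ETH_HLEN;
	return 0;
}
```

With this sketch, max_mtu_unchecked(0) yields 4294967282 (about 4 GB), while max_mtu_checked() rejects any adapter_mtu below 82 with -EPROTO and maps 1514 to 1500.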
^ permalink raw reply
* Re: [PATCH net v3 3/8] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260422033650.68457-4-kerneljasonxing@gmail.com>
> From: Jason Xing <kernelxing@tencent.com>
>
> When xsk_build_skb() processes multi-buffer packets in copy mode, the
> first descriptor stores data into the skb linear area without adding
> any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
> to accumulate subsequent descriptors.
>
> If a continuation descriptor fails (e.g. alloc_page returns NULL with
> -EAGAIN), we jump to free_err where the condition:
>
> if (skb && !skb_shinfo(skb)->nr_frags)
> kfree_skb(skb);
>
> evaluates to true because nr_frags is still 0 (the first descriptor
> used the linear area, not frags). This frees the skb while xs->skb
> still points to it, creating a dangling pointer. On the next transmit
> attempt or socket close, xs->skb is dereferenced, causing a
> use-after-free or double-free.
>
> Fix this by checking !xs->skb to handle the first-frag case, ensuring
> we only free skbs that were freshly allocated in this call
> (xs->skb is NULL) and never free an in-progress multi-buffer skb that
> the caller still references.
>
> Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
> Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 6521604f8d42..d23d1b14b8b4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -889,7 +889,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> return skb;
>
> free_err:
> - if (skb && !skb_shinfo(skb)->nr_frags)
> + if (skb && !xs->skb)
> kfree_skb(skb);
>
> if (err == -EOVERFLOW) {
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
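The difference between the old and new free_err conditions in the patch above can be sketched as two predicates. This is a simplified user-space model, not net/xdp/xsk.c: struct fake_skb and struct fake_xsk are hypothetical stand-ins that keep only the fields the condition reads.

```c
#include <assert.h> /* assert(), for checking the predicates */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the fields the free_err condition reads. */
struct fake_skb {
	int nr_frags;         /* models skb_shinfo(skb)->nr_frags */
};

struct fake_xsk {
	struct fake_skb *skb; /* in-progress multi-buffer skb, or NULL */
};

/* Old condition: "skb && !skb_shinfo(skb)->nr_frags". */
static bool old_should_free(const struct fake_xsk *xs,
			    const struct fake_skb *skb)
{
	(void)xs; /* the old check never looks at xs->skb */
	return skb && !skb->nr_frags;
}

/* New condition: "skb && !xs->skb", i.e. free only an skb allocated in
 * this call, never one the socket still references. */
static bool new_should_free(const struct fake_xsk *xs,
			    const struct fake_skb *skb)
{
	return skb && !xs->skb;
}
```

In the failing scenario (a continuation descriptor, so skb == xs->skb with nr_frags still 0), old_should_free() returns true and would free an skb that xs->skb still points to, while new_should_free() returns false.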
^ permalink raw reply
* Re: [PATCH net v3 4/8] xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260422033650.68457-5-kerneljasonxing@gmail.com>
> From: Jason Xing <kernelxing@tencent.com>
>
> Once xsk_skb_init_misc() has been called on an skb, its destructor is
> set to xsk_destruct_skb(), which submits the descriptor address(es) to
> the completion queue and advances the CQ producer. If such an skb is
> subsequently freed via kfree_skb() along an error path - before the
> skb has ever been handed to the driver - the destructor still runs and
> submits a bogus, half-initialized address to the CQ.
>
> Postpone the init phase until the allocation of the first frag has
> completed successfully. Before this init, the skb can be safely freed
> by kfree_skb().
>
> Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
> Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index d23d1b14b8b4..88ec6d2cbbcf 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -739,8 +739,6 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> return ERR_PTR(err);
>
> skb_reserve(skb, hr);
> -
> - xsk_skb_init_misc(skb, xs, desc->addr);
> if (desc->options & XDP_TX_METADATA) {
> err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
> if (unlikely(err))
> @@ -834,7 +832,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> if (unlikely(err))
> goto free_err;
>
> - xsk_skb_init_misc(skb, xs, desc->addr);
> if (desc->options & XDP_TX_METADATA) {
> err = xsk_skb_metadata(skb, buffer, desc,
> xs->pool, hr);
> @@ -884,6 +881,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> }
> }
>
> + if (!xs->skb)
> + xsk_skb_init_misc(skb, xs, desc->addr);
> xsk_inc_num_desc(skb);
>
> return skb;
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
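Why the destructor must not be installed before the error paths can be modelled in a few lines. In this hypothetical sketch, fake_destruct() stands in for xsk_destruct_skb() and only bumps a counter standing in for the CQ producer, and fake_kfree_skb() stands in for kfree_skb() running the destructor. None of these are kernel APIs.

```c
#include <assert.h> /* assert(), for checking the model */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these are kernel types or functions. */
struct fake_cq {
	unsigned int producer; /* stands in for the CQ producer */
};

struct fake_skb {
	struct fake_cq *cq;
	void (*destructor)(struct fake_skb *skb);
};

/* Stand-in for xsk_destruct_skb(): submits an entry to the CQ. */
static void fake_destruct(struct fake_skb *skb)
{
	skb->cq->producer++;
}

/* Stand-in for kfree_skb(): runs the destructor if one is installed. */
static void fake_kfree_skb(struct fake_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
}

/* Build an skb whose later setup step fails. init_early selects the old
 * order (destructor installed before the failing step); in the new order
 * it is only installed after every step succeeded, which this error-only
 * model never reaches. Returns the CQ entries produced by the error path. */
static unsigned int build_then_fail(bool init_early)
{
	struct fake_cq cq = { 0 };
	struct fake_skb skb = { &cq, NULL };

	if (init_early)
		skb.destructor = fake_destruct;

	/* A later step fails here, so the error path frees the skb
	 * before it is ever handed to the driver. */
	fake_kfree_skb(&skb);
	return cq.producer;
}
```

In the old order the error path produces one bogus CQ entry; with initialization postponed it produces none.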
^ permalink raw reply
* Re: [PATCH net v3 5/8] xsk: avoid skb leak in XDP_TX_METADATA case
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260422033650.68457-6-kerneljasonxing@gmail.com>
> From: Jason Xing <kernelxing@tencent.com>
>
> Fix it by explicitly adding kfree_skb() before returning back to its
> caller.
>
> How to reproduce it in virtio_net:
> 1. the current skb is the first one (which means no frag and xs->skb is
> NULL) and users enable metadata feature.
> 2. xsk_skb_metadata() returns an error code.
> 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> 4. there is no chance to free this skb anymore.
>
> Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 88ec6d2cbbcf..c49b58199d2f 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -741,8 +741,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> skb_reserve(skb, hr);
> if (desc->options & XDP_TX_METADATA) {
> err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
> - if (unlikely(err))
> + if (unlikely(err)) {
> + kfree_skb(skb);
> return ERR_PTR(err);
> + }
> }
> } else {
> struct xsk_addrs *xsk_addr;
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
^ permalink raw reply
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260422033650.68457-9-kerneljasonxing@gmail.com>
> From: Jason Xing <kernelxing@tencent.com>
>
> In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> uintptr_t cast:
>
> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>
> On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> mode the chunk offset is encoded in bits 48-63 of the descriptor
> address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> lost entirely. The completion queue then returns a truncated address to
> userspace, making buffer recycling impossible.
>
> Since we have not heard of anyone using AF_XDP on 32-bit architectures,
> we decided to stop supporting it there at compile time.
>
> Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
> Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
> Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> index 71af2febe72a..819aa5795f50 100644
> --- a/net/xdp/Kconfig
> +++ b/net/xdp/Kconfig
> @@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> config XDP_SOCKETS
> bool "XDP sockets"
> - depends on BPF_SYSCALL
> + depends on BPF_SYSCALL && 64BIT
> default n
> help
> XDP sockets allows a channel between XDP programs and
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
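The truncation can be simulated on any host by forcing the stored value through a 32-bit integer, which is what the (uintptr_t) cast does on a 32-bit kernel. This is a sketch under that assumption: roundtrip_32bit() is a hypothetical helper, and XSK_UNALIGNED_BUF_OFFSET_SHIFT is redefined locally with the value 48 given in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in the commit message; redefined locally for the sketch. */
#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48

/* Store addr the way a 32-bit kernel would (uintptr_t is 32 bits wide),
 * tag bit 0 as the quoted code does, then strip the tag again. */
static uint64_t roundtrip_32bit(uint64_t addr)
{
	uint32_t slot;

	assert((addr & 0x1) == 0);    /* sketch assumes bit 0 is free for the tag */
	slot = (uint32_t)addr | 0x1u; /* the upper 32 bits are lost here */
	return (uint64_t)(slot & ~0x1u);
}
```

An unaligned-mode address such as (0x40 << 48) | 0x1000 comes back as 0x1000, so the offset in bits 48-63 is gone, while an address that fits in 32 bits survives unchanged.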
^ permalink raw reply
* Re: [PATCH net v3 6/8] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260422033650.68457-7-kerneljasonxing@gmail.com>
> From: Jason Xing <kernelxing@tencent.com>
>
> Fix it by explicitly adding kfree_skb() before returning back to its
> caller.
>
> How to reproduce it in virtio_net:
> 1. the current skb is the first one (which means xs->skb is NULL) and
> hits the limit MAX_SKB_FRAGS.
> 2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
> 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
> is why the bug can be triggered.
> 4. there is no chance to free this skb anymore.
>
> Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
> call xsk_drop_skb(xs->skb) to do the right thing.
>
> Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index c49b58199d2f..5e6326e076ab 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -776,8 +776,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> addr = buffer - pool->addrs;
>
> for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
> - if (unlikely(i >= MAX_SKB_FRAGS))
> + if (unlikely(i >= MAX_SKB_FRAGS)) {
> + if (!xs->skb)
> + kfree_skb(skb);
> return ERR_PTR(-EOVERFLOW);
> + }
>
> page = pool->umem->pgs[addr >> PAGE_SHIFT];
> get_page(page);
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
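The ownership rule in this fix (free the skb only when this call allocated it) can be checked with a leak counter. This is a simplified, hypothetical model: fake_alloc(), fake_free() and build() are not kernel functions, FAKE_MAX_FRAGS is a placeholder for MAX_SKB_FRAGS, and the overflow check is done once up front rather than per page as in xsk_build_skb_zerocopy().

```c
#include <assert.h> /* assert(), for checking the leak counter */
#include <errno.h>
#include <stdlib.h>

#define FAKE_MAX_FRAGS 17 /* placeholder for MAX_SKB_FRAGS */

struct fake_skb { int nr_frags; };
struct fake_xsk { struct fake_skb *skb; }; /* in-progress skb or NULL */

static int live_skbs; /* fake skbs currently allocated */

static struct fake_skb *fake_alloc(void)
{
	struct fake_skb *skb = calloc(1, sizeof(*skb));

	if (skb)
		live_skbs++;
	return skb;
}

static void fake_free(struct fake_skb *skb)
{
	free(skb);
	live_skbs--;
}

/* Append nfrags frags. On overflow, free the skb only if this call
 * allocated it; an in-progress xs->skb is left to the caller (the real
 * code drops it via xsk_drop_skb()). Returns NULL and sets *err on error. */
static struct fake_skb *build(struct fake_xsk *xs, int nfrags, int *err)
{
	struct fake_skb *skb = xs->skb ? xs->skb : fake_alloc();

	if (!skb) {
		*err = -ENOMEM;
		return NULL;
	}
	if (skb->nr_frags + nfrags > FAKE_MAX_FRAGS) {
		if (!xs->skb)
			fake_free(skb); /* without this, the new skb leaks */
		*err = -EOVERFLOW;
		return NULL;
	}
	skb->nr_frags += nfrags;
	*err = 0;
	return skb;
}
```

An overflow on a fresh skb leaves live_skbs at 0, while an overflow on an in-progress skb leaves it allocated for the caller to drop.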
^ permalink raw reply
* Re: [BUG] rxrpc: Client connection leak and BUG() call during kernel IO thread exit
From: Anderson Nascimento @ 2026-04-22 16:37 UTC (permalink / raw)
To: David Howells
Cc: netdev, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, linux-kernel, Jeffrey Altman,
Simon Horman
In-Reply-To: <CAPhRvkyCQQM1cv+zQgZjbQDhQ2YVxJ26PG9UrkA6B7uTCguhPQ@mail.gmail.com>
On Wed, Apr 22, 2026 at 1:25 PM Anderson Nascimento
<anderson@allelesecurity.com> wrote:
>
> Hi David,
>
> On Wed, Apr 22, 2026 at 1:08 PM David Howells <dhowells@redhat.com> wrote:
> >
> > Do you by any chance have a reproducer program for this?
> >
>
> Sorry, I mixed things up and ended up sending a different reproducer.
> I do have it; I will test it now and send it here in a few minutes.
You can find the reproducers below. I run them simultaneously in a
bash while loop in two different SSH shells. I can trigger it running
only one server and one client, but when I discovered the bug I was
running 2 servers and 2 clients simultaneously. You can try both; on my
machine it doesn't take long to trigger. My virtual machine is
configured with 4 cores.
while true; do ./server; done
while true; do ./client; done
server.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/rxrpc.h>
#include <pthread.h>
#define RXRPC_ADD_CHARGE_ACCEPT(control, ctrllen) \
do { \
struct cmsghdr *__cmsg; \
__cmsg = (void *)(control) + (ctrllen); \
__cmsg->cmsg_len = CMSG_LEN(0); \
__cmsg->cmsg_level = SOL_RXRPC; \
__cmsg->cmsg_type = RXRPC_CHARGE_ACCEPT; \
(ctrllen) += __cmsg->cmsg_len; \
\
} while (0)
int sk;
void *__close(void *a){
close(sk);
return NULL;
}
int main(void){
struct sockaddr_rxrpc sockaddr_rxrpc_server;
struct msghdr msg;
struct iovec iov;
char buffer_msg_control[4096];
size_t control_len = 0;
pthread_t th[2];
memset(&sockaddr_rxrpc_server,'\0',sizeof(sockaddr_rxrpc_server));
memset(&buffer_msg_control,'\0',sizeof(buffer_msg_control));
memset(&msg,'\0',sizeof(msg));
memset(&iov,'\0',sizeof(iov));
sk = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
sockaddr_rxrpc_server.srx_family = AF_RXRPC;
sockaddr_rxrpc_server.srx_service = 1234;
sockaddr_rxrpc_server.transport_type = SOCK_DGRAM;
sockaddr_rxrpc_server.transport_len =
sizeof(sockaddr_rxrpc_server.transport.sin);
sockaddr_rxrpc_server.transport.family = AF_INET;
sockaddr_rxrpc_server.transport.sin.sin_port = htons(7000);
bind(sk, (struct sockaddr *)&sockaddr_rxrpc_server,
sizeof(sockaddr_rxrpc_server));
listen(sk,10);
RXRPC_ADD_CHARGE_ACCEPT(buffer_msg_control, control_len);
msg.msg_control = buffer_msg_control;
msg.msg_controllen = control_len;
sendmsg(sk, &msg, 0);
pthread_create(&th[0],NULL,&__close,NULL);
pthread_join(th[0],NULL);
return 0;
}
client.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/rxrpc.h>
static const unsigned char local_addr[4] = { 127, 0, 0, 1 };
#define RXRPC_ADD_CALLID(control, ctrllen, id) \
do { \
struct cmsghdr *__cmsg; \
__cmsg = (void *)(control) + (ctrllen); \
__cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned long)); \
__cmsg->cmsg_level = SOL_RXRPC; \
__cmsg->cmsg_type = RXRPC_USER_CALL_ID; \
*(unsigned long *)CMSG_DATA(__cmsg) = (id); \
(ctrllen) += __cmsg->cmsg_len; \
\
} while (0)
int main(void){
struct msghdr msg;
struct sockaddr_rxrpc sockaddr_rxrpc_local;
struct sockaddr_rxrpc sockaddr_rxrpc_client;
char buffer_msg_control[4096];
size_t control_len;
int sk;
memset(&sockaddr_rxrpc_local,'\0',sizeof(sockaddr_rxrpc_local));
memset(&sockaddr_rxrpc_client,'\0',sizeof(sockaddr_rxrpc_client));
memset(&buffer_msg_control,'\0',sizeof(buffer_msg_control));
memset(&msg,'\0',sizeof(msg));
sk = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
sockaddr_rxrpc_local.srx_family = AF_RXRPC;
sockaddr_rxrpc_local.srx_service = 0;
sockaddr_rxrpc_local.transport_type = SOCK_DGRAM;
sockaddr_rxrpc_local.transport_len = sizeof(sockaddr_rxrpc_local.transport.sin);
sockaddr_rxrpc_local.transport.family = AF_INET;
sockaddr_rxrpc_local.transport.sin.sin_port = htons(7001);
memcpy(&sockaddr_rxrpc_local.transport.sin.sin_addr, &local_addr, 4);
bind(sk, (struct sockaddr *)&sockaddr_rxrpc_local,
sizeof(sockaddr_rxrpc_local));
sockaddr_rxrpc_client.srx_family = AF_RXRPC;
sockaddr_rxrpc_client.srx_service = 1234;
sockaddr_rxrpc_client.transport_type = SOCK_DGRAM;
sockaddr_rxrpc_client.transport_len =
sizeof(sockaddr_rxrpc_client.transport.sin);
sockaddr_rxrpc_client.transport.family = AF_INET;
sockaddr_rxrpc_client.transport.sin.sin_port = htons(7000);
memcpy(&sockaddr_rxrpc_client.transport.sin.sin_addr, &local_addr, 4);
connect(sk, (struct sockaddr *)&sockaddr_rxrpc_client,
sizeof(sockaddr_rxrpc_client));
control_len = 0;
RXRPC_ADD_CALLID(buffer_msg_control, control_len, 0x1234);
msg.msg_control = buffer_msg_control;
msg.msg_controllen = control_len;
sendmsg(sk, &msg, 0);
return 0;
}
Here is the report I have just triggered:
[ 473.601077] rxrpc: AF_RXRPC: Leaked client conn 00000000bf02a6a7 {1}
[ 473.601115] ------------[ cut here ]------------
[ 473.601117] kernel BUG at net/rxrpc/conn_client.c:64!
[ 473.601169] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 473.601180] CPU: 0 UID: 0 PID: 1107239 Comm: krxrpcio/7001 Not tainted 6.18.13-200.fc43.x86_64 #1 PREEMPT(lazy)
[ 473.601193] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 473.601205] RIP: 0010:rxrpc_purge_client_connections+0x58/0xa0 [rxrpc]
[ 473.601261] Code: 28 01 00 00 00 74 25 31 c0 48 8d 74 24 0c 48 89
cf 89 44 24 0c 48 89 0c 24 e8 d4 ec c2 c1 48 89 c6 48 85 c0 0f 85 49
dd 01 00 <0f> 0b 31 f6 48 89 cf 48 89 0c 24 e8 c8 aa c4 c1 48 8b 0c 24
85 c0
[ 473.601280] RSP: 0018:ffffc900159cfdd8 EFLAGS: 00010246
[ 473.601288] RAX: 0000000000000000 RBX: ffff88810a6b4800 RCX: 0000000000000000
[ 473.601297] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88810a6b4920
[ 473.601305] RBP: ffff888123398000 R08: ffffc900159cfdb8 R09: ffff88810a6b4928
[ 473.601313] R10: 0000000000000018 R11: 0000000040000000 R12: ffff88810a9cda00
[ 473.601322] R13: ffff88810a6b4800 R14: ffffc900159cfe70 R15: ffff88812d0c2800
[ 473.601330] FS: 0000000000000000(0000) GS:ffff8882af626000(0000) knlGS:0000000000000000
[ 473.601339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 473.601347] CR2: 00007faf20630030 CR3: 000000000382e002 CR4: 00000000003706f0
[ 473.601380] Call Trace:
[ 473.601387] <TASK>
[ 473.601393] rxrpc_destroy_local+0xc9/0xe0 [rxrpc]
[ 473.601443] rxrpc_io_thread+0x65d/0x750 [rxrpc]
[ 473.601487] ? __pfx_rxrpc_io_thread+0x10/0x10 [rxrpc]
[ 473.601527] kthread+0xfc/0x240
[ 473.601536] ? __pfx_kthread+0x10/0x10
[ 473.601542] ret_from_fork+0xf4/0x110
[ 473.601550] ? __pfx_kthread+0x10/0x10
[ 473.601558] ret_from_fork_asm+0x1a/0x30
[ 473.601574] </TASK>
[ 473.601578] Modules linked in: vsock_diag fcrypt pcbc rxrpc
ip6_udp_tunnel krb5 udp_tunnel rfkill nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 nf_tables intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core pmt_telemetry
pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec rapl
vmw_balloon sunrpc qrtr vmxnet3 i2c_piix4 i2c_smbus binfmt_misc joydev
loop dm_multipath nfnetlink zram lz4hc_compress lz4_compress
vmw_vsock_vmci_transport vsock vmw_vmci xfs nvme nvme_core
nvme_keyring polyval_clmulni ghash_clmulni_intel nvme_auth vmwgfx hkdf
drm_ttm_helper ata_generic ttm pata_acpi serio_raw scsi_dh_rdac
scsi_dh_emc scsi_dh_alua i2c_dev fuse
[ 473.601690] ---[ end trace 0000000000000000 ]---
[ 473.601698] RIP: 0010:rxrpc_purge_client_connections+0x58/0xa0 [rxrpc]
[ 473.601794] Code: 28 01 00 00 00 74 25 31 c0 48 8d 74 24 0c 48 89
cf 89 44 24 0c 48 89 0c 24 e8 d4 ec c2 c1 48 89 c6 48 85 c0 0f 85 49
dd 01 00 <0f> 0b 31 f6 48 89 cf 48 89 0c 24 e8 c8 aa c4 c1 48 8b 0c 24
85 c0
[ 473.601813] RSP: 0018:ffffc900159cfdd8 EFLAGS: 00010246
[ 473.601820] RAX: 0000000000000000 RBX: ffff88810a6b4800 RCX: 0000000000000000
[ 473.601829] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88810a6b4920
[ 473.601837] RBP: ffff888123398000 R08: ffffc900159cfdb8 R09: ffff88810a6b4928
[ 473.601845] R10: 0000000000000018 R11: 0000000040000000 R12: ffff88810a9cda00
[ 473.602211] R13: ffff88810a6b4800 R14: ffffc900159cfe70 R15: ffff88812d0c2800
[ 473.602599] FS: 0000000000000000(0000) GS:ffff8882af626000(0000) knlGS:0000000000000000
[ 473.603301] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 473.604869] CR2: 00007faf20630030 CR3: 000000000382e002 CR4: 00000000003706f0
Best regards,
--
Anderson Nascimento
Allele Security Intelligence
https://www.allelesecurity.com
^ permalink raw reply
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
From: Jason Xing @ 2026-04-22 16:37 UTC (permalink / raw)
To: Alexander Lobakin
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
In-Reply-To: <2e0e0e18-cb75-4638-9a12-5906de6a8308@intel.com>
On Thu, Apr 23, 2026 at 12:10 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Wed, 22 Apr 2026 11:36:50 +0800
>
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> > descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> > uintptr_t cast:
> >
> > skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
> >
> > On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> > the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> > mode the chunk offset is encoded in bits 48-63 of the descriptor
> > address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> > lost entirely. The completion queue then returns a truncated address to
> > userspace, making buffer recycling impossible.
>
> What if we relax the restriction a bit? For example, refuse to configure
As to the bug itself, yes, it only affects the unaligned mode.
I wonder if we could add that support later, once someone actually needs
AF_XDP on a 32-bit arch in the real world; then we can use the previous
patch to complete the full support (which doesn't harm the path on
64-bit arch).
The code looks like this based on your suggestion. Just for the record.
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 58da2f4f4397..03417b04592f 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -177,6 +177,9 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
if (mr->flags & ~XDP_UMEM_FLAGS_VALID)
return -EINVAL;
+ if (!IS_ENABLED(CONFIG_64BIT) && unaligned_chunks)
+ return -EOPNOTSUPP;
+
if (!unaligned_chunks && !is_power_of_2(chunk_size))
return -EINVAL;
Actually I'm fine with either of them. Right now I'm not so sure which
direction this patch should take :)
Thanks,
Jason
> an XSk socket in unaligned mode if on a 32-bit arch? Or add a check
> under CONFIG_32BIT like it was done in Page Pool:
>
> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>
> #ifdef CONFIG_32BIT
> if ((((uintptr_t)skb_shinfo(skb)->destructor_arg) & ~0x1UL) != addr)
> // WARN_ONCE or whatever + error path
> #endif
>
> I never used XSk on a 32-bit arch, but back when I was working on 32-bit
> MIPS 1G routers, I wanted to add native XSk support to the Eth driver.
> Sure, just for fun, now that we have cheap AArch64 and other 64-bit
> embedded chips, 32-bit embedded networking SoCs are almost dead, but
> OTOH, as you can see, other subsystems like PP still try to support 32 bit.
> Especially given that this issue applies only to the skb XSk path,
> not native in-driver implementations.
>
> >
> > Since we have not heard of anyone using AF_XDP on 32-bit architectures,
> > we decided to stop supporting it there at compile time.
> >
> > Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
> > Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
> > Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > net/xdp/Kconfig | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> > index 71af2febe72a..819aa5795f50 100644
> > --- a/net/xdp/Kconfig
> > +++ b/net/xdp/Kconfig
> > @@ -1,7 +1,7 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> > config XDP_SOCKETS
> > bool "XDP sockets"
> > - depends on BPF_SYSCALL
> > + depends on BPF_SYSCALL && 64BIT
> > default n
> > help
> > XDP sockets allows a channel between XDP programs and
>
> Thanks,
> Olek
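Jason's runtime alternative rejects only the unaligned-chunk case on 32-bit kernels at UMEM registration time, instead of removing XDP_SOCKETS from 32-bit builds. A user-space sketch of that decision, assuming umem_reg_check() as a hypothetical stand-in for the relevant part of xdp_umem_reg() and an explicit is_64bit flag in place of IS_ENABLED(CONFIG_64BIT):

```c
#include <assert.h> /* assert(), for checking the decision table */
#include <errno.h>
#include <stdbool.h>

static bool is_pow2(unsigned int n)
{
	return n && !(n & (n - 1));
}

/* Hypothetical model of the checks in the xdp_umem.c diff; is_64bit
 * replaces IS_ENABLED(CONFIG_64BIT). */
static int umem_reg_check(bool is_64bit, bool unaligned_chunks,
			  unsigned int chunk_size)
{
	/* Proposed: refuse the mode whose offset lives in bits 48-63. */
	if (!is_64bit && unaligned_chunks)
		return -EOPNOTSUPP;
	/* Existing: aligned mode needs a power-of-2 chunk size. */
	if (!unaligned_chunks && !is_pow2(chunk_size))
		return -EINVAL;
	return 0;
}
```

The trade-off is visible in the checks: aligned UMEMs keep working on 32-bit, and only the mode that encodes an offset in the upper address bits is refused.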
^ permalink raw reply related
* Re: [PATCH net-next v7 0/5] TLS read_sock performance scalability
From: Chuck Lever @ 2026-04-22 16:41 UTC (permalink / raw)
To: john.fastabend, Jakub Kicinski, Sabrina Dubroca
Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
Alistair Francis
In-Reply-To: <20260328-tls-read-sock-v7-0-15678415dfc1@oracle.com>
On Sat, Mar 28, 2026, at 11:17 AM, Chuck Lever wrote:
> I'd like to encourage in-kernel kTLS consumers (i.e., NFS and
> NVMe/TCP) to coalesce on the use of read_sock. When I suggested
> this to Hannes, he reported a number of nagging performance
> scalability issues with read_sock. This series is an attempt to
> run these issues down and get them fixed before we convert the
> above sock_recvmsg consumers over to read_sock.
>
> Batch async decryption and its submit/deliver scaffolding were
> dropped from this series because async_capable is always false
> for TLS 1.3, which NFS and NVMe/TCP both require. Async crypto
> support for TLS 1.3 is a prerequisite for revisiting that work.
>
> ---
> Changes since v6:
> - Rebased on net-next, v5's 1/6 was merged upstream
>
> Changes since v5:
> - Patch 6: Set released = true when sk_flush_backlog() returns
> true, so tls_strp_msg_load() knows the socket lock was
> released (Sabrina)
> - Patch 6: Drop Fixes tag; submit bug fix separately via net
> if warranted (Sabrina)
> - Patch 6: Note redundant flush on cold path in commit message
> (Sabrina)
>
> Changes since v4:
> - Drop batch async decryption and submit/deliver restructure:
> async_capable is always false for TLS 1.3, so the new code
> was unreachable for NFS and NVMe/TCP
> - Purge async_hold directly in tls_decrypt_async_wait() and drop
> the tls_decrypt_async_drain() wrapper
> - Merge tls_strp_check_rcv_quiet() into tls_strp_check_rcv() with
> a bool wake parameter; fix lost wakeup on the recvmsg exit path
>
> Changes since v3:
> - Clarify why tls_decrypt_async_drain() is separate from _wait()
> - Fold tls_err_abort() into tls_rx_one_record(), drop tls_rx_decrypt_record()
> - Move backlog flush into tls_rx_rec_wait() so all RX paths benefit
>
> Changes since v2:
> - Fix short read self tests
>
> Changes since v1:
> - Add C11 reference
> - Extend data_ready reduction to recvmsg and splice
> - Restructure read_sock and recvmsg using shared helpers
>
> ---
> Chuck Lever (5):
> tls: Abort the connection on decrypt failure
> tls: Fix dangling skb pointer in tls_sw_read_sock()
> tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
> tls: Suppress spurious saved_data_ready on all receive paths
> tls: Flush backlog before waiting for a new record
>
> net/tls/tls.h | 4 ++--
> net/tls/tls_main.c | 2 +-
> net/tls/tls_strp.c | 42 +++++++++++++++++++++++++++++++-----------
> net/tls/tls_sw.c | 50 ++++++++++++++++++++++++++++++--------------------
> 4 files changed, 64 insertions(+), 34 deletions(-)
> ---
> base-commit: ced629dc8e5c51ff2b5d847adeeb1035cd655d58
> change-id: 20260317-tls-read-sock-a0022c9df265
I see that this series is currently not in v7.1. What is left to do?
--
Chuck Lever
^ permalink raw reply