* Re: [PATCH net,v2 00/14] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2026-06-22 8:16 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <20260620222738.112506-1-pablo@netfilter.org>
Hi,
Sashiko reports two issues, one in:
- netfilter: flowtable: fix offloaded ct timeout never being extended
which is real for net/sched/act_ct.c; this is a preexisting issue
and we can follow up on it.
- netfilter: nf_conntrack_expect: use conntrack GC to reap expectations
I already planned to follow up on this and a few more subtle issues
(including one related patch I have withdrawn because it is
incomplete).
Please apply, thanks.
On Sun, Jun 21, 2026 at 12:27:24AM +0200, Pablo Neira Ayuso wrote:
> This is v2, dropping two patches that need a bit more work,
> uncovered by sashiko. I have revisited the wording of this cover
> letter to refine it.
>
> -o-
>
> Hi,
>
> The following patchset contains Netfilter fixes for net. This batches
> fixes for real crashes with trivial/correctness fixes. There is also
> a rework of the conntrack expectation timeout strategy to deal with
> a possible race when removing an expectation.
>
> 1) Fix the incorrect flowtable timeout extension for entries in
> hw offload, from Adrian Bente. This is correcting a defect in
> the functionality, no crash.
>
> 2) Hold reference to device under the fake dst in br_netfilter,
> from Haoze Xie. This is fixing a possible UaF if the device
> is removed while packet is sitting in nfqueue.
>
> 3) Reject template conntrack in xt_cluster, otherwise access to
> uninitialized conntrack fields is possible, leading to WARN_ON
> due to unset layer 3 protocol. From Wyatt Feng.
>
> 4) Make sure the IPv6 tunnel header is in the linear skb data
> area before pulling. While at it remove incomplete NEXTHDR_DEST
> support. From Lorenzo Bianconi. This possibly leads to a crash
> if the IPv4 header is not in the linear area.
>
> 5) Use test_bit_acquire in ipset hash set to avoid reordering
> of subsequent memory accesses. This is addressing an LLM-related
> report, no crash has been observed. From Jozsef Kadlecsik.
>
> 6) Use test_bit_acquire in ipset bitmap set too, for the same
> reason as in the previous patch, from Jozsef Kadlecsik.
>
> 7) Call kfree_rcu() after rcu_assign_pointer() to address a
> possible UaF if kfree_rcu() runs immediately, which to my
> understanding never happens. Never observed in practice,
> reported by LLM. Also from Jozsef Kadlecsik.
>
> 8) Use disable_delayed_work_sync() instead of cancel_delayed_work_sync()
> to prevent the ipset GC handler from re-queuing work, as reported by LLM.
> From Jozsef Kadlecsik. This is for correctness.
>
> 9) Restore the check in nft_payload for payload offsets exceeding
> 65535. From Florian Westphal. This fixes a silent truncation,
> not a big deal, but better be assertive and reject it.
>
> 10) Validate that NFT_META_BRI_IIFHWADDR can only run from bridge
> prerouting. From Florian Westphal. Harmless but it could allow
> reading bytes from skb->cb.
>
> 11) Zero out destination hardware address during the flowtable
> path setup, also from Florian. This is a correctness fix, LLM
> points out that a possible infoleak can happen but the topology to
> achieve it is not clear.
>
> 12) Skip IPv4 options if present when building the IPv4 reject reply.
> Otherwise bytes in the IPv4 options header can be sent back to
> the origin where the ICMP header is expected. Again from
> Florian Westphal.
>
> 13) Replace timer API for expectation by GC worker approach. This
> is implicitly fixing a race in nf_ct_remove_expectations(),
> which might fail to remove the expectation due to timer_del()
> returning false because the timer has expired and the callback is
> being run concurrently. This fix is addressing a crash that has
> already been reported with a reproducer.
>
> 14) Check if br_vlan_get_pvid_rcu() fails, otherwise a stack
> infoleak of 4 bytes is possible. From Florian Westphal.
>
> Please, pull these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-06-21
>
> Thanks.
>
> ----------------------------------------------------------------
>
> The following changes since commit 96e7f9122aae0ed000ee321f324b812a447906d9:
>
> eth: fbnic: take netif_addr_lock_bh() around rx mode address programming (2026-06-18 18:36:26 -0700)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-26-06-21
>
> for you to fetch changes up to 27dd2997746d54ebc079bb13161cc1bdd401d4a6:
>
> netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak (2026-06-21 00:18:37 +0200)
>
> ----------------------------------------------------------------
> netfilter pull request 26-06-21
>
> ----------------------------------------------------------------
> Adrian Bente (1):
> netfilter: flowtable: fix offloaded ct timeout never being extended
>
> Florian Westphal (5):
> netfilter: nft_payload: reject offsets exceeding 65535 bytes
> netfilter: nft_meta_bridge: add validate callback for get operations
> netfilter: nft_flow_offload: zero device address for non-ether case
> netfilter: nf_reject: skip iphdr options when looking for icmp header
> netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak
>
> Haoze Xie (1):
> netfilter: nf_queue: pin bridge device while NFQUEUE holds fake dst
>
> Jozsef Kadlecsik (4):
> netfilter: ipset: Don't use test_bit() in lockless RCU readers in hash types
> netfilter: ipset: Don't use test_bit() in lockless RCU readers in bitmap types
> netfilter: ipset: fix order of kfree_rcu() and rcu_assign_pointer()
> netfilter: ipset: make sure gc is properly stopped
>
> Lorenzo Bianconi (1):
> netfilter: flowtable: fix and simplify IP6IP6 tunnel handling
>
> Pablo Neira Ayuso (1):
> netfilter: nf_conntrack_expect: use conntrack GC to reap expectations
>
> Wyatt Feng (1):
> netfilter: xt_cluster: reject template conntracks in hash match
>
> include/net/netfilter/nf_conntrack_expect.h | 16 ++-
> include/net/netfilter/nf_queue.h | 1 +
> include/net/netfilter/nft_meta.h | 2 +
> include/uapi/linux/netfilter/nf_conntrack_common.h | 1 +
> net/bridge/netfilter/nft_meta_bridge.c | 23 +++-
> net/ipv4/netfilter/nf_reject_ipv4.c | 2 +-
> net/ipv6/ip6_tunnel.c | 7 +
> net/netfilter/ipset/ip_set_bitmap_gen.h | 4 +-
> net/netfilter/ipset/ip_set_bitmap_ip.c | 2 +-
> net/netfilter/ipset/ip_set_bitmap_ipmac.c | 2 +-
> net/netfilter/ipset/ip_set_bitmap_port.c | 2 +-
> net/netfilter/ipset/ip_set_core.c | 4 +-
> net/netfilter/ipset/ip_set_hash_gen.h | 12 +-
> net/netfilter/nf_conntrack_core.c | 33 ++++-
> net/netfilter/nf_conntrack_expect.c | 145 ++++++++++-----------
> net/netfilter/nf_conntrack_h323_main.c | 4 +-
> net/netfilter/nf_conntrack_helper.c | 10 +-
> net/netfilter/nf_conntrack_netlink.c | 22 ++--
> net/netfilter/nf_conntrack_sip.c | 13 +-
> net/netfilter/nf_flow_table_core.c | 13 +-
> net/netfilter/nf_flow_table_ip.c | 80 +++---------
> net/netfilter/nf_flow_table_path.c | 4 +-
> net/netfilter/nf_queue.c | 14 ++
> net/netfilter/nfnetlink_queue.c | 3 +
> net/netfilter/nft_ct.c | 3 +-
> net/netfilter/nft_meta.c | 5 +-
> net/netfilter/nft_payload.c | 16 ++-
> net/netfilter/xt_cluster.c | 2 +-
> .../selftests/net/netfilter/nft_flowtable.sh | 8 +-
> 29 files changed, 254 insertions(+), 199 deletions(-)
>
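For item 9 of the quoted cover letter, here is a minimal user-space sketch of why an offset above 65535 is better rejected than stored. It assumes the offset arrives as a 32-bit value and ends up in a 16-bit field; `struct payload_priv`, `store_offset_unchecked()` and `store_offset_checked()` are hypothetical names for illustration, not the actual nft_payload code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical private data with a 16-bit offset field (assumption). */
struct payload_priv {
	uint16_t offset;
};

/* Unchecked store: values above 65535 silently wrap modulo 2^16. */
static void store_offset_unchecked(struct payload_priv *priv, uint32_t offset)
{
	assert(priv);
	priv->offset = (uint16_t)offset;
}

/* Checked store: reject anything the 16-bit field cannot represent. */
static int store_offset_checked(struct payload_priv *priv, uint32_t offset)
{
	assert(priv);
	if (offset > UINT16_MAX)
		return -ERANGE;
	priv->offset = (uint16_t)offset;
	return 0;
}
```

With the unchecked variant, a request for offset 65540 would be stored as 4 and silently match a different part of the packet; the checked variant returns an error to the caller instead.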
^ permalink raw reply
* Re: [PATCH net] net: dst: block BH in ipip6_tunnel_xmit
From: Eric Dumazet @ 2026-06-22 8:13 UTC (permalink / raw)
To: yuan.gao
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
Yue Haibing, Kuniyuki Iwashima, Thorsten Blum, Kyle Zeng,
Kees Cook, netdev, linux-kernel
In-Reply-To: <20260622033118.244651-1-yuan.gao@ucloud.cn>
On Sun, Jun 21, 2026 at 8:31 PM yuan.gao <yuan.gao@ucloud.cn> wrote:
>
> Similar to commit 1378817486d6 ("tipc: block BH before using dst_cache"),
> the dst cache helper functions must be invoked with local BH disabled.
>
> This ensures proper synchronization and fixes a potential race condition
> on SMP systems.
>
> Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
> ---
All ndo_start_xmit() methods already run with BH blocked, can you give
us a stack trace when this would not be enforced?
You forgot a Fixes: tag.
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-next v5 4/4] igc: add support for forcing link speed without autonegotiation
From: Kadosh, MoriyaX @ 2026-06-22 8:12 UTC (permalink / raw)
To: Ruinskiy, Dima, KhaiWenTan, anthony.l.nguyen, przemyslaw.kitszel,
andrew+netdev, davem, edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, hector.blanco.alcaine, khai.wen.tan, Faizal Rahim
In-Reply-To: <d8f4f16c-adf6-4d99-bb76-09c047ba19eb@intel.com>
On 14/06/2026 10:17, Ruinskiy, Dima wrote:
> On 08/05/2026 0:47, KhaiWenTan wrote:
>> From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>>
>> Allow users to force 10/100 Mb/s link speed and duplex via ethtool
>> when autonegotiation is disabled. Previously, the driver rejected
>> these requests with "Force mode currently not supported.".
>>
>> Forcing at 1000 Mb/s and 2500 Mb/s is not supported.
>>
>> Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc_base.c | 35 ++++-
>> drivers/net/ethernet/intel/igc/igc_defines.h | 9 +-
>> drivers/net/ethernet/intel/igc/igc_ethtool.c | 138 ++++++++++++++-----
>> drivers/net/ethernet/intel/igc/igc_hw.h | 9 ++
>> drivers/net/ethernet/intel/igc/igc_mac.c | 12 ++
>> drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
>> drivers/net/ethernet/intel/igc/igc_phy.c | 65 ++++++++-
>> drivers/net/ethernet/intel/igc/igc_phy.h | 1 +
>> 8 files changed, 220 insertions(+), 51 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_base.c b/drivers/net/
>> ethernet/intel/igc/igc_base.c
>> index 1613b562d17c..ab9120a3127f 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_base.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_base.c
>> @@ -114,11 +114,35 @@ static s32 igc_setup_copper_link_base(struct
>> igc_hw *hw)
>> u32 ctrl;
>> ctrl = rd32(IGC_CTRL);
>> - ctrl |= IGC_CTRL_SLU;
>> - ctrl &= ~(IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX);
>> - wr32(IGC_CTRL, ctrl);
>> -
>> - ret_val = igc_setup_copper_link(hw);
>> + ctrl &= ~(IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX |
>> + IGC_CTRL_SPEED_MASK | IGC_CTRL_FD);
>> +
>> + if (hw->mac.autoneg_enabled) {
>> + ctrl |= IGC_CTRL_SLU;
>> + wr32(IGC_CTRL, ctrl);
>> + ret_val = igc_setup_copper_link(hw);
>> + } else {
>> + ctrl |= IGC_CTRL_SLU | IGC_CTRL_FRCSPD | IGC_CTRL_FRCDPX;
>> +
>> + switch (hw->mac.forced_speed_duplex) {
>> + case IGC_FORCED_10H:
>> + ctrl |= IGC_CTRL_SPEED_10;
>> + break;
>> + case IGC_FORCED_10F:
>> + ctrl |= IGC_CTRL_SPEED_10 | IGC_CTRL_FD;
>> + break;
>> + case IGC_FORCED_100H:
>> + ctrl |= IGC_CTRL_SPEED_100;
>> + break;
>> + case IGC_FORCED_100F:
>> + ctrl |= IGC_CTRL_SPEED_100 | IGC_CTRL_FD;
>> + break;
>> + default:
>> + return -IGC_ERR_CONFIG;
>> + }
>> + wr32(IGC_CTRL, ctrl);
>> + ret_val = igc_setup_copper_link(hw);
>> + }
>> return ret_val;
>> }
>> @@ -443,6 +467,7 @@ static const struct igc_phy_operations
>> igc_phy_ops_base = {
>> .reset = igc_phy_hw_reset,
>> .read_reg = igc_read_phy_reg_gpy,
>> .write_reg = igc_write_phy_reg_gpy,
>> + .force_speed_duplex = igc_force_speed_duplex,
>> };
>> const struct igc_info igc_base_info = {
>> diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/
>> net/ethernet/intel/igc/igc_defines.h
>> index 9482ab11f050..3f504751c2d9 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_defines.h
>> +++ b/drivers/net/ethernet/intel/igc/igc_defines.h
>> @@ -129,10 +129,13 @@
>> #define IGC_ERR_SWFW_SYNC 13
>> /* Device Control */
>> +#define IGC_CTRL_FD BIT(0) /* Full Duplex */
>> #define IGC_CTRL_RST 0x04000000 /* Global reset */
>> -
>> #define IGC_CTRL_PHY_RST 0x80000000 /* PHY Reset */
>> #define IGC_CTRL_SLU 0x00000040 /* Set link up (Force Link) */
>> +#define IGC_CTRL_SPEED_MASK GENMASK(10, 8)
>> +#define IGC_CTRL_SPEED_10 FIELD_PREP(IGC_CTRL_SPEED_MASK, 0)
>> +#define IGC_CTRL_SPEED_100 FIELD_PREP(IGC_CTRL_SPEED_MASK, 1)
>> #define IGC_CTRL_FRCSPD 0x00000800 /* Force Speed */
>> #define IGC_CTRL_FRCDPX 0x00001000 /* Force Duplex */
>> #define IGC_CTRL_VME 0x40000000 /* IEEE VLAN mode enable */
>> @@ -673,6 +676,10 @@
>> #define IGC_GEN_POLL_TIMEOUT 1920
>> /* PHY Control Register */
>> +#define MII_CR_SPEED_MASK (BIT(6) | BIT(13))
>> +#define MII_CR_SPEED_10 0x0000 /* SSM=0, SSL=0: 10 Mb/s */
>> +#define MII_CR_SPEED_100 BIT(13) /* SSM=0, SSL=1: 100 Mb/s */
>> +#define MII_CR_DUPLEX_EN BIT(8) /* 0 = Half Duplex, 1 = Full
>> Duplex */
>> #define MII_CR_RESTART_AUTO_NEG 0x0200 /* Restart auto
>> negotiation */
>> #define MII_CR_POWER_DOWN 0x0800 /* Power down */
>> #define MII_CR_AUTO_NEG_EN 0x1000 /* Auto Neg Enable */
>> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/
>> net/ethernet/intel/igc/igc_ethtool.c
>> index cfcbf2fdad6e..b103836a895f 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> @@ -1914,44 +1914,58 @@ static int
>> igc_ethtool_get_link_ksettings(struct net_device *netdev,
>> ethtool_link_ksettings_add_link_mode(cmd, supported, TP);
>> ethtool_link_ksettings_add_link_mode(cmd, advertising, TP);
>> - /* advertising link modes */
>> - if (hw->phy.autoneg_advertised & ADVERTISE_10_HALF)
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> 10baseT_Half);
>> - if (hw->phy.autoneg_advertised & ADVERTISE_10_FULL)
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> 10baseT_Full);
>> - if (hw->phy.autoneg_advertised & ADVERTISE_100_HALF)
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> 100baseT_Half);
>> - if (hw->phy.autoneg_advertised & ADVERTISE_100_FULL)
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> 100baseT_Full);
>> - if (hw->phy.autoneg_advertised & ADVERTISE_1000_FULL)
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> 1000baseT_Full);
>> - if (hw->phy.autoneg_advertised & ADVERTISE_2500_FULL)
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> 2500baseT_Full);
>> -
>> /* set autoneg settings */
>> ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
>> + if (hw->mac.autoneg_enabled) {
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
>> + cmd->base.autoneg = AUTONEG_ENABLE;
>> +
>> + /* advertising link modes only apply when autoneg is on */
>> + if (hw->phy.autoneg_advertised & ADVERTISE_10_HALF)
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + 10baseT_Half);
>> + if (hw->phy.autoneg_advertised & ADVERTISE_10_FULL)
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + 10baseT_Full);
>> + if (hw->phy.autoneg_advertised & ADVERTISE_100_HALF)
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + 100baseT_Half);
>> + if (hw->phy.autoneg_advertised & ADVERTISE_100_FULL)
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + 100baseT_Full);
>> + if (hw->phy.autoneg_advertised & ADVERTISE_1000_FULL)
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + 1000baseT_Full);
>> + if (hw->phy.autoneg_advertised & ADVERTISE_2500_FULL)
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + 2500baseT_Full);
>> +
>> + /* Set pause flow control advertising */
>> + switch (hw->fc.requested_mode) {
>> + case igc_fc_full:
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + Pause);
>> + break;
>> + case igc_fc_rx_pause:
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + Pause);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + Asym_Pause);
>> + break;
>> + case igc_fc_tx_pause:
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> + Asym_Pause);
>> + break;
>> + default:
>> + break;
>> + }
>> + } else {
>> + cmd->base.autoneg = AUTONEG_DISABLE;
>> + }
>> - /* Set pause flow control settings */
>> + /* Pause is always supported */
>> ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
>> - switch (hw->fc.requested_mode) {
>> - case igc_fc_full:
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
>> - break;
>> - case igc_fc_rx_pause:
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> - Asym_Pause);
>> - break;
>> - case igc_fc_tx_pause:
>> - ethtool_link_ksettings_add_link_mode(cmd, advertising,
>> - Asym_Pause);
>> - break;
>> - default:
>> - break;
>> - }
>> -
>> status = pm_runtime_suspended(&adapter->pdev->dev) ?
>> 0 : rd32(IGC_STATUS);
>> @@ -1983,7 +1997,6 @@ static int igc_ethtool_get_link_ksettings(struct
>> net_device *netdev,
>> cmd->base.duplex = DUPLEX_UNKNOWN;
>> }
>> cmd->base.speed = speed;
>> - cmd->base.autoneg = AUTONEG_ENABLE;
>> /* MDI-X => 2; MDI =>1; Invalid =>0 */
>> if (hw->phy.media_type == igc_media_type_copper)
>> @@ -2000,6 +2013,37 @@ static int
>> igc_ethtool_get_link_ksettings(struct net_device *netdev,
>> return 0;
>> }
>> +/**
>> + * igc_handle_autoneg_disabled - Configure forced speed/duplex settings
>> + * @adapter: private driver structure
>> + * @speed: requested speed (must be SPEED_10 or SPEED_100)
>> + * @duplex: requested duplex
>> + *
>> + * Records forced speed/duplex when autoneg is disabled.
>> + * Caller must validate speed before calling this function.
>> + */
>> +static void igc_handle_autoneg_disabled(struct igc_adapter *adapter,
>> u32 speed,
>> + u8 duplex)
>> +{
>> + struct igc_mac_info *mac = &adapter->hw.mac;
>> +
>> + switch (speed) {
>> + case SPEED_10:
>> + mac->forced_speed_duplex = (duplex == DUPLEX_FULL) ?
>> + IGC_FORCED_10F : IGC_FORCED_10H;
>> + break;
>> + case SPEED_100:
>> + mac->forced_speed_duplex = (duplex == DUPLEX_FULL) ?
>> + IGC_FORCED_100F : IGC_FORCED_100H;
>> + break;
>> + default:
>> + WARN_ONCE(1, "Unsupported speed %u\n", speed);
>> + return;
>> + }
>> +
>> + mac->autoneg_enabled = false;
>> +}
>> +
>> /**
>> * igc_handle_autoneg_enabled - Configure autonegotiation advertisement
>> * @adapter: private driver structure
>> @@ -2038,6 +2082,7 @@ static void igc_handle_autoneg_enabled(struct
>> igc_adapter *adapter,
>> 10baseT_Half))
>> advertised |= ADVERTISE_10_HALF;
>> + hw->mac.autoneg_enabled = true;
>> hw->phy.autoneg_advertised = advertised;
>> if (adapter->fc_autoneg)
>> hw->fc.requested_mode = igc_fc_default;
>> @@ -2059,6 +2104,12 @@ igc_ethtool_set_link_ksettings(struct
>> net_device *netdev,
>> return -EINVAL;
>> }
>> + if (cmd->base.autoneg != AUTONEG_ENABLE &&
>> + cmd->base.autoneg != AUTONEG_DISABLE) {
>> + netdev_info(dev, "Unsupported autoneg setting\n");
>> + return -EINVAL;
>> + }
>> +
>> /* MDI setting is only allowed when autoneg enabled because
>> * some hardware doesn't allow MDI setting when speed or
>> * duplex is forced.
>> @@ -2071,14 +2122,25 @@ igc_ethtool_set_link_ksettings(struct
>> net_device *netdev,
>> }
>> }
>> + if (cmd->base.autoneg == AUTONEG_DISABLE) {
>> + if (cmd->base.speed != SPEED_10 && cmd->base.speed !=
>> SPEED_100) {
>> + netdev_info(dev, "Unsupported speed for forced link\n");
>> + return -EINVAL;
>> + }
>> + if (cmd->base.duplex != DUPLEX_HALF && cmd->base.duplex !=
>> DUPLEX_FULL) {
>> + netdev_info(dev, "Duplex must be half or full for forced
>> link\n");
>> + return -EINVAL;
>> + }
>> + }
>> +
>> while (test_and_set_bit(__IGC_RESETTING, &adapter->state))
>> usleep_range(1000, 2000);
>> - if (cmd->base.autoneg == AUTONEG_ENABLE) {
>> + if (cmd->base.autoneg == AUTONEG_ENABLE)
>> igc_handle_autoneg_enabled(adapter, cmd);
>> - } else {
>> - netdev_info(dev, "Force mode currently not supported\n");
>> - }
>> + else
>> + igc_handle_autoneg_disabled(adapter, cmd->base.speed,
>> + cmd->base.duplex);
>> /* MDI-X => 2; MDI => 1; Auto => 3 */
>> if (cmd->base.eth_tp_mdix_ctrl) {
>> diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/
>> ethernet/intel/igc/igc_hw.h
>> index 86ab8f566f44..62aaee55668a 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_hw.h
>> +++ b/drivers/net/ethernet/intel/igc/igc_hw.h
>> @@ -73,6 +73,13 @@ struct igc_info {
>> extern const struct igc_info igc_base_info;
>> +enum igc_forced_speed_duplex {
>> + IGC_FORCED_10H,
>> + IGC_FORCED_10F,
>> + IGC_FORCED_100H,
>> + IGC_FORCED_100F,
>> +};
>> +
>> struct igc_mac_info {
>> struct igc_mac_operations ops;
>> @@ -93,6 +100,8 @@ struct igc_mac_info {
>> bool arc_subsystem_valid;
>> bool get_link_status;
>> + bool autoneg_enabled;
>> + enum igc_forced_speed_duplex forced_speed_duplex;
>> };
>> struct igc_nvm_operations {
>> diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/
>> ethernet/intel/igc/igc_mac.c
>> index 0a3d3f357505..d6f3f6618469 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_mac.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_mac.c
>> @@ -446,6 +446,17 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>> u16 speed, duplex;
>> s32 ret_val = 0;
>> + /* Without autoneg, flow control capability is not exchanged with
>> the
>> + * link partner. IEEE 802.3 prohibits flow control in half-duplex
>> mode.
>> + */
>> + if (!hw->mac.autoneg_enabled) {
>> + if (hw->mac.forced_speed_duplex == IGC_FORCED_10H ||
>> + hw->mac.forced_speed_duplex == IGC_FORCED_100H)
>> + hw->fc.current_mode = igc_fc_none;
>> +
>> + goto force_fc;
>> + }
>> +
>> /* In auto-neg, we need to check and see if Auto-Neg has completed,
>> * and if so, how the PHY and link partner has flow control
>> * configured.
>> @@ -607,6 +618,7 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>> /* Now we call a subroutine to actually force the MAC
>> * controller to use the correct flow control settings.
>> */
>> +force_fc:
>> ret_val = igc_force_mac_fc(hw);
>> if (ret_val) {
>> hw_dbg("Error forcing flow control settings\n");
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/
>> ethernet/intel/igc/igc_main.c
>> index 72bc5128d8b8..437e1d1ef1e4 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -7298,7 +7298,7 @@ static int igc_probe(struct pci_dev *pdev,
>> /* Initialize link properties that are user-changeable */
>> adapter->fc_autoneg = true;
>> hw->phy.autoneg_advertised = 0xaf;
>> -
>> + hw->mac.autoneg_enabled = true;
>> hw->fc.requested_mode = igc_fc_default;
>> hw->fc.current_mode = igc_fc_default;
>> diff --git a/drivers/net/ethernet/intel/igc/igc_phy.c b/drivers/net/
>> ethernet/intel/igc/igc_phy.c
>> index 6c4d204aecfa..4cf737fb3b21 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_phy.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_phy.c
>> @@ -494,12 +494,20 @@ s32 igc_setup_copper_link(struct igc_hw *hw)
>> s32 ret_val = 0;
>> bool link;
>> - /* Setup autoneg and flow control advertisement and perform
>> - * autonegotiation.
>> - */
>> - ret_val = igc_copper_link_autoneg(hw);
>> - if (ret_val)
>> - goto out;
>> + if (hw->mac.autoneg_enabled) {
>> + /* Setup autoneg and flow control advertisement and perform
>> + * autonegotiation.
>> + */
>> + ret_val = igc_copper_link_autoneg(hw);
>> + if (ret_val)
>> + goto out;
>> + } else {
>> + ret_val = hw->phy.ops.force_speed_duplex(hw);
>> + if (ret_val) {
>> + hw_dbg("Error Forcing Speed/Duplex\n");
>> + goto out;
>> + }
>> + }
>> /* Check link status. Wait up to 100 microseconds for link to
>> become
>> * valid.
>> @@ -778,3 +786,48 @@ u16 igc_read_phy_fw_version(struct igc_hw *hw)
>> return gphy_version;
>> }
>> +
>> +/**
>> + * igc_force_speed_duplex - Force PHY speed and duplex settings
>> + * @hw: pointer to the HW structure
>> + *
>> + * Programs the GPY PHY control register to disable autonegotiation
>> + * and force the speed/duplex indicated by hw->mac.forced_speed_duplex.
>> + */
>> +s32 igc_force_speed_duplex(struct igc_hw *hw)
>> +{
>> + struct igc_phy_info *phy = &hw->phy;
>> + u16 phy_ctrl;
>> + s32 ret_val;
>> +
>> + ret_val = phy->ops.read_reg(hw, PHY_CONTROL, &phy_ctrl);
>> + if (ret_val)
>> + return ret_val;
>> +
>> + phy_ctrl &= ~(MII_CR_SPEED_MASK | MII_CR_DUPLEX_EN |
>> + MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);
>> +
>> + switch (hw->mac.forced_speed_duplex) {
>> + case IGC_FORCED_10H:
>> + phy_ctrl |= MII_CR_SPEED_10;
>> + break;
>> + case IGC_FORCED_10F:
>> + phy_ctrl |= MII_CR_SPEED_10 | MII_CR_DUPLEX_EN;
>> + break;
>> + case IGC_FORCED_100H:
>> + phy_ctrl |= MII_CR_SPEED_100;
>> + break;
>> + case IGC_FORCED_100F:
>> + phy_ctrl |= MII_CR_SPEED_100 | MII_CR_DUPLEX_EN;
>> + break;
>> + default:
>> + return -IGC_ERR_CONFIG;
>> + }
>> +
>> + ret_val = phy->ops.write_reg(hw, PHY_CONTROL, phy_ctrl);
>> + if (ret_val)
>> + return ret_val;
>> +
>> + hw->mac.get_link_status = true;
>> + return 0;
>> +}
>> diff --git a/drivers/net/ethernet/intel/igc/igc_phy.h b/drivers/net/
>> ethernet/intel/igc/igc_phy.h
>> index 832a7e359f18..d37a89174826 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_phy.h
>> +++ b/drivers/net/ethernet/intel/igc/igc_phy.h
>> @@ -18,5 +18,6 @@ void igc_power_down_phy_copper(struct igc_hw *hw);
>> s32 igc_write_phy_reg_gpy(struct igc_hw *hw, u32 offset, u16 data);
>> s32 igc_read_phy_reg_gpy(struct igc_hw *hw, u32 offset, u16 *data);
>> u16 igc_read_phy_fw_version(struct igc_hw *hw);
>> +s32 igc_force_speed_duplex(struct igc_hw *hw);
>> #endif
> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
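
As a rough illustration of the PHY control word that igc_force_speed_duplex() in the quoted patch builds, the sketch below repeats the bit arithmetic in user space. The MII_CR_* values are copied from the quoted igc_defines.h hunk; `enum forced_mode`, `force_ctrl_word()` and the local `BIT()` macro are hypothetical stand-ins, and no register access is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))

/* Values copied from the quoted igc_defines.h hunk. */
#define MII_CR_SPEED_MASK	(BIT(6) | BIT(13))
#define MII_CR_SPEED_10		0x0000u
#define MII_CR_SPEED_100	BIT(13)
#define MII_CR_DUPLEX_EN	BIT(8)
#define MII_CR_RESTART_AUTO_NEG	0x0200u
#define MII_CR_AUTO_NEG_EN	0x1000u

/* Hypothetical stand-in for enum igc_forced_speed_duplex. */
enum forced_mode { FORCED_10H, FORCED_10F, FORCED_100H, FORCED_100F };

/*
 * Clear the speed, duplex and autoneg bits of the current control word,
 * then set the bits for the requested forced mode. Returns 0 on success
 * and -1 for an unknown mode; other bits of the word are preserved.
 */
static int force_ctrl_word(uint16_t cur, enum forced_mode mode, uint16_t *out)
{
	uint16_t v;

	assert(out);
	v = cur & (uint16_t)~(MII_CR_SPEED_MASK | MII_CR_DUPLEX_EN |
			      MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);

	switch (mode) {
	case FORCED_10H:
		v |= MII_CR_SPEED_10;
		break;
	case FORCED_10F:
		v |= MII_CR_SPEED_10 | MII_CR_DUPLEX_EN;
		break;
	case FORCED_100H:
		v |= MII_CR_SPEED_100;
		break;
	case FORCED_100F:
		v |= MII_CR_SPEED_100 | MII_CR_DUPLEX_EN;
		break;
	default:
		return -1;
	}

	*out = v;
	return 0;
}
```

Starting from a word with autoneg enabled, forcing 100 Mb/s full duplex leaves only the speed-select and duplex bits set among the fields touched here, while unrelated bits such as power-down are carried over unchanged.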
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-next v5 3/4] igc: replace goto out with direct returns in igc_config_fc_after_link_up()
From: Kadosh, MoriyaX @ 2026-06-22 8:11 UTC (permalink / raw)
To: Ruinskiy, Dima, KhaiWenTan, anthony.l.nguyen, przemyslaw.kitszel,
andrew+netdev, davem, edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, hector.blanco.alcaine, khai.wen.tan, Faizal Rahim
In-Reply-To: <58af982a-1531-43f8-934c-e83b45111b1f@intel.com>
On 14/06/2026 10:17, Ruinskiy, Dima wrote:
> On 08/05/2026 0:47, KhaiWenTan wrote:
>> From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>>
>> The out: label only returns ret_val with no cleanup. The kernel coding
>> style guide states: "If there is no cleanup needed then just return
>> directly." (Documentation/process/coding-style.rst, section 7).
>>
>> This improves readability ahead of a subsequent patch that introduces a
>> new goto label in this function.
>>
>> No functional change.
>>
>> Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc_mac.c | 15 +++++++--------
>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/
>> ethernet/intel/igc/igc_mac.c
>> index 142beb9ae557..0a3d3f357505 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_mac.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_mac.c
>> @@ -458,15 +458,15 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>> ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS,
>> &mii_status_reg);
>> if (ret_val)
>> - goto out;
>> + return ret_val;
>> ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS,
>> &mii_status_reg);
>> if (ret_val)
>> - goto out;
>> + return ret_val;
>> if (!(mii_status_reg & MII_SR_AUTONEG_COMPLETE)) {
>> hw_dbg("Copper PHY and Auto Neg has not completed.\n");
>> - goto out;
>> + return ret_val;
>> }
>> /* The AutoNeg process has completed, so we now need to
>> @@ -478,11 +478,11 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>> ret_val = hw->phy.ops.read_reg(hw, PHY_AUTONEG_ADV,
>> &mii_nway_adv_reg);
>> if (ret_val)
>> - goto out;
>> + return ret_val;
>> ret_val = hw->phy.ops.read_reg(hw, PHY_LP_ABILITY,
>> &mii_nway_lp_ability_reg);
>> if (ret_val)
>> - goto out;
>> + return ret_val;
>> /* Two bits in the Auto Negotiation Advertisement Register
>> * (Address 4) and two bits in the Auto Negotiation Base
>> * Page Ability Register (Address 5) determine flow control
>> @@ -598,7 +598,7 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>> ret_val = hw->mac.ops.get_speed_and_duplex(hw, &speed, &duplex);
>> if (ret_val) {
>> hw_dbg("Error getting link speed and duplex\n");
>> - goto out;
>> + return ret_val;
>> }
>> if (duplex == HALF_DUPLEX)
>> @@ -610,10 +610,9 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>> ret_val = igc_force_mac_fc(hw);
>> if (ret_val) {
>> hw_dbg("Error forcing flow control settings\n");
>> - goto out;
>> + return ret_val;
>> }
>> -out:
>> return ret_val;
>> }
> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
^ permalink raw reply
* [PATCH iwl-net] ice: clear the default forwarding VSI rule when releasing a VSI
From: Petr Oros @ 2026-06-22 8:10 UTC (permalink / raw)
To: netdev
Cc: Petr Oros, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jacob Keller, Michal Swiatkowski, intel-wired-lan, linux-kernel
When a VSI is configured as the switch's default forwarding VSI
(ICE_SW_LKUP_DFLT) and is then torn down, the rule is left behind in
the switch. ice_vsi_release() no longer removes it, and the SR-IOV VF
free path (ice_free_vfs() -> ice_free_vf_res() -> ice_vf_vsi_release()
-> ice_vsi_release()) does not disable promiscuous mode either, which
only happens on VF reset in ice_vf_clear_all_promisc_modes().
A trusted VF that enters unicast promiscuous mode becomes the default
forwarding VSI (this is the default mode, when the PF does not have VF
true-promiscuous mode enabled). If the VFs are then destroyed without
the VF first leaving promiscuous mode, the ICE_SW_LKUP_DFLT rule for
the now-freed VSI is leaked. When VFs are recreated, a VSI reuses the
freed hw_vsi_id. If it is assigned a different VSI handle than the
leaked rule holds, ice_set_dflt_vsi() does not recognize it as
already-default, and ice_add_update_vsi_list() folds the dangling
(freed) handle into a VSI list, which the firmware rejects. The VSI
handle assigned on re-creation varies, so the failure is intermittent
rather than every cycle.
Reproduce by repeatedly running the cycle below on the two ports of the
same card, where $VF0 and $VF1 are the netdevs of vf 15 once they
appear. The VF must be brought up so iavf actually pushes the unicast
promiscuous request, and the rule must settle before the VFs are torn
down again:
echo 16 > /sys/class/net/$PF0/device/sriov_numvfs
echo 16 > /sys/class/net/$PF1/device/sriov_numvfs
ip link set $PF0 vf 15 trust on
ip link set $PF1 vf 15 trust on
ip link set $VF0 up
ip link set $VF1 up
ip link set $VF0 promisc on
ip link set $VF1 promisc on
sleep 1
echo 0 > /sys/class/net/$PF0/device/sriov_numvfs
echo 0 > /sys/class/net/$PF1/device/sriov_numvfs
Within a few cycles the ice PF and iavf VF log:
Failed to set VSI 25 as the default forwarding VSI, error -22
Turning on/off promiscuous mode for VF 63 failed, error: -22
PF returned error -53 (IAVF_ERR_ADMIN_QUEUE_ERROR) to our request 14
This cleanup used to live in ice_vsi_release() but was dropped by the
referenced refactor. Restore it. Clear the default forwarding VSI rule
in ice_vsi_release() when this VSI owns it, which covers every teardown
path.
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Petr Oros <poros@redhat.com>
---
drivers/net/ethernet/intel/ice/ice_lib.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 2717cc31bff8fe..408464434506ef 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2872,6 +2872,9 @@ int ice_vsi_release(struct ice_vsi *vsi)
return -ENODEV;
pf = vsi->back;
+ if (ice_is_vsi_dflt_vsi(vsi))
+ ice_clear_dflt_vsi(vsi);
+
if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
ice_rss_clean(vsi);
--
2.53.0
^ permalink raw reply related
* Re: [Intel-wired-lan] [PATCH iwl-next v5 1/4] igc: remove unused autoneg_failed field
From: Kadosh, MoriyaX @ 2026-06-22 8:10 UTC (permalink / raw)
To: Ruinskiy, Dima, KhaiWenTan, anthony.l.nguyen, przemyslaw.kitszel,
andrew+netdev, davem, edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, hector.blanco.alcaine, khai.wen.tan, Faizal Rahim,
Aleksandr Loktionov
In-Reply-To: <7d4b2a62-231a-4f61-8561-5c26d6ed3125@intel.com>
On 14/06/2026 10:16, Ruinskiy, Dima wrote:
> On 08/05/2026 0:47, KhaiWenTan wrote:
>> From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>>
>> autoneg_failed in struct igc_mac_info is never set in the igc driver.
>> Remove the field and the dead code checking it in
>> igc_config_fc_after_link_up().
>>
>> The field originates from the e1000/e1000e fiber/serdes forced-link
>> path, where MAC-level autoneg timeout sets it to signal the flow-control
>> code to force pause. igc supports only copper, so it never needs to set
>> this field.
>>
>> Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
>> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc_hw.h | 1 -
>> drivers/net/ethernet/intel/igc/igc_mac.c | 16 +---------------
>> 2 files changed, 1 insertion(+), 16 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/
>> ethernet/intel/igc/igc_hw.h
>> index be8a49a86d09..86ab8f566f44 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_hw.h
>> +++ b/drivers/net/ethernet/intel/igc/igc_hw.h
>> @@ -92,7 +92,6 @@ struct igc_mac_info {
>> bool asf_firmware_present;
>> bool arc_subsystem_valid;
>> - bool autoneg_failed;
>> bool get_link_status;
>> };
>> diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/
>> ethernet/intel/igc/igc_mac.c
>> index 7ac6637f8db7..142beb9ae557 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_mac.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_mac.c
>> @@ -438,28 +438,14 @@ void igc_config_collision_dist(struct igc_hw *hw)
>> * Checks the status of auto-negotiation after link up to ensure
>> that the
>> * speed and duplex were not forced. If the link needed to be
>> forced, then
>> * flow control needs to be forced also. If auto-negotiation is
>> enabled
>> - * and did not fail, then we configure flow control based on our link
>> - * partner.
>> + * then we configure flow control based on our link partner.
>> */
>> s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>> {
>> u16 mii_status_reg, mii_nway_adv_reg, mii_nway_lp_ability_reg;
>> - struct igc_mac_info *mac = &hw->mac;
>> u16 speed, duplex;
>> s32 ret_val = 0;
>> - /* Check for the case where we have fiber media and auto-neg failed
>> - * so we had to force link. In this case, we need to force the
>> - * configuration of the MAC to match the "fc" parameter.
>> - */
>> - if (mac->autoneg_failed)
>> - ret_val = igc_force_mac_fc(hw);
>> -
>> - if (ret_val) {
>> - hw_dbg("Error forcing flow control settings\n");
>> - goto out;
>> - }
>> -
>> /* In auto-neg, we need to check and see if Auto-Neg has completed,
>> * and if so, how the PHY and link partner has flow control
>> * configured.
> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
^ permalink raw reply
* Re: [PATCH net] net: usb: kalmia: bound RX frame length in kalmia_rx_fixup()
From: Andrew Lunn @ 2026-06-22 8:09 UTC (permalink / raw)
To: Maoyi Xie
Cc: Oliver Neukum, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-usb, netdev, linux-kernel,
stable
In-Reply-To: <178211531778.2216480.12637613349790980750@maoyixie.com>
On Mon, Jun 22, 2026 at 04:01:57PM +0800, Maoyi Xie wrote:
> kalmia_rx_fixup() computes usb_packet_length = skb->len - (2 *
> KALMIA_HEADER_LENGTH) as a u16, guarded only by a pre-loop check that
> skb->len is at least KALMIA_HEADER_LENGTH, which is 6. A device can
> deliver a short bulk-IN frame with skb->len in the 6 to 11 range, or
> leave a short trailing remainder on a later loop iteration. Either case
> underflows usb_packet_length to about 65530.
>
> That bypasses the usb_packet_length < ether_packet_length truncation path.
> The device-supplied ether_packet_length, a le16 up to 65535 read from
> header_start[2], then drives a memcmp() and the following skb_trim() and
> skb_pull() past the end of the rx buffer. The rx buffer is hard_mtu * 10,
> which is 14000 bytes. That is an out of bounds read.
>
> Require both the start and end framing headers to be present before
> subtracting them, on every loop iteration.
>
> Fixes: d40261236e8e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
> Cc: stable@vger.kernel.org
> Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-next v5 2/4] igc: move autoneg-enabled settings into igc_handle_autoneg_enabled()
From: Kadosh, MoriyaX @ 2026-06-22 8:08 UTC (permalink / raw)
To: Ruinskiy, Dima, KhaiWenTan, anthony.l.nguyen, przemyslaw.kitszel,
andrew+netdev, davem, edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, hector.blanco.alcaine, khai.wen.tan, Faizal Rahim,
Aleksandr Loktionov
In-Reply-To: <4d8d9eaa-d9bb-4589-a37d-31d0da584335@intel.com>
On 14/06/2026 10:17, Ruinskiy, Dima wrote:
> On 08/05/2026 0:47, KhaiWenTan wrote:
>> From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>>
>> Move the advertised link modes and flow control configuration from
>> igc_ethtool_set_link_ksettings() into igc_handle_autoneg_enabled().
>>
>> No functional change.
>>
>> Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
>> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc_ethtool.c | 72 ++++++++++++--------
>> 1 file changed, 44 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/
>> net/ethernet/intel/igc/igc_ethtool.c
>> index 0122009bedd0..cfcbf2fdad6e 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> @@ -2000,6 +2000,49 @@ static int
>> igc_ethtool_get_link_ksettings(struct net_device *netdev,
>> return 0;
>> }
>> +/**
>> + * igc_handle_autoneg_enabled - Configure autonegotiation advertisement
>> + * @adapter: private driver structure
>> + * @cmd: ethtool link ksettings from user
>> + *
>> + * Records advertised speeds and flow control settings when autoneg
>> + * is enabled.
>> + */
>> +static void igc_handle_autoneg_enabled(struct igc_adapter *adapter,
>> + const struct ethtool_link_ksettings *cmd)
>> +{
>> + struct igc_hw *hw = &adapter->hw;
>> + u16 advertised = 0;
>> +
>> + if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> + 2500baseT_Full))
>> + advertised |= ADVERTISE_2500_FULL;
>> +
>> + if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> + 1000baseT_Full))
>> + advertised |= ADVERTISE_1000_FULL;
>> +
>> + if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> + 100baseT_Full))
>> + advertised |= ADVERTISE_100_FULL;
>> +
>> + if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> + 100baseT_Half))
>> + advertised |= ADVERTISE_100_HALF;
>> +
>> + if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> + 10baseT_Full))
>> + advertised |= ADVERTISE_10_FULL;
>> +
>> + if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> + 10baseT_Half))
>> + advertised |= ADVERTISE_10_HALF;
>> +
>> + hw->phy.autoneg_advertised = advertised;
>> + if (adapter->fc_autoneg)
>> + hw->fc.requested_mode = igc_fc_default;
>> +}
>> +
>> static int
>> igc_ethtool_set_link_ksettings(struct net_device *netdev,
>> const struct ethtool_link_ksettings *cmd)
>> @@ -2007,7 +2050,6 @@ igc_ethtool_set_link_ksettings(struct net_device
>> *netdev,
>> struct igc_adapter *adapter = netdev_priv(netdev);
>> struct net_device *dev = adapter->netdev;
>> struct igc_hw *hw = &adapter->hw;
>> - u16 advertised = 0;
>> /* When adapter in resetting mode, autoneg/speed/duplex
>> * cannot be changed
>> @@ -2032,34 +2074,8 @@ igc_ethtool_set_link_ksettings(struct
>> net_device *netdev,
>> while (test_and_set_bit(__IGC_RESETTING, &adapter->state))
>> usleep_range(1000, 2000);
>> - if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> - 2500baseT_Full))
>> - advertised |= ADVERTISE_2500_FULL;
>> -
>> - if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> - 1000baseT_Full))
>> - advertised |= ADVERTISE_1000_FULL;
>> -
>> - if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> - 100baseT_Full))
>> - advertised |= ADVERTISE_100_FULL;
>> -
>> - if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> - 100baseT_Half))
>> - advertised |= ADVERTISE_100_HALF;
>> -
>> - if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> - 10baseT_Full))
>> - advertised |= ADVERTISE_10_FULL;
>> -
>> - if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
>> - 10baseT_Half))
>> - advertised |= ADVERTISE_10_HALF;
>> -
>> if (cmd->base.autoneg == AUTONEG_ENABLE) {
>> - hw->phy.autoneg_advertised = advertised;
>> - if (adapter->fc_autoneg)
>> - hw->fc.requested_mode = igc_fc_default;
>> + igc_handle_autoneg_enabled(adapter, cmd);
>> } else {
>> netdev_info(dev, "Force mode currently not supported\n");
>> }
> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
^ permalink raw reply
* Re: [PATCH net] net: ti: icssg-prueth: fix XDP_TX from the AF_XDP zero-copy RX path
From: Meghana Malladi @ 2026-06-22 8:05 UTC (permalink / raw)
To: David Carlier, danishanwar, rogerq, andrew+netdev, netdev
Cc: davem, edumazet, kuba, pabeni, horms, hawk, john.fastabend, sdf,
ast, daniel, bpf, linux-arm-kernel, linux-kernel, stable
In-Reply-To: <20260620213756.87499-1-devnexen@gmail.com>
Hi David,
Thanks for the fix.
On 6/21/26 03:07, David Carlier wrote:
> On XDP_TX from the zero-copy RX path, emac_run_xdp() converts the xsk
> buffer via xdp_convert_zc_to_xdp_frame(), which clones the data into a
> fresh MEM_TYPE_PAGE_ORDER0 page that is not DMA mapped. Transmitting it
> as PRUETH_TX_BUFF_TYPE_XDP_TX derives the DMA address with
> page_pool_get_dma_addr(), reading an uninitialized page->dma_addr, so
> the device DMAs from a bogus address (corrupt TX, or an IOMMU fault).
>
> Pick the TX buffer type from the frame's memory type: keep
> PRUETH_TX_BUFF_TYPE_XDP_TX for page_pool frames and use
> PRUETH_TX_BUFF_TYPE_XDP_NDO for the cloned zero-copy frame. The
> completion path already unmaps PRUETH_SWDATA_XDPF buffers.
>
Is it safe to unconditionally unmap the buffer when the TX buffer type is
PRUETH_TX_BUFF_TYPE_XDP_TX? In that case the DMA mapping is done with
rx_chn->dma_dev, whereas the completion path unmaps with tx_chn->dma_dev
unconditionally.
> Fixes: 7a64bb388df3 ("net: ti: icssg-prueth: Add AF_XDP zero copy for RX")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> drivers/net/ethernet/ti/icssg/icssg_common.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
> index 82ddef9c17d5..302e700ea17d 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
> @@ -804,6 +804,7 @@ EXPORT_SYMBOL_GPL(emac_xmit_xdp_frame);
> */
> static u32 emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp, u32 *len)
> {
> + enum prueth_tx_buff_type tx_buff_type;
> struct net_device *ndev = emac->ndev;
> struct netdev_queue *netif_txq;
> int cpu = smp_processor_id();
> @@ -826,11 +827,21 @@ static u32 emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp, u32 *len
> goto drop;
> }
>
> + /* In AF_XDP zero-copy mode xdp_convert_buff_to_frame()
> + * clones the xsk buffer into a fresh MEM_TYPE_PAGE_ORDER0
> + * page that is not DMA mapped. Such a frame must be mapped
> + * via the NDO path; only a page pool-backed frame already
> + * carries a usable page_pool DMA address.
> + */
> + tx_buff_type = xdpf->mem_type == MEM_TYPE_PAGE_POOL ?
> + PRUETH_TX_BUFF_TYPE_XDP_TX :
> + PRUETH_TX_BUFF_TYPE_XDP_NDO;
> +
> q_idx = cpu % emac->tx_ch_num;
> netif_txq = netdev_get_tx_queue(ndev, q_idx);
> __netif_tx_lock(netif_txq, cpu);
> result = emac_xmit_xdp_frame(emac, xdpf, q_idx,
> - PRUETH_TX_BUFF_TYPE_XDP_TX);
> + tx_buff_type);
> __netif_tx_unlock(netif_txq);
> if (result == ICSSG_XDP_CONSUMED) {
> ndev->stats.tx_dropped++;
^ permalink raw reply
* [PATCH net] net: usb: kalmia: bound RX frame length in kalmia_rx_fixup()
From: Maoyi Xie @ 2026-06-22 8:01 UTC (permalink / raw)
To: Oliver Neukum
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, linux-usb, netdev, linux-kernel, stable
kalmia_rx_fixup() computes usb_packet_length = skb->len - (2 *
KALMIA_HEADER_LENGTH) as a u16, guarded only by a pre-loop check that
skb->len is at least KALMIA_HEADER_LENGTH, which is 6. A device can
deliver a short bulk-IN frame with skb->len in the 6 to 11 range, or
leave a short trailing remainder on a later loop iteration. Either case
underflows usb_packet_length to about 65530.
That bypasses the usb_packet_length < ether_packet_length truncation path.
The device-supplied ether_packet_length, a le16 up to 65535 read from
header_start[2], then drives a memcmp() and the following skb_trim() and
skb_pull() past the end of the rx buffer. The rx buffer is hard_mtu * 10,
which is 14000 bytes. That is an out of bounds read.
Require both the start and end framing headers to be present before
subtracting them, on every loop iteration.
Fixes: d40261236e8e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
I asked about this on linux-usb on 2026-06-15 and got no reply, so I
am sending the fix.
drivers/net/usb/kalmia.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/usb/kalmia.c b/drivers/net/usb/kalmia.c
index ee9c48f7f68f..0dd0a30c3db4 100644
--- a/drivers/net/usb/kalmia.c
+++ b/drivers/net/usb/kalmia.c
@@ -276,6 +276,14 @@ kalmia_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
"Received header: %6phC. Package length: %i\n",
header_start, skb->len - KALMIA_HEADER_LENGTH);
+ /* both framing headers must be present before we subtract
+ * them, otherwise usb_packet_length underflows and the
+ * device-supplied ether_packet_length drives an out of bounds
+ * access below
+ */
+ if (skb->len < 2 * KALMIA_HEADER_LENGTH)
+ return 0;
+
/* subtract start header and end header */
usb_packet_length = skb->len - (2 * KALMIA_HEADER_LENGTH);
ether_packet_length = get_unaligned_le16(&header_start[2]);
--
2.34.1
^ permalink raw reply related
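The underflow described in the kalmia_rx_fixup() patch above can be
reproduced in plain userspace C. This is only a sketch: the helper names are
hypothetical, skb->len is modelled as an unsigned int argument, and the only
value taken from the patch is KALMIA_HEADER_LENGTH == 6.

```c
#include <assert.h>
#include <stdint.h>

#define KALMIA_HEADER_LENGTH 6	/* value stated in the commit message */

/* Unchecked computation: the result is truncated to 16 bits, so any
 * skb_len below 2 * KALMIA_HEADER_LENGTH wraps to a large value.
 */
static uint16_t usb_packet_length_unchecked(unsigned int skb_len)
{
	return (uint16_t)(skb_len - 2 * KALMIA_HEADER_LENGTH);
}

/* Checked variant mirroring the added guard: returns 0 and leaves *out
 * untouched when both framing headers cannot be present, 1 otherwise.
 */
static int usb_packet_length_checked(unsigned int skb_len, uint16_t *out)
{
	if (skb_len < 2 * KALMIA_HEADER_LENGTH)
		return 0;
	*out = (uint16_t)(skb_len - 2 * KALMIA_HEADER_LENGTH);
	return 1;
}

/* Hypothetical self-check exercising both helpers. */
void kalmia_length_selfcheck(void)
{
	uint16_t len = 0;

	assert(usb_packet_length_unchecked(6) == 65530);
	assert(usb_packet_length_unchecked(11) == 65535);
	assert(usb_packet_length_checked(6, &len) == 0);
	assert(usb_packet_length_checked(20, &len) == 1 && len == 8);
}
```

With the guard in place the wrapped value is never produced, so the later
usb_packet_length < ether_packet_length comparison sees a real length.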
* Re: [PATCH] net: meth: check skb allocation in meth_init_rx_ring()
From: Andrew Lunn @ 2026-06-22 8:01 UTC (permalink / raw)
To: Pavan Chebbi
Cc: Haoxiang Li, andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
linux-kernel, stable
In-Reply-To: <CALs4sv2dr2QsFU_DUDNAMgr4MDxHcRrHqer+Kdm7dP+4TUT0eg@mail.gmail.com>
On Mon, Jun 22, 2026 at 11:27:41AM +0530, Pavan Chebbi wrote:
> On Mon, Jun 22, 2026 at 10:20 AM Haoxiang Li <haoxiang_li2024@163.com> wrote:
> >
> > meth_init_rx_ring() does not check the return value of alloc_skb().
> > If the allocation fails, the NULL skb is passed to skb_reserve() and
> > then dereferenced through skb->head.
> >
> > Add check for alloc_skb() to prevent potential null pointer dereference.
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
> > ---
> > drivers/net/ethernet/sgi/meth.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/sgi/meth.c b/drivers/net/ethernet/sgi/meth.c
> > index f7c3a5a766b7..ceff3cc937ad 100644
> > --- a/drivers/net/ethernet/sgi/meth.c
> > +++ b/drivers/net/ethernet/sgi/meth.c
> > @@ -228,6 +228,9 @@ static int meth_init_rx_ring(struct meth_private *priv)
> >
> > for (i = 0; i < RX_RING_ENTRIES; i++) {
> > priv->rx_skbs[i] = alloc_skb(METH_RX_BUFF_SIZE, 0);
> > + if (!priv->rx_skbs[i])
> > + return -ENOMEM;
> > +
>
> I think the fix is not complete. The caller meth_open() will not free
> any successfully allocated skbs if the function ever returns -ENOMEM.
There is also the question, does anybody care? Are SGI machines still
used? This is a Fast Ethernet driver, written in 2003. It has no
Maintainer. Maybe it would be better to just remove the driver?
At least drop the Fixes: tag, it does not fit the Stable rules.
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
Andrew
---
pw-bot: cr
^ permalink raw reply
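The incompleteness pointed out in the meth thread above (entries that were
already allocated are not freed when a later allocation fails) is usually
handled with an unwind loop. The following is a userspace sketch of that
shape only, not the meth driver's code: malloc()/free() stand in for
alloc_skb()/kfree_skb(), RING_ENTRIES and all function names are
hypothetical, and any DMA mapping a real driver would have to undo is left
out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_ENTRIES 4	/* hypothetical; stands in for RX_RING_ENTRIES */

typedef void *(*alloc_fn)(size_t);

/* Fill ring[]; on failure free what was already allocated and return
 * -ENOMEM, so the caller never sees a partially filled ring.
 */
static int ring_init(void *ring[RING_ENTRIES], size_t buf_size, alloc_fn alloc)
{
	int i;

	for (i = 0; i < RING_ENTRIES; i++) {
		ring[i] = alloc(buf_size);
		if (!ring[i])
			goto unwind;
	}
	return 0;

unwind:
	while (--i >= 0) {
		free(ring[i]);
		ring[i] = NULL;
	}
	return -ENOMEM;
}

/* Test allocator that fails on its third call. */
static int fail_countdown = 2;

static void *failing_alloc(size_t size)
{
	if (fail_countdown-- == 0)
		return NULL;
	return malloc(size);
}

/* Hypothetical self-check: the failed init leaves no entry behind. */
void ring_init_selfcheck(void)
{
	void *ring[RING_ENTRIES] = { 0 };
	int i;

	assert(ring_init(ring, 64, failing_alloc) == -ENOMEM);
	for (i = 0; i < RING_ENTRIES; i++)
		assert(ring[i] == NULL);
}
```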
* [PATCH 5/7] xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>
From: Eric Dumazet <edumazet@google.com>
KCSAN reported a data race involving net->xfrm.policy_count access.
Add missing READ_ONCE()/WRITE_ONCE() annotations on
xfrm_policy_count and xfrm_policy_default.
Fixes: 2518c7c2b3d7 ("[XFRM]: Hash policies when non-prefixed.")
Reported-by: syzbot+d85ba1c732720b9a4097@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a2b9e96.99669fcc.12a77b.0006.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
include/net/xfrm.h | 8 ++++----
net/xfrm/xfrm_policy.c | 24 ++++++++++++------------
net/xfrm/xfrm_user.c | 18 +++++++++---------
3 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 874409127e29..35a743129329 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1250,8 +1250,8 @@ int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb,
static inline bool __xfrm_check_nopolicy(struct net *net, struct sk_buff *skb,
int dir)
{
- if (!net->xfrm.policy_count[dir] && !secpath_exists(skb))
- return net->xfrm.policy_default[dir] == XFRM_USERPOLICY_ACCEPT;
+ if (!READ_ONCE(net->xfrm.policy_count[dir]) && !secpath_exists(skb))
+ return READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_ACCEPT;
return false;
}
@@ -1351,8 +1351,8 @@ static inline int xfrm_route_forward(struct sk_buff *skb, unsigned short family)
{
struct net *net = dev_net(skb->dev);
- if (!net->xfrm.policy_count[XFRM_POLICY_OUT] &&
- net->xfrm.policy_default[XFRM_POLICY_OUT] == XFRM_USERPOLICY_ACCEPT)
+ if (!READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT]) &&
+ READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]) == XFRM_USERPOLICY_ACCEPT)
return true;
return (skb_dst(skb)->flags & DST_NOXFRM) ||
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 959544425692..1f4afd580105 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -685,7 +685,7 @@ static void xfrm_byidx_resize(struct net *net)
static inline int xfrm_bydst_should_resize(struct net *net, int dir, int *total)
{
- unsigned int cnt = net->xfrm.policy_count[dir];
+ unsigned int cnt = READ_ONCE(net->xfrm.policy_count[dir]);
unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
if (total)
@@ -711,12 +711,12 @@ static inline int xfrm_byidx_should_resize(struct net *net, int total)
void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si)
{
- si->incnt = net->xfrm.policy_count[XFRM_POLICY_IN];
- si->outcnt = net->xfrm.policy_count[XFRM_POLICY_OUT];
- si->fwdcnt = net->xfrm.policy_count[XFRM_POLICY_FWD];
- si->inscnt = net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX];
- si->outscnt = net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX];
- si->fwdscnt = net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX];
+ si->incnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_IN]);
+ si->outcnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT]);
+ si->fwdcnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_FWD]);
+ si->inscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX]);
+ si->outscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX]);
+ si->fwdscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX]);
si->spdhcnt = net->xfrm.policy_idx_hmask;
si->spdhmcnt = xfrm_policy_hashmax;
}
@@ -2318,7 +2318,7 @@ static void __xfrm_policy_link(struct xfrm_policy *pol, int dir)
}
list_add(&pol->walk.all, &net->xfrm.policy_all);
- net->xfrm.policy_count[dir]++;
+ WRITE_ONCE(net->xfrm.policy_count[dir], net->xfrm.policy_count[dir] + 1);
xfrm_pol_hold(pol);
}
@@ -2337,7 +2337,7 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
}
list_del_init(&pol->walk.all);
- net->xfrm.policy_count[dir]--;
+ WRITE_ONCE(net->xfrm.policy_count[dir], net->xfrm.policy_count[dir] - 1);
return pol;
}
@@ -3222,7 +3222,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
/* To accelerate a bit... */
if (!if_id && ((dst_orig->flags & DST_NOXFRM) ||
- !net->xfrm.policy_count[XFRM_POLICY_OUT]))
+ !READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT])))
goto nopol;
xdst = xfrm_bundle_lookup(net, fl, family, dir, &xflo, if_id);
@@ -3296,7 +3296,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
nopol:
if ((!dst_orig->dev || !(dst_orig->dev->flags & IFF_LOOPBACK)) &&
- net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
+ READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_BLOCK) {
err = -EPERM;
goto error;
}
@@ -3750,7 +3750,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
const bool is_crypto_offload = sp &&
(xfrm_input_state(skb)->xso.type == XFRM_DEV_OFFLOAD_CRYPTO);
- if (net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
+ if (READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_BLOCK) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOPOLS);
return 0;
}
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3b1cf29bc402..61eb5de33b87 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2485,9 +2485,9 @@ static int xfrm_notify_userpolicy(struct net *net)
}
up = nlmsg_data(nlh);
- up->in = net->xfrm.policy_default[XFRM_POLICY_IN];
- up->fwd = net->xfrm.policy_default[XFRM_POLICY_FWD];
- up->out = net->xfrm.policy_default[XFRM_POLICY_OUT];
+ up->in = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN]);
+ up->fwd = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD]);
+ up->out = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]);
nlmsg_end(skb, nlh);
@@ -2511,13 +2511,13 @@ static int xfrm_set_default(struct sk_buff *skb, struct nlmsghdr *nlh,
struct xfrm_userpolicy_default *up = nlmsg_data(nlh);
if (xfrm_userpolicy_is_valid(up->in))
- net->xfrm.policy_default[XFRM_POLICY_IN] = up->in;
+ WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN], up->in);
if (xfrm_userpolicy_is_valid(up->fwd))
- net->xfrm.policy_default[XFRM_POLICY_FWD] = up->fwd;
+ WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD], up->fwd);
if (xfrm_userpolicy_is_valid(up->out))
- net->xfrm.policy_default[XFRM_POLICY_OUT] = up->out;
+ WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT], up->out);
rt_genid_bump_all(net);
@@ -2547,9 +2547,9 @@ static int xfrm_get_default(struct sk_buff *skb, struct nlmsghdr *nlh,
}
r_up = nlmsg_data(r_nlh);
- r_up->in = net->xfrm.policy_default[XFRM_POLICY_IN];
- r_up->fwd = net->xfrm.policy_default[XFRM_POLICY_FWD];
- r_up->out = net->xfrm.policy_default[XFRM_POLICY_OUT];
+ r_up->in = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN]);
+ r_up->fwd = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD]);
+ r_up->out = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]);
nlmsg_end(r_skb, r_nlh);
return nlmsg_unicast(xfrm_net_nlsk(net, skb), r_skb, portid);
--
2.43.0
^ permalink raw reply related
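The annotation pattern used in the xfrm_policy_count[] patch above can be
sketched in userspace as follows. The two macros are simplified stand-ins
for the kernel's READ_ONCE()/WRITE_ONCE() (volatile accesses, GCC/Clang
__typeof__ only), and the array, enum and function names are hypothetical.
The sketch assumes, as the plain read of the old value inside WRITE_ONCE()
in the patch suggests, that writers are serialized by a lock and only the
readers run locklessly.

```c
#include <assert.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

enum { DIR_IN, DIR_OUT, DIR_FWD, DIR_MAX };	/* hypothetical indices */

static unsigned int policy_count[DIR_MAX];

/* Writer side: assumed to run under a lock, so the old value can be
 * read plainly; only the store is annotated.
 */
static void policy_link(int dir)
{
	WRITE_ONCE(policy_count[dir], policy_count[dir] + 1);
}

static void policy_unlink(int dir)
{
	WRITE_ONCE(policy_count[dir], policy_count[dir] - 1);
}

/* Lockless reader: a single annotated load of the counter. */
static int have_policy(int dir)
{
	return READ_ONCE(policy_count[dir]) != 0;
}

/* Hypothetical self-check of the counter helpers. */
void policy_count_selfcheck(void)
{
	assert(!have_policy(DIR_OUT));
	policy_link(DIR_OUT);
	assert(have_policy(DIR_OUT));
	policy_unlink(DIR_OUT);
	assert(!have_policy(DIR_OUT));
}
```

The annotations do not add ordering; they mark the accesses as intentionally
racy and keep the compiler from tearing or refetching them.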
* [PATCH 1/7] xfrm: use compat translator only for u64 alignment mismatch
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>
From: Sanman Pradhan <psanman@juniper.net>
The XFRM compat layer (CONFIG_XFRM_USER_COMPAT) translates 32-bit xfrm
netlink and setsockopt messages into the native 64-bit layout. It is
only needed on architectures where the 32-bit and 64-bit ABIs disagree
on u64 alignment, which the kernel encodes as COMPAT_FOR_U64_ALIGNMENT.
That symbol is defined only by arch/x86. XFRM_USER_COMPAT depends on it,
so the translator can never be built on any other architecture,
including arm64, which still provides a 32-bit compat ABI (CONFIG_COMPAT)
for AArch32 EL0 userspace. On arm64 the AArch32 EABI already aligns u64
to 8 bytes, identical to the AArch64 ABI, so no translation is required
and the native code path is correct for 32-bit tasks.
However, xfrm_user_rcv_msg() and xfrm_user_policy() gate on
in_compat_syscall() alone and then call xfrm_get_translator(), which
returns NULL when no translator is registered. On arm64 that is always
the case, so every xfrm netlink message and the XFRM_POLICY setsockopt
issued by a 32-bit task returns -EOPNOTSUPP. A 32-bit userspace process
on arm64 (and on any other arch with CONFIG_COMPAT but without
COMPAT_FOR_U64_ALIGNMENT) therefore cannot configure XFRM state or
policy through the XFRM_USER netlink API, and cannot use the XFRM_POLICY
setsockopt path, because both fail before reaching the native parser.
The translator series replaced the blanket compat rejection with a
translator lookup. That made the path usable on x86 when the translator
is available, but left architectures that cannot build the translator
permanently rejected even when their compat layout already matches the
native layout. Let those architectures use the native parser instead.
Gate the translator requirement on COMPAT_FOR_U64_ALIGNMENT instead of
on in_compat_syscall() alone. Gating on the ABI property rather than on
CONFIG_XFRM_USER_COMPAT is deliberate: on x86 with IA32_EMULATION=y but
XFRM_USER_COMPAT=n, a 32-bit task must still be rejected rather than
routed through the native parser, which would misread genuinely
4-byte-aligned x86-32 messages. COMPAT_FOR_U64_ALIGNMENT is the ABI
property that makes the XFRM translator mandatory.
Only the receive/input direction needs the guard. The send, dump and
notification paths already call the translator as "if (xtr) { ... }"
with no error on NULL, so on arches without a translator they no-op and
the kernel emits native 64-bit-layout messages, which is what an AArch32
task expects.
Tested on Juniper SRX hardware: with the fix, 32-bit IPsec userspace
netlink and XFRM_POLICY setsockopt operations that previously failed
with -EOPNOTSUPP now succeed; x86 behaviour is unchanged by inspection.
Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Fixes: 96392ee5a13b ("xfrm/compat: Translate 32-bit user_policy from sockptr")
Cc: stable@vger.kernel.org
Signed-off-by: Sanman Pradhan <psanman@juniper.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_state.c | 2 +-
net/xfrm/xfrm_user.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 589c3b6e4679..d8457ceaf28c 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2976,7 +2976,7 @@ int xfrm_user_policy(struct sock *sk, int optname, sockptr_t optval, int optlen)
if (IS_ERR(data))
return PTR_ERR(data);
- if (in_compat_syscall()) {
+ if (IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT) && in_compat_syscall()) {
struct xfrm_translator *xtr = xfrm_get_translator();
if (!xtr) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 71a4b7278eba..3b1cf29bc402 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3472,7 +3472,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
if (!netlink_net_capable(skb, CAP_NET_ADMIN))
return -EPERM;
- if (in_compat_syscall()) {
+ if (IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT) && in_compat_syscall()) {
struct xfrm_translator *xtr = xfrm_get_translator();
if (!xtr)
--
2.43.0
^ permalink raw reply related
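The u64 alignment mismatch that the compat translator patch above refers to
can be illustrated with two struct layouts. This is a sketch under stated
assumptions: the struct and function names are hypothetical (they are not
xfrm structures), #pragma pack(4) is used to emulate the i386 rule of
4-byte-aligned u64 struct members, and the "native" numbers assume a build
for an ABI that aligns u64 to 8 bytes, such as x86-64 or arm64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: with 8-byte u64 alignment, 4 bytes of padding follow id. */
struct msg_native {
	uint32_t id;
	uint64_t value;
};

/* Emulated i386 layout: u64 members are only 4-byte aligned, so value
 * directly follows id and the struct is 4 bytes shorter.
 */
#pragma pack(push, 4)
struct msg_i386_like {
	uint32_t id;
	uint64_t value;
};
#pragma pack(pop)

/* Returns nonzero when the two layouts disagree, i.e. when a message
 * written with one layout cannot be parsed with the other.
 */
static int layouts_differ(void)
{
	return offsetof(struct msg_native, value) !=
	       offsetof(struct msg_i386_like, value) ||
	       sizeof(struct msg_native) != sizeof(struct msg_i386_like);
}

/* Hypothetical self-check; the native values assume an 8-byte-aligned ABI. */
void layout_selfcheck(void)
{
	assert(offsetof(struct msg_i386_like, value) == 4);
	assert(sizeof(struct msg_i386_like) == 12);
	assert(offsetof(struct msg_native, value) == 8);
	assert(layouts_differ());
}
```

When both ABIs share the 8-byte rule, as the commit message states for
AArch32 EABI and AArch64, the two layouts coincide and no translation step
is needed.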
* [PATCH 3/7] xfrm: Fix dev use-after-free in xfrm async resumption
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>
From: Dong Chenchen <dongchenchen2@huawei.com>
xfrm async resumption holds the skb->dev refcnt until after transport_finish.
However, xfrm_rcv_cb may change skb->dev to a tunnel device without taking a
device reference, as vti_rcv_cb does. The subsequent async resumption then
decrements the tunnel device's reference count, which leads to a UAF of the
tunnel dev and a refcnt leak of the original dev, as shown below:
unregister_netdevice: waiting for vti1 to become free. Usage count = -2
Stash the original skb->dev to fix the refcnt imbalance. The new skb->dev set
by xfrm_rcv_cb can race with device teardown, so extend RCU protection over
xfrm_rcv_cb and transport_finish to prevent that race.
Fixes: 1c428b038400 ("xfrm: hold dev ref until after transport_finish NF_HOOK")
Reported-by: Xu Chunxiao <xuchunxiao3@huawei.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/ipv4/xfrm4_input.c | 2 --
net/ipv6/xfrm6_input.c | 2 --
net/xfrm/xfrm_input.c | 29 ++++++++++++++++-------------
3 files changed, 16 insertions(+), 17 deletions(-)
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index c2eac844bcdb..f6f2a8ef3f88 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -76,8 +76,6 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
dev_net(dev), NULL, skb, dev, NULL,
xfrm4_rcv_encap_finish);
- if (async)
- dev_put(dev);
return 0;
}
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 699a001ac166..89d0443b5307 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -71,8 +71,6 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
dev_net(dev), NULL, skb, dev, NULL,
xfrm6_transport_finish2);
- if (async)
- dev_put(dev);
return 0;
}
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index e4c2cd24936d..eecab337bd0a 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -467,6 +467,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
{
const struct xfrm_state_afinfo *afinfo;
struct net *net = dev_net(skb->dev);
+ struct net_device *dev = skb->dev;
int err;
__be32 seq;
__be32 seq_hi;
@@ -493,7 +494,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
LINUX_MIB_XFRMINSTATEINVALID);
if (encap_type == -1)
- dev_put(skb->dev);
+ dev_put(dev);
goto drop;
}
@@ -655,16 +656,16 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
if (!crypto_done) {
spin_unlock(&x->lock);
- dev_hold(skb->dev);
+ dev_hold(dev);
nexthdr = x->type->input(x, skb);
if (nexthdr == -EINPROGRESS) {
if (async)
- dev_put(skb->dev);
+ dev_put(dev);
return 0;
}
- dev_put(skb->dev);
+ dev_put(dev);
spin_lock(&x->lock);
}
resume:
@@ -699,7 +700,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
err = xfrm_inner_mode_input(x, skb);
if (err == -EINPROGRESS) {
if (async)
- dev_put(skb->dev);
+ dev_put(dev);
return 0;
} else if (err) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
@@ -726,9 +727,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
crypto_done = false;
} while (!err);
+ rcu_read_lock();
err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
- if (err)
+ if (err) {
+ rcu_read_unlock();
goto drop;
+ }
nf_reset_ct(skb);
@@ -739,8 +743,9 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
if (skb_valid_dst(skb))
skb_dst_drop(skb);
if (async)
- dev_put(skb->dev);
+ dev_put(dev);
gro_cells_receive(&gro_cells, skb);
+ rcu_read_unlock();
return 0;
} else {
xo = xfrm_offload(skb);
@@ -748,23 +753,21 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
xfrm_gro = xo->flags & XFRM_GRO;
err = -EAFNOSUPPORT;
- rcu_read_lock();
afinfo = xfrm_state_afinfo_get_rcu(x->props.family);
if (likely(afinfo))
err = afinfo->transport_finish(skb, xfrm_gro || async);
- rcu_read_unlock();
if (xfrm_gro) {
sp = skb_sec_path(skb);
if (sp)
sp->olen = 0;
if (skb_valid_dst(skb))
skb_dst_drop(skb);
- if (async)
- dev_put(skb->dev);
gro_cells_receive(&gro_cells, skb);
- return err;
}
+ if (async)
+ dev_put(dev);
+ rcu_read_unlock();
return err;
}
@@ -772,7 +775,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
spin_unlock(&x->lock);
drop:
if (async)
- dev_put(skb->dev);
+ dev_put(dev);
xfrm_rcv_cb(skb, family, x && x->type ? x->type->proto : nexthdr, -1);
kfree_skb(skb);
return 0;
--
2.43.0
* [PATCH 7/7] xfrm: validate selector family and prefixlen during match
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>
From: Eric Dumazet <edumazet@google.com>
syzbot reported a shift-out-of-bounds in xfrm_selector_match()
due to an AF_UNSPEC selector with a large prefixlen (e.g. 128) being
matched against an IPv4 flow (when XFRM_STATE_AF_UNSPEC is set).
Fix this by:
- Rejecting mismatched families in xfrm_selector_match.
- Returning false in addr4_match if prefixlen > 32.
- Returning false in addr_match if prefixlen > 128 (prevents overflow).
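The out-of-range shift can be illustrated outside the kernel. The following is only a user-space sketch, assuming host-byte-order addresses for simplicity (the kernel helper takes __be32 values and applies htonl() to the mask); prefix_match4() is a hypothetical name, not the kernel function:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space analogue of addr4_match(), using host-order
 * addresses. Without the prefixlen > 32 check, 32 - prefixlen would be
 * negative for e.g. prefixlen == 128, and shifting by a negative count
 * is undefined behaviour in C.
 */
static bool prefix_match4(uint32_t a1, uint32_t a2, uint8_t prefixlen)
{
	if (prefixlen == 0)
		return true;	/* empty prefix matches everything; avoids << 32 */
	if (prefixlen > 32)
		return false;	/* not a valid IPv4 prefix length */

	/* Shift count is now in 0..31, which is always well defined. */
	return ((a1 ^ a2) & (uint32_t)(UINT32_MAX << (32 - prefixlen))) == 0;
}
```

With this guard, a call such as prefix_match4(addr, addr, 128) reports a mismatch instead of evaluating an invalid shift.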
Fixes: 3f0ab59e6537 ("xfrm: validate new SA's prefixlen using SA family when sel.family is unset")
Reported-by: syzbot+9383b1ff0df4b29ca5e6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a2fbe35.be3f099c.2836ae.0018.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
include/net/xfrm.h | 7 +++++++
net/xfrm/xfrm_policy.c | 3 +++
2 files changed, 10 insertions(+)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 35a743129329..f8c909b0f0c3 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -943,6 +943,9 @@ static inline bool addr_match(const void *token1, const void *token2,
unsigned int pdw;
unsigned int pbi;
+ if (prefixlen > 128)
+ return false;
+
pdw = prefixlen >> 5; /* num of whole u32 in prefix */
pbi = prefixlen & 0x1f; /* num of bits in incomplete u32 in prefix */
@@ -967,6 +970,10 @@ static inline bool addr4_match(__be32 a1, __be32 a2, u8 prefixlen)
/* C99 6.5.7 (3): u32 << 32 is undefined behaviour */
if (sizeof(long) == 4 && prefixlen == 0)
return true;
+
+ if (prefixlen > 32)
+ return false;
+
return !((a1 ^ a2) & htonl(~0UL << (32 - prefixlen)));
}
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 1f4afd580105..639934f30016 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -242,6 +242,9 @@ __xfrm6_selector_match(const struct xfrm_selector *sel, const struct flowi *fl)
bool xfrm_selector_match(const struct xfrm_selector *sel, const struct flowi *fl,
unsigned short family)
{
+ if (family != sel->family && sel->family != AF_UNSPEC)
+ return false;
+
switch (family) {
case AF_INET:
return __xfrm4_selector_match(sel, fl);
--
2.43.0
* [PATCH 4/7] xfrm: Fix xfrm state cache insertion race
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>
From: Herbert Xu <herbert@gondor.apana.org.au>
The xfrm input state cache insertion code checks the validity of
the state before acquiring the global xfrm_state_lock. Thus it's
possible for someone else to kill the state after it passed the
validity check, and then the insertion will add the dead state
to the cache.
Fix this by moving the validity check inside the lock.
This entire function is called on the input path, where BH must
be off (e.g., the caller of this function xfrm_input acquires
its spinlocks without disabling BH).
So there is no need to disable BH here or take the RCU read lock.
Remove both and replace them with an assertion that trips if BH
is accidentally enabled on some future calling path.
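The check-inside-the-lock pattern can be sketched in user space with a pthread mutex. This is a simplified model, not the kernel code: struct state, state_kill() and cache_insert() are hypothetical names, and the model assumes the kill path updates the validity flag under the same lock as the insert path.

```c
#include <pthread.h>
#include <stdbool.h>

struct state {
	bool valid;	/* cleared by the kill path */
	bool cached;	/* true while the state sits in the cache */
};

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Kill path: invalidate the state and drop it from the cache. */
static void state_kill(struct state *st)
{
	pthread_mutex_lock(&state_lock);
	st->valid = false;
	st->cached = false;
	pthread_mutex_unlock(&state_lock);
}

/*
 * Insert path: the validity check happens only after the lock is held,
 * so a state killed earlier can no longer slip into the cache.
 * Returns true if the state was (re)inserted.
 */
static bool cache_insert(struct state *st)
{
	bool inserted = false;

	pthread_mutex_lock(&state_lock);
	if (st->valid) {
		st->cached = true;
		inserted = true;
	}
	pthread_mutex_unlock(&state_lock);
	return inserted;
}
```

Had the st->valid test been done before pthread_mutex_lock(), a concurrent state_kill() could run between the test and the insertion, which is the window the patch closes.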
Fixes: 81a331a0e72d ("xfrm: Add an inbound percpu state cache.")
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_state.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index d8457ceaf28c..9e87f7028201 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1207,9 +1207,11 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
struct hlist_head *state_cache_input;
struct xfrm_state *x = NULL;
+ /* BH is always disabled on the input path. */
+ lockdep_assert_in_softirq();
+
state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
- rcu_read_lock();
hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
if (x->props.family != family ||
x->id.spi != spi ||
@@ -1227,20 +1229,25 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
xfrm_hash_ptrs_get(net, &state_ptrs);
x = __xfrm_state_lookup(&state_ptrs, mark, daddr, spi, proto, family);
-
- if (x && x->km.state == XFRM_STATE_VALID) {
- spin_lock_bh(&net->xfrm.xfrm_state_lock);
- if (hlist_unhashed(&x->state_cache_input)) {
+ if (x) {
+ spin_lock(&net->xfrm.xfrm_state_lock);
+ if (x->km.state != XFRM_STATE_VALID) {
+ /*
+ * The state is about to be destroyed.
+ *
+ * Don't add it to the cache but still
+ * return it to the caller.
+ */
+ } else if (hlist_unhashed(&x->state_cache_input)) {
hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
} else {
hlist_del_rcu(&x->state_cache_input);
hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
}
- spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+ spin_unlock(&net->xfrm.xfrm_state_lock);
}
out:
- rcu_read_unlock();
return x;
}
EXPORT_SYMBOL(xfrm_input_state_lookup);
--
2.43.0
* [PATCH 2/7] net: af_key: initialize alg_key_len for IPComp states
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>
From: Zijing Yin <yzjaurora@gmail.com>
pfkey_msg2xfrm_state() handles the IPComp (SADB_X_SATYPE_IPCOMP) case by
allocating x->calg and copying only the algorithm name:
x->calg = kmalloc_obj(*x->calg);
if (!x->calg) {
err = -ENOMEM;
goto out;
}
strcpy(x->calg->alg_name, a->name);
x->props.calgo = sa->sadb_sa_encrypt;
Unlike the authentication (x->aalg) and encryption (x->ealg) branches of
the same function, the compression branch never initializes
calg->alg_key_len. IPComp carries no key and the allocation only
reserves sizeof(struct xfrm_algo) (i.e. no room for a key), so the field
is left containing uninitialized slab data.
calg->alg_key_len is later used as a length by xfrm_algo_clone() when an
IPComp state is cloned during XFRM_MSG_MIGRATE:
xfrm_state_migrate()
xfrm_state_clone_and_setup()
x->calg = xfrm_algo_clone(orig->calg);
kmemdup(orig, xfrm_alg_len(orig));
where xfrm_alg_len() returns sizeof(*alg) + (alg_key_len + 7) / 8. With
a non-zero garbage alg_key_len, kmemdup() reads past the end of the
68-byte calg object. Adding an IPComp SA via PF_KEY and then migrating
it triggers (net-next, KASAN, init_on_alloc=0):
BUG: KASAN: slab-out-of-bounds in kmemdup_noprof+0x44/0x60
Read of size 4164 at addr ff11000025a74980 by task diag2/9287
CPU: 3 UID: 0 PID: 9287 Comm: diag2 7.1.0-rc6-g903db046d557 #1
Call Trace:
<TASK>
dump_stack_lvl+0x10e/0x1f0
print_report+0xf7/0x600
kasan_report+0xe4/0x120
kasan_check_range+0x105/0x1b0
__asan_memcpy+0x23/0x60
kmemdup_noprof+0x44/0x60
xfrm_state_migrate+0x70a/0x1da0
xfrm_migrate+0x753/0x18a0
xfrm_do_migrate+0xb47/0xf10
xfrm_user_rcv_msg+0x411/0xb50
netlink_rcv_skb+0x158/0x420
xfrm_netlink_rcv+0x71/0x90
netlink_unicast+0x584/0x850
netlink_sendmsg+0x8b0/0xdc0
____sys_sendmsg+0x9f7/0xb90
___sys_sendmsg+0x134/0x1d0
__sys_sendmsg+0x16d/0x220
do_syscall_64+0x116/0x7d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Allocated by task 9287:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0xaa/0xb0
pfkey_add+0x2652/0x2ea0
pfkey_process+0x6d0/0x830
pfkey_sendmsg+0x42c/0x850
__sys_sendto+0x461/0x4b0
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0x116/0x7d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ff11000025a74980
which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
allocated 68-byte region [ff11000025a74980, ff11000025a749c4)
Depending on the uninitialized value the same field can instead request
an oversized kmemdup() allocation and make the migration clone fail.
The XFRM netlink path is not affected: verify_one_alg() rejects an
XFRMA_ALG_COMP attribute shorter than xfrm_alg_len(), so a calg added via
XFRM_MSG_NEWSA is always self-consistent.
Initialize calg->alg_key_len to 0, matching the aalg/ealg branches.
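For illustration, the length arithmetic can be reproduced in user space. This is a sketch only: struct algo_hdr and algo_len() are hypothetical stand-ins that mirror the 68-byte fixed part of the object and the xfrm_alg_len() formula quoted above.

```c
#include <stddef.h>

/*
 * Hypothetical mirror of the fixed-size part of the algorithm object:
 * a 64-byte name followed by a 32-bit key length expressed in bits.
 */
struct algo_hdr {
	char alg_name[64];
	unsigned int alg_key_len;	/* in bits */
};

/* Same formula as quoted for xfrm_alg_len(): header plus key bytes. */
static size_t algo_len(const struct algo_hdr *alg)
{
	return sizeof(*alg) + (alg->alg_key_len + 7) / 8;
}
```

With alg_key_len == 0 this yields 68 bytes, the size of the allocation. A stale value of 32768 bits would yield 68 + 4096 = 4164 bytes, which matches the read size in the KASAN report; the actual garbage value in that run is not known, and any value from 32761 to 32768 gives the same length.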
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Cc: stable@vger.kernel.org
Signed-off-by: Zijing Yin <yzjaurora@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/key/af_key.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 9cffeef18cd9..3216f897a305 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1218,6 +1218,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
goto out;
}
strcpy(x->calg->alg_name, a->name);
+ x->calg->alg_key_len = 0;
x->props.calgo = sa->sadb_sa_encrypt;
} else {
int keysize = 0;
--
2.43.0
* [PATCH 6/7] espintcp: use sk_msg_free_partial to fix partial send
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>
From: Sabrina Dubroca <sd@queasysnail.net>
sk_msg_free_partial() ensures consistency of the skmsg at every
iteration, without having to manually handle uncharges and offsets.
This simplifies the code, and fixes some bugs in skmsg accounting when
we don't send the full contents.
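The idea of consuming sent bytes from the front of the message on every iteration can be sketched in user space as follows. This is not the kernel's sk_msg implementation (which also uncharges socket memory and releases page references); struct seg, struct msg and msg_consume() are hypothetical names, and the sketch assumes msg.size always equals the sum of the remaining segment lengths.

```c
#include <stddef.h>

#define MAX_SEGS 4

struct seg {
	size_t offset;	/* first unsent byte within the segment */
	size_t length;	/* unsent bytes left in the segment */
};

struct msg {
	struct seg segs[MAX_SEGS];
	size_t start;	/* index of the first unsent segment */
	size_t size;	/* total unsent bytes across all segments */
};

/* Drop 'bytes' already-sent bytes from the front of the message. */
static void msg_consume(struct msg *m, size_t bytes)
{
	while (bytes && m->size) {
		struct seg *s = &m->segs[m->start];
		size_t n = bytes < s->length ? bytes : s->length;

		s->offset += n;
		s->length -= n;
		m->size -= n;
		bytes -= n;
		if (s->length == 0)
			m->start++;	/* segment fully sent, move on */
	}
}
```

A send loop built on this always transmits from segs[start] and calls msg_consume() with whatever the transport accepted, so a short send leaves the message in a consistent state without separate offset or "done" counters.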
Cc: stable@vger.kernel.org
Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Reported-by: Aaron Esau <aaron1esau@gmail.com>
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/espintcp.c | 34 +++++++---------------------------
1 file changed, 7 insertions(+), 27 deletions(-)
diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index d9035546375e..374e1b964438 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -212,43 +212,23 @@ static int espintcp_sendskmsg_locked(struct sock *sk,
struct sk_msg *skmsg = &emsg->skmsg;
bool more = flags & MSG_MORE;
struct scatterlist *sg;
- int done = 0;
int ret;
- sg = &skmsg->sg.data[skmsg->sg.start];
do {
struct bio_vec bvec;
- size_t size = sg->length - emsg->offset;
- int offset = sg->offset + emsg->offset;
- struct page *p;
-
- emsg->offset = 0;
+ sg = &skmsg->sg.data[skmsg->sg.start];
if (sg_is_last(sg) && !more)
msghdr.msg_flags &= ~MSG_MORE;
- p = sg_page(sg);
-retry:
- bvec_set_page(&bvec, p, size, offset);
- iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
- ret = tcp_sendmsg_locked(sk, &msghdr, size);
- if (ret < 0) {
- emsg->offset = offset - sg->offset;
- skmsg->sg.start += done;
+ bvec_set_page(&bvec, sg_page(sg), sg->length, sg->offset);
+ iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, sg->length);
+ ret = tcp_sendmsg_locked(sk, &msghdr, sg->length);
+ if (ret < 0)
return ret;
- }
-
- if (ret != size) {
- offset += ret;
- size -= ret;
- goto retry;
- }
- done++;
- put_page(p);
- sk_mem_uncharge(sk, sg->length);
- sg = sg_next(sg);
- } while (sg);
+ sk_msg_free_partial(sk, skmsg, ret);
+ } while (skmsg->sg.size);
memset(emsg, 0, sizeof(*emsg));
--
2.43.0
* [PATCH 0/7] pull request (net): ipsec 2026-06-22
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
1) xfrm: use compat translator only for u64 alignment mismatch
Gate the XFRM_USER_COMPAT translator on COMPAT_FOR_U64_ALIGNMENT
so 32-bit compat tasks on arches whose 32-bit ABI already matches
the native 64-bit layout are no longer rejected with -EOPNOTSUPP.
From Sanman Pradhan.
2) net: af_key: initialize alg_key_len for IPComp states
Initialize the alg_key_len to 0 in the IPComp branch of
pfkey_msg2xfrm_state() so an uninitialized value cannot drive
xfrm_alg_len() into a slab-out-of-bounds kmemdup during
XFRM_MSG_MIGRATE. From Zijing Yin.
3) xfrm: Fix dev use-after-free in xfrm async resumption
Stash the original skb->dev and extend the RCU critical section
across xfrm_rcv_cb() and transport_finish() to prevent a
tunnel-device UAF and original-device refcount leak when a
callback replaces skb->dev. From Dong Chenchen.
4) xfrm: Fix xfrm state cache insertion race
Move the state-validity check inside xfrm_state_lock in the
input state cache insertion path so a state cannot be killed
between the check and the insert. From Herbert Xu.
5) xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
Add READ_ONCE()/WRITE_ONCE() annotations on xfrm_policy_count
and xfrm_policy_default to silence the KCSAN data race reported
on net->xfrm.policy_count. From Eric Dumazet.
6) espintcp: use sk_msg_free_partial to fix partial send
Replace the manual skmsg accounting in espintcp with
sk_msg_free_partial() so the skmsg stays consistent on every
iteration and the partial-send accounting bugs go away.
From Sabrina Dubroca.
7) xfrm: validate selector family and prefixlen during match
Reject mismatched address families in xfrm_selector_match() and
bound prefixlen in addr4_match()/addr_match() to prevent the
shift-out-of-bounds syzbot reported when an AF_UNSPEC selector
with a large prefixlen is matched against an IPv4 flow.
From Eric Dumazet.
Please pull or let me know if there are problems.
Thanks!
The following changes since commit 9bf10032894f429b3e221de63cf95a8544511a90:
Merge branch 'tipc-fix-netlink-gate-and-receive-path-bugs' (2026-06-11 16:01:19 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git tags/ipsec-2026-06-22
for you to fetch changes up to 40f0b1047918539f0b0f795ac65e35336b4c2c78:
xfrm: validate selector family and prefixlen during match (2026-06-17 11:17:27 +0200)
----------------------------------------------------------------
ipsec-2026-06-22
----------------------------------------------------------------
Dong Chenchen (1):
xfrm: Fix dev use-after-free in xfrm async resumption
Eric Dumazet (2):
xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
xfrm: validate selector family and prefixlen during match
Herbert Xu (1):
xfrm: Fix xfrm state cache insertion race
Sabrina Dubroca (1):
espintcp: use sk_msg_free_partial to fix partial send
Sanman Pradhan (1):
xfrm: use compat translator only for u64 alignment mismatch
Zijing Yin (1):
net: af_key: initialize alg_key_len for IPComp states
include/net/xfrm.h | 15 +++++++++++----
net/ipv4/xfrm4_input.c | 2 --
net/ipv6/xfrm6_input.c | 2 --
net/key/af_key.c | 1 +
net/xfrm/espintcp.c | 34 +++++++---------------------------
net/xfrm/xfrm_input.c | 29 ++++++++++++++++-------------
net/xfrm/xfrm_policy.c | 27 +++++++++++++++------------
net/xfrm/xfrm_state.c | 23 +++++++++++++++--------
net/xfrm/xfrm_user.c | 20 ++++++++++----------
9 files changed, 75 insertions(+), 78 deletions(-)
* Re: Re: [PATCH net-next v8 3/6] net: stmmac: eic7700: make RGMII delay properties optional
From: Andrew Lunn @ 2026-06-22 7:52 UTC (permalink / raw)
To: 李志
Cc: Maxime Chevallier, devicetree, andrew+netdev, davem, edumazet,
kuba, robh, krzk+dt, conor+dt, netdev, pabeni, mcoquelin.stm32,
alexandre.torgue, rmk+kernel, pjw, palmer, aou, alex, linux-riscv,
linux-stm32, linux-arm-kernel, linux-kernel, ningyu, linmin,
pinkesh.vaghela, pritesh.patel, weishangjuan, horms, lee
In-Reply-To: <512b77d5.993b.19eed207fc9.Coremail.lizhi2@eswincomputing.com>
> I'm preparing a v9 of the series. The next revision will address the
> issues reported by Sashiko review, mainly DT binding schema and DTS
> warnings.
>
> Before I post v9, I'd like to check whether you have any concerns or
> suggestions regarding the driver changes.
From what I remember, I think the patch was O.K., but I've looked at
100s of other patches since then. The commit message sounds like the
basic design is correct.
Andrew
* Re: "ip help" output is an error
From: David Laight @ 2026-06-22 7:49 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Dmitri Seletski, netdev
In-Reply-To: <20260621082105.1196ef72@phoenix.local>
On Sun, 21 Jun 2026 08:21:05 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:
> On Sat, 20 Jun 2026 10:36:31 +0100
> Dmitri Seletski <drjoms@gmail.com> wrote:
>
> > Hello iproute2 maintainers,
> >
> > I am reporting an inconsistency regarding the exit status of the ip help
> > command.
> >
> > Current Behavior:
> > When running ip help, the command prints the help documentation to
> > stdout, but exits with a non-zero status (error). This causes issues in
> > shell scripts that rely on exit codes for control flow.
> >
> > Steps to reproduce:
> > bash
> >
> > # This returns "FAIL" because the exit code is non-zero
> > if ip help > /dev/null; then
> > echo "SUCCESS"
> > else
> > echo "FAIL"
> > fi
> >
> > Expected Behavior:
> > Since the command successfully performs the requested task (displaying
> > help information) and does not encounter a system error, it should
> > return an exit code of 0.
> >
> > Context:
> > This behavior breaks standard Bash logic for automation. For example:
> > ip help && echo "This will not execute"
> >
> > "ip help |grep br" - this will bring no result.
> >
> > Current version tested: iproute2-6.19.0
> >
> > Thank you for your time and for maintaining this tool.
> >
> > Regards,
> > Dmitri Seletski
> >
> >
>
> Yes, iproute2 doesn't do a great job of handling error codes
> with usage vs help. It's a bug and no one has bothered to fix it.
>
The version I've got does write(2, "Usage...", 972); exit(-1);
Changing it to do write(1, ...) is likely to break scripts, and making
it do exit(0) is likely to cause new scripts to fail on old systems.
The 'grep' works fine if you redirect stderr to stdout.
The exit(-1) is a bug; the parameter is only 8 bits and the high bit
is expected to be used to indicate abnormal termination (e.g. by a signal).
That should probably be changed to exit(1); there doesn't seem to be
a standard way to differentiate between command line errors and
operational ones.
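The 8-bit truncation is easy to observe. A minimal sketch, assuming a POSIX system; observed_exit_status() is a hypothetical helper, and it uses _exit() in the child only to avoid flushing stdio buffers twice (the same truncation applies to exit()):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that terminates with 'code' and return the status the
 * parent actually observes, or -1 if the child could not be created
 * or did not exit normally.
 */
static int observed_exit_status(int code)
{
	int wstatus;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);	/* child: only the low 8 bits reach the parent */
	if (waitpid(pid, &wstatus, 0) < 0 || !WIFEXITED(wstatus))
		return -1;
	return WEXITSTATUS(wstatus);
}
```

Calling observed_exit_status(-1) returns 255, which is what a shell sees in $? after such a program exits.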
David
* [PATCH v4] net: mvneta: re-enable percpu interrupt on resume
From: Yun Zhou @ 2026-06-22 7:43 UTC (permalink / raw)
To: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
bigeasy, clrkwllms, rostedt
Cc: netdev, linux-kernel, linux-rt-devel, yun.zhou
On Marvell MPIC platforms (Armada 370/XP/38x), mvneta uses a percpu
IRQ disable/enable scheme for NAPI: the ISR (mvneta_percpu_isr) calls
disable_percpu_irq() to mask the MPIC per-CPU interrupt and schedules
NAPI poll, which calls enable_percpu_irq() on completion to unmask.
If suspend occurs while NAPI poll is pending (between
disable_percpu_irq in the ISR and enable_percpu_irq in poll
completion), the interrupt is never re-enabled:
1. mvneta_percpu_isr: disable_percpu_irq() + napi_schedule()
=> MPIC masked, percpu_enabled cpumask bit cleared
2. NAPI poll does not complete before suspend proceeds
(on PREEMPT_RT this is highly likely since softirqs run in
ksoftirqd which gets frozen; on non-RT it can happen when
softirq processing is deferred to ksoftirqd)
3. mvneta_stop_dev => napi_disable(): cancels the pending poll
without executing the completion path
4. suspend_device_irqs => IRQCHIP_MASK_ON_SUSPEND: masks MPIC
(already masked, but records IRQS_SUSPENDED)
5. Resume: mpic_resume checks irq_percpu_is_enabled() => false
(bit was cleared in step 1) => skips unmask
6. mvneta_start_dev only restores device-level INTR_NEW_MASK,
does not touch the MPIC per-CPU mask
Result: MPIC per-CPU interrupt stays masked permanently. The NIC
generates interrupts (INTR_NEW_CAUSE != 0) but the CPU never
receives them, causing complete loss of network connectivity.
Fix by calling on_each_cpu(mvneta_percpu_enable) in the resume path
to unconditionally unmask the MPIC per-CPU interrupt regardless of
pre-suspend state.
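The sequence above can be modelled with a toy state machine. This is purely illustrative and does not use any kernel API: struct percpu_irq and the irq_*() helpers are hypothetical, with one flag standing for the percpu_enabled bookkeeping bit and one for the MPIC mask.

```c
#include <stdbool.h>

struct percpu_irq {
	bool enabled;	/* bookkeeping bit consulted on resume */
	bool masked;	/* hardware mask state */
};

/* ISR path: mask the interrupt and clear the bookkeeping bit. */
static void irq_disable(struct percpu_irq *irq)
{
	irq->enabled = false;
	irq->masked = true;
}

/* Poll completion (or the fix in resume): unmask and set the bit. */
static void irq_enable(struct percpu_irq *irq)
{
	irq->enabled = true;
	irq->masked = false;
}

/* Suspend masks the line regardless of its bookkeeping state. */
static void irq_suspend(struct percpu_irq *irq)
{
	irq->masked = true;
}

/* Resume only unmasks lines that are still marked enabled. */
static void irq_resume(struct percpu_irq *irq)
{
	if (irq->enabled)
		irq->masked = false;
}
```

Running irq_disable(), irq_suspend() and irq_resume() in that order, with no irq_enable() in between because the poll was cancelled, leaves masked set; an unconditional irq_enable() afterwards, as the patch does in the resume path, clears it.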
Fixes: 12bb03b436da ("net: mvneta: Handle per-cpu interrupts")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v4:
- Rewrite commit message with accurate root cause analysis.
v3:
- Dropped the free_irq/request_irq approach (incorrect root cause).
- Instead, call on_each_cpu(mvneta_percpu_enable) in the resume path
to ensure the MPIC percpu IRQ is unmasked, matching mvneta_open().
- Updated commit message with correct root cause analysis.
v2:
- Move request_irq before cpuhp registration in resume (matching
mvneta_open ordering) so that failure does not leave cpuhp
callbacks registered on a non-functional device.
- On request_irq failure, call netif_device_detach() to prevent
further traffic on the dead interface.
drivers/net/ethernet/marvell/mvneta.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 488f2663ad2c..543e566425c1 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5918,6 +5918,9 @@ static int mvneta_resume(struct device *device)
rtnl_unlock();
mvneta_set_rx_mode(dev);
+ if (!pp->neta_armada3700)
+ on_each_cpu(mvneta_percpu_enable, pp, true);
+
return 0;
}
#endif
--
2.43.0
* Re: [syzbot] [net?] INFO: task hung in nsim_destroy (4)
From: syzbot @ 2026-06-22 7:42 UTC (permalink / raw)
To: andrew+netdev, andrew, davem, edumazet, kuba, linux-kernel,
netdev, pabeni, syzkaller-bugs
In-Reply-To: <000000000000f9be320619be1c0a@google.com>
syzbot has found a reproducer for the following issue on:
HEAD commit: b85966adbf5d Merge tag 'net-next-7.2' of git://git.kernel...
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11f167b6580000
kernel config: https://syzkaller.appspot.com/x/.config?x=9a9f723a32776544
dashboard link: https://syzkaller.appspot.com/bug?extid=8141dcbd23a8f857798a
compiler: Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15cf400a580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1013a50e580000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/d65306d96573/disk-b85966ad.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/ef43139aab0e/vmlinux-b85966ad.xz
kernel image: https://storage.googleapis.com/syzbot-assets/26d4d1ab67c3/bzImage-b85966ad.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8141dcbd23a8f857798a@syzkaller.appspotmail.com
INFO: task kworker/R-netns:8 blocked for more than 140 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/R-netns state:D stack:27768 pid:8 tgid:8 ppid:2 task_flags:0x4208060 flags:0x00080000
Workqueue: netns cleanup_net
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5504 [inline]
__schedule+0x17d9/0x56c0 kernel/sched/core.c:7228
__schedule_loop kernel/sched/core.c:7307 [inline]
schedule+0x164/0x360 kernel/sched/core.c:7322
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7379
__mutex_lock_common kernel/locking/mutex.c:726 [inline]
__mutex_lock+0x7bf/0x1550 kernel/locking/mutex.c:821
rtnl_net_lock include/linux/rtnetlink.h:130 [inline]
rtnl_net_dev_lock+0x257/0x2f0 net/core/dev.c:2163
unregister_netdevice_notifier_dev_net+0x96/0x450 net/core/dev.c:2208
nsim_destroy+0xfd/0x800 drivers/net/netdevsim/netdev.c:1183
__nsim_dev_port_del+0x14e/0x200 drivers/net/netdevsim/dev.c:1547
nsim_dev_port_del_all drivers/net/netdevsim/dev.c:1561 [inline]
nsim_dev_reload_destroy+0x288/0x490 drivers/net/netdevsim/dev.c:1785
nsim_dev_reload_down+0x8a/0xc0 drivers/net/netdevsim/dev.c:1038
devlink_reload+0x1c5/0x890 net/devlink/dev.c:462
devlink_pernet_pre_exit+0x1ff/0x420 net/devlink/core.c:560
ops_pre_exit_list net/core/net_namespace.c:161 [inline]
ops_undo_list+0x17d/0x8d0 net/core/net_namespace.c:234
cleanup_net+0x572/0x810 net/core/net_namespace.c:702
process_one_work kernel/workqueue.c:3314 [inline]
process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
rescuer_thread+0x7b6/0x10b0 kernel/workqueue.c:3621
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
INFO: task kworker/0:1:10 blocked for more than 142 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:1 state:D stack:27096 pid:10 tgid:10 ppid:2 task_flags:0x4208060 flags:0x00080000
Workqueue: events request_firmware_work_func
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5504 [inline]
__schedule+0x17d9/0x56c0 kernel/sched/core.c:7228
__schedule_loop kernel/sched/core.c:7307 [inline]
schedule+0x164/0x360 kernel/sched/core.c:7322
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7379
__mutex_lock_common kernel/locking/mutex.c:726 [inline]
__mutex_lock+0x7bf/0x1550 kernel/locking/mutex.c:821
regdb_fw_cb+0x7d/0x1c0 net/wireless/reg.c:1005
request_firmware_work_func+0xf2/0x1a0 drivers/base/firmware_loader/main.c:1152
process_one_work kernel/workqueue.c:3314 [inline]
process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
INFO: task kworker/u8:0:12 blocked for more than 143 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:0 state:D stack:23864 pid:12 tgid:12 ppid:2 task_flags:0x4208060 flags:0x00080000
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5504 [inline]
__schedule+0x17d9/0x56c0 kernel/sched/core.c:7228
__schedule_loop kernel/sched/core.c:7307 [inline]
schedule+0x164/0x360 kernel/sched/core.c:7322
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7379
__mutex_lock_common kernel/locking/mutex.c:726 [inline]
__mutex_lock+0x7bf/0x1550 kernel/locking/mutex.c:821
rtnl_net_lock include/linux/rtnetlink.h:130 [inline]
addrconf_dad_work+0x116/0x15c0 net/ipv6/addrconf.c:4223
</TASK>
INFO: task syz-executor831:5632 blocked for more than 144 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor831 state:R running task stack:22336 pid:5632 tgid:5632 ppid:5629 task_flags:0x400140 flags:0x00080000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5504 [inline]
__schedule+0x17d9/0x56c0 kernel/sched/core.c:7228
</TASK>
INFO: lockdep is turned off.
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 31 Comm: khungtaskd Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
nmi_cpu_backtrace+0x274/0x2d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x17a/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
__sys_info lib/sys_info.c:157 [inline]
sys_info+0x135/0x170 lib/sys_info.c:165
check_hung_uninterruptible_tasks kernel/hung_task.c:353 [inline]
watchdog+0xfd7/0x1030 kernel/hung_task.c:561
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 808 Comm: kworker/0:2 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Workqueue: wg-crypt-wg0 wg_packet_encrypt_worker
RIP: 0010:memset_orig+0x25/0xb0 arch/x86/lib/memset_64.S:64
Code: 90 90 90 90 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01 01 01 48 0f af c1 41 89 f9 41 83 e1 07 75 74 48 89 d1 48 c1 e9 06 <74> 39 66 0f 1f 84 00 00 00 00 00 48 ff c9 48 89 07 48 89 47 08 48
RSP: 0018:ffffc90000006f20 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffc90000007078 RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000000 RDI: ffffc90000007078
RBP: 1ffffffff2189698 R08: ffffc90000007087 R09: 0000000000000000
R10: ffffc90000007078 R11: fffff52000000e11 R12: dffffc0000000000
R13: ffffffff90c4b4c0 R14: ffffc90000007028 R15: ffffc90000007070
FS: 0000000000000000(0000) GS:ffff88812527c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055968609b450 CR3: 000000000e746000 CR4: 00000000003526f0
Call Trace:
<IRQ>
unwind_next_frame+0xd04/0x2550 arch/x86/kernel/unwind_orc.c:621
__unwind_start+0x514/0x660 arch/x86/kernel/unwind_orc.c:787
unwind_start arch/x86/include/asm/unwind.h:64 [inline]
arch_stack_walk+0xe3/0x150 arch/x86/kernel/stacktrace.c:24
stack_trace_save+0xa9/0x100 kernel/stacktrace.c:122
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2700 [inline]
slab_free mm/slub.c:6310 [inline]
kmem_cache_free+0x182/0x650 mm/slub.c:6437
nft_synproxy_eval_v4+0x383/0x530 net/netfilter/nft_synproxy.c:61
nft_synproxy_do_eval+0x335/0x550 net/netfilter/nft_synproxy.c:142
expr_call_ops_eval net/netfilter/nf_tables_core.c:237 [inline]
nft_do_chain+0x48d/0x1b10 net/netfilter/nf_tables_core.c:285
nft_do_chain_inet+0x360/0x4b0 net/netfilter/nft_chain_filter.c:162
nf_hook_entry_hookfn include/linux/netfilter.h:158 [inline]
nf_hook_slow+0xc5/0x220 net/netfilter/core.c:619
nf_hook include/linux/netfilter.h:273 [inline]
NF_HOOK+0x21f/0x3c0 include/linux/netfilter.h:316
NF_HOOK+0x336/0x3c0 include/linux/netfilter.h:318
__netif_receive_skb_one_core net/core/dev.c:6206 [inline]
__netif_receive_skb net/core/dev.c:6319 [inline]
process_backlog+0xa34/0x1860 net/core/dev.c:6670
__napi_poll+0xaa/0x330 net/core/dev.c:7729
napi_poll net/core/dev.c:7792 [inline]
net_rx_action+0x61d/0xf50 net/core/dev.c:7949
handle_softirqs+0x225/0x840 kernel/softirq.c:622
do_softirq+0x76/0xd0 kernel/softirq.c:523
</IRQ>
<TASK>
__local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
spin_unlock_bh include/linux/spinlock.h:396 [inline]
ptr_ring_consume_bh include/linux/ptr_ring.h:393 [inline]
wg_packet_encrypt_worker+0x16e2/0x1760 drivers/net/wireguard/send.c:293
process_one_work kernel/workqueue.c:3314 [inline]
process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.314 msecs
---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
* Re: [PATCH iwl-net] idpf: fix max_vport related crash on allocation error during init
From: Simon Horman @ 2026-06-22 7:30 UTC (permalink / raw)
To: Emil Tantilov
Cc: intel-wired-lan, netdev, anthony.l.nguyen, przemyslaw.kitszel,
andrew+netdev, davem, edumazet, kuba, pabeni, madhu.chittim
In-Reply-To: <20260618192325.8694-1-emil.s.tantilov@intel.com>
On Thu, Jun 18, 2026 at 12:23:25PM -0700, Emil Tantilov wrote:
> Set adapter->max_vports only after successful allocation of vports, netdevs
> and vport_config buffers. This fixes possible crashes on reset or rmmod,
> following failed allocation on init.
>
> [ 305.981402] idpf 0000:83:00.0: enabling device (0100 -> 0102)
> [ 305.994464] idpf 0000:83:00.0: Device HW Reset initiated
> [ 320.416872] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 320.416918] #PF: supervisor read access in kernel mode
> [ 320.416942] #PF: error_code(0x0000) - not-present page
> [ 320.416963] PGD 2099657067 P4D 0
> [ 320.416983] Oops: Oops: 0000 [#1] SMP NOPTI
> ...
> [ 320.417093] RIP: 0010:idpf_remove+0x118/0x200 [idpf]
> [ 320.417130] Code: 8b bb 98 09 00 00 e8 17 0f 5b e5 48 8b bb e8 08 00 00 e8 0b 0f 5b e5 66 83 bb 28 06 00 00 00 48 8b bb 20 06 00 00 74 49 31 ed <48> 8b 04 ef 48 85 c0 74 2f 48 8b 78 20 e8 66 58 91 e5 48 8b 83 20
> [ 320.417183] RSP: 0018:ff7322212903fdb8 EFLAGS: 00010246
> [ 320.417205] RAX: 0000000000000000 RBX: ff4463de40300000 RCX: ff7322212903fd4c
> [ 320.417228] RDX: 0000000000000001 RSI: ffffffffa7f7d100 RDI: 0000000000000000
> [ 320.417250] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [ 320.417272] R10: 0000000000000001 R11: ff4463de3a638f58 R12: ff4463be89ac7000
> [ 320.417294] R13: ff4463be89ac7198 R14: ff4463be94fc7198 R15: ffffffffc0f10f20
> [ 320.417317] FS: 00007f963c0e6740(0000) GS:ff4463fdd65d8000(0000) knlGS:0000000000000000
> [ 320.417342] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 320.417362] CR2: 0000000000000000 CR3: 00000020ba674002 CR4: 0000000000773ef0
> [ 320.417385] PKRU: 55555554
> [ 320.417398] Call Trace:
> [ 320.417412] <TASK>
> [ 320.417429] pci_device_remove+0x42/0xb0
> [ 320.417459] device_release_driver_internal+0x1a9/0x210
> [ 320.417492] driver_detach+0x4b/0x90
> [ 320.417516] bus_remove_driver+0x70/0x100
> [ 320.417539] pci_unregister_driver+0x2e/0xb0
> [ 320.417564] __do_sys_delete_module.constprop.0+0x190/0x2f0
> [ 320.417592] ? kmem_cache_free+0x31e/0x550
> [ 320.417619] ? lockdep_hardirqs_on_prepare+0xde/0x190
> [ 320.417644] ? do_syscall_64+0x38/0x6b0
> [ 320.417665] do_syscall_64+0xc8/0x6b0
> [ 320.417683] ? clear_bhb_loop+0x30/0x80
> [ 320.417706] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 320.417727] RIP: 0033:0x7f963bb30beb
>
> Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
FTR, an AI-generated review of this patch is available on sashiko.dev.
I think the issue raised there can be looked at as a possible
follow-up.
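
For readers following along, the ordering described in the commit message
can be sketched in userspace C. This is only an illustration under stated
assumptions, not the idpf code: struct adapter, sketch_calloc(),
adapter_alloc_vports() and adapter_remove() are hypothetical stand-ins, and
the allocation failure is injected through a counter. The point it shows is
that the count is published only after all three buffers exist, so a
teardown loop bounded by that count never touches a NULL array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the driver's adapter state. */
struct adapter {
	void **vports;
	void **netdevs;
	void **vport_config;
	uint16_t max_vports;
};

/* Failure injection for the sketch: negative means "never fail". */
static int allocs_before_failure = -1;

static void *sketch_calloc(size_t n, size_t size)
{
	if (allocs_before_failure == 0)
		return NULL;
	if (allocs_before_failure > 0)
		allocs_before_failure--;
	return calloc(n, size);
}

/* Allocate all three arrays; publish the count only once all succeeded. */
static int adapter_alloc_vports(struct adapter *a, uint16_t num)
{
	a->vports = sketch_calloc(num, sizeof(*a->vports));
	if (!a->vports)
		return -1;

	a->netdevs = sketch_calloc(num, sizeof(*a->netdevs));
	if (!a->netdevs)
		goto err_netdevs;

	a->vport_config = sketch_calloc(num, sizeof(*a->vport_config));
	if (!a->vport_config)
		goto err_config;

	a->max_vports = num;	/* set last, after every allocation succeeded */
	return 0;

err_config:
	free(a->netdevs);
	a->netdevs = NULL;
err_netdevs:
	free(a->vports);
	a->vports = NULL;
	return -1;
}

/* Teardown bounded by max_vports; returns how many slots it visited. */
static unsigned int adapter_remove(struct adapter *a)
{
	unsigned int visited = 0;
	uint16_t i;

	for (i = 0; i < a->max_vports; i++) {
		free(a->vports[i]);	/* safe: the array exists whenever max_vports > 0 */
		visited++;
	}

	free(a->vport_config);
	free(a->netdevs);
	free(a->vports);
	a->vports = a->netdevs = a->vport_config = NULL;
	a->max_vports = 0;
	return visited;
}
```

With allocs_before_failure set to 1, the second allocation fails,
max_vports stays 0 and adapter_remove() visits no slots. Had the count been
set before the allocations, the same loop would index a NULL vports array,
which is consistent with the NULL dereference in the quoted oops.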
* Re: [PATCH v1 net] ipv4: fib: Don't ignore error route in local/main tables.
From: Ido Schimmel @ 2026-06-22 7:05 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David Ahern, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260619212753.3367244-1-kuniyu@google.com>
On Fri, Jun 19, 2026 at 09:27:20PM +0000, Kuniyuki Iwashima wrote:
> When CONFIG_IP_MULTIPLE_TABLES is enabled but no rule is added,
> fib_lookup() performs route lookup directly on two tables.
>
> Since the first lookup does not properly bail out, the result
> of an error route in the merged local/main table could be
> overwritten by another route in the default table:
>
> # unshare -n
> # ip link set lo up
> # ip route add 192.168.0.0/24 dev lo table 253
> # ip route add unreachable 192.168.0.0/24
> # ip route get 192.168.0.1
> 192.168.0.1 dev lo table default uid 0
> cache <local>
>
> Once a random rule is added, the error route is respected:
>
> # ip rule add table 0
> # ip rule del table 0
> # ip route get 192.168.0.1
> RTNETLINK answers: No route to host
>
> Let's fix the inconsistent behaviour.
>
> Fixes: f4530fa574df ("ipv4: Avoid overhead when no custom FIB rules are installed.")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
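
The behaviour described in the commit message can be modelled with a small
userspace C sketch. It is a simplified illustration, not the kernel's
fib_lookup(): struct table, table_lookup() and lookup_two_tables() are
hypothetical names, each table holds at most one matching route, and the
bail-out condition shown for the fixed case is an assumption about the
shape of the fix rather than a copy of the patch. A lookup returns 0 for a
usable route, -EHOSTUNREACH for an unreachable route, and -EAGAIN when
nothing matches.

```c
#include <errno.h>

/* What a table holds for the destination being looked up. */
enum route_kind { ROUTE_NONE, ROUTE_UNICAST, ROUTE_UNREACHABLE };

/* Hypothetical one-entry table, enough to model the two-table lookup. */
struct table {
	enum route_kind match;
};

static int table_lookup(const struct table *tb)
{
	switch (tb->match) {
	case ROUTE_UNICAST:
		return 0;
	case ROUTE_UNREACHABLE:
		return -EHOSTUNREACH;	/* error route: a definitive answer */
	default:
		return -EAGAIN;		/* no match: try the next table */
	}
}

/*
 * Look up the merged local/main table first, then the default table.
 * stop_on_error == 0 models the reported behaviour (only success stops
 * the walk); stop_on_error != 0 models the assumed fix (anything other
 * than "no match" stops the walk).
 */
static int lookup_two_tables(const struct table *main_tb,
			     const struct table *default_tb,
			     int stop_on_error)
{
	int err = table_lookup(main_tb);

	if (stop_on_error ? err != -EAGAIN : err == 0)
		goto out;

	err = table_lookup(default_tb);
out:
	if (err == -EAGAIN)
		err = -ENETUNREACH;
	return err;
}
```

With an unreachable route in the first table and a unicast route in the
default table, the stop_on_error == 0 variant returns 0 (the error route is
overwritten), while the other variant returns -EHOSTUNREACH, matching the
"No route to host" answer quoted above.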