* Re: [PATCH net-next v2 14/14] net: macb: use context swapping in .ndo_change_mtu()
From: Jakub Kicinski @ 2026-04-14 0:56 UTC
To: theo.lebrun
Cc: Jakub Kicinski, nicolas.ferre, claudiu.beznea, andrew+netdev,
davem, edumazet, pabeni, richardcochran, linux, pvalerio, conor,
nb, vladimir.kondratiev, gregory.clement, benoit.monin,
tawfik.bayouk, thomas.petazzoni, maxime.chevallier, netdev,
linux-kernel
In-Reply-To: <20260410-macb-context-v2-14-af39f71d40b6@bootlin.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: macb: use context swapping in .ndo_change_mtu()
This commit introduces a mechanism to perform context swapping when changing
the MTU while the interface is running. It allocates a new context,
reconfigures the hardware via a start and end swap process, and then frees
the old context to avoid PHY reinitialization and gracefully handle memory
pressure.
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index d1536cb1e40bb..79371718f8392 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -3430,11 +3430,25 @@ static int macb_close(struct net_device *netdev)
>
> static int macb_change_mtu(struct net_device *netdev, int new_mtu)
> {
> - if (netif_running(netdev))
> - return -EBUSY;
> + struct macb *bp = netdev_priv(netdev);
> + bool running = netif_running(netdev);
> + struct macb_context *new_ctx;
> +
> + if (running) {
> + new_ctx = macb_context_alloc(bp, new_mtu,
> + bp->configured_rx_ring_size,
> + bp->configured_tx_ring_size);
> + if (IS_ERR(new_ctx))
> + return PTR_ERR(new_ctx);
> +
> + macb_context_swap_start(bp);
Does macb_context_swap_start() gracefully halt the transmitter before the
context is freed?
It appears to attempt disabling the hardware transmitter by clearing the
TE and RE bits in the NCR register, but does not poll the hardware to ensure
it has reached an idle state.
Immediately after, macb_context_swap_end() frees the descriptor rings and data
buffers via macb_free_consistent().
If the MAC DMA engine is still mid-transfer, could it read or write to kernel
memory that has already been freed or reallocated?
Can this code cause an AB-BA deadlock between napi_disable() and
cancel_work_sync()?
In macb_context_swap_start(), the code calls napi_disable() for all queues,
and then immediately calls cancel_work_sync() on the error tasks.
Both of these work items internally invoke napi_disable().
If either error task runs concurrently, it will block forever inside its
napi_disable() waiting for the NAPI state bits (which were already set by the
first call in macb_context_swap_start()) to clear.
Consequently, could cancel_work_sync() hang indefinitely waiting for the
blocked work task to finish?
> }
>
> WRITE_ONCE(netdev->mtu, new_mtu);
>
> + if (running)
> + macb_context_swap_end(bp, new_ctx);
Does macb_context_swap_end() leak pending TX buffers and their DMA mappings?
When changing the MTU while the interface is running, macb_context_swap_end()
calls macb_free_consistent() to free the old DMA context. While RX buffers are
properly freed, for the TX queues it appears to only free the tracking array
pointer without iterating through the ring to unmap the DMA mappings and free
the actual buffers.
Any packets queued in the hardware ring at the moment of the MTU swap might
be permanently leaked, which could lead to memory and IOMMU space exhaustion
under active traffic.
Will MTU changes on EMAC devices permanently disable hardware interrupts?
macb_context_swap_start() disables all interrupts unconditionally by writing
-1 to the IDR register.
However, macb_context_swap_end() restores interrupts via the IER register
inside an explicit check:
if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC)) {
...
}
Since there is no else branch to re-enable interrupts for EMAC devices,
could changing the MTU on these devices leave interrupts permanently masked
and cause the network interface to silently stall?
Is there a missing spinlock when modifying the IER and NCR registers in
macb_context_swap_end()?
Modifications to the NCR and interrupt registers in macb_context_swap_start()
are protected by the spinlock. However, macb_context_swap_end() performs
lockless read-modify-write operations on both IER and NCR:
ctrl = macb_readl(bp, NCR);
macb_writel(bp, NCR, ctrl | MACB_BIT(RE) | MACB_BIT(TE));
If concurrent hardware events or background tasks take the lock and modify
NCR during this window, could their updates be silently overwritten?
> +
> return 0;
> }
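The lockless read-modify-write concern raised above can be modelled in
userspace. This is only a sketch under stated assumptions: the bit positions
and the BIT_OTHER flag are illustrative stand-ins, not the real NCR layout,
and the "concurrent" update is replayed deterministically inside one function
rather than on a second CPU.

```c
#include <stdint.h>

/* Illustrative bit positions; assumptions, not taken from the driver. */
#define BIT_RE    (1u << 2)
#define BIT_TE    (1u << 3)
#define BIT_OTHER (1u << 4)	/* hypothetical bit set by another context */

/* Models an update landing between the read and the write-back. */
static inline uint32_t racy_enable(uint32_t ncr)
{
	uint32_t ctrl = ncr;		/* ctrl = macb_readl(bp, NCR) */

	ncr |= BIT_OTHER;		/* other context updates NCR in the window */
	ncr = ctrl | BIT_RE | BIT_TE;	/* stale snapshot is written back */
	return ncr;
}

/* Models the same two updates when they are serialized by a lock. */
static inline uint32_t serialized_enable(uint32_t ncr)
{
	uint32_t ctrl;

	ncr |= BIT_OTHER;		/* other context completes first */
	ctrl = ncr;			/* read happens after that update */
	ncr = ctrl | BIT_RE | BIT_TE;
	return ncr;
}
```

In the racy ordering the BIT_OTHER update is silently dropped, while the
serialized ordering keeps all three bits.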
* Re: [Intel-wired-lan] [PATCH iwl-net v2 2/6] ixgbe: add bounds check for debugfs register access
From: Jacob Keller @ 2026-04-14 1:00 UTC
To: Simon Horman, Aleksandr Loktionov
Cc: intel-wired-lan, anthony.l.nguyen, netdev, Paul Greenwalt
In-Reply-To: <20260413103050.GL469338@kernel.org>
On 4/13/2026 3:30 AM, Simon Horman wrote:
> On Wed, Apr 08, 2026 at 03:11:50PM +0200, Aleksandr Loktionov wrote:
>> From: Paul Greenwalt <paul.greenwalt@intel.com>
>>
>> Prevent out-of-bounds MMIO accesses triggered through user-controlled
>> register offsets. IXGBE_HFDR (0x15FE8) is the highest valid MMIO
>> register in the ixgbe register map; any offset beyond it would address
>> unmapped memory.
>>
>> Add a defense-in-depth check at two levels:
>>
>> 1. ixgbe_read_reg() -- the noinline register read accessor. A
>> WARN_ON_ONCE() guard here catches any future code path (including
>> ioctl extensions) that might inadvertently pass an out-of-range
>> offset without relying on higher layers to catch it first.
>> ixgbe_write_reg() is a static inline called from the TX/RX hot path;
>> adding WARN_ON_ONCE there would inline the check at every call site,
>> so only the read path gets this guard.
>>
>> 2. ixgbe_dbg_reg_ops_write() -- the debugfs 'reg_ops' interface is the
>> only current path where a raw, user-supplied offset enters the driver.
>> Gating it before invoking the register accessors provides a clean,
>> user-visible failure (silent ignore with no kernel splat) for
>> deliberately malformed debugfs writes.
>>
>> Add a reg <= IXGBE_HFDR guard to both the read and write paths in
>> ixgbe_dbg_reg_ops_write(), and a WARN_ON_ONCE + early-return guard to
>> ixgbe_read_reg().
>>
>> Fixes: 91fbd8f081e2 ("ixgbe: added reg_ops file to debugfs")
>> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>> ---
>> v1 -> v2:
>> - Add Fixes: tag; reroute from iwl-next to iwl-net (security-relevant
>> hardening for user-controllable out-of-bounds MMIO).
>
> Thanks for the update.
>
> And sorry for not thinking to ask this earlier: this patch
> addresses possible overruns of the mapped address space if the
> supplied value for reg is too large. But do we also need a
> guard against underrun if the value for reg is too small?
>
I don't think so. This is bounds checking a register offset, which is an
unsigned 32-bit value that begins at 0, so the map goes from 0 to
IXGBE_HFDR. Since the value is unsigned, if it does underflow somehow it
would wrap to a large value and get caught by the check against
IXGBE_HFDR, right?
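A minimal userspace sketch of that reasoning follows. The helper names
reg_in_range() and reg_sub() are hypothetical and not part of the patch; only
the IXGBE_HFDR value (0x15FE8) comes from the quoted commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest valid register offset, per the quoted commit message. */
#define IXGBE_HFDR 0x15FE8u

/* Hypothetical helper: one upper-bound test covers an unsigned offset. */
static inline bool reg_in_range(uint32_t reg)
{
	return reg <= IXGBE_HFDR;
}

/* Hypothetical helper: models an offset computed as base - delta in u32. */
static inline uint32_t reg_sub(uint32_t base, uint32_t delta)
{
	return base - delta;	/* wraps modulo 2^32 when delta > base */
}
```

For example, reg_sub(0, 4) wraps to 0xFFFFFFFC, which reg_in_range() rejects,
so no separate lower-bound test is needed in this model.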
Thanks,
Jake
> ...
* Re: [RFC] Proposal: Add sysfs interface for PCIe TPH Steering Tag retrieval and configuration
From: fengchengwen @ 2026-04-14 1:07 UTC
To: Leon Romanovsky
Cc: Jason Gunthorpe, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
In-Reply-To: <20260413191930.GP21470@unreal>
On 4/14/2026 3:19 AM, Leon Romanovsky wrote:
> On Mon, Apr 13, 2026 at 08:04:10PM +0800, fengchengwen wrote:
>> On 4/13/2026 6:01 PM, Leon Romanovsky wrote:
>>> On Fri, Apr 10, 2026 at 10:30:52PM +0800, fengchengwen wrote:
>>>> Hi all,
>>>>
>>>> I'm writing to propose adding a sysfs interface to expose and configure the
>>>> PCIe TPH
>>>> Steering Tag for PCIe devices, which is retrieved inside the kernel.
>>>>
>>>>
>>>> Background: The TPH Steering Tag is tightly coupled with both a PCIe device
>>>> (identified
>>>> by its BDF) and a CPU core. It can only be obtained in kernel mode. To allow
>>>> user-space
>>>> applications to fetch and set this value securely and conveniently, we need
>>>> a standard
>>>> kernel-to-user interface.
>>>>
>>>>
>>>> Proposed Solution: Add several sysfs attributes under each PCIe device's
>>>> sysfs directory:
>>>> 1. /sys/bus/pci/devices/<BDF>/tph_mode to query the TPH mode (interrupt or
>>>> device specific)
>>>> 2. /sys/bus/pci/devices/<BDF>/tph_enable to control the TPH feature
>>>> 3. /sys/bus/pci/devices/<BDF>/tph_st to support both read and write
>>>> operations, e.g.:
>>>> Read operation:
>>>> echo "cpu=3" > /sys/bus/pci/devices/0000:01:00.0/tph_st
>>>> cat /sys/bus/pci/devices/0000:01:00.0/tph_st
>>>> Write operation:
>>>> echo "index=10 st=123" > /sys/bus/pci/devices/0000:01:00.0/tph_st
>>>>
>>>>
>>>> The design strictly follows PCI subsystem sysfs standards and has the
>>>> following key properties:
>>>>
>>>> 1. Dynamic Visibility: The sysfs attributes will only be present for PCIe
>>>> devices that
>>>> support TPH Steering Tag. Devices without TPH capability will not show
>>>> these nodes,
>>>> avoiding unnecessary user confusion.
>>>>
>>>> 2. Permission Control: The attributes will use 0600 file permissions,
>>>> ensuring only
>>>> privileged root users can read or write them, which satisfies security
>>>> requirements
>>>> for hardware configuration interfaces.
>>>>
>>>> 3. Standard Implementation Location: The interface will be implemented in
>>>> drivers/pci/pci-sysfs.c, the canonical location for all PCI device sysfs
>>>> attributes,
>>>> ensuring consistency and maintainability within the PCI subsystem.
>>>>
>>>>
>>>> Why sysfs instead of alternatives like VFIO-PCI ioctl:
>>>>
>>>> - Universality: sysfs does not require binding the device to a special
>>>> driver such as
>>>> vfio-pci. It is available to any privileged user-space component,
>>>> including system
>>>> utilities, daemons, and monitoring tools.
>>>>
>>>> - Simplicity: Both user-space usage (cat/echo) and kernel implementation are
>>>> straightforward, reducing code complexity and long-term maintenance cost.
>>>>
>>>> - Design Alignment: TPH Steering Tag is a generic PCIe device feature, not
>>>> specific to
>>>> user-space drivers like DPDK or VFIO. Exposing it via sysfs matches the
>>>> kernel's
>>>> standard pattern for hardware capabilities.
>>>>
>>>>
>>>> I look forward to your comments about this design before submitting the
>>>> final patch.
>>>
>>> You need to explain more clearly why this write functionality is useful
>>> and necessary outside the VFIO/RDMA context:
>>> https://lore.kernel.org/all/20260324234615.3731237-1-zhipingz@meta.com/
>>>
>>> AFAIK, for non-VFIO TPH callers, kernel has enough knowledge to set
>>> right ST values.
>>>
>>> There are several comments regarding the implementation, but those can wait
>>> until the rationale behind the proposal is fully clarified.
>>
>> Thanks for your review and comments.
>>
>> Let me clarify the rationale behind this user-space sysfs interface:
>>
>> 1. VFIO is just one of the user-space device access frameworks.
>> There are many other in-kernel frameworks that expose devices
>> to user space, such as UIO, UACCE, etc., which may also require
>> TPH Steering Tag support.
>>
>> 2. The kernel can automatically program Steering Tags only when
>> the device provides a standard ST table in MSI-X or config space.
>> However, many devices implement vendor-specific or platform-specific
>> Steering Tag programming methods that cannot be fully handled
>> by the generic kernel code.
>>
>> 3. For such devices, user-space applications or framework drivers
>> need to retrieve and configure TPH Steering Tags directly.
>> A unified sysfs interface allows all user-space frameworks
>> (not just VFIO) to use a common, standard way to manage
>> TPH Steering Tags, rather than implementing duplicated logic
>> in each subsystem.
>>
>> This interface provides a uniform method for any user-space
>> device access solution to work with TPH, which is why I believe
>> it is useful and necessary beyond the VFIO/RDMA case.
>
> I understand the rationale for providing a read interface, for example for
> debugging, but I do not see any justification for a write interface.
Thank you for the comment!
As I explained, the read interface is not only for debugging. It is also
needed for devices that do not declare an ST table location in MSI-X or
config space. For example, here is the TPH part of the lspci output for an
Intel X710 NIC:
Capabilities: [1a0 v1] Transaction Processing Hints
Device specific mode supported
No steering table available
So the kernel cannot configure the ST for this device, because the programming
method is vendor specific. The vendor's user-space driver can configure it,
though, and in that case user space needs to get the ST from the kernel.
As for the write interface, which is meant for devices whose ST location is
known, I think we could simplify it to pass only <index, cpu>; the kernel would
then look up that CPU's ST and write it to the corresponding index.
>
> TPH is defined by the PCI specification. If a device intends to support it,
> then it should conform to the specification.
According to the PCI specification 6.17.3 ST Modes of Operation:
Device Specific Mode: It is recommended for the Function to use a Steering Tag value from an ST Table entry, but it is not required.
In the Device Specific Mode of operation, the assignment of the Steering Tags to Requests is device specific. The number
of Steering Tags used by the Function is permitted to be different than the number of interrupt vectors allocated for the
Function, irrespective of the ST Table location, and Steering Tag values used in Requests are not required to come from
the architected ST Table.
Thanks
>
> Thanks
>
>
>>
>> Thanks
>>
>>>
>>> Thanks
>>>
>>>>
>>>> Best regards,
>>>> Chengwen Feng
>>>>
>>
>>
* Re: [Intel-wired-lan] [PATCH iwl-next 1/2] i40e: implement basic per-queue stats
From: Jacob Keller @ 2026-04-14 1:22 UTC
To: Paolo Abeni, Loktionov, Aleksandr,
intel-wired-lan@lists.osuosl.org
Cc: Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, netdev@vger.kernel.org
In-Reply-To: <2bae0dc2-4035-4fe2-a87e-dc5dae6c7df5@redhat.com>
On 4/8/2026 7:44 AM, Paolo Abeni wrote:
> On 4/8/26 2:07 PM, Loktionov, Aleksandr wrote:
>>> -----Original Message-----
>>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
>>> Of Paolo Abeni
>>> Sent: Wednesday, April 8, 2026 1:44 PM
>>> To: intel-wired-lan@lists.osuosl.org
>>> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
>>> Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn
>>> <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric
>>> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>;
>>> Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann
>>> <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John
>>> Fastabend <john.fastabend@gmail.com>; Stanislav Fomichev
>>> <sdf@fomichev.me>; netdev@vger.kernel.org
>>> Subject: [Intel-wired-lan] [PATCH iwl-next 1/2] i40e: implement basic
>>> per-queue stats
>>>
>>> Only expose the counters currently available (bytes, packets); add
>>> account for base stats to deal with ring clear.
>>>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>> drivers/net/ethernet/intel/i40e/i40e.h | 7 ++
>>> drivers/net/ethernet/intel/i40e/i40e_main.c | 133
>>> ++++++++++++++++++++
>>> 2 files changed, 140 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h
>>> b/drivers/net/ethernet/intel/i40e/i40e.h
>>> index dcb50c2e1aa2..fe642c464e9c 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e.h
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
>>> @@ -836,16 +836,23 @@ struct i40e_vsi {
>>> struct i40e_eth_stats eth_stats;
>>> struct i40e_eth_stats eth_stats_offsets;
>>> u64 tx_restart;
>>
>> ...
>>
>>> +static void i40e_zero_tx_ring_stats(struct netdev_queue_stats_tx *tx)
>>> {
>>> + tx->bytes = 0;
>>> + tx->packets = 0;
>>> + tx->stop = 0;
>>> + tx->wake = 0;
>>> + tx->hw_drops = 0;
>>> +}
>>> +
>>> +static void i40e_add_tx_ring_stats(struct i40e_ring *tx_ring,
>>> + struct netdev_queue_stats_tx *tx) {
>>> + u64 bytes, packets;
>>> + unsigned int start;
>>> +
>>> + do {
>>> + start = u64_stats_fetch_begin(&tx_ring->syncp);
>>> + bytes = tx_ring->stats.bytes;
>>> + packets = tx_ring->stats.packets;
>>> + } while (u64_stats_fetch_retry(&tx_ring->syncp, start));
>>> +
>>> + tx->bytes += bytes;
>>> + tx->packets += packets;
>>> +
>>> + tx->stop += tx_ring->tx_stats.tx_stopped;
>>> + tx->wake += tx_ring->tx_stats.restart_queue;
>>> + tx->hw_drops += tx_ring->tx_stats.tx_busy; }
>> Why the reads are outside the seqlock region?
>> On 32-bit kernels, unprotected u64 reads can tear IMHO
>
Paolo is correct that just moving these into the do/while loop is
useless, since the increments aren't protected properly.
> Currently there is no seqlock on the write side; to keep the series
> small I preferred avoid fixing the pre-existing issue. In any case I
> think moving stop, wake, hw_drops (and others) under seqlock protection
> is an orthogonal change.
>
> /P
I ended up doing some work on ice to fix a lot of similar issues a few
months ago. The Intel drivers weren't using u64_stats_t, and several
error/debug counters were not being handled appropriately.
I'd personally prefer fixing existing issues before we compound them by
adding even more incorrect code. Even on 64-bit systems we need to use
READ_ONCE/WRITE_ONCE or local64_t, which the u64_stats_t type uses
internally.
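To illustrate the tearing concern on 32-bit kernels, here is a small
userspace sketch. It is a deterministic model, not driver code: struct
split_u64 and both helpers are hypothetical, and the "writer" is replayed
between the reader's two 32-bit loads instead of running on another CPU.

```c
#include <stdint.h>

/* A 64-bit counter as a 32-bit machine sees it: two separate words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

/* Stores both halves of a 64-bit value. */
static inline void split_store(struct split_u64 *c, uint64_t v)
{
	c->lo = (uint32_t)v;
	c->hi = (uint32_t)(v >> 32);
}

/* Reader that is interrupted by a writer between its two loads. */
static inline uint64_t torn_read(struct split_u64 *c, uint64_t new_value)
{
	uint32_t lo, hi;

	lo = c->lo;			/* first load sees the old low word */
	split_store(c, new_value);	/* writer updates both words in between */
	hi = c->hi;			/* second load sees the new high word */
	return ((uint64_t)hi << 32) | lo;
}
```

Starting from 0xFFFFFFFF and incrementing to 0x100000000 in the middle of the
read yields 0x1FFFFFFFF in this model, a value the counter never held.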
I can understand the desire to limit scope of work, and the issues may
feel "minor" but ultimately I'd rather not see us continue making the
problem bigger instead of fixing it.
However, if other maintainers feel strongly that the additions are
acceptable despite being incorrect w.r.t. the stats logic, I suppose we
can continue this and have someone from Intel look into cleaning up the
mess like I did for ice.
It looks like the series has some other requested changes either way though.
* [PATCH net] tcp: make probe0 timer handle expired user timeout
From: Altan Hacigumus @ 2026-04-14 1:36 UTC
To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S . Miller,
David Ahern, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, linux-kernel, Enke Chen, Altan Hacigumus
tcp_clamp_probe0_to_user_timeout() computes remaining time in jiffies
using subtraction with an unsigned lvalue. If elapsed probing time
already exceeds the configured TCP_USER_TIMEOUT, the subtraction
underflows and yields a large value.
Handle this expiration case similarly to tcp_clamp_rto_to_user_timeout().
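The arithmetic can be reproduced in userspace. The sketch below is a
simplified model of the old and new clamping logic, not the kernel function:
the helper names are hypothetical, the timeout is passed in jiffies directly,
and TIMEOUT_MIN is an illustrative stand-in for TCP_TIMEOUT_MIN.

```c
#include <stdint.h>

#define TIMEOUT_MIN 2u	/* illustrative floor, assumed for this sketch */

static inline uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }
static inline uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Pre-patch model: unsigned remaining wraps when elapsed > timeout. */
static inline uint32_t clamp_probe0_old(uint32_t timeout, int32_t elapsed,
					uint32_t when)
{
	uint32_t remaining = timeout - (uint32_t)elapsed;

	remaining = max_u32(remaining, TIMEOUT_MIN);
	return min_u32(remaining, when);
}

/* Post-patch model: signed remaining exposes the already-expired case. */
static inline uint32_t clamp_probe0_new(uint32_t timeout, int32_t elapsed,
					uint32_t when)
{
	/* Two's-complement conversion, as the kernel's s32 assignment does. */
	int32_t remaining = (int32_t)(timeout - (uint32_t)elapsed);

	if (remaining <= 0)
		return 1;	/* fire as soon as possible */
	return min_u32(max_u32((uint32_t)remaining, TIMEOUT_MIN), when);
}
```

With timeout = 1000, elapsed = 1500 and when = 3000, the old model returns
3000 because the wrapped value is huge, while the new model returns 1. When
the timeout has not expired, both models return the same value.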
Fixes: 344db93ae3ee ("tcp: make TCP_USER_TIMEOUT accurate for zero window probes")
Signed-off-by: Altan Hacigumus <ahacigu.linux@gmail.com>
---
net/ipv4/tcp_timer.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 5a14a53a3c9e..4a43356a4e06 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -50,7 +50,8 @@ static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk)
u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
- u32 remaining, user_timeout;
+ u32 user_timeout;
+ s32 remaining;
s32 elapsed;
user_timeout = READ_ONCE(icsk->icsk_user_timeout);
@@ -61,6 +62,8 @@ u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when)
if (unlikely(elapsed < 0))
elapsed = 0;
remaining = msecs_to_jiffies(user_timeout) - elapsed;
+ if (remaining <= 0)
+ return 1;
remaining = max_t(u32, remaining, TCP_TIMEOUT_MIN);
return min_t(u32, remaining, when);
--
2.43.0
* Re: commit 0c4f1c02d27a880b cause a deadlock issue
From: He, Guocai (CN) @ 2026-04-14 1:38 UTC
To: Greg KH, Thorsten Leemhuis
Cc: Berg, Johannes, Friend, Linux kernel regressions list,
Korenblit, Miriam Rachel, stable@vger.kernel.org
In-Reply-To: <2026041330-groggy-ruse-5e27@gregkh>
OK, I will send a patch later.
________________________________________
From: Greg KH <gregkh@linuxfoundation.org>
Sent: Monday, April 13, 2026 8:55 PM
To: Thorsten Leemhuis
Cc: He, Guocai (CN); Berg, Johannes; Friend; Linux kernel regressions list; Korenblit, Miriam Rachel; stable@vger.kernel.org
Subject: Re: commit 0c4f1c02d27a880b cause a deadlock issue
On Mon, Apr 13, 2026 at 01:58:56PM +0200, Thorsten Leemhuis wrote:
> On 4/3/26 15:00, Korenblit, Miriam Rachel wrote:
> >> From: Greg KH <gregkh@linuxfoundation.org>
> >> On Fri, Apr 03, 2026 at 12:44:48PM +0000, Korenblit, Miriam Rachel wrote:
> >>>> -----Original Message-----
> >>>> From: Greg KH <gregkh@linuxfoundation.org>
> >>>> On Fri, Apr 03, 2026 at 11:08:46AM +0000, He, Guocai (CN) wrote:
> >>>>> No, The mainline have no this issue.
> >>>>> The changes of 0c4f1c02d27a880b is not in mainline.
> >>>>
> >>>> That does not make sense, that commit is really commit e1696c8bd005
> >>>> ("wifi: cfg80211: stop NAN and P2P in cfg80211_leave") which is in
> >>>> all of the following releases:
> >>>> 5.10.252 5.15.202 6.1.165 6.6.128 6.12.75 6.18.14 6.19.4 7.0-rc1
> >>>> confused,
> >>> The change is indeed in mainline, but the locking situation in
> >>> mainline is totally different (that mutex does not even exist there)
> >>> Therefore, the issue is not supposed to happen in mainline.
> >>
> >> Ok, does that commit now need to be reverted from some of the stable branches?
> >> If so, which ones?
> >
> > From every version which is < 6.7.
>
> Greg, do you still have this in your todo mail queue somewhere? Just
> wondering, as last weeks 6.6.y released afics lacked a revert of
> e1696c8bd0056b ("wifi: cfg80211: stop NAN and P2P in cfg80211_leave") --
> and I cannot spot one in your public stable queue either.
>
> These are the commits that according to Miri need to be reverted if I
> understood things right:
>
> v6.6.128 (4d7a05da767e5c), v6.1.165 (0c4f1c02d27a88), v5.15.202
> (31344ffecd7a34), v5.10.252 (d91240f24e831d)
It is, yes, my queue is huge :(
It's fastest if someone sends me the reverts and I can easily apply them
that way. Otherwise it takes me a bit to do each one manually :(
thanks,
greg k-h
* Re: [PATCH net-next] pppoe: optimize hash with word access
From: Qingfang Deng @ 2026-04-14 1:47 UTC
To: Eric Dumazet
Cc: Andrew Lunn, David S. Miller, Jakub Kicinski, Paolo Abeni,
Guillaume Nault, Kees Cook, Eric Woudstra, netdev, linux-kernel
In-Reply-To: <CANn89iKmxWiCTt7nVk-DZ4R_KsYDYbPwQ5f7Hp5F8hWmr+zc=g@mail.gmail.com>
April 13, 2026 at 4:42 PM, Eric Dumazet wrote:
>
> On Sun, Apr 12, 2026 at 8:52 PM Qingfang Deng <qingfang.deng@linux.dev> wrote:
>
> net-next is closed.
>
> https://lore.kernel.org/netdev/20260412142250.131bf997@kernel.org/
>
> Also I would suggest using hash32(hash, PPPOE_HASH_BITS)
Thanks for the info, but I would like to keep the same algorithm.
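For readers unfamiliar with the suggested alternative, the following is a
userspace sketch of a 32-bit multiplicative hash in the style of the kernel's
hash_32() helper, which the quoted suggestion appears to refer to. The
function name hash_32_sketch() and the bucket-count constant are assumptions
made for illustration; this is not the pppoe code or the proposed patch.

```c
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u
/* Assumed 16-bucket table, for illustration only. */
#define PPPOE_HASH_BITS_SKETCH 4

/* Multiply by a golden-ratio constant and keep the top 'bits' bits. */
static inline uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	/* The high bits of the product are the best mixed; bits must be 1..32. */
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}
```

The result is always smaller than 1 << bits, so it can index a table of that
size directly.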
* Re: [PATCH net-next v7 00/15] net: sleepable ndo_set_rx_mode
From: Jakub Kicinski @ 2026-04-14 2:08 UTC
To: Stanislav Fomichev; +Cc: netdev, davem, edumazet, pabeni
In-Reply-To: <20260413171131.550126-1-sdf@fomichev.me>
On Mon, 13 Apr 2026 10:11:16 -0700 Stanislav Fomichev wrote:
> This series adds a new ndo_set_rx_mode_async callback that enables
> drivers to handle address list updates in a sleepable context. The
> current ndo_set_rx_mode is called under the netif_addr_lock spinlock
> with BHs disabled, which prevents drivers from sleeping. This is
> problematic for ops-locked drivers that need to sleep.
Hi hi, hit this on the new(ish) queue leasing test with bnxt (debug
kernel):
[ 1148.733157] kselftest: Running tests in drivers/net/hw
[ 1151.485032] ref_tracker: reference already released.
[ 1151.491522] ref_tracker: allocated in:
[ 1151.496526] __dev_set_rx_mode+0x398/0x4f0
[ 1151.501923] dev_mc_add+0xe7/0x100
[ 1151.506537] igmp6_group_added+0x31c/0x400
[ 1151.511930] __ipv6_dev_mc_inc+0x282/0x590
[ 1151.517320] __ipv6_sock_mc_join+0x40d/0x7c0
[ 1151.522908] do_ipv6_setsockopt+0x3504/0x3700
[ 1151.528594] ipv6_setsockopt+0x7e/0xf0
[ 1151.533596] do_sock_setsockopt+0x164/0x3b0
[ 1151.539089] __sys_setsockopt+0xe4/0x150
[ 1151.544276] __x64_sys_setsockopt+0xbd/0x180
[ 1151.549866] do_syscall_64+0xf3/0x5e0
[ 1151.554763] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1151.561229] ref_tracker: freed in:
[ 1151.565840] netdev_rx_mode_work+0x205/0x410
[ 1151.571429] process_one_work+0xdf5/0x1410
[ 1151.576824] worker_thread+0x4f1/0xd60
[ 1151.581827] kthread+0x364/0x460
[ 1151.586246] ret_from_fork+0x4a4/0x720
[ 1151.591251] ret_from_fork_asm+0x11/0x20
[ 1151.596704] ------------[ cut here ]------------
[ 1151.602747] WARNING: lib/ref_tracker.c:322 at ref_tracker_free.cold+0x10a/0x1f9, CPU#7: kworker/7:2/5262
[ 1151.614292] Modules linked in:
[ 1151.618626] CPU: 7 UID: 0 PID: 5262 Comm: kworker/7:2 Not tainted 7.0.0-rc7-hmtc-g3cb8f09d448a #1 PREEMPT(full)
[ 1151.630928] Hardware name: Giga Computing E163-Z34-AAH1-000/MZ33-DC1-000, BIOS R30_F44 12/24/2025
[ 1151.641762] Workqueue: events netdev_rx_mode_work
[ 1151.647920] RIP: 0010:ref_tracker_free.cold+0x10a/0x1f9
[ 1151.654646] Code: e0 2a 0f b6 04 02 84 c0 74 04 3c 03 7e 78 8b 7b 18 4c 89 04 24 e8 71 b2 e4 01 4c 8b 04 24 4c 89 c6 48 89 ef e8 32 92 05 04 90 <0f> 0b 90 ba ea ff ff ff e9 06 de e4 01 4c 89 ea 48 89 df 4c 89 04
[ 1151.676612] RSP: 0018:ffa0000035f6fb18 EFLAGS: 00010296
[ 1151.683343] RAX: 0000000000000000 RBX: ff110001f7ae9460 RCX: 0000000000000001
[ 1151.692224] RDX: 00000000ffffffff RSI: 1ffffffff1841334 RDI: 0000000000000001
[ 1151.701102] RBP: ff110001a3c12618 R08: 0000000000000000 R09: 0000000000000000
[ 1151.709973] R10: 0000000000000007 R11: 0000000000000001 R12: 1ff4000006bedf67
[ 1151.718835] R13: ff110001f7ae9478 R14: ff110001fd1949fc R15: ffffffff8a80b9a0
[ 1151.727814] FS: 0000000000000000(0000) GS:ff11001882899000(0000) knlGS:0000000000000000
[ 1151.737696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1151.744947] CR2: 00007f83287c3574 CR3: 00000017e26b0002 CR4: 0000000000771ef0
[ 1151.753758] PKRU: 55555554
[ 1151.757588] Call Trace:
[ 1151.761130] <TASK>
[ 1151.764282] ? ref_tracker_alloc+0x460/0x460
[ 1151.769872] ? netdev_rx_mode_work+0x205/0x410
[ 1151.775659] ? process_one_work+0xdf5/0x1410
[ 1151.781249] ? worker_thread+0x4f1/0xd60
[ 1151.786450] ? kthread+0x364/0x460
[ 1151.791066] ? ret_from_fork+0x4a4/0x720
[ 1151.796267] ? ret_from_fork_asm+0x11/0x20
[ 1151.801666] ? mark_held_locks+0x40/0x70
[ 1151.806870] netdev_rx_mode_work+0x205/0x410
[ 1151.812461] ? trace_workqueue_execute_start+0x9b/0x180
[ 1151.819124] ? process_one_work+0xdb4/0x1410
[ 1151.824712] process_one_work+0xdf5/0x1410
[ 1151.830111] ? pwq_dec_nr_in_flight+0x710/0x710
[ 1151.835992] ? lock_acquire.part.0+0xbc/0x260
[ 1151.841686] worker_thread+0x4f1/0xd60
[ 1151.846694] ? rescuer_thread+0x1320/0x1320
[ 1151.852188] kthread+0x364/0x460
[ 1151.856609] ? trace_irq_enable.constprop.0+0x9b/0x180
[ 1151.863180] ? kthread_affine_node+0x330/0x330
[ 1151.868965] ret_from_fork+0x4a4/0x720
[ 1151.873965] ? arch_exit_to_user_mode_prepare.isra.0+0xb0/0xb0
[ 1151.881315] ? __switch_to+0x540/0xd00
[ 1151.886320] ? kthread_affine_node+0x330/0x330
[ 1151.892108] ret_from_fork_asm+0x11/0x20
[ 1151.905076] hardirqs last enabled at (7665): [<ffffffff847c61ea>] __up_console_sem+0x5a/0x70
[ 1151.915450] hardirqs last disabled at (7676): [<ffffffff847c61cf>] __up_console_sem+0x3f/0x70
[ 1151.925823] softirqs last enabled at (7600): [<ffffffff8462d50e>] handle_softirqs+0x60e/0x920
[ 1151.936291] softirqs last disabled at (7595): [<ffffffff8462dca2>] irq_exit_rcu+0xa2/0xf0
[ 1151.946275] ---[ end trace 0000000000000000 ]---
[ 1152.247817] ref_tracker: netdev@ff110001a3c12618 has 1/1 users at\x0a __dev_set_rx_mode+0x398/0x4f0\x0a dev_mc_add+0xe7/0x100\x0a igmp6_group_added+0x31c/0x400\x0a __ipv6_dev_mc_inc+0x282/0x590\x0a addrconf_dad_begin+0x13c/0x540\x0a addrconf_dad_work+0x170/0x930\x0a process_one_work+0xdf5/0x1410\x0a worker_thread+0x4f1/0xd60\x0a kthread+0x364/0x460\x0a ret_from_fork+0x4a4/0x720\x0a ret_from_fork_asm+0x11/0x20\x0a
[ 1152.318767] ------------[ cut here ]------------
[ 1152.325512] WARNING: lib/ref_tracker.c:246 at ref_tracker_dir_exit+0x466/0x7e0, CPU#27: ip/15056
[ 1152.336246] Modules linked in:
[ 1152.340578] CPU: 27 UID: 0 PID: 15056 Comm: ip Tainted: G W 7.0.0-rc7-hmtc-g3cb8f09d448a #1 PREEMPT(full)
[ 1152.353871] Tainted: [W]=WARN
[ 1152.357997] Hardware name: Giga Computing E163-Z34-AAH1-000/MZ33-DC1-000, BIOS R30_F44 12/24/2025
[ 1152.368759] RIP: 0010:ref_tracker_dir_exit+0x466/0x7e0
[ 1152.375326] Code: e8 03 42 80 3c 38 00 0f 85 57 02 00 00 48 8b 03 49 89 de 49 39 dc 0f 85 2e ff ff ff 48 8b 74 24 10 48 89 ef e8 cb aa 20 02 90 <0f> 0b 90 48 8d 5d 44 be 04 00 00 00 48 89 df e8 a6 ee ec fe 48 89
[ 1152.397212] RSP: 0018:ffa00000380470d0 EFLAGS: 00010286
[ 1152.403876] RAX: 0000000000000000 RBX: ff110001a3c12668 RCX: 0000000000000001
[ 1152.412685] RDX: 00000000ffffffff RSI: 1ffffffff1841334 RDI: 0000000000000001
[ 1152.421496] RBP: ff110001a3c12618 R08: 0000000000000000 R09: 0000000000000000
[ 1152.430303] R10: 000000000000000b R11: 0000000000000001 R12: ff110001a3c12668
[ 1152.439103] R13: dead000000000100 R14: ff110001a3c12668 R15: dffffc0000000000
[ 1152.447911] FS: 00007f05f6331840(0000) GS:ff11001883299000(0000) knlGS:0000000000000000
[ 1152.457811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1152.465057] CR2: 00007f68f567f440 CR3: 00000001a9d12003 CR4: 0000000000771ef0
[ 1152.473863] PKRU: 55555554
[ 1152.477694] Call Trace:
[ 1152.481231] <TASK>
[ 1152.484384] ? rcu_is_watching+0x15/0xd0
[ 1152.489585] ? ref_tracker_free+0x870/0x870
[ 1152.495077] ? lockdep_hardirqs_on+0x8c/0x130
[ 1152.500764] ? __stack_depot_get_stack_record+0x10/0x10
[ 1152.507427] ? kfree+0x151/0x610
[ 1152.511847] free_netdev+0x402/0x870
[ 1152.516648] ? do_raw_spin_unlock+0x59/0x250
[ 1152.522241] ? _raw_spin_unlock_irqrestore+0x40/0x80
[ 1152.528613] netdev_run_todo+0x83b/0xbf0
[ 1152.533804] ? unregister_netdevice_queued+0x80/0x80
[ 1152.540172] ? mutex_is_locked+0x1c/0x50
[ 1152.545373] ? generic_xdp_install+0x400/0x400
[ 1152.551158] ? mutex_is_locked+0x1c/0x50
[ 1152.556349] ? rtnl_is_locked+0x15/0x20
[ 1152.561450] ? unregister_netdevice_queued+0x16/0x80
[ 1152.567826] rtnl_dellink+0x4a5/0xae0
[ 1152.572737] ? rtnl_mdb_del+0x580/0x580
[ 1152.577839] ? __lock_acquire+0x508/0xc10
[ 1152.583148] ? avc_has_perm_noaudit+0xf6/0x300
[ 1152.588935] ? sched_init_numa+0x7f7/0xc30
[ 1152.594331] ? mark_usage+0x61/0x170
[ 1152.599142] ? __lock_acquire+0x508/0xc10
[ 1152.604443] ? lock_acquire.part.0+0xbc/0x260
[ 1152.610130] ? find_held_lock+0x2b/0x80
[ 1152.615237] ? rtnl_mdb_del+0x580/0x580
[ 1152.620339] ? __lock_release.isra.0+0x6b/0x1a0
[ 1152.626215] ? rtnl_mdb_del+0x580/0x580
[ 1152.631319] rtnetlink_rcv_msg+0x6fd/0xbd0
[ 1152.636718] ? rtnl_fdb_dump+0x690/0x690
[ 1152.641917] ? __lock_acquire+0x508/0xc10
[ 1152.647218] ? lock_acquire.part.0+0xbc/0x260
[ 1152.652905] ? find_held_lock+0x2b/0x80
[ 1152.658008] netlink_rcv_skb+0x14e/0x3a0
[ 1152.663210] ? rtnl_fdb_dump+0x690/0x690
[ 1152.668412] ? netlink_ack+0xcd0/0xcd0
[ 1152.673422] ? netlink_deliver_tap+0xc5/0x330
[ 1152.679110] ? netlink_deliver_tap+0x13c/0x330
[ 1152.684894] netlink_unicast+0x47c/0x740
[ 1152.690098] ? netlink_attachskb+0x800/0x800
[ 1152.695687] ? sock_has_perm+0x283/0x3f0
[ 1152.700879] netlink_sendmsg+0x75b/0xc90
[ 1152.706081] ? netlink_unicast+0x740/0x740
[ 1152.711476] ? __lock_release.isra.0+0x6b/0x1a0
[ 1152.717357] ? __import_iovec+0x36c/0x620
[ 1152.722655] ? __might_fault+0x97/0x140
[ 1152.727749] __sock_sendmsg+0xca/0x180
[ 1152.732747] ? move_addr_to_kernel+0x36/0xf0
[ 1152.738328] ____sys_sendmsg+0x609/0x830
[ 1152.743529] ? copy_msghdr_from_user+0x2a0/0x460
[ 1152.749510] ? kernel_sendmsg+0x30/0x30
[ 1152.754611] ? move_addr_to_kernel+0xf0/0xf0
[ 1152.760204] ? kasan_save_stack+0x3d/0x50
[ 1152.765503] ? kasan_save_stack+0x2f/0x50
[ 1152.770799] ? kasan_record_aux_stack+0x9b/0xc0
[ 1152.776681] ? __call_rcu_common.constprop.0+0xb2/0xa10
[ 1152.783337] ? kmem_cache_free+0x3d0/0x5f0
[ 1152.788733] ? fput_close_sync+0xde/0x1b0
[ 1152.794032] ? __x64_sys_close+0x8b/0xf0
[ 1152.799235] ___sys_sendmsg+0x14e/0x1d0
[ 1152.804337] ? copy_msghdr_from_user+0x460/0x460
[ 1152.810330] ? rcu_is_watching+0x15/0xd0
[ 1152.815532] ? trace_irq_enable.constprop.0+0x9b/0x180
[ 1152.822106] __sys_sendmsg+0x145/0x1f0
[ 1152.827110] ? __sys_sendmsg_sock+0x20/0x20
[ 1152.832608] ? do_raw_spin_unlock+0x59/0x250
[ 1152.838196] ? rcu_is_watching+0x15/0xd0
[ 1152.843398] do_syscall_64+0xf3/0x5e0
[ 1152.848307] ? trace_hardirqs_off+0xd/0x30
[ 1152.853703] ? exc_page_fault+0xda/0xf0
[ 1152.858805] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1152.865267] RIP: 0033:0x7f05f656b22e
[ 1152.870077] Code: 4d 89 d8 e8 94 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 03 ff ff ff 0f 1f 00 f3 0f 1e fa
[ 1152.891961] RSP: 002b:00007fff388be520 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 1152.901262] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f05f656b22e
[ 1152.910070] RDX: 0000000000000000 RSI: 00007fff388be5d0 RDI: 0000000000000003
[ 1152.918880] RBP: 00007fff388be530 R08: 0000000000000000 R09: 0000000000000000
[ 1152.927690] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
[ 1152.936499] R13: 0000000069dd8ee8 R14: 000056113dabf040 R15: 0000000000000000
Either decoding failed or I forgot where NIPA-HW puts the output :S
But I think it's clear enough..
* [PATCH ipsec-next v3] xfrm: cleanup error path in xfrm_add_policy()
From: Deepanshu Kartikey @ 2026-04-14 2:09 UTC (permalink / raw)
To: steffen.klassert, herbert, davem, edumazet, kuba, pabeni, horms,
sd
Cc: netdev, linux-kernel, Deepanshu Kartikey
Replace the open-coded manual cleanup in the error path of
xfrm_add_policy() with xfrm_policy_destroy(), which already
handles all the necessary cleanup internally. This is consistent
with how xfrm_policy_construct() handles its own error paths.
The walk.dead flag must be set before calling xfrm_policy_destroy()
as required by BUG_ON(!policy->walk.dead).
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
v3:
- Changed prefix to ipsec-next as this is a cleanup
- Dropped syzbot references as suggested by Sabrina Dubroca
v2:
- Reworded commit message to reflect cleanup rather than bugfix
as suggested by Sabrina Dubroca
- Removed incorrect Fixes: and Closes: tags
- Corrected subject prefix to PATCH ipsec
---
net/xfrm/xfrm_user.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d56450f61669..ae144d1e4a65 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2267,9 +2267,8 @@ static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err) {
xfrm_dev_policy_delete(xp);
- xfrm_dev_policy_free(xp);
- security_xfrm_policy_free(xp->security);
- kfree(xp);
+ xp->walk.dead = 1;
+ xfrm_policy_destroy(xp);
return err;
}
--
2.43.0
* Re: [PATCH net-next v2] net/smc: cap allocation order for SMC-R physically contiguous buffers
From: D. Wythe @ 2026-04-14 2:10 UTC (permalink / raw)
To: Simon Horman
Cc: D. Wythe, David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Sidraya Jayagond, Wenjia Zhang, Mahanta Jambigi,
Tony Lu, Wen Gu, linux-kernel, linux-rdma, linux-s390, netdev,
oliver.yang, pasic
In-Reply-To: <20260410151631.GY469338@kernel.org>
On Fri, Apr 10, 2026 at 04:16:31PM +0100, Simon Horman wrote:
> On Tue, Apr 07, 2026 at 08:43:37PM +0800, D. Wythe wrote:
> > The alloc_pages() cannot satisfy requests exceeding MAX_PAGE_ORDER,
> > and attempting such allocations will lead to guaranteed failures
> > and potential kernel warnings.
> >
> > For SMCR_PHYS_CONT_BUFS, cap the allocation order to MAX_PAGE_ORDER.
> > This ensures the attempts to allocate the largest possible physically
> > contiguous chunk succeed, instead of failing with an invalid order.
> > This also avoids redundant "try-fail-degrade" cycles in
> > __smc_buf_create().
> >
> > For SMCR_MIXED_BUFS, no cap is needed: if the order exceeds
> > MAX_PAGE_ORDER, alloc_pages() will silently fail (__GFP_NOWARN)
> > and automatically fall back to virtual memory.
> >
> > Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > ---
> > Changes v1 -> v2:
> > https://lore.kernel.org/netdev/20260312082154.36971-1-alibuda@linux.alibaba.com/
> >
> > - Move the bufsize cap from smcr_new_buf_create() up to
> > __smc_buf_create(), which is simpler and avoids touching
> > the allocation logic itself.
>
> The nit below notwithstanding, this looks good to me.
>
> Reviewed-by: Simon Horman <horms@kernel.org>
>
> > ---
> > net/smc/smc_core.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> > index e2d083daeb7e..cdd881746e21 100644
> > --- a/net/smc/smc_core.c
> > +++ b/net/smc/smc_core.c
> > @@ -2440,6 +2440,10 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
> > /* use socket send buffer size (w/o overhead) as start value */
> > bufsize = smc->sk.sk_sndbuf / 2;
> >
> > + /* limit bufsize for physically contiguous buffers */
> > + if (!is_smcd && lgr->buf_type == SMCR_PHYS_CONT_BUFS)
> > + bufsize = min_t(int, bufsize, (PAGE_SIZE << MAX_PAGE_ORDER));
>
> nit: I think min() is sufficient here, and the inner parentheses are
> unnecessary
Hi Simon,
I think min_t is required here because min() triggers a signedness
error:
././include/linux/compiler_types.h:706:38: error: call to
‘__compiletime_assert_950’ declared with attribute error: min(bufsize,
((1UL) << 12) << 10) signedness error
The inner parentheses can be removed, though.
D. Wythe
>
> > +
> > for (bufsize_comp = smc_compress_bufsize(bufsize, is_smcd, is_rmb);
> > bufsize_comp >= 0; bufsize_comp--) {
> > if (is_rmb) {
> > --
> > 2.45.0
> >
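
The signedness point in this reply can be illustrated with a small userspace sketch. It is not the kernel's min()/min_t() implementation, which is considerably more elaborate; SKETCH_MIN_T, SKETCH_PAGE_SIZE, SKETCH_MAX_PAGE_ORDER and sketch_cap_bufsize() are hypothetical names, and the values 12 and 10 are simply taken from the "((1UL) << 12) << 10" expansion in the quoted error. The idea shown is only that casting both operands to one type avoids comparing a signed int against an unsigned long.

```c
#include <assert.h>

/* Hypothetical stand-ins for PAGE_SIZE and MAX_PAGE_ORDER; the values
 * mirror the "((1UL) << 12) << 10" expansion in the quoted error.
 */
#define SKETCH_PAGE_SIZE	(1UL << 12)
#define SKETCH_MAX_PAGE_ORDER	10

/* Simplified analogue of min_t(): cast both operands to one type first.
 * Unlike the kernel macro, this one evaluates its arguments twice.
 */
#define SKETCH_MIN_T(type, a, b) \
	((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Cap a signed buffer size at the largest contiguous allocation. */
static int sketch_cap_bufsize(int bufsize)
{
	/* The cap (4 MiB here) must fit in an int for the cast to be safe. */
	assert((SKETCH_PAGE_SIZE << SKETCH_MAX_PAGE_ORDER) <= 0x7fffffffUL);

	return SKETCH_MIN_T(int, bufsize,
			    SKETCH_PAGE_SIZE << SKETCH_MAX_PAGE_ORDER);
}
```

With these assumed values, a request above 4 MiB is clamped to 4 MiB and smaller requests pass through unchanged.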
* Re: [PATCH net-next v2 0/3] Follow-ups to nk_qlease net selftests
From: Jakub Kicinski @ 2026-04-14 2:12 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: netdev, dw, pabeni, razor
In-Reply-To: <20260413220809.604592-1-daniel@iogearbox.net>
On Tue, 14 Apr 2026 00:08:03 +0200 Daniel Borkmann wrote:
> This is a set of follow-ups addressing [0]:
>
> - Split netdevsim tests from HW tests in nk_qlease and move the SW
> tests under selftests/net/
> - Remove multiple ksft_run()s to fix the recently enforced hard-fail
> - Move all the setup inside the test cases for the ones under
> selftests/net/ (I'll defer the HW ones to David)
> - Add more test coverage related to queue leasing behavior and corner
> cases, so now we have 45 tests in nk_qlease.py with netdevsim
> which does not need special HW
LGTM, thanks!
I'll let it run overnight in the CI to shake out any latent flakiness
(and the crash which I think is from Stan's series).
Could you cook up one more follow up to enable VETH in the config?
We're getting:
# # Exception| Traceback (most recent call last):
# # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/ksft.py", line 420, in ksft_run
# # Exception| func(*args)
# # Exception| ~~~~^^^^^^^
# # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/drivers/net/hw/./nk_qlease.py", line 393, in test_veth_queue_create
# # Exception| ip("link add veth0 type veth peer name veth1")
# # Exception| ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/utils.py", line 238, in ip
# # Exception| return tool('ip', args, json=json, host=host)
# # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/utils.py", line 225, in tool
# # Exception| cmd_obj = cmd(cmd_str, ns=ns, host=host)
# # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/utils.py", line 91, in __init__
# # Exception| self.process(terminate=False, fail=fail, timeout=timeout)
# # Exception| ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # Exception| File "/srv/vmksft/testing/wt-24/tools/testing/selftests/net/lib/py/utils.py", line 117, in process
# # Exception| raise CmdExitFailure("Command failed", self)
# # Exception| net.lib.py.utils.CmdExitFailure: Command failed
# # Exception| CMD: ip link add veth0 type veth peer name veth1
# # Exception| EXIT: 2
# # Exception| STDERR: Error: Unknown device type.
# # Exception|
# not ok 27 nk_qlease.test_veth_queue_create
I guess you can post it without waiting for this to be merged, it won't
conflict.
* Re: [PATCH] xfrm: fix memory leak in xfrm_add_policy()
From: Deepanshu Kartikey @ 2026-04-14 2:12 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: steffen.klassert, herbert, davem, edumazet, kuba, pabeni, horms,
leon, netdev, linux-kernel, syzbot+901d48e0b95aed4a2548
In-Reply-To: <adz_CeDItDjznfWo@krikkit>
On Mon, Apr 13, 2026 at 8:04 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> Ok. Then you should wait 2 weeks until the merge window is over:
> https://lore.kernel.org/netdev/20260412142250.131bf997@kernel.org/
>
> and use "[PATCH ipsec-next]" as prefix for the cleanup patch (+ drop
> the syzbot references).
>
Hi Sabrina,
Thanks for the guidance. I have submitted patch v3.
Thanks
Deepanshu
* [PATCH net v5 2/2] pppoe: drop PFC frames
From: Qingfang Deng @ 2026-04-14 2:13 UTC (permalink / raw)
To: linux-ppp, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Qingfang Deng, Breno Leitao,
Sebastian Andrzej Siewior, Kees Cook, Kuniyuki Iwashima,
Guillaume Nault, Eric Woudstra, Simon Horman, Sam Protsenko,
netdev, linux-kernel
Cc: Paul Mackerras, Jaco Kroon, James Carlson, Wojciech Drewek,
Marcin Szycik
In-Reply-To: <20260414021353.23471-1-qingfang.deng@linux.dev>
RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
PFC for PPPoE sessions, and the current PPPoE driver assumes an
uncompressed (2-byte) protocol field. However, the generic PPP layer
function ppp_input() is not aware of the negotiation result, and still
accepts PFC frames.
If a peer with a broken implementation or an attacker sends a frame with
a compressed (1-byte) protocol field, the subsequent PPP payload is
shifted by one byte. This causes the network header to be 4-byte
misaligned, which may trigger unaligned access exceptions on some
architectures.
To reduce the attack surface, drop PPPoE PFC frames. Introduce
ppp_skb_is_compressed_proto() helper function to be used in both
ppp_generic.c and pppoe.c to avoid open-coding.
Fixes: 7fb1b8ca8fa1 ("ppp: Move PFC decompression to PPP generic layer")
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Changes in v5: add Reviewed-by tag from Simon
Link to v4: https://lore.kernel.org/netdev/20260410033627.93786-2-qingfang.deng@linux.dev/
drivers/net/ppp/ppp_generic.c | 2 +-
drivers/net/ppp/pppoe.c | 8 +++++++-
include/linux/ppp_defs.h | 16 ++++++++++++++++
3 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index b097d1b38ac9..853da966ad46 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2242,7 +2242,7 @@ ppp_do_recv(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
*/
static void __ppp_decompress_proto(struct sk_buff *skb)
{
- if (skb->data[0] & 0x01)
+ if (ppp_skb_is_compressed_proto(skb))
*(u8 *)skb_push(skb, 1) = 0x00;
}
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index d546a7af0d54..bdd61c504a1c 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -393,7 +393,7 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
if (skb_mac_header_len(skb) < ETH_HLEN)
goto drop;
- if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr)))
+ if (!pskb_may_pull(skb, PPPOE_SES_HLEN))
goto drop;
ph = pppoe_hdr(skb);
@@ -403,6 +403,12 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
if (skb->len < len)
goto drop;
+ /* skb->data points to the PPP protocol header after skb_pull_rcsum.
+ * Drop PFC frames.
+ */
+ if (ppp_skb_is_compressed_proto(skb))
+ goto drop;
+
if (pskb_trim_rcsum(skb, len))
goto drop;
diff --git a/include/linux/ppp_defs.h b/include/linux/ppp_defs.h
index b7e57fdbd413..b1d1f46d7d3b 100644
--- a/include/linux/ppp_defs.h
+++ b/include/linux/ppp_defs.h
@@ -8,6 +8,7 @@
#define _PPP_DEFS_H_
#include <linux/crc-ccitt.h>
+#include <linux/skbuff.h>
#include <uapi/linux/ppp_defs.h>
#define PPP_FCS(fcs, c) crc_ccitt_byte(fcs, c)
@@ -25,4 +26,19 @@ static inline bool ppp_proto_is_valid(u16 proto)
return !!((proto & 0x0101) == 0x0001);
}
+/**
+ * ppp_skb_is_compressed_proto - checks if PPP protocol in a skb is compressed
+ * @skb: skb to check
+ *
+ * Check if the PPP protocol field is compressed (the least significant
+ * bit of the most significant octet is 1). skb->data must point to the PPP
+ * protocol header.
+ *
+ * Return: Whether the PPP protocol field is compressed.
+ */
+static inline bool ppp_skb_is_compressed_proto(const struct sk_buff *skb)
+{
+ return unlikely(skb->data[0] & 0x01);
+}
+
#endif /* _PPP_DEFS_H_ */
--
2.43.0
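
As a rough userspace illustration of the check this patch factors out (a sketch, not the kernel helper): a valid uncompressed PPP protocol field has an even first octet, so an odd first octet indicates a field compressed to one byte. The name sketch_is_compressed_proto() and its buffer/length interface are assumptions made for this example; the kernel helper operates on skb->data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace analogue of the check: the first octet of an uncompressed
 * PPP protocol field is even, so an odd first octet means the field
 * was compressed to a single byte.
 */
static bool sketch_is_compressed_proto(const uint8_t *ppp, size_t len)
{
	/* The caller must provide at least the first protocol octet. */
	assert(len >= 1);

	return ppp[0] & 0x01;
}
```

For example, the IPv4 protocol number 0x0021 is sent as the bytes 00 21 when uncompressed and as the single byte 21 under PFC; only the latter is flagged.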
* [PATCH net v5 1/2] flow_dissector: do not dissect PPPoE PFC frames
From: Qingfang Deng @ 2026-04-14 2:13 UTC (permalink / raw)
To: linux-ppp, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Qingfang Deng, Tony Nguyen,
Guillaume Nault, Wojciech Drewek, netdev, linux-kernel
Cc: Paul Mackerras, Jaco Kroon, James Carlson, Marcin Szycik
RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
PFC for PPPoE sessions, and the flow dissector driver has assumed an
uncompressed frame until the blamed commit.
During the review process of that commit [1], support for PFC was
suggested. However, having a compressed (1-byte) protocol field means
the subsequent PPP payload is shifted by one byte, causing 4-byte
misalignment for the network header and an unaligned access exception
on some architectures.
The exception can be reproduced by sending a PPPoE PFC frame to an
ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE
session is active on that interface:
$ 0 : 00000000 80c40000 00000000 85144817
$ 4 : 00000008 00000100 80a75758 81dc9bb8
$ 8 : 00000010 8087ae2c 0000003d 00000000
$12 : 000000e0 00000039 00000000 00000000
$16 : 85043240 80a75758 81dc9bb8 00006488
$20 : 0000002f 00000007 85144810 80a70000
$24 : 81d1bda0 00000000
$28 : 81dc8000 81dc9aa8 00000000 805ead08
Hi : 00009d51
Lo : 2163358a
epc : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50
ra : 805ead08 __skb_get_hash_net+0x74/0x12c
Status: 11000403 KERNEL EXL IE
Cause : 40800010 (ExcCode 04)
BadVA : 85144817
PrId : 0001992f (MIPS 1004Kc)
Call Trace:
[<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50
[<805ead08>] __skb_get_hash_net+0x74/0x12c
[<805ef330>] get_rps_cpu+0x1b8/0x3fc
[<805fca70>] netif_receive_skb_list_internal+0x324/0x364
[<805fd120>] napi_complete_done+0x68/0x2a4
[<8058de5c>] mtk_napi_rx+0x228/0xfec
[<805fd398>] __napi_poll+0x3c/0x1c4
[<805fd754>] napi_threaded_poll_loop+0x234/0x29c
[<805fd848>] napi_threaded_poll+0x8c/0xb0
[<80053544>] kthread+0x104/0x12c
[<80002bd8>] ret_from_kernel_thread+0x14/0x1c
Code: 02d51821 1060045b 00000000 <8c640000> 3084000f 2c820005 144001a2 00042080 8e220000
To reduce the attack surface and maintain performance, do not process
PPPoE PFC frames.
[1] https://patch.msgid.link/20220630231016.GA392@debian.home
Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors")
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
---
Changes in v5: drop byte-swap change
Link to v4: https://lore.kernel.org/netdev/20260410033627.93786-1-qingfang.deng@linux.dev/
net/core/flow_dissector.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1b61bb25ba0e..f9aaba554128 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1374,16 +1374,8 @@ bool __skb_flow_dissect(const struct net *net,
break;
}
- /* least significant bit of the most significant octet
- * indicates if protocol field was compressed
- */
ppp_proto = ntohs(hdr->proto);
- if (ppp_proto & 0x0100) {
- ppp_proto = ppp_proto >> 8;
- nhoff += PPPOE_SES_HLEN - 1;
- } else {
- nhoff += PPPOE_SES_HLEN;
- }
+ nhoff += PPPOE_SES_HLEN;
if (ppp_proto == PPP_IP) {
proto = htons(ETH_P_IP);
--
2.43.0
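
The misalignment described in the commit message can be worked through as plain offset arithmetic. The sketch below is illustrative only: the sketch_* names are hypothetical, and the 2-byte receive padding in front of the Ethernet header is an assumption about a typical RX buffer layout, not something stated in the patch. PPPOE_SES_HLEN is the 6-byte PPPoE header plus the 2-byte PPP protocol field.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_ETH_HLEN		14	/* Ethernet header */
#define SKETCH_PPPOE_SES_HLEN	8	/* 6-byte PPPoE header + 2-byte PPP protocol */
#define SKETCH_RX_PAD		2	/* assumed padding before the Ethernet header */

/* With the assumed padding, the PPPoE header itself starts 4-byte aligned. */
static_assert((SKETCH_RX_PAD + SKETCH_ETH_HLEN) % 4 == 0,
	      "PPPoE header is expected to start on a 4-byte boundary");

/* Offset of the network header from a 4-byte-aligned buffer start. */
static unsigned int sketch_nh_offset(bool pfc)
{
	unsigned int off = SKETCH_RX_PAD + SKETCH_ETH_HLEN +
			   SKETCH_PPPOE_SES_HLEN;

	/* A compressed protocol field is one byte shorter. */
	return pfc ? off - 1 : off;
}

/* True when the network header lands on a 4-byte boundary. */
static bool sketch_nh_aligned(bool pfc)
{
	return sketch_nh_offset(pfc) % 4 == 0;
}
```

Under these assumptions the network header sits at offset 24 without PFC and at offset 23 with PFC. The faulting address in the dump above (BadVA ending in 0x17) is likewise not 4-byte aligned.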
* [PATCH 6.1.y] Revert "wifi: cfg80211: stop NAN and P2P in cfg80211_leave"
From: guocai.he.cn @ 2026-04-14 2:16 UTC (permalink / raw)
To: stable; +Cc: gregkh, johannes.berg, netdev, regressions,
miriam.rachel.korenblit
From: Guocai He <guocai.he.cn@windriver.com>
This reverts commit 0c4f1c02d27a880b10b58c63f574f13bed4f711d which is commit
e1696c8bd0056bc1a5f7766f58ac333adc203e8a upstream.
The reverted patch introduced a deadlock. The locking situation in mainline is
totally different, so it is incorrect to directly backport the commit from mainline.
Signed-off-by: Guocai He <guocai.he.cn@windriver.com>
---
net/wireless/core.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index e75326932c32..2a6a8bdfa724 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1328,10 +1328,8 @@ void __cfg80211_leave(struct cfg80211_registered_device *rdev,
__cfg80211_leave_ocb(rdev, dev);
break;
case NL80211_IFTYPE_P2P_DEVICE:
- cfg80211_stop_p2p_device(rdev, wdev);
- break;
case NL80211_IFTYPE_NAN:
- cfg80211_stop_nan(rdev, wdev);
+ /* cannot happen, has no netdev */
break;
case NL80211_IFTYPE_AP_VLAN:
case NL80211_IFTYPE_MONITOR:
--
2.34.1
* Re: [PATCH net-next] net: shaper: Reject zero weight in shaper config
From: Mohsin Bashir @ 2026-04-14 2:25 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, ast, chuck.lever, davem, donald.hunter, edumazet, horms,
linux-kernel, matttbe, pabeni
In-Reply-To: <20260413145039.43f7b162@kernel.org>
On 4/13/26 2:50 PM, Jakub Kicinski wrote:
> On Fri, 10 Apr 2026 15:51:23 -0700 Mohsin Bashir wrote:
>> A zero weight is meaningless for DWRR scheduling and can cause
>> starvation of the affected node. Add a min-value constraint to
>> the weight attribute in the net_shaper netlink spec so that zero
>> is rejected at the netlink policy level.
>>
>> Found while prototyping a new driver, existing drivers are not
>> affected.
>
> AI review points out that if the netlink attr is not present core will
> leave the DWRR weight as 0 in the struct. I guess we need to think this
> thru a little more carefully. What should the "default" weight be?
> What if user specifies weights only for subset of leaves?
>
> This part of the uAPI seems under-defined.
>
> Maybe a better adjustment would be to make core set the weight to 1
> automatically if the user has not defined it? Only when sending it to
> the driver tho, because we'd still want it to not be reported back to
> user space. Not sure how hairy it'd get code-wise.
Interesting!!
Let me look at the big picture here and re-spin.
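
One possible reading of the suggestion quoted above, written as a userspace sketch rather than net_shaper code: keep 0 as the stored "unset" value, substitute 1 only on the path toward the driver, and report the stored value back unchanged. The struct and function names are hypothetical, and whether core should behave this way is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a shaper as core stores it. */
struct sketch_shaper {
	uint32_t weight;	/* 0 when user space never set it */
};

/* Weight handed to the driver: substitute 1 for "unset". */
static uint32_t sketch_driver_weight(const struct sketch_shaper *s)
{
	return s->weight ? s->weight : 1;
}

/* Weight reported to user space: the stored value, so an unset
 * weight is still reported as unset.
 */
static uint32_t sketch_reported_weight(const struct sketch_shaper *s)
{
	return s->weight;
}
```

This keeps the driver from ever seeing a zero DWRR weight while leaving the user-visible state untouched. It does not address what should happen when only a subset of leaves has explicit weights.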
* [PATCH 6.6.y] Revert "wifi: cfg80211: stop NAN and P2P in cfg80211_leave"
From: guocai.he.cn @ 2026-04-14 2:46 UTC (permalink / raw)
To: gregkh
Cc: stable, johannes.berg, netdev, regressions,
miriam.rachel.korenblit, linux-kernel
From: Guocai He <guocai.he.cn@windriver.com>
This reverts commit 4d7a05da767e5cbcf4db511b9289d7ebd380dc56 which is commit
e1696c8bd0056bc1a5f7766f58ac333adc203e8a upstream.
The reverted patch introduced a deadlock. The locking situation in mainline is
totally different, so it is incorrect to directly backport the commit from mainline.
Signed-off-by: Guocai He <guocai.he.cn@windriver.com>
---
net/wireless/core.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index fac19dab23c6..d07c4baa32d9 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1332,10 +1332,8 @@ void __cfg80211_leave(struct cfg80211_registered_device *rdev,
__cfg80211_leave_ocb(rdev, dev);
break;
case NL80211_IFTYPE_P2P_DEVICE:
- cfg80211_stop_p2p_device(rdev, wdev);
- break;
case NL80211_IFTYPE_NAN:
- cfg80211_stop_nan(rdev, wdev);
+ /* cannot happen, has no netdev */
break;
case NL80211_IFTYPE_AP_VLAN:
case NL80211_IFTYPE_MONITOR:
--
2.34.1
* [PATCH 6.1.y] netfilter: conntrack: add missing netlink policy validations
From: Li hongliang @ 2026-04-14 2:59 UTC (permalink / raw)
To: gregkh, stable, fw
Cc: patches, linux-kernel, pablo, kadlec, davem, edumazet, kuba,
pabeni, horms, kaber, netfilter-devel, coreteam, netdev, imv4bel
From: Florian Westphal <fw@strlen.de>
[ Upstream commit f900e1d77ee0ef87bfb5ab3fe60f0b3d8ad5ba05 ]
Hyunwoo Kim reports out-of-bounds access in sctp and ctnetlink.
These attributes are used by the kernel without any validation.
Extend the netlink policies accordingly.
Quoting the reporter:
nlattr_to_sctp() assigns the user-supplied CTA_PROTOINFO_SCTP_STATE
value directly to ct->proto.sctp.state without checking that it is
within the valid range. [..]
and: ... with exp->dir = 100, the access at
ct->master->tuplehash[100] reads 5600 bytes past the start of a
320-byte nf_conn object, causing a slab-out-of-bounds read confirmed by
UBSAN.
Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations")
Fixes: a258860e01b8 ("netfilter: ctnetlink: add full support for SCTP to ctnetlink")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Li hongliang <1468888505@139.com>
---
net/netfilter/nf_conntrack_netlink.c | 2 +-
net/netfilter/nf_conntrack_proto_sctp.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 89cec02de68b..bcbd77608365 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -3458,7 +3458,7 @@ ctnetlink_change_expect(struct nf_conntrack_expect *x,
#if IS_ENABLED(CONFIG_NF_NAT)
static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
- [CTA_EXPECT_NAT_DIR] = { .type = NLA_U32 },
+ [CTA_EXPECT_NAT_DIR] = NLA_POLICY_MAX(NLA_BE32, IP_CT_DIR_REPLY),
[CTA_EXPECT_NAT_TUPLE] = { .type = NLA_NESTED },
};
#endif
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index 7ffd698497f2..90458799324e 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -600,7 +600,8 @@ static int sctp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
}
static const struct nla_policy sctp_nla_policy[CTA_PROTOINFO_SCTP_MAX+1] = {
- [CTA_PROTOINFO_SCTP_STATE] = { .type = NLA_U8 },
+ [CTA_PROTOINFO_SCTP_STATE] = NLA_POLICY_MAX(NLA_U8,
+ SCTP_CONNTRACK_HEARTBEAT_SENT),
[CTA_PROTOINFO_SCTP_VTAG_ORIGINAL] = { .type = NLA_U32 },
[CTA_PROTOINFO_SCTP_VTAG_REPLY] = { .type = NLA_U32 },
};
--
2.34.1
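
The effect of adding an upper bound to the policy can be sketched in userspace as a range check that runs before the value is used as an array index. This is not the netlink policy machinery; sketch_conn, sketch_dir_valid() and sketch_lookup() are hypothetical names, and the two-entry array merely stands in for the per-direction tuplehash mentioned in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DIR_REPLY	1	/* highest valid direction index */

/* Stand-in for an object with one slot per direction. */
struct sketch_conn {
	int tuplehash[SKETCH_DIR_REPLY + 1];
};

/* Analogue of a policy with an inclusive upper bound. */
static bool sketch_dir_valid(uint32_t dir)
{
	return dir <= SKETCH_DIR_REPLY;
}

/* Reject out-of-range input before it can index past the array. */
static int sketch_lookup(const struct sketch_conn *c, uint32_t dir, int *out)
{
	if (!sketch_dir_valid(dir))
		return -1;

	*out = c->tuplehash[dir];
	return 0;
}
```

With this check in place, a direction of 100 as in the quoted report is refused instead of reading far beyond the object.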
* [PATCH 6.6.y] netfilter: conntrack: add missing netlink policy validations
From: Li hongliang @ 2026-04-14 2:59 UTC (permalink / raw)
To: gregkh, stable, fw
Cc: patches, linux-kernel, pablo, kadlec, davem, edumazet, kuba,
pabeni, horms, kaber, netfilter-devel, coreteam, netdev, imv4bel
From: Florian Westphal <fw@strlen.de>
[ Upstream commit f900e1d77ee0ef87bfb5ab3fe60f0b3d8ad5ba05 ]
Hyunwoo Kim reports out-of-bounds access in sctp and ctnetlink.
These attributes are used by the kernel without any validation.
Extend the netlink policies accordingly.
Quoting the reporter:
nlattr_to_sctp() assigns the user-supplied CTA_PROTOINFO_SCTP_STATE
value directly to ct->proto.sctp.state without checking that it is
within the valid range. [..]
and: ... with exp->dir = 100, the access at
ct->master->tuplehash[100] reads 5600 bytes past the start of a
320-byte nf_conn object, causing a slab-out-of-bounds read confirmed by
UBSAN.
Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations")
Fixes: a258860e01b8 ("netfilter: ctnetlink: add full support for SCTP to ctnetlink")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Li hongliang <1468888505@139.com>
---
net/netfilter/nf_conntrack_netlink.c | 2 +-
net/netfilter/nf_conntrack_proto_sctp.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 9b089cdfcd35..255996f43d85 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -3454,7 +3454,7 @@ ctnetlink_change_expect(struct nf_conntrack_expect *x,
#if IS_ENABLED(CONFIG_NF_NAT)
static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
- [CTA_EXPECT_NAT_DIR] = { .type = NLA_U32 },
+ [CTA_EXPECT_NAT_DIR] = NLA_POLICY_MAX(NLA_BE32, IP_CT_DIR_REPLY),
[CTA_EXPECT_NAT_TUPLE] = { .type = NLA_NESTED },
};
#endif
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index 4cc97f971264..fabb2c1ca00a 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -587,7 +587,8 @@ static int sctp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
}
static const struct nla_policy sctp_nla_policy[CTA_PROTOINFO_SCTP_MAX+1] = {
- [CTA_PROTOINFO_SCTP_STATE] = { .type = NLA_U8 },
+ [CTA_PROTOINFO_SCTP_STATE] = NLA_POLICY_MAX(NLA_U8,
+ SCTP_CONNTRACK_HEARTBEAT_SENT),
[CTA_PROTOINFO_SCTP_VTAG_ORIGINAL] = { .type = NLA_U32 },
[CTA_PROTOINFO_SCTP_VTAG_REPLY] = { .type = NLA_U32 },
};
--
2.34.1
* [PATCH iwl-next v2 0/2] Introduce IDPF PCI callbacks
From: Emil Tantilov @ 2026-04-14 3:16 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, przemyslaw.kitszel, jay.bhat, ivan.d.barrera,
aleksandr.loktionov, larysa.zaremba, anthony.l.nguyen,
andrew+netdev, davem, edumazet, kuba, pabeni, aleksander.lobakin,
linux-pci, madhu.chittim, decot, willemb, sheenamo, lukas
This series implements PCI callbacks for the purpose of handling FLR and
PCI errors in the IDPF driver.
The first patch removes the conditional deinitialization of the mailbox in
the idpf_vc_core_deinit() function. Aside from being redundant, due to the
shutdown of the mailbox after a reset is detected, the check was also
preventing the driver from sending messages to stop and disable the vports
and queues on FW side, which is needed for the prepare phase of the FLR
handling.
The second patch implements the PCI callbacks. The logic here follows
the reset handling done in idpf_init_hard_reset(), but is split in
prepare and resume phases, where idpf_reset_prepare() stops all driver
operations and the resume callback attempts to recover following the
reset or the PCI error event.
Testing hints:
1. FLR via sysfs:
echo 1 > /sys/class/net/<ifname>/device/reset
Previously this would have been handled by idpf_init_hard_reset() as the
driver detects the reset. Now it will be done by the PCI err callbacks,
so this is the easiest way to test the reset_prepare/resume path.
2. PCI errors can be tested with aer-inject:
./aer-inject -s 83:00.0 examples/<error_type>
3. Stress testing can be done by combining various callbacks with the
reset from step 1:
echo 1 > /sys/class/net/<if>/device/reset& ethtool -L <if> combined 8
ethtool -L <if> combined 16& echo 1 > /sys/class/net/<if>/device/reset
Changelog:
v1->v2:
- Removed the call to pci_save_state() from idpf_pci_err_slot_reset(),
as it is no longer needed after pci_restore_state(). Suggested by
Lukas Wunner.
v1:
https://lore.kernel.org/netdev/20260411003959.30959-1-emil.s.tantilov@intel.com/
Emil Tantilov (2):
idpf: remove conditonal MBX deinit from idpf_vc_core_deinit()
idpf: implement pci error handlers
drivers/net/ethernet/intel/idpf/idpf.h | 3 +
drivers/net/ethernet/intel/idpf/idpf_lib.c | 13 +-
drivers/net/ethernet/intel/idpf/idpf_main.c | 112 ++++++++++++++++++
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 11 +-
4 files changed, 127 insertions(+), 12 deletions(-)
--
2.37.3
* [PATCH iwl-next v2 1/2] idpf: remove conditonal MBX deinit from idpf_vc_core_deinit()
From: Emil Tantilov @ 2026-04-14 3:16 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, przemyslaw.kitszel, jay.bhat, ivan.d.barrera,
aleksandr.loktionov, larysa.zaremba, anthony.l.nguyen,
andrew+netdev, davem, edumazet, kuba, pabeni, aleksander.lobakin,
linux-pci, madhu.chittim, decot, willemb, sheenamo, lukas
In-Reply-To: <20260414031631.2107-1-emil.s.tantilov@intel.com>
Previously it was assumed that idpf_vc_core_deinit() is always being
called during reset handling, with remove being an exception. Ideally
the driver needs to communicate the changes to FW in all instances where
the MBX is not already disabled. Remove the remove_in_prog check from
idpf_vc_core_deinit() as the MBX was already disabled while handling the
reset via libie_ctlq_xn_shutdown() by the service task. This is also
needed by the following patch, introducing PCI callbacks support.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Jay Bhat <jay.bhat@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
---
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 129c8f6b0faa..fceaf3ec1cd4 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -3229,24 +3229,15 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
*/
void idpf_vc_core_deinit(struct idpf_adapter *adapter)
{
- bool remove_in_prog;
-
if (!test_bit(IDPF_VC_CORE_INIT, adapter->flags))
return;
- /* Avoid transaction timeouts when called during reset */
- remove_in_prog = test_bit(IDPF_REMOVE_IN_PROG, adapter->flags);
- if (!remove_in_prog)
- idpf_deinit_dflt_mbx(adapter);
-
idpf_ptp_release(adapter);
idpf_deinit_task(adapter);
idpf_idc_deinit_core_aux_device(adapter);
idpf_rel_rx_pt_lkup(adapter);
idpf_intr_rel(adapter);
-
- if (remove_in_prog)
- idpf_deinit_dflt_mbx(adapter);
+ idpf_deinit_dflt_mbx(adapter);
cancel_delayed_work_sync(&adapter->serv_task);
--
2.37.3
* [PATCH iwl-next v2 2/2] idpf: implement pci error handlers
From: Emil Tantilov @ 2026-04-14 3:16 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, przemyslaw.kitszel, jay.bhat, ivan.d.barrera,
aleksandr.loktionov, larysa.zaremba, anthony.l.nguyen,
andrew+netdev, davem, edumazet, kuba, pabeni, aleksander.lobakin,
linux-pci, madhu.chittim, decot, willemb, sheenamo, lukas
In-Reply-To: <20260414031631.2107-1-emil.s.tantilov@intel.com>
Add callbacks to handle PCI errors and FLR reset. When preparing to handle
reset on the bus, the driver must stop all operations that can lead to MMIO
access in order to prevent HW errors. To accomplish this, introduce the
helper idpf_reset_prepare(), which is called prior to FLR or when a PCI
error is detected. Upon resume the recovery is done through the existing reset path
by starting the event task.
The following callbacks are implemented:
.reset_prepare runs the first portion of the generic reset path leading up
to the part where we wait for the reset to complete.
.reset_done/resume runs the recovery part of the reset handling.
.error_detected is the callback dealing with PCI errors, similar to the
prepare call, we stop all operations, prior to attempting a recovery.
.slot_reset is the callback attempting to restore the device, provided a
PCI reset was initiated by the AER driver.
Whereas previously the init logic guaranteed netdevs during reset, the
addition of idpf_detach_and_close() to the PCI callbacks flow makes it
possible for the function to be called without netdevs. Add check to
avoid NULL pointer dereference in that case.
Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Jay Bhat <jay.bhat@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
---
drivers/net/ethernet/intel/idpf/idpf.h | 3 +
drivers/net/ethernet/intel/idpf/idpf_lib.c | 13 ++-
drivers/net/ethernet/intel/idpf/idpf_main.c | 112 ++++++++++++++++++++
3 files changed, 126 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index 1d0e32e47e87..164d2f3e233a 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -88,6 +88,7 @@ enum idpf_state {
* @IDPF_REMOVE_IN_PROG: Driver remove in progress
* @IDPF_MB_INTR_MODE: Mailbox in interrupt mode
* @IDPF_VC_CORE_INIT: virtchnl core has been init
+ * @IDPF_PCI_CB_RESET: Reset via the PCI callbacks
* @IDPF_FLAGS_NBITS: Must be last
*/
enum idpf_flags {
@@ -97,6 +98,7 @@ enum idpf_flags {
IDPF_REMOVE_IN_PROG,
IDPF_MB_INTR_MODE,
IDPF_VC_CORE_INIT,
+ IDPF_PCI_CB_RESET,
IDPF_FLAGS_NBITS,
};
@@ -1012,4 +1014,5 @@ void idpf_idc_vdev_mtu_event(struct iidc_rdma_vport_dev_info *vdev_info,
int idpf_add_del_fsteer_filters(struct idpf_adapter *adapter,
struct virtchnl2_flow_rule_add_del *rule,
enum virtchnl2_op opcode);
+void idpf_detach_and_close(struct idpf_adapter *adapter);
#endif /* !_IDPF_H_ */
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 7988836fbae0..1e706beb0098 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -758,13 +758,16 @@ static int idpf_init_mac_addr(struct idpf_vport *vport,
return 0;
}
-static void idpf_detach_and_close(struct idpf_adapter *adapter)
+void idpf_detach_and_close(struct idpf_adapter *adapter)
{
int max_vports = adapter->max_vports;
for (int i = 0; i < max_vports; i++) {
struct net_device *netdev = adapter->netdevs[i];
+ if (!netdev)
+ continue;
+
/* If the interface is in detached state, that means the
* previous reset was not handled successfully for this
* vport.
@@ -1908,6 +1911,10 @@ static void idpf_init_hard_reset(struct idpf_adapter *adapter)
dev_info(dev, "Device HW Reset initiated\n");
+ /* Reset has already happened, skip to recovery. */
+ if (test_and_clear_bit(IDPF_PCI_CB_RESET, adapter->flags))
+ goto check_rst_complete;
+
/* Prepare for reset */
if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags)) {
reg_ops->trigger_reset(adapter, IDPF_HR_DRV_LOAD);
@@ -1925,6 +1932,7 @@ static void idpf_init_hard_reset(struct idpf_adapter *adapter)
goto unlock_mutex;
}
+check_rst_complete:
/* Wait for reset to complete */
err = idpf_check_reset_complete(adapter, &adapter->reset_reg);
if (err) {
@@ -1984,7 +1992,8 @@ void idpf_vc_event_task(struct work_struct *work)
if (test_bit(IDPF_HR_FUNC_RESET, adapter->flags))
goto func_reset;
- if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags))
+ if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags) ||
+ test_bit(IDPF_PCI_CB_RESET, adapter->flags))
goto drv_load;
return;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index d99f759c55e1..54fca25c09f7 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -234,6 +234,7 @@ static int idpf_cfg_device(struct idpf_adapter *adapter)
if (err)
pci_dbg(pdev, "PCIe PTM is not supported by PCIe bus/controller\n");
+ pci_save_state(pdev);
pci_set_drvdata(pdev, adapter);
return 0;
@@ -360,6 +361,116 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return err;
}
+static void idpf_reset_prepare(struct idpf_adapter *adapter)
+{
+ pci_dbg(adapter->pdev, "resetting\n");
+ set_bit(IDPF_HR_RESET_IN_PROG, adapter->flags);
+ cancel_delayed_work_sync(&adapter->serv_task);
+ cancel_delayed_work_sync(&adapter->vc_event_task);
+ idpf_detach_and_close(adapter);
+ idpf_idc_issue_reset_event(adapter->cdev_info);
+ idpf_vc_core_deinit(adapter);
+}
+
+/**
+ * idpf_pci_err_detected - PCI error detected, about to attempt recovery
+ * @pdev: PCI device struct
+ * @err: err detected
+ *
+ * Return: %PCI_ERS_RESULT_NEED_RESET to attempt recovery,
+ * %PCI_ERS_RESULT_DISCONNECT if recovery is not possible.
+ */
+static pci_ers_result_t
+idpf_pci_err_detected(struct pci_dev *pdev, pci_channel_state_t err)
+{
+ struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+ /* Shutdown the mailbox if PCI I/O is in a bad state to avoid MBX
+ * timeouts during the prepare stage.
+ */
+ if (pci_channel_offline(pdev))
+ libie_ctlq_xn_shutdown(adapter->xnm);
+
+ idpf_reset_prepare(adapter);
+
+ if (err == pci_channel_io_perm_failure)
+ return PCI_ERS_RESULT_DISCONNECT;
+
+ /* When called due to PCI error, driver will have to force PFR on
+ * resume, in order to complete the recovery via the event task.
+ */
+ set_bit(IDPF_PCI_CB_RESET, adapter->flags);
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * idpf_pci_err_slot_reset - PCI undergoing reset
+ * @pdev: PCI device struct
+ *
+ * Reset PCI state and use a register read to see if we're good.
+ *
+ * Return: %PCI_ERS_RESULT_RECOVERED on success,
+ * %PCI_ERS_RESULT_DISCONNECT on failure.
+ */
+static pci_ers_result_t
+idpf_pci_err_slot_reset(struct pci_dev *pdev)
+{
+ struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+ pci_restore_state(pdev);
+ pci_set_master(pdev);
+ pci_wake_from_d3(pdev, false);
+ if (readl(adapter->reset_reg.rstat) != 0xFFFFFFFF)
+ return PCI_ERS_RESULT_RECOVERED;
+
+ return PCI_ERS_RESULT_DISCONNECT;
+}
+
+/**
+ * idpf_pci_err_resume - Resume operations after PCI error recovery
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_resume(struct pci_dev *pdev)
+{
+ struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+ /* Force a PFR when resuming from PCI error. */
+ if (test_and_set_bit(IDPF_PCI_CB_RESET, adapter->flags))
+ adapter->dev_ops.reg_ops.trigger_reset(adapter, IDPF_HR_FUNC_RESET);
+
+ queue_delayed_work(adapter->vc_event_wq,
+ &adapter->vc_event_task,
+ msecs_to_jiffies(300));
+}
+
+/**
+ * idpf_pci_err_reset_prepare - Prepare driver for PCI reset
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_reset_prepare(struct pci_dev *pdev)
+{
+ idpf_reset_prepare(pci_get_drvdata(pdev));
+}
+
+/**
+ * idpf_pci_err_reset_done - PCI err reset recovery complete
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_reset_done(struct pci_dev *pdev)
+{
+ pci_dbg(pdev, "reset: done\n");
+ idpf_pci_err_resume(pdev);
+}
+
+static const struct pci_error_handlers idpf_pci_err_handler = {
+ .error_detected = idpf_pci_err_detected,
+ .slot_reset = idpf_pci_err_slot_reset,
+ .reset_prepare = idpf_pci_err_reset_prepare,
+ .reset_done = idpf_pci_err_reset_done,
+ .resume = idpf_pci_err_resume,
+};
+
/* idpf_pci_tbl - PCI Dev idpf ID Table
*/
static const struct pci_device_id idpf_pci_tbl[] = {
@@ -377,5 +488,6 @@ static struct pci_driver idpf_driver = {
.sriov_configure = idpf_sriov_configure,
.remove = idpf_remove,
.shutdown = idpf_shutdown,
+ .err_handler = &idpf_pci_err_handler,
};
module_pci_driver(idpf_driver);
--
2.37.3
^ permalink raw reply related
* [PATCH 6.18.y] netfilter: conntrack: add missing netlink policy validations
From: Li hongliang @ 2026-04-14 3:31 UTC (permalink / raw)
To: gregkh, stable, fw
Cc: patches, linux-kernel, pablo, kadlec, davem, edumazet, kuba,
pabeni, horms, kaber, netfilter-devel, coreteam, netdev, imv4bel
From: Florian Westphal <fw@strlen.de>
[ Upstream commit f900e1d77ee0ef87bfb5ab3fe60f0b3d8ad5ba05 ]
Hyunwoo Kim reports out-of-bounds access in sctp and ctnetlink.
These attributes are used by the kernel without any validation.
Extend the netlink policies accordingly.
Quoting the reporter:
nlattr_to_sctp() assigns the user-supplied CTA_PROTOINFO_SCTP_STATE
value directly to ct->proto.sctp.state without checking that it is
within the valid range. [..]
and: ... with exp->dir = 100, the access at
ct->master->tuplehash[100] reads 5600 bytes past the start of a
320-byte nf_conn object, causing a slab-out-of-bounds read confirmed by
UBSAN.
Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations")
Fixes: a258860e01b8 ("netfilter: ctnetlink: add full support for SCTP to ctnetlink")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Li hongliang <1468888505@139.com>
---
net/netfilter/nf_conntrack_netlink.c | 2 +-
net/netfilter/nf_conntrack_proto_sctp.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 879413b9fa06..2bb9eb2d25fb 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -3465,7 +3465,7 @@ ctnetlink_change_expect(struct nf_conntrack_expect *x,
#if IS_ENABLED(CONFIG_NF_NAT)
static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
- [CTA_EXPECT_NAT_DIR] = { .type = NLA_U32 },
+ [CTA_EXPECT_NAT_DIR] = NLA_POLICY_MAX(NLA_BE32, IP_CT_DIR_REPLY),
[CTA_EXPECT_NAT_TUPLE] = { .type = NLA_NESTED },
};
#endif
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index 7c6f7c9f7332..645d2c43ebf7 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -582,7 +582,8 @@ static int sctp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
}
static const struct nla_policy sctp_nla_policy[CTA_PROTOINFO_SCTP_MAX+1] = {
- [CTA_PROTOINFO_SCTP_STATE] = { .type = NLA_U8 },
+ [CTA_PROTOINFO_SCTP_STATE] = NLA_POLICY_MAX(NLA_U8,
+ SCTP_CONNTRACK_HEARTBEAT_SENT),
[CTA_PROTOINFO_SCTP_VTAG_ORIGINAL] = { .type = NLA_U32 },
[CTA_PROTOINFO_SCTP_VTAG_REPLY] = { .type = NLA_U32 },
};
--
2.34.1
^ permalink raw reply related
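For readers unfamiliar with range-validated netlink policies, here is a userspace sketch of the idea only: a policy entry carries an upper bound, and values above it are rejected before they are used as an array index. It does not use the kernel's struct nla_policy or NLA_POLICY_MAX(). The names attr_policy, attr_validate(), lookup_dir() and DIR_MAX, and the choice of -ERANGE as the error, are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a policy entry that carries an upper bound. */
struct attr_policy {
	uint32_t max;	/* largest value accepted for this attribute */
};

/* Hypothetical: two directions, mirroring original/reply. */
enum { DIR_ORIGINAL = 0, DIR_REPLY = 1, DIR_MAX = 2 };

/* Reject values above the policy bound before anyone uses them. */
static int attr_validate(const struct attr_policy *p, uint32_t value)
{
	assert(p != NULL);
	return value > p->max ? -ERANGE : 0;
}

/* Index a per-direction table only after validation has succeeded. */
static int lookup_dir(const int table[DIR_MAX], uint32_t dir, int *out)
{
	static const struct attr_policy dir_policy = { .max = DIR_REPLY };
	int err = attr_validate(&dir_policy, dir);

	if (err)
		return err;
	*out = table[dir];
	return 0;
}
```

With such a check in place, a direction value of 100 fails validation and the table is never indexed.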
* [PATCH 5.15.y] Revert "wifi: cfg80211: stop NAN and P2P in cfg80211_leave"
From: guocai.he.cn @ 2026-04-14 3:20 UTC (permalink / raw)
To: gregkh
Cc: stable, johannes.berg, netdev, regressions,
miriam.rachel.korenblit, linux-kernel
From: Guocai He <guocai.he.cn@windriver.com>
This reverts commit 31344ffecd7a34335ce2b52e8c205bce3cbfca4b which is commit
e1696c8bd0056bc1a5f7766f58ac333adc203e8a upstream.
The reverted patch introduced a deadlock. The locking situation in mainline is
totally different, so it is incorrect to directly backport the commit from mainline.
Signed-off-by: Guocai He <guocai.he.cn@windriver.com>
---
net/wireless/core.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 22e6fd12f201..58b91e9647c2 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1300,10 +1300,8 @@ void __cfg80211_leave(struct cfg80211_registered_device *rdev,
__cfg80211_leave_ocb(rdev, dev);
break;
case NL80211_IFTYPE_P2P_DEVICE:
- cfg80211_stop_p2p_device(rdev, wdev);
- break;
case NL80211_IFTYPE_NAN:
- cfg80211_stop_nan(rdev, wdev);
+ /* cannot happen, has no netdev */
break;
case NL80211_IFTYPE_AP_VLAN:
case NL80211_IFTYPE_MONITOR:
--
2.34.1
^ permalink raw reply related
* Re: [PATCH v11 net-next 5/7] octeontx2-af: npc: cn20k: add subbank search order control
From: Ratheesh Kannoth @ 2026-04-14 3:46 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, linux-kernel, linux-rdma, sgoutham, andrew+netdev, davem,
edumazet, kuba, donald.hunter, horms, jiri, chuck.lever, matttbe,
cjubran, saeedm, leon, tariqt, mbloch, dtatulea
In-Reply-To: <b9ffa72d-ebe2-4fd1-b668-93620f206179@redhat.com>
On 2026-04-13 at 18:26:00, Paolo Abeni (pabeni@redhat.com) wrote:
> > + xa_for_each(&npc_priv.xa_sb_free, index, v) {
> > + val = xa_to_value(v);
> > + fslots[fcnt][0] = index;
> > + fslots[fcnt][1] = val;
> > + xa_erase(&npc_priv.xa_sb_free, index);
> > + fcnt++;
> > + }
> > +
> > + /* xa_store() is done under lock. If xa_store fails
> > + * ,no rollback is planned as it might also fail.
>
> Why do you need to go through erase and add loop? Why can't you directly
> xa_store() the new value? Note that xa_store() can fail due to memory
> pressure.
>
> Avoiding the previous erase will prevent deallocation and re allocation
> and will avoid any reasonable xa_store() failure.
ACK.
>
> AFAICS there are a few more items reported by sashiko, please have a look:
>
> https://sashiko.dev/#/patchset/20260409025055.1664053-1-rkannoth%40marvell.com
>
> /P
>
Patch 1: [PATCH v11 net-next 1/7] octeontx2-af: npc: cn20k: debugfs enhancements
>"+static u64 dstats[MAX_NUM_BANKS][MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS] = {};
>Is it safe to use a static global array here? If multiple RVU AF devices exist
>in the system, it looks like they might share this array and overwrite each
>other's hardware statistics."
There is only a single RVU AF device in the system. Please ignore this comment,
as it is a false positive.
>"Also, since there are no locks, could concurrent reads of the debugfs file by
>multiple threads cause a data race and corrupt the stored statistics?"
ACK. Added a mutex lock.
>"Since en_map is added to the global npc_priv structure, is it ever explicitly
>cleared during device initialization?"
npc_priv is defined as static, so all fields are set to 0 by default.
Anyway, added a bitmap clear to get rid of this AI review warning.
>"Does updating dstats inline during the seq_file show callback break the
>statistics if the buffer overflows?"
ACK. Used the seq open with size() API to set the required buffer size at the start.
Added a check for buffer overflow by counting the buffers written into the
seq_file. dstats are "delta stats" and are used for debugging.
>"Does updating dstats inline during the seq_file show callback break the
>statistics if the buffer overflows?"
ACK.
>"When seq_read detects the overflow, it allocates a larger buffer and calls
>this show callback again from the beginning. During the retry,
>stats == dstats[bank][idx] will evaluate to true because the array was
>already updated, which skips all the previously overflowing entries."
ACK.
>"This isn't a bug, but using the ternary operator for its void side-effects
>is generally avoided in the kernel. Could this be written using an if/else
>block, or the assign_bit() macro instead?"
ACK.
>"Since en_map is added to the global npc_priv structure, is it ever explicitly
>cleared during device initialization?
>If the PCI device is unbound and rebound via sysfs, could en_map retain bits
>from the previous lifecycle and cause the new mismatch debugfs file to falsely
>report old enabled rules as mismatched?"
ACK. Reset bitmap to zero on deinit.
Patch 2: [PATCH v11 net-next 2/7] net/mlx5e: heap-allocate devlink param values
>Could we avoid the dynamic allocation entirely? Instead of storing all four
>values at once, could a single union devlink_param_value be declared inside
>the loop, populated, and immediately assigned to the respective config field?
>This would reduce the stack usage to a single union, avoiding both the frame
>size warning and the new memory allocation failure path."
ACK.
Patch 3: [PATCH v11 net-next 3/7] devlink: Change function syntax.
>"The commit message notes that exported APIs will be updated to pass the
>unions by pointer. Is this incomplete? Exported APIs like
>devl_param_driverinit_value_set and the validate callback in struct
>devlink_param still appear to take the union by value, which might continue
>to push the large structure onto the stack during validation."
ACK.
>"This isn't a bug, but it looks like an automated search and replace
>accidentally modified the word "values" in this comment."
ACK.
Patch 5: [PATCH v11 net-next 5/7] octeontx2-af: npc: cn20k: add subbank search order control
>Does this code overflow uslots[]?
>Looking at npc_priv_init(), num_subbanks is populated directly from the
>hardware register NPC_AF_CONST2. While it ensures the value is a power of 2,
>there doesn't seem to be a check that restricts it to MAX_NUM_SUB_BANKS.
>If the hardware reports a value larger than 32, ucnt could exceed the
>bounds of the stack-allocated uslots array.
ACK.
>Can this approach cause a permanent leak of the erased subbanks?
>If any xa_store() call fails, the function jumps to fail_used and returns
>without rolling back the state. Since the subbanks were already erased from
>the xarray into the local stack arrays, failing to re-insert them leaves the
>hardware allocator state partially empty and corrupts it for the lifetime of
>the system.
>Would it be safer to pre-allocate with xa_reserve(), or build new xarrays and
>swap them in on success?
Now using xa_store() without xa_erase(). If xa_store() fails due to memory pressure, the
error is unrecoverable, as it can happen partway through the for loop.
Patch 6: [PATCH v11 net-next 6/7] octeontx2-af: npc: cn20k: dynamically allocate and free default MCAM entries
>"The NIX_LF_DONT_FREE_DFT_IDXS flag is read directly from the mailbox message
>structure. Since mailbox messages can be initiated by Virtual Functions, what
>prevents an unprivileged VF from repeatedly allocating and freeing LFs with
>this flag set to exhaust the hardware MCAM entries?"
rvu_mbox_handler_nix_lf_alloc() ---> npc_cn20k_dft_rules_alloc().
npc_cn20k_dft_rules_alloc() allocates default entries if and only if they are not
already there. So if a VF calls alloc and free LF without setting the flag, the
default entries won't be allocated or freed.
>Are the values in ptr[] virtual or physical MCAM indices? It appears that
>npc_cn20k_dft_rules_idx_get() retrieves virtual indices, but they are used
>here as direct indices into mcam->entry2pfvf_map and passed to
>npc_mcam_clear_bit(). Since those structures are sized for physical indices,
>could this cause an out-of-bounds memory corruption or an integer underflow?
Default entries are always allocated by setting the "ref_entry" field in struct
npc_mcam_alloc_entry_req. In that case, rvu_mbox_handler_npc_mcam_alloc_entry() won't
return a virtual mcam index.
>If xa_erase() fails above and returns NULL, ptr[i] is not cleared and the
>code falls through to the free_rules label. Will this result in
>unconditionally calling npc_cn20k_idx_free() on the stale index, potentially
>causing a double-free?
ACK.
>Furthermore, if a VF manually frees its default MCAM rules via the
>NPC_MCAM_FREE_ENTRY mailbox command before this NIX LF teardown occurs,
>npc_cn20k_idx_free() will be called during that manual free. Since the manual
>free does not remove the index from xa_pf2dfl_rmap, could this teardown path
>fetch the same index and attempt to free it again?
Default mcam rules are allocated in rvu_mbox_handler_nix_lf_alloc(), not through
NPC_MCAM_FREE_ENTRY. If a VF does this intentionally, it is a violation. We have a
dev_err() there, and it needs to be debugged on the user side.
>Does the caller of this function properly handle negative error codes?
>For example, in npc_enadis_default_mce_entry() and
>npc_enadis_default_entries(), the returned index is passed directly to
>npc_enable_mcam_entry() and nix_update_mce_list() without checking for a
>negative value. This could lead to a WARN(1) in npc_enable_mcam_entry() or an
>out-of-bounds write in nix_update_mce_list().
We made this change intentionally to find the flow which passes a wrong mcam index.
So we need a splat using WARN(1).
>Here, index is a physical index from the bitmap iteration, but the values
>returned into dft_idxs[] by npc_cn20k_dft_rules_idx_get() are virtual
>indices. Will this comparison always fail, causing default rules to be
>erroneously physically freed?
No. Default indexes are not virtual. This is ensured during allocation itself.
>Additionally, if the NIX LF is freed with NIX_LF_DONT_FREE_DFT_IDXS to
>preserve default rules, the ownership mapping is cleared here.
ACK.
>Upon
>re-allocation, npc_cn20k_dft_rules_alloc() will detect the rules in
>xa_pf2dfl_rmap but won't restore the ownership in entry2pfvf_map, meaning
>subsequent operations on these rules will fail verification.
ACK.
>Does this make the firmware layout dependent on the internal size of
>ikpu_action_entries?
Yes.
>If future kernel versions add new packet kinds and increase the size of
>this array, older firmware files will fail this bounds check and be rejected.
struct npc_kpu_profile_fwdata does not have a field to indicate the size of ikpu_action_entries.
We can't modify the structure as it would break backward compatibility with old fw.
>Will this trigger a compiler warning or build failure on strict builds?
>The min() macro performs strict type checking, and fw_kpu->entries appears
>to be a signed int, while rvu->hw->npc_kpu_entries is an unsigned u16.
ACK.
>Could a negative value in fw_kpu->entries cause an integer underflow here?
>If fw_kpu->entries is read from untrusted firmware as a negative value, the
>offset calculation can underflow the size_t offset variable.
>This would bypass the subsequent bounds check because the wrapped offset
>plus hdr_sz wraps again to a small positive value.
>On the next iteration, calculating fw_kpu = fw->data + offset could result
>in an out-of-bounds memory read.
Added a check to return on an invalid value.
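As an illustration of that kind of check, here is a userspace sketch that validates a signed count from an untrusted blob before it is used in offset arithmetic. It is not the driver code: struct sect_hdr, sect_advance(), and the max_entries and rec_sz parameters are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-section header read from a firmware blob. */
struct sect_hdr {
	int32_t entries;	/* untrusted, may be negative */
};

/*
 * Advance *offset past one section holding `entries` records of `rec_sz`
 * bytes. Returns 0 on success, or -EINVAL if the count is invalid or the
 * section would extend past the end of the blob. *offset is only updated
 * on success.
 */
static int sect_advance(size_t *offset, size_t blob_sz,
			const struct sect_hdr *hdr, size_t rec_sz,
			int32_t max_entries)
{
	size_t remaining, need;

	assert(offset != NULL && hdr != NULL);

	/* Reject negative or oversized counts before any size_t math. */
	if (hdr->entries < 0 || hdr->entries > max_entries)
		return -EINVAL;
	if (*offset > blob_sz || blob_sz - *offset < sizeof(*hdr))
		return -EINVAL;

	/* Compare by division so the multiplication below cannot wrap. */
	remaining = blob_sz - *offset - sizeof(*hdr);
	if (rec_sz != 0 && (size_t)hdr->entries > remaining / rec_sz)
		return -EINVAL;

	need = (size_t)hdr->entries * rec_sz;
	*offset += sizeof(*hdr) + need;
	return 0;
}
```

Rejecting the count while it is still signed means a negative value never reaches the size_t arithmetic, where it would wrap to a huge offset.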
>Does modifying profile->kpu here corrupt the global default profile state?
>Earlier in the flow, profile->kpu is initialized to point to the global
>static array npc_kpu_profiles. Allocating device-managed memory into
>profile->kpu[kpu].cam2 overwrites this global state with device-specific
>pointers.
>When the device is unbound and the memory is freed, could this leave dangling
>pointers in the global array for other RVU devices in the system? The same
>applies to the legacy firmware parsing path where cam[entry] is overwritten.
We do not use profile->kpu after the device is unbound and the memory is freed. During
reinit, these fields are initialized again. So there is no issue with it.
>Could this printk formatter read past the end of the profile name?
>The name array in the firmware header is 32 bytes. If a user provides a
>firmware file with exactly 32 non-null characters, the string will lack a
>null terminator.
>Printing this with %s can leak adjacent heap memory contents into the kernel
>log. Using %.32s would ensure the read stays within bounds.
ACK.
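The precision form suggested in the review comment can be illustrated in plain userspace C. This is a sketch, not the driver code: NAME_LEN, struct fw_hdr and format_name() are hypothetical names, and snprintf() stands in for the kernel's printk-family formatting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32	/* assumed size of the fixed-width name field */

/* Hypothetical header: name may use all 32 bytes with no terminator. */
struct fw_hdr {
	char name[NAME_LEN];
	char next_field[8];	/* whatever follows the name in memory */
};

/* Format the name with a precision so at most NAME_LEN bytes are read. */
static int format_name(char *dst, size_t dst_len, const struct fw_hdr *hdr)
{
	assert(dst != NULL && hdr != NULL);
	return snprintf(dst, dst_len, "profile: %.*s", NAME_LEN, hdr->name);
}
```

With a precision given, the formatter stops after NAME_LEN bytes even when the field has no null terminator, so the bytes that follow the name are never copied into the output.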
>Do these fields require an endianness conversion before use?
>The 16-bit values like dp0, dp1, and dp2 are read directly from the firmware
>blob.
>If the firmware payload uses little-endian byte order, applying these
>directly to hardware registers could result in misprogramming on big-endian
>architectures. Would it be safer to use le16_to_cpu() here?
The s/w is validated only for little endian, as the HW is little endian. If big endian is
required, we will provide separate firmware for it.
^ permalink raw reply