* Re: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
From: Maciej Fijalkowski @ 2026-06-25 10:35 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, bpf, magnus.karlsson, stfomichev, pabeni, horms,
tushar.vyavahare, kerneljasonxing
In-Reply-To: <20260624193326.295e3711@kernel.org>
On Wed, Jun 24, 2026 at 07:33:26PM -0700, Jakub Kicinski wrote:
> On Tue, 23 Jun 2026 11:10:08 +0200 Maciej Fijalkowski wrote:
> > Subject: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
>
> Do you want it in net? Either way - we'll need a rebase
I have not checked whether this has already propagated to -net, but the
rule of thumb on the bpf side was that all selftest-related work goes to
-next. Is it different on the netdev side?
>
> > Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com
>
> missing <> around the email
oof.
>
^ permalink raw reply
* Re: [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: David Hildenbrand (Arm) @ 2026-06-25 10:35 UTC (permalink / raw)
To: Askar Safin
Cc: akpm, avagin, axboe, brauner, collin.funk1, david.laight.linux,
dhowells, fuse-devel, hch, jack, joannelkoong, kernel, linux-api,
linux-fsdevel, linux-kernel, linux-mm, luto, metze, miklos,
netdev, patches, pfalcato, torvalds, val, viro, w, willy
In-Reply-To: <20260625101132.3859505-1-safinaskar@gmail.com>
On 6/25/26 12:11, Askar Safin wrote:
> "David Hildenbrand (Arm)" <david@kernel.org>:
>> I think we concluded that we cannot rip out vmsplice that way at this point, and
>> I suspect that Christian will drop that topic branch from -next after -rc1.
>
> I think my patches still have a chance.
I talked to Christian and it doesn't sound like it.
--
Cheers,
David
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp reporting when NET_RX_BUSY_POLL is disabled
From: Ding Meng @ 2026-06-25 10:33 UTC (permalink / raw)
To: Paul Menzel
Cc: Florian Bezdeka, anthony.l.nguyen, przemyslaw.kitszel,
andrew+netdev, davem, edumazet, kuba, pabeni, jan.kiszka,
intel-wired-lan, linux-kernel, netdev, wq.wang
In-Reply-To: <40abd0b5-7f3f-4cd4-9975-9db4498d15d3@molgen.mpg.de>
Dear Paul,
Thanks for your comments.
On Mon, Jun 22, 2026 at 05:59:53PM +0200, Paul Menzel wrote:
> Am 22.06.26 um 06:13 schrieb Ding Meng via Intel-wired-lan:
> > When CONFIG_NET_RX_BUSY_POLL is deactivated, fetching RX HW timestamps
> > from the NIC no longer works as expected.
>
> Maybe paste some logs/errors, so it can be easier found by people with the
> same issue.
Will do.
> > This occurs because disabling CONFIG_NET_RX_BUSY_POLL disables the
> > SKB NAPI mapping in __skb_mark_napi_id(). Consequently, get_timestamp()
> > fails to perform its driver lookup, and the igc driver's struct
> > net_device_ops::ndo_get_tstamp is never invoked.
> >
> > Instead, get_timestamp() falls back to use shhwtstamps(skb)->hwtstamp,
> > a field that the driver has not populated.
> >
> > Fix this by populating the hwtstamp field with the correct timestamp
> > in the default timer when CONFIG_NET_RX_BUSY_POLL is disabled.
>
> Maybe detail, why the adapter needs to be passed now.
>
> Also, please describe a test case to check the change.
Will do.
> > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> > index 8ac16808023..1da8d7aa76d 100644
> > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> > @@ -1992,7 +1992,26 @@ static struct sk_buff *igc_build_skb(struct igc_ring *rx_ring,
> > return skb;
> > }
> > -static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
> > +static void igc_construct_skb_timestamps(struct igc_adapter *adapter,
> > + struct sk_buff *skb,
> > + struct igc_xdp_buff *ctx)
> > +{
> > + if (!ctx->rx_ts)
> > + return;
> > +#ifdef CONFIG_NET_RX_BUSY_POLL
>
> Is there a way to do this in C instead of the pre-processor. That way all
> the code gets build tested. (Is there a config with disabled
> NET_RX_BUSY_POLL?)
>
How about defining a function to replace the pre-processor:
static inline bool is_net_rx_busy_poll(void)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
	return true;
#else
	return false;
#endif
}
Setting CONFIG_PREEMPT_RT=y && CONFIG_NETCONSOLE=y causes
NET_RX_BUSY_POLL to be disabled.
Kind regards,
Ding Meng
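As background for the question about doing the check in C rather than in the pre-processor: the kernel provides IS_ENABLED() in include/linux/kconfig.h, which turns a config symbol into a constant 0 or 1 so that both branches are still parsed and type-checked. The block below is only a userspace sketch of the macro trick behind that helper. All SKETCH_* and sketch_* names are hypothetical, and whether this idiom suits igc_construct_skb_timestamps() is for the patch author to decide.

```c
#include <stdbool.h>

/*
 * Userspace re-creation of the preprocessor trick used by the kernel's
 * IS_ENABLED(). The SKETCH_* names are hypothetical; kernel code would
 * simply use IS_ENABLED(CONFIG_...).
 */
#define SKETCH_PLACEHOLDER_1 0,
#define SKETCH_SECOND(ignored, val, ...) val
#define SKETCH_IS_DEFINED(x) SKETCH_IS_DEFINED_(x)
#define SKETCH_IS_DEFINED_(val) SKETCH_IS_DEFINED__(SKETCH_PLACEHOLDER_##val)
#define SKETCH_IS_DEFINED__(arg1_or_junk) SKETCH_SECOND(arg1_or_junk 1, 0, 0)

#define CONFIG_SKETCH_ON 1	/* stands in for an option set to =y */
/* CONFIG_SKETCH_OFF is deliberately left undefined */

static inline bool sketch_on(void)
{
	return SKETCH_IS_DEFINED(CONFIG_SKETCH_ON);
}

static inline bool sketch_off(void)
{
	return SKETCH_IS_DEFINED(CONFIG_SKETCH_OFF);
}

/*
 * Both branches are compiled regardless of the option; the compiler
 * drops the dead one because the condition is a constant.
 */
static inline int sketch_pick_path(void)
{
	if (sketch_off())
		return 1;	/* path taken when the option is enabled */
	return 0;		/* fallback path */
}
```

With a helper of this shape, an `if (...)` in the function body replaces the `#ifdef`, so a build with the option disabled still compile-tests the other branch.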
^ permalink raw reply
* [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full
From: Vimal Agrawal @ 2026-06-25 10:20 UTC (permalink / raw)
To: netdev; +Cc: kuniyu, edumazet, vimal.agrawal, avimalin
In-Reply-To: <CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com>
Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
on every allocation attempt with no rate limiting. In workloads with mostly
active/reachable entries, the GC walk traverses a large portion of the
neighbour table without reclaiming entries, holding tbl->lock for an
extended period. This causes severe lock contention and allocation
latencies exceeding 16ms under sustained neighbour creation.
Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
performed within the last second, avoiding repeated full table scans and
lock acquisitions on the hot allocation path.
Profiling of neigh_create() shows ~3 orders of magnitude latency
improvement with this change.
Link: https://lore.kernel.org/netdev/CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com/
Signed-off-by: Vimal Agrawal <vimal.agrawal@sophos.com>
---
net/core/neighbour.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 1349c0eedb64..078842db3c5f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
int shrunk = 0;
int loop = 0;
+ if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
+ return 0;
+
NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
spin_lock_bh(&tbl->lock);
--
2.17.1
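The pre-lock check in the patch relies on time_after() comparing jiffies values in a way that survives counter wraparound. The following is a minimal userspace sketch of that comparison and of the "at most one forced GC per second" decision. SKETCH_HZ and the sketch_* functions are hypothetical stand-ins, not kernel symbols, and the cast assumes the usual two's-complement conversion that mainstream compilers implement.

```c
#include <stdbool.h>

#define SKETCH_HZ 1000UL	/* hypothetical tick rate: 1000 ticks per second */

/* Same arithmetic as the kernel's time_after(a, b): true if a is after b. */
static inline bool sketch_time_after(unsigned long a, unsigned long b)
{
	/* The signed difference stays correct across counter wraparound. */
	return (long)(b - a) < 0;
}

/* Models the early return: run GC only if more than one second has passed. */
static inline bool sketch_gc_allowed(unsigned long now, unsigned long last_flush)
{
	return sketch_time_after(now, last_flush + SKETCH_HZ);
}
```

In this model a caller arriving exactly SKETCH_HZ ticks after the last flush is still skipped, because the comparison is strict.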
^ permalink raw reply related
* Re: [PATCH iwl v3] ice: retry reading NVM if admin queue returns EBUSY
From: Robert Malz @ 2026-06-25 10:14 UTC (permalink / raw)
To: Przemek Kitszel
Cc: Simon Horman, Grzegorz Nitka, anthony.l.nguyen, intel-wired-lan,
netdev
In-Reply-To: <5658849b-0425-4132-ba32-5801e2907c60@intel.com>
Hey Przemek,
Thanks a lot for the feedback.
I was sure that we use ICE_NVM_TIMEOUT (180s) as a timeout every time
(ice_acquire_nvm) but your proposal made me rethink it a little.
First of all, the datasheet for E810 specifies the timeout as: "As an
input, the software might specify timeout longer than the default
taken for this resource, and up to one minute."
180s is greater than one minute so I took a look into AQC logs:
[ 110.698471] ice 0000:05:00.0: CQ CMD: opcode 0x0008, flags 0x2000,
datalen 0x0000, retval 0x0000
[ 110.698474] ice 0000:05:00.0: cookie (h,l) 0x00000000 0x00000000
[ 110.698477] ice 0000:05:00.0: param (0,1) 0x00010001 0x0002BF20
[ 110.698480] ice 0000:05:00.0: addr (h,l) 0x00000000 0x00000000
[ 110.698645] ice 0000:05:00.0: ATQ: desc and buffer writeback:
[ 110.698648] ice 0000:05:00.0: CQ CMD: opcode 0x0008, flags 0x2003,
datalen 0x0000, retval 0x0000
[ 110.698651] ice 0000:05:00.0: cookie (h,l) 0x00000000 0x00000000
[ 110.698654] ice 0000:05:00.0: param (0,1) 0x00010001 0x00000BB8
[ 110.698657] ice 0000:05:00.0: addr (h,l) 0x00000000 0x00000000
Based on the above, the driver requested a 0x0002BF20 timeout (180,000
ms) but the FW returned only 0x00000BB8 (3,000 ms, i.e. 3 s).
I'm assuming this is expected behavior since the maximum timeout for
NVM read should be 60,000 ms.
If changing the timeout requested by the driver to 60s for read ops is
handled correctly by the FW and the FW respects that lock, the retry
patch submitted in this email thread might not be required at all.
Let me quickly prepare a new patch and test it. I'll update this
thread once I have results.
Regards,
Robert
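The idea above, requesting no more than the one-minute maximum for NVM reads, can be sketched as a simple clamp. This is only an illustration of the arithmetic: the SKETCH_* constants and sketch_read_timeout_ms() are hypothetical names, the 180,000 ms value is the ICE_NVM_TIMEOUT figure quoted in this mail, and the 60,000 ms cap is taken from the datasheet sentence quoted above. How the driver would actually pass a shorter timeout is not shown in this thread.

```c
#include <stdint.h>

#define SKETCH_NVM_TIMEOUT_MS	180000u	/* value quoted above for ICE_NVM_TIMEOUT */
#define SKETCH_NVM_READ_MAX_MS	60000u	/* "up to one minute" per the quoted datasheet text */

/* Clamp a requested hold timeout to the assumed maximum for read operations. */
static uint32_t sketch_read_timeout_ms(uint32_t requested_ms)
{
	if (requested_ms > SKETCH_NVM_READ_MAX_MS)
		return SKETCH_NVM_READ_MAX_MS;
	return requested_ms;
}
```

The hex values in the log decode the same way: 0x0002BF20 is 180000 and 0x00000BB8 is 3000, both in milliseconds.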
On Thu, Jun 25, 2026 at 9:51 AM Przemek Kitszel
<przemyslaw.kitszel@intel.com> wrote:
>
> >> This is an AI-generated review of your patch. The human sending this
>
>
> >> [Severity: Low]
> >> Does this check allow an extra retry execution? Since retry_cnt starts at 0,
> >> using a strictly greater-than check allows the loop to retry 4 times
> >> (5 total executions) instead of the 3 executions intended by
> >> ICE_SQ_SEND_MAX_EXECUTE.
> >>
> >>> + break;
> >>> + ice_debug(hw, ICE_DBG_NVM,
> >>> + "NVM read EBUSY error, retry %d\n",
> >>> + retry_cnt + 1);
> >>> + last_cmd = false;
> >>> + ice_release_nvm(hw);
> >>> + msleep(ICE_SQ_SEND_DELAY_TIME_MS);
> >>
> > The review is correct that this allows one extra execution. However, I
> > intentionally kept this verbatim from the OOT driver (1.15.4) to
>
> This OOT change was made entirely by Grzegorz, for E825C. Would be great
> to acknowledge his authorship (+ SoB ofc, below).
> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
>
> What was the platform that you have encountered the bug on?
>
> > retain the significant test exposure it has there. The absolute worst
> > case is one harmless extra retry before giving up. If you strongly
> > prefer strict adherence to ICE_SQ_SEND_MAX_EXECUTE, I can update the
> > condition in a v4.
>
> Please update (to have one attempt less).
>
> >
> >> [Severity: High]
> >> Does dropping the NVM lock here violate the caller's assumption of mutual
> >> exclusion?
> >>
> >> Callers of ice_read_flat_nvm() typically acquire the NVM lock beforehand to
> >> ensure an atomic read of multiple sectors without interleaved writes. By
> >> explicitly dropping the lock during the msleep() window, another thread, VF,
> >> or PF could acquire the NVM lock and modify the NVM before the read resumes,
> >> potentially resulting in reading torn or inconsistent NVM data.
> >>
> >>> + status = ice_acquire_nvm(hw, ICE_RES_READ);
> >>> + if (status)
> >>> + break;
> >>
> >
> > I don't think this is an issue. The NVM resource (req_res 0x0008) is
> > an ownership lease, not a content snapshot. The FW grants it with a
> > hold timeout (see the timeout field handling in ice_aq_req_res). It
> > never guaranteed a point-in-time view of the flash.
> > Very similar behavior already exists in ice_devlink_nvm_snapshot(),
> > which reads the NVM in chunks via ice_read_flat_nvm() and deliberately
> > acquires/releases the lock between chunks (with a comment explaining
> > the FW reclaim).
> > Additionally, the lock is only dropped on the failure path, where
> > neither bytes_read nor offset have been advanced. The driver simply
> > re-reads the failed chunk after re-acquiring. So no data that is
> > returned to the caller is committed across the drop window.
> > Overall the lock drop does not weaken any guarantee callers actually
> > have. If we wanted to introduce a true snapshot guarantee, that would
> > probably require different approach and separate patch.
> >
> >
> >> [Severity: High]
> >> Can this lead to an unbalanced NVM resource lock release and potential data
> >> corruption?
> >>
> >> If ice_acquire_nvm() fails here, the loop breaks and returns an error with the
> >> lock currently unheld. However, callers like ice_read_flash_module() are
> >> unaware of this internal lock state change and will unconditionally call
> >> ice_release_nvm() upon failure:
> >>
> >> drivers/net/ethernet/intel/ice/ice_nvm.c:ice_read_flash_module() {
> >> status = ice_acquire_nvm(hw, ICE_RES_READ);
> >> if (status)
> >> return status;
> >>
> >> status = ice_read_flat_nvm(hw, start + offset, &length, data, false);
> >>
> >> ice_release_nvm(hw);
> >> ...
> >>
> >> Because firmware tracks resource locks per-PF rather than per-OS-thread, this
> >> errant second release could forcefully strip the lock from another thread on
> >> the same PF that successfully acquired it, exposing the NVM to concurrent
> >> modification.
> >>
> >
> > Agreed, this might be a real bug, and the one of the three I think is
> > worth investigating.
> > If ice_acquire_nvm() fails after the drop, ice_read_flat_nvm() returns
> > with the lock unheld while callers unconditionally call
> > ice_release_nvm(), so a stray release is issued.
> >
> > On probability, though, the window is very small. Reaching it requires
> > sustained EBUSY across the retry budget plus a failed re-acquire
> > (which itself polls up to ICE_NVM_TIMEOUT), and concurrently another
> > requester taking the lock. Most reads happen during init (ice_probe,
> > and reset/rebuild via ice_init_nvm), and NVM writes only happen on an
> > already initialized driver. The devlink/ethtool nvm_read paths are
> > also exposed, but hitting this race would require precise timing
> > against a concurrent NVM owner on the device.
> >
> > I'd prefer to keep the scope of this patch limited to the EBUSY retry
> > path and not take on the unbalanced-release fix here. A proper fix
> > should change the lock-ownership contract of ice_read_flat_nvm() (on
> > error, the lock must be released by ice_read_flat_nvm(), callers
> > release only on success) and update all callers. Code change should be
> > simple for all callers but ice_discover_flash_size(), it intentionally
> > holds one lease across a read loop and would need to re-acquire after
> > each expected boundary failure.
> >
> > Given how small the original window is, I'd rather not trade tested
> > OOT behavior for the risk of a complex unbalanced NVM lock fix. I
> > actually have a patch mostly ready that fixes the lock-ownership
> > contract, but I really don't like it. It changes the design of
> > ice_read_flat_nvm(), making it less intuitive for callers. More
> > importantly, I just don't have the resources or test coverage right
> > now to properly verify such change.
> >
> > However, I can modify the failure path for ice_acquire_nvm inside
> > ice_read_flat_nvm. Instead of bailing out immediately, we can just
> > retry it within the existing retry budget. In this case, the
> > probability of leaving ice_read_flat_nvm without holding the lock is
> > reduced even further without needing a refactor.
> >
> > Please let me know what you think about my thought process on this.
>
> I think that both AI-reported issues against the lock are valid
> concerns.
>
> I think that sleep was the actual fix, and re-locking were merely
> a necessity due to their expiration (as you said).
>
> A proper fix would be to just increase lock-timeout to accommodate all
> attempts (and still do the retries&sleep, but without unlocking).
>
>
> >
> >
> >
> >>> + retry_cnt++;
> >>> + } else {
> >>> + bytes_read += read_size;
> >>> + offset += read_size;
> >>> + retry_cnt = 0;
> >>> + }
> >>> } while (!last_cmd);
> >>>
> >>> *length = bytes_read;
> >
> > Thanks,
> > Robert
>
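The off-by-one discussed in the quoted review can be checked with a small counting model. The patch's exact break condition is not visible in this excerpt, so the sketch below is built only from the review's description (retry_cnt starts at 0 and is compared against a limit of 3); sketch_count_executions() is a hypothetical helper, and every attempt is assumed to fail with EBUSY.

```c
#include <stdbool.h>

/*
 * Count how many times the read is executed when every attempt fails.
 * 'strict' selects "retry_cnt > limit"; otherwise "retry_cnt >= limit".
 */
static int sketch_count_executions(int limit, bool strict)
{
	int retry_cnt = 0;
	int executions = 0;

	for (;;) {
		executions++;	/* one more failed attempt */
		if (strict ? retry_cnt > limit : retry_cnt >= limit)
			break;	/* retry budget exhausted */
		retry_cnt++;
	}
	return executions;
}
```

Under these assumptions a strict comparison against 3 yields five executions, a non-strict comparison against 3 yields four (the "one attempt less" variant), and three executions in total would need a non-strict comparison against 2.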
^ permalink raw reply
* Re: [PATCH net] net: libwx: fix VMDQ mask for 1-queue mode
From: Larysa Zaremba @ 2026-06-25 10:14 UTC (permalink / raw)
To: Jiawen Wu
Cc: netdev, 'Mengyuan Lou', 'Andrew Lunn',
'David S. Miller', 'Eric Dumazet',
'Jakub Kicinski', 'Paolo Abeni',
'Simon Horman', 'Kees Cook'
In-Reply-To: <062401dd0487$3a3152f0$ae93f8d0$@trustnetic.com>
On Thu, Jun 25, 2026 at 05:44:33PM +0800, Jiawen Wu wrote:
> On Thu, Jun 25, 2026 5:39 PM, Larysa Zaremba wrote:
> > On Thu, Jun 25, 2026 at 05:08:51PM +0800, Jiawen Wu wrote:
> > > In wx_set_vmdq_queues(), the VMDQ mask was not set for the devices not
> > > support WX_FLAG_MULTI_64_FUNC, i.e., NGBE devices. A mask of 0 causes
> > > __ALIGN_MASK(1, ~vmdq->mask) to return 0, which incorrectly sets
> > > q_per_pool to 0 in wx_write_qde().
> > >
> > > Fix the VMDQ 1-queue mask to 0x7F then ensures that __ALIGN_MASK(1,
> > > 0x7F) correctly evaluates to 1.
> >
> > __ALIGN_MASK(1, 0x7F) evaluates to 0x80 (128), not to 1. __ALIGN_MASK(1, 0x7E)
> > evaluates to 1. Maybe you need 0x7D for 2 queues and 0x7E for 1 queue?
>
> Sorry, the commit log is so wrong for that '~' is missing...
> I want to describe that __ALIGN_MASK(1, ~0x7F) evaluates to 1.
>
Then I do not have any further concerns. Given you fix the lack of "~" in the
commit message and change "not support" to "not supporting" above, I approve
this patch.
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
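The mask arithmetic in this exchange is easy to get wrong by eye, so here is a minimal userspace sketch of it. SKETCH_ALIGN_MASK uses the same expression as the kernel's __ALIGN_MASK(x, mask), and the mask values are the ones from the quoted wx_type.h hunk; the SKETCH_* names and sketch_q_per_pool() are hypothetical and only mirror the __ALIGN_MASK(1, ~vmdq->mask) expression mentioned in the commit message.

```c
/* Same expression as the kernel's __ALIGN_MASK(x, mask), renamed for userspace. */
#define SKETCH_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))

#define SKETCH_VMDQ_4Q_MASK 0x7Cu
#define SKETCH_VMDQ_2Q_MASK 0x7Eu
#define SKETCH_VMDQ_1Q_MASK 0x7Fu

/* Mirrors the q_per_pool computation described in the commit message. */
static unsigned int sketch_q_per_pool(unsigned int vmdq_mask)
{
	return SKETCH_ALIGN_MASK(1u, ~vmdq_mask);
}
```

With the complement applied, the three masks give 4, 2 and 1 queues per pool, while an unset mask of 0 gives 0, which is the failure the patch addresses. Without the complement, SKETCH_ALIGN_MASK(1u, 0x7Fu) is 0x80, as pointed out in the review.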
> >
> > >
> > > Fixes: c52d4b898901 ("net: libwx: Redesign flow when sriov is enabled")
> > > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > > ---
> > > drivers/net/ethernet/wangxun/libwx/wx_lib.c | 1 +
> > > drivers/net/ethernet/wangxun/libwx/wx_type.h | 1 +
> > > 2 files changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> > > index d042567b8128..814d88d2aee4 100644
> > > --- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> > > +++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> > > @@ -1802,6 +1802,7 @@ static bool wx_set_vmdq_queues(struct wx *wx)
> > > rss_i = 4;
> > > }
> > > } else {
> > > + vmdq_m = WX_VMDQ_1Q_MASK;
> > > /* double check we are limited to maximum pools */
> > > vmdq_i = min_t(u16, 8, vmdq_i);
> > >
> > > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
> > > index c7befe4cdfe9..65e3e55db1cf 100644
> > > --- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
> > > +++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
> > > @@ -486,6 +486,7 @@ enum WX_MSCA_CMD_value {
> > >
> > > #define WX_VMDQ_4Q_MASK 0x7C
> > > #define WX_VMDQ_2Q_MASK 0x7E
> > > +#define WX_VMDQ_1Q_MASK 0x7F
> > >
> > > /****************** Manageablility Host Interface defines ********************/
> > > #define WX_HI_MAX_BLOCK_BYTE_LENGTH 256 /* Num of bytes in range */
> > > --
> > > 2.51.0
> > >
> >
>
^ permalink raw reply
* Re: [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: Askar Safin @ 2026-06-25 10:11 UTC (permalink / raw)
To: david
Cc: akpm, avagin, axboe, brauner, collin.funk1, david.laight.linux,
dhowells, fuse-devel, hch, jack, joannelkoong, kernel, linux-api,
linux-fsdevel, linux-kernel, linux-mm, luto, metze, miklos,
netdev, patches, pfalcato, safinaskar, torvalds, val, viro, w,
willy
In-Reply-To: <89ea76b3-e956-4232-8180-ee3929adf905@kernel.org>
"David Hildenbrand (Arm)" <david@kernel.org>:
> I think we concluded that we cannot rip out vmsplice that way at this point, and
> I suspect that Christian will drop that topic branch from -next after -rc1.
I think my patches still have a chance.
On the fuse regression: I return EINVAL for the particular combination of
flags used by fuse. This causes fuse to fall back to its non-vmsplice
code path. I ran a Debian code search and found no significant
packages that use the same combination of options.
So I think I was able to deal with the fuse regression.
On CRIU named fifo "Not supported" regression: it is handled.
On CRIU major performance regression: it is NOT handled. But I still
think my approach is right. (See cover letter for details.)
(I wrote about all these in cover letter for this v2 patchset.)
So all regressions found so far (except for CRIU major performance
regression) are handled.
Another option is to introduce a deprecation period (as
suggested by Andrei Vagin). I can do this if needed.
--
Askar Safin
^ permalink raw reply
* Re: [PATCH v2] net: mdio: airoha: fix reset control leak in error path
From: Larysa Zaremba @ 2026-06-25 10:10 UTC (permalink / raw)
To: Wentao Liang
Cc: Andrew Lunn, Heiner Kallweit, Russell King, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260622115403.39772-1-vulab@iscas.ac.cn>
Please, specify the target tree (probably, net) via a subject prefix.
Also, it looks like you are missing too many CCs [0]
[0]
https://netdev-ctrl.bots.linux.dev/logs/build/1114710/14639274/cc_maintainers/summary
On Mon, Jun 22, 2026 at 07:54:03PM +0800, Wentao Liang wrote:
> In airoha_mdio_probe(), after calling reset_control_deassert(),
> if clk_set_rate() fails, the function returns immediately without
> calling reset_control_assert(). This leaves the reset line
> deasserted and causes a reference count leak on shared reset
> controllers.
>
Sashiko correctly points out that since the reset controller is exclusive,
there is no refcount leak. [1]
So the problem is missing rstc->rcdev->ops->assert(rstc->rcdev, rstc->id) call.
It would be great if you could describe what problem this can cause.
> Fix this by reorganizing the error handling to use a goto label,
> ensuring reset_control_assert() is called on all error paths
> before returning.
>
> Also add error checking for reset_control_deassert().
Sashiko correctly points out you do not actually do this, which is fine, just
update the commit message. [1]
[1] https://sashiko.dev/#/patchset/20260622115403.39772-1-vulab%40iscas.ac.cn
> Fixes: 67e3ba978361 ("net: mdio: Add MDIO bus controller for Airoha AN7583")
> Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
> ---
> drivers/net/mdio/mdio-airoha.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/mdio/mdio-airoha.c b/drivers/net/mdio/mdio-airoha.c
> index 52e7475121ea..4c1b2415687c 100644
> --- a/drivers/net/mdio/mdio-airoha.c
> +++ b/drivers/net/mdio/mdio-airoha.c
> @@ -246,15 +246,17 @@ static int airoha_mdio_probe(struct platform_device *pdev)
>
> ret = clk_set_rate(priv->clk, freq);
> if (ret)
> - return ret;
> + goto err_reset_assert;
>
> ret = devm_of_mdiobus_register(dev, bus, dev->of_node);
> - if (ret) {
> - reset_control_assert(priv->reset);
> - return ret;
> - }
> + if (ret)
> + goto err_reset_assert;
>
> return 0;
> +
> +err_reset_assert:
> + reset_control_assert(priv->reset);
> + return ret;
> }
>
> static const struct of_device_id airoha_mdio_dt_ids[] = {
> --
> 2.39.5 (Apple Git-154)
>
>
^ permalink raw reply
* Re: [PATCH net] nfc: nci: fix uninit-value in the RF discover/activated NTF handlers
From: Sam P @ 2026-06-25 10:04 UTC (permalink / raw)
To: Simon Horman
Cc: david, davem, edumazet, kuba, pabeni, oe-linux-nfc, netdev,
linux-kernel, stable
In-Reply-To: <20260625091828.1170615-2-horms@kernel.org>
On 25/06/2026 10:18, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> nfc: nci: fix uninit-value in the RF discover/activated NTF handlers
>
> This zero-initialises the on-stack nci_rf_discover_ntf and
> nci_rf_intf_activated_ntf structs in the RF discover and interface
> activated NTF handlers so that the technology- and activation-specific
> unions read back as zero when the corresponding length fields are zero.
>
> Link: https://lore.kernel.org/netdev/20260623172109.1105965-2-horms@kernel.org/
>
>> - activated: with rf_tech_specific_params_len == 0 the same union is read
>> via nci_target_auto_activated(); with activation_params_len == 0 the
>> activation_params union is read by nci_store_ats_nfc_iso_dep() into
>> ndev->target_ats.
>
> [Low]
> Does the third scenario in this paragraph match the code?
>
> Looking at nci_store_ats_nfc_iso_dep():
>
> static int nci_store_ats_nfc_iso_dep(struct nci_dev *ndev,
> const struct nci_rf_intf_activated_ntf *ntf)
> {
> ndev->target_ats_len = 0;
>
> if (ntf->activation_params_len <= 0)
> return NCI_STATUS_OK;
>
> if (ntf->activation_params.nfca_poll_iso_dep.rats_res_len > NFC_ATS_MAXSIZE) {
> ...
>
> The function returns NCI_STATUS_OK before any access to the
> activation_params union when activation_params_len is 0, so the union
> is not read in that case.
>
> The sibling nci_store_general_bytes_nfc_dep() has the same early return
> on activation_params_len <= 0.
>
> The two rf_tech_specific_params_len == 0 cases listed earlier in the
> commit message do match the code (nci_add_new_protocol() and
> nci_target_auto_activated() unconditionally pass
> &ntf.rf_tech_specific_params even when rf_tech_specific_params_len is
> 0), and the KMSAN trace quoted in the commit message only mentions the
> nci_add_new_protocol() path.
>
> The ntf = {} change still looks correct and necessary for the
> rf_tech_specific_params union. Could the third bullet be dropped or
> adjusted so the commit message reflects only the paths that actually
> read uninitialised bytes?
Thanks for sharing the review.
This is valid feedback, I'll fix up the commit message in a v2.
^ permalink raw reply
* Re: [PATCH net v2] tipc: fix out-of-bounds read in broadcast Gap ACK blocks
From: Sam P @ 2026-06-25 10:00 UTC (permalink / raw)
To: Tung Quang Nguyen
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev@vger.kernel.org,
tipc-discussion@lists.sourceforge.net,
linux-kernel@vger.kernel.org, Jon Maloy
In-Reply-To: <GV1P189MB19881AD711511094760906DCC6EC2@GV1P189MB1988.EURP189.PROD.OUTLOOK.COM>
On 25/06/2026 10:23, Tung Quang Nguyen wrote:
>> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index
>> 76a1585d3f6b..08637c3c9db0 100644
>> --- a/net/tipc/bcast.c
>> +++ b/net/tipc/bcast.c
>> @@ -497,11 +497,12 @@ void tipc_bcast_ack_rcv(struct net *net, struct
>> tipc_link *l,
>> */
>> int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
>> struct tipc_msg *hdr,
>> - struct sk_buff_head *retrq)
>> + struct sk_buff_head *retrq, bool *valid)
>> {
>> struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
>> struct tipc_gap_ack_blks *ga;
>> struct sk_buff_head xmitq;
>> + u16 glen;
>
> Move this variable declaration to the bottom to follow reverse xmas tree style.
>
>> int rc = 0;
>>
>> __skb_queue_head_init(&xmitq);
>> @@ -510,13 +511,18 @@ int tipc_bcast_sync_rcv(struct net *net, struct
>> tipc_link *l,
>> if (msg_type(hdr) != STATE_MSG) {
>> tipc_link_bc_init_rcv(l, hdr);
>> } else if (!msg_bc_ack_invalid(hdr)) {
>> - tipc_get_gap_ack_blks(&ga, l, hdr, false);
>> - if (!sysctl_tipc_bc_retruni)
>> - retrq = &xmitq;
>> - rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
>> - msg_bc_gap(hdr), ga, &xmitq,
>> - retrq);
>> - rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq);
>> + glen = tipc_get_gap_ack_blks(&ga, l, hdr, false);
>> + if (glen > msg_data_sz(hdr)) {
>> + /* Malformed Gap ACK blocks; caller drops the msg */
>> + *valid = false;
>> + } else {
>> + if (!sysctl_tipc_bc_retruni)
>> + retrq = &xmitq;
>> + rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
>> + msg_bc_gap(hdr), ga, &xmitq,
>> + retrq);
>> + rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq);
>> + }
>> }
>> tipc_bcast_unlock(net);
>>
>> diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h index
>> 2d9352dc7b0e..55d17b5413e1 100644
>> --- a/net/tipc/bcast.h
>> +++ b/net/tipc/bcast.h
>> @@ -97,7 +97,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
>> struct tipc_msg *hdr);
>> int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
>> struct tipc_msg *hdr,
>> - struct sk_buff_head *retrq);
>> + struct sk_buff_head *retrq, bool *valid);
>> int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg,
>> struct tipc_link *bcl);
>> int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]); diff --git
>> a/net/tipc/node.c b/net/tipc/node.c index 97aa970a0d83..2887f94ee28f
>> 100644
>> --- a/net/tipc/node.c
>> +++ b/net/tipc/node.c
>> @@ -1831,12 +1831,13 @@ static void tipc_node_mcast_rcv(struct tipc_node
>> *n) }
>>
>> static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr,
>> - int bearer_id, struct sk_buff_head *xmitq)
>> + int bearer_id, struct sk_buff_head *xmitq,
>> + bool *valid)
>> {
>> struct tipc_link *ucl;
>> int rc;
>>
>> - rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq);
>> + rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq, valid);
>
> 'valid' needs to be checked after this call. Then, return immediately if it is false.
>
>>
>> if (rc & TIPC_LINK_DOWN_EVT) {
>> tipc_node_reset_links(n);
>> @@ -2140,12 +2141,18 @@ void tipc_rcv(struct net *net, struct sk_buff *skb,
>> struct tipc_bearer *b)
>>
>> /* Ensure broadcast reception is in synch with peer's send state */
>> if (unlikely(usr == LINK_PROTOCOL)) {
>> + bool valid = true;
>> +
>> if (unlikely(skb_linearize(skb))) {
>> tipc_node_put(n);
>> goto discard;
>> }
>> hdr = buf_msg(skb);
>> - tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq);
>> + tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq, &valid);
>> + if (!valid) {
>> + tipc_node_put(n);
>> + goto discard;
>> + }
>> } else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) {
>> tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr);
>> }
>>
>> base-commit: a986fde914d88af47eb78fd29c5d1af7952c3500
>> --
>> 2.54.0
>
Thanks for the review, I'll address this in a v3.
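The two points in the review, rejecting a Gap ACK length larger than the message data and returning immediately once the message is flagged invalid, can be sketched in userspace as follows. This is a simplified model, not the TIPC code: struct sketch_msg and both sketch_* functions are hypothetical, and the node-level function returns an int here only so the behaviour can be observed (in the patch it is void).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a received protocol message. */
struct sketch_msg {
	size_t data_sz;		/* bytes actually present in the data area */
	uint16_t claimed_len;	/* length claimed by the Gap ACK header */
};

/* Inner handler: flag the message as invalid instead of reading past the end. */
static int sketch_sync_rcv(const struct sketch_msg *m, bool *valid)
{
	if (m->claimed_len > m->data_sz) {
		*valid = false;	/* malformed; caller is expected to drop it */
		return 0;
	}
	return 1;		/* pretend processing produced an event */
}

/* Outer handler: check 'valid' right after the call and return at once. */
static int sketch_node_rcv(const struct sketch_msg *m, bool *valid)
{
	int rc = sketch_sync_rcv(m, valid);

	if (!*valid)
		return -1;	/* skip all further event handling */
	return rc;
}
```

The early return keeps any later processing from acting on a result that came from a rejected message.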
^ permalink raw reply
* [PATCH 0/2] net/sched: finish the qdisc_dequeue_peeked conversion (taprio, multiq)
From: Bryam Vargas via B4 Relay @ 2026-06-25 9:51 UTC (permalink / raw)
To: Vinicius Costa Gomes, Paolo Abeni, Jamal Hadi Salim, Jiri Pirko,
Jakub Kicinski, David S. Miller, Eric Dumazet
Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
linux-kernel
Commit 77be155cba4e added peek emulation: a non-work-conserving qdisc's
->peek dequeues one skb and stashes it in the child's gso_skb. A parent
that peeks such a child must then take the packet with
qdisc_dequeue_peeked(), not a direct ->dequeue(), or the stashed skb is
bypassed and the child's qlen/backlog desync. sch_red and sch_sfb were
just fixed for this; taprio and multiq still take the direct path.
With a qfq child the desync re-enters qfq_dequeue on an emptied aggregate
list and dereferences NULL, panicking from softirq on ordinary egress.
taprio reaches it on its own (root-only software path, all gates open);
multiq reaches it when a peeking parent such as tbf wraps it over a
non-work-conserving grandchild. Both need only CAP_NET_ADMIN.
Confirmed under KASAN: the unpatched arm panics, the patched arm is
clean, and a work-conserving-child control is clean. The reproducers and
splats for both are below; the per-patch changes are one line each.
taprio reproducer (self-triggering, no parent qdisc needed):
ip link add dummy0 numtxqueues 4 type dummy; ip link set dummy0 up
ip addr add 10.10.11.10/24 dev dummy0
tc qdisc add dev dummy0 root handle 1: taprio num_tc 2 \
map 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 1@1 \
base-time 9000000000000000000 sched-entry S 03 200000 flags 0x0 clockid CLOCK_TAI
tc qdisc replace dev dummy0 parent 1:1 handle 3: qfq
tc class add dev dummy0 classid 3:1 parent 3: qfq maxpkt 512 weight 1
tc filter add dev dummy0 parent 3: protocol ip prio 1 matchall classid 3:1
ping -c1 10.10.11.99 -I dummy0
[ 903.769174] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI
[ 903.769953] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 903.770456] CPU: 7 UID: 0 PID: 16162 Comm: ping Not tainted 7.1.0-rc5 #1 PREEMPT(lazy)
[ 903.771725] RIP: 0010:qfq_dequeue+0x362/0x1580 [sch_qfq]
[ 903.777452] Call Trace:
[ 903.778311] taprio_dequeue_from_txq+0x383/0x680 [sch_taprio]
[ 903.778685] taprio_dequeue_tc_priority+0x19a/0x330 [sch_taprio]
[ 903.779645] taprio_dequeue+0xa6/0x330 [sch_taprio]
[ 903.780299] __qdisc_run+0x16c/0x1890
[ 903.780854] __dev_queue_xmit+0x1ece/0x3390
[ 903.784109] ip_finish_output2+0x571/0x1da0
[ 903.785996] ip_output+0x26c/0x4d0
[ 903.789572] ping_v4_sendmsg+0xd22/0x12b0
[ 903.796118] __x64_sys_sendto+0xe0/0x1c0
[ 903.796612] do_syscall_64+0xee/0x590
[ 903.818669] Kernel panic - not syncing: Fatal exception in interrupt
multiq reproducer (needs a peeking parent over a stashing child; tbf
values chosen to force it to throttle):
ip link add dummy0 numtxqueues 2 type dummy; ip link set dummy0 up
ip addr add 10.10.11.10/24 dev dummy0
tc qdisc add dev dummy0 root handle 1: tbf rate 88bit burst 1661b \
peakrate 2257333 minburst 1024 limit 7b
tc qdisc add dev dummy0 parent 1: handle 2: multiq
for b in 1 2; do # qfq on every band
tc qdisc add dev dummy0 parent 2:$b handle 3$b: qfq
tc class add dev dummy0 classid 3$b:1 parent 3$b: qfq maxpkt 512 weight 1
tc filter add dev dummy0 parent 3$b: protocol ip prio 1 matchall classid 3$b:1
done
ping -c12 10.10.11.99 -I dummy0
[ 1066.385097] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI
[ 1066.386385] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 1066.387227] CPU: 1 UID: 0 PID: 5357 Comm: ping Not tainted 7.1.0-rc5 #1 PREEMPT(lazy)
[ 1066.389183] RIP: 0010:qfq_dequeue+0x362/0x1580 [sch_qfq]
[ 1066.396316] Call Trace:
[ 1066.396768] multiq_dequeue+0x163/0x360 [sch_multiq]
[ 1066.397885] tbf_dequeue+0x6b9/0xf17 [sch_tbf]
[ 1066.398269] __qdisc_run+0x16c/0x1890
[ 1066.399315] __dev_queue_xmit+0x1ece/0x3390
[ 1066.403276] ip_finish_output2+0x571/0x1da0
[ 1066.404818] ip_output+0x26c/0x4d0
[ 1066.408620] ping_v4_sendmsg+0xd22/0x12b0
[ 1066.415264] __x64_sys_sendto+0xe0/0x1c0
[ 1066.416251] do_syscall_64+0xee/0x590
[ 1066.441210] Kernel panic - not syncing: Fatal exception in interrupt
---
Bryam Vargas (2):
net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked
net/sched/sch_multiq.c | 2 +-
net/sched/sch_taprio.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
---
base-commit: 02f144fbb4c86c360495d33debe307cb46a57f95
change-id: 20260625-b4-disp-31bcb279-082e59a3aa36
Best regards,
--
Bryam Vargas <hexlabsecurity@proton.me>
^ permalink raw reply
* [PATCH 1/2] net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
From: Bryam Vargas via B4 Relay @ 2026-06-25 9:51 UTC (permalink / raw)
To: Vinicius Costa Gomes, Paolo Abeni, Jamal Hadi Salim, Jiri Pirko,
Jakub Kicinski, David S. Miller, Eric Dumazet
Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
linux-kernel
In-Reply-To: <20260625-b4-disp-31bcb279-v1-0-85c40b83c529@proton.me>
From: Bryam Vargas <hexlabsecurity@proton.me>
When taprio's software path peeks a non-work-conserving child qdisc, the
child stashes the peeked skb in its gso_skb; taprio_dequeue_from_txq()
then takes the packet with a direct child ->dequeue() call, which ignores
that stash, orphans the peeked skb and desyncs the child's qlen/backlog.
With a qfq child this re-enters the child on an emptied list and
dereferences NULL, panicking the kernel from softirq on ordinary egress.
Take the packet through qdisc_dequeue_peeked(), as sch_red and sch_sfb
now do. The helper returns the child's stashed skb first and is a no-op
when there is none, so a work-conserving child is unaffected and the
gated path now consumes the skb whose length was charged to the budget.
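
As an illustration only, the stash behaviour described above can be modelled
in userspace. This is a toy sketch, not kernel code: struct toy_qdisc,
toy_peek(), toy_raw_dequeue() and toy_dequeue_peeked() are hypothetical
stand-ins for the child qdisc, its ->peek(), its direct ->dequeue() and
qdisc_dequeue_peeked(), and the single stash pointer stands in for gso_skb.

```c
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's struct sk_buff / struct Qdisc. */
struct toy_skb { int id; };

struct toy_qdisc {
	struct toy_skb *queue[8];   /* packets still held by the scheduler */
	size_t head, tail;
	struct toy_skb *stash;      /* models the child's gso_skb peek stash */
	unsigned int qlen;          /* packets accounted to this qdisc */
};

static void toy_enqueue(struct toy_qdisc *q, struct toy_skb *skb)
{
	q->queue[q->tail++] = skb;  /* toy: caller enqueues at most 8 packets */
	q->qlen++;
}

/* Direct dequeue: pulls from the scheduler and knows nothing of the stash. */
static struct toy_skb *toy_raw_dequeue(struct toy_qdisc *q)
{
	if (q->head == q->tail)
		return NULL;
	q->qlen--;
	return q->queue[q->head++];
}

/* Peek emulation: dequeue once, park the packet, keep it accounted. */
static struct toy_skb *toy_peek(struct toy_qdisc *q)
{
	if (!q->stash) {
		q->stash = toy_raw_dequeue(q);
		if (q->stash)
			q->qlen++;  /* the stashed packet still counts as queued */
	}
	return q->stash;
}

/* Stash-aware dequeue: hand back the peeked packet first, else fall through. */
static struct toy_skb *toy_dequeue_peeked(struct toy_qdisc *q)
{
	struct toy_skb *skb = q->stash;

	if (skb) {
		q->stash = NULL;
		q->qlen--;
		return skb;
	}
	return toy_raw_dequeue(q);
}
```

In this model, a toy_peek() followed by toy_raw_dequeue() hands back a
different packet than the one peeked and leaves the peeked one parked in the
stash with the scheduler's own queue empty, while toy_dequeue_peeked()
returns the peeked packet and behaves like a plain dequeue when nothing is
stashed.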
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Cc: stable@vger.kernel.org
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
net/sched/sch_taprio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 558987d9b977..299234a5f0fe 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -749,7 +749,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
return NULL;
skip_peek_checks:
- skb = child->ops->dequeue(child);
+ skb = qdisc_dequeue_peeked(child);
if (unlikely(!skb))
return NULL;
--
2.43.0
^ permalink raw reply related
* [PATCH 2/2] net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked
From: Bryam Vargas via B4 Relay @ 2026-06-25 9:51 UTC (permalink / raw)
To: Vinicius Costa Gomes, Paolo Abeni, Jamal Hadi Salim, Jiri Pirko,
Jakub Kicinski, David S. Miller, Eric Dumazet
Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
linux-kernel
In-Reply-To: <20260625-b4-disp-31bcb279-v1-0-85c40b83c529@proton.me>
From: Bryam Vargas <hexlabsecurity@proton.me>
multiq_dequeue() takes a packet from a band's child with a direct
->dequeue() call after multiq_peek() peeked it. When the child is
non-work-conserving the peek stashes the skb in the child's gso_skb, so
the direct dequeue returns a different skb and orphans the stash,
desyncing the child's qlen/backlog. With a qfq child reached through a
peeking parent (e.g. tbf) this re-enters the child on an emptied list and
dereferences NULL, panicking the kernel from softirq on ordinary egress.
Take the packet through qdisc_dequeue_peeked(), as sch_prio already does
and as sch_red and sch_sfb were just fixed to do. The helper is a no-op
when the child has no stash, so a work-conserving child is unaffected.
Fixes: 77be155cba4e ("pkt_sched: Add peek emulation for non-work-conserving qdiscs.")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
net/sched/sch_multiq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 4e465d11e3d7..a467dd122369 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -103,7 +103,7 @@ static struct sk_buff *multiq_dequeue(struct Qdisc *sch)
if (!netif_xmit_stopped(
netdev_get_tx_queue(qdisc_dev(sch), q->curband))) {
qdisc = q->queues[q->curband];
- skb = qdisc->dequeue(qdisc);
+ skb = qdisc_dequeue_peeked(qdisc);
if (skb) {
qdisc_bstats_update(sch, skb);
qdisc_qlen_dec(sch);
--
2.43.0
^ permalink raw reply related
* RE: [PATCH net] net: libwx: fix VMDQ mask for 1-queue mode
From: Jiawen Wu @ 2026-06-25 9:44 UTC (permalink / raw)
To: 'Larysa Zaremba'
Cc: netdev, 'Mengyuan Lou', 'Andrew Lunn',
'David S. Miller', 'Eric Dumazet',
'Jakub Kicinski', 'Paolo Abeni',
'Simon Horman', 'Kees Cook'
In-Reply-To: <ajz3QK96wKoLD4n4@soc-5CG4396X81.clients.intel.com>
On Thu, Jun 25, 2026 5:39 PM, Larysa Zaremba wrote:
> On Thu, Jun 25, 2026 at 05:08:51PM +0800, Jiawen Wu wrote:
> > In wx_set_vmdq_queues(), the VMDQ mask was not set for devices that do
> > not support WX_FLAG_MULTI_64_FUNC, i.e., NGBE devices. A mask of 0 causes
> > __ALIGN_MASK(1, ~vmdq->mask) to return 0, which incorrectly sets
> > q_per_pool to 0 in wx_write_qde().
> >
> > Set the VMDQ 1-queue mask to 0x7F, which ensures that __ALIGN_MASK(1,
> > 0x7F) correctly evaluates to 1.
>
> __ALIGN_MASK(1, 0x7F) evaluates to 0x80 (128), not to 1. __ALIGN_MASK(1, 0x7E)
> evaluates to 1. Maybe you need 0x7D for 2 queues and 0x7E for 1 queue?
Sorry, the commit log is wrong because the '~' is missing.
I meant to say that __ALIGN_MASK(1, ~0x7F) evaluates to 1.
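
For reference, the arithmetic discussed here can be checked in userspace.
This is only a sketch: TOY_ALIGN_MASK() restates the (x + mask) & ~mask shape
of __ALIGN_MASK() with a fixed 32-bit type, and toy_q_per_pool() is a
hypothetical helper that mirrors the __ALIGN_MASK(1, ~vmdq->mask) expression
quoted from wx_write_qde(); neither is the driver's code.

```c
#include <stdint.h>

/* Userspace restatement of the (x + mask) & ~mask rounding, 32-bit only. */
#define TOY_ALIGN_MASK(x, mask) \
	((((uint32_t)(x)) + ((uint32_t)(mask))) & ~((uint32_t)(mask)))

/* Hypothetical helper mirroring __ALIGN_MASK(1, ~vmdq->mask). */
static uint32_t toy_q_per_pool(uint32_t vmdq_mask)
{
	return TOY_ALIGN_MASK(1, ~vmdq_mask);
}
```

Under these assumptions toy_q_per_pool() gives 0 for a mask of 0, 1 for 0x7F,
2 for 0x7E and 4 for 0x7C, while TOY_ALIGN_MASK(1, 0x7F) without the '~' is
0x80.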
>
> >
> > Fixes: c52d4b898901 ("net: libwx: Redesign flow when sriov is enabled")
> > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > ---
> > drivers/net/ethernet/wangxun/libwx/wx_lib.c | 1 +
> > drivers/net/ethernet/wangxun/libwx/wx_type.h | 1 +
> > 2 files changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> > index d042567b8128..814d88d2aee4 100644
> > --- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> > +++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> > @@ -1802,6 +1802,7 @@ static bool wx_set_vmdq_queues(struct wx *wx)
> > rss_i = 4;
> > }
> > } else {
> > + vmdq_m = WX_VMDQ_1Q_MASK;
> > /* double check we are limited to maximum pools */
> > vmdq_i = min_t(u16, 8, vmdq_i);
> >
> > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
> > index c7befe4cdfe9..65e3e55db1cf 100644
> > --- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
> > +++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
> > @@ -486,6 +486,7 @@ enum WX_MSCA_CMD_value {
> >
> > #define WX_VMDQ_4Q_MASK 0x7C
> > #define WX_VMDQ_2Q_MASK 0x7E
> > +#define WX_VMDQ_1Q_MASK 0x7F
> >
> > /****************** Manageablility Host Interface defines ********************/
> > #define WX_HI_MAX_BLOCK_BYTE_LENGTH 256 /* Num of bytes in range */
> > --
> > 2.51.0
> >
>
^ permalink raw reply
* [PATCH net] net: airoha: dma map xmit frags with skb_frag_dma_map()
From: Lorenzo Bianconi @ 2026-06-25 9:42 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: linux-arm-kernel, linux-mediatek, netdev, Lorenzo Bianconi
Map xmit skb fragments using skb_frag_dma_map() instead of
dma_map_single(skb_frag_address()). skb_frag_address() relies on
page_address() to obtain a kernel virtual address, which is not
guaranteed to work for all page types (e.g. highmem pages or
user-pinned pages from MSG_ZEROCOPY).
skb_frag_dma_map() maps the fragment directly via its struct page and
offset through dma_map_page(), avoiding the need for a kernel virtual
address entirely.
Introduce an enum airoha_dma_map_type to track how each queue entry was
mapped (single vs page), so that the matching unmap function is called
on completion and in error paths.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/airoha/airoha_eth.c | 61 ++++++++++++++++++++------------
drivers/net/ethernet/airoha/airoha_eth.h | 7 ++++
2 files changed, 45 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 932b3a3df2e5..1caf6766f2c0 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -944,6 +944,25 @@ static void airoha_qdma_wake_netdev_txqs(struct airoha_queue *q)
q->txq_stopped = false;
}
+static void airoha_unmap_xmit_buf(struct airoha_eth *eth,
+ struct airoha_queue_entry *e)
+{
+ switch (e->dma_type) {
+ case AIROHA_DMA_MAP_PAGE:
+ dma_unmap_page(eth->dev, e->dma_addr, e->dma_len,
+ DMA_TO_DEVICE);
+ break;
+ case AIROHA_DMA_MAP_SINGLE:
+ dma_unmap_single(eth->dev, e->dma_addr, e->dma_len,
+ DMA_TO_DEVICE);
+ break;
+ case AIROHA_DMA_UNMAPPED:
+ default:
+ break;
+ }
+ e->dma_type = AIROHA_DMA_UNMAPPED;
+}
+
static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
{
struct airoha_tx_irq_queue *irq_q;
@@ -1006,9 +1025,7 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
skb = e->skb;
e->skb = NULL;
- dma_unmap_single(eth->dev, e->dma_addr, e->dma_len,
- DMA_TO_DEVICE);
- e->dma_addr = 0;
+ airoha_unmap_xmit_buf(eth, e);
list_add_tail(&e->list, &q->tx_list);
WRITE_ONCE(desc->msg0, 0);
@@ -1177,12 +1194,10 @@ static void airoha_qdma_tx_cleanup(struct airoha_qdma *qdma)
struct airoha_qdma_desc *desc = &q->desc[j];
struct sk_buff *skb = e->skb;
- if (!e->dma_addr)
+ if (e->dma_type == AIROHA_DMA_UNMAPPED)
continue;
- dma_unmap_single(qdma->eth->dev, e->dma_addr,
- e->dma_len, DMA_TO_DEVICE);
- e->dma_addr = 0;
+ airoha_unmap_xmit_buf(qdma->eth, e);
list_add_tail(&e->list, &q->tx_list);
WRITE_ONCE(desc->ctrl, 0);
@@ -2193,8 +2208,8 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
struct netdev_queue *txq;
struct airoha_queue *q;
LIST_HEAD(tx_list);
+ dma_addr_t addr;
int i = 0, qid;
- void *data;
u16 index;
u8 fport;
@@ -2250,24 +2265,22 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
return NETDEV_TX_BUSY;
}
- len = skb_headlen(skb);
- data = skb->data;
-
e = list_first_entry(&q->tx_list, struct airoha_queue_entry,
list);
+ len = skb_headlen(skb);
+ addr = dma_map_single(netdev->dev.parent, skb->data, len,
+ DMA_TO_DEVICE);
+ if (unlikely(dma_mapping_error(netdev->dev.parent, addr)))
+ goto error_unlock;
+
+ e->dma_type = AIROHA_DMA_MAP_SINGLE;
index = e - q->entry;
while (true) {
struct airoha_qdma_desc *desc = &q->desc[index];
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
- dma_addr_t addr;
u32 val;
- addr = dma_map_single(netdev->dev.parent, data, len,
- DMA_TO_DEVICE);
- if (unlikely(dma_mapping_error(netdev->dev.parent, addr)))
- goto error_unmap;
-
list_move_tail(&e->list, &tx_list);
e->skb = i == nr_frags - 1 ? skb : NULL;
e->dma_addr = addr;
@@ -2291,8 +2304,13 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
if (++i == nr_frags)
break;
- data = skb_frag_address(frag);
len = skb_frag_size(frag);
+ addr = skb_frag_dma_map(netdev->dev.parent, frag, 0, len,
+ DMA_TO_DEVICE);
+ if (unlikely(dma_mapping_error(netdev->dev.parent, addr)))
+ goto error_unmap;
+
+ e->dma_type = AIROHA_DMA_MAP_PAGE;
}
q->queued += i;
@@ -2313,11 +2331,8 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
return NETDEV_TX_OK;
error_unmap:
- list_for_each_entry(e, &tx_list, list) {
- dma_unmap_single(netdev->dev.parent, e->dma_addr, e->dma_len,
- DMA_TO_DEVICE);
- e->dma_addr = 0;
- }
+ list_for_each_entry(e, &tx_list, list)
+ airoha_unmap_xmit_buf(dev->eth, e);
list_splice(&tx_list, &q->tx_list);
error_unlock:
spin_unlock_bh(&q->lock);
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index d7ff8c5200e2..2765244d937c 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -170,12 +170,19 @@ enum trtcm_param {
#define TRTCM_TOKEN_RATE_MASK GENMASK(23, 6)
#define TRTCM_TOKEN_RATE_FRACTION_MASK GENMASK(5, 0)
+enum airoha_dma_map_type {
+ AIROHA_DMA_UNMAPPED,
+ AIROHA_DMA_MAP_SINGLE,
+ AIROHA_DMA_MAP_PAGE,
+};
+
struct airoha_queue_entry {
union {
void *buf;
struct {
struct list_head list;
struct sk_buff *skb;
+ enum airoha_dma_map_type dma_type;
};
};
dma_addr_t dma_addr;
---
base-commit: 232c4ca2343d1181cbfc061f9856d9591e397579
change-id: 20260625-airoha-eth-skb_frag_dma_map-bcccd5d6e4b1
Best regards,
--
Lorenzo Bianconi <lorenzo@kernel.org>
^ permalink raw reply related
* Re: [PATCH net] net: libwx: fix VMDQ mask for 1-queue mode
From: Larysa Zaremba @ 2026-06-25 9:39 UTC (permalink / raw)
To: Jiawen Wu
Cc: netdev, Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Kees Cook
In-Reply-To: <60D88ADD3E295420+20260625090851.539640-1-jiawenwu@trustnetic.com>
On Thu, Jun 25, 2026 at 05:08:51PM +0800, Jiawen Wu wrote:
> In wx_set_vmdq_queues(), the VMDQ mask was not set for devices that do
> not support WX_FLAG_MULTI_64_FUNC, i.e., NGBE devices. A mask of 0 causes
> __ALIGN_MASK(1, ~vmdq->mask) to return 0, which incorrectly sets
> q_per_pool to 0 in wx_write_qde().
>
> Set the VMDQ 1-queue mask to 0x7F, which ensures that __ALIGN_MASK(1,
> 0x7F) correctly evaluates to 1.
__ALIGN_MASK(1, 0x7F) evaluates to 0x80 (128), not to 1. __ALIGN_MASK(1, 0x7E)
evaluates to 1. Maybe you need 0x7D for 2 queues and 0x7E for 1 queue?
>
> Fixes: c52d4b898901 ("net: libwx: Redesign flow when sriov is enabled")
> Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> ---
> drivers/net/ethernet/wangxun/libwx/wx_lib.c | 1 +
> drivers/net/ethernet/wangxun/libwx/wx_type.h | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> index d042567b8128..814d88d2aee4 100644
> --- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
> @@ -1802,6 +1802,7 @@ static bool wx_set_vmdq_queues(struct wx *wx)
> rss_i = 4;
> }
> } else {
> + vmdq_m = WX_VMDQ_1Q_MASK;
> /* double check we are limited to maximum pools */
> vmdq_i = min_t(u16, 8, vmdq_i);
>
> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
> index c7befe4cdfe9..65e3e55db1cf 100644
> --- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
> @@ -486,6 +486,7 @@ enum WX_MSCA_CMD_value {
>
> #define WX_VMDQ_4Q_MASK 0x7C
> #define WX_VMDQ_2Q_MASK 0x7E
> +#define WX_VMDQ_1Q_MASK 0x7F
>
> /****************** Manageablility Host Interface defines ********************/
> #define WX_HI_MAX_BLOCK_BYTE_LENGTH 256 /* Num of bytes in range */
> --
> 2.51.0
>
^ permalink raw reply
* Re: [PATCH net] net: gianfar: use of_irq_get()
From: Simon Horman @ 2026-06-25 9:36 UTC (permalink / raw)
To: rosenp
Cc: Simon Horman, netdev, claudiu.manoil, andrew+netdev, davem,
edumazet, kuba, pabeni, afleming, linux-kernel
In-Reply-To: <20260624032122.986986-1-rosenp@gmail.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: gianfar: use of_irq_get()
This patch switches gfar_parse_group() from irq_of_parse_and_map() to
of_irq_get() and widens the irq field in struct gfar_irqinfo from
unsigned int to int. The stated goals are to fix a leak from a missing
irq_dispose_mapping() and to enable -EPROBE_DEFER propagation.
> diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
> index 3271de5844f8..17a0d0787ed2 100644
> --- a/drivers/net/ethernet/freescale/gianfar.c
> +++ b/drivers/net/ethernet/freescale/gianfar.c
> @@ -514,15 +514,15 @@ static int gfar_parse_group(struct device_node *np,
> if (!grp->regs)
> return -ENOMEM;
>
> - gfar_irq(grp, TX)->irq = irq_of_parse_and_map(np, 0);
> + gfar_irq(grp, TX)->irq = of_irq_get(np, 0);
>
> /* If we aren't the FEC we have multiple interrupts */
> if (model && strcasecmp(model, "FEC")) {
> - gfar_irq(grp, RX)->irq = irq_of_parse_and_map(np, 1);
> - gfar_irq(grp, ER)->irq = irq_of_parse_and_map(np, 2);
> - if (!gfar_irq(grp, TX)->irq ||
> - !gfar_irq(grp, RX)->irq ||
> - !gfar_irq(grp, ER)->irq)
> + gfar_irq(grp, RX)->irq = of_irq_get(np, 1);
> + gfar_irq(grp, ER)->irq = of_irq_get(np, 2);
> + if (gfar_irq(grp, TX)->irq < 0 ||
> + gfar_irq(grp, RX)->irq < 0 ||
> + gfar_irq(grp, ER)->irq < 0)
> return -EINVAL;
> }
[High]
Does this change miss the 0-return mapping failure case?
The kernel-doc on of_irq_get() in drivers/of/irq.c says it returns:
Linux IRQ number on success, or 0 on the IRQ mapping failure, or
-EPROBE_DEFER ..., or error code in case of any other failure.
The old test rejected an irq value of 0 with !gfar_irq(grp, *)->irq,
which caught both irq_of_parse_and_map()'s and of_irq_get()'s
zero-on-failure. The new test only rejects values strictly less than
zero, so a 0 returned from of_irq_get() is now stored as the IRQ and
later flows into request_irq()/free_irq() during probe and unbind.
Should the test be <= 0, or split so that negative errors are
propagated and 0 still triggers a failure?
[High]
Is -EPROBE_DEFER lost here?
The commit message states:
Changed irq to an int as required by the of_irq_get API as it
supports -EPROBE_DEFER.
However, the only error path is:
if (gfar_irq(grp, TX)->irq < 0 ||
gfar_irq(grp, RX)->irq < 0 ||
gfar_irq(grp, ER)->irq < 0)
return -EINVAL;
of_irq_get() returns -EPROBE_DEFER when irq_find_host() cannot yet
locate the parent interrupt controller. Collapsing every negative
return into -EINVAL prevents the driver core from retrying the probe
once the interrupt controller appears.
Should the actual error from of_irq_get() be propagated, for example:
int ret = gfar_irq(grp, TX)->irq;
if (ret < 0)
return ret;
so that -EPROBE_DEFER reaches gfar_of_init() and then gfar_probe()?
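
One possible shape for such a check, covering both points above, is sketched
below as plain userspace C. toy_check_irq() and toy_check_group_irqs() are
hypothetical helpers, not functions in gianfar.c, and EPROBE_DEFER is
redefined locally because it is a kernel-internal errno value:

```c
#include <errno.h>
#include <stddef.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal errno, mirrored for this sketch */
#endif

/* Hypothetical: turn one of_irq_get()-style return value into a result. */
static int toy_check_irq(int irq)
{
	if (irq < 0)
		return irq;      /* propagate as-is, including -EPROBE_DEFER */
	if (irq == 0)
		return -EINVAL;  /* 0 means the IRQ mapping failed */
	return 0;
}

/* Hypothetical: check the TX, RX and ER interrupts of one group. */
static int toy_check_group_irqs(int tx, int rx, int er)
{
	int irqs[] = { tx, rx, er };
	size_t i;

	for (i = 0; i < sizeof(irqs) / sizeof(irqs[0]); i++) {
		int ret = toy_check_irq(irqs[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

With this split a deferred interrupt controller surfaces as -EPROBE_DEFER
rather than -EINVAL, and a 0 return is still rejected.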
[Medium]
Can the commit message description be revisited?
The message states:
of_irq_get() differs from irq_of_parse_and_map() in that the latter
requires calling irq_dispose_mapping() when done, which is missing
in the driver. Meaning it leaks memory.
No need to map it anyway. Just need the value stored in the irq
field.
Looking at of_irq_get() in drivers/of/irq.c:
int of_irq_get(struct device_node *dev, int index)
{
...
rc = irq_create_of_mapping(&oirq);
...
}
of_irq_get() ends with the same irq_create_of_mapping() call that
irq_of_parse_and_map() makes, so the resulting virq mapping is created
identically by both APIs.
If the previous code leaked, does the new code not leak in the same
way? And does the Fixes: b31a1d8b4151 tag accurately describe this
change, given that it may cause stable-tree selection on the basis of a
leak that does not appear to exist?
^ permalink raw reply
* Re: [PATCH v29 4/5] sfc: obtain and map cxl range using devm_cxl_probe_mem
From: Alejandro Lucero Palau @ 2026-06-25 9:31 UTC (permalink / raw)
To: Dan Williams (nvidia), alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Edward Cree
In-Reply-To: <6a3c55eea91d0_f12301008f@djbw-dev.notmuch>
On 6/24/26 23:10, Dan Williams (nvidia) wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use the core API to safely obtain the CXL range linked to an HDM committed
>> by the BIOS. Map that range for use as the ctpio buffer.
>>
>> A potential user space action through sysfs unbinding or core cxl
>> modules remove will trigger sfc driver device detachment, with that case
>> not racing with this mapping as this is done during driver probe and
>> therefore protected with device lock against those user space actions.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> ---
>> drivers/net/ethernet/sfc/efx.c | 2 ++
>> drivers/net/ethernet/sfc/efx_cxl.c | 23 +++++++++++++++++++++++
>> drivers/net/ethernet/sfc/efx_cxl.h | 3 +++
>> 3 files changed, 28 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
>> index 61cbb6cfc360..3806cd3dd7f4 100644
>> --- a/drivers/net/ethernet/sfc/efx.c
>> +++ b/drivers/net/ethernet/sfc/efx.c
>> @@ -984,6 +984,7 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
>> efx_fini_io(efx);
>>
>> probe_data = container_of(efx, struct efx_probe_data, efx);
>> + efx_cxl_exit(probe_data);
>>
>> pci_dbg(efx->pci_dev, "shutdown successful\n");
>>
>> @@ -1242,6 +1243,7 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
>> return 0;
>>
>> fail3:
>> + efx_cxl_exit(probe_data);
>> efx_fini_io(efx);
>> fail2:
>> efx_fini_struct(efx);
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 18b535b3ea40..3e7c950f83e9 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -18,6 +18,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> {
>> struct efx_nic *efx = &probe_data->efx;
>> struct pci_dev *pci_dev = efx->pci_dev;
>> + struct range cxl_pio_range;
>> struct efx_cxl *cxl;
>> u16 dvsec;
>> int rc;
>> @@ -73,9 +74,31 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> return -ENODEV;
>> }
>>
>> + cxl->cxlmd = devm_cxl_probe_mem(&cxl->cxlds, &cxl_pio_range);
>> + if (IS_ERR(cxl->cxlmd)) {
>> + pci_err(pci_dev, "CXL accel memdev creation failed\n");
>> + return PTR_ERR(cxl->cxlmd);
>> + }
>> +
>> + cxl->ctpio_cxl = ioremap_wc(cxl_pio_range.start,
>> + range_len(&cxl_pio_range));
>> + if (!cxl->ctpio_cxl) {
>> + pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
>> + &cxl_pio_range);
>> + return -ENOMEM;
>> + }
>> +
>> probe_data->cxl = cxl;
>>
>> return 0;
>> }
>>
>> +void efx_cxl_exit(struct efx_probe_data *probe_data)
>> +{
> If you are going to have an explicit efx_cxl_exit() then I would also
> add an explicit unregistration of the memdev.
This is necessary for undoing the mmap. Nothing else happens there
because it is all relying on devm ...
I could change the ioremap_wc call to devm_ioremap_wc, but
> This would also fix the
> Sashiko report about pci_disable_device() running while the cxl_memdev
> is still registered. Unfortunately, mixing devm and explicit unwind is
> always fraught.
I do not think there is a problem here. The cxl core does not need what
a type2 driver can do regarding PCI BAR mappings, or at least it is not
the case for sfc.
Any action through sysfs cxl will go through cxl core and the only thing
linked to the type device is the CXL registers which are mapped inside
cxl_map_component_regs() and those are managed resources.
So, I can not see why this change is needed. If it is really necessary,
please describe the problem with more detail.
It looks like you need reasons for delaying this further ...
>
> Let me know if this passes your testing, and I can send it out as a
> standalone patch. You could also use it to unwind if the ioremap()
> fails.
You did not read my comments on v28 ...
I changed efx_cxl_init to make the driver probe fail if cxl is
supported and enabled but the cxl initialization fails, including
ioremap_wc(). What you proposed to do, explicitly undo cxl
initialization bits, has the same outcome: device detached from the driver.
^ permalink raw reply
* [PATCH net] xfrm: fix stack-out-of-bounds in xfrm_tmpl_resolve_one
From: Eric Dumazet @ 2026-06-25 9:24 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet,
syzbot+0ac4d84afe1066a1f3e9, Steffen Klassert, Herbert Xu
syzbot reported a stack-out-of-bounds read in xfrm_state_find()
which flows from xfrm_tmpl_resolve_one().
The issue occurs when a policy has a mix of family-changing templates
(e.g. BEET or IPTFS) and transport templates. If an optional
family-changing template is skipped because no state is found, the
current family of the flow (`family`) is not updated. The subsequent
transport template is then evaluated using the unchanged family (e.g.
AF_INET), but it uses the template's `encap_family` (e.g. AF_INET6)
to perform the state lookup.
This causes `xfrm_state_find()` to interpret the IPv4 flow addresses
(allocated on the stack as `struct flowi4` in `raw_sendmsg` or
`udp_sendmsg`) as IPv6 addresses (`xfrm_address_t`), leading to a
16-byte read from the 4-byte stack variables, triggering KASAN.
Fix this by tracking the active family of the flow (`cur_family`)
during template resolution:
1. Initialize `cur_family` to the flow's original family.
2. For transport templates, verify that `tmpl->encap_family` matches
`cur_family`. If they mismatch, abort with -EINVAL.
3. When a template that can change the family (tunnel, beet, iptfs) is
successfully resolved, update `cur_family` to `tmpl->encap_family`.
4. If a template is skipped (optional), `cur_family` remains unchanged.
This prevents mismatched transport lookups and makes the resolution
robust against any family-transition gaps.
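
The four steps above can be sketched as a small userspace model. Everything
in it is hypothetical: struct toy_tmpl, toy_resolve() and the state_found
flag (which stands in for the result of xfrm_state_find()) are illustration
only, the family values are arbitrary toy constants, and returning -EAGAIN
for a missing mandatory state is a simplification of the real error handling.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_mode { TOY_TRANSPORT, TOY_TUNNEL, TOY_BEET, TOY_IPTFS };

struct toy_tmpl {
	enum toy_mode mode;
	int encap_family;   /* toy values: 4 for IPv4, 6 for IPv6 */
	bool optional;
	bool state_found;   /* stands in for the state lookup result */
};

/* In this model, every mode except transport may change the flow family. */
static bool toy_changes_family(enum toy_mode mode)
{
	return mode != TOY_TRANSPORT;
}

/* Returns the number of resolved templates, or a negative errno. */
static int toy_resolve(const struct toy_tmpl *t, size_t n, int family)
{
	int cur_family = family;            /* step 1 */
	int nx = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* step 2: a transport template must match the active family */
		if (!toy_changes_family(t[i].mode) &&
		    t[i].encap_family != cur_family)
			return -EINVAL;

		if (t[i].state_found) {
			nx++;
			/* step 3: only a resolved template moves the family */
			if (toy_changes_family(t[i].mode))
				cur_family = t[i].encap_family;
			continue;
		}

		/* step 4: a skipped optional template leaves cur_family alone */
		if (!t[i].optional)
			return -EAGAIN;
	}
	return nx;
}
```

In this model an IPv4 flow with a skipped optional BEET template for IPv6,
followed by an IPv6 transport template, is rejected with -EINVAL instead of
reaching the lookup with mismatched address sizes.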
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+0ac4d84afe1066a1f3e9@syzkaller.appspotmail.com
Closes: https://www.spinics.net/lists/netdev/msg1200923.html
Assisted-by: Jetski:gemini-3.1-pro-preview
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
net/xfrm/xfrm_policy.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7ef861a0e8231b63ece816b5237b03fa1367ccf9..95e30670303d34598ba164dff59a65c14489d5f3 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2485,6 +2485,7 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl,
int i, error;
xfrm_address_t *daddr = xfrm_flowi_daddr(fl, family);
xfrm_address_t *saddr = xfrm_flowi_saddr(fl, family);
+ unsigned short cur_family = family;
xfrm_address_t tmp;
for (nx = 0, i = 0; i < policy->xfrm_nr; i++) {
@@ -2511,6 +2512,11 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl,
goto fail;
local = &tmp;
}
+ } else {
+ if (tmpl->encap_family != cur_family) {
+ error = -EINVAL;
+ goto fail;
+ }
}
x = xfrm_state_find(remote, local, fl, tmpl, policy, &error,
@@ -2526,6 +2532,11 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl,
xfrm[nx++] = x;
daddr = remote;
saddr = local;
+ if (tmpl->mode == XFRM_MODE_TUNNEL ||
+ tmpl->mode == XFRM_MODE_IPTFS ||
+ tmpl->mode == XFRM_MODE_BEET) {
+ cur_family = tmpl->encap_family;
+ }
continue;
}
if (x) {
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* RE: [PATCH net v2] tipc: fix out-of-bounds read in broadcast Gap ACK blocks
From: Tung Quang Nguyen @ 2026-06-25 9:23 UTC (permalink / raw)
To: Samuel Page
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev@vger.kernel.org,
tipc-discussion@lists.sourceforge.net,
linux-kernel@vger.kernel.org, Jon Maloy
In-Reply-To: <20260624135629.727262-1-sam@bynar.io>
>Subject: [PATCH net v2] tipc: fix out-of-bounds read in broadcast Gap ACK
>blocks
>
>A broadcast PROTOCOL/STATE_MSG can carry a Gap ACK blocks record in its
>data area. tipc_get_gap_ack_blks() only verifies that the record's len field is
>self-consistent with its ugack_cnt/bgack_cnt counts (sz == struct_size(p, gacks,
>ugack_cnt + bgack_cnt)); it does not check that the record actually fits in the
>message data area, msg_data_sz().
>
>The unicast caller tipc_link_proto_rcv() bounds it ("if (glen > dlen) break;"), but
>the broadcast caller tipc_bcast_sync_rcv() discards the returned size, so
>tipc_link_advance_transmq() copies the record off the receive skb with an
>attacker-controlled count:
>
> this_ga = kmemdup(ga, struct_size(ga, gacks, ga->bgack_cnt),
> GFP_ATOMIC);
>
>A TIPC neighbour that negotiated TIPC_GAP_ACK_BLOCK triggers it with one
>ordinary broadcast STATE_MSG (msg_bc_ack_invalid() clear), sized so its data
>area is short, carrying a Gap ACK record with len = 0x400, bgack_cnt = 0xff and
>ugack_cnt = 0. len then equals struct_size(p, gacks, 255), so the consistency
>check passes and ga is non-NULL; kmemdup() reads struct_size(ga, gacks, 255)
>= 1024 bytes out of the much smaller skb:
>
> BUG: KASAN: slab-out-of-bounds in kmemdup_noprof+0x48/0x60
> Read of size 1024 at addr ffff0000c7030d38 by task poc864/69
> Call trace:
> kmemdup_noprof+0x48/0x60
> tipc_link_advance_transmq+0x86c/0xb80
> tipc_link_bc_ack_rcv+0x19c/0x1e0
> tipc_bcast_sync_rcv+0x1c4/0x2c4
> tipc_rcv+0x85c/0x1340
> tipc_l2_rcv_msg+0xac/0x104
> The buggy address belongs to the object at ffff0000c7030d00
> which belongs to the cache skbuff_small_head of size 704
> The buggy address is located 56 bytes inside of
> allocated 704-byte region [ffff0000c7030d00, ffff0000c7030fc0)
>
>The copied-out bytes are subsequently consumed as gap/ack values, but the
>read is already out of bounds at the kmemdup() regardless of how they are
>used.
>
>The unicast STATE path drops such a message: "if (glen > dlen) break;"
>skips the rest of STATE_MSG handling and the skb is freed. Make the broadcast
>path drop it too. tipc_bcast_sync_rcv() now bounds the record against
>msg_data_sz() and, when it does not fit, reports it back through
>tipc_node_bc_sync_rcv() to tipc_rcv() so the skb is discarded rather than
>processed. ga is not cleared on this path: ga == NULL already means "legacy
>peer without Selective ACK", a distinct legitimate state.
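
For readers following the sizes quoted above, here is a rough userspace sketch
of the two checks. It is not the TIPC code: the struct layout is an assumed
4-byte header followed by 4-byte entries (which matches the quoted
struct_size(p, gacks, 255) = 1024), byte-order conversion is ignored, and
toy_record_size(), toy_len_self_consistent() and toy_record_fits() are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 4-byte header, then 4 bytes per gap/ack entry. */
struct toy_gap_ack {
	uint16_t ack;
	uint16_t gap;
};

struct toy_gap_ack_blks {
	uint16_t len;
	uint8_t ugack_cnt;
	uint8_t bgack_cnt;
	struct toy_gap_ack gacks[];
};

/* Userspace equivalent of struct_size(p, gacks, cnt) for this layout. */
static size_t toy_record_size(unsigned int cnt)
{
	return sizeof(struct toy_gap_ack_blks) +
	       (size_t)cnt * sizeof(struct toy_gap_ack);
}

/* The existing check: len agrees with the two counts, nothing more. */
static bool toy_len_self_consistent(uint16_t len, uint8_t ugack_cnt,
				    uint8_t bgack_cnt)
{
	return len == toy_record_size((unsigned int)ugack_cnt + bgack_cnt);
}

/* The additional bound: the record must fit in the message data area. */
static bool toy_record_fits(uint16_t len, size_t data_sz)
{
	return len <= data_sz;
}
```

With len = 0x400, ugack_cnt = 0 and bgack_cnt = 0xff the first check passes,
and only the second one catches a data area shorter than 1024 bytes.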
>
>Fixes: d7626b5acff9 ("tipc: introduce Gap ACK blocks for broadcast link")
>Cc: stable@vger.kernel.org
>Assisted-by: Bynario AI
>Signed-off-by: Samuel Page <sam@bynar.io>
>---
>v2, per review of v1 [1]:
> - v1 cleared 'ga' on an oversized Gap ACK record, which let the malformed
> STATE message be processed as a legacy (no Selective ACK) one rather than
> dropped. v2 drops it instead, matching the unicast STATE path:
> tipc_bcast_sync_rcv() reports the bad record through a bool output
> parameter, propagated by tipc_node_bc_sync_rcv() to tipc_rcv(), which
> discards the skb.
> - v1 touched only net/tipc/bcast.c; v2 also touches net/tipc/{bcast.h,node.c}.
>
>[1] https://lore.kernel.org/netdev/20260623134137.3641275-1-sam@bynar.io/
>
>For reference, an earlier thread proposed validating inside
>tipc_get_gap_ack_blks():
>
>https://lore.kernel.org/netdev/1316452e465e9a96fce44ec15130a14f3872149f.1775809727.git.caoruide123@gmail.com/
>
> net/tipc/bcast.c | 22 ++++++++++++++--------
> net/tipc/bcast.h |  2 +-
> net/tipc/node.c  | 13 ++++++++++---
> 3 files changed, 25 insertions(+), 12 deletions(-)
>
>diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
>index 76a1585d3f6b..08637c3c9db0 100644
>--- a/net/tipc/bcast.c
>+++ b/net/tipc/bcast.c
>@@ -497,11 +497,12 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
> */
> int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
> struct tipc_msg *hdr,
>- struct sk_buff_head *retrq)
>+ struct sk_buff_head *retrq, bool *valid)
> {
> struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
> struct tipc_gap_ack_blks *ga;
> struct sk_buff_head xmitq;
>+ u16 glen;
Move this variable declaration to the bottom to follow reverse xmas tree style.
> int rc = 0;
>
> __skb_queue_head_init(&xmitq);
>@@ -510,13 +511,18 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
> if (msg_type(hdr) != STATE_MSG) {
> tipc_link_bc_init_rcv(l, hdr);
> } else if (!msg_bc_ack_invalid(hdr)) {
>- tipc_get_gap_ack_blks(&ga, l, hdr, false);
>- if (!sysctl_tipc_bc_retruni)
>- retrq = &xmitq;
>- rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
>- msg_bc_gap(hdr), ga, &xmitq,
>- retrq);
>- rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq);
>+ glen = tipc_get_gap_ack_blks(&ga, l, hdr, false);
>+ if (glen > msg_data_sz(hdr)) {
>+ /* Malformed Gap ACK blocks; caller drops the msg */
>+ *valid = false;
>+ } else {
>+ if (!sysctl_tipc_bc_retruni)
>+ retrq = &xmitq;
>+ rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
>+ msg_bc_gap(hdr), ga, &xmitq,
>+ retrq);
>+ rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq);
>+ }
> }
> tipc_bcast_unlock(net);
>
>diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
>index 2d9352dc7b0e..55d17b5413e1 100644
>--- a/net/tipc/bcast.h
>+++ b/net/tipc/bcast.h
>@@ -97,7 +97,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
> struct tipc_msg *hdr);
> int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
> struct tipc_msg *hdr,
>- struct sk_buff_head *retrq);
>+ struct sk_buff_head *retrq, bool *valid);
> int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg,
> struct tipc_link *bcl);
> int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]);
>diff --git a/net/tipc/node.c b/net/tipc/node.c
>index 97aa970a0d83..2887f94ee28f 100644
>--- a/net/tipc/node.c
>+++ b/net/tipc/node.c
>@@ -1831,12 +1831,13 @@ static void tipc_node_mcast_rcv(struct tipc_node *n)
> }
>
> static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr,
>- int bearer_id, struct sk_buff_head *xmitq)
>+ int bearer_id, struct sk_buff_head *xmitq,
>+ bool *valid)
> {
> struct tipc_link *ucl;
> int rc;
>
>- rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq);
>+ rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq, valid);
'valid' needs to be checked after this call. Then, return immediately if it is false.
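A minimal userspace sketch of that pattern, assuming mock inputs and
hypothetical sketch_* names (none of these are TIPC functions, and the
event value is made up for the example). It only illustrates checking the
bool output parameter right after the call and returning before any
rc-dependent handling:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a link event bit; value chosen for the sketch. */
#define SKETCH_LINK_DOWN_EVT 0x1

/* Counts how often the mock link-reset path ran. */
static int sketch_resets;

/* Mock callee: flags an oversized record via *valid, otherwise returns
 * the simulated event bits passed in as rc_in.
 */
static int sketch_bcast_sync_rcv(size_t glen, size_t data_sz, int rc_in,
				 bool *valid)
{
	if (glen > data_sz) {
		*valid = false;	/* malformed record; caller drops the msg */
		return 0;
	}
	return rc_in;
}

/* Mock middle layer: bail out as soon as the callee reports !*valid. */
void sketch_node_bc_sync_rcv(size_t glen, size_t data_sz, int rc_in,
			     bool *valid)
{
	int rc = sketch_bcast_sync_rcv(glen, data_sz, rc_in, valid);

	if (!*valid)
		return;	/* nothing below runs for a malformed message */

	if (rc & SKETCH_LINK_DOWN_EVT)
		sketch_resets++;
}
```

In this sketch rc is already 0 on the invalid path, so the early return
mainly makes the intent explicit instead of relying on rc being 0.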
>
> if (rc & TIPC_LINK_DOWN_EVT) {
> tipc_node_reset_links(n);
>@@ -2140,12 +2141,18 @@ void tipc_rcv(struct net *net, struct sk_buff *skb,
>struct tipc_bearer *b)
>
> /* Ensure broadcast reception is in synch with peer's send state */
> if (unlikely(usr == LINK_PROTOCOL)) {
>+ bool valid = true;
>+
> if (unlikely(skb_linearize(skb))) {
> tipc_node_put(n);
> goto discard;
> }
> hdr = buf_msg(skb);
>- tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq);
>+ tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq, &valid);
>+ if (!valid) {
>+ tipc_node_put(n);
>+ goto discard;
>+ }
> } else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) {
> tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr);
> }
>
>base-commit: a986fde914d88af47eb78fd29c5d1af7952c3500
>--
>2.54.0
^ permalink raw reply
* Re: [PATCH net] nfc: nci: fix uninit-value in the RF discover/activated NTF handlers
From: Simon Horman @ 2026-06-25 9:18 UTC (permalink / raw)
To: sam
Cc: Simon Horman, david, davem, edumazet, kuba, pabeni, oe-linux-nfc,
netdev, linux-kernel, stable
In-Reply-To: <20260623234126.214667-1-sam@bynar.io>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
nfc: nci: fix uninit-value in the RF discover/activated NTF handlers
This zero-initialises the on-stack nci_rf_discover_ntf and
nci_rf_intf_activated_ntf structs in the RF discover and interface
activated NTF handlers so that the technology- and activation-specific
unions read back as zero when the corresponding length fields are zero.
Link: https://lore.kernel.org/netdev/20260623172109.1105965-2-horms@kernel.org/
> - activated: with rf_tech_specific_params_len == 0 the same union is read
> via nci_target_auto_activated(); with activation_params_len == 0 the
> activation_params union is read by nci_store_ats_nfc_iso_dep() into
> ndev->target_ats.
[Low]
Does the third scenario in this paragraph match the code?
Looking at nci_store_ats_nfc_iso_dep():
static int nci_store_ats_nfc_iso_dep(struct nci_dev *ndev,
const struct nci_rf_intf_activated_ntf *ntf)
{
ndev->target_ats_len = 0;
if (ntf->activation_params_len <= 0)
return NCI_STATUS_OK;
if (ntf->activation_params.nfca_poll_iso_dep.rats_res_len > NFC_ATS_MAXSIZE) {
...
The function returns NCI_STATUS_OK before any access to the
activation_params union when activation_params_len is 0, so the union
is not read in that case.
The sibling nci_store_general_bytes_nfc_dep() has the same early return
on activation_params_len <= 0.
The two rf_tech_specific_params_len == 0 cases listed earlier in the
commit message do match the code (nci_add_new_protocol() and
nci_target_auto_activated() unconditionally pass
&ntf.rf_tech_specific_params even when rf_tech_specific_params_len is
0), and the KMSAN trace quoted in the commit message only mentions the
nci_add_new_protocol() path.
The ntf = {} change still looks correct and necessary for the
rf_tech_specific_params union. Could the third bullet be dropped or
adjusted so the commit message reflects only the paths that actually
read uninitialised bytes?
^ permalink raw reply
* [PATCH net] net: libwx: fix VMDQ mask for 1-queue mode
From: Jiawen Wu @ 2026-06-25 9:08 UTC (permalink / raw)
To: netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Larysa Zaremba,
Kees Cook, Jiawen Wu
In wx_set_vmdq_queues(), the VMDQ mask was not set for devices that do
not support WX_FLAG_MULTI_64_FUNC, i.e., NGBE devices. A mask of 0 causes
__ALIGN_MASK(1, ~vmdq->mask) to return 0, which incorrectly sets
q_per_pool to 0 in wx_write_qde().
Set the VMDQ 1-queue mask to 0x7F, which ensures that __ALIGN_MASK(1,
~0x7F) correctly evaluates to 1.
Fixes: c52d4b898901 ("net: libwx: Redesign flow when sriov is enabled")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
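For illustration only, not part of the change itself: a minimal userspace
sketch of the arithmetic described in the commit message. It assumes the
mask is a 16-bit value and that the macro has the usual
((x) + (mask)) & ~(mask) shape; SKETCH_ALIGN_MASK and sketch_q_per_pool
are hypothetical names used only here.

```c
#include <stdint.h>

/* Same shape as the kernel's __ALIGN_MASK(); renamed because identifiers
 * starting with a double underscore are reserved in userspace C.
 */
#define SKETCH_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))

/* Mask values as they appear in wx_type.h in this patch. */
#define WX_VMDQ_4Q_MASK 0x7C
#define WX_VMDQ_2Q_MASK 0x7E
#define WX_VMDQ_1Q_MASK 0x7F

/* Hypothetical helper: queues per pool derived from a VMDQ mask.
 * Assumption: the mask is 16 bits wide, so ~vmdq_mask is evaluated
 * after integer promotion to int.
 */
int sketch_q_per_pool(uint16_t vmdq_mask)
{
	return SKETCH_ALIGN_MASK(1, ~vmdq_mask);
}
```

Under these assumptions a mask of 0 yields 0 queues per pool, while 0x7F,
0x7E and 0x7C yield 1, 2 and 4 respectively.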
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 1 +
drivers/net/ethernet/wangxun/libwx/wx_type.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
index d042567b8128..814d88d2aee4 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
@@ -1802,6 +1802,7 @@ static bool wx_set_vmdq_queues(struct wx *wx)
rss_i = 4;
}
} else {
+ vmdq_m = WX_VMDQ_1Q_MASK;
/* double check we are limited to maximum pools */
vmdq_i = min_t(u16, 8, vmdq_i);
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index c7befe4cdfe9..65e3e55db1cf 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -486,6 +486,7 @@ enum WX_MSCA_CMD_value {
#define WX_VMDQ_4Q_MASK 0x7C
#define WX_VMDQ_2Q_MASK 0x7E
+#define WX_VMDQ_1Q_MASK 0x7F
/****************** Manageablility Host Interface defines ********************/
#define WX_HI_MAX_BLOCK_BYTE_LENGTH 256 /* Num of bytes in range */
--
2.51.0
^ permalink raw reply related
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: Askar Safin @ 2026-06-25 9:03 UTC (permalink / raw)
To: val
Cc: akpm, axboe, brauner, david, dhowells, fuse-devel, hch, jack,
joannelkoong, linux-api, linux-fsdevel, linux-kernel, linux-mm,
miklos, netdev, patches, pfalcato, rostedt, safinaskar, torvalds,
viro, willy
In-Reply-To: <83f05c55-efba-4bf5-abfe-d2ab0819e904@packett.cool>
Val Packett <val@packett.cool>:
> speaking of fuse_dev_splice……_write actually, this series has broken
> xdg-document-portal!
>
> https://github.com/flatpak/xdg-desktop-portal/issues/2026
>
> Specifically what happens is that the EINVAL is returned due to oh.len
> != nbytes:
>
> fuse_dev_do_write: oh.len 16400 != nbytes 15526
>
> (where 16400 == 16384 (read len) + 16, 15526 == 15510 (file len) + 16)
>
> After reverting the series, there is no error because oh.len
> becomes 15526 too.
Please test v2 of my fixes:
https://lore.kernel.org/lkml/20260625083409.3769242-1-safinaskar@gmail.com/
This should fix the bug.
--
Askar Safin
^ permalink raw reply
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: Askar Safin @ 2026-06-25 8:53 UTC (permalink / raw)
To: avagin
Cc: akpm, alexander, axboe, bernd, brauner, criu, david, dhowells,
fuse-devel, hch, jack, joannelkoong, linux-api, linux-fsdevel,
linux-kernel, linux-mm, miklos, netdev, patches, pfalcato,
rostedt, safinaskar, torvalds, val, viro, willy
In-Reply-To: <CANaxB-xUrLQYGiRJZc4Boi+KX=0TJSWymErNovANVko20fMDVA@mail.gmail.com>
Andrei Vagin <avagin@gmail.com>:
> On Wed, Jun 24, 2026 at 12:12 AM Askar Safin <safinaskar@gmail.com> wrote:
> > Does CRIU actually rely on ability to do SPLICE_F_NONBLOCK vmsplice into
> > named fifos? Or this is merely a test?
>
> Yes, it does.
I.e., CRIU relies on that named fifo behavior? Okay, I just sent
v2 of my fixes. The patchset contains a fix for named fifos.
Please test whether it fixes the named fifo problem.
> I already explained that this isn't just a performance degradation, it
> actually breaks the pre-dump mechanism in CRIU. vmsplice is invoked from
> our parasite code within the context of a user process, where execution
> speed is critical. A heavy performance penalty completely invalidates
> the pre-dump logic, making the feature useless.
This is very unfortunate. But I still want to remove vmsplice.
> At a minimum, we may need to consider a deprecation plan where vmsplice
> with SPLICE_F_GIFT triggers a warning for a few releases before these
> changes are applied. Alternatively, we could introduce the proposed
> behavior alongside a sysctl to fall back to the old behavior and explicitly
> state that this fallback path will be completely deprecated in a future kernel
> version.
My patches change not only SPLICE_F_GIFT behavior, but also vmsplice
behavior in general.
Let other developers decide what to do (i.e., do nothing, remove
vmsplice now, or implement some deprecation scheme).
--
Askar Safin
^ permalink raw reply
* Re: [PATCH v2] net: meth: Fix skb allocation failure handling in RX init
From: Simon Horman @ 2026-06-25 8:47 UTC (permalink / raw)
To: haoxiang_li2024
Cc: Simon Horman, andrew+netdev, davem, edumazet, kuba, pabeni,
pavan.chebbi, netdev, linux-kernel
In-Reply-To: <20260624031940.2690439-1-haoxiang_li2024@163.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
Simon says:
Hi Haoxiang,
I think that the pre-existing issues flagged below do not need
to be considered in the context of this patch. But I am wondering
if you could take a look over the others.
---
net: meth: Fix skb allocation failure handling in RX init
meth_init_rx_ring() does not check the return value of alloc_skb().
If the allocation fails, the NULL skb is passed to skb_reserve() and
then dereferenced through skb->head.
> --- a/drivers/net/ethernet/sgi/meth.c
> +++ b/drivers/net/ethernet/sgi/meth.c
> @@ -228,6 +228,9 @@ static int meth_init_rx_ring(struct meth_private *priv)
>
> for (i = 0; i < RX_RING_ENTRIES; i++) {
> priv->rx_skbs[i] = alloc_skb(METH_RX_BUFF_SIZE, 0);
> + if (!priv->rx_skbs[i])
> + goto err_free_skbs;
> +
> /* 8byte status vector + 3quad padding + 2byte padding,
> * to put data on 64bit aligned boundary */
> skb_reserve(priv->rx_skbs[i],METH_RX_HEAD);
[ ... ]
> @@ -240,6 +243,17 @@ static int meth_init_rx_ring(struct meth_private *priv)
> }
> priv->rx_write = 0;
> return 0;
> +
> +err_free_skbs:
> + while (i--) {
> + dma_unmap_single(&priv->pdev->dev, priv->rx_ring_dmas[i],
> + METH_RX_BUFF_SIZE, DMA_FROM_DEVICE);
[Severity: Critical]
During the allocation loop, each successfully mapped DMA address is pushed
immediately to the hardware via the mace->eth.rx_fifo register. Unmapping
them here frees the memory, but the hardware FIFO still retains the physical
addresses.
When meth_init_rx_ring() returns -ENOMEM to meth_tx_timeout(), the error is
ignored and the RX DMA engine is enabled anyway:
meth_tx_timeout()
...
meth_init_rx_ring(priv);
priv->dma_ctrl |= METH_DMA_TX_EN | METH_DMA_RX_EN | METH_DMA_RX_INT_EN;
Will this cause the hardware to DMA incoming packet data into the unmapped
and potentially reallocated memory?
> + priv->rx_ring[i] = 0;
[Severity: High]
If the hardware remains active as described above, will the RX interrupt
handler dereference this NULL pointer?
When the hardware receives a packet, it raises an RX interrupt and meth_rx()
attempts to process it:
meth_rx()
...
status = priv->rx_ring[priv->rx_write]->status.raw;
Since the pointer at this index was cleared in the error path, this looks
like it will cause a kernel panic.
> + priv->rx_ring_dmas[i] = 0;
> + kfree_skb(priv->rx_skbs[i]);
> + priv->rx_skbs[i] = NULL;
[Severity: Critical]
Does this partial cleanup lead to a double free of socket buffers?
When meth_init_rx_ring() fails at index N, this loop zeroes out elements 0
through N-1. The elements from N through 63 retain their dangling pointers
from the previous execution of meth_free_rx_ring(), which fails to clear SKB
pointers after freeing them.
Since meth_tx_timeout() ignores the error and leaves the device running, the
next time meth_free_rx_ring() is called, it iterates over all 64 elements
again. Will this call kfree_skb() on the dangling pointers in indices N
through 63?
> + }
> + return -ENOMEM;
> }
[Severity: High]
This isn't a bug introduced by this patch, but dma_map_single() is called in
meth_init_rx_ring() without checking for mapping errors:
meth_init_rx_ring()
...
priv->rx_ring_dmas[i] = dma_map_single(&priv->pdev->dev,
priv->rx_ring[i],
METH_RX_BUFF_SIZE,
DMA_FROM_DEVICE);
mace->eth.rx_fifo = priv->rx_ring_dmas[i];
If the mapping fails, an error cookie is passed directly to the hardware.
Could this lead to IOMMU faults or memory corruption when DMA bounce buffers
are exhausted?
[Severity: High]
This is a pre-existing issue, but there appears to be a race condition due
to lockless processing in meth_rx().
The interrupt handler drops the spinlock before processing the ring:
meth_rx()
...
spin_unlock_irqrestore(&priv->meth_lock, flags);
...
while (priv->rx_write != fifo_rptr) {
Concurrently, meth_tx_timeout() can acquire the lock and destroy the RX rings,
falsely assuming it is protected:
meth_tx_timeout()
...
spin_lock_irqsave(&priv->meth_lock,flags);
meth_reset(dev);
...
meth_free_rx_ring(priv);
If meth_free_rx_ring() frees the SKBs while meth_rx() is still executing the
lockless while loop, could this result in a use-after-free or NULL pointer
dereference?
^ permalink raw reply