Netdev List
* [PATCH 6.1.y] net: dsa: clean up FDB, MDB, VLAN entries on unbind
From: Alva Lan @ 2026-04-21  7:36 UTC (permalink / raw)
  To: gregkh, sashal, stable; +Cc: netdev, Vladimir Oltean, Jakub Kicinski, Alva Lan

From: Vladimir Oltean <vladimir.oltean@nxp.com>

[ Upstream commit 7afb5fb42d4950f33af2732b8147c552659f79b7 ]

As explained in many places such as commit b117e1e8a86d ("net: dsa:
delete dsa_legacy_fdb_add and dsa_legacy_fdb_del"), DSA is written given
the assumption that higher layers have balanced additions/deletions.
As such, it only makes sense to be extremely vocal when those
assumptions are violated and the driver unbinds with entries still
present.

But Ido Schimmel points out a very simple situation where that is wrong:
https://lore.kernel.org/netdev/ZDazSM5UsPPjQuKr@shredder/
(also briefly discussed by me in the aforementioned commit).

Basically, while the bridge bypass operations are not something that DSA
explicitly documents, and for the majority of DSA drivers this API
simply causes them to go to promiscuous mode, that isn't the case for
all drivers. Some have the necessary requirements for bridge bypass
operations to do something useful - see dsa_switch_supports_uc_filtering().

Although in tools/testing/selftests/net/forwarding/local_termination.sh,
we made an effort to popularize better mechanisms to manage address
filters on DSA interfaces from user space - namely macvlan for unicast,
and setsockopt(IP_ADD_MEMBERSHIP) - through mtools - for multicast, the
fact is that 'bridge fdb add ... self static local' also exists as
kernel UAPI, and might be useful to someone, even if only for a quick
hack.

It seems counter-productive to block that path by implementing shim
.ndo_fdb_add and .ndo_fdb_del operations which just return -EOPNOTSUPP
in order to prevent the ndo_dflt_fdb_add() and ndo_dflt_fdb_del() from
running, although we could do that.

Accepting that cleanup is necessary seems to be the only option.
Especially since we appear to be coming back at this from a different
angle as well. Russell King is noticing that the WARN_ON() triggers even
for VLANs:
https://lore.kernel.org/netdev/Z_li8Bj8bD4-BYKQ@shell.armlinux.org.uk/

What happens in the bug report above is that dsa_port_do_vlan_del() fails,
then the VLAN entry lingers on, and then we warn on unbind and leak it.

This is not a straight revert of the blamed commit, but we now add an
informational print to the kernel log (to still have a way to see
that bugs exist), and some extra comments gathered from past years'
experience, to justify the logic.

Fixes: 0832cd9f1f02 ("net: dsa: warn if port lists aren't empty in dsa_port_teardown")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414212930.2956310-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Apply the patch to net/dsa/dsa2.c in v6.1 since commit
47d2ce03dcfb ("net: dsa: rename dsa2.c back into dsa.c and create its header")
renamed this file to net/dsa/dsa.c starting from v6.2. ]
Signed-off-by: Alva Lan <alvalan9@foxmail.com>
---
 net/dsa/dsa2.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 415e856ba0ac..9ecb5e34e484 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -1738,12 +1738,44 @@ static int dsa_switch_parse(struct dsa_switch *ds, struct dsa_chip_data *cd)
 
 static void dsa_switch_release_ports(struct dsa_switch *ds)
 {
+	struct dsa_mac_addr *a, *tmp;
 	struct dsa_port *dp, *next;
+	struct dsa_vlan *v, *n;
 
 	dsa_switch_for_each_port_safe(dp, next, ds) {
-		WARN_ON(!list_empty(&dp->fdbs));
-		WARN_ON(!list_empty(&dp->mdbs));
-		WARN_ON(!list_empty(&dp->vlans));
+		/* These are either entries that upper layers lost track of
+		 * (probably due to bugs), or installed through interfaces
+		 * where one does not necessarily have to remove them, like
+		 * ndo_dflt_fdb_add().
+		 */
+		list_for_each_entry_safe(a, tmp, &dp->fdbs, list) {
+			dev_info(ds->dev,
+				 "Cleaning up unicast address %pM vid %u from port %d\n",
+				 a->addr, a->vid, dp->index);
+			list_del(&a->list);
+			kfree(a);
+		}
+
+		list_for_each_entry_safe(a, tmp, &dp->mdbs, list) {
+			dev_info(ds->dev,
+				 "Cleaning up multicast address %pM vid %u from port %d\n",
+				 a->addr, a->vid, dp->index);
+			list_del(&a->list);
+			kfree(a);
+		}
+
+		/* These are entries that upper layers have lost track of,
+		 * probably due to bugs, but also due to dsa_port_do_vlan_del()
+		 * having failed and the VLAN entry still lingering on.
+		 */
+		list_for_each_entry_safe(v, n, &dp->vlans, list) {
+			dev_info(ds->dev,
+				 "Cleaning up vid %u from port %d\n",
+				 v->vid, dp->index);
+			list_del(&v->list);
+			kfree(v);
+		}
+
 		list_del(&dp->list);
 		kfree(dp);
 	}
-- 
2.43.0
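
The cleanup loops in the patch above depend on the "safe" iteration idiom: the successor is saved before the current entry is freed. A minimal userspace sketch of that idiom follows. The names `vlan_entry`, `vlan_add` and `vlan_cleanup` are hypothetical stand-ins, and a plain singly linked list replaces the kernel's `struct list_head`, so this illustrates the pattern and is not the DSA code itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an entry on a per-port VLAN list. */
struct vlan_entry {
	unsigned int vid;
	struct vlan_entry *next;
};

/* Push a new entry on the list head; 0 on success, -1 if allocation fails. */
static int vlan_add(struct vlan_entry **head, unsigned int vid)
{
	struct vlan_entry *v = malloc(sizeof(*v));

	if (!v)
		return -1;
	v->vid = vid;
	v->next = *head;
	*head = v;
	return 0;
}

/*
 * Free every leftover entry and log it, returning how many were cleaned up.
 * The successor is read before free(), which is what the _safe list
 * iterators do for the caller in the kernel.
 */
static int vlan_cleanup(struct vlan_entry **head, int port)
{
	struct vlan_entry *v = *head, *n;
	int cleaned = 0;

	while (v) {
		n = v->next;	/* grab the successor before freeing v */
		printf("Cleaning up vid %u from port %d\n", v->vid, port);
		free(v);
		cleaned++;
		v = n;
	}
	*head = NULL;
	return cleaned;
}
```

Calling `vlan_cleanup()` on an already empty list is a no-op that returns 0, which matches the intent of the patch: teardown tolerates leftovers instead of warning about them.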



* [PATCH net] netdevsim: Initialize all fields of ip header when building dummy sk_buff
From: Nikola Z. Ivanov @ 2026-04-21  7:37 UTC (permalink / raw)
  To: kuba, andrew+netdev, davem, edumazet, pabeni
  Cc: netdev, linux-kernel, Nikola Z. Ivanov

Syzbot reports a KMSAN uninit-value originating from
nsim_dev_trap_skb_build, with the allocation also
being performed in the same function.

The cause of the KMSAN warning is a missing assignment of
the tos and id fields of the ip header.

Fix this by calling skb_put_zero instead of skb_put to
guarantee null initialization.
Additionally remove the now redundant zero assignments
and reorder the remaining ones so that they more closely
match the order of the fields as they appear in the ip header.

Closes: https://syzkaller.appspot.com/bug?extid=23d7fcd204e3837866ff
Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com>
---
There is a very similar function in psample.c called nsim_dev_psample_skb_build
which is almost identical to nsim_dev_trap_skb_build except for the
allocation flag reflecting its non-interrupt context and the fact
it does proper initialization of all fields.
Since these 2 are almost identical would it make sense to combine them
into 1, possibly by passing the allocation flags as parameters?

Thank you in advance for reviewing and answering!

 drivers/net/netdevsim/dev.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 1e06e781c835..64b7cc3a6575 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -829,16 +829,14 @@ static struct sk_buff *nsim_dev_trap_skb_build(void)
 	skb->protocol = htons(ETH_P_IP);
 
 	skb_set_network_header(skb, skb->len);
-	iph = skb_put(skb, sizeof(struct iphdr));
-	iph->protocol = IPPROTO_UDP;
-	iph->saddr = in_aton("192.0.2.1");
-	iph->daddr = in_aton("198.51.100.1");
-	iph->version = 0x4;
-	iph->frag_off = 0;
+	iph = skb_put_zero(skb, sizeof(struct iphdr));
 	iph->ihl = 0x5;
+	iph->version = 0x4;
 	iph->tot_len = htons(tot_len);
 	iph->ttl = 100;
-	iph->check = 0;
+	iph->protocol = IPPROTO_UDP;
+	iph->saddr = in_aton("192.0.2.1");
+	iph->daddr = in_aton("198.51.100.1");
 	iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
 
 	skb_set_transport_header(skb, skb->len);
-- 
2.53.0
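
The effect of switching to a zeroing allocation can be sketched in userspace: clear the whole header first, then set only the non-zero fields, then compute the checksum. This is an illustration under assumptions, not the netdevsim code. The header is modelled as 20 raw bytes, `build_header` and `ipv4_checksum` are hypothetical names, and `memset()` stands in for `skb_put_zero()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_HDR_LEN 20	/* header without options, ihl = 5 */

/* One's-complement checksum over big-endian 16-bit words (RFC 1071 style). */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Build a dummy header in hdr[], which may contain stale bytes on entry. */
static void build_header(uint8_t *hdr, uint16_t tot_len)
{
	uint16_t csum;

	/* Zero everything first so tos, id, frag_off and check start at 0. */
	memset(hdr, 0, IPV4_HDR_LEN);

	hdr[0] = 0x45;				/* version 4, ihl 5 */
	hdr[2] = (uint8_t)(tot_len >> 8);	/* tot_len, network order */
	hdr[3] = (uint8_t)(tot_len & 0xff);
	hdr[8] = 100;				/* ttl */
	hdr[9] = 17;				/* protocol: UDP */
	hdr[12] = 192; hdr[13] = 0;  hdr[14] = 2;   hdr[15] = 1;	/* saddr */
	hdr[16] = 198; hdr[17] = 51; hdr[18] = 100; hdr[19] = 1;	/* daddr */

	/* The checksum field is still zero here, as the algorithm requires. */
	csum = ipv4_checksum(hdr, IPV4_HDR_LEN);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}
```

Re-running `ipv4_checksum()` over the finished header yields 0, the usual way to verify an IPv4 header checksum. The bytes at offsets 1, 4 and 5 (tos and id) are zero regardless of what the buffer held before.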



* Re: [PATCH net] ipv6: Implement limits on extension header parsing
From: Daniel Borkmann @ 2026-04-21  7:38 UTC (permalink / raw)
  To: Justin Iurman, Eric Dumazet
  Cc: kuba, dsahern, tom, willemdebruijn.kernel, idosch, pabeni, netdev
In-Reply-To: <e084bf5d-99bf-4e23-8dc1-3e7e13c58a2e@gmail.com>

On 4/18/26 4:15 PM, Justin Iurman wrote:
> On 4/18/26 15:46, Justin Iurman wrote:
>> On 4/18/26 15:15, Eric Dumazet wrote:
>>> On Sat, Apr 18, 2026 at 5:50 AM Justin Iurman <justin.iurman@gmail.com> wrote:
>>>> On 4/18/26 14:26, Daniel Borkmann wrote:
>>>>> On 4/18/26 1:45 PM, Justin Iurman wrote:
>>>>>> On 4/17/26 19:18, Daniel Borkmann wrote:
>>>>> [...]
>>>>>>> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
>>>>>>> index d2cd33e2698d..93f865545a7c 100644
>>>>>>> --- a/net/ipv6/sysctl_net_ipv6.c
>>>>>>> +++ b/net/ipv6/sysctl_net_ipv6.c
>>>>>>> @@ -135,6 +135,14 @@ static struct ctl_table ipv6_table_template[] = {
>>>>>>>            .extra1        = SYSCTL_ZERO,
>>>>>>>            .extra2        = &flowlabel_reflect_max,
>>>>>>>        },
>>>>>>> +    {
>>>>>>> +        .procname    = "max_ext_hdrs_number",
>>>>>>> +        .data        = &init_net.ipv6.sysctl.max_ext_hdrs_cnt,
>>>>>>> +        .maxlen        = sizeof(int),
>>>>>>> +        .mode        = 0644,
>>>>>>> +        .proc_handler    = proc_dointvec_minmax,
>>>>>>> +        .extra1        = SYSCTL_ONE,
>>>>>>> +    },
>>>>>>>        {
>>>>>>>            .procname    = "max_dst_opts_number",
>>>>>>>            .data        = &init_net.ipv6.sysctl.max_dst_opts_cnt,
>>>>>>
>>>>>> NACKed-by: Justin Iurman <justin.iurman@gmail.com>
>>>>>>
>>>>>> +1000 on the need, but NAK on the way it is done. IMO, we don't want
>>>>>> yet-another-sysctl for that. Instead, we have (well, not yet, but it's
>>>>>> about time) this series [1] to enforce ordering and occurrences of
>>>>>> Extension Headers, which is based on an IETF draft [2] (FYI, draft-
>>>>>> ietf-6man-eh-limits is dead). I think we should enforce ordering and
>>>>>> occurrences in this code path too, instead of relying on a sysctl.
>>>>>> Let's keep both code paths consistent.
>>>>
>>>>> Hm, that series [1] should probably go to net instead of net-next, but atm
>>>>
>>>> +1, would make sense.
>>>>
>>>>> hasn't moved since a month. I'd still think max_ext_hdrs_number would be
>>>>> useful given it has less complexity also for stable, but I guess ultimately
>>>>> up to maintainers..
>>>>
>>>> In the short term, I agree. What worries me is that we end up with a
>>>> redundant, or even useless, sysctl once the other series is applied,
>>>> which will only increase user confusion.
>>>
>>> Given the amount of bugs in this code, a sysctl is safe and quite reasonable.
>>>
>>> No one will object when it is eventually removed (or has no action)
>>>
>>> For the record, I approve Daniel's patch.
>>
>> Fair enough. If there is consensus on this patch, then let me just suggest two changes:
>>
>> - make it clear in the sysctl description that it mainly applies to TX (as opposed to the other series [1] discussed earlier that applies to RX)
> 
> Sorry, I meant it does not apply to core RX (ip6_rcv()), which is what series [1] does.
> 
>> - set the default to 8 (which should be the max value) instead of 32, as per RFC8200, Sec. 4.
Ok, I'll switch to 8 as the default limit, and I'm also looking to cover the
ip6_rcv() path in the next revision, given it's also affected, though less
severely than the icmp6 path.

Thanks,
Daniel
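
To make the idea of a cap on extension headers concrete, here is a userspace sketch of a bounded walk over an extension header chain. It is an assumption-laden illustration, not the patch under discussion. `skip_ext_hdrs` and `MAX_EXT_HDRS` are hypothetical names, only hop-by-hop, routing and destination-options headers are treated as extension headers, and the limit of 8 is the default mentioned in the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_HOP	0	/* hop-by-hop options */
#define NEXTHDR_ROUTING	43	/* routing header */
#define NEXTHDR_DEST	60	/* destination options */
#define MAX_EXT_HDRS	8	/* assumed default limit from the discussion */

static int is_ext_hdr(uint8_t nexthdr)
{
	return nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_ROUTING ||
	       nexthdr == NEXTHDR_DEST;
}

/*
 * Walk the chain starting at buf, whose first header has type nexthdr.
 * Returns the upper-layer protocol number, or -1 if the chain is
 * truncated or contains more than limit extension headers.
 */
static int skip_ext_hdrs(const uint8_t *buf, size_t len, uint8_t nexthdr,
			 int limit)
{
	size_t off = 0;
	int count = 0;

	while (is_ext_hdr(nexthdr)) {
		size_t hdrlen;

		if (++count > limit)
			return -1;	/* too many extension headers */
		if (len - off < 2)
			return -1;	/* no room for nexthdr + hdrlen bytes */
		hdrlen = ((size_t)buf[off + 1] + 1) << 3; /* 8-octet units */
		if (len - off < hdrlen)
			return -1;	/* header runs past the buffer */
		nexthdr = buf[off];
		off += hdrlen;
	}
	return nexthdr;
}
```

The counter makes the loop's cost independent of how many headers an attacker chains together: a packet with nine destination-options headers is rejected after eight iterations instead of being parsed to the end.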


* Re: [PATCH RFC net-next v2 2/2] af_packet: Add port specific handling for HSR
From: Willem de Bruijn @ 2026-04-21  7:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Willem de Bruijn
  Cc: netdev, (JC), Jayachandran, David S. Miller, Andrew Lunn,
	Chintan Vankar, Danish Anwar, Daolin Qiu, Eric Dumazet,
	Felix Maurer, Jakub Kicinski, Paolo Abeni, Richard Cochran,
	Simon Horman, Raghavendra, Vignesh, Bajjuri, Praneeth,
	TK, Pratheesh Gangadhar, Muralidharan, Neelima
In-Reply-To: <20260416161856.CsL4npMH@linutronix.de>

Sebastian Andrzej Siewior wrote:
> On 2026-04-06 10:47:56 [-0400], Willem de Bruijn wrote:
> Hi Willem,
> 
> > So the requirement is for a communication path between userspace and
> > the driver over packet sockets.
> > 
> > Existing options that work for both rx and tx are
> > 
> > - in-band: a packet header or footer
> > - mark, metadata
> > - maybe: vlan tags
> > 
> > These require changes in the HSR driver to use them, but no changes in
> > the protocol independent core logic, which includes packet sockets.
> > 
> > As I mentioned before we cannot sprinkle protocol specific code
> > throughout protocol independent core code. That quickly leads to an
> > unmaintainable mess. PTP over HSR is a particular small niche case,
> > nowhere near first in line to get an exception to this guideline.
> 
> I understand your concern. I tried to make it as self-contained as
> possible, with as little runtime overhead as possible.
> 
> > One perhaps interesting Rx only option I had missed before is
> > SOF_TIMESTAMPING_OPT_PKTINFO. Would that give you the original
> > device ifindex today?
> 
> The upper logic expects to poll() on the fd. If I need to filter the
> device based on this, then it breaks the expectations.
> I need also to receive packets without a timestamp so I don't think this
> works.

I don't follow this. My suggestion is to optionally receive this
additional metadata along with the data. Not as a filter.
 
> > If so, we now only have to consider the Tx path to the HSR driver
> > (the Tx path directly to the other drivers do not need this metadata).
> > 
> > I'm not convinced that it is hard to come up with a way to send
> > a packet to the HSR driver with an optional header or footer or
> > vlan data (or skb->protocol perhaps?) that cannot be
> > differentiated from other traffic arriving at that ndo_start_xmit.
> 
> I've been looking to skb->protocol. Maybe if the packet has ether type
> set to PTP then the HSR layer could consider everything before it (the
> two MAC address fields) as internal header and the actual packet starts
> after that. Reasoning would be that you shouldn't send a PTP packet over
> HSR without dealing with the restrictions. So this could work.
> 
> Then the question remains how to do the filtering on RX side. For the
> so_mark I did open additional two sockets…
> 
> > If all this fails, we can look into a protocol independent approach
> > to passing other metadata in packet sockets. to/from skb_ext or cb[],
> > say.
> 
> I will try the above but it looks very hackish.
> cb[] is limited to one layer. I do have a skb_ext variant working but
> this requires cmsg to set it. Do you think about generic skb_ext which
> is set from af_packet? But I don't think it brings much value if I can't
> filter on the RX side before returning the packet to userland.
> 
> > But at this point I see enough options that do not require changes
> > to packet sockets.
> > 
> > To get back to the simplest approach: skb->mark. Is there any
> > concrete risk that on this path that would conflict with other
> > uses of that field? If packet sockets inject directly into this
> > driver (possibly even with PACKET_QDISC_BYPASS)?
> 
> So I have a skb->mark variant working. I do read on the ethX interface

When reading on both ethX interfaces, that gives you all the info you
need on Rx, right? Or alternatively by attaching to hsr0 with
SOF_TIMESTAMPING_OPT_PKTINFO.

So skb->mark is only relevant to the Tx side, right? There might be
yet another way to identify in the hsr ndo_start_xmit that a packet
arrived from a PF_PACKET socket. E.g., by checking skb->sk->sk_family.
As alternative or complement to skb->protocol.

Btw, on receive the inverse could also be true: insert a synthetic
header and pop that in userspace, e.g., a VLAN tag.

> and write on the hsr0 interface (so I need two extra fd per interface).
> The only concern here is that the mark value is hardcoded and could
> collide with an existing firewall setup or so.
> This field needs also be evaluated by the ethernet driver in case of
> hw-offloading for HSR.
> So far, this is the only working solution I have which does not touch
> af_packet.
> 
> Let me try the header with the PTP header type and the additional
> sockets for RX.
> It will not win a beauty contest but maybe I judge too harsh…
> 
> Sebastian




* Re: [PATCH net] ipv6: rpl: expand skb head when recompressed SRH grows, not only on last segment
From: Greg KH @ 2026-04-21  7:48 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: davem, dsahern, edumazet, horms, kuba, linux-kernel, netdev,
	pabeni, stable
In-Reply-To: <2026042140-drench-pursuable-37fa@gregkh>

On Tue, Apr 21, 2026 at 07:50:45AM +0200, Greg KH wrote:
> On Tue, Apr 21, 2026 at 04:52:52AM +0000, Kuniyuki Iwashima wrote:
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Date: Mon, 20 Apr 2026 21:32:25 +0200
> > > ipv6_rpl_srh_rcv() processes a Routing Protocol for LLNs Source Routing
> > > Header by decompressing it, swapping the next segment address into
> > > ipv6_hdr->daddr, recompressing, and pushing the new header back. The
> > > recompressed header can be larger than the original when the
> > > address-elision opportunities are worse after the swap.
> > > 
> > > The function pulls (hdr->hdrlen + 1) << 3 bytes (the old header) and
> > > pushes (chdr->hdrlen + 1) << 3 + sizeof(ipv6hdr) bytes (the new header
> > > plus the IPv6 header).  pskb_expand_head() is called to guarantee
> > > headroom only when segments_left == 0.
> > > 
> > > A crafted SRH that loops back to the local host (each segment is a local
> > > address, so ip6_route_input() delivers it back to ipv6_rpl_srh_rcv())
> > > with chdr growing on each pass exhausts headroom over several
> > > iterations.
> > 
> > How could this occur.. ?  Did AI generate a repro or just
> > flagged the possibility ?
> 
> It generated a reproducer which caused a crash which made me have to
> create this patch.  I'll dig it out of the huge pile of mess that was
> sent to me and get it into a form that I can reply here to.

Ok, got the reproducer working, and it turns out that this patch does
NOT fix the issue, I should have tested it better.  Let me work some
more on this thing, sorry for the broken submission.

thanks,

greg k-h


* [PATCH net v2] ipv6: addrconf: skip ERRDAD transition when address already DEAD
From: Linmao Li @ 2026-04-21  7:50 UTC (permalink / raw)
  To: davem, dsahern, edumazet, kuba, pabeni
  Cc: horms, netdev, linux-kernel, Linmao Li
In-Reply-To: <20260420032842.1063277-1-lilinmao@kylinos.cn>

addrconf_dad_end() transitions ifp->state from DAD to POSTDAD under
ifp->lock and releases the lock.  addrconf_dad_failure() takes
ifp->lock again with the spin_lock_bh() following the
net_info_ratelimited() duplicate-address log.  A concurrent
ipv6_del_addr() can acquire the lock in that window, set ifp->state
to DEAD and run list_del_rcu(&ifp->if_list).

addrconf_dad_failure() then overwrites DEAD with ERRDAD at errdad:
and schedules a new dad_work.  The work calls ipv6_del_addr() again,
hitting the already-poisoned list entry:

  general protection fault: 0000 [#1] SMP NOPTI
  CPU: 4 PID: 217 Comm: kworker/4:1
  Workqueue: ipv6_addrconf addrconf_dad_work
  RIP: 0010:ipv6_del_addr+0xe9/0x280
  RAX: dead000000000122
  Call Trace:
   addrconf_dad_stop+0x113/0x140
   addrconf_dad_work+0x28c/0x430
   process_one_work+0x1eb/0x3b0
   worker_thread+0x4d/0x400
   kthread+0x104/0x140
   ret_from_fork+0x35/0x40

Bail out at errdad: when ifp->state is already DEAD. The existing
in6_ifa_put() releases the reference taken for this invocation.

Fixes: c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing to workqueue")
Signed-off-by: Linmao Li <lilinmao@kylinos.cn>
---
 net/ipv6/addrconf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5476b6536eb7..14b1ab43da87 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2227,6 +2227,12 @@ void addrconf_dad_failure(struct sk_buff *skb, struct inet6_ifaddr *ifp)
 
 errdad:
 	/* transition from _POSTDAD to _ERRDAD */
+	if (ifp->state == INET6_IFADDR_STATE_DEAD) {
+		/* ipv6_del_addr() already removed ifp while lock was dropped */
+		spin_unlock_bh(&ifp->lock);
+		in6_ifa_put(ifp);
+		return;
+	}
 	ifp->state = INET6_IFADDR_STATE_ERRDAD;
 	spin_unlock_bh(&ifp->lock);
 
-- 
2.25.1
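
The state check added above can be modelled in userspace as a small state machine. This is only a sketch under assumptions: `ifaddr_model`, `model_del_addr` and `model_dad_failure` are hypothetical names, locking is omitted, and a plain counter stands in for the reference that `in6_ifa_put()` drops.

```c
/* Hypothetical model of the address states named in the commit message. */
enum ifa_state { STATE_DAD, STATE_POSTDAD, STATE_ERRDAD, STATE_DEAD };

struct ifaddr_model {
	enum ifa_state state;
	int refcnt;		/* stands in for the ifp reference count */
	int work_scheduled;	/* set when a new dad_work would be queued */
};

/* Models a concurrent delete that wins the race for the lock. */
static void model_del_addr(struct ifaddr_model *ifp)
{
	ifp->state = STATE_DEAD;
}

/*
 * Models the errdad: label. Returns 1 if the ERRDAD transition happened
 * and work was scheduled, 0 if the address was already DEAD and the
 * function bailed out. The caller's reference is dropped on both paths.
 */
static int model_dad_failure(struct ifaddr_model *ifp)
{
	if (ifp->state == STATE_DEAD) {
		ifp->refcnt--;	/* release the reference, schedule nothing */
		return 0;
	}
	ifp->state = STATE_ERRDAD;
	ifp->work_scheduled = 1;
	ifp->refcnt--;
	return 1;
}
```

Without the DEAD check, the second path would overwrite DEAD with ERRDAD and queue work that deletes the same address a second time, which is the double removal the commit message describes.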



* Re: [PATCH] ieee802154: ca8210: fix cas_ctl leak on spi_async failure
From: Miquel Raynal @ 2026-04-21  7:52 UTC (permalink / raw)
  To: Shitalkumar Gandhi
  Cc: alex.aring, stefan, andrew+netdev, davem, edumazet, kuba, pabeni,
	linux-wpan, netdev, linux-kernel, stable, Shitalkumar Gandhi
In-Reply-To: <20260421073259.2259783-1-shitalkumar.gandhi@cambiumnetworks.com>

Hello,

On 21/04/2026 at 13:02:59 +0530, Shitalkumar Gandhi <shital.gandhi45@gmail.com> wrote:

> ca8210_spi_transfer() allocates cas_ctl with kzalloc_obj(GFP_ATOMIC)
> and relies entirely on the SPI completion callback
> ca8210_spi_transfer_complete() to free it.

[...]

> Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
> Cc: stable@vger.kernel.org
> Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
> ---

Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>

Thanks,
Miquèl


* Re: [PATCH net v2] slip: reject VJ receive packets on instances with no rstate array
From: patchwork-bot+netdevbpf @ 2026-04-21  8:00 UTC (permalink / raw)
  To: Weiming Shi; +Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev, xmei5
In-Reply-To: <20260415204130.258866-2-bestswngs@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Thu, 16 Apr 2026 04:41:31 +0800 you wrote:
> slhc_init() accepts rslots == 0 as a valid configuration, with the
> documented meaning of 'no receive compression'. In that case the
> allocation loop in slhc_init() is skipped, so comp->rstate stays
> NULL and comp->rslot_limit stays 0 (from the kzalloc of struct
> slcompress).
> 
> The receive helpers do not defend against that configuration.
> slhc_uncompress() dereferences comp->rstate[x] when the VJ header
> carries an explicit connection ID, and slhc_remember() later assigns
> cs = &comp->rstate[...] after only comparing the packet's slot number
> to comp->rslot_limit. Because rslot_limit is 0, slot 0 passes the
> range check, and the code dereferences a NULL rstate.
> 
> [...]

Here is the summary with links:
  - [net,v2] slip: reject VJ receive packets on instances with no rstate array
    https://git.kernel.org/netdev/net/c/e76607442d5b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
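
The quoted description says a range check alone lets slot 0 through when `rslot_limit` is 0 and `rstate` is NULL. A small userspace sketch of that situation follows. The struct and function names are hypothetical, and the guarded variant shows only one plausible shape of such a check; the fix that was actually applied is not shown in this thread excerpt.

```c
#include <stddef.h>

/* Hypothetical model of the two fields the description mentions. */
struct slcompress_model {
	void *rstate;			/* NULL when rslots == 0 */
	unsigned char rslot_limit;	/* stays 0 when rslots == 0 */
};

/* Range check only: slot 0 passes even though there is no rstate array. */
static int slot_ok_range_only(const struct slcompress_model *c,
			      unsigned int slot)
{
	return slot <= c->rslot_limit;
}

/* Range check plus a guard against a missing receive-state array. */
static int slot_ok_guarded(const struct slcompress_model *c,
			   unsigned int slot)
{
	return c->rstate != NULL && slot <= c->rslot_limit;
}
```

With `rstate == NULL` and `rslot_limit == 0`, the first helper accepts slot 0 and the second rejects it, which is the difference between dereferencing a NULL array and dropping the packet.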




* Re: [PATCH v5 net] nfc: hci: fix out-of-bounds read in HCP header parsing
From: Paolo Abeni @ 2026-04-21  8:06 UTC (permalink / raw)
  To: Simon Horman, Ashutosh Desai
  Cc: netdev, kuba, edumazet, davem, stable, linux-kernel
In-Reply-To: <20260418163024.GH280379@horms.kernel.org>

On 4/18/26 6:30 PM, Simon Horman wrote:
> On Thu, Apr 16, 2026 at 05:15:22AM +0000, Ashutosh Desai wrote:
>> nfc_hci_recv_from_llc() and nci_hci_data_received_cb() cast skb->data
>> to struct hcp_packet and read the message header byte without checking
>> that enough data is present in the linear sk_buff area. A malicious NFC
>> peer can send a 1-byte HCP frame that passes through the SHDLC layer
>> and reaches these functions, causing an out-of-bounds heap read.
>>
>> Fix this by adding pskb_may_pull() before each cast to ensure the full
>> 2-byte HCP header is pulled into the linear area before it is accessed.
>>
>> Fixes: 8b8d2e08bf0d ("NFC: HCI support")
>> Fixes: 11f54f228643 ("NFC: nci: Add HCI over NCI protocol support")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
>> ---
>> V4 -> V5: fix whitespace damage
>> V3 -> V4: add Fixes tags
>> V2 -> V3: drop redundant checks from nfc_hci_msg_rx_work/nci_hci_msg_rx_work;
>>           remove incorrect Suggested-by tag
>> V1 -> V2: use pskb_may_pull() instead of skb->len check
>>
>> v4: https://lore.kernel.org/netdev/177614425081.3600288.2536320552978506086@gmail.com/
>> v3: https://lore.kernel.org/netdev/20260413024329.3293075-1-ashutoshdesai993@gmail.com/
>> v2: https://lore.kernel.org/netdev/20260409150825.2217133-1-ashutoshdesai993@gmail.com/
>> v1: https://lore.kernel.org/netdev/20260408223113.2009304-1-ashutoshdesai993@gmail.com/
>>
>>  net/nfc/hci/core.c | 5 +++++
>>  net/nfc/nci/hci.c  | 5 +++++
>>  2 files changed, 10 insertions(+)
> 
> Reviewed-by: Simon Horman <horms@kernel.org>
> 
> Review of this patch at Sashiko.dev flags a number of related problems in
> this code. I believe none of them are introduced by this patch, and that
> they can all be treated as areas for possible follow-up.

I agree that the issue reported by sashiko:
---
Does this patch fully resolve the out-of-bounds access?
Looking at the beginning of nfc_hci_recv_from_llc(), the code accesses
the packet header before checking if the skb has any data:
net/nfc/hci/core.c:nfc_hci_recv_from_llc() {
    packet = (struct hcp_packet *)skb->data;
    if ((packet->header & ~NFC_HCI_FRAGMENT) == 0) {
        skb_queue_tail(&hdev->rx_hcp_frags, skb);
        return;
    }
    ...
}
If a maliciously crafted 0-byte payload is received, couldn't this
result in an out-of-bounds read of uninitialized memory?
Furthermore, if the fragmentation bit is clear and this 0-byte skb
is queued, when a subsequent final fragment arrives, the reassembly
loop calculates the message length:
net/nfc/hci/core.c:nfc_hci_recv_from_llc() {
    ...
    skb_queue_walk(&hdev->rx_hcp_frags, frag_skb) {
        msg_len += (frag_skb->len - NFC_HCI_HCP_PACKET_HEADER_LEN);
    }
    ...
}
Since NFC_HCI_HCP_PACKET_HEADER_LEN is 1, wouldn't a 0-length fragment
cause this calculation to underflow to UINT_MAX, leading to an eventual
skb_over_panic() when skb_put_data() is called?
Would it be safer to add a pskb_may_pull(skb, 1) check at the very start
of the function before packet->header is accessed?
---

is pre-existing but it looks like the validation included here is almost
ineffective without addressing the above.

@Ashutosh, please include the additional validation in the next
revision, thanks!

Paolo



* Re: [PATCH net] netdevsim: Initialize all fields of ip header when building dummy sk_buff
From: Breno Leitao @ 2026-04-21  8:19 UTC (permalink / raw)
  To: Nikola Z. Ivanov
  Cc: kuba, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel
In-Reply-To: <20260421073738.22110-1-zlatistiv@gmail.com>

On Tue, Apr 21, 2026 at 10:37:38AM +0300, Nikola Z. Ivanov wrote:
> Syzbot reports a KMSAN uninit-value originating from
> nsim_dev_trap_skb_build, with the allocation also
> being performed in the same function.
> 
> The cause of the KMSAN warning is a missing assignment of
> the tos and id fields of the ip header.
> 
> Fix this by calling skb_put_zero instead of skb_put to
> guarantee null initialization.

> Additionally remove the now redundant zero assignments
> and reorder the remaining ones so that they more closely
> match the order of the fields as they appear in the ip header.
> 
> Closes: https://syzkaller.appspot.com/bug?extid=23d7fcd204e3837866ff

How do you check in the report above that the missing uninitialized
fields are "tos" and "id"?

Thanks for the fix,
--breno


* Re: [PATCH v4] net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
From: Paolo Abeni @ 2026-04-21  8:25 UTC (permalink / raw)
  To: Pavitra Jha, w; +Cc: chandrashekar.devegowda, linux-wwan, netdev, stable
In-Reply-To: <20260416113205.1789319-1-jhapavitra98@gmail.com>

On 4/16/26 1:32 PM, Pavitra Jha wrote:
> t7xx_port_enum_msg_handler() uses the modem-supplied port_count field as
> a loop bound over port_msg->data[] without checking that the message buffer
> contains sufficient data. A modem sending port_count=65535 in a 12-byte
> buffer triggers a slab-out-of-bounds read of up to 262140 bytes.
> 
> Add a struct_size() check after extracting port_count and before the loop.
> Pass msg_len to t7xx_port_enum_msg_handler() and use it to validate
> the message size before accessing port_msg->data[].
> Pass msg_len from both call sites: skb->len at the DPMAIF path after
> skb_pull(), and the captured rt_feature->data_len at the handshake path.
> 
> Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
> Cc: stable@vger.kernel.org
> Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>

Does not apply cleanly to net, patch says:

patching file drivers/net/wwan/t7xx/t7xx_modem_ops.c
patch: **** malformed patch at line 41:  	return 0;

are you using any https://xkcd.com/378/ derived method to cook your
patch? please stick to good old git and verify your local configuration.

/P
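
The size check described in the quoted commit message can be sketched in userspace. The layout below is an assumption inferred from the numbers in that message (a 12-byte fixed part followed by 4-byte entries, so 65535 entries span 262140 bytes). `port_msg_model` and `port_msg_len_ok` are hypothetical names, and the explicit multiplication stands in for the kernel's `struct_size()` helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: 12-byte fixed part, then port_count 4-byte entries. */
struct port_msg_model {
	uint32_t head_pattern;
	uint32_t info;		/* would carry port_count in the real message */
	uint32_t tail_pattern;
	uint32_t data[];	/* one entry per port */
};

/*
 * Returns 1 if a buffer of msg_len bytes can hold the fixed part plus
 * port_count entries, 0 otherwise. port_count is at most 65535, so the
 * multiplication cannot overflow size_t.
 */
static int port_msg_len_ok(size_t msg_len, uint16_t port_count)
{
	size_t need = sizeof(struct port_msg_model) +
		      (size_t)port_count * sizeof(uint32_t);

	return msg_len >= need;
}
```

With a 12-byte buffer and `port_count` of 65535, `need` is 12 + 262140 bytes and the check fails before the loop over `data[]` can start.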



* Re: [PATCH net] slip: bound decode() reads against the compressed packet length
From: patchwork-bot+netdevbpf @ 2026-04-21  8:30 UTC (permalink / raw)
  To: Weiming Shi; +Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms, netdev
In-Reply-To: <20260416100147.531855-5-bestswngs@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Thu, 16 Apr 2026 18:01:51 +0800 you wrote:
> slhc_uncompress() parses a VJ-compressed TCP header by advancing a
> pointer through the packet via decode() and pull16(). Neither helper
> bounds-checks against isize, and decode() masks its return with
> & 0xffff so it can never return the -1 that callers test for -- those
> error paths are dead code.
> 
> A short compressed frame whose change byte requests optional fields
> lets decode() read past the end of the packet. The over-read bytes
> are folded into the cached cstate and reflected into subsequent
> reconstructed packets.
> 
> [...]

Here is the summary with links:
  - [net] slip: bound decode() reads against the compressed packet length
    https://git.kernel.org/netdev/net/c/4c1367a2d7aa

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
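
As an illustration of bounding a variable-length decode against the end of the packet, here is a userspace sketch. It is not the applied commit. `decode_bounded` is a hypothetical name, and the encoding (a non-zero byte is the value itself, a zero byte is followed by a 16-bit big-endian value) is an assumption about the VJ delta format that the description refers to.

```c
#include <stddef.h>

/*
 * Decode one value at *cpp without reading at or past end.
 * Returns the value (0..65535) and advances *cpp, or returns -1 and
 * leaves *cpp untouched when the packet is too short.
 */
static long decode_bounded(const unsigned char **cpp,
			   const unsigned char *end)
{
	const unsigned char *cp = *cpp;
	long x;

	if (cp >= end)
		return -1;		/* nothing left to read */
	x = *cp++;
	if (x == 0) {
		if (end - cp < 2)
			return -1;	/* 16-bit form is truncated */
		x = ((long)cp[0] << 8) | cp[1];
		cp += 2;
	}
	*cpp = cp;
	return x;
}
```

Because the error value is returned unmasked, callers that test for -1 can actually see it, which the quoted description notes was not possible once the result was masked with `0xffff`.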




* Re: [PATCH 03/23] tick/nohz: Make nohz_full parameter optional
From: Thomas Gleixner @ 2026-04-21  8:32 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Catalin Marinas, Will Deacon,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Guenter Roeck, Frederic Weisbecker, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Ingo Molnar,
	Chen Ridong, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: cgroups, linux-doc, linux-kernel, linux-arm-kernel, linux-hyperv,
	linux-hwmon, rcu, netdev, linux-kselftest, Costa Shulyupin,
	Qiliang Yuan, Waiman Long
In-Reply-To: <20260421030351.281436-4-longman@redhat.com>

On Mon, Apr 20 2026 at 23:03, Waiman Long wrote:
> To provide nohz_full tick support, there is a set of tick dependency
> masks that need to be evaluated on every IRQ and context switch.

s/IRQ/interrupt/

This is a changelog and not a SMS service.

> Switching on nohz_full tick support at runtime will be problematic
> as some of the tick dependency masks may not be properly set causing
> problem down the road.

That's useless blurb with zero content.

> Allow nohz_full boot option to be specified without any
> parameter to force enable nohz_full tick support without any
> CPU in the tick_nohz_full_mask yet. The context_tracking_key and
> tick_nohz_full_running flag will be enabled in this case to make
> tick_nohz_full_enabled() return true.

I kinda can crystal-ball what you are trying to say here, but that does
not make it qualified as a proper change log.

> There is still a small performance overhead by force enable nohz_full
> this way. So it should only be used if there is a chance that some
> CPUs may become isolated later via the cpuset isolated partition
> functionality and better CPU isolation close to nohz_full is desired.

Why has this key to be enabled on boot if there are no CPUs in the
isolated mask?

If you want to manage this dynamically at runtime then enable the key
once CPUs are isolated. Yes, it's more work, but that avoids the "should
only be used" nonsense and makes this more robust down the road.

Thanks,

        tglx



^ permalink raw reply

* Re: [PATCH v5 net] ax25: fix OOB read after address header strip in ax25_rcv()
From: David Laight @ 2026-04-21  8:41 UTC (permalink / raw)
  To: Ashutosh Desai
  Cc: netdev, linux-hams, jreuter, davem, edumazet, kuba, pabeni, horms,
	linux-kernel, stable
In-Reply-To: <20260421054626.732399-1-ashutoshdesai993@gmail.com>

On Tue, 21 Apr 2026 05:46:26 +0000
Ashutosh Desai <ashutoshdesai993@gmail.com> wrote:

> A crafted AX.25 frame with a valid address header but no control byte
> causes skb->len to reach zero after skb_pull() strips the header.
> The subsequent reads of skb->data[0] (control) and skb->data[1] (PID)
> are then out of bounds.
> 
> Linearize the skb after confirming the device is an AX.25 interface.
> Guard with skb->len < 1 after the pull - one byte suffices for LAPB
> control frames which have no PID byte. Add a separate skb->len < 2
> check inside the UI branch before accessing the PID byte.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
> ---
> v5:
> - Move skb_linearize() to after ax25_dev_ax25dev() check; avoids
>   unnecessary allocation for frames on non-AX.25 interfaces

Nitpick: 'on interfaces where AX.25 isn't enabled'
They still have to be AX.25 frames and get discarded.
So they won't really be expected and any allocated memory is
immediately freed.
More relevant would be linearizing before the ax25_addr_parse() call.
In any case I suspect this code never sees non-linear packets.
The packets will all be short, I don't know ax25, but X.25 (which I've
implemented most of in the past) originally had an mtu of 128 bytes
(and real links running at 2400 baud).

	David


> - Lower general guard from skb->len < 2 to skb->len < 1; the stricter
>   limit incorrectly dropped valid 1-byte LAPB control frames (SABM,
>   DISC, UA, DM, RR) which carry no PID byte
> - Add explicit skb->len < 2 check inside UI branch before the PID
>   byte (skb->data[1]) access
> v4:
> - Linearize skb at entry to ax25_rcv(); replace pskb_may_pull() with
>   skb->len < 2 check (per David Laight review)
> v3:
> - Remove incorrect Suggested-by; add Fixes:, Cc: stable@
> v2:
> - Replace skb->len check with pskb_may_pull(skb, 2)
> 
> Link to v4: https://lore.kernel.org/netdev/20260417065407.206499-1-ashutoshdesai993@gmail.com/
> Link to v3: https://lore.kernel.org/netdev/20260415063654.3831353-1-ashutoshdesai993@gmail.com/
> Link to v2: https://lore.kernel.org/netdev/20260409152400.2219716-1-ashutoshdesai993@gmail.com/
> Link to v1: https://lore.kernel.org/netdev/20260409012235.2049389-1-ashutoshdesai993@gmail.com/
> 
>  net/ax25/ax25_in.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
> index d75b3e9ed93d..c81d6830af48 100644
> --- a/net/ax25/ax25_in.c
> +++ b/net/ax25/ax25_in.c
> @@ -199,6 +199,9 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev,
>  	if ((ax25_dev = ax25_dev_ax25dev(dev)) == NULL)
>  		goto free;
>  
> +	if (skb_linearize(skb))
> +		goto free;
> +
>  	/*
>  	 *	Parse the address header.
>  	 */
> @@ -217,6 +220,9 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev,
>  	 */
>  	skb_pull(skb, ax25_addr_size(&dp));
>  
> +	if (skb->len < 1)
> +		goto free;
> +
>  	/* For our port addresses ? */
>  	if (ax25cmp(&dest, dev_addr) == 0 && dp.lastrepeat + 1 == dp.ndigi)
>  		mine = 1;
> @@ -227,6 +233,9 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev,
>  
>  	/* UI frame - bypass LAPB processing */
>  	if ((*skb->data & ~0x10) == AX25_UI && dp.lastrepeat + 1 == dp.ndigi) {
> +		if (skb->len < 2)
> +			goto free;
> +
>  		skb_set_transport_header(skb, 2); /* skip control and pid */
>  
>  		ax25_send_to_raw(&dest, skb, skb->data[1]);
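
The two length guards described in the changelog can be illustrated with a
small userspace sketch. This is not the kernel code: frame_classify(), the
FRAME_* results and the flat byte buffer standing in for the skb are
hypothetical names, and the digipeater condition of the real UI branch is
left out. Only the AX25_UI value (0x03) and the "& ~0x10" poll/final mask
come from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define AX25_UI 0x03	/* UI control value with the poll/final bit masked */

enum frame_kind { FRAME_DROP, FRAME_LAPB, FRAME_UI };

/* buf/len stand in for skb->data/skb->len after the address header is pulled. */
static enum frame_kind frame_classify(const uint8_t *buf, size_t len)
{
	if (len < 1)
		return FRAME_DROP;		/* no control byte at all */

	if ((buf[0] & ~0x10) == AX25_UI) {
		if (len < 2)
			return FRAME_DROP;	/* UI frame without a PID byte */
		return FRAME_UI;
	}

	return FRAME_LAPB;	/* e.g. SABM, DISC, UA, DM, RR: one byte suffices */
}
```

A one-byte SABM frame is accepted here, which is the case the earlier
"len < 2" guard dropped.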


^ permalink raw reply

* [PATCH 1/1] io_uring/zcrx: warn on freelist violations
From: Pavel Begunkov @ 2026-04-21  8:45 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, axboe, netdev

The freelist is appropriately sized to always be able to take a free
niov, but let's be more defensive and check the invariant with a
warning. That should help to catch any double-free issues.

Suggested-by: Kai Aizen <kai@snailsploit.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 2eb09219f0a0..7b93c87b8371 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -602,6 +602,8 @@ static void io_zcrx_return_niov_freelist(struct net_iov *niov)
 	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
 
 	guard(spinlock_bh)(&area->freelist_lock);
+	if (WARN_ON_ONCE(area->free_count >= area->nia.num_niovs))
+		return;
 	area->freelist[area->free_count++] = net_iov_idx(niov);
 }
 
-- 
2.53.0
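
For readers outside io_uring, the invariant being checked can be sketched
in plain C. The struct, the NUM_NIOVS capacity and freelist_push() below
are hypothetical stand-ins, not the zcrx code; the sketch only shows the
pattern of refusing a push that would overflow a fixed-size freelist.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_NIOVS 4	/* hypothetical capacity, stands in for nia.num_niovs */

struct area {
	uint32_t freelist[NUM_NIOVS];
	size_t free_count;
};

/* Returns false and leaves the list untouched when the freelist is full. */
static bool freelist_push(struct area *a, uint32_t idx)
{
	if (a->free_count >= NUM_NIOVS)
		return false;	/* the patch warns once and returns here */

	a->freelist[a->free_count++] = idx;
	return true;
}
```

A push beyond the capacity can only come from returning the same entry
twice, which is why the check is useful for catching double frees.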


^ permalink raw reply related

* Re: [PATCH v4 net] net: ax25: fix integer overflow in ax25_rx_fragment()
From: Hugh Blemings @ 2026-04-21  8:45 UTC (permalink / raw)
  To: Paolo Abeni, Mashiro Chen, netdev
  Cc: linux-hams, kuba, horms, davem, edumazet, Jakub Kicinski, Greg KH
In-Reply-To: <805a8583-6a84-4dfb-a4d4-53f80f50effc@redhat.com>

Hi Paolo, All,

On 21/4/2026 17:29, Paolo Abeni wrote:
> On 4/13/26 10:49 PM, Mashiro Chen wrote:
>> ax25_rx_fragment() accumulates fragment lengths into ax25_cb->fraglen,
>> which is an unsigned short. When the total exceeds 65535, fraglen wraps
>> around to a small value. The subsequent alloc_skb(fraglen) allocates a
>> too-small buffer, and skb_put() in the copy loop triggers skb_over_panic().
>>
>> Add pskb_may_pull(skb, 1) at function entry to ensure the segmentation
>> header byte is in the linear data area before dereferencing skb->data.
>> This also rejects zero-length skbs, which the original code did not
>> check for.
>>
>> Two issues in the overflow error path are also fixed:
>> First, the current skb, after skb_pull(skb, 1), is neither enqueued
>> nor freed before returning 1, leaking it. Add kfree_skb(skb) before
>> the return.
>> Second, ax25->fraglen is not reset after skb_queue_purge(). Add
>> ax25->fraglen = 0 to restore a consistent state.
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
> we are moving ax25 out of tree:
>
> https://lore.kernel.org/netdev/20260421021824.1293976-1-kuba@kernel.org/
>
> please hold off until Thursday (after that our net PR will land into
> mainline), and eventually resend if the code still exists in Linus's
> tree at that point.

Is there any flexibility here?

Jakub's (CC'd) removal patches unfortunately weren't cross-posted to
linux-hams, so I'm not able to reply to them directly on netdev.

We've had a thread ongoing in linux-hams around the future of 
AX25/ROSE/NETROM for the last week or so and believe we've a path 
towards an orderly exit from the mainline tree, probably towards a 
userspace implementation. This includes a couple of folks who have 
indicated they would be open to overseeing the maintenance of the code 
in the meantime.

We'd hoped to have a period of a few months to do an orderly exit from
the tree to minimise the impact on the (admittedly small, but non-zero)
number of users who build trees or make use of the in-kernel support.

Apologies for my lack of familiarity with the process here to deprecate etc.

Cheers/73
Hugh


-- 
I am slowly moving to hugh@blemings.id.au as my main email address.
If you're using hugh@blemings.org please update your address book accordingly.
Thank you :)


^ permalink raw reply

* [PATCH 1/1] io_uring/zcrx: clear RQ headers on init
From: Pavel Begunkov @ 2026-04-21  8:46 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, axboe, netdev

It might be unexpected to users if the RQ head/tail after a ring
creation are not zeroed, fix that.

Cc: stable@vger.kernel.org
Fixes: 6f377873cb239 ("io_uring/zcrx: add interface queue and refill queue")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index fab3693ecb0d..2eb09219f0a0 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -396,6 +396,7 @@ static int io_allocate_rbuf_ring(struct io_ring_ctx *ctx,
 	ifq->rq.ring = (struct io_uring *)ptr;
 	ifq->rq.rqes = (struct io_uring_zcrx_rqe *)(ptr + off);
 
+	memset(ifq->rq.ring, 0, sizeof(*ifq->rq.ring));
 	return 0;
 }
 
-- 
2.53.0


^ permalink raw reply related

* [PATCH 1/1] io_uring/zcrx: fix user_struct uaf
From: Pavel Begunkov @ 2026-04-21  8:47 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, axboe, netdev

io_free_rbuf_ring() uses a struct user_struct, but io_zcrx_ifq_free()
puts it before destroying the ring. Put it after the ring is freed.

Cc: stable@vger.kernel.org
Fixes: 5c686456a4e83 ("io_uring/zcrx: add user_struct and mm_struct to io_zcrx_ifq")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 9a83d7eb4210..fab3693ecb0d 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -579,13 +579,13 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
 
 	if (ifq->area)
 		io_zcrx_free_area(ifq, ifq->area);
-	free_uid(ifq->user);
 	if (ifq->mm_account)
 		mmdrop(ifq->mm_account);
 	if (ifq->dev)
 		put_device(ifq->dev);
 
 	io_free_rbuf_ring(ifq);
+	free_uid(ifq->user);
 	mutex_destroy(&ifq->pp_lock);
 	kfree(ifq);
 }
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH net v4 1/2] net/sched: taprio: fix NULL pointer dereference in class dump
From: Paolo Abeni @ 2026-04-21  8:49 UTC (permalink / raw)
  To: Weiming Shi, jhs, vinicius.gomes, jiri, davem, edumazet, kuba,
	shuah
  Cc: horms, vladimir.oltean, xmei5, netdev, linux-kselftest
In-Reply-To: <20260416185501.647884-3-bestswngs@gmail.com>

On 4/16/26 8:55 PM, Weiming Shi wrote:
> @@ -2196,14 +2199,14 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
>  	*old = q->qdiscs[cl - 1];
>  	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
>  		WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
> -		if (new)
> +		if (new != &noop_qdisc)
>  			qdisc_refcount_inc(new);
>  		if (*old)
>  			qdisc_put(*old);

Unless I'm lost, it looks like taprio_leaf() can now return
`noop_qdisc`. As a consequence, `old` can be a valid qdisc, NULL or even
`noop_qdisc`. In the latter case it should not decrease the refcount, as
it was not increased previously.

/P
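
The symmetry argument above can be shown with a toy model. Everything in
it (struct toy_qdisc, toy_noop, toy_get(), toy_put(), toy_graft()) is a
hypothetical stand-in rather than the sch_taprio code; the only point is
that the placeholder must be filtered on the put side as well as on the
get side.

```c
#include <stddef.h>

struct toy_qdisc {
	int refcnt;
};

/* Shared placeholder: no reference is ever taken on it in this model. */
static struct toy_qdisc toy_noop;

static void toy_get(struct toy_qdisc *q)
{
	if (q && q != &toy_noop)
		q->refcnt++;
}

static void toy_put(struct toy_qdisc *q)
{
	if (q && q != &toy_noop)	/* same filter as in toy_get() */
		q->refcnt--;
}

/* Install new in *slot, drop the reference on the old child, return it. */
static struct toy_qdisc *toy_graft(struct toy_qdisc **slot,
				   struct toy_qdisc *new)
{
	struct toy_qdisc *old = *slot;

	toy_get(new);
	*slot = new;
	toy_put(old);
	return old;
}
```

With only the get side filtered, grafting over the placeholder would
decrement a count that was never incremented.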


^ permalink raw reply

* Re: [PATCH 04/23] tick/nohz: Allow runtime changes in full dynticks CPUs
From: Thomas Gleixner @ 2026-04-21  8:50 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Catalin Marinas, Will Deacon,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Guenter Roeck, Frederic Weisbecker, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Ingo Molnar,
	Chen Ridong, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: cgroups, linux-doc, linux-kernel, linux-arm-kernel, linux-hyperv,
	linux-hwmon, rcu, netdev, linux-kselftest, Costa Shulyupin,
	Qiliang Yuan, Waiman Long
In-Reply-To: <20260421030351.281436-5-longman@redhat.com>

On Mon, Apr 20 2026 at 23:03, Waiman Long wrote:
> Full dynticks can only be enabled if the "nohz_full" boot option has
> been specified with or without parameter. Any change in the list of
> nohz_full CPUs has to be reflected in tick_nohz_full_mask. Introduce
> a new tick_nohz_full_update_cpus() helper that can be called to update
> the tick_nohz_full_mask at run time. The housekeeping_update() function
> is modified to call the new helper when the HK_TYPE_KERNEL_NOISE cpumask
> is going to be changed.
>
> We also need to enable CPU context tracking for those CPUs that

We need nothing. Use passive voice for change logs as requested in
documentation.

> are in tick_nohz_full_mask. So remove __init from tick_nohz_init()
> and ct_cpu_track_user() so that they can be called later when an isolated
> cpuset partition is being created. The __ro_after_init attribute is
> taken away from context_tracking_key as well.
>
> Also add a new ct_cpu_untrack_user() function to reverse the action of
> ct_cpu_track_user() in case we need to disable the nohz_full mode of
> a CPU.
>
> With nohz_full enabled, the boot CPU (typically CPU 0) will be the
> tick CPU which cannot be shut down easily. So the boot CPU should not
> be used in an isolated cpuset partition.
>
> With runtime modification of nohz_full CPUs, tick_do_timer_cpu can become
> TICK_DO_TIMER_NONE. So remove the two TICK_DO_TIMER_NONE WARN_ON_ONCE()
> checks in tick-sched.c to avoid unnecessary warnings.

in tick-sched.c? Describe the functions which contain that.

>  static inline void tick_nohz_task_switch(void)
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 925999de1a28..394e432630a3 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -411,7 +411,7 @@ static __always_inline void ct_kernel_enter(bool user, int offset) { }
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/context_tracking.h>
>  
> -DEFINE_STATIC_KEY_FALSE_RO(context_tracking_key);
> +DEFINE_STATIC_KEY_FALSE(context_tracking_key);
>  EXPORT_SYMBOL_GPL(context_tracking_key);
>  
>  static noinstr bool context_tracking_recursion_enter(void)
> @@ -674,9 +674,9 @@ void user_exit_callable(void)
>  }
>  NOKPROBE_SYMBOL(user_exit_callable);
>  
> -void __init ct_cpu_track_user(int cpu)
> +void ct_cpu_track_user(int cpu)
>  {
> -	static __initdata bool initialized = false;
> +	static bool initialized;
>  
>  	if (cpu == CONTEXT_TRACKING_FORCE_ENABLE) {
>  		static_branch_inc(&context_tracking_key);
> @@ -700,6 +700,15 @@ void __init ct_cpu_track_user(int cpu)
>  	initialized = true;
>  }
>  
> +void ct_cpu_untrack_user(int cpu)
> +{
> +	if (!per_cpu(context_tracking.active, cpu))
> +		return;
> +
> +	per_cpu(context_tracking.active, cpu) = false;
> +	static_branch_dec(&context_tracking_key);
> +}
> +

Why is this in a patch which makes tick/nohz related changes? This is a
preparatory change, so make it that way and do not bury it inside
something else.

> +/* Get the new set of run-time nohz CPU list & update accordingly */
> +void tick_nohz_full_update_cpus(struct cpumask *cpumask)
> +{
> +	int cpu;
> +
> +	if (!tick_nohz_full_running) {
> +		pr_warn_once("Full dynticks cannot be enabled without the nohz_full kernel boot parameter!\n");

That's the result of this enforced enable hackery. Make this work
properly.

> +		return;
> +	}
> +
> +	/*
> +	 * To properly enable/disable nohz_full dynticks for the affected CPUs,
> +	 * the new nohz_full CPUs have to be copied to tick_nohz_full_mask and
> +	 * ct_cpu_track_user/ct_cpu_untrack_user() will have to be called
> +	 * for those CPUs that have their states changed. Those CPUs should be
> +	 * in an offline state.
> +	 */
> +	for_each_cpu_andnot(cpu, cpumask, tick_nohz_full_mask) {
> +		WARN_ON_ONCE(cpu_online(cpu));
> +		ct_cpu_track_user(cpu);
> +		cpumask_set_cpu(cpu, tick_nohz_full_mask);
> +	}
> +
> +	for_each_cpu_andnot(cpu, tick_nohz_full_mask, cpumask) {
> +		WARN_ON_ONCE(cpu_online(cpu));
> +		ct_cpu_untrack_user(cpu);
> +		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> +	}
> +}

So this writes to tick_nohz_full_mask while other CPUs can access
it. That's just wrong and I'm not at all interested in the resulting
KCSAN warnings.

tick_nohz_full_mask needs to become a RCU protected pointer, which is
updated once the new mask is established in a separately allocated one.

Thanks,

        tglx
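
A userspace sketch of the "build separately, then publish" scheme
requested above, using C11 atomics in place of the kernel RCU primitives.
struct cpu_mask, full_mask, mask_test_cpu() and mask_replace() are
hypothetical names, and the 64-bit mask is a simplification. The sketch
does not implement a grace period: the caller gets the old mask back and
has to wait until no reader can still hold it before freeing it, which in
the kernel would be the job of RCU.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct cpu_mask {
	uint64_t bits;
};

/* The published mask; readers never see a half-updated one. */
static _Atomic(struct cpu_mask *) full_mask;

/* Reader: take one snapshot of the pointer and use only that snapshot. */
static int mask_test_cpu(unsigned int cpu)
{
	struct cpu_mask *m = atomic_load_explicit(&full_mask,
						  memory_order_acquire);

	return m && cpu < 64 && ((m->bits >> cpu) & 1);
}

/*
 * Updater: establish the new mask in a separate allocation, then publish
 * it with a single pointer exchange. On success returns 0 and stores the
 * previous mask (possibly NULL) in *old; returns -1 if allocation fails.
 */
static int mask_replace(uint64_t new_bits, struct cpu_mask **old)
{
	struct cpu_mask *m = malloc(sizeof(*m));

	if (!m)
		return -1;

	m->bits = new_bits;
	*old = atomic_exchange_explicit(&full_mask, m, memory_order_acq_rel);
	return 0;
}
```

Readers either see the complete old mask or the complete new one, which
is the property the in-place cpumask_set_cpu()/cpumask_clear_cpu() loops
cannot provide.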



^ permalink raw reply

* [PATCH net-next] nfp: fix swapped arguments in nfp_encode_basic_qdr() calls
From: Alexey Kodanev @ 2026-04-21  8:51 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Simon Horman, Andrew Lunn, David S . Miller,
	Eric Dumazet, Paolo Abeni, oss-drivers, Alexey Kodanev

There is a mismatch between the passed arguments and the actual
nfp_encode_basic_qdr() function parameter names:

  static int nfp_encode_basic_qdr(u64 addr, int dest_island, int cpp_tgt,
                                  int mode, bool addr40, int isld1,
                                  int isld0)
  {
      ...

But "dest_island" and "cpp_tgt" are swapped at every call-site.
For example:

  return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
                              mode, addr40, isld1, isld0);

As a result, nfp_encode_basic_qdr() receives "dest_island" as CPP target
type, which is always NFP_CPP_TARGET_QDR(2) for these calls, and "cpp_tgt"
as the destination island ID, which can accidentally match or be outside
the valid NFP_CPP_TARGET_* types (e.g. '-1' for any destination).

Detected using the static analysis tool - Svace.

Fixes: 4cb584e0ee7d ("nfp: add CPP access core")
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
index 79470f198a62..5c1edd143cee 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
@@ -493,7 +493,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt,
 			 * the address but we can verify if the existing
 			 * contents will point to a valid island.
 			 */
-			return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
+			return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt,
 						    mode, addr40, isld1, isld0);
 
 		iid_lsb = addr40 ? 34 : 26;
@@ -504,7 +504,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt,
 		return 0;
 	case 1:
 		if (cpp_tgt == NFP_CPP_TARGET_QDR && !addr40)
-			return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
+			return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt,
 						    mode, addr40, isld1, isld0);
 
 		idx_lsb = addr40 ? 39 : 31;
@@ -530,7 +530,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt,
 			 * be set before hand and with them select an island.
 			 * So we need to confirm that it's at least plausible.
 			 */
-			return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
+			return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt,
 						    mode, addr40, isld1, isld0);
 
 		/* Make sure we compare against isldN values
@@ -551,7 +551,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt,
 			 * iid<1> = addr<30> = channel<0>
 			 * channel<1> = addr<31> = Index
 			 */
-			return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
+			return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt,
 						    mode, addr40, isld1, isld0);
 
 		isld[0] &= ~3;
-- 
2.25.1
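
Why the compiler never flagged this can be seen in a reduced example.
encode_qdr() and TARGET_QDR are hypothetical stand-ins for the driver
code (the value 2 is the one quoted in the changelog); because both
parameters are plain int, a swapped call compiles without a diagnostic
and only misbehaves for some values.

```c
#define TARGET_QDR 2	/* value quoted in the changelog */

/* Accept only the QDR target and hand back the destination island. */
static int encode_qdr(int dest_island, int cpp_tgt)
{
	if (cpp_tgt != TARGET_QDR)
		return -1;

	return dest_island;
}
```

encode_qdr(5, TARGET_QDR) returns the island, the swapped
encode_qdr(TARGET_QDR, 5) fails, and the swapped call happens to succeed
when the island ID is itself 2, which is the accidental match mentioned
in the changelog.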


^ permalink raw reply related

* [PATCH net] net: airoha: Do not wake all netdev TX queues in airoha_qdma_wake_netdev_txqs()
From: Lorenzo Bianconi @ 2026-04-21  8:53 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: linux-arm-kernel, linux-mediatek, netdev, Lorenzo Bianconi

Do not wake every netdev TX queue of all ports sharing the QDMA by
running netif_tx_wake_all_queues() in airoha_qdma_wake_netdev_txqs();
wake only the ones that are mapped to the specific stopped QDMA hw TX
queue. This can avoid waking stopped netdev TX queues that are mapped
to a different QDMA hw TX queue.
Introduce the airoha_qdma_get_txq() utility routine.

Fixes: b94769eb2f30 ("net: airoha: Fix possible TX queue stall in airoha_qdma_tx_napi_poll()")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 19 +++++++++++++++----
 drivers/net/ethernet/airoha/airoha_eth.h |  5 +++++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 19f67c7dd8e1..2ca569501045 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -847,13 +847,24 @@ static void airoha_qdma_wake_netdev_txqs(struct airoha_queue *q)
 {
 	struct airoha_qdma *qdma = q->qdma;
 	struct airoha_eth *eth = qdma->eth;
-	int i;
+	int i, qid = q - &qdma->q_tx[0];
 
 	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
 		struct airoha_gdm_port *port = eth->ports[i];
+		int j;
+
+		if (!port)
+			continue;
 
-		if (port && port->qdma == qdma)
-			netif_tx_wake_all_queues(port->dev);
+		if (port->qdma != qdma)
+			continue;
+
+		for (j = 0; j < port->dev->num_tx_queues; j++) {
+			if (airoha_qdma_get_txq(qdma, j) != qid)
+				continue;
+
+			netif_wake_subqueue(port->dev, j);
+		}
 	}
 	q->txq_stopped = false;
 }
@@ -1965,7 +1976,7 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 	u16 index;
 	u8 fport;
 
-	qid = skb_get_queue_mapping(skb) % ARRAY_SIZE(qdma->q_tx);
+	qid = airoha_qdma_get_txq(qdma, skb_get_queue_mapping(skb));
 	tag = airoha_get_dsa_tag(skb, dev);
 
 	msg0 = FIELD_PREP(QDMA_ETH_TXMSG_CHAN_MASK,
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 87b328cfefb0..c3ea7aadbd82 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -631,6 +631,11 @@ u32 airoha_rmw(void __iomem *base, u32 offset, u32 mask, u32 val);
 #define airoha_qdma_clear(qdma, offset, val)			\
 	airoha_rmw((qdma)->regs, (offset), (val), 0)
 
+static inline u16 airoha_qdma_get_txq(struct airoha_qdma *qdma, u16 qid)
+{
+	return qid % ARRAY_SIZE(qdma->q_tx);
+}
+
 static inline bool airoha_is_lan_gdm_port(struct airoha_gdm_port *port)
 {
 	/* GDM1 port on EN7581 SoC is connected to the lan dsa switch.

---
base-commit: a663bac71a2f0b3ac6c373168ca57b2a6e6381aa
change-id: 20260421-airoha-wake_netdev_txqs-optmization-65171ce4ebad

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>
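
The queue selection in this patch can be sketched outside the driver as
follows. NUM_HW_TXQ, hw_txq() and queues_to_wake() are hypothetical
names; the modulo mapping mirrors airoha_qdma_get_txq() from the diff,
while the actual number of hardware queues is an assumption made for the
example.

```c
#include <stddef.h>

#define NUM_HW_TXQ 4	/* assumed; stands in for ARRAY_SIZE(qdma->q_tx) */

/* Map a netdev TX queue index to the hardware TX queue that serves it. */
static unsigned int hw_txq(unsigned int netdev_qid)
{
	return netdev_qid % NUM_HW_TXQ;
}

/*
 * Store in out[] the netdev TX queue indices that map to hardware queue
 * qid and return how many were found. out[] needs room for num_tx_queues
 * entries.
 */
static size_t queues_to_wake(unsigned int qid, unsigned int num_tx_queues,
			     unsigned int *out)
{
	size_t n = 0;

	for (unsigned int j = 0; j < num_tx_queues; j++) {
		if (hw_txq(j) != qid)
			continue;

		out[n++] = j;
	}

	return n;
}
```

With ten netdev queues and four hardware queues, only queues 1, 5 and 9
are selected for hardware queue 1; the rest stay stopped.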


^ permalink raw reply related

* Re: [PATCH net] netdevsim: Initialize all fields of ip header when building dummy sk_buff
From: Nikola Z. Ivanov @ 2026-04-21  8:54 UTC (permalink / raw)
  To: Breno Leitao
  Cc: kuba, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel
In-Reply-To: <aecyEArzjx8KRp8-@gmail.com>



On 4/21/26 11:19 AM, Breno Leitao wrote:
> On Tue, Apr 21, 2026 at 10:37:38AM +0300, Nikola Z. Ivanov wrote:
>> Syzbot reports a KMSAN uninit-value originating from
>> nsim_dev_trap_skb_build, with the allocation also
>> being performed in the same function.
>>
>> The cause of the KMSAN warning is a missing assignment of
>> the tos and id fields of the ip header.
>>
>> Fix this by calling skb_put_zero instead of skb_put to
>> guarantee null initialization.
>> Additionally remove the now redundant zero assignments
>> and reorder the remaining ones so that they more closely
>> match the order of the fields as they appear in the ip header.
>>
>> Closes: https://syzkaller.appspot.com/bug?extid=23d7fcd204e3837866ff
> How do you check in the report above that the missing un-initialized
> fields are "tos" and "id"?
>
> Thanks for the fix,
> --breno
Hi Breno,

I don't think it is visible here, my guess would
be because the checksum calculator walks the
header in small chunks instead of referencing
its fields.

The whole "KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt"
doesn't really sound quite right.

Thank you!

^ permalink raw reply

* Re: [PATCH 05/23] tick: Pass timer tick job to an online HK CPU in tick_cpu_dying()
From: Thomas Gleixner @ 2026-04-21  8:55 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Catalin Marinas, Will Deacon,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Guenter Roeck, Frederic Weisbecker, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Ingo Molnar,
	Chen Ridong, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: cgroups, linux-doc, linux-kernel, linux-arm-kernel, linux-hyperv,
	linux-hwmon, rcu, netdev, linux-kselftest, Costa Shulyupin,
	Qiliang Yuan, Waiman Long
In-Reply-To: <20260421030351.281436-6-longman@redhat.com>

On Mon, Apr 20 2026 at 23:03, Waiman Long wrote:
> In tick_cpu_dying(), if the dying CPU is the current timekeeper,
> it has to pass the job over to another CPU. The current code passes
> it to another online CPU. However, that CPU may not be a timer tick
> housekeeping CPU.  If that happens, another CPU will have to manually
> take it over again later. Avoid this unnecessary work by directly
> assigning an online housekeeping CPU.
>
> Use READ_ONCE/WRITE_ONCE() to access tick_do_timer_cpu in case the
> non-HK CPUs may not be in stop machine in the future.

'may not be in the future' is yet more handwaving without
content. Please write your change logs in a way so that people who have
not spent months on this can follow.

> @@ -394,12 +395,19 @@ int tick_cpu_dying(unsigned int dying_cpu)
>  {
>  	/*
>  	 * If the current CPU is the timekeeper, it's the only one that can
> -	 * safely hand over its duty. Also all online CPUs are in stop
> -	 * machine, guaranteed not to be idle, therefore there is no
> +	 * safely hand over its duty. Also all online housekeeping CPUs are
> +	 * in stop machine, guaranteed not to be idle, therefore there is no
>  	 * concurrency and it's safe to pick any online successor.
>  	 */
> -	if (tick_do_timer_cpu == dying_cpu)
> -		tick_do_timer_cpu = cpumask_first(cpu_online_mask);
> +	if (READ_ONCE(tick_do_timer_cpu) == dying_cpu) {
> +		unsigned int new_cpu;
> +
> +		guard(rcu)();

What's this guard for?

> +		new_cpu = cpumask_first_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TICK));

Why has this to use housekeeping_cpumask() and does not use
tick_nohz_full_mask?

Thanks,

        tglx

^ permalink raw reply

* Re: [PATCH 10/23] cpu: Use RCU to protect access of HK_TYPE_TIMER cpumask
From: Thomas Gleixner @ 2026-04-21  8:57 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Catalin Marinas, Will Deacon,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Guenter Roeck, Frederic Weisbecker, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Ingo Molnar,
	Chen Ridong, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: cgroups, linux-doc, linux-kernel, linux-arm-kernel, linux-hyperv,
	linux-hwmon, rcu, netdev, linux-kselftest, Costa Shulyupin,
	Qiliang Yuan, Waiman Long
In-Reply-To: <20260421030351.281436-11-longman@redhat.com>

On Mon, Apr 20 2026 at 23:03, Waiman Long wrote:
> As HK_TYPE_TIMER cpumask is going to be changeable at run time, use
> RCU to protect access to the cpumask.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cpu.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index bc4f7a9ba64e..0d02b5d7a7ba 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1890,6 +1890,8 @@ int freeze_secondary_cpus(int primary)
>  	cpu_maps_update_begin();
>  	if (primary == -1) {
>  		primary = cpumask_first(cpu_online_mask);
> +
> +		guard(rcu)();
>  		if (!housekeeping_cpu(primary, HK_TYPE_TIMER))
>  			primary = housekeeping_any_cpu(HK_TYPE_TIMER);

housekeeping_cpu() and housekeeping_any_cpu() can operate on two
different CPU masks once the runtime update is enabled.

Seriously?

^ permalink raw reply

