* AppArmor: TCP Fast Open bypasses connect mediation (last unaddressed LSM)
From: Bryam Vargas @ 2026-06-19 1:11 UTC (permalink / raw)
To: John Johansen, linux-security-module, apparmor
Cc: Paul Moore, James Morris, Serge E . Hallyn, Mickael Salaun,
Stephen Smalley, Matthieu Buffet, Mikhail Ivanov, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
Hello John, and LSM folks,
I have been working on the Landlock TCP Fast Open connect bypass [1]. Stephen
Smalley's SELinux fix for the same issue [3] -- "Similar to Landlock, SELinux was
not updated when TCP Fast Open support was introduced ..." -- made me go back and
check the rest of the connect-mediating LSMs, since I had only been looking at
Landlock. With Landlock [2], SELinux [3], and now TOMOYO [4] all getting fixes,
AppArmor is the last one with the same gap and no fix yet.
Root cause (shared with the others)
-----------------------------------
security_socket_connect() has a single call site, net/socket.c (the connect(2)
syscall). TCP Fast Open performs an implicit connect inside sendmsg:
tcp_sendmsg -> tcp_sendmsg_fastopen -> __inet_stream_connect(..., is_sendmsg=1)
-> sk->sk_prot->connect() net/ipv4/{tcp.c,af_inet.c}
This never calls security_socket_connect(); the only LSM hook on the path is
security_socket_sendmsg(). mptcp_sendmsg_fastopen reaches the same code and is a
second producer.
AppArmor
--------
apparmor_socket_connect() requests AA_MAY_CONNECT; apparmor_socket_sendmsg() (via
aa_sock_msg_perm) requests AA_MAY_SEND. These are distinct bits, and apparmor_parser
compiles them independently: "network send inet stream," yields accept mask 0x02
while "network connect inet stream," yields 0x40. So an egress-restriction profile
that grants send but not connect is bypassed by MSG_FASTOPEN.
Reproduced on 6.12.88 with apparmor active. Under a profile granting the inet/inet6
stream lifecycle except connect:
aa-exec -p egress_restricted -- ./probe
[TCP ] connect(2)=EACCES(blocked) sendto(MSG_FASTOPEN)=OK(reached) => connection established
[TCP6] connect(2)=EACCES(blocked) sendto(MSG_FASTOPEN)=OK(reached) => connection established
(The coarse "network inet stream," idiom grants connect anyway, so this only bites the
fine-grained "allow send, deny connect" policy that the asymmetry is meant to serve.)
Fix
---
Same shape as the TOMOYO [4] and SELinux [3] fixes: in apparmor_socket_sendmsg (or
aa_sock_msg_perm), when MSG_FASTOPEN is set and msg_name carries a destination on a
not-yet-connected stream socket, additionally require aa_sk_perm(OP_CONNECT,
AA_MAY_CONNECT, sk). I am happy to send that patch and the reproducer.
(A single core check in __inet_stream_connect(), gated on is_sendmsg, would have
covered all five LSMs and both the TCP and MPTCP producers in one place -- the kernel
already mediates the analogous implicit-connect-on-send for AF_UNIX via
security_unix_may_send and for SCTP via security_sctp_bind_connect. But since the
other four LSMs are taking per-hook fixes, AppArmor matching them is the consistent
move; mentioning the core option only in case it is preferred.)
[1] Landlock: LANDLOCK_ACCESS_NET_CONNECT_TCP bypass via TCP Fast Open (report)
https://lore.kernel.org/r/20260616201615.275032-1-hexlabsecurity@proton.me
[2] landlock: fix TCP Fast Open connection bypass (Matthieu Buffet)
https://lore.kernel.org/r/20260617180526.15627-2-matthieu@buffet.re
[3] selinux: check connect-related permissions on TCP Fast Open (Stephen Smalley)
https://lore.kernel.org/r/20260618175513.112443-2-stephen.smalley.work@gmail.com
[4] tomoyo: Enforce connect policy in TCP Fast Open (Matthieu Buffet)
https://lore.kernel.org/r/20260619002207.61104-1-matthieu@buffet.re
Thanks,
Bryam Vargas
^ permalink raw reply
* Re: [PATCH v2] [net] net: airoha: fix foe_check_time allocation size
From: patchwork-bot+netdevbpf @ 2026-06-19 1:10 UTC (permalink / raw)
To: Wayen Yan
Cc: netdev, lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <178161119471.2163752.14373384830691569758@gmail.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 16 Jun 2026 19:52:36 +0800 you wrote:
> foe_check_time is declared as u16 pointer but was allocated with
> only ppe_num_entries bytes instead of ppe_num_entries * sizeof(u16).
>
> When airoha_ppe_foe_verify_entry() is called with hash >= ppe_num_entries/2,
> it writes beyond the allocated buffer, causing heap buffer overflow and
> potential kernel crash.
>
> [...]
Here is the summary with links:
- [v2,net] net: airoha: fix foe_check_time allocation size
https://git.kernel.org/netdev/net/c/5c121ee63568
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net v3] net: pch_gbe: handle TX skb allocation failure
From: patchwork-bot+netdevbpf @ 2026-06-19 1:10 UTC (permalink / raw)
To: Ruoyu Wang
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms, masa-korg,
netdev, linux-kernel
In-Reply-To: <20260615125043.3537046-1-ruoyuw560@gmail.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 20:50:42 +0800 you wrote:
> pch_gbe_alloc_tx_buffers() allocates an skb for each TX descriptor and
> then passes the returned pointer to skb_reserve(). If netdev_alloc_skb()
> fails, skb_reserve() dereferences NULL.
>
> Make pch_gbe_alloc_tx_buffers() return an error when an skb allocation
> fails. On failure, let pch_gbe_alloc_tx_buffers() clean the partially
> allocated TX ring before returning the error. While bringing the device
> up, release the RX buffer pool through a shared cleanup helper before
> unwinding the IRQ setup.
>
> [...]
Here is the summary with links:
- [net,v3] net: pch_gbe: handle TX skb allocation failure
https://git.kernel.org/netdev/net/c/a0aa6bf985aa
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net v3] octeontx2-af: cn10k: restrict VF LMTLINE sharing to its own PF
From: patchwork-bot+netdevbpf @ 2026-06-19 1:10 UTC (permalink / raw)
To: Junrui Luo
Cc: sgoutham, lcherian, gakula, hkelam, sbhatta, andrew+netdev, davem,
edumazet, kuba, pabeni, netdev, linux-kernel, danisjiang, stable
In-Reply-To: <SYBPR01MB78811656934E713B77DA6CEDAFE62@SYBPR01MB7881.ausprd01.prod.outlook.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 23:04:27 +0800 you wrote:
> rvu_mbox_handler_lmtst_tbl_setup() uses req->base_pcifunc as a direct
> index into the LMT map table to read another function's LMTLINE
> physical base address and copy it into the caller's own LMT map table
> entry. The mailbox dispatcher authenticates req->hdr.pcifunc from the
> IRQ source, but req->base_pcifunc is a separate payload field and is
> not sanitized.
>
> [...]
Here is the summary with links:
- [net,v3] octeontx2-af: cn10k: restrict VF LMTLINE sharing to its own PF
https://git.kernel.org/netdev/net/c/8cdcf3d2caac
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net 0/2] devlink: Fix a couple parent ref leaks
From: patchwork-bot+netdevbpf @ 2026-06-19 1:10 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: netdev, jiri, davem, edumazet, kuba, pabeni, horms,
michal.wilczynski, cjubran, mbloch, tariqt
In-Reply-To: <20260616110633.1449432-1-cratiu@nvidia.com>
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 16 Jun 2026 14:06:31 +0300 you wrote:
> These two patches fix parent ref leaks on errors.
>
> Cosmin Ratiu (2):
> devlink: Fix parent ref leak in devl_rate_node_create()
> devlink: Fix parent ref leak on tc-bw failure
>
> net/devlink/rate.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
Here is the summary with links:
- [net,1/2] devlink: Fix parent ref leak in devl_rate_node_create()
https://git.kernel.org/netdev/net/c/ba45106342bb
- [net,2/2] devlink: Fix parent ref leak on tc-bw failure
https://git.kernel.org/netdev/net/c/ba81a8b80f04
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net v2 0/2] dpaa2-switch: reject VLAN uppers while bridged
From: patchwork-bot+netdevbpf @ 2026-06-19 1:00 UTC (permalink / raw)
To: Ioana Ciornei
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev, f.fainelli,
vladimir.oltean, linux-kernel
In-Reply-To: <20260618092813.432535-1-ioana.ciornei@nxp.com>
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 18 Jun 2026 12:28:11 +0300 you wrote:
> The dpaa2-switch driver does not support VLAN uppers on its ports while
> they are bridged. The check which should have prevented a port with a
> VLAN upper to join bridge was poorly refactored and didn't actually
> return an error. Patch 2/2 fixes that.
>
> On the other hand, the driver didn't reject the addition of a VLAN upper
> while bridged. Patch 1/2 fixes that.
>
> [...]
Here is the summary with links:
- [net,v2,1/2] dpaa2-switch: do not accept VLAN uppers while bridged
(no matching commit)
- [net,v2,2/2] dpaa2-switch: fix VLAN upper check not rejecting bridge join
https://git.kernel.org/netdev/net/c/ed2294f94e34
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net] dpaa2-switch: fix VLAN upper check not rejecting bridge join
From: patchwork-bot+netdevbpf @ 2026-06-19 1:00 UTC (permalink / raw)
To: Ioana Ciornei
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev, f.fainelli,
vladimir.oltean, linux-kernel
In-Reply-To: <20260616105430.3725910-1-ioana.ciornei@nxp.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 16 Jun 2026 13:54:30 +0300 you wrote:
> The blamed commit refactored the prechangeupper event handling but
> failed to actually return an error in case
> dpaa2_switch_prevent_bridging_with_8021q_upper() detected a 802.1q upper
> on a port which tries to join a bridge. Fix this by returning err
> instead of 0.
>
> Fixes: 45035febc495 ("net: dpaa2-switch: refactor prechangeupper sanity checks")
> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
>
> [...]
Here is the summary with links:
- [net] dpaa2-switch: fix VLAN upper check not rejecting bridge join
https://git.kernel.org/netdev/net/c/ed2294f94e34
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net] net: llc: make empty have static storage duration
From: patchwork-bot+netdevbpf @ 2026-06-19 1:00 UTC (permalink / raw)
To: Wentao Guan; +Cc: kuba, joel.granados, netdev, linux-kernel, zhanjun, niecheng1
In-Reply-To: <20260616064053.690154-1-guanwentao@uniontech.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 16 Jun 2026 14:40:53 +0800 you wrote:
> Make empty have static storage duration (like net/sysctl_net.c does) to
> avoid a potential use-after-return and keep consistent with
> __register_sysctl_table @table 'should not be free'd after registration'.
>
> Fixes: 73dbd8cf7947 ("net: Remove ctl_table sentinel elements from several networking subsystems")
> Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
>
> [...]
Here is the summary with links:
- [net] net: llc: make empty have static storage duration
https://git.kernel.org/netdev/net/c/d31deeab707b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [net v2] net/sched: fix partial tx_queue_len change rollback
From: Jakub Kicinski @ 2026-06-19 0:59 UTC (permalink / raw)
To: Chenguang Zhao
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman, netdev
In-Reply-To: <20260615031824.314112-1-zhaochenguang@kylinos.cn>
On Mon, 15 Jun 2026 11:18:24 +0800 Chenguang Zhao wrote:
> When dev_qdisc_change_tx_queue_len() fails partway through updating
> per-tx-queue qdiscs, previously updated queues were left at the new
> size while netif_change_tx_queue_len() only restored dev->tx_queue_len.
>
> Pass the original queue length and roll back successfully updated
> queues on failure.
I don't think it matters. Also net-next is closed.
--
pw-bot: reject
^ permalink raw reply
* Re: [PATCH net v2 0/2] net: ethernet: sunplus: spl2sw: fix of_node refcount leaks
From: Jakub Kicinski @ 2026-06-19 0:56 UTC (permalink / raw)
To: Shitalkumar Gandhi
Cc: Wells Lu, Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, netdev, linux-kernel, Shitalkumar Gandhi
In-Reply-To: <cover.1781552725.git.shitalkumar.gandhi@cambiumnetworks.com>
On Tue, 16 Jun 2026 01:20:30 +0530 Shitalkumar Gandhi wrote:
> This series fixes of_node refcount leaks in the Sunplus SP7021 ethernet
> driver, found by inspection. Compile-tested only; no SP7021 hardware
> available here.
>
> Patch 1/2 fixes the phy_node leak in the remove path.
> Patch 2/2 fixes multiple leaks in the probe path and depends on the
> cleanup contract from patch 1/2.
Wells Lu, please review.
--
mping: SUNPLUS ETHERNET DRIVER
^ permalink raw reply
* Re: [PATCH net] net: llc: make empty have static storage duration
From: Jakub Kicinski @ 2026-06-19 0:52 UTC (permalink / raw)
To: Wentao Guan; +Cc: joel.granados, netdev, linux-kernel, zhanjun, niecheng1
In-Reply-To: <20260616064053.690154-1-guanwentao@uniontech.com>
On Tue, 16 Jun 2026 14:40:53 +0800 Wentao Guan wrote:
> Make empty have static storage duration (like net/sysctl_net.c does) to
> avoid a potential use-after-return and keep consistent with
> __register_sysctl_table @table 'should not be free'd after registration'.
>
> Fixes: 73dbd8cf7947 ("net: Remove ctl_table sentinel elements from several networking subsystems")
> Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
> ---
> net/llc/sysctl_net_llc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c
> index c8d88e2508fce..15f1e5d88f208 100644
> --- a/net/llc/sysctl_net_llc.c
> +++ b/net/llc/sysctl_net_llc.c
> @@ -47,7 +47,7 @@ static struct ctl_table_header *llc_station_header;
>
> int __init llc_sysctl_init(void)
> {
> - struct ctl_table empty[1] = {};
> + static struct ctl_table empty[1] = {};
> llc2_timeout_header = register_net_sysctl(&init_net, "net/llc/llc2/timeout", llc2_timeout_table);
> llc_station_header = register_net_sysctl_sz(&init_net, "net/llc/station", empty, 0);
I will apply this but it's not a bug.
The size is 0, even tho the pointer is stored there can be no access.
^ permalink raw reply
* Re: [PATCH net v4] virtio-net: fix len check in receive_big()
From: patchwork-bot+netdevbpf @ 2026-06-19 0:50 UTC (permalink / raw)
To: Xiang Mei
Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
kuba, pabeni, netdev, virtualization, linux-kernel,
minhquangbui99, bestswngs
In-Reply-To: <20260616042837.2249468-1-xmei5@asu.edu>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 21:28:37 -0700 you wrote:
> receive_big() bounds the device-announced length by
> (big_packets_num_skbfrags + 1) * PAGE_SIZE. That is still too loose:
> add_recvbuf_big() sets sg[1] to start at offset
> sizeof(struct padded_vnet_hdr) into the first page, so the chain
> actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> check allows for the common hdr_len == 12 case.
>
> [...]
Here is the summary with links:
- [net,v4] virtio-net: fix len check in receive_big()
https://git.kernel.org/netdev/net/c/9e5ad06ea826
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net v2 0/2] net/sched: act_ct: preserve tc_skb_cb across defragmentation
From: patchwork-bot+netdevbpf @ 2026-06-19 0:50 UTC (permalink / raw)
To: Ren Wei
Cc: netdev, linux-kselftest, linux-kernel, jhs, jiri, kuba, paulb,
victor, yuantan098, yifanwucs, tomapufckgml, bird, xizh2024
In-Reply-To: <cover.1781358691.git.xizh2024@lzu.edu.cn>
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Sun, 14 Jun 2026 01:42:38 +0800 you wrote:
> From: Zihan Xi <xizh2024@lzu.edu.cn>
>
> Hi Linux kernel maintainers,
>
> We found and validated an issue in net/sched/act_ct.c. The bug is
> reachable when configuring TC with act_ct on a netdev (requires
> CAP_NET_ADMIN). We have tested it, and the fix should not affect
> other functionality.
>
> [...]
Here is the summary with links:
- [net,v2,1/2] net/sched: act_ct: preserve tc_skb_cb across defragmentation
https://git.kernel.org/netdev/net/c/9092e15defbe
- [net,v2,2/2] selftests/tc-testing: act_ct: add TDC test for skb cb preservation across defrag
https://git.kernel.org/netdev/net/c/1fd8f80a199d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [RFC PATCH 1/2] landlock: fix TCP Fast Open connection bypass
From: Matthieu Buffet @ 2026-06-19 0:34 UTC (permalink / raw)
To: Bryam Vargas
Cc: Mickaël Salaün, Günther Noack, Mikhail Ivanov,
Paul Moore, Eric Dumazet, Neal Cardwell, linux-security-module,
netdev, linux-kernel
In-Reply-To: <20260618012527.34964-1-hexlabsecurity@proton.me>
Hi Bryam,
On 6/18/2026 3:25 AM, Bryam Vargas wrote:
> One scope note, since you mention MPTCP: an MPTCP socket isn't covered.
> sk_is_tcp() is false for the mptcp parent (sk_protocol is IPPROTO_MPTCP), so
> neither the new sendmsg hook nor the existing socket_connect one mediates it. On
> the patched kernel my MPTCP arm still reaches the blocked port via both connect()
> and MSG_FASTOPEN. If MPTCP is meant to be in scope for CONNECT_TCP, the guard
> wants `|| sk->sk_protocol == IPPROTO_MPTCP` (not sk_is_mptcp(), which is the
> subflow flag).
Indeed, the patch does not try to filter MPTCP: it is not meant to be in
the scope of LANDLOCK_ACCESS_NET_*_TCP rights.
It used to be, but it was a bug, see:
https://lore.kernel.org/all/20250205093651.1424339-2-ivanov.mikhail1@huawei-partners.com/
Have a nice day!
--
Matthieu
^ permalink raw reply
* [RFC] Enabling CONFIG_NTP_PPS for NOHZ by adding ntp_error to system_time_snapshot
From: David Woodhouse @ 2026-06-19 0:33 UTC (permalink / raw)
To: John Stultz, Thomas Gleixner, Stephen Boyd, Miroslav Lichvar,
Richard Cochran, linux-kernel, netdev
Cc: Rodolfo Giometti, Alexander Gordeev
[-- Attachment #1: Type: text/plain, Size: 10587 bytes --]
As far as I can tell, the only (remaining?) reason that CONFIG_NTP_PPS
doesn't work with NO_HZ_COMMON is because the real time snapshots that
pps_get_ts() uses are not sufficiently accurate, so the phase
correction wouldn't work very well.
The inaccuracy happens because of the way the kernel's timekeeping
sawtooths around the 'ideal' time line, by choosing between adjacent
values of 'mult' and 'mult+1' from one tick to the next. But with a
tickless kernel, of course the correction *doesn't* happen each tick,
and the time reported as CLOCK_REALTIME diverges further from the
correct time.
The thing is... since
https://lore.kernel.org/all/20260614144032.534706-1-dwmw2@infradead.org/
we know *precisely* how far from the truth our CLOCK_REALTIME value is,
and we can just put that information into the system_time_snapshot for
the caller to use as it sees fit. If the caller doesn't care about
monotonicity, it can just add the known 'error' to the snapshot.systime
value, and have a completely accurate snapshot even under nohz.
If I run my vmclock reference test on a tickless kernel, I see the
kernel's timekeeping vary by ±15ns around the ideal. The correction
below clamps it back to the ±1ns that I see with a periodic tick.
I think that's enough to enable CONFIG_NTP_PPS too, right? I'll have to
revive the hack at
https://lore.kernel.org/all/87cb97d5a26d0f4909d2ba2545c4b43281109470.camel@infradead.org/
to test it...
Am I missing some other reason for the dependency? Aside from the phase
error, it *does* seem to work. The dependency on !NO_HZ goes all the
way back to the original introduction of hardpps support in commit
025b40abe7, which doesn't explain *why* it didn't work on tickless
kernels.
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Fri, 19 Jun 2026 00:00:29 +0100
Subject: [PATCH] timekeeping: Extrapolate ntp_error into snapshots
ktime_get_snapshot_id() is a lockless reader: it interpolates the
clocksource forward from cycle_last at a fixed mult but never runs the
timekeeping accumulation, so tk->ntp_error is only current as of the
last update. Between updates the read accrues the per-cycle deviation
from the NTP-ideal rate; on a NO_HZ kernel that span can be many ticks,
widening the sawtooth between the snapshot's disciplined CLOCK_REALTIME
and the ideal NTP line. This is the obstacle to accurate in-kernel PPS,
which today depends on !NO_HZ_COMMON.
Carry that deviation in the snapshot as a signed nanosecond offset that
a consumer adds directly to ::systime to land on the ideal line. It sums
four terms in ns << NTP_SCALE_SHIFT before converting:
- tk->ntp_error, the deviation as of the last update;
- (cycle_delta * ntp_err_frac), the fractional-mult drift accrued
since then (cycle_delta is at most a tick on a tickful kernel, but
many ticks' worth under NO_HZ);
- (cycle_delta * ntp_err_mult), subtracting the applied +1 mult dither
over the same span;
- the sub-nanosecond fraction dropped when ::systime was truncated to
whole ns (low shift bits of the read, exact despite overflow).
Only the mono-based clocks (REALTIME/MONOTONIC/BOOTTIME) carry this; RAW
is undisciplined and AUX has its own discipline. The residual is then a
single clocksource cycle, the same bound as a tickful kernel.
NOT-FOR-UPSTREAM: also includes a temporary ptp_vmclock debug hack that
prints the offset and applies it to the returned timestamp, for
validating the field against the host vmclock reference under QEMU.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Assisted-by: Kiro:claude-opus-4.8
---
drivers/ptp/ptp_vmclock.c | 2 ++
include/linux/timekeeper_internal.h | 6 ++++
include/linux/timekeeping.h | 9 +++++
kernel/time/timekeeping.c | 56 +++++++++++++++++++++++++++--
4 files changed, 71 insertions(+), 2 deletions(-)
diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
index c09ae06d7f68..37a9c8390055 100644
--- a/drivers/ptp/ptp_vmclock.c
+++ b/drivers/ptp/ptp_vmclock.c
@@ -140,7 +140,9 @@ static int vmclock_get_crosststamp(struct vmclock_state *st,
ptp_read_system_prets(sts);
if (sts->pre_sts.cs_id == st->cs_id) {
cycle = sts->pre_sts.cycles;
+ sts->pre_sts.systime += sts->pre_sts.ntp_error;
sts->post_sts = sts->pre_sts;
+ pr_info("vmclock pre error %lld\n", sts->pre_sts.ntp_error);
} else if (sts->pre_sts.hw_csid == st->cs_id &&
sts->pre_sts.hw_cycles) {
cycle = sts->pre_sts.hw_cycles;
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 5dc7f8bf2740..b487e7d925fe 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -97,6 +97,11 @@ struct tk_read_base {
* @ntp_error_shift: Shift conversion between clock shifted nano seconds and
* ntp shifted nano seconds.
* @ntp_err_mult: Multiplication factor for scaled math conversion
+ * @ntp_err_frac: Fractional part of the per-cycle NTP-ideal mult that the
+ * integer @mult truncates, as a fraction of 2^32 in
+ * clock-shifted nanoseconds per cycle. Used to
+ * extrapolate @ntp_error to an arbitrary cycle count in
+ * the lockless snapshot readers (ktime_get_snapshot_id).
* @cs_tick_adj: Per-second adjustment handed to NTP via ntp_clear()
* accounting for the difference between the nominal
* NTP interval and the real time taken by the
@@ -187,6 +192,7 @@ struct timekeeper {
s64 ntp_error;
u32 ntp_error_shift;
u32 ntp_err_mult;
+ u64 ntp_err_frac;
s64 cs_tick_adj;
u32 skip_second_overflow;
s64 skew_delta;
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 984a866d293b..e53be1952021 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -283,6 +283,14 @@ static inline bool ktime_get_aux_ts64(clockid_t id, struct timespec64 *kt) { ret
* which @cycles was derived
* @systime: The system time of the selected CLOCK ID
* @monoraw: Monotonic raw system time
+ * @ntp_error: Signed nanosecond offset of @systime from the ideal
+ * NTP-disciplined time at @cycles. Extrapolated to @cycles
+ * (so it is exact even when many cycles have elapsed since the
+ * last timekeeping update, e.g. on a NO_HZ kernel) and includes
+ * the sub-nanosecond fraction dropped when @systime was
+ * truncated to whole ns. A consumer lands on the ideal line by
+ * adding @ntp_error directly to @systime. Only meaningful for
+ * CLOCK_REALTIME/CLOCK_MONOTONIC.
* @cs_id: Clocksource ID
* @hw_csid: Clocksource ID of the underlying hardware counter for derived
* clocksources which implement the read_snapshot() callback.
@@ -295,6 +303,7 @@ struct system_time_snapshot {
u64 hw_cycles;
ktime_t systime;
ktime_t monoraw;
+ s64 ntp_error;
enum clocksource_ids cs_id;
enum clocksource_ids hw_csid;
unsigned int clock_was_set_seq;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a67d2f27c73e..e319eca307ee 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -407,6 +407,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
tk->tkr_mono.mult = clock->mult;
tk->tkr_raw.mult = clock->mult;
tk->ntp_err_mult = 0;
+ tk->ntp_err_frac = 0;
tk->skip_second_overflow = 0;
tk->skew_delta = 0;
@@ -1285,6 +1286,45 @@ void ktime_get_snapshot_id(clockid_t clock_id, struct system_time_snapshot *syst
nsec_sys = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
+
+ /*
+ * For the NTP-disciplined mono-based clocks, report how far
+ * @systime is from the ideal NTP time at @now, in signed ns,
+ * so a caller can land on the ideal line by adding it. Four
+ * terms, summed in ns << NTP_SCALE_SHIFT before converting:
+ *
+ * - tk->ntp_error, the deviation as of the last update;
+ * - (cycle_delta * ntp_err_frac), the fractional-mult drift
+ * accrued since then (cycle_delta is at most a tick on a
+ * tickful kernel, but many ticks' worth under NO_HZ);
+ * - (cycle_delta * ntp_err_mult), subtracting the applied +1
+ * mult dither over the same span;
+ * - the sub-ns fraction @systime dropped when the read was
+ * truncated to whole ns (low @shift bits, exact despite the
+ * multiply overflowing).
+ *
+ * RAW is undisciplined and AUX has its own discipline, so they
+ * carry no ntp_error.
+ */
+ if (clock_id == CLOCK_REALTIME || clock_id == CLOCK_MONOTONIC ||
+ clock_id == CLOCK_BOOTTIME) {
+ u32 nes = tk->ntp_error_shift;
+ u64 cycle_delta = (now - tk->tkr_mono.cycle_last) &
+ tk->tkr_mono.mask;
+ s64 err = tk->ntp_error +
+ (((s64)mul_u64_u64_shr(cycle_delta,
+ tk->ntp_err_frac, 32) -
+ (s64)(cycle_delta * tk->ntp_err_mult)) << nes);
+
+ err += (s64)((cycle_delta * tk->tkr_mono.mult +
+ tk->tkr_mono.xtime_nsec) &
+ ((1ULL << tk->tkr_mono.shift) - 1)) << nes;
+ systime_snapshot->ntp_error =
+ (err + (1LL << (NTP_SCALE_SHIFT - 1))) >>
+ NTP_SCALE_SHIFT;
+ } else {
+ systime_snapshot->ntp_error = 0;
+ }
} while (read_seqcount_retry(&tkd->seq, seq));
systime_snapshot->cycles = now;
@@ -2432,6 +2472,7 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
{
u64 ntp_tl = ntp_tick_length(tk->id);
s64 skew = ntp_get_skew_delta(tk->id);
+ u64 dividend;
u32 mult;
/*
@@ -2452,8 +2493,19 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
* scale it back up to the full per-tick rate for the mult bias.
*/
skew *= NTP_INTERVAL_FREQ;
- mult = div64_u64((tk->ntp_tick + skew) >> tk->ntp_error_shift,
- tk->cycle_interval);
+ dividend = (tk->ntp_tick + skew) >> tk->ntp_error_shift;
+ mult = div64_u64(dividend, tk->cycle_interval);
+ /*
+ * Stash the fractional part of the per-cycle ideal mult that
+ * the integer @mult discards, scaled by 2^32, in clock-shifted
+ * ns per cycle. The lockless snapshot readers use it to
+ * extrapolate @ntp_error forward over the cycles accumulated
+ * since the last tick (which on a NO_HZ kernel may be many
+ * ticks' worth).
+ */
+ tk->ntp_err_frac = div64_u64((dividend - (u64)mult *
+ tk->cycle_interval) << 32,
+ tk->cycle_interval);
}
/*
--
2.43.0
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply related
* [PATCH] tomoyo: Enforce connect policy in TCP Fast Open
From: Matthieu Buffet @ 2026-06-19 0:22 UTC (permalink / raw)
To: Kentaro Takeda, Tetsuo Handa
Cc: Bryam Vargas, Mickaël Salaün, Günther Noack,
linux-security-module, Mikhail Ivanov, Paul Moore, Yuchung Cheng,
Eric Dumazet, netdev, Matthieu Buffet
Tomoyo restricted TCP connections in 2011 in commit
059d84dbb389 ("TOMOYO: Add socket operation restriction support.")
using the socket_connect() LSM hook.
However, the MSG_FASTOPEN sendmsg() flag was added in 2012 to allow
combining connect() and the first sendmsg(). Tomoyo was not updated to
take this into account in its send hook.
This resulted in a TCP connect policy bypass similar to that reported in
Landlock in 2024 (see Link below), with the difference that Tomoyo was
fine when originally merged, and the problem got introduced when adding
fastopen support, possibly due to lack of synchronization between lsm
and netdev worlds.
Add MSG_FASTOPEN handling in Tomoyo's existing send hook.
Link: https://github.com/landlock-lsm/linux/issues/41
Link: https://lore.kernel.org/all/20260616201615.275032-1-hexlabsecurity@proton.me/
Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Cc: stable@kernel.org
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
---
security/tomoyo/network.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/security/tomoyo/network.c b/security/tomoyo/network.c
index cfc2a019de1e..7d9ba7268dc2 100644
--- a/security/tomoyo/network.c
+++ b/security/tomoyo/network.c
@@ -764,11 +764,25 @@ int tomoyo_socket_sendmsg_permission(struct socket *sock, struct msghdr *msg,
struct tomoyo_addr_info address;
const u8 family = tomoyo_sock_family(sock->sk);
const unsigned int type = sock->type;
+ int ret;
+ address.protocol = type;
+
+ if ((msg->msg_flags & MSG_FASTOPEN) != 0 && msg->msg_name != NULL &&
+ (sk_is_tcp(sock->sk) ||
+ (sk_is_inet(sock->sk) && type == SOCK_STREAM &&
+ sock->sk->sk_protocol == IPPROTO_MPTCP))) {
+ address.operation = TOMOYO_NETWORK_CONNECT;
+ ret = tomoyo_check_inet_address(
+ (struct sockaddr *)msg->msg_name, msg->msg_namelen,
+ sock->sk->sk_protocol, &address);
+ if (ret != 0)
+ return ret;
+ }
if (!msg->msg_name || !family ||
(type != SOCK_DGRAM && type != SOCK_RAW))
return 0;
- address.protocol = type;
+
address.operation = TOMOYO_NETWORK_SEND;
if (family == PF_UNIX)
return tomoyo_check_unix_address((struct sockaddr *)
--
2.47.3
^ permalink raw reply related
* Re: building ynl afaics requires updating the UAPI headers first
From: Jakub Kicinski @ 2026-06-19 0:06 UTC (permalink / raw)
To: Thorsten Leemhuis; +Cc: Donald Hunter, netdev, Riana Tauro
In-Reply-To: <ade91456-2f93-442c-b76c-28bd7157f074@leemhuis.info>
On Thu, 18 Jun 2026 15:39:46 +0200 Thorsten Leemhuis wrote:
> DRM_RAS_CMD_CLEAR_ERROR_COUNTER was introduced to mainline yesterday as
> ee18d39a087792 ("drm/drm_ras: Add clear-error-counter netlink command to
> drm_ras") [v7.1-post].
>
> I finally looked closer today and noticed how to prevent this: update
> the kernel's UAPI files (e.g. the stuff that lives in /usr/include/) on
> the builder. Thing is: that's basically impossible to do from a srpm, as
> those should not change the build environment and can't even when
> working as non-root.
>
> Note sure if relevant and just a shot in the dark, so maybe ignore the
> following:
>
> While investigating this I noticed this comment in
> tools/net/ynl/Makefile.deps:
>
> """
> > # Try to include uAPI headers from the kernel uapi/ path.
> > # Most code under tools/ requires the respective kernel uAPI headers
> > # to be copied to tools/include. The duplication is annoying.
> > # All the family headers should be self-contained. We avoid the copying
> > # by selectively including just the uAPI header of the family directly
> > # from the kernel sources.
> """
>
> Is that maybe not the case anymore with the recent changes to ynl?
Can't repro for some reason, but we probably need something like
commit 46e9b0224475abc to add the explicit include rule.
^ permalink raw reply
* Re: general protection fault in fou_nl_add_doit
From: Jakub Kicinski @ 2026-06-18 23:52 UTC (permalink / raw)
To: sanan.hasanou
Cc: davem, dsahern, edumazet, pabeni, horms, netdev, linux-kernel,
syzkaller, contact
In-Reply-To: <6a346fa4.26cc5c6d.1ace13.9d21@mx.google.com>
On Thu, 18 Jun 2026 15:22:28 -0700 (PDT) sanan.hasanou@gmail.com wrote:
> We found a bug using a modified version of syzkaller.
>
> Kernel Branch: 7.0-rc1
That's an old kernel. Did you re-run this on 7.1?
^ permalink raw reply
* Re: [PATCHv2 0/4] m68k: coldfire: fix non-standard readX()/writeX() functions
From: Greg Ungerer @ 2026-06-18 23:49 UTC (permalink / raw)
To: Paolo Abeni, linux-m68k
Cc: linux-kernel, arnd, wei.fang, frank.li, shenwei.wang, imx, netdev,
nico, adureghello, ulfh, linux-mmc, linux-can, linux-spi, olteanv
In-Reply-To: <fe40891c-3fd1-417c-835e-6f1046db7844@redhat.com>
Hi Paolo,
On 13/6/26 19:22, Paolo Abeni wrote:
> On 6/9/26 4:12 PM, Greg Ungerer wrote:
>> This odd collection of patches is aimed at fixing the non-standard ColdFire
>> set of readX()/writeX() IO access functions. Instead switching to using the
>> asm-generic definitions in include/asm-generic/io.h. The difficulty comes
>> in trying not to break any drivers with this change.
>>
>> The implementation of the readX()/writeX() family of IO access functions
>> is non-standard on ColdFire platforms. They either return big-endian (that
>> is native endian) data, or on platforms with PCI bus support check the
>> supplied address and return either big or little endian data based on that
>> check. This is non-standard, they are expected to always return
>> little-endian byte ordered data. Unfortunately this behavior also means
>> that ioreadX()/iowroteX() and their big-endian counter parts
>> ioreadXbe()/iowriteXbe() are currently broken because they are implemented
>> using the readX()/writeX() functions.
>>
>> Patches 1, 2 and 3 in this series are specific driver changes that can be
>> made independently of the final ColdFire readX()/writeX() change.
>>
>> Patch 4 is the actual switch to ColdFire building using asm-generic
>> readX()/writeX(), but also contains three driver fixes that are not easily
>> handled independently.
>>
>> Note that I don't have access to all supported hardware needed to fully
>> test all these changes. I have tested what I have, a bunch of the standard
>> Freescale ColdFire eval boards, and inspected generated code for differences.
>>
>> Note also that patch 3 relies on changes that are currently only in
>> linux-next, and are scheduled to hit mainline during the next v7.2
>> merge window. Those changes are also available in an immutable git tree
>> at git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu.git
>> cf-internal-io branch.
>
> I understand that with this series you are targeting the m68K tree, am I
> correct?
All the changes are targeted at fixing an m68k issue, yes.
> A possibly better option would be, after that the pre-req patches land
> into Linus's tree, to share an immutable branch for this series, so that
> both m68k and net-next could pull it.
I can certainly do that. All pre-requisite changes are now in Linus' tree.
My preference would be for subsystem maintainers to pick up their respective
changes (so patches 1, 2 and 3). I expect I will push patch 4 via the m68knommu
git tree, with appropriate sign offs from affected subsystems.
Regards
Greg
^ permalink raw reply
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Jakub Kicinski @ 2026-06-18 23:47 UTC (permalink / raw)
To: Breno Leitao
Cc: Peter Zijlstra, Petr Mladek, Sebastian Andrzej Siewior,
John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner,
netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <ajQFMS4ucT-mybhi@gmail.com>
On Thu, 18 Jun 2026 07:57:33 -0700 Breno Leitao wrote:
> Let me verify my understanding: if we switched to __raise_softirq_irqoff()
> in dev_kfree_skb_irq_reason(), the issue would be resolved since we'd
> avoid waking ksoftirqd and therefore wouldn't touch the runqueue lock in this
> code path.
That's the same as Vlad's patch. It risks leaving the softirq raised
but never invoked.
^ permalink raw reply
* Re: [PATCH v28 5/5] sfc: support pio mapping based on cxl
From: Dave Jiang @ 2026-06-18 23:06 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, djbw, edward.cree,
davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero
In-Reply-To: <20260618181806.118745-6-alejandro.lucero-palau@amd.com>
On 6/18/26 11:18 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> A PIO buffer is a region of device memory to which the driver can write a
> packet for TX, with the device handling the transmit doorbell without
> requiring a DMA for getting the packet data, which helps reducing latency
> in certain exchanges. With CXL mem protocol this latency can be lowered
> further.
>
> With a device supporting CXL and successfully initialised, use the cxl
> region to map the memory range and use this mapping for PIO buffers.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/net/ethernet/sfc/ef10.c | 41 ++++++++++++++++++++++-----
> drivers/net/ethernet/sfc/efx.h | 1 -
> drivers/net/ethernet/sfc/efx_cxl.c | 1 +
> drivers/net/ethernet/sfc/net_driver.h | 1 +
> drivers/net/ethernet/sfc/nic.h | 3 ++
> 5 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index 7e04f115bbaa..73bc064929f6 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
> @@ -24,6 +24,7 @@
> #include <linux/wait.h>
> #include <linux/workqueue.h>
> #include <net/udp_tunnel.h>
> +#include "efx_cxl.h"
>
> /* Hardware control for EF10 architecture including 'Huntington'. */
>
> @@ -106,7 +107,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
>
> static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
> {
> - MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V4_OUT_LEN);
> + MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V7_OUT_LEN);
> struct efx_ef10_nic_data *nic_data = efx->nic_data;
> size_t outlen;
> int rc;
> @@ -177,6 +178,12 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
> efx->num_mac_stats);
> }
>
> + if (outlen < MC_CMD_GET_CAPABILITIES_V7_OUT_LEN)
> + nic_data->datapath_caps3 = 0;
> + else
> + nic_data->datapath_caps3 = MCDI_DWORD(outbuf,
> + GET_CAPABILITIES_V7_OUT_FLAGS3);
> +
> return 0;
> }
>
> @@ -1140,6 +1147,9 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
> unsigned int channel_vis, pio_write_vi_base, max_vis;
> struct efx_ef10_nic_data *nic_data = efx->nic_data;
> unsigned int uc_mem_map_size, wc_mem_map_size;
> +#ifdef CONFIG_SFC_CXL
> + struct efx_probe_data *probe_data;
> +#endif
> void __iomem *membase;
> int rc;
>
> @@ -1263,8 +1273,23 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
> iounmap(efx->membase);
> efx->membase = membase;
>
> - /* Set up the WC mapping if needed */
> - if (wc_mem_map_size) {
> + if (!wc_mem_map_size)
> + goto skip_pio;
> +
> + /* Set up the WC mapping */
> +
> +#ifdef CONFIG_SFC_CXL
> + probe_data = container_of(efx, struct efx_probe_data, efx);
> + if ((nic_data->datapath_caps3 &
> + (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
> + probe_data->cxl_pio_initialised) {
> + /* Using PIO through CXL mapping */
> + nic_data->pio_write_base = probe_data->cxl->ctpio_cxl;
> + nic_data->pio_write_vi_base = pio_write_vi_base;
> + } else
> +#endif
> + {
> + /* Using legacy PIO BAR mapping */
> nic_data->wc_membase = ioremap_wc(efx->membase_phys +
> uc_mem_map_size,
> wc_mem_map_size);
> @@ -1279,12 +1304,14 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
> nic_data->wc_membase +
> (pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
> uc_mem_map_size);
> -
> - rc = efx_ef10_link_piobufs(efx);
> - if (rc)
> - efx_ef10_free_piobufs(efx);
> }
>
> + rc = efx_ef10_link_piobufs(efx);
> + if (rc)
> + efx_ef10_free_piobufs(efx);
> +
> +skip_pio:
> +
> netif_dbg(efx, probe, efx->net_dev,
> "memory BAR at %pa (virtual %p+%x UC, %p+%x WC)\n",
> &efx->membase_phys, efx->membase, uc_mem_map_size,
> diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
> index 45e191686625..057d30090894 100644
> --- a/drivers/net/ethernet/sfc/efx.h
> +++ b/drivers/net/ethernet/sfc/efx.h
> @@ -236,5 +236,4 @@ static inline bool efx_rwsem_assert_write_locked(struct rw_semaphore *sem)
>
> int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
> bool flush);
> -
> #endif /* EFX_EFX_H */
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 3e7c950f83e9..348d7404cd7a 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -88,6 +88,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> return -ENOMEM;
> }
>
> + probe_data->cxl_pio_initialised = true;
> probe_data->cxl = cxl;
>
> return 0;
> diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> index de3fc9537662..3964b2c56609 100644
> --- a/drivers/net/ethernet/sfc/net_driver.h
> +++ b/drivers/net/ethernet/sfc/net_driver.h
> @@ -1213,6 +1213,7 @@ struct efx_probe_data {
> struct efx_nic efx;
> #ifdef CONFIG_SFC_CXL
> struct efx_cxl *cxl;
> + bool cxl_pio_initialised;
> #endif
> };
>
> diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
> index ec3b2df43b68..7480f9995dfb 100644
> --- a/drivers/net/ethernet/sfc/nic.h
> +++ b/drivers/net/ethernet/sfc/nic.h
> @@ -152,6 +152,8 @@ enum {
> * %MC_CMD_GET_CAPABILITIES response)
> * @datapath_caps2: Further Capabilities of datapath firmware (FLAGS2 field of
> * %MC_CMD_GET_CAPABILITIES response)
> + * @datapath_caps3: Further Capabilities of datapath firmware (FLAGS3 field of
> + * %MC_CMD_GET_CAPABILITIES response)
> * @rx_dpcpu_fw_id: Firmware ID of the RxDPCPU
> * @tx_dpcpu_fw_id: Firmware ID of the TxDPCPU
> * @must_probe_vswitching: Flag: vswitching has yet to be setup after MC reboot
> @@ -187,6 +189,7 @@ struct efx_ef10_nic_data {
> bool must_check_datapath_caps;
> u32 datapath_caps;
> u32 datapath_caps2;
> + u32 datapath_caps3;
> unsigned int rx_dpcpu_fw_id;
> unsigned int tx_dpcpu_fw_id;
> bool must_probe_vswitching;
^ permalink raw reply
* Re: [PATCH v28 4/5] sfc: obtain and map cxl range using devm_cxl_probe_mem
From: Dave Jiang @ 2026-06-18 23:05 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, djbw, edward.cree,
davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero
In-Reply-To: <20260618181806.118745-5-alejandro.lucero-palau@amd.com>
On 6/18/26 11:18 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use core API for safely obtain the CXL range linked to an HDM committed
> by the BIOS. Map such a range for being used as the ctpio buffer.
>
> A potential user space action through sysfs unbinding or core cxl
> modules remove will trigger sfc driver device detachment, with that case
> not racing with this mapping as this is done during driver probe and
> therefore protected with device lock against those user space actions.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/net/ethernet/sfc/efx.c | 2 ++
> drivers/net/ethernet/sfc/efx_cxl.c | 23 +++++++++++++++++++++++
> drivers/net/ethernet/sfc/efx_cxl.h | 3 +++
> 3 files changed, 28 insertions(+)
>
> diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
> index da008462096d..abfa0ce2b4d1 100644
> --- a/drivers/net/ethernet/sfc/efx.c
> +++ b/drivers/net/ethernet/sfc/efx.c
> @@ -984,6 +984,7 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
> efx_fini_io(efx);
>
> probe_data = container_of(efx, struct efx_probe_data, efx);
> + efx_cxl_exit(probe_data);
>
> pci_dbg(efx->pci_dev, "shutdown successful\n");
>
> @@ -1244,6 +1245,7 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
> fail3:
> efx_fini_io(efx);
> fail2:
> + efx_cxl_exit(probe_data);
> efx_fini_struct(efx);
> fail1:
> WARN_ON(rc > 0);
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 18b535b3ea40..3e7c950f83e9 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -18,6 +18,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> {
> struct efx_nic *efx = &probe_data->efx;
> struct pci_dev *pci_dev = efx->pci_dev;
> + struct range cxl_pio_range;
> struct efx_cxl *cxl;
> u16 dvsec;
> int rc;
> @@ -73,9 +74,31 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> return -ENODEV;
> }
>
> + cxl->cxlmd = devm_cxl_probe_mem(&cxl->cxlds, &cxl_pio_range);
> + if (IS_ERR(cxl->cxlmd)) {
> + pci_err(pci_dev, "CXL accel memdev creation failed\n");
> + return PTR_ERR(cxl->cxlmd);
> + }
> +
> + cxl->ctpio_cxl = ioremap_wc(cxl_pio_range.start,
> + range_len(&cxl_pio_range));
> + if (!cxl->ctpio_cxl) {
> + pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
> + &cxl_pio_range);
> + return -ENOMEM;
> + }
> +
> probe_data->cxl = cxl;
>
> return 0;
> }
>
> +void efx_cxl_exit(struct efx_probe_data *probe_data)
> +{
> + if (!probe_data->cxl)
> + return;
> +
> + iounmap(probe_data->cxl->ctpio_cxl);
> +}
> +
> MODULE_IMPORT_NS("CXL");
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
> index 04e46278464d..3e2705cb063f 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.h
> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
> @@ -20,10 +20,13 @@ struct efx_probe_data;
> struct efx_cxl {
> struct cxl_dev_state cxlds;
> struct cxl_memdev *cxlmd;
> + void __iomem *ctpio_cxl;
> };
>
> int efx_cxl_init(struct efx_probe_data *probe_data);
> +void efx_cxl_exit(struct efx_probe_data *probe_data);
> #else
> static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
> +static inline void efx_cxl_exit(struct efx_probe_data *probe_data) {}
> #endif
> #endif
^ permalink raw reply
* Re: [PATCH net] net: dsa: realtek: fix memory leak in rtl8366rb_setup_led()
From: Linus Walleij @ 2026-06-18 22:58 UTC (permalink / raw)
To: David Yang
Cc: netdev, Alvin Šipraga, Andrew Lunn, Vladimir Oltean,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Luiz Angelo Daros de Luca, linux-kernel
In-Reply-To: <20260618140200.1888707-1-mmyangfl@gmail.com>
On Thu, Jun 18, 2026 at 4:02 PM David Yang <mmyangfl@gmail.com> wrote:
> led_classdev_register_ext() only reads init_data.devicename - it never
> stores the pointer. However, the caller allocated devicename with
> kasprintf() but never freed it, leaking the string memory.
>
> Fix it with a stack buffer to avoid dynamic buffers completely.
>
> Fixes: 32d617005475 ("net: dsa: realtek: add LED drivers for rtl8366rb")
> Signed-off-by: David Yang <mmyangfl@gmail.com>
Good catch!
Reviewed-by: Linus Walleij <linusw@kernel.org>
Yours,
Linus Walleij
^ permalink raw reply
* general protection fault in fou_nl_add_doit
From: sanan.hasanou @ 2026-06-18 22:22 UTC (permalink / raw)
To: davem, dsahern, edumazet, kuba, pabeni, horms, netdev,
linux-kernel
Cc: syzkaller, contact
Good day, dear maintainers,
We found a bug using a modified version of syzkaller.
Kernel Branch: 7.0-rc1
Kernel Config: <https://drive.google.com/open?id=1RsqMUgdFUMq9-iREK8DZCvDfjq0RWm5X>
Reproducer: <https://drive.google.com/open?id=1KNnULJOSBve4YaFQT2Z-pK2Ms8LwRN7P>
Thank you!
Best regards,
Sanan Hasanov
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 0 UID: 0 PID: 326872 Comm: syz.6.71753 Tainted: G L 7.0.0-rc1 #1 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:fou_create net/ipv4/fou_core.c:590 [inline]
RIP: 0010:fou_nl_add_doit+0x236/0xaa0 net/ipv4/fou_core.c:764
Code: 48 89 da 4d 89 2c 24 48 85 d2 0f 84 eb 07 00 00 4c 8b 6c 24 60 49 8d 5d 18 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 48 89 54 24 10 74 0d 48 89 df e8 47 c2 90 f8 48 8b 54
RSP: 0018:ffffc9002b54f260 EFLAGS: 00010216
RAX: 0000000000000003 RBX: 0000000000000018 RCX: dffffc0000000000
RDX: ffff88801aeb8d00 RSI: 00000000000002d1 RDI: 00000000ffffffff
RBP: ffffc9002b54f3d0 R08: ffffffff8ef0f1bf R09: 1ffffffff1de1e37
R10: dffffc0000000000 R11: fffffbfff1de1e38 R12: ffff88801d9a3a38
R13: 0000000000000000 R14: ffff88802440dd00 R15: ffffc9002b54f440
FS: 00007f863f9266c0(0000) GS:ffff88809d305000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c40b6a1 CR3: 0000000026d9b000 CR4: 00000000000006f0
Call Trace:
<TASK>
genl_family_rcv_msg_doit+0x20d/0x2f0 net/netlink/genetlink.c:1114
genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
genl_rcv_msg+0x60c/0x790 net/netlink/genetlink.c:1209
netlink_rcv_skb+0x206/0x460 net/netlink/af_netlink.c:2550
genl_rcv+0x31/0x40 net/netlink/genetlink.c:1218
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0xa42/0xc00 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x7ed/0xb00 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0x4dd/0x8e0 net/socket.c:2592
___sys_sendmsg+0x1ee/0x260 net/socket.c:2646
__sys_sendmsg net/socket.c:2678 [inline]
__do_sys_sendmsg net/socket.c:2683 [inline]
__se_sys_sendmsg net/socket.c:2681 [inline]
__x64_sys_sendmsg+0x189/0x240 net/socket.c:2681
x64_sys_call+0x17a2/0x2900 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x110/0x8a0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f86416d3b6d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f863f926018 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f8641945fa0 RCX: 00007f86416d3b6d
RDX: 0000000000000000 RSI: 0000200000000280 RDI: 0000000000000003
RBP: 00007f8641777c3e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f8641946038 R14: 00007f8641945fa0 R15: 00007ffce4a75000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:fou_create net/ipv4/fou_core.c:590 [inline]
RIP: 0010:fou_nl_add_doit+0x236/0xaa0 net/ipv4/fou_core.c:764
Code: 48 89 da 4d 89 2c 24 48 85 d2 0f 84 eb 07 00 00 4c 8b 6c 24 60 49 8d 5d 18 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 48 89 54 24 10 74 0d 48 89 df e8 47 c2 90 f8 48 8b 54
RSP: 0018:ffffc9002b54f260 EFLAGS: 00010216
RAX: 0000000000000003 RBX: 0000000000000018 RCX: dffffc0000000000
RDX: ffff88801aeb8d00 RSI: 00000000000002d1 RDI: 00000000ffffffff
RBP: ffffc9002b54f3d0 R08: ffffffff8ef0f1bf R09: 1ffffffff1de1e37
R10: dffffc0000000000 R11: fffffbfff1de1e38 R12: ffff88801d9a3a38
R13: 0000000000000000 R14: ffff88802440dd00 R15: ffffc9002b54f440
FS: 00007f863f9266c0(0000) GS:ffff88809d305000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc1ad9e3080 CR3: 0000000026d9b000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
0: 48 89 da mov %rbx,%rdx
3: 4d 89 2c 24 mov %r13,(%r12)
7: 48 85 d2 test %rdx,%rdx
a: 0f 84 eb 07 00 00 je 0x7fb
10: 4c 8b 6c 24 60 mov 0x60(%rsp),%r13
15: 49 8d 5d 18 lea 0x18(%r13),%rbx
19: 48 89 d8 mov %rbx,%rax
1c: 48 c1 e8 03 shr $0x3,%rax
20: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
27: fc ff df
* 2a: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1) <-- trapping instruction
2e: 48 89 54 24 10 mov %rdx,0x10(%rsp)
33: 74 0d je 0x42
35: 48 89 df mov %rbx,%rdi
38: e8 47 c2 90 f8 call 0xf890c284
3d: 48 rex.W
3e: 8b .byte 0x8b
3f: 54 push %rsp
<<<<<<<<<<<<<<< tail report >>>>>>>>>>>>>>>
^ permalink raw reply
* Re: [PATCH v3 3/3] net/smc: bound the send length to the send buffer in smc_tx_sendmsg()
From: Bryam Vargas @ 2026-06-18 22:11 UTC (permalink / raw)
To: Dust Li
Cc: Wenjia Zhang, D . Wythe, Sidraya Jayagond, Eric Dumazet,
David S . Miller, Mahanta Jambigi, Wen Gu, Simon Horman,
Ursula Braun, Stefan Raspl, Tony Lu, Paolo Abeni, Jakub Kicinski,
netdev, linux-s390, linux-rdma, linux-kernel
In-Reply-To: <ajQX7_9xFI9GSaq5@linux.alibaba.com>
On Fri, 19 Jun 2026 00:08:15 +0800, Dust Li wrote:
> I think this is the same as patch #2.
Same story as 2/3, just on the SMC-D send side: sndbuf_space accumulates
diff_tx = smc_curs_diff(sndbuf_desc->len, tx_curs_fin, cons) from the peer's consumer
cursor, so a cons alternating wrap 0<->1 walks it past sndbuf_desc->len (and negative
over time), and smc_tx_sendmsg's wrap-around write then runs off the end of the
buffer. The boundary count check doesn't bound diff_tx here either, so I'd keep the
same two-line bound. The same A/B covers it.
Bryam
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox