* [GIT PULL] Networking for v6.16-rc6 (follow up)
@ 2025-07-11 15:10 Jakub Kicinski
2025-07-11 17:44 ` pr-tracker-bot
2025-07-11 18:33 ` Linus Torvalds
0 siblings, 2 replies; 15+ messages in thread
From: Jakub Kicinski @ 2025-07-11 15:10 UTC (permalink / raw)
To: torvalds; +Cc: kuba, davem, netdev, linux-kernel, pabeni
Hi Linus!
The following changes since commit bc9ff192a6c940d9a26e21a0a82f2667067aaf5f:
Merge tag 'net-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2025-07-10 09:18:53 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git tags/net-6.16-rc6-2
for you to fetch changes up to a215b5723922f8099078478122f02100e489cb80:
netlink: make sure we allow at least one dump skb (2025-07-11 07:31:47 -0700)
----------------------------------------------------------------
Big chunk of fixes for WiFi, Johannes says probably the last
for the release. The Netlink fixes (on top of the tree) restore
operation of iw (WiFi CLI) which uses sillily small recv buffer,
and is the reason for this "emergency PR". The GRE multicast
fix also stands out among the user-visible regressions.
Current release - fix to a fix:
- netlink: make sure we always allow at least one skb to be queued,
even if the recvbuf is (mis)configured to be tiny
Previous releases - regressions:
- gre: fix IPv6 multicast route creation
Previous releases - always broken:
- wifi: prevent A-MSDU attacks in mesh networks
- wifi: cfg80211: fix S1G beacon head validation and detection
- wifi: mac80211:
- always clear frame buffer to prevent stack leak in cases which
hit a WARN()
- fix monitor interface in device restart
- wifi: mwifiex: discard erroneous disassoc frames on STA interface
- wifi: mt76:
- prevent null-deref in mt7925_sta_set_decap_offload()
- add missing RCU annotations, and fix sleep in atomic
- fix decapsulation offload
- fixes for scanning
- phy: microchip: improve link establishment and reset handling
- eth: mlx5e: fix race between DIM disable and net_dim()
- bnxt_en: correct DMA unmap len for XDP_REDIRECT
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
----------------------------------------------------------------
Alok Tiwari (1):
net: ll_temac: Fix missing tx_pending check in ethtools_set_ringparam()
Carolina Jubran (2):
net/mlx5: Reset bw_share field when changing a node's parent
net/mlx5e: Fix race between DIM disable and net_dim()
Daniil Dulov (1):
wifi: zd1211rw: Fix potential NULL pointer dereference in zd_mac_tx_to_dev()
Deren Wu (2):
wifi: mt76: mt7925: prevent NULL pointer dereference in mt7925_sta_set_decap_offload()
wifi: mt76: mt7921: prevent decap offload config before STA initialization
Eric Dumazet (1):
netfilter: flowtable: account for Ethernet header in nf_flow_pppoe_proto()
Felix Fietkau (3):
wifi: rt2x00: fix remove callback type mismatch
wifi: mt76: add a wrapper for wcid access with validation
wifi: mt76: fix queue assignment for deauth packets
Guillaume Nault (2):
gre: Fix IPv6 multicast route creation.
selftests: Add IPv6 multicast route generation tests for GRE devices.
Hangbin Liu (1):
selftests: net: lib: fix shift count out of range
Henry Martin (1):
wifi: mt76: mt7925: Fix null-ptr-deref in mt7925_thermal_init()
Jakub Kicinski (7):
Merge tag 'wireless-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Merge branch 'net-phy-microchip-lan88xx-reliability-fixes'
Merge branch 'gre-fix-default-ipv6-multicast-route-creation'
Merge tag 'linux-can-fixes-for-6.16-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Merge branch 'mlx5-misc-fixes-2025-07-10'
Merge branch 'bnxt_en-3-bug-fixes'
netlink: make sure we allow at least one dump skb
Jianbo Liu (1):
net/mlx5e: Add new prio for promiscuous mode
Johannes Berg (3):
wifi: mac80211: clear frame buffer to never leak stack
wifi: mac80211: fix non-transmitted BSSID profile search
Merge tag 'mt76-fixes-2025-07-07' of https://github.com/nbd168/wireless
Kito Xu (1):
net: appletalk: Fix device refcount leak in atrtr_create()
Kuniyuki Iwashima (1):
netlink: Fix rmem check in netlink_broadcast_deliver().
Lachlan Hodges (2):
wifi: cfg80211: fix S1G beacon head validation in nl80211
wifi: mac80211: correctly identify S1G short beacon
Leon Yen (1):
wifi: mt76: mt792x: Limit the concurrent STA and SoftAP to operate on the same channel
Lorenzo Bianconi (5):
wifi: mt76: Assume __mt76_connac_mcu_alloc_sta_req runs in atomic context
wifi: mt76: Move RCU section in mt7996_mcu_set_fixed_field()
wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl_fixed()
wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()
wifi: mt76: Remove RCU section in mt7996_mac_sta_rc_work()
Mathy Vanhoef (1):
wifi: prevent A-MSDU attacks in mesh networks
Michael Lo (1):
wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan
Ming Yen Hsieh (2):
wifi: mt76: mt7925: fix the wrong config for tx interrupt
wifi: mt76: mt7925: fix incorrect scan probe IE handling for hw_scan
Mingming Cao (1):
ibmvnic: Fix hardcoded NUM_RX_STATS/NUM_TX_STATS with dynamic sizeof
Miri Korenblit (2):
wifi: mac80211: always initialize sdata::key_list
wifi: mac80211: add the virtual monitor after reconfig complete
Moon Hee Lee (1):
wifi: mac80211: reject VHT opmode for unsupported channel widths
Oleksij Rempel (2):
net: phy: microchip: Use genphy_soft_reset() to purge stale LPA bits
net: phy: microchip: limit 100M workaround to link-down events on LAN88xx
Pagadala Yesu Anjaneyulu (1):
wifi: mac80211: Fix uninitialized variable with __free() in ieee80211_ml_epcs()
Sean Nyekjaer (1):
can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level
Shravya KN (1):
bnxt_en: Fix DCB ETS validation
Shruti Parab (1):
bnxt_en: Flush FW trace before copying to the coredump
Somnath Kotur (1):
bnxt_en: Set DMA unmap len correctly for XDP_REDIRECT
Vitor Soares (1):
wifi: mwifiex: discard erroneous disassoc frames on STA interface
drivers/net/can/m_can/m_can.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_coredump.c | 18 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 2 +
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 +-
drivers/net/ethernet/ibm/ibmvnic.h | 8 +-
drivers/net/ethernet/mellanox/mlx5/core/en/fs.h | 9 +-
drivers/net/ethernet/mellanox/mlx5/core/en_dim.c | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 13 +-
drivers/net/ethernet/xilinx/ll_temac_main.c | 2 +-
drivers/net/phy/microchip.c | 3 +-
drivers/net/wireless/marvell/mwifiex/util.c | 4 +-
drivers/net/wireless/mediatek/mt76/mt76.h | 10 ++
drivers/net/wireless/mediatek/mt76/mt7603/dma.c | 2 +-
drivers/net/wireless/mediatek/mt76/mt7603/mac.c | 10 +-
drivers/net/wireless/mediatek/mt76/mt7615/mac.c | 7 +-
.../net/wireless/mediatek/mt76/mt76_connac_mac.c | 2 +-
.../net/wireless/mediatek/mt76/mt76_connac_mcu.c | 6 +-
drivers/net/wireless/mediatek/mt76/mt76x02.h | 5 +-
drivers/net/wireless/mediatek/mt76/mt76x02_mac.c | 4 +-
drivers/net/wireless/mediatek/mt76/mt7915/mac.c | 12 +-
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c | 2 +-
drivers/net/wireless/mediatek/mt76/mt7915/mmio.c | 5 +-
drivers/net/wireless/mediatek/mt76/mt7921/mac.c | 6 +-
drivers/net/wireless/mediatek/mt76/mt7921/main.c | 3 +
drivers/net/wireless/mediatek/mt76/mt7925/init.c | 2 +
drivers/net/wireless/mediatek/mt76/mt7925/mac.c | 6 +-
drivers/net/wireless/mediatek/mt76/mt7925/main.c | 8 +-
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c | 79 ++++++--
drivers/net/wireless/mediatek/mt76/mt7925/mcu.h | 5 +-
drivers/net/wireless/mediatek/mt76/mt7925/regs.h | 2 +-
drivers/net/wireless/mediatek/mt76/mt792x_core.c | 32 +++-
drivers/net/wireless/mediatek/mt76/mt792x_mac.c | 5 +-
drivers/net/wireless/mediatek/mt76/mt7996/mac.c | 52 ++----
drivers/net/wireless/mediatek/mt76/mt7996/main.c | 5 +-
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c | 199 +++++++++++++++------
drivers/net/wireless/mediatek/mt76/mt7996/mt7996.h | 16 +-
drivers/net/wireless/mediatek/mt76/tx.c | 11 +-
drivers/net/wireless/mediatek/mt76/util.c | 2 +-
drivers/net/wireless/ralink/rt2x00/rt2x00soc.c | 4 +-
drivers/net/wireless/ralink/rt2x00/rt2x00soc.h | 2 +-
drivers/net/wireless/zydas/zd1211rw/zd_mac.c | 6 +-
include/linux/ieee80211.h | 45 +++--
include/net/netfilter/nf_flow_table.h | 2 +-
net/appletalk/ddp.c | 1 +
net/ipv6/addrconf.c | 9 +-
net/mac80211/cfg.c | 14 ++
net/mac80211/iface.c | 4 +-
net/mac80211/mlme.c | 12 +-
net/mac80211/parse.c | 6 +-
net/mac80211/util.c | 9 +-
net/netlink/af_netlink.c | 7 +-
net/wireless/nl80211.c | 7 +-
net/wireless/util.c | 52 +++++-
tools/testing/selftests/net/gre_ipv6_lladdr.sh | 27 +--
tools/testing/selftests/net/lib.sh | 2 +-
57 files changed, 500 insertions(+), 277 deletions(-)
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 15:10 [GIT PULL] Networking for v6.16-rc6 (follow up) Jakub Kicinski
@ 2025-07-11 17:44 ` pr-tracker-bot
2025-07-11 18:33 ` Linus Torvalds
1 sibling, 0 replies; 15+ messages in thread
From: pr-tracker-bot @ 2025-07-11 17:44 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: torvalds, kuba, davem, netdev, linux-kernel, pabeni
The pull request you sent on Fri, 11 Jul 2025 08:10:02 -0700:
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git tags/net-6.16-rc6-2
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c7979c3917fa1326dae3607e1c6a04c12057b194
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 15:10 [GIT PULL] Networking for v6.16-rc6 (follow up) Jakub Kicinski
2025-07-11 17:44 ` pr-tracker-bot
@ 2025-07-11 18:33 ` Linus Torvalds
2025-07-11 18:46 ` Jakub Kicinski
1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 18:33 UTC (permalink / raw)
To: Jakub Kicinski, Thomas Zimmermann, Simona Vetter, Dave Airlie
Cc: davem, netdev, linux-kernel, pabeni, dri-devel
[ Added in some drm people too, just to give a heads-up that it isn't
all their fault ]
On Fri, 11 Jul 2025 at 08:10, Jakub Kicinski <kuba@kernel.org> wrote:
>
> The Netlink fixes (on top of the tree) restore
> operation of iw (WiFi CLI) which uses sillily small recv buffer,
> and is the reason for this "emergency PR".
So this was "useful" in the sense that it seems to have taken my
"random long delays at initial graphical login" and made them
"reliable hangs at early boot time" instead.
I originally blamed the drm tree, because there were some other issues
in there with reference counting - and because the hang happened at
that "start graphical environment", but now it really looks like two
independent issues, where the netlink issues cause the delay, and the
drm object refcounting issues were entirely separate and coincidental.
I suspect that there is bootup code that needs more than that "just
one skb", and that all the recent issues with netlink sk_rmem_alloc
are broken and need reverting.
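To make the "at least one skb" rule concrete, here is a toy model of a receive queue with a byte budget. It is only a sketch of the behaviour described in the pull request text (always admit one buffer, even when the receive buffer is tiny), not the kernel's code: the class name, field names and the exact comparison are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReceiveQueue:
    """Toy model of a socket receive queue with a byte budget (not kernel code)."""

    rcvbuf: int                      # configured receive buffer limit, in bytes
    rmem_alloc: int = 0              # bytes currently charged to the queue
    queued: list = field(default_factory=list)

    def try_queue(self, size: int) -> bool:
        """Admit a buffer if the queue is empty or the budget is not exceeded."""
        fits = self.rmem_alloc + size <= self.rcvbuf
        # Assumed rule: an empty queue always accepts one buffer, so a tiny
        # rcvbuf can still make progress one message at a time.
        if not self.queued or fits:
            self.queued.append(size)
            self.rmem_alloc += size
            return True
        return False                 # dropped until the reader drains the queue

    def read_one(self) -> int:
        """Remove the oldest buffer and give its bytes back to the budget."""
        size = self.queued.pop(0)
        self.rmem_alloc -= size
        return size
```

In this model a reader that needs several messages in flight at once would still see drops with a tiny `rcvbuf`, which is the kind of failure being suspected here.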
Because this "emergency PR" does seem to have turned my "annoying
problem with timeouts at initial login" into "now it doesn't boot at
all".
Which is good in that the random timeouts and delays were looking like
a nightmare to bisect, and now it looks like at least the cause of
them is more clear.
But it's certainly not good in the sense of "we're at almost rc6, we
shouldn't be having these kinds of issues".
The machine I see this on doesn't actually use WiFi at all, but there
*is* a WiFi chip in it, I just turn off that interface in favor of the
wired ports.
But obviously there might also be various other netlink users that are
unhappy with the accounting changes, so the WiFi angle may be a red
herring.
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 18:33 ` Linus Torvalds
@ 2025-07-11 18:46 ` Jakub Kicinski
2025-07-11 18:54 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2025-07-11 18:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 11:33:10 -0700 Linus Torvalds wrote:
> Because this "emergency PR" does seem to have turned my "annoying
> problem with timeouts at initial login" into "now it doesn't boot at
> all".
Hm. I'm definitely okay with reverting. So if you revert these three:
a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().")
a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().")
ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
everything is just fine?
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 18:46 ` Jakub Kicinski
@ 2025-07-11 18:54 ` Linus Torvalds
2025-07-11 19:18 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 18:54 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 at 11:46, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Hm. I'm definitely okay with reverting. So if you revert these three:
>
> a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().")
> a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().")
> ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
>
> everything is just fine?
I'm assuming you mean
a215b5723922 netlink: make sure we allow at least one dump skb
a3c4a125ec72 netlink: Fix rmem check in netlink_broadcast_deliver().
ae8f160e7eb2 netlink: Fix wraparounds of sk->sk_rmem_alloc.
Will do more testing.
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 18:54 ` Linus Torvalds
@ 2025-07-11 19:18 ` Linus Torvalds
2025-07-11 19:30 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 19:18 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 at 11:54, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Will do more testing.
Bah. What I thought was a "reliable hang" isn't actually that at all.
It ends up still being very random indeed.
That said, I do think it's related to this netlink issue, because the
symptoms end up being random delays.
I've seen it at boot before even logging in (I saw that twice in a row
after the latest networking pull, which is why I thought it was
reliable).
But the much more common situation is that some random gnome app ends
up hanging and then timing out.
Sometimes it's gnome-shell itself, so when I log in nothing happens,
and then after a 30s timeout gnome-shell times out and I get back the
login window.
That was what I *thought* was the common failure case, but it turns
out that I've now several times seen just random other applications
having that issue. This boot, for example, things "worked", except
starting gnome-terminal took a long time, and then I get a random
crash report for gsd-screensaver-proxy.
The backtrace for that was
g_bus_get_sync ->
initable_init ->
g_data_input_stream_read_line ->
g_buffered_input_stream_fill ->
g_buffered_input_stream_real_fill ->
g_input_stream_read ->
g_socket_receive_with_timeout ->
g_socket_condition_timed_wait ->
poll ->
__syscall_cancel
and I suspect these are all symptoms of the same thing.
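The bottom of that backtrace is a timed wait in poll() on a socket. A minimal sketch of the same symptom, assuming a Unix domain stream socket pair whose peer simply never answers (the helper names and the 200 ms timeout are made up for illustration, and `select.poll` requires a Unix-like system):

```python
import select
import socket
import time


def wait_for_reply(sock: socket.socket, timeout_ms: int) -> bool:
    """Return True if the socket becomes readable before the timeout expires."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    return bool(poller.poll(timeout_ms))


def demo_silent_peer(timeout_ms: int = 200) -> tuple[bool, float]:
    """Send a request to a peer that never replies and time the wait."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.sendall(b"request\n")
        start = time.monotonic()
        got_reply = wait_for_reply(client, timeout_ms)   # peer stays silent
        return got_reply, time.monotonic() - start
    finally:
        client.close()
        server.close()
```

From the caller's side, a peer that is slow, stuck, or never woken up all look the same: the wait runs to its timeout and the application gives up.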
My *guess* is that all of these things use a netlink socket, and
presumably it's the *other* end of the socket that has filled up its
receive queue and is dropping packets as a result, and never
answering, so then - entirely randomly - depending on how overworked
things got, and which requests got dropped, some poor gnome process
never gets a reply and times out and the thing fails.
And sometimes the things that fail are not very critical (like some
gsd-screensaver-proxy) and I can log in happily. And sometimes they
are rather more critical and nothing works.
Anyway, because it's so damn random, it's neither bisectable nor easy
to know when something is "fixed".
I spent several hours yesterday chasing all the wrong things (because
I thought it was in drm), and often thought "Oh, that fixed it". Only
to then realize that nope, the problem still happens.
I will test the reverts. Several times.
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 19:18 ` Linus Torvalds
@ 2025-07-11 19:30 ` Linus Torvalds
2025-07-11 19:34 ` Jakub Kicinski
2025-07-11 19:42 ` Linus Torvalds
0 siblings, 2 replies; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 19:30 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 at 12:18, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I spent several hours yesterday chasing all the wrong things (because
> I thought it was in drm), and often thought "Oh, that fixed it". Only
> to then realize that nope, the problem still happens.
>
> I will test the reverts. Several times.
Well, the first boot with those three commits reverted shows no problem at all.
But as mentioned, I've now had "Oh, that fixed it" about ten times.
So that "Oh, it worked this time" has been tainted by past experience.
Will do several more boots now in the hope that it's gone for good.
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 19:30 ` Linus Torvalds
@ 2025-07-11 19:34 ` Jakub Kicinski
2025-07-11 19:42 ` Linus Torvalds
1 sibling, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2025-07-11 19:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 12:30:28 -0700 Linus Torvalds wrote:
> > I spent several hours yesterday chasing all the wrong things (because
> > I thought it was in drm), and often thought "Oh, that fixed it". Only
> > to then realize that nope, the problem still happens.
> >
> > I will test the reverts. Several times.
>
> Well, the first boot with those three commits reverted shows no problem at all.
>
> But as mentioned, I've now had "Oh, that fixed it" about ten times.
>
> So that "Oh, it worked this time" has been tainted by past experience.
> Will do several more boots now in the hope that it's gone for good.
Fingers crossed. FWIW /proc/net/netlink should show the socket
drop counters. But my laptop running 6.15 has a number of
GNOME apps which never read their sockets so it's not going to
be as immediately obvious whether we regressed or it's a bad app
as I hoped.
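A minimal sketch of how the drop counters could be pulled out of /proc/net/netlink. It assumes the file is a whitespace-separated table whose first line is a header and that the counter column is named "Drops"; the function names are made up for illustration.

```python
def parse_netlink_table(text: str) -> list[dict[str, str]]:
    """Parse a whitespace-separated table whose first line is the header."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    # Skip rows that do not match the header width instead of guessing.
    return [dict(zip(header, row)) for row in rows if len(row) == len(header)]


def sockets_with_drops(text: str, column: str = "Drops") -> list[dict[str, str]]:
    """Return the rows whose drop counter (assumed decimal) is non-zero."""
    return [
        row
        for row in parse_netlink_table(text)
        if int(row.get(column, "0")) > 0
    ]
```

On a live system the input would be the contents of /proc/net/netlink, for example `sockets_with_drops(open("/proc/net/netlink").read())`. As noted above, a non-zero counter alone does not distinguish a kernel regression from an application that never reads its socket.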
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 19:30 ` Linus Torvalds
2025-07-11 19:34 ` Jakub Kicinski
@ 2025-07-11 19:42 ` Linus Torvalds
2025-07-11 19:53 ` Jakub Kicinski
1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 19:42 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 at 12:30, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So that "Oh, it worked this time" has been tainted by past experience.
> Will do several more boots now in the hope that it's gone for good.
Yeah, no.
There's still something wrong. The second boot looked fine, but then
starting chrome had a 15s delay, and when that cleared I got a
notification that 'gnome-settings-daemon' had crashed.
And the backtrace is basically identical to the one I saw with
gsd-screensaver-proxy.
So it's some socket that times out, but reverting these three
a215b5723922 netlink: make sure we allow at least one dump skb
a3c4a125ec72 netlink: Fix rmem check in netlink_broadcast_deliver().
ae8f160e7eb2 netlink: Fix wraparounds of sk->sk_rmem_alloc.
did *not* fix it.
Were there any other socket changes perhaps?
I just looked, and gsd-screensaver-proxy seems to use a regular Unix
domain stream socket. Maybe not related to netlink, did unix domain
sockets end up with some similar changes?
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 19:42 ` Linus Torvalds
@ 2025-07-11 19:53 ` Jakub Kicinski
2025-07-11 20:07 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2025-07-11 19:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 12:42:54 -0700 Linus Torvalds wrote:
> Were there any other socket changes perhaps?
>
> I just looked, and gsd-screensaver-proxy seems to use a regular Unix
> domain stream socket. Maybe not related to netlink, did unix domain
> sockets end up with some similar changes?
Humpf. Not that I can see, here's a list of commits since rc5 we sent
minus all the driver and wifi and data center stuff:
a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().")
a215b5723922 ("netlink: make sure we allow at least one dump skb")
ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
ef9675b0ef03 ("Bluetooth: hci_sync: Fix not disabling advertising instance")
59710a26a289 ("Bluetooth: hci_core: Remove check of BDADDR_ANY in hci_conn_hash_lookup_big_state")
314d30b15086 ("Bluetooth: hci_sync: Fix attempting to send HCI_Disconnect to BIS handle")
c7349772c268 ("Bluetooth: hci_event: Fix not marking Broadcast Sink BIS as connected")
d3a5f2871adc ("tcp: Correct signedness in skb remaining space calculation")
1a03edeb84e6 ("tcp: refine sk_rcvbuf increase for ooo packets")
ffdde7bf5a43 ("net/sched: Abort __tc_modify_qdisc if parent class does not exist")
Let me keep digging but other than the netlink stuff the rest doesn't
stand out..
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 19:53 ` Jakub Kicinski
@ 2025-07-11 20:07 ` Linus Torvalds
2025-07-11 20:35 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 20:07 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 at 12:53, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Let me keep digging but other than the netlink stuff the rest doesn't
> stand out..
Oh well. I think I'll just have to go back to bisecting this thing.
I've tried to do that several times, and it has failed due to being
too flaky, but I think I've learnt the signs to look out for better
too.
For example, the first few times I was just looking for "not able to
log in", because I hadn't caught on to the fact that sometimes the
failures simply didn't hit something very important.
This clearly is timing-sensitive, and it's presumably hardware-dependent too.
And it could easily be that some bootup process gets stuck on
something entirely unrelated. Some random driver change - sound, pin
control, whatever - might then just end up having odd interactions.
I don't see any issues on my laptop. And considering how random the
behavior problems are, it could have been going on for a while without
me ever realizing it (plus I was running a distro kernel for at least
a few days without even noticing that I wasn't running my own build
any more).
I was hoping it was some known problem, because I'm not sure how
successful a bisect will be.
I guess I had nothing better to do this weekend anyway....
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 20:07 ` Linus Torvalds
@ 2025-07-11 20:35 ` Linus Torvalds
2025-07-11 21:46 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 20:35 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 at 13:07, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Oh well. I think I'll just have to go back to bisecting this thing.
> I've tried to do that several times, and it has failed due to being
> too flaky, but I think I've learnt the signs to look out for better
> too.
Indeed. It turns out that the problem actually started somewhere
between rc4 and rc5, and all my previous bisections never even came
close, because kernels usually work well enough that I never realized
that it went back that far.
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 20:35 ` Linus Torvalds
@ 2025-07-11 21:46 ` Linus Torvalds
2025-07-11 22:19 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 21:46 UTC (permalink / raw)
To: Jakub Kicinski, Frederic Weisbecker, Valentin Schneider, Nam Cao,
Christian Brauner
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 at 13:35, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Indeed. It turns out that the problem actually started somewhere
> between rc4 and rc5, and all my previous bisections never even came
> close, because kernels usually work well enough that I never realized
> that it went back that far.
It looks like it's actually due to commit 8c44dac8add7 ("eventpoll:
Fix priority inversion problem"), and it's been going on for a while
now and the behavior was just too subtle for me to have noticed.
Does not look hardware-specific, except in the sense that it probably
needs several CPUs along with the odd startup pattern to trigger
this.
It's possible that the bisection ended up wrong, and when it appeared
to start going off in the weeds I was like "this is broken again", but
before I marked a kernel "good" I tested it several times, and then in
the end that "eventpoll: Fix priority inversion problem" kind of makes
sense after all.
I would never have guessed at that commit otherwise (well, considering
that I blamed both the drm code and the netlink code first, that goes
without saying), but at the same time, that *is* the kind of change
that would certainly make user space get hung up with odd timeouts.
I've only tested the previous commit being good twice now, but I'll go
back to the head of tree and try a revert to verify that this is
really it. Because maybe it's the now Nth time I found something that
hides the problem, not the real issue.
Fingers crossed that this very timing-dependent odd problem really did
bisect right finally, after many false starts.
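Why a handful of clean boots is weak evidence when bisecting a flaky bug can be put in numbers. This is a rough sketch that assumes every boot of a bad kernel shows the problem independently with the same probability; the failure rates used are hypothetical, nothing here was measured.

```python
import math


def prob_false_good(p_fail: float, boots: int) -> float:
    """Chance that a bad kernel survives `boots` boots without showing the bug."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return (1.0 - p_fail) ** boots


def boots_needed(p_fail: float, max_false_good: float) -> int:
    """Smallest number of clean boots so a bad kernel slips through
    with probability at most `max_false_good`."""
    if not 0.0 < p_fail < 1.0 or not 0.0 < max_false_good < 1.0:
        raise ValueError("both probabilities must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))
```

Under these assumptions, if a bad kernel misbehaves on half of its boots, two clean boots still leave a 25% chance of wrongly marking it good (`prob_false_good(0.5, 2)`), and `boots_needed(0.5, 0.05)` says five clean boots are needed to push that below 5%.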
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 21:46 ` Linus Torvalds
@ 2025-07-11 22:19 ` Linus Torvalds
2025-07-11 23:58 ` Nam Cao
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-07-11 22:19 UTC (permalink / raw)
To: Jakub Kicinski, Frederic Weisbecker, Valentin Schneider, Nam Cao,
Christian Brauner
Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev,
linux-kernel, pabeni, dri-devel
On Fri, 11 Jul 2025 at 14:46, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I've only tested the previous commit being good twice now, but I'll go
> back to the head of tree and try a revert to verify that this is
> really it. Because maybe it's the now Nth time I found something that
> hides the problem, not the real issue.
>
> Fingers crossed that this very timing-dependent odd problem really did
> bisect right finally, after many false starts.
Ok, verified. Finally.
I've rebooted this machine five times now with the revert in place,
and now that I know to recognize all the subtler signs of breakage,
I'm pretty sure I finally got the right culprit.
Sometimes the breakage is literally just something like "it takes an
extra ten or fifteen seconds to start up some app" and then everything
ends up working, which is why it was so easy to overlook, and why my
other bisection attempts were such abject failures.
But that last bisection when I was more careful and knew what to look
for ended up laser-guided to that thing.
And apologies to the drm and netlink people who I initially blamed
just because there were unrelated bugs that just got merged in the
timeframe when I started noticing oddities. You may have had your own
bugs, but you were blameless on this issue that I basically spent the
last day on (I'd say "wasted" the last day on, but right now I feel
good about finding it, so I guess it wasn't wasted time after all).
Anyway, I think reverting that commit 8c44dac8add7 ("eventpoll: Fix
priority inversion problem") is the right thing for 6.16, and
hopefully Nam Cao & co can figure out what went wrong and we'll
revisit this in the future.
Linus
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up)
2025-07-11 22:19 ` Linus Torvalds
@ 2025-07-11 23:58 ` Nam Cao
0 siblings, 0 replies; 15+ messages in thread
From: Nam Cao @ 2025-07-11 23:58 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jakub Kicinski, Frederic Weisbecker, Valentin Schneider,
Christian Brauner, Thomas Zimmermann, Simona Vetter, Dave Airlie,
davem, netdev, linux-kernel, pabeni, dri-devel
On Fri, Jul 11, 2025 at 03:19:00PM -0700, Linus Torvalds wrote:
> On Fri, 11 Jul 2025 at 14:46, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I've only tested the previous commit being good twice now, but I'll go
> > back to the head of tree and try a revert to verify that this is
> > really it. Because maybe it's the now Nth time I found something that
> > hides the problem, not the real issue.
> >
> > Fingers crossed that this very timing-dependent odd problem really did
> > bisect right finally, after many false starts.
>
> Ok, verified. Finally.
>
> I've rebooted this machine five times now with the revert in place,
> and now that I know to recognize all the subtler signs of breakage,
> I'm pretty sure I finally got the right culprit.
>
> Sometimes the breakage is literally just something like "it takes an
> extra ten or fifteen seconds to start up some app" and then everything
> ends up working, which is why it was so easy to overlook, and why my
> other bisection attempts were such abject failures.
>
> But that last bisection when I was more careful and knew what to look
> for ended up laser-guided to that thing.
>
> And apologies to the drm and netlink people who I initially blamed
> just because there were unrelated bugs that just got merged in the
> timeframe when I started noticing oddities. You may have had your own
> bugs, but you were blameless on this issue that I basically spent the
> last day on (I'd say "wasted" the last day on, but right now I feel
> good about finding it, so I guess it wasn't wasted time after all).
>
> Anyway, I think reverting that commit 8c44dac8add7 ("eventpoll: Fix
> priority inversion problem") is the right thing for 6.16, and
> hopefully Nam Cao & co can figure out what went wrong and we'll
> revisit this in the future.
Yes, please revert it. Another person reported a breakage to me earlier
today. We also think that reverting this commit for 6.16 is the right
thing.
Sorry for causing trouble. Strangely my laptop has been running with this
commit for ~6 weeks now without any trouble. Maybe I shouldn't have touched
this lockless business in the first place.
Best regards,
Nam