* [GIT PULL] Networking for v6.16-rc6 (follow up) @ 2025-07-11 15:10 Jakub Kicinski 2025-07-11 17:44 ` pr-tracker-bot 2025-07-11 18:33 ` Linus Torvalds 0 siblings, 2 replies; 15+ messages in thread From: Jakub Kicinski @ 2025-07-11 15:10 UTC (permalink / raw) To: torvalds; +Cc: kuba, davem, netdev, linux-kernel, pabeni Hi Linus! The following changes since commit bc9ff192a6c940d9a26e21a0a82f2667067aaf5f: Merge tag 'net-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2025-07-10 09:18:53 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git tags/net-6.16-rc6-2 for you to fetch changes up to a215b5723922f8099078478122f02100e489cb80: netlink: make sure we allow at least one dump skb (2025-07-11 07:31:47 -0700) ---------------------------------------------------------------- Big chunk of fixes for WiFi, Johannes says probably the last for the release. The Netlink fixes (on top of the tree) restore operation of iw (WiFi CLI) which uses sillily small recv buffer, and is the reason for this "emergency PR". The GRE multicast fix also stands out among the user-visible regressions. 
Current release - fix to a fix: - netlink: make sure we always allow at least one skb to be queued, even if the recvbuf is (mis)configured to be tiny Previous releases - regressions: - gre: fix IPv6 multicast route creation Previous releases - always broken: - wifi: prevent A-MSDU attacks in mesh networks - wifi: cfg80211: fix S1G beacon head validation and detection - wifi: mac80211: - always clear frame buffer to prevent stack leak in cases which hit a WARN() - fix monitor interface in device restart - wifi: mwifiex: discard erroneous disassoc frames on STA interface - wifi: mt76: - prevent null-deref in mt7925_sta_set_decap_offload() - add missing RCU annotations, and fix sleep in atomic - fix decapsulation offload - fixes for scanning - phy: microchip: improve link establishment and reset handling - eth: mlx5e: fix race between DIM disable and net_dim() - bnxt_en: correct DMA unmap len for XDP_REDIRECT Signed-off-by: Jakub Kicinski <kuba@kernel.org> ---------------------------------------------------------------- Alok Tiwari (1): net: ll_temac: Fix missing tx_pending check in ethtools_set_ringparam() Carolina Jubran (2): net/mlx5: Reset bw_share field when changing a node's parent net/mlx5e: Fix race between DIM disable and net_dim() Daniil Dulov (1): wifi: zd1211rw: Fix potential NULL pointer dereference in zd_mac_tx_to_dev() Deren Wu (2): wifi: mt76: mt7925: prevent NULL pointer dereference in mt7925_sta_set_decap_offload() wifi: mt76: mt7921: prevent decap offload config before STA initialization Eric Dumazet (1): netfilter: flowtable: account for Ethernet header in nf_flow_pppoe_proto() Felix Fietkau (3): wifi: rt2x00: fix remove callback type mismatch wifi: mt76: add a wrapper for wcid access with validation wifi: mt76: fix queue assignment for deauth packets Guillaume Nault (2): gre: Fix IPv6 multicast route creation. selftests: Add IPv6 multicast route generation tests for GRE devices. 
Hangbin Liu (1): selftests: net: lib: fix shift count out of range Henry Martin (1): wifi: mt76: mt7925: Fix null-ptr-deref in mt7925_thermal_init() Jakub Kicinski (7): Merge tag 'wireless-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Merge branch 'net-phy-microchip-lan88xx-reliability-fixes' Merge branch 'gre-fix-default-ipv6-multicast-route-creation' Merge tag 'linux-can-fixes-for-6.16-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Merge branch 'mlx5-misc-fixes-2025-07-10' Merge branch 'bnxt_en-3-bug-fixes' netlink: make sure we allow at least one dump skb Jianbo Liu (1): net/mlx5e: Add new prio for promiscuous mode Johannes Berg (3): wifi: mac80211: clear frame buffer to never leak stack wifi: mac80211: fix non-transmitted BSSID profile search Merge tag 'mt76-fixes-2025-07-07' of https://github.com/nbd168/wireless Kito Xu (1): net: appletalk: Fix device refcount leak in atrtr_create() Kuniyuki Iwashima (1): netlink: Fix rmem check in netlink_broadcast_deliver(). 
Lachlan Hodges (2): wifi: cfg80211: fix S1G beacon head validation in nl80211 wifi: mac80211: correctly identify S1G short beacon Leon Yen (1): wifi: mt76: mt792x: Limit the concurrent STA and SoftAP to operate on the same channel Lorenzo Bianconi (5): wifi: mt76: Assume __mt76_connac_mcu_alloc_sta_req runs in atomic context wifi: mt76: Move RCU section in mt7996_mcu_set_fixed_field() wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl_fixed() wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl() wifi: mt76: Remove RCU section in mt7996_mac_sta_rc_work() Mathy Vanhoef (1): wifi: prevent A-MSDU attacks in mesh networks Michael Lo (1): wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan Ming Yen Hsieh (2): wifi: mt76: mt7925: fix the wrong config for tx interrupt wifi: mt76: mt7925: fix incorrect scan probe IE handling for hw_scan Mingming Cao (1): ibmvnic: Fix hardcoded NUM_RX_STATS/NUM_TX_STATS with dynamic sizeof Miri Korenblit (2): wifi: mac80211: always initialize sdata::key_list wifi: mac80211: add the virtual monitor after reconfig complete Moon Hee Lee (1): wifi: mac80211: reject VHT opmode for unsupported channel widths Oleksij Rempel (2): net: phy: microchip: Use genphy_soft_reset() to purge stale LPA bits net: phy: microchip: limit 100M workaround to link-down events on LAN88xx Pagadala Yesu Anjaneyulu (1): wifi: mac80211: Fix uninitialized variable with __free() in ieee80211_ml_epcs() Sean Nyekjaer (1): can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level Shravya KN (1): bnxt_en: Fix DCB ETS validation Shruti Parab (1): bnxt_en: Flush FW trace before copying to the coredump Somnath Kotur (1): bnxt_en: Set DMA unmap len correctly for XDP_REDIRECT Vitor Soares (1): wifi: mwifiex: discard erroneous disassoc frames on STA interface drivers/net/can/m_can/m_can.c | 2 +- drivers/net/ethernet/broadcom/bnxt/bnxt_coredump.c | 18 +- drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 2 + 
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 +- drivers/net/ethernet/ibm/ibmvnic.h | 8 +- drivers/net/ethernet/mellanox/mlx5/core/en/fs.h | 9 +- drivers/net/ethernet/mellanox/mlx5/core/en_dim.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 13 +- drivers/net/ethernet/xilinx/ll_temac_main.c | 2 +- drivers/net/phy/microchip.c | 3 +- drivers/net/wireless/marvell/mwifiex/util.c | 4 +- drivers/net/wireless/mediatek/mt76/mt76.h | 10 ++ drivers/net/wireless/mediatek/mt76/mt7603/dma.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7603/mac.c | 10 +- drivers/net/wireless/mediatek/mt76/mt7615/mac.c | 7 +- .../net/wireless/mediatek/mt76/mt76_connac_mac.c | 2 +- .../net/wireless/mediatek/mt76/mt76_connac_mcu.c | 6 +- drivers/net/wireless/mediatek/mt76/mt76x02.h | 5 +- drivers/net/wireless/mediatek/mt76/mt76x02_mac.c | 4 +- drivers/net/wireless/mediatek/mt76/mt7915/mac.c | 12 +- drivers/net/wireless/mediatek/mt76/mt7915/mcu.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7915/mmio.c | 5 +- drivers/net/wireless/mediatek/mt76/mt7921/mac.c | 6 +- drivers/net/wireless/mediatek/mt76/mt7921/main.c | 3 + drivers/net/wireless/mediatek/mt76/mt7925/init.c | 2 + drivers/net/wireless/mediatek/mt76/mt7925/mac.c | 6 +- drivers/net/wireless/mediatek/mt76/mt7925/main.c | 8 +- drivers/net/wireless/mediatek/mt76/mt7925/mcu.c | 79 ++++++-- drivers/net/wireless/mediatek/mt76/mt7925/mcu.h | 5 +- drivers/net/wireless/mediatek/mt76/mt7925/regs.h | 2 +- drivers/net/wireless/mediatek/mt76/mt792x_core.c | 32 +++- drivers/net/wireless/mediatek/mt76/mt792x_mac.c | 5 +- drivers/net/wireless/mediatek/mt76/mt7996/mac.c | 52 ++---- drivers/net/wireless/mediatek/mt76/mt7996/main.c | 5 +- drivers/net/wireless/mediatek/mt76/mt7996/mcu.c | 199 +++++++++++++++------ drivers/net/wireless/mediatek/mt76/mt7996/mt7996.h | 16 +- drivers/net/wireless/mediatek/mt76/tx.c | 11 +- 
drivers/net/wireless/mediatek/mt76/util.c | 2 +- drivers/net/wireless/ralink/rt2x00/rt2x00soc.c | 4 +- drivers/net/wireless/ralink/rt2x00/rt2x00soc.h | 2 +- drivers/net/wireless/zydas/zd1211rw/zd_mac.c | 6 +- include/linux/ieee80211.h | 45 +++-- include/net/netfilter/nf_flow_table.h | 2 +- net/appletalk/ddp.c | 1 + net/ipv6/addrconf.c | 9 +- net/mac80211/cfg.c | 14 ++ net/mac80211/iface.c | 4 +- net/mac80211/mlme.c | 12 +- net/mac80211/parse.c | 6 +- net/mac80211/util.c | 9 +- net/netlink/af_netlink.c | 7 +- net/wireless/nl80211.c | 7 +- net/wireless/util.c | 52 +++++- tools/testing/selftests/net/gre_ipv6_lladdr.sh | 27 +-- tools/testing/selftests/net/lib.sh | 2 +- 57 files changed, 500 insertions(+), 277 deletions(-) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 15:10 [GIT PULL] Networking for v6.16-rc6 (follow up) Jakub Kicinski @ 2025-07-11 17:44 ` pr-tracker-bot 2025-07-11 18:33 ` Linus Torvalds 1 sibling, 0 replies; 15+ messages in thread From: pr-tracker-bot @ 2025-07-11 17:44 UTC (permalink / raw) To: Jakub Kicinski; +Cc: torvalds, kuba, davem, netdev, linux-kernel, pabeni The pull request you sent on Fri, 11 Jul 2025 08:10:02 -0700: > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git tags/net-6.16-rc6-2 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/c7979c3917fa1326dae3607e1c6a04c12057b194 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 15:10 [GIT PULL] Networking for v6.16-rc6 (follow up) Jakub Kicinski 2025-07-11 17:44 ` pr-tracker-bot @ 2025-07-11 18:33 ` Linus Torvalds 2025-07-11 18:46 ` Jakub Kicinski 1 sibling, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 18:33 UTC (permalink / raw) To: Jakub Kicinski, Thomas Zimmermann, Simona Vetter, Dave Airlie Cc: davem, netdev, linux-kernel, pabeni, dri-devel [ Added in some drm people too, just to give a heads-up that it isn't all their fault ] On Fri, 11 Jul 2025 at 08:10, Jakub Kicinski <kuba@kernel.org> wrote: > > The Netlink fixes (on top of the tree) restore > operation of iw (WiFi CLI) which uses sillily small recv buffer, > and is the reason for this "emergency PR". So this was "useful" in the sense that it seems to have taken my "random long delays at initial graphical login" and made them "reliable hangs at early boot time" instead. I originally blamed the drm tree, because there were some other issues in there with reference counting - and because the hang happened at that "start graphical environment", but now it really looks like two independent issues, where the netlink issues cause the delay, and the drm object refcounting issues were entirely separate and coincidental. I suspect that there is bootup code that needs more than that "just one skb", and that all the recent issues with netlink sk_rmem_alloc are broken and need reverting. Because this "emergency PR" does seem to have turned my "annoying problem with timeouts at initial login" into "now it doesn't boot at all". Which is good in that the random timeouts and delays were looking like a nightmare to bisect, and now it looks like at least the cause of them is more clear. But it's certainly not good in the sense of "we're at almost rc6, we shouldn't be having these kinds of issues". 
The machine I see this on doesn't actually use WiFi at all, but there *is* a WiFi chip in it, I just turn off that interface in favor of the wired ports. But obviously there might also be various other netlink users that are unhappy with the accounting changes, so the WiFi angle may be a red herring. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 18:33 ` Linus Torvalds @ 2025-07-11 18:46 ` Jakub Kicinski 2025-07-11 18:54 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Jakub Kicinski @ 2025-07-11 18:46 UTC (permalink / raw) To: Linus Torvalds Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 11:33:10 -0700 Linus Torvalds wrote: > Because this "emergency PR" does seem to have turned my "annoying > problem with timeouts at initial login" into "now it doesn't boot at > all". Hm. I'm definitely okay with reverting. So if you revert these three: a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().") a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().") ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.") everything is just fine? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 18:46 ` Jakub Kicinski @ 2025-07-11 18:54 ` Linus Torvalds 2025-07-11 19:18 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 18:54 UTC (permalink / raw) To: Jakub Kicinski Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 at 11:46, Jakub Kicinski <kuba@kernel.org> wrote: > > Hm. I'm definitely okay with reverting. So if you revert these three: > > a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().") > a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().") > ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.") > > everything is just fine? I'm assuming you mean a215b5723922 netlink: make sure we allow at least one dump skb a3c4a125ec72 netlink: Fix rmem check in netlink_broadcast_deliver(). ae8f160e7eb2 netlink: Fix wraparounds of sk->sk_rmem_alloc. Will do more testing. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 18:54 ` Linus Torvalds @ 2025-07-11 19:18 ` Linus Torvalds 2025-07-11 19:30 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 19:18 UTC (permalink / raw) To: Jakub Kicinski Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 at 11:54, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Will do more testing. Bah. What I thought was a "reliable hang" isn't actually that at all. It ends up still being very random indeed. That said, I do think it's related to this netlink issue, because the symptoms end up being random delays. I've seen it at boot before even logging in (I saw that twice in a row after the latest networking pull, which is why I thought it was reliable). But the much more common situation is that some random gnome app ends up hanging and then timing out. Sometimes it's gnome-shell itself, so when I log in nothing happens, and then after a 30s timeout gnome-shell times out and I get back the login window. That was what I *thought* was the common failure case, but it turns out that I've now several times seen just random other applications having that issue. This boot, for example, things "worked", except starting gnome-terminal took a long time, and then I get a random crash report for gsd-screensaver-proxy. The backtrace for that was g_bus_get_sync -> initable_init -> g_data_input_stream_read_line -> g_buffered_input_stream_fill -> g_buffered_input_stream_real_fill -> g_input_stream_read -> g_socket_receive_with_timeout -> g_socket_condition_timed_wait -> poll -> __syscall_cancel and I suspect these are all symptoms of the same thing. 
My *guess* is that all of these things use a netlink socket, and presumably it's the *other* end of the socket has filled up its receive queue and is dropping packets as a result, and never answering, so then - entirely randomly - depending on how overworked things got, and which requests got dropped, some poor gnome process never gets a reply and times out and the thing fails. And sometimes the things that fail are not very critical (like some gsd-screensaver-proxy) and I can log in happily. And sometimes they are rather more critical and nothing works. Anyway, because it's so damn random, it's neither bisectable nor easy to know when something is "fixed". I spent several hours yesterday chasing all the wrong things (because I thought it was in drm), and often thought "Oh, that fixed it". Only to then realize that nope, the problem still happens. I will test the reverts. Several times. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
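To make the guess above concrete, here is a toy model of a receiver whose queue is bounded by a receive buffer. It is only a sketch under stated assumptions: the `ToyReceiver` class, its fields, and the admission rule are hypothetical and are not the code in net/netlink/af_netlink.c. It only illustrates the two ideas mentioned in this thread: a full receive queue drops further messages, and an empty queue should still accept at least one message even when the buffer is configured to be tiny.

```python
# Toy model only: NOT the kernel implementation. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyReceiver:
    rcvbuf: int                 # configured receive buffer size, in bytes
    rmem_alloc: int = 0         # bytes queued and not yet read
    queue: list = field(default_factory=list)
    drops: int = 0              # messages rejected because the buffer was full

    def deliver(self, size: int) -> bool:
        """Queue a message of `size` bytes, or drop it if the buffer is full.

        An empty queue always accepts one message, which mirrors the
        "allow at least one skb" idea from the pull request summary.
        """
        if self.rmem_alloc != 0 and self.rmem_alloc + size > self.rcvbuf:
            self.drops += 1
            return False
        self.queue.append(size)
        self.rmem_alloc += size
        return True

    def read(self) -> int:
        """Consume the oldest queued message and release its accounting."""
        size = self.queue.pop(0)
        self.rmem_alloc -= size
        return size
```

In this model a sender that never gets its message queued never gets an answer either, which is the kind of silent stall a client would only notice as a timeout.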
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 19:18 ` Linus Torvalds @ 2025-07-11 19:30 ` Linus Torvalds 2025-07-11 19:34 ` Jakub Kicinski 2025-07-11 19:42 ` Linus Torvalds 0 siblings, 2 replies; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 19:30 UTC (permalink / raw) To: Jakub Kicinski Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 at 12:18, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I spent several hours yesterday chasing all the wrong things (because > I thought it was in drm), and often thought "Oh, that fixed it". Only > to then realize that nope, the problem still happens. > > I will test the reverts. Several times. Well, the first boot with those three commits reverted shows no problem at all. But as mentioned, I've now had "Oh, that fixed it" about ten times. So that "Oh, it worked this time" has been tainted by past experience. Will do several more boots now in the hope that it's gone for good. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 19:30 ` Linus Torvalds @ 2025-07-11 19:34 ` Jakub Kicinski 2025-07-11 19:42 ` Linus Torvalds 1 sibling, 0 replies; 15+ messages in thread From: Jakub Kicinski @ 2025-07-11 19:34 UTC (permalink / raw) To: Linus Torvalds Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 12:30:28 -0700 Linus Torvalds wrote: > > I spent several hours yesterday chasing all the wrong things (because > > I thought it was in drm), and often thought "Oh, that fixed it". Only > > to then realize that nope, the problem still happens. > > > > I will test the reverts. Several times. > > Well, the first boot with those three commits reverted shows no problem at all. > > But as mentioned, I've now had "Oh, that fixed it" about ten times. > > So that "Oh, it worked this time" has been tainted by past experience. > Will do several more boots now in the hope that it's gone for good. Fingers crossed. FWIW /proc/net/netlink should show the socket drop counters. But my laptop running 6.15 has a number of GNOME apps which never read their sockets so it's not going to be as immediately obvious whether we regressed or it's a bad app as I hoped. ^ permalink raw reply [flat|nested] 15+ messages in thread
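A minimal sketch of how the drop counters in /proc/net/netlink could be listed from user space. It assumes the file starts with a whitespace-separated header row containing columns named `Pid`, `Drops` and `Inode`; if the header on a given kernel differs, the function returns an empty list rather than guessing. The function names are hypothetical helpers, not part of any existing tool.

```python
def parse_netlink_drops(text):
    """Return one dict per socket row with its pid, drops and inode values.

    Columns are located by header name (assumed: "Pid", "Drops", "Inode"),
    not by fixed position, so extra columns do not break the parse.
    """
    lines = text.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    wanted = ("Pid", "Drops", "Inode")
    if any(name not in header for name in wanted):
        return []
    pid_col, drops_col, inode_col = (header.index(name) for name in wanted)
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip malformed or truncated rows
        rows.append({
            "pid": int(fields[pid_col]),
            "drops": int(fields[drops_col]),
            "inode": int(fields[inode_col]),
        })
    return rows


def sockets_with_drops(text):
    """Keep only the sockets whose drop counter is non-zero."""
    return [row for row in parse_netlink_drops(text) if row["drops"] > 0]


def read_proc_netlink(path="/proc/net/netlink"):
    """Read the proc file and return the sockets that have dropped messages."""
    with open(path) as handle:
        return sockets_with_drops(handle.read())
```

As noted above, a non-zero counter alone does not separate a kernel regression from an application that never reads its socket, so comparing the output between a known-good and a suspect kernel is more informative than a single snapshot.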
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 19:30 ` Linus Torvalds 2025-07-11 19:34 ` Jakub Kicinski @ 2025-07-11 19:42 ` Linus Torvalds 2025-07-11 19:53 ` Jakub Kicinski 1 sibling, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 19:42 UTC (permalink / raw) To: Jakub Kicinski Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 at 12:30, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So that "Oh, it worked this time" has been tainted by past experience. > Will do several more boots now in the hope that it's gone for good. Yeah, no. There's still something wrong. The second boot looked fine, but then starting chrome had a 15s delay, and when that cleared I got a notification that 'gnome-settings-daemon' had crashed. And the backtrace is basically identical to the one I saw with gsd-screensaver-proxy. So it's some socket that times out, but reverting these three a215b5723922 netlink: make sure we allow at least one dump skb a3c4a125ec72 netlink: Fix rmem check in netlink_broadcast_deliver(). ae8f160e7eb2 netlink: Fix wraparounds of sk->sk_rmem_alloc. did *not* fix it. Were there any other socket changes perhaps? I just looked, and gsd-screensaver-proxy seems to use a regular Unix domain stream socket. Maybe not related to netlink, did unix domain sockets end up with some similar changes? Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
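The backtrace pattern described in this thread (a client blocked in poll on a socket until a timeout expires) can be reproduced in miniature. The sketch below uses a Unix domain stream socket pair and a hypothetical helper, `wait_for_reply`; it says nothing about what GLib does internally and only shows what "the peer never answers, so poll times out" looks like from the waiting side.

```python
import select
import socket


def wait_for_reply(sock, timeout_ms):
    """Return True if `sock` becomes readable within `timeout_ms`, else False.

    This is the user-space view of a stalled peer: poll() returns an empty
    event list once the timeout expires and the caller has to give up.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    return bool(events)


def demo():
    """Show one timed-out wait followed by one successful wait."""
    client, server = socket.socketpair()  # AF_UNIX stream pair on Linux
    try:
        timed_out = not wait_for_reply(client, 50)  # server stays silent
        server.send(b"reply")
        answered = wait_for_reply(client, 50)       # data is now queued
        return timed_out, answered
    finally:
        client.close()
        server.close()
```

With a 30-second timeout instead of 50 ms, the first case is what a user would experience as an application that hangs and then reports a failure.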
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 19:42 ` Linus Torvalds @ 2025-07-11 19:53 ` Jakub Kicinski 2025-07-11 20:07 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Jakub Kicinski @ 2025-07-11 19:53 UTC (permalink / raw) To: Linus Torvalds Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 12:42:54 -0700 Linus Torvalds wrote: > Were there any other socket changes perhaps? > > I just looked, and gsd-screensaver-proxy seems to use a regular Unix > domain stream socket. Maybe not related to netlink, did unix domain > sockets end up with some similar changes? Humpf. Not that I can see, here's a list of commits since rc5 we sent minus all the driver and wifi and data center stuff: a3c4a125ec72 ("netlink: Fix rmem check in netlink_broadcast_deliver().") a215b5723922 ("netlink: make sure we allow at least one dump skb") ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.") ef9675b0ef03 ("Bluetooth: hci_sync: Fix not disabling advertising instance") 59710a26a289 ("Bluetooth: hci_core: Remove check of BDADDR_ANY in hci_conn_hash_lookup_big_state") 314d30b15086 ("Bluetooth: hci_sync: Fix attempting to send HCI_Disconnect to BIS handle") c7349772c268 ("Bluetooth: hci_event: Fix not marking Broadcast Sink BIS as connected") d3a5f2871adc ("tcp: Correct signedness in skb remaining space calculation") 1a03edeb84e6 ("tcp: refine sk_rcvbuf increase for ooo packets") ffdde7bf5a43 ("net/sched: Abort __tc_modify_qdisc if parent class does not exist") Let me keep digging but other than the netlink stuff the rest doesn't stand out.. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 19:53 ` Jakub Kicinski @ 2025-07-11 20:07 ` Linus Torvalds 2025-07-11 20:35 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 20:07 UTC (permalink / raw) To: Jakub Kicinski Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 at 12:53, Jakub Kicinski <kuba@kernel.org> wrote: > > Let me keep digging but other than the netlink stuff the rest doesn't > stand out.. Oh well. I think I'll just have to go back to bisecting this thing. I've tried to do that several times, and it has failed due to being too flaky, but I think I've learnt the signs to look out for better too. For example, the first few times I was just looking for "not able to log in", because I hadn't caught on to the fact that sometimes the failures simply didn't hit something very important. This clearly is timing-sensitive, and it's presumably hardware-dependent too. And it could easily be that some bootup process gets stuck on something entirely unrelated. Some random driver change - sound, pin control, whatever - might then just end up having odd interactions. I don't see any issues on my laptop. And considering how random the behavior problems are, it could have been going on for a while without me ever realizing it (plus I was running a distro kernel for at least a few days without even noticing that I wasn't running my own build any more). I was hoping it was some known problem, because I'm not sure how successful a bisect will be. I guess I had nothing better to do this weekend anyway.... Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 20:07 ` Linus Torvalds @ 2025-07-11 20:35 ` Linus Torvalds 2025-07-11 21:46 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 20:35 UTC (permalink / raw) To: Jakub Kicinski Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 at 13:07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Oh well. I think I'll just have to go back to bisecting this thing. > I've tried to do that several times, and it has failed due to being > too flaky, but I think I've learnt the signs to look out for better > too. Indeed. It turns out that the problem actually started somewhere between rc4 and rc5, and all my previous bisections never even came close, because kernels usually work well enough that I never realized that it went back that far. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 20:35 ` Linus Torvalds @ 2025-07-11 21:46 ` Linus Torvalds 2025-07-11 22:19 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 21:46 UTC (permalink / raw) To: Jakub Kicinski, Frederic Weisbecker, Valentin Schneider, Nam Cao, Christian Brauner Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 at 13:35, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Indeed. It turns out that the problem actually started somewhere > between rc4 and rc5, and all my previous bisections never even came > close, because kernels usually work well enough that I never realized > that it went back that far. It looks like it's actually due to commit 8c44dac8add7 ("eventpoll: Fix priority inversion problem"), and it's been going on for a while now and the behavior was just too subtle for me to have noticed. Does not look hardware-specific, except in the sense that it probably needs several CPU's along with the odd startup pattern to trigger this. It's possible that the bisection ended up wrong, and when it appeared to start going off in the weeds I was like "this is broken again", but before I marked a kernel "good" I tested it several times, and then in the end that "eventpoll: Fix priority inversion problem" kind of makes sense after all. I would never have guessed at that commit otherwise (well, considering that I blamed both the drm code and the netlink code first, that goes without saying), but at the same time, that *is* the kind of change that would certainly make user space get hung up with odd timeouts. I've only tested the previous commit being good twice now, but I'll go back to the head of tree and try a revert to verify that this is really it. Because maybe it's the now Nth time I found something that hides the problem, not the real issue. 
Fingers crossed that this very timing-dependent odd problem really did bisect right finally, after many false starts. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
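The reason several clean boots are needed before marking a kernel "good" in a flaky bisection can be put in numbers. The sketch below assumes each boot of a bad kernel shows the problem independently with probability `p_fail`; that probability is not known in this thread, so any value plugged in is an assumption, and the function names are hypothetical.

```python
import math


def false_good_probability(p_fail, boots):
    """Chance that a bad kernel survives `boots` boots without any symptom."""
    return (1.0 - p_fail) ** boots


def boots_needed(p_fail, max_false_good):
    """Smallest number of clean boots so that a bad kernel is mislabeled
    "good" with probability at most `max_false_good`."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))
```

For example, if the symptom appeared on half of all boots, two clean boots would still leave a 25% chance of a wrong "good" verdict, and `boots_needed(0.5, 0.05)` asks how many boots bring that below 5%. A single wrong verdict sends the rest of the bisection into the wrong half of the history, which is consistent with the earlier failed attempts described above.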
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 21:46 ` Linus Torvalds @ 2025-07-11 22:19 ` Linus Torvalds 2025-07-11 23:58 ` Nam Cao 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2025-07-11 22:19 UTC (permalink / raw) To: Jakub Kicinski, Frederic Weisbecker, Valentin Schneider, Nam Cao, Christian Brauner Cc: Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, 11 Jul 2025 at 14:46, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I've only tested the previous commit being good twice now, but I'll go > back to the head of tree and try a revert to verify that this is > really it. Because maybe it's the now Nth time I found something that > hides the problem, not the real issue. > > Fingers crossed that this very timing-dependent odd problem really did > bisect right finally, after many false starts. Ok, verified. Finally. I've rebooted this machine five times now with the revert in place, and now that I know to recognize all the subtler signs of breakage, I'm pretty sure I finally got the right culprit. Sometimes the breakage is literally just something like "it takes an extra ten or fifteen seconds to start up some app" and then everything ends up working, which is why it was so easy to overlook, and why my other bisection attempts were such abject failures. But that last bisection when I was more careful and knew what to look for ended up laser-guided to that thing. And apologies to the drm and netlink people who I initially blamed just because there were unrelated bugs that just got merged in the timeframe when I started noticing oddities. You may have had your own bugs, but you were blameless on this issue that I basically spent the last day on (I'd say "wasted" the last day on, but right now I feel good about finding it, so I guess it wasn't wasted time after all). 
Anyway, I think reverting that commit 8c44dac8add7 ("eventpoll: Fix priority inversion problem") is the right thing for 6.16, and hopefully Nam Cao & co can figure out what went wrong and we'll revisit this in the future. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [GIT PULL] Networking for v6.16-rc6 (follow up) 2025-07-11 22:19 ` Linus Torvalds @ 2025-07-11 23:58 ` Nam Cao 0 siblings, 0 replies; 15+ messages in thread From: Nam Cao @ 2025-07-11 23:58 UTC (permalink / raw) To: Linus Torvalds Cc: Jakub Kicinski, Frederic Weisbecker, Valentin Schneider, Christian Brauner, Thomas Zimmermann, Simona Vetter, Dave Airlie, davem, netdev, linux-kernel, pabeni, dri-devel On Fri, Jul 11, 2025 at 03:19:00PM -0700, Linus Torvalds wrote: > On Fri, 11 Jul 2025 at 14:46, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > I've only tested the previous commit being good twice now, but I'll go > > back to the head of tree and try a revert to verify that this is > > really it. Because maybe it's the now Nth time I found something that > > hides the problem, not the real issue. > > > > Fingers crossed that this very timing-dependent odd problem really did > > bisect right finally, after many false starts. > > Ok, verified. Finally. > > I've rebooted this machine five times now with the revert in place, > and now that I know to recognize all the subtler signs of breakage, > I'm pretty sure I finally got the right culprit. > > Sometimes the breakage is literally just something like "it takes an > extra ten or fifteen seconds to start up some app" and then everything > ends up working, which is why it was so easy to overlook, and why my > other bisection attempts were such abject failures. > > But that last bisection when I was more careful and knew what to look > for ended up laser-guided to that thing. > > And apologies to the drm and netlink people who I initially blamed > just because there were unrelated bugs that just got merged in the > timeframe when I started noticing oddities. You may have had your own > bugs, but you were blameless on this issue that I basically spent the > last day on (I'd say "wasted" the last day on, but right now I feel > good about finding it, so I guess it wasn't wasted time after all). 
> > Anyway, I think reverting that commit 8c44dac8add7 ("eventpoll: Fix > priority inversion problem") is the right thing for 6.16, and > hopefully Nam Cao & co can figure out what went wrong and we'll > revisit this in the future. Yes, please revert it. Another person reported a breakage to me earlier today. We also think that reverting this commit for 6.16 is the right thing. Sorry for causing trouble. Strangely my laptop has been running with this commit for ~6 weeks now without any trouble. Maybe I shouldn't have touched this lockless business in the first place. Best regards, Nam ^ permalink raw reply [flat|nested] 15+ messages in thread