linux-pm.vger.kernel.org archive mirror
[parent not found: <draft-87msmrdgkb.fsf@gmail.com>]
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
@ 2024-07-08 15:55 Forty Five
  0 siblings, 0 replies; 28+ messages in thread
From: Forty Five @ 2024-07-08 15:55 UTC (permalink / raw)
  To: Ping-Ke Shih
  Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
	Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

[-- Attachment #1: Type: text/plain, Size: 1649 bytes --]


Hi Ping-Ke,

Sorry for the delay since my last response.

First off - I am unable to apply your debug patch
(https://lore.kernel.org/linux-wireless/8583c53fa42848c9855b2b425ac18ca4@realtek.com/)
on top of either bcbefbd032df or 5bbd9b249880; the patch application
fails and I'm not confident enough to try and manually apply it. So the
logs here do not include debug output.

> Your setup is very complicated, so I can't setup in my side easily,

Fair enough. For (my) future reference, let me just note down the
procedure I'm using to reproduce the system freeze at this point -

1. Have a working hotspot using hostapd, managed using hostapd.service.
   Start hostapd.service.

2. Suspend and resume the system.

3. Wait for hostapd to produce 'handle_probe_req: send failed' error
   messages. (I trigger these by having another device attempt to
   connect to the hotspot.)

4. Restart hostapd.service. The system will usually freeze at this
   point. If not, I repeat these steps a few times until it does.
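
Step 3 can be checked mechanically instead of by watching the journal. The following is a minimal Python sketch, not part of the original setup: the function names are hypothetical, and it assumes `journalctl` is available and that hostapd logs under the hostapd.service unit.

```python
import subprocess

# Error string quoted in step 3 of the procedure.
FAILURE_MARKER = "handle_probe_req: send failed"


def saw_send_failure(journal_lines):
    """Return True if any journal line contains the hostapd failure marker."""
    return any(FAILURE_MARKER in line for line in journal_lines)


def hostapd_journal_since(since="5 min ago"):
    """Fetch recent hostapd.service journal lines.

    Assumption: the caller has permission to read the system journal.
    """
    result = subprocess.run(
        ["journalctl", "-u", "hostapd.service", f"--since={since}",
         "--no-pager", "-o", "cat"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()
```

Once `saw_send_failure(hostapd_journal_since())` returns True, step 4 (restarting hostapd.service) would be the next thing to run.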

> First problem is the culprit commit [1] that makes system frozen, and I still
> feel the patch [2] you have taken can fix it. Please use [1] as code base and
> apply patch [2] to see the result (#exp 1).

This does indeed appear to be the case - the patch does fix the issue on
that specific commit.

> Third (unsure) problem could be introduced by commits between [1] and [3].
> If first problem can be addressed by #exp 1, it could be possible to bisect
> the problem between [1] and [3]. Even if [1] is the only problem, revert
> the commit to see if it becomes good (#exp 2).

Here are the crash logs from #exp 2:

[-- Attachment #2: kdumpst-202407081539.zip --]
[-- Type: application/zip, Size: 32447 bytes --]

[-- Attachment #3: kdumpst-202407081543.zip --]
[-- Type: application/zip, Size: 33202 bytes --]

[-- Attachment #4: Type: text/plain, Size: 244 bytes --]

In the first log, the freeze happened at the end of step 4; in the
second, I had to repeat steps 1-4 a second time, after which the freeze
happened at the end of step 4.

I'll get to work on the bisection, and send the log here when it's done.

* RE: [REGRESSION] Freeze on resume from S3 (bisected)
@ 2024-07-01  6:15 Forty Five
  0 siblings, 0 replies; 28+ messages in thread
From: Forty Five @ 2024-07-01  6:15 UTC (permalink / raw)
  To: Forty Five, Forty Five, Ping-Ke Shih
  Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
	Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev


[-- Attachment #1.1: Type: text/plain, Size: 363 bytes --]

Forty Five <mathewegeorge@gmail.com> writes:

> Forty Five <mathewegeorge@gmail.com> writes:
>
>> I applied both patches on the latest master; here are the crash logs.
>
> Just realized I forgot to run the commands to get debug messages for all
> except the first crash log (kdumpst-202406301756.zip). I'll send more
> logs later.

4 logs reproducing the issue:


[-- Attachment #1.2: kdumpst-202407010530.zip --]
[-- Type: application/zip, Size: 139065 bytes --]

[-- Attachment #1.3: kdumpst-202407010534.zip --]
[-- Type: application/zip, Size: 180815 bytes --]

[-- Attachment #1.4: kdumpst-202407010539.zip --]
[-- Type: application/zip, Size: 101324 bytes --]

[-- Attachment #1.5: kdumpst-202407010542.zip --]
[-- Type: application/zip, Size: 99247 bytes --]

[-- Attachment #2: Type: text/plain, Size: 78 bytes --]


2 more that are triggered by Alt+SysRq+c, without starting
hostapd.service:


[-- Attachment #3: kdumpst-202407010557.zip --]
[-- Type: application/zip, Size: 157012 bytes --]

[-- Attachment #4: kdumpst-202407010610.zip --]
[-- Type: application/zip, Size: 149949 bytes --]

[-- Attachment #5: Type: text/plain, Size: 275 bytes --]



I set `log_buf_len=32M` to increase the dmesg buffer size; you should
see the full dmesg in there. I did run into the issue of kdump not
working at all, and the system just restarting; I resolved this by
adding `crashkernel=2048M` to the GRUB parameters that kdumpst sets.
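
Whether both parameters actually reached the running kernel can be checked from `/proc/cmdline`. This is a minimal Python sketch with hypothetical function names; it uses the kernel's documented spelling `crashkernel=` and does not handle quoted values.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' map to None. Quoted values containing spaces
    are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def check_kdump_params(cmdline):
    """Report the two parameters mentioned above (None if absent)."""
    params = parse_cmdline(cmdline)
    return {name: params.get(name) for name in ("log_buf_len", "crashkernel")}
```

On a live system the input would come from reading `/proc/cmdline`; a value of None for either key means the parameter was not passed at boot.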

* RE: [REGRESSION] Freeze on resume from S3 (bisected)
@ 2024-06-30 19:11 Forty Five
  2024-07-03  7:39 ` Ping-Ke Shih
  0 siblings, 1 reply; 28+ messages in thread
From: Forty Five @ 2024-06-30 19:11 UTC (permalink / raw)
  To: Ping-Ke Shih, Forty Five
  Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
	Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev


[-- Attachment #1.1: Type: text/plain, Size: 2149 bytes --]

Ping-Ke Shih <pkshih@realtek.com> writes:

> Since I saw 'NetworkManager' and 'hostapd' in code trace, I would like to know
> if you have two virtual interfaces, which for STA and AP modes? (Please check
> this by 'iw dev') If so, is it possible to remove hostapd (AP mode) to see if
> this is a factor causing crash.

I use hostapd as part of a Wi-Fi hotspot setup for this laptop. I REALLY
wish I'd connected the dots earlier and realised that it could be
related to this issue. While running bcbefbd032df (first bad commit), I
disabled all the components of my setup and the issue went away; then I
enabled them one by one until the issue emerged. I'll walk you through
the relevant details, and my observations during this process.

I create a virtual interface for hostapd using this systemd unit:

```
[Unit]
Requires=sys-subsystem-net-devices-wlo1.device
After=network.target
After=sys-subsystem-net-devices-wlo1.device
[Service]
Type=oneshot
ExecStart=/usr/bin/iw dev wlo1 interface add wlo1_ap type __ap addr "xx:xx:xx:xx:xx:xx"
ExecStart=/usr/bin/ip addr add 192.168.30.1/24 dev wlo1_ap
[Install]
WantedBy=multi-user.target
```

I need the '__ap' type because my card doesn't support two interfaces in
managed mode; see [1] for details.

[1] https://wiki.archlinux.org/title/Talk:Software_access_point#Two_interfaces_on_same_card

Then I configure NetworkManager to ignore this interface.

```
# in /etc/NetworkManager/conf.d/unmanaged.conf
[keyfile]
unmanaged-devices=interface-name:wlo1_ap
```

Coming to hostapd - this is where it gets rather complicated. First off,
let me mention that when I enabled hostapd.service again, I started
seeing the 'phy0: resume with hardware scan still in progress' warnings,
which had gone away up to this point.

Next - once I enabled hostapd.service, I was able to reproduce the
crashes. However, the dmesg in the crash log was different from what I
see when I have the rest of my setup enabled (I hadn't applied either
patch when this crash happened, and it's on b54846da4 because that's the
earliest bad commit in which I'm able to produce crash logs at all, as I
described in my original message):


[-- Attachment #1.2: kdumpst-202406301627.zip --]
[-- Type: application/zip, Size: 37476 bytes --]

[-- Attachment #1.3: Type: text/plain, Size: 70 bytes --]


Here are two more logs on 5bbd9b249880, again without either patch:


[-- Attachment #1.4: kdumpst-202406301810.zip --]
[-- Type: application/zip, Size: 37184 bytes --]

[-- Attachment #1.5: kdumpst-202406301814.zip --]
[-- Type: application/zip, Size: 25921 bytes --]

[-- Attachment #1.6: Type: text/plain, Size: 2888 bytes --]



For completeness, here's a description of the remaining elements of my
setup, but keep in mind that it's not necessary to reproduce the issue -
only to explain how the logs have looked so far.

hostapd cannot switch to a different channel while running; it has to be
restarted on the new channel. I'm usually connected to Wi-Fi, and
NetworkManager automatically switches between stations depending on
connectivity, which means that the channel changes constantly. So I have
a script that runs `iw dev wlo1 info` every 2 seconds, greps the current
channel number from its output, and compares it to the channel number
from the previous run. (Yes, I know that `iw` warns against scraping its
output because it isn't considered stable; I don't know any other way to
do this.) The script restarts hostapd.service if the two numbers don't
match, or stops/starts it if one of them is the empty string (meaning
the interface had no channel number when `iw` was run).
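
The decision logic of such a channel watcher can be sketched in Python. This is an illustration, not the actual script: the function names are hypothetical, the regular expression assumes the `channel N (FREQ MHz), ...` line that `iw dev <if> info` typically prints, and the mapping of the empty-string cases to "start" and "stop" is one reading of the description above.

```python
import re

# Matches e.g. "\tchannel 36 (5180 MHz), width: 80 MHz, center1: 5210 MHz"
CHANNEL_RE = re.compile(r"^\s*channel (\d+) \(", re.MULTILINE)


def parse_channel(iw_info_output):
    """Return the channel number as a string, or "" if none is reported."""
    match = CHANNEL_RE.search(iw_info_output)
    return match.group(1) if match else ""


def decide_action(previous, current):
    """Pick a systemctl verb for hostapd.service, or None for no change.

    Assumption: "" -> channel means start, channel -> "" means stop,
    and any other change means restart.
    """
    if previous == current:
        return None
    if previous == "":
        return "start"
    if current == "":
        return "stop"
    return "restart"
```

A polling loop would call `parse_channel` on fresh `iw dev wlo1 info` output every 2 seconds, pass the previous and current values to `decide_action`, and run `systemctl <verb> hostapd.service` whenever the result is not None.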

Finally - hostapd does not handle suspend+resume well. It stops working
and spams 'handle_probe_req: send failed' into the logs, and it needs to
be restarted. So I have a systemd service to automatically restart it on
resume -

```
[Unit]
After=suspend.target
After=hibernate.target
Description=restart hostapd after resume from suspend
# ...because it stops working and spams the journal with
# 'handle_probe_req: send failed' error
[Service]
Type=simple
ExecStart=/usr/bin/systemctl restart hostapd.service
[Install]
WantedBy=suspend.target
WantedBy=hibernate.target
```

I'll leave out the dnsmasq and iptables configuration I had to do, since
I can't see how it could be related.


> Attachment is a debug patch that add more messages and code trace, please help
> to reproduce problem with patches of [2] and attachment. If your kernel enables
> dynamic debug, need additional commands to have debug message:
>    sudo bash -c 'echo -n "module rtw89_core +p" > /sys/kernel/debug/dynamic_debug/control'
>    sudo bash -c 'echo -n "module rtw89_pci +p" > /sys/kernel/debug/dynamic_debug/control'
> Since there are more than one symptoms causing system freeze, please collect
> four logs as before. Also please give me two logs that system can normally
> suspend/resume, so I can compare their difference.

I applied both patches on the latest master; here are the crash logs.
With the patches, I am no longer able to trigger the crash merely by
suspending and resuming. I have to run
`sudo systemctl restart hostapd.service` after hostapd emits the
'handle_probe_req: send failed' errors (which, as described above,
happen after suspend+resume). So maybe [2] is making a difference here.
I wanted to test with your debug patch and without [2], but the patch
application failed unless I applied both together.

[2] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/


[-- Attachment #1.7: kdumpst-202406301756.zip --]
[-- Type: application/zip, Size: 21084 bytes --]

[-- Attachment #1.8: kdumpst-202406301800.zip --]
[-- Type: application/zip, Size: 25619 bytes --]

[-- Attachment #1.9: kdumpst-202406301828.zip --]
[-- Type: application/zip, Size: 24761 bytes --]

[-- Attachment #1.10: kdumpst-202406301830.zip --]
[-- Type: application/zip, Size: 22935 bytes --]

[-- Attachment #2: Type: text/plain, Size: 120 bytes --]


Finally, here are two crash logs generated with Alt+SysRq+c, without
hostapd enabled, and with both patches applied -


[-- Attachment #3: kdumpst-202406301855.zip --]
[-- Type: application/zip, Size: 23755 bytes --]

[-- Attachment #4: kdumpst-202406301857.zip --]
[-- Type: application/zip, Size: 23856 bytes --]

* [REGRESSION] Freeze on resume from S3 (bisected)
@ 2024-06-19  4:39 Forty Five
  2024-06-19  6:07 ` Ping-Ke Shih
  0 siblings, 1 reply; 28+ messages in thread
From: Forty Five @ 2024-06-19  4:39 UTC (permalink / raw)
  To: linux-wireless; +Cc: linux-pm, phhuang, pkshih, kvalo, regressions


I've been experiencing system freezes that require a hard reset with the
power button, starting from v6.8-rc1. The warnings produced suggest a
wireless-related issue, and the bisection seems to confirm this.

The freezes usually occur right after resuming from S3 (suspend). To
reproduce the issue, I just suspend (`sudo systemctl suspend`) and
resume (by pressing any key on the keyboard) a few times until the
freeze happens. It usually takes 4-5 cycles, though this varies a lot.

The upshot of all of this is that S3 (and S4 as well, see below) is
pretty much unusable for me, because of the risk of losing work to a
freeze.

Since a hard reset causes the systemd journal to be dropped, I set up
[kdumpst](https://gitlab.freedesktop.org/gpiccoli/kdumpst) so that I
could record dmesg output at the time of the freeze. I've linked to the
crash logs produced by that tool for every bad commit in the bisection
(except in cases where I couldn't get the kdump to happen - see the
comments in the bisection log below), as well as a crash log triggered
by SysRq (Alt+SysRq+c) on a good commit for comparison.

I've already submitted this as an Arch Linux bug report:
https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/61

The freezes have also occurred (seemingly) at random times, although
this is quite rare. They do seem to be more likely to occur after the
system has been hibernated, although I don't really have any way to
confirm this.


Git bisect log:

git bisect start
# status: waiting for both good and bad commits
# bad: [83a7eefedc9b56fe7bfeff13b6c7356688ffa670] Linux 6.10-rc3
git bisect bad 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
# good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
git bisect good e8f897f4afef0031fe618a8e94127a0934896aba
# bad: [445e60303883950161f67e18b9f048b18d7fb706] Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
git bisect bad 445e60303883950161f67e18b9f048b18d7fb706
# bad: [e5e038b7ae9da96b93974bf072ca1876899a01a3] Merge tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
# this one froze on the first suspend/resume
git bisect bad e5e038b7ae9da96b93974bf072ca1876899a01a3
# good: [1f440397665f4241346e4cc6d93f8b73880815d1] Merge tag 'docs-6.9' of git://git.lwn.net/linux
git bisect good 1f440397665f4241346e4cc6d93f8b73880815d1
# bad: [a2f24c8a955c8f941d6ac08dd7f401f54eef4627] Merge branch 'mptcp-some-clean-up-patches'
git bisect bad a2f24c8a955c8f941d6ac08dd7f401f54eef4627
# bad: [26f4dac11775a1ca24e2605cb30e828d4dbdea93] netfilter: x_tables: Use unsafe_memcpy() for 0-sized destination
# This one was very hard to reproduce - it must have taken 25-30 cycles total, and I hibernated the system twice
git bisect bad 26f4dac11775a1ca24e2605cb30e828d4dbdea93
# good: [4f5e5092fdbf5cec6bedc19fbe69cce4f5f08372] Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good 4f5e5092fdbf5cec6bedc19fbe69cce4f5f08372
# good: [b4e8ae5c8c41355791a99fdf2fcac16deace1e79] net: add napi_busy_loop_rcu()
git bisect good b4e8ae5c8c41355791a99fdf2fcac16deace1e79
# bad: [20ea9327c2fd545d6b96e998727bcd724290694d] net: dccp: Simplify the allocation of slab caches in dccp_ackvec_init
git bisect bad 20ea9327c2fd545d6b96e998727bcd724290694d
# bad: [92046e83c07b064ca65ac4ae7660a540016bdfc1] Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect bad 92046e83c07b064ca65ac4ae7660a540016bdfc1
# bad: [b54846da45942bbe4e5ebc59d497e4a48525ba5a] Merge tag 'wireless-next-2024-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
git bisect bad b54846da45942bbe4e5ebc59d497e4a48525ba5a
# good: [3832a9c40b356500c5b85a6fdf9577c590fcd637] wifi: rtw89: fw: extend JOIN H2C command to support WiFi 7 chips
git bisect good 3832a9c40b356500c5b85a6fdf9577c590fcd637
# bad: [5ba45ba77616637e554d66a57ef0334e5cc2efe4] wifi: rtw89: fix disabling concurrent mode TX hang issue
# I see the same kernel warnings here as with the other bad commits.
# However, there is no crash after a few suspend/resume cycles - instead
# the system simply restarts when I trigger the nth suspend. I'm not
# able to trigger a crash with SysRq either (Alt+SysRq+c) - again, the
# system restarts. This is the case with all bad commits after this.
git bisect bad 5ba45ba77616637e554d66a57ef0334e5cc2efe4
# good: [85da8f71aaa7b83ea7ef0e89182e0cd47e16d465] wifi: brcmfmac: Demote vendor-specific attach/detach messages to info
git bisect good 85da8f71aaa7b83ea7ef0e89182e0cd47e16d465
# good: [295304040d9f6f350b68652acd99650c7e16d0a8] wifi: rtw89: 8922a: add TX power related ops
git bisect good 295304040d9f6f350b68652acd99650c7e16d0a8
# good: [7cf6b6764b2f665d317ba0f91c247437019a2f4c] wifi: rtw89: Set default CQM config if not present
git bisect good 7cf6b6764b2f665d317ba0f91c247437019a2f4c
# good: [7e11a2966f51695c0af0b1f976a32d64dee243b2] wifi: rtw89: fix null pointer access when abort scan
git bisect good 7e11a2966f51695c0af0b1f976a32d64dee243b2
# bad: [f59a98c82534e986b06615ba94e060aa3129b08b] wifi: rtw89: fix HW scan timeout due to TSF sync issue
git bisect bad f59a98c82534e986b06615ba94e060aa3129b08b
# bad: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort scan
git bisect bad bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23
# first bad commit: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort scan
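
For anyone who wants to process a log like the one above, here is a minimal Python sketch that collects the good/bad verdicts and the first bad commit. The function name is hypothetical, and the patterns assume full 40-character hashes as they appear in this log.

```python
import re

VERDICT_RE = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})$")
FIRST_BAD_RE = re.compile(r"^# first bad commit: \[([0-9a-f]{40})\]")


def summarize_bisect_log(log_text):
    """Return ({"good": [...], "bad": [...]}, first_bad_hash_or_None)."""
    verdicts = {"good": [], "bad": []}
    first_bad = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        verdict = VERDICT_RE.match(line)
        if verdict:
            verdicts[verdict.group(1)].append(verdict.group(2))
            continue
        found = FIRST_BAD_RE.match(line)
        if found:
            first_bad = found.group(1)
    return verdicts, first_bad
```

The same log can also be fed back to git with `git bisect replay <logfile>` to reconstruct the bisection state in a local tree.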


#regzbot introduced: bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23


Logs produced by kdumpst:

[kdumpst-e5e038b7ae9da96b93974bf072ca1876899a01a3.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/b925cb5e9c6cfb6b224993c017ba61f5/kdumpst-e5e038b7ae9da96b93974bf072ca1876899a01a3.zip)

[kdumpst-b54846da45942bbe4e5ebc59d497e4a48525ba5a.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/e1610c3ae26971a890121d3a5eb8fa9e/kdumpst-b54846da45942bbe4e5ebc59d497e4a48525ba5a.zip)

[kdumpst-a2f24c8a955c8f941d6ac08dd7f401f54eef4627.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/34171a87da304987cc8ce5066fc35dd5/kdumpst-a2f24c8a955c8f941d6ac08dd7f401f54eef4627.zip)

[kdumpst-92046e83c07b064ca65ac4ae7660a540016bdfc1.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/67d756f963ffafd0b332c88701286a30/kdumpst-92046e83c07b064ca65ac4ae7660a540016bdfc1.zip)

[kdumpst-445e60303883950161f67e18b9f048b18d7fb706.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/f75f91d127050a327114bf904f335087/kdumpst-445e60303883950161f67e18b9f048b18d7fb706.zip)

[kdumpst-26f4dac11775a1ca24e2605cb30e828d4dbdea93.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/f3d26fc4187c8b8854386fed6f15c972/kdumpst-26f4dac11775a1ca24e2605cb30e828d4dbdea93.zip)

[kdumpst-20ea9327c2fd545d6b96e998727bcd724290694d.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/8618207aff0eb92413ce0c2994c7888f/kdumpst-20ea9327c2fd545d6b96e998727bcd724290694d.zip)

Good commit: [kdumpst-e8f897f4afef0031fe618a8e94127a0934896aba.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/b13b3914e50fc85548c77a4163dec383/kdumpst-e8f897f4afef0031fe618a8e94127a0934896aba.zip)
--


end of thread, other threads:[~2024-07-12  0:59 UTC | newest]

Thread overview: 28+ messages
     [not found] <875xtqjli4.fsf@gmail.com>
2024-06-30 19:20 ` [REGRESSION] Freeze on resume from S3 (bisected) Forty Five
2024-07-01  2:46   ` Ping-Ke Shih
2024-07-01  5:36   ` Ping-Ke Shih
     [not found] <draft-87msmrdgkb.fsf@gmail.com>
2024-07-08 16:30 ` Forty Five
2024-07-09  1:26   ` Ping-Ke Shih
2024-07-09  4:10     ` Forty Five
2024-07-09  4:25       ` Ping-Ke Shih
2024-07-09 11:49         ` Forty Five
2024-07-11  7:54     ` Forty Five
2024-07-12  0:59       ` Ping-Ke Shih
2024-07-08 15:55 Forty Five
  -- strict thread matches above, loose matches on Subject: below --
2024-07-01  6:15 Forty Five
2024-06-30 19:11 Forty Five
2024-07-03  7:39 ` Ping-Ke Shih
2024-06-19  4:39 Forty Five
2024-06-19  6:07 ` Ping-Ke Shih
2024-06-19 14:46   ` Forty Five
2024-06-20  8:16     ` Ping-Ke Shih
2024-06-20  8:56       ` Kalle Valo
2024-06-20  9:06         ` Ping-Ke Shih
2024-06-20  9:18       ` Mathew George
2024-06-20  9:33         ` Ping-Ke Shih
2024-06-20 10:05           ` Mathew George
2024-06-20 11:41             ` Ping-Ke Shih
2024-06-20 11:58               ` Johannes Berg
2024-06-20 13:05               ` Forty Five
2024-06-20 13:41         ` Forty Five
2024-06-28  3:55           ` Ping-Ke Shih
