* RE: [REGRESSION] Freeze on resume from S3 (bisected)
@ 2024-07-08 15:55 Forty Five
0 siblings, 0 replies; 28+ messages in thread
From: Forty Five @ 2024-07-08 15:55 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev
[-- Attachment #1: Type: text/plain, Size: 1649 bytes --]

Hi Ping-Ke,

Sorry for the delay since my last response.

First off - I am unable to apply your debug patch
(https://lore.kernel.org/linux-wireless/8583c53fa42848c9855b2b425ac18ca4@realtek.com/)
on top of either bcbefbd032df or 5bbd9b249880; the patch application
fails and I'm not confident enough to try and manually apply it. So the
logs here do not include debug output.

> Your setup is very complicated, so I can't setup in my side easily,

Fair enough. For (my) future reference, let me just note down the
procedure I'm using to reproduce the system freeze at this point -

1. Have a working hotspot using hostapd, managed using hostapd.service.
   Start hostapd.service.
2. Suspend and resume the system.
3. Wait for hostapd to produce 'handle_probe_req: send failed' error
   messages. (My means of triggering these is by having another device
   attempt to connect to the hotspot)
4. Restart hostapd.service. The system will usually freeze at this
   point. If not, I repeat these steps a few times until it does.

> First problem is the culprit commit [1] that makes system frozen, and I still
> feel the patch [2] you have taken can fix it. Please use [1] as code base and
> apply patch [2] to see the result (#exp 1).

This does indeed appear to be the case - the patch does fix the issue on
that specific commit.

> Third (unsure) problem could be introduced by commits between [1] and [3].
> If first problem can be addressed by #exp 1, it could be possible to bisect
> the problem between [1] and [3]. Even if [1] is the only problem, revert
> the commit to see if it becomes good (#exp 2).

Here are the crash logs from #exp 2:

[-- Attachment #2: kdumpst-202407081539.zip --]
[-- Type: application/zip, Size: 32447 bytes --]
[-- Attachment #3: kdumpst-202407081543.zip --]
[-- Type: application/zip, Size: 33202 bytes --]
[-- Attachment #4: Type: text/plain, Size: 244 bytes --]

In the first log, the freeze happened at the end of step 4; in the
second, I had to repeat steps 1-4 a second time, after which the freeze
happened at the end of step 4.

I'll get to work on the bisection, and send the log here when it's
done.
^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <draft-87msmrdgkb.fsf@gmail.com>]
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
[not found] <draft-87msmrdgkb.fsf@gmail.com>
@ 2024-07-08 16:30 ` Forty Five
2024-07-09 1:26 ` Ping-Ke Shih
0 siblings, 1 reply; 28+ messages in thread
From: Forty Five @ 2024-07-08 16:30 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

> I'll get to work on the bisection, and send the log here when it's
> done.

Just to confirm - I should apply [1], and no other patches, during the
bisection, right?

[1] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
2024-07-08 16:30 ` Forty Five
@ 2024-07-09 1:26 ` Ping-Ke Shih
2024-07-09 4:10 ` Forty Five
2024-07-11 7:54 ` Forty Five
0 siblings, 2 replies; 28+ messages in thread
From: Ping-Ke Shih @ 2024-07-09 1:26 UTC (permalink / raw)
To: Forty Five
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

Forty Five <mathewegeorge@gmail.com> wrote:
>
> > I'll get to work on the bisection, and send the log here when it's
> > done.
>
> Just to confirm - I should apply [1], and no other patches, during the
> bisection, right?
>
> [1] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/

Right. Only apply [1] in every bisection step.

Thanks for the exp #1 and #2. The results are in expectation. Then the new
bisection will find out another problem.
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
2024-07-09 1:26 ` Ping-Ke Shih
@ 2024-07-09 4:10 ` Forty Five
2024-07-09 4:25 ` Ping-Ke Shih
2024-07-11 7:54 ` Forty Five
1 sibling, 1 reply; 28+ messages in thread
From: Forty Five @ 2024-07-09 4:10 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

Ping-Ke Shih <pkshih@realtek.com> writes:

> Right. Only apply [1] in every bisection step.

Patch fails on 57f22c8dab6b266ae. Could you send a version that succeeds?

Bisection log so far:

git bisect start
# status: waiting for both good and bad commits
# bad: [5bbd9b249880dba032bffa002dd9cd12cd5af09c] Merge tag 'v6.10-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad 5bbd9b249880dba032bffa002dd9cd12cd5af09c
# status: waiting for good commit(s), bad commit known
# good: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort scan
git bisect good bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23
# bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
2024-07-09 4:10 ` Forty Five
@ 2024-07-09 4:25 ` Ping-Ke Shih
2024-07-09 11:49 ` Forty Five
0 siblings, 1 reply; 28+ messages in thread
From: Ping-Ke Shih @ 2024-07-09 4:25 UTC (permalink / raw)
To: Forty Five
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

Forty Five <mathewegeorge@gmail.com> wrote:
>
> Ping-Ke Shih <pkshih@realtek.com> writes:
>
> > Right. Only apply [1] in every bisection step.
>
> Patch fails on 57f22c8dab6b266ae. Could you send a version that succeeds?
>
> Bisection log so far:
>
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [5bbd9b249880dba032bffa002dd9cd12cd5af09c] Merge tag 'v6.10-p4' of
> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> git bisect bad 5bbd9b249880dba032bffa002dd9cd12cd5af09c
> # status: waiting for good commit(s), bad commit known
> # good: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort scan
> git bisect good bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23
> # bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of
> https://gitlab.freedesktop.org/drm/kernel
> git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910

The commit date of culprit bcbefbd032d ("wifi: rtw89: add wait/completion for abort scan") is
CommitDate: Tue Jan 23 13:38:15 2024 +0200

But, you want to apply to the top of 57f22c8dab6b whose date is
CommitDate: Fri Jan 19 13:49:16 2024 -0800
and doesn't contain commit bcbefbd032d, so no need to apply [1] at this point.
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
2024-07-09 4:25 ` Ping-Ke Shih
@ 2024-07-09 11:49 ` Forty Five
0 siblings, 0 replies; 28+ messages in thread
From: Forty Five @ 2024-07-09 11:49 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

Ping-Ke Shih <pkshih@realtek.com> writes:

> The commit date of culprit bcbefbd032d ("wifi: rtw89: add wait/completion for abort scan") is
> CommitDate: Tue Jan 23 13:38:15 2024 +0200
>
> But, you want to apply to the top of 57f22c8dab6b whose date is
> CommitDate: Fri Jan 19 13:49:16 2024 -0800
> and doesn't contain commit bcbefbd032d, so no need to apply [1] at this point.

Ah, I get it. So, from here on, I'll be applying the patch only if this
command returns exit status 0:

git merge-base --is-ancestor bcbefbd032d <commit-to-test>

Bisection log so far:

git bisect start
# status: waiting for both good and bad commits
# bad: [5bbd9b249880dba032bffa002dd9cd12cd5af09c] Merge tag 'v6.10-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad 5bbd9b249880dba032bffa002dd9cd12cd5af09c
# status: waiting for good commit(s), bad commit known
# good: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort scan
git bisect good bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23
# bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
# good: [57f22c8dab6b266ae36b89b073a4a33dea71e762] Merge tag 'strlcpy-removal-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
git bisect good 57f22c8dab6b266ae36b89b073a4a33dea71e762
# good: [43a7548e28a6df12a6170421d9d016c576010baa] Merge tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect good 43a7548e28a6df12a6170421d9d016c576010baa
# bad: [8c9c2f851b5a58195ed7ebd67d7c59683d1a02bc] Merge tag 'iommu-updates-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
git bisect bad 8c9c2f851b5a58195ed7ebd67d7c59683d1a02bc
^ permalink raw reply [flat|nested] 28+ messages in thread
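The "apply the fix only where the culprit is present" rule from the message above could be scripted roughly as follows. This is only a sketch: the commit ID is the one named in the thread, while the function name `needs_fix` and the patch file name in the usage note are made up for illustration.

```shell
#!/bin/sh
# Sketch: decide whether the scan-abort fix is relevant at a bisection step.
# CULPRIT is the commit named in the thread; everything else is hypothetical.
CULPRIT=bcbefbd032df

needs_fix() {
    # $1: commit under test (defaults to HEAD).
    # Exit status 0 means CULPRIT is an ancestor of (or equal to) that
    # commit, i.e. the tree contains the culprit and the fix should go on top.
    git merge-base --is-ancestor "$CULPRIT" "${1:-HEAD}"
}
```

At each step one might then run `needs_fix && git apply scan-abort-fix.patch` before building (the patch file name is an assumption), and undo it with `git apply -R scan-abort-fix.patch` before marking the step with `git bisect good` or `git bisect bad`.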
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
2024-07-09 1:26 ` Ping-Ke Shih
2024-07-09 4:10 ` Forty Five
@ 2024-07-11 7:54 ` Forty Five
2024-07-12 0:59 ` Ping-Ke Shih
1 sibling, 1 reply; 28+ messages in thread
From: Forty Five @ 2024-07-11 7:54 UTC (permalink / raw)
To: Ping-Ke Shih
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev
[-- Attachment #1: Type: text/plain, Size: 3470 bytes --]

Ping-Ke Shih <pkshih@realtek.com> writes:

> Right. Only apply [1] in every bisection step.
>
> Thanks for the exp #1 and #2. The results are in expectation. Then the new
> bisection will find out another problem.

Here are the bisection results:

git bisect start
# status: waiting for both good and bad commits
# bad: [5bbd9b249880dba032bffa002dd9cd12cd5af09c] Merge tag 'v6.10-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad 5bbd9b249880dba032bffa002dd9cd12cd5af09c
# good: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort scan
git bisect good bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23
# bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
# good: [57f22c8dab6b266ae36b89b073a4a33dea71e762] Merge tag 'strlcpy-removal-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
git bisect good 57f22c8dab6b266ae36b89b073a4a33dea71e762
# good: [43a7548e28a6df12a6170421d9d016c576010baa] Merge tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect good 43a7548e28a6df12a6170421d9d016c576010baa
# bad: [8c9c2f851b5a58195ed7ebd67d7c59683d1a02bc] Merge tag 'iommu-updates-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
git bisect bad 8c9c2f851b5a58195ed7ebd67d7c59683d1a02bc
# good: [b38061fe9cfa90a781e9e59fc761191fc8b469a1] net: phy: simplify genphy_c45_ethtool_set_eee
git bisect good b38061fe9cfa90a781e9e59fc761191fc8b469a1
# bad: [75c2946db360e625f1447a37f47dbbb38b1dd478] Merge tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
git bisect bad 75c2946db360e625f1447a37f47dbbb38b1dd478
# good: [d35c9659e56edd9e629b54da8ceca062517d3d6c] Merge branch 'net-gro-cleanups-and-fast-path-refinement'
git bisect good d35c9659e56edd9e629b54da8ceca062517d3d6c
# good: [e8bb2ccff7216d520a7bc33c22484dafebe8147e] Merge branch 'net-group-together-hot-data'
git bisect good e8bb2ccff7216d520a7bc33c22484dafebe8147e
# bad: [85977fc0aa489420709779cbc859966db94be68f] wifi: mac80211: remove TDLS peers only on affected link
git bisect bad 85977fc0aa489420709779cbc859966db94be68f
# good: [c2b22e26755c77299e1dffa2e17374cd28f7f3a7] wifi: mt76: mt7921: fix the unfinished command of regd_notifier before suspend
git bisect good c2b22e26755c77299e1dffa2e17374cd28f7f3a7
# good: [9ad7974856926129f190ffbe3beea78460b3b7cc] wifi: cfg80211: check A-MSDU format more carefully
git bisect good 9ad7974856926129f190ffbe3beea78460b3b7cc
# bad: [68f6c6afbcebdc3acdc6084abfe453f4cba6b9dc] wifi: mac80211: add ieee80211_vif_link_active() helper
git bisect bad 68f6c6afbcebdc3acdc6084abfe453f4cba6b9dc
# bad: [b2edc721716f44e2a7e46eb592321960a1227c7b] wifi: cfg80211: print flags in tracing in hex
git bisect bad b2edc721716f44e2a7e46eb592321960a1227c7b
# bad: [5fcc7c51f9e72d1e62991f8b32be4a5adf44d556] wifi: mac80211: handle netif carrier up/down with link AP during MLO
git bisect bad 5fcc7c51f9e72d1e62991f8b32be4a5adf44d556
# bad: [1c0d21c4b33a41be9090e73f8225813d72ac88d9] wifi: mac80211: remove only link keys during stopping link AP
git bisect bad 1c0d21c4b33a41be9090e73f8225813d72ac88d9
# first bad commit: [1c0d21c4b33a41be9090e73f8225813d72ac88d9] wifi: mac80211: remove only link keys during stopping link AP

And here are crash logs for all the bad commits:

[-- Attachment #2: kdumpst-202407090344.zip --]
[-- Type: application/zip, Size: 33740 bytes --]
[-- Attachment #3: kdumpst-202407091142.zip --]
[-- Type: application/zip, Size: 34769 bytes --]
[-- Attachment #4: kdumpst-202407091506.zip --]
[-- Type: application/zip, Size: 33505 bytes --]
[-- Attachment #5: kdumpst-202407100543.zip --]
[-- Type: application/zip, Size: 35016 bytes --]
[-- Attachment #6: kdumpst-202407101146.zip --]
[-- Type: application/zip, Size: 34688 bytes --]
[-- Attachment #7: kdumpst-202407101230.zip --]
[-- Type: application/zip, Size: 36019 bytes --]
[-- Attachment #8: kdumpst-202407101301.zip --]
[-- Type: application/zip, Size: 33660 bytes --]
[-- Attachment #9: kdumpst-202407101338.zip --]
[-- Type: application/zip, Size: 34355 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
2024-07-11 7:54 ` Forty Five
@ 2024-07-12 0:59 ` Ping-Ke Shih
0 siblings, 0 replies; 28+ messages in thread
From: Ping-Ke Shih @ 2024-07-12 0:59 UTC (permalink / raw)
To: Forty Five, johannes.berg@intel.com, quic_ramess@quicinc.com,
quic_adisi@quicinc.com
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

+ Johannes, Rameshkumar and Aditya

Hi Johannes, Rameshkumar and Aditya,

Mathew helped to do bisection and found the cause is commit 1c0d21c4b33a
("wifi: mac80211: remove only link keys during stopping link AP"). The use
case is as description [1] using RTL8852BE. The STA and AP mode vifs operate
on the same channels (SCC). Please give us guide to dig this problem.

Thanks.

[1] https://lore.kernel.org/linux-wireless/87le2bdgk0.fsf@gmail.com/

Forty Five <mathewegeorge@gmail.com> wrote:
> Ping-Ke Shih <pkshih@realtek.com> writes:
>
> > Right. Only apply [1] in every bisection step.
> >
> > Thanks for the exp #1 and #2. The results are in expectation. Then the new
> > bisection will find out another problem.
>
> Here are the bisection results:
>
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [5bbd9b249880dba032bffa002dd9cd12cd5af09c] Merge tag 'v6.10-p4' of
> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> git bisect bad 5bbd9b249880dba032bffa002dd9cd12cd5af09c
> # good: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort scan
> git bisect good bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23
> # bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of
> https://gitlab.freedesktop.org/drm/kernel
> git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
> # good: [57f22c8dab6b266ae36b89b073a4a33dea71e762] Merge tag 'strlcpy-removal-v6.8-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
> git bisect good 57f22c8dab6b266ae36b89b073a4a33dea71e762
> # good: [43a7548e28a6df12a6170421d9d016c576010baa] Merge tag 'for-6.9-tag' of
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> git bisect good 43a7548e28a6df12a6170421d9d016c576010baa
> # bad: [8c9c2f851b5a58195ed7ebd67d7c59683d1a02bc] Merge tag 'iommu-updates-v6.9' of
> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
> git bisect bad 8c9c2f851b5a58195ed7ebd67d7c59683d1a02bc
> # good: [b38061fe9cfa90a781e9e59fc761191fc8b469a1] net: phy: simplify genphy_c45_ethtool_set_eee
> git bisect good b38061fe9cfa90a781e9e59fc761191fc8b469a1
> # bad: [75c2946db360e625f1447a37f47dbbb38b1dd478] Merge tag 'wireless-next-2024-03-08' of
> git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
> git bisect bad 75c2946db360e625f1447a37f47dbbb38b1dd478
> # good: [d35c9659e56edd9e629b54da8ceca062517d3d6c] Merge branch
> 'net-gro-cleanups-and-fast-path-refinement'
> git bisect good d35c9659e56edd9e629b54da8ceca062517d3d6c
> # good: [e8bb2ccff7216d520a7bc33c22484dafebe8147e] Merge branch 'net-group-together-hot-data'
> git bisect good e8bb2ccff7216d520a7bc33c22484dafebe8147e
> # bad: [85977fc0aa489420709779cbc859966db94be68f] wifi: mac80211: remove TDLS peers only on affected link
> git bisect bad 85977fc0aa489420709779cbc859966db94be68f
> # good: [c2b22e26755c77299e1dffa2e17374cd28f7f3a7] wifi: mt76: mt7921: fix the unfinished command of
> regd_notifier before suspend
> git bisect good c2b22e26755c77299e1dffa2e17374cd28f7f3a7
> # good: [9ad7974856926129f190ffbe3beea78460b3b7cc] wifi: cfg80211: check A-MSDU format more carefully
> git bisect good 9ad7974856926129f190ffbe3beea78460b3b7cc
> # bad: [68f6c6afbcebdc3acdc6084abfe453f4cba6b9dc] wifi: mac80211: add ieee80211_vif_link_active() helper
> git bisect bad 68f6c6afbcebdc3acdc6084abfe453f4cba6b9dc
> # bad: [b2edc721716f44e2a7e46eb592321960a1227c7b] wifi: cfg80211: print flags in tracing in hex
> git bisect bad b2edc721716f44e2a7e46eb592321960a1227c7b
> # bad: [5fcc7c51f9e72d1e62991f8b32be4a5adf44d556] wifi: mac80211: handle netif carrier up/down with link
> AP during MLO
> git bisect bad 5fcc7c51f9e72d1e62991f8b32be4a5adf44d556
> # bad: [1c0d21c4b33a41be9090e73f8225813d72ac88d9] wifi: mac80211: remove only link keys during stopping
> link AP
> git bisect bad 1c0d21c4b33a41be9090e73f8225813d72ac88d9
> # first bad commit: [1c0d21c4b33a41be9090e73f8225813d72ac88d9] wifi: mac80211: remove only link keys during
> stopping link AP
>
>
> And here are crash logs for all the bad commits:
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
@ 2024-07-01 6:15 Forty Five
0 siblings, 0 replies; 28+ messages in thread
From: Forty Five @ 2024-07-01 6:15 UTC (permalink / raw)
To: Forty Five, Forty Five, Ping-Ke Shih
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev
[-- Attachment #1.1: Type: text/plain, Size: 363 bytes --]
Forty Five <mathewegeorge@gmail.com> writes:
> Forty Five <mathewegeorge@gmail.com> writes:
>
>> I applied both patches on the latest master; here are the crash logs.
>
> Just realized I forgot to run the commands to get debug messages for all
> except the first crash log (kdumpst-202406301756.zip). I'll send more
> logs later.
4 logs reproducing the issue:
[-- Attachment #1.2: kdumpst-202407010530.zip --]
[-- Type: application/zip, Size: 139065 bytes --]
[-- Attachment #1.3: kdumpst-202407010534.zip --]
[-- Type: application/zip, Size: 180815 bytes --]
[-- Attachment #1.4: kdumpst-202407010539.zip --]
[-- Type: application/zip, Size: 101324 bytes --]
[-- Attachment #1.5: kdumpst-202407010542.zip --]
[-- Type: application/zip, Size: 99247 bytes --]
[-- Attachment #2: Type: text/plain, Size: 78 bytes --]
2 more that are triggered by Alt+SysRq+c, without starting
hostapd.service:
[-- Attachment #3: kdumpst-202407010557.zip --]
[-- Type: application/zip, Size: 157012 bytes --]
[-- Attachment #4: kdumpst-202407010610.zip --]
[-- Type: application/zip, Size: 149949 bytes --]
[-- Attachment #5: Type: text/plain, Size: 275 bytes --]
I set `log_buf_len=32M` to increase the dmesg buffer size; you should
see the full dmesg in there. I did run into the issue of kdump not
working at all, and the system just restarting; I resolved this by
adding `crashkernel=2048M` to the grub parameters that kdumpst sets.
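As a side note, one way to confirm that both settings actually reached the running kernel is to look for them in /proc/cmdline. A minimal sketch follows; the helper name `has_param` and the `CMDLINE_FILE` override are hypothetical, and the reserved-memory parameter is spelled `crashkernel=` on the kernel command line.

```shell
#!/bin/sh
# Sketch: succeed if the kernel command line contains a "<name>=..." entry.
# CMDLINE_FILE is a hypothetical override (handy for testing); by default
# the running kernel's /proc/cmdline is read.
has_param() {
    grep -Eq "(^|[[:space:]])$1=" "${CMDLINE_FILE:-/proc/cmdline}"
}
```

For example, `has_param log_buf_len && has_param crashkernel || echo "kdump parameters missing"` prints a warning when either parameter is absent from the boot command line.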
^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <875xtqjli4.fsf@gmail.com>]
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
[not found] <875xtqjli4.fsf@gmail.com>
@ 2024-06-30 19:20 ` Forty Five
2024-07-01 2:46 ` Ping-Ke Shih
2024-07-01 5:36 ` Ping-Ke Shih
0 siblings, 2 replies; 28+ messages in thread
From: Forty Five @ 2024-06-30 19:20 UTC (permalink / raw)
To: Forty Five, Ping-Ke Shih
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

Forty Five <mathewegeorge@gmail.com> writes:

> I applied both patches on the latest master; here are the crash logs.

Just realized I forgot to run the commands to get debug messages for all
except the first crash log (kdumpst-202406301756.zip). I'll send more
logs later.
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
2024-06-30 19:20 ` Forty Five
@ 2024-07-01 2:46 ` Ping-Ke Shih
2024-07-01 5:36 ` Ping-Ke Shih
1 sibling, 0 replies; 28+ messages in thread
From: Ping-Ke Shih @ 2024-07-01 2:46 UTC (permalink / raw)
To: Forty Five
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

Forty Five <mathewegeorge@gmail.com> wrote:
> Forty Five <mathewegeorge@gmail.com> writes:
>
> > I applied both patches on the latest master; here are the crash logs.
>
> Just realized I forgot to run the commands to get debug messages for all
> except the first crash log (kdumpst-202406301756.zip). I'll send more
> logs later.

It looks like you forgot to apply my attached debug patch. With debug mask,
many messages output to kernel log but I can't see things before suspending,
is it possible to enlarge buffer size of kdump?

Thanks
Ping-Ke
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
2024-06-30 19:20 ` Forty Five
2024-07-01 2:46 ` Ping-Ke Shih
@ 2024-07-01 5:36 ` Ping-Ke Shih
1 sibling, 0 replies; 28+ messages in thread
From: Ping-Ke Shih @ 2024-07-01 5:36 UTC (permalink / raw)
To: Forty Five
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

Ping-Ke Shih wrote:
> Forty Five <mathewegeorge@gmail.com> wrote:
> > Forty Five <mathewegeorge@gmail.com> writes:
> >
> > > I applied both patches on the latest master; here are the crash logs.
> >
> > Just realized I forgot to run the commands to get debug messages for all
> > except the first crash log (kdumpst-202406301756.zip). I'll send more
> > logs later.
>
> It looks like you forgot to apply my attached debug patch. With debug mask,
> many messages output to kernel log but I can't see things before suspending,
> is it possible to enlarge buffer size of kdump?

Sorry. I misread what you meant. I have seen logs after kdumpst-202406301800.zip.
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
@ 2024-06-30 19:11 Forty Five
2024-07-03 7:39 ` Ping-Ke Shih
0 siblings, 1 reply; 28+ messages in thread
From: Forty Five @ 2024-06-30 19:11 UTC (permalink / raw)
To: Ping-Ke Shih, Forty Five
Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev
[-- Attachment #1.1: Type: text/plain, Size: 2149 bytes --]

Ping-Ke Shih <pkshih@realtek.com> writes:

> Since I saw 'NetworkManager' and 'hostapd' in code trace, I would like to know
> if you have two virtual interfaces, which for STA and AP modes? (Please check
> this by 'iw dev') If so, is it possible to remove hostapd (AP mode) to see if
> this is a factor causing crash.

I use hostapd as part of a Wi-Fi hotspot setup for this laptop. I REALLY
wish I'd connected the dots earlier and realised that it could be
related to this issue. While running bcbefbd032 (first bad commit), I
disabled all the components of my setup and the issue went away; then I
enabled them one by one until the issue emerged. I'll walk you through
the relevant details, and my observations during this process.

I create a virtual interface for hostapd using this systemd unit:

```
[Unit]
Requires=sys-subsystem-net-devices-wlo1.device
After=network.target
After=sys-subsystem-net-devices-wlo1.device
[Service]
Type=oneshot
ExecStart=/usr/bin/iw dev wlo1 interface add wlo1_ap type __ap addr "xx:xx:xx:xx:xx:xx"
ExecStart=/usr/bin/ip addr add 192.168.30.1/24 dev wlo1_ap
[Install]
WantedBy=multi-user.target
```

I need the '__ap' type because my card doesn't support two interfaces in
managed mode; see [1] for details.

[1] https://wiki.archlinux.org/title/Talk:Software_access_point#Two_interfaces_on_same_card

Then I configure NetworkManager to ignore this interface.

```
;; in /etc/NetworkManager/conf.d/unmanaged.conf
[keyfile]
unmanaged-devices=interface-name:wlo1_ap
```

Coming to hostapd - this is where it gets rather complicated. First off,
let me mention that when I enabled hostapd.service again, I started
seeing the 'phy0: resume with hardware scan still in progress' warnings,
which had gone away up to this point.

Next - once I enabled hostapd.service, I was able to reproduce the
crashes. However, the dmesg in the crash log was different from what I
see when I have the rest of my setup enabled (I hadn't applied either
patch when this crash happened, and it's on b54846da4 because that's the
earliest bad commit in which I'm able to produce crash logs at all, as I
described in my original message):

[-- Attachment #1.2: kdumpst-202406301627.zip --]
[-- Type: application/zip, Size: 37476 bytes --]
[-- Attachment #1.3: Type: text/plain, Size: 70 bytes --]

Here are two more logs on 5bbd9b249880, again without either patch:

[-- Attachment #1.4: kdumpst-202406301810.zip --]
[-- Type: application/zip, Size: 37184 bytes --]
[-- Attachment #1.5: kdumpst-202406301814.zip --]
[-- Type: application/zip, Size: 25921 bytes --]
[-- Attachment #1.6: Type: text/plain, Size: 2888 bytes --]

For completeness, here's a description of the remaining elements of my
setup, but keep in mind that it's not necessary to reproduce the issue -
only to explain how the logs have looked so far.

hostapd cannot switch to a different channel while running; it has to be
restarted on the new channel. I'm usually connected to WiFi, and
constantly switching between stations depending on connectivity
(NetworkManager does this automatically), which means that I'm
constantly changing channels. So I have a script that runs `iw dev wlo1
info` every 2 seconds and greps the current channel number from its
output (yes I know that `iw` has a warning not to scrape its output
because it isn't considered stable; I don't know any other way to do
this), and compares it to the channel number from its previous run of
`iw`. It then restarts hostapd.service if they don't match, or
stops/starts it if one of them is the empty string (meaning the
interface had no channel number when `iw` was run).

Finally - hostapd does not handle suspend+resume well. It stops working
and spams 'handle_probe_req: send failed' into the logs, and it needs to
be restarted. So I have a systemd service to automatically restart it on
resume -

```
[Unit]
After=suspend.target
After=hibernate.target
Description=restart hostapd after resume from suspend
# ...because it stops working and spams the journal with
# 'handle_probe_req: send failed' error
[Service]
Type=simple
ExecStart=/usr/bin/systemctl restart hostapd.service
[Install]
WantedBy=suspend.target
WantedBy=hibernate.target
```

I'll leave out the dnsmasq and iptables configuration I had to do, since
I can't see how it could be related.

> Attachment is a debug patch that add more messages and code trace, please help
> to reproduce problem with patches of [2] and attachment. If your kernel enables
> dynamic debug, need additional commands to have debug message:
> sudo bash -c 'echo -n "module rtw89_core +p" > /sys/kernel/debug/dynamic_debug/control'
> sudo bash -c 'echo -n "module rtw89_pci +p" > /sys/kernel/debug/dynamic_debug/control'
> Since there are more than one symptoms causing system freeze, please collect
> four logs as before. Also please give me two logs that system can normally
> suspend/resume, so I can compare their difference.

I applied both patches on the latest master; here are the crash logs.

With the patches, I am no longer able to trigger the crash merely by
suspending and resuming. I have to run `sudo systemctl restart
hostapd.service` after hostapd emits the 'handle_probe_req: send failed'
errors (which, as described above, happen after suspend+resume). So
maybe [2] is making a difference here. I wanted to test with your debug
patch and without [2], but the patch application failed unless I applied
both together.

[2] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/

[-- Attachment #1.7: kdumpst-202406301756.zip --]
[-- Type: application/zip, Size: 21084 bytes --]
[-- Attachment #1.8: kdumpst-202406301800.zip --]
[-- Type: application/zip, Size: 25619 bytes --]
[-- Attachment #1.9: kdumpst-202406301828.zip --]
[-- Type: application/zip, Size: 24761 bytes --]
[-- Attachment #1.10: kdumpst-202406301830.zip --]
[-- Type: application/zip, Size: 22935 bytes --]
[-- Attachment #2: Type: text/plain, Size: 120 bytes --]

Finally, here are two crash logs generated with Alt+SysRq+c, without
hostapd enabled, and with both patches applied -

[-- Attachment #3: kdumpst-202406301855.zip --]
[-- Type: application/zip, Size: 23755 bytes --]
[-- Attachment #4: kdumpst-202406301857.zip --]
[-- Type: application/zip, Size: 23856 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
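The channel-watching script described in the message above was not posted, so the following is only a guess at its shape. The interface and unit names are the ones from the thread; the function names, the `sed` expression, and the exact mapping of "stops/starts" to the empty-channel cases are assumptions. As the message itself notes, `iw` output is not a stable interface, so the parsing may need adjusting.

```shell
#!/bin/sh
# Sketch of the described watcher: poll the channel of wlo1 and bounce
# hostapd.service when it changes. All function names are hypothetical.

# Read `iw dev <if> info` output on stdin and print the channel number,
# or print nothing if the interface currently has no channel.
parse_channel() {
    sed -n '/^[[:space:]]*channel [0-9]/{s/^[[:space:]]*channel \([0-9][0-9]*\).*/\1/p;q;}'
}

# Print the action for a (previous, current) channel pair.
# Assumed reading of the description: restart on a change, stop when the
# channel disappears, start when it reappears, otherwise do nothing.
decide_action() {
    if [ "$1" = "$2" ]; then
        echo none
    elif [ -z "$2" ]; then
        echo stop
    elif [ -z "$1" ]; then
        echo start
    else
        echo restart
    fi
}

# Poll every 2 seconds, as in the description. Needs root for systemctl.
watch_channel() {
    prev=
    while :; do
        cur=$(iw dev wlo1 info 2>/dev/null | parse_channel)
        case $(decide_action "$prev" "$cur") in
            restart) systemctl restart hostapd.service ;;
            start)   systemctl start hostapd.service ;;
            stop)    systemctl stop hostapd.service ;;
        esac
        prev=$cur
        sleep 2
    done
}
```

Calling `watch_channel` starts the loop. Keeping the parsing and the decision in separate functions means they can be checked against saved `iw` output without touching the real interface or the service.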
* RE: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-30 19:11 Forty Five @ 2024-07-03 7:39 ` Ping-Ke Shih 0 siblings, 0 replies; 28+ messages in thread From: Ping-Ke Shih @ 2024-07-03 7:39 UTC (permalink / raw) To: Forty Five Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org, Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev [-- Attachment #1: Type: text/plain, Size: 4514 bytes --] Hi Mathew, Forty Five <mathewegeorge@gmail.com> wrote: > > Ping-Ke Shih <pkshih@realtek.com> writes: > > > Since I saw 'NetworkManager' and 'hostapd' in code trace, I would like to know > > if you have two virtual interfaces, which for STA and AP modes? (Please check > > this by 'iw dev') If so, is it possible to remove hostapd (AP mode) to see if > > this is a factor causing crash. > > I use hostapd as part of a Wi-Fi hotspot setup for this laptop. I REALLY > wish I'd connected the dots earlier and realised that it could be > related to this issue. While running gbcbefbd032 (first bad commit), I > disabled all the components of my setup and the issue went away; then I > enabled them one by one until the issue emerged. I'll walk you through > the relevant details, and my observations during this process. > > I create a virtual interface for hostapd using this systemd unit: > > ``` > [Unit] > Requires=sys-subsystem-net-devices-wlo1.device > After=network.target > After=sys-subsystem-net-devices-wlo1.device > [Service] > Type=oneshot > ExecStart=/usr/bin/iw dev wlo1 interface add wlo1_ap type __ap addr "xx:xx:xx:xx:xx:xx" > ExecStart=/usr/bin/ip addr add 192.168.30.1/24 dev wlo1_ap > [Install] > WantedBy=multi-user.target > ``` > > I need the '__ap' type because my card doesn't support two interfaces in > managed mode; see [1] for details. > > [1] https://wiki.archlinux.org/title/Talk:Software_access_point#Two_interfaces_on_same_card > > Then I configure NetworkManager to ignore this interface. 
>
> ```
> ;; in /etc/NetworkManager/conf.d/unmanaged.conf
> [keyfile]
> unmanaged-devices=interface-name:wlo1_ap
> ```
>
> Coming to hostapd - this is where it gets rather complicated. First off,
> let me mention that when I enabled hostapd.service again, I started
> seeing the 'phy0: resume with hardware scan still in progress' warnings,
> which had gone away up to this point.
>
> Next - once I enabled hostapd.service, I was able to reproduce the
> crashes. However, the dmesg in the crash log was different from what I
> see when I have the rest of my setup enabled (I hadn't applied either
> patch when this crash happened, and it's on b54846da4 because that's the
> earliest bad commit in which I'm able to produce crash logs at all, as I
> described in my original message):

Your setup is very complicated, so I can't setup in my side easily, and
haven't time to dig deeper. I feel there are more than one problems, so
please help to do some experiments to narrow down scope.

First problem is the culprit commit [1] that makes system frozen, and I still
feel the patch [2] you have taken can fix it. Please use [1] as code base and
apply patch [2] to see the result (#exp 1).

The difference between without [1] and with [1] + [2] is the timing driver
report scan abort completion to mac80211. And the last few logs you collected
show that crash after long time from scanning abort.

Second problem is WiFi firmware get abnormal during doing resume. The log
looks like (partially):

[ T562] rtw89_8852be 0000:02:00.0: R_AX_RPQ_RXBD_IDX =0x00000000
[ T562] rtw89_8852be 0000:02:00.0: R_AX_DBG_ERR_FLAG=0x00000000
[ T562] rtw89_8852be 0000:02:00.0: R_AX_LBC_WATCHDOG=0x00000081
[ T562] rtw89_8852be 0000:02:00.0: <---
[ T562] rtw89_8852be 0000:02:00.0: SER catches error: 0x5000

In my side, this is rare, and your last few logs seem not happen. Not sure if
this is because of timing result from adding many logs. I would defer this
problem for now.
Third (unsure) problem could be introduced by commits between [1] and [3].
If first problem can be addressed by #exp 1, it could be possible to bisect
the problem between [1] and [3]. Even if [1] is the only problem, revert
the commit to see if it becomes good (#exp 2).

Summary:

  o 5bbd9b249880 [3] (v6.10-rc5)
  | #exp 2: 5bbd9b249880 + [4] (revert [1]; I feel this would be bad).
  :
  :
  :
  o bcbefbd032df [1] ("wifi: rtw89: add wait/completion for abort scan")
  | #exp 1: bcbefbd032df + [2] (I think this will be good.)
  o 7e11a2966f51 (this commit is good)

[1] bcbefbd032df ("wifi: rtw89: add wait/completion for abort scan")
[2] fix scan abort
    https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/
[3] 5bbd9b249880 (v6.10-rc5; the top of tree you are trying)
[4] attached revert patch of [1]

Ping-Ke

[-- Attachment #2: 0001-Revert-wifi-rtw89-add-wait-completion-for-abort-scan.patch --]
[-- Type: application/octet-stream, Size: 8312 bytes --]

From e035c8bc79c05cb0a208566f9145590e104f6571 Mon Sep 17 00:00:00 2001
From: Ping-Ke Shih <pkshih@realtek.com>
Date: Wed, 3 Jul 2024 15:09:05 +0800
Subject: [PATCH] Revert "wifi: rtw89: add wait/completion for abort scan"

This reverts commit bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> --- drivers/net/wireless/realtek/rtw89/core.h | 1 - drivers/net/wireless/realtek/rtw89/fw.c | 23 ++++----------- drivers/net/wireless/realtek/rtw89/fw.h | 10 ------- drivers/net/wireless/realtek/rtw89/mac.c | 36 ++--------------------- drivers/net/wireless/realtek/rtw89/mac.h | 3 +- 5 files changed, 10 insertions(+), 63 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw89/core.h b/drivers/net/wireless/realtek/rtw89/core.h index 112bdd95fc6e..46d535f98083 100644 --- a/drivers/net/wireless/realtek/rtw89/core.h +++ b/drivers/net/wireless/realtek/rtw89/core.h @@ -5078,7 +5078,6 @@ struct rtw89_hw_scan_info { struct ieee80211_vif *scanning_vif; struct list_head pkt_list[NUM_NL80211_BANDS]; struct rtw89_chan op_chan; - bool abort; u32 last_chan_idx; }; diff --git a/drivers/net/wireless/realtek/rtw89/fw.c b/drivers/net/wireless/realtek/rtw89/fw.c index 044a5b90c7f4..dc76af924580 100644 --- a/drivers/net/wireless/realtek/rtw89/fw.c +++ b/drivers/net/wireless/realtek/rtw89/fw.c @@ -4653,7 +4653,7 @@ int rtw89_fw_h2c_scan_list_offload(struct rtw89_dev *rtwdev, int ch_num, H2C_CAT_MAC, H2C_CL_MAC_FW_OFLD, H2C_FUNC_ADD_SCANOFLD_CH, 1, 1, skb_len); - cond = RTW89_SCANOFLD_WAIT_COND_ADD_CH; + cond = RTW89_FW_OFLD_WAIT_COND(0, H2C_FUNC_ADD_SCANOFLD_CH); ret = rtw89_h2c_tx_and_wait(rtwdev, skb, wait, cond); if (ret) { @@ -4749,7 +4749,7 @@ int rtw89_fw_h2c_scan_list_offload_be(struct rtw89_dev *rtwdev, int ch_num, H2C_CAT_MAC, H2C_CL_MAC_FW_OFLD, H2C_FUNC_ADD_SCANOFLD_CH, 1, 1, skb_len); - cond = RTW89_SCANOFLD_WAIT_COND_ADD_CH; + cond = RTW89_FW_OFLD_WAIT_COND(0, H2C_FUNC_ADD_SCANOFLD_CH); ret = rtw89_h2c_tx_and_wait(rtwdev, skb, wait, cond); if (ret) { @@ -4808,10 +4808,7 @@ int rtw89_fw_h2c_scan_offload(struct rtw89_dev *rtwdev, H2C_FUNC_SCANOFLD, 1, 1, len); - if (option->enable) - cond = RTW89_SCANOFLD_WAIT_COND_START; - else - cond = RTW89_SCANOFLD_WAIT_COND_STOP; + cond = RTW89_FW_OFLD_WAIT_COND(0, H2C_FUNC_SCANOFLD); 
ret = rtw89_h2c_tx_and_wait(rtwdev, skb, wait, cond); if (ret) { @@ -5501,7 +5498,7 @@ static bool rtw89_fw_c2h_chk_atomic(struct rtw89_dev *rtwdev, default: return false; case RTW89_C2H_CAT_MAC: - return rtw89_mac_c2h_chk_atomic(rtwdev, c2h, class, func); + return rtw89_mac_c2h_chk_atomic(rtwdev, class, func); case RTW89_C2H_CAT_OUTSRC: return rtw89_phy_c2h_chk_atomic(rtwdev, class, func); } @@ -6178,7 +6175,6 @@ void rtw89_hw_scan_start(struct rtw89_dev *rtwdev, struct ieee80211_vif *vif, rtw89_get_channel(rtwdev, rtwvif, &rtwdev->scan_info.op_chan); rtwdev->scan_info.scanning_vif = vif; rtwdev->scan_info.last_chan_idx = 0; - rtwdev->scan_info.abort = false; rtwvif->scan_ies = &scan_req->ies; rtwvif->scan_req = req; ieee80211_stop_queues(rtwdev->hw); @@ -6231,21 +6227,14 @@ void rtw89_hw_scan_complete(struct rtw89_dev *rtwdev, struct ieee80211_vif *vif, rtwvif->scan_ies = NULL; scan_info->last_chan_idx = 0; scan_info->scanning_vif = NULL; - scan_info->abort = false; rtw89_chanctx_proceed(rtwdev); } void rtw89_hw_scan_abort(struct rtw89_dev *rtwdev, struct ieee80211_vif *vif) { - struct rtw89_hw_scan_info *scan_info = &rtwdev->scan_info; - int ret; - - scan_info->abort = true; - - ret = rtw89_hw_scan_offload(rtwdev, vif, false); - if (ret) - rtw89_hw_scan_complete(rtwdev, vif, true); + rtw89_hw_scan_offload(rtwdev, vif, false); + rtw89_hw_scan_complete(rtwdev, vif, true); } static bool rtw89_is_any_vif_connected_or_connecting(struct rtw89_dev *rtwdev) diff --git a/drivers/net/wireless/realtek/rtw89/fw.h b/drivers/net/wireless/realtek/rtw89/fw.h index 4151c9d566bd..99da64cf1b01 100644 --- a/drivers/net/wireless/realtek/rtw89/fw.h +++ b/drivers/net/wireless/realtek/rtw89/fw.h @@ -210,12 +210,6 @@ enum rtw89_scanofld_notify_reason { RTW89_SCAN_LEAVE_OP_NOTIFY, }; -enum rtw89_scanofld_status { - RTW89_SCAN_STATUS_NOTIFY, - RTW89_SCAN_STATUS_SUCCESS, - RTW89_SCAN_STATUS_FAIL, -}; - enum rtw89_chan_type { RTW89_CHAN_OPERATE = 0, RTW89_CHAN_ACTIVE, @@ -3997,10 +3991,6 @@ 
enum rtw89_fw_ofld_h2c_func { RTW89_FW_OFLD_WAIT_COND(RTW89_PKT_OFLD_WAIT_TAG(pkt_id, pkt_op), \ H2C_FUNC_PACKET_OFLD) -#define RTW89_SCANOFLD_WAIT_COND_ADD_CH RTW89_FW_OFLD_WAIT_COND(0, H2C_FUNC_ADD_SCANOFLD_CH) - -#define RTW89_SCANOFLD_WAIT_COND_START RTW89_FW_OFLD_WAIT_COND(0, H2C_FUNC_SCANOFLD) -#define RTW89_SCANOFLD_WAIT_COND_STOP RTW89_FW_OFLD_WAIT_COND(1, H2C_FUNC_SCANOFLD) #define RTW89_SCANOFLD_BE_WAIT_COND_START RTW89_FW_OFLD_WAIT_COND(0, H2C_FUNC_SCANOFLD_BE) #define RTW89_SCANOFLD_BE_WAIT_COND_STOP RTW89_FW_OFLD_WAIT_COND(1, H2C_FUNC_SCANOFLD_BE) diff --git a/drivers/net/wireless/realtek/rtw89/mac.c b/drivers/net/wireless/realtek/rtw89/mac.c index 3fe0046f6eaa..833c648440ab 100644 --- a/drivers/net/wireless/realtek/rtw89/mac.c +++ b/drivers/net/wireless/realtek/rtw89/mac.c @@ -4765,7 +4765,7 @@ rtw89_mac_c2h_scanofld_rsp(struct rtw89_dev *rtwdev, struct sk_buff *skb, rtw89_warn(rtwdev, "HW scan failed: %d\n", ret); } } else { - rtw89_hw_scan_complete(rtwdev, vif, rtwdev->scan_info.abort); + rtw89_hw_scan_complete(rtwdev, vif, false); } break; case RTW89_SCAN_ENTER_OP_NOTIFY: @@ -4888,10 +4888,8 @@ rtw89_mac_c2h_done_ack(struct rtw89_dev *rtwdev, struct sk_buff *skb_c2h, u32 le default: return; case H2C_FUNC_ADD_SCANOFLD_CH: - cond = RTW89_SCANOFLD_WAIT_COND_ADD_CH; - break; case H2C_FUNC_SCANOFLD: - cond = RTW89_SCANOFLD_WAIT_COND_START; + cond = RTW89_FW_OFLD_WAIT_COND(0, h2c_func); break; case H2C_FUNC_SCANOFLD_BE: cond = RTW89_SCANOFLD_BE_WAIT_COND_START; @@ -5260,32 +5258,7 @@ void (* const rtw89_mac_c2h_wow_handler[])(struct rtw89_dev *rtwdev, [RTW89_MAC_C2H_FUNC_AOAC_REPORT] = rtw89_mac_c2h_wow_aoac_rpt, }; -static void rtw89_mac_c2h_scanofld_rsp_atomic(struct rtw89_dev *rtwdev, - struct sk_buff *skb) -{ - const struct rtw89_c2h_scanofld *c2h = - (const struct rtw89_c2h_scanofld *)skb->data; - struct rtw89_wait_info *fw_ofld_wait = &rtwdev->mac.fw_ofld_wait; - struct rtw89_completion_data data = {}; - unsigned int cond; - u8 status, reason; - - 
status = le32_get_bits(c2h->w2, RTW89_C2H_SCANOFLD_W2_STATUS); - reason = le32_get_bits(c2h->w2, RTW89_C2H_SCANOFLD_W2_RSN); - data.err = status != RTW89_SCAN_STATUS_SUCCESS; - - if (reason == RTW89_SCAN_END_SCAN_NOTIFY) { - if (rtwdev->chip->chip_gen == RTW89_CHIP_BE) - cond = RTW89_SCANOFLD_BE_WAIT_COND_STOP; - else - cond = RTW89_SCANOFLD_WAIT_COND_STOP; - - rtw89_complete_cond(fw_ofld_wait, cond, &data); - } -} - -bool rtw89_mac_c2h_chk_atomic(struct rtw89_dev *rtwdev, struct sk_buff *c2h, - u8 class, u8 func) +bool rtw89_mac_c2h_chk_atomic(struct rtw89_dev *rtwdev, u8 class, u8 func) { switch (class) { default: @@ -5302,9 +5275,6 @@ bool rtw89_mac_c2h_chk_atomic(struct rtw89_dev *rtwdev, struct sk_buff *c2h, switch (func) { default: return false; - case RTW89_MAC_C2H_FUNC_SCANOFLD_RSP: - rtw89_mac_c2h_scanofld_rsp_atomic(rtwdev, c2h); - return false; case RTW89_MAC_C2H_FUNC_PKT_OFLD_RSP: return true; } diff --git a/drivers/net/wireless/realtek/rtw89/mac.h b/drivers/net/wireless/realtek/rtw89/mac.h index a580cb719233..5a9766a36fe4 100644 --- a/drivers/net/wireless/realtek/rtw89/mac.h +++ b/drivers/net/wireless/realtek/rtw89/mac.h @@ -1172,8 +1172,7 @@ static inline int rtw89_chip_reset_bb_rf(struct rtw89_dev *rtwdev) u32 rtw89_mac_get_err_status(struct rtw89_dev *rtwdev); int rtw89_mac_set_err_status(struct rtw89_dev *rtwdev, u32 err); -bool rtw89_mac_c2h_chk_atomic(struct rtw89_dev *rtwdev, struct sk_buff *c2h, - u8 class, u8 func); +bool rtw89_mac_c2h_chk_atomic(struct rtw89_dev *rtwdev, u8 class, u8 func); void rtw89_mac_c2h_handle(struct rtw89_dev *rtwdev, struct sk_buff *skb, u32 len, u8 class, u8 func); int rtw89_mac_setup_phycap(struct rtw89_dev *rtwdev); -- 2.25.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
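For readers following the revert above, here is a rough sketch of the wait-condition bookkeeping it removes. This is a toy model, not rtw89 code: every `toy_*` / `TOY_*` name, the numeric function ID and the way the tag is folded into the condition value are assumptions made for illustration, and it further assumes that a completion only wakes a waiter whose condition value matches. The only things taken from the patch are that bcbefbd032df waits on a separate "stop" condition (tag 1) when a scan is being stopped, that the done-ack handler completes the "start" condition (tag 0), and that the end-of-scan event completes the "stop" condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values; the real function IDs and macro live in fw.h. */
enum { TOY_FUNC_SCANOFLD = 3, TOY_FUNC_MAX = 16 };

/* Assumed encoding: fold a small tag and a function ID into one value. */
#define TOY_WAIT_COND(tag, func) ((tag) * TOY_FUNC_MAX + (func))
#define TOY_COND_START TOY_WAIT_COND(0, TOY_FUNC_SCANOFLD)
#define TOY_COND_STOP  TOY_WAIT_COND(1, TOY_FUNC_SCANOFLD)

/* Toy waiter: remembers which condition it is blocked on. */
struct toy_wait {
	unsigned int cond;
	bool done;
};

static void toy_wait_begin(struct toy_wait *w, unsigned int cond)
{
	assert(w != NULL);
	w->cond = cond;
	w->done = false;
}

/* Wake the waiter only if the condition matches; true if it was woken. */
static bool toy_complete_cond(struct toy_wait *w, unsigned int cond)
{
	if (w->done || w->cond != cond)
		return false;
	w->done = true;
	return true;
}

/* Condition the scan command waits on, as selected in bcbefbd032df. */
static unsigned int toy_scan_cmd_cond(bool enable)
{
	return enable ? TOY_COND_START : TOY_COND_STOP;
}

/* Done-ack event: completes the "start" condition. */
static bool toy_done_ack(struct toy_wait *w)
{
	return toy_complete_cond(w, TOY_COND_START);
}

/* End-of-scan event: completes the "stop" condition. */
static bool toy_end_scan(struct toy_wait *w)
{
	return toy_complete_cond(w, TOY_COND_STOP);
}
```

In this model a stop request is not released by the done-ack and stays blocked until the end-of-scan event arrives, whereas after the revert both directions wait on the same tag-0 condition. The sketch says nothing about whether that difference in timing is what triggers the freeze.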
* [REGRESSION] Freeze on resume from S3 (bisected) @ 2024-06-19 4:39 Forty Five 2024-06-19 6:07 ` Ping-Ke Shih 0 siblings, 1 reply; 28+ messages in thread From: Forty Five @ 2024-06-19 4:39 UTC (permalink / raw) To: linux-wireless; +Cc: linux-pm, phhuang, pkshih, kvalo, regressions I've been experiencing system freezes that require a hard reset with the power button, starting from v6.8-rc1. The warnings produced suggest a wireless-related issue, and the bisection seems to confirm this. The freezes usually occur right after resuming from S3 (suspend). To reproduce the issue, I just suspend (`sudo systemctl suspend`) and resume (press any key on keyboard) a few times until the freeze happens. It takes 4-5 cycles of this most of the time, though it varies a lot. The upshot of all of this is that S3 (and S4 as well, see below) are pretty much unusable for me, due to the risk of losing work due to a freeze. Since a hard reset causes the systemd journal to be dropped, I set up [kdumpst](https://gitlab.freedesktop.org/gpiccoli/kdumpst) so that I could record dmesg output at the time of the freeze. I've linked to the crash logs produced by that tool for every bad commit in the bisection (except in cases where I couldn't get the kdump to happen - see the comments in the bisection log below), as well as a crash log triggered by SysRq (Alt+SysRq+c) on a good commit for comparison. I've already submitted this as an Arch Linux bug report: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/61 The freezes have also occurred (seemingly) at random times, although this is quite rare. They do seem to be more likely to occur after the system has been hibernated, although I don't really have any way to confirm this. 
Git bisect log: git bisect start # status: waiting for both good and bad commits # bad: [83a7eefedc9b56fe7bfeff13b6c7356688ffa670] Linux 6.10-rc3 git bisect bad 83a7eefedc9b56fe7bfeff13b6c7356688ffa670 # good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8 git bisect good e8f897f4afef0031fe618a8e94127a0934896aba # bad: [445e60303883950161f67e18b9f048b18d7fb706] Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue git bisect bad 445e60303883950161f67e18b9f048b18d7fb706 # bad: [e5e038b7ae9da96b93974bf072ca1876899a01a3] Merge tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs # this one froze on the first suspend/resume git bisect bad e5e038b7ae9da96b93974bf072ca1876899a01a3 # good: [1f440397665f4241346e4cc6d93f8b73880815d1] Merge tag 'docs-6.9' of git://git.lwn.net/linux git bisect good 1f440397665f4241346e4cc6d93f8b73880815d1 # bad: [a2f24c8a955c8f941d6ac08dd7f401f54eef4627] Merge branch 'mptcp-some-clean-up-patches' git bisect bad a2f24c8a955c8f941d6ac08dd7f401f54eef4627 # bad: [26f4dac11775a1ca24e2605cb30e828d4dbdea93] netfilter: x_tables: Use unsafe_memcpy() for 0-sized destination # This one was very hard to reproduce - it must have taken 25-30 cycles total, and I hibernated the system twice git bisect bad 26f4dac11775a1ca24e2605cb30e828d4dbdea93 # good: [4f5e5092fdbf5cec6bedc19fbe69cce4f5f08372] Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net git bisect good 4f5e5092fdbf5cec6bedc19fbe69cce4f5f08372 # good: [b4e8ae5c8c41355791a99fdf2fcac16deace1e79] net: add napi_busy_loop_rcu() git bisect good b4e8ae5c8c41355791a99fdf2fcac16deace1e79 # bad: [20ea9327c2fd545d6b96e998727bcd724290694d] net: dccp: Simplify the allocation of slab caches in dccp_ackvec_init git bisect bad 20ea9327c2fd545d6b96e998727bcd724290694d # bad: [92046e83c07b064ca65ac4ae7660a540016bdfc1] Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next git bisect 
bad 92046e83c07b064ca65ac4ae7660a540016bdfc1 # bad: [b54846da45942bbe4e5ebc59d497e4a48525ba5a] Merge tag 'wireless-next-2024-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next git bisect bad b54846da45942bbe4e5ebc59d497e4a48525ba5a # good: [3832a9c40b356500c5b85a6fdf9577c590fcd637] wifi: rtw89: fw: extend JOIN H2C command to support WiFi 7 chips git bisect good 3832a9c40b356500c5b85a6fdf9577c590fcd637 # bad: [5ba45ba77616637e554d66a57ef0334e5cc2efe4] wifi: rtw89: fix disabling concurrent mode TX hang issue # I see the same kernel warnings here as with the other bad commits. # However, there is no crash after a few suspend/resume cycles - instead # the system simply restarts when I trigger the nth suspend. I'm not # able to trigger a crash with SysRq either (Alt+SysRq+c) - again, the # system restarts. This is the case with all bad commits after this. git bisect bad 5ba45ba77616637e554d66a57ef0334e5cc2efe4 # good: [85da8f71aaa7b83ea7ef0e89182e0cd47e16d465] wifi: brcmfmac: Demote vendor-specific attach/detach messages to info git bisect good 85da8f71aaa7b83ea7ef0e89182e0cd47e16d465 # good: [295304040d9f6f350b68652acd99650c7e16d0a8] wifi: rtw89: 8922a: add TX power related ops git bisect good 295304040d9f6f350b68652acd99650c7e16d0a8 # good: [7cf6b6764b2f665d317ba0f91c247437019a2f4c] wifi: rtw89: Set default CQM config if not present git bisect good 7cf6b6764b2f665d317ba0f91c247437019a2f4c # good: [7e11a2966f51695c0af0b1f976a32d64dee243b2] wifi: rtw89: fix null pointer access when abort scan git bisect good 7e11a2966f51695c0af0b1f976a32d64dee243b2 # bad: [f59a98c82534e986b06615ba94e060aa3129b08b] wifi: rtw89: fix HW scan timeout due to TSF sync issue git bisect bad f59a98c82534e986b06615ba94e060aa3129b08b # bad: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort scan git bisect bad bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23 # first bad commit: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add 
wait/completion for abort scan #regzbot introduced: bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23 Logs produced by kdumpst: [kdumpst-e5e038b7ae9da96b93974bf072ca1876899a01a3.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/b925cb5e9c6cfb6b224993c017ba61f5/kdumpst-e5e038b7ae9da96b93974bf072ca1876899a01a3.zip) [kdumpst-b54846da45942bbe4e5ebc59d497e4a48525ba5a.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/e1610c3ae26971a890121d3a5eb8fa9e/kdumpst-b54846da45942bbe4e5ebc59d497e4a48525ba5a.zip) [kdumpst-a2f24c8a955c8f941d6ac08dd7f401f54eef4627.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/34171a87da304987cc8ce5066fc35dd5/kdumpst-a2f24c8a955c8f941d6ac08dd7f401f54eef4627.zip) [kdumpst-92046e83c07b064ca65ac4ae7660a540016bdfc1.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/67d756f963ffafd0b332c88701286a30/kdumpst-92046e83c07b064ca65ac4ae7660a540016bdfc1.zip) [kdumpst-445e60303883950161f67e18b9f048b18d7fb706.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/f75f91d127050a327114bf904f335087/kdumpst-445e60303883950161f67e18b9f048b18d7fb706.zip) [kdumpst-26f4dac11775a1ca24e2605cb30e828d4dbdea93.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/f3d26fc4187c8b8854386fed6f15c972/kdumpst-26f4dac11775a1ca24e2605cb30e828d4dbdea93.zip) [kdumpst-20ea9327c2fd545d6b96e998727bcd724290694d.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/8618207aff0eb92413ce0c2994c7888f/kdumpst-20ea9327c2fd545d6b96e998727bcd724290694d.zip) Good commit: [kdumpst-e8f897f4afef0031fe618a8e94127a0934896aba.zip](https://gitlab.archlinux.org/archlinux/packaging/packages/linux/uploads/b13b3914e50fc85548c77a4163dec383/kdumpst-e8f897f4afef0031fe618a8e94127a0934896aba.zip) -- ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-19 4:39 Forty Five @ 2024-06-19 6:07 ` Ping-Ke Shih 2024-06-19 14:46 ` Forty Five 0 siblings, 1 reply; 28+ messages in thread From: Ping-Ke Shih @ 2024-06-19 6:07 UTC (permalink / raw) To: Forty Five, linux-wireless@vger.kernel.org Cc: linux-pm@vger.kernel.org, Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev > # first bad commit: [bcbefbd032df6bfe925e6afeca82eb9d2cc0cb23] wifi: rtw89: add wait/completion for abort > scan > Please try [1] that fixed "wifi: rtw89: add wait/completion for abort" for certain cases. [1] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-19 6:07 ` Ping-Ke Shih @ 2024-06-19 14:46 ` Forty Five 2024-06-20 8:16 ` Ping-Ke Shih 0 siblings, 1 reply; 28+ messages in thread From: Forty Five @ 2024-06-19 14:46 UTC (permalink / raw) To: Ping-Ke Shih Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org, Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev [-- Attachment #1: Type: text/plain, Size: 283 bytes --] > Please try [1] that fixed "wifi: rtw89: add wait/completion for abort" for > certain cases. > > [1] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/ Doesn't fix the issue. I get a freeze on the first suspend+resume. I've attached the crash log. [-- Attachment #2: kdumpst crash log --] [-- Type: application/zip, Size: 31722 bytes --] [-- Attachment #3: Type: text/plain, Size: 3 bytes --] -- ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-19 14:46 ` Forty Five @ 2024-06-20 8:16 ` Ping-Ke Shih 2024-06-20 8:56 ` Kalle Valo 2024-06-20 9:18 ` Mathew George 0 siblings, 2 replies; 28+ messages in thread From: Ping-Ke Shih @ 2024-06-20 8:16 UTC (permalink / raw) To: Forty Five Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org, Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev Forty Five <mathewegeorge@gmail.com> wrote: > > Please try [1] that fixed "wifi: rtw89: add wait/completion for abort" for > > certain cases. > > > > [1] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/ > > Doesn't fix the issue. I get a freeze on the first suspend+resume. I've attached the crash log. I tried 4.10-rc4 + the patch on ubuntu. Never reproduce the symptom. Please share Arch Linux image you are using. Could you please help to collect 2 or more crash log? So I can check if there are more than one crash cases. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-20 8:16 ` Ping-Ke Shih @ 2024-06-20 8:56 ` Kalle Valo 2024-06-20 9:06 ` Ping-Ke Shih 2024-06-20 9:18 ` Mathew George 1 sibling, 1 reply; 28+ messages in thread From: Kalle Valo @ 2024-06-20 8:56 UTC (permalink / raw) To: Ping-Ke Shih Cc: Forty Five, linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org, Bernie Huang, regressions@lists.linux.dev Ping-Ke Shih <pkshih@realtek.com> writes: > Forty Five <mathewegeorge@gmail.com> wrote: >> > Please try [1] that fixed "wifi: rtw89: add wait/completion for abort" for >> > certain cases. >> > >> > [1] >> > https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/ >> >> Doesn't fix the issue. I get a freeze on the first suspend+resume. >> I've attached the crash log. > > I tried 4.10-rc4 + the patch on ubuntu. Never reproduce the symptom. I guess you mean v6.10-rc4? -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-20 8:56 ` Kalle Valo @ 2024-06-20 9:06 ` Ping-Ke Shih 0 siblings, 0 replies; 28+ messages in thread From: Ping-Ke Shih @ 2024-06-20 9:06 UTC (permalink / raw) To: Kalle Valo Cc: Forty Five, linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org, Bernie Huang, regressions@lists.linux.dev Kalle Valo <kvalo@kernel.org> wrote: > > Ping-Ke Shih <pkshih@realtek.com> writes: > > > Forty Five <mathewegeorge@gmail.com> wrote: > >> > Please try [1] that fixed "wifi: rtw89: add wait/completion for abort" for > >> > certain cases. > >> > > >> > [1] > >> > https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/ > >> > >> Doesn't fix the issue. I get a freeze on the first suspend+resume. > >> I've attached the crash log. > > > > I tried 4.10-rc4 + the patch on ubuntu. Never reproduce the symptom. > > I guess you mean v6.10-rc4? Right. Sorry for the typo. ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-20 8:16 ` Ping-Ke Shih 2024-06-20 8:56 ` Kalle Valo @ 2024-06-20 9:18 ` Mathew George 2024-06-20 9:33 ` Ping-Ke Shih 2024-06-20 13:41 ` Forty Five 1 sibling, 2 replies; 28+ messages in thread From: Mathew George @ 2024-06-20 9:18 UTC (permalink / raw) To: Ping-Ke Shih Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org, Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev > Please share Arch Linux image you are using. Not sure what you mean by 'image'. As you can see in the crash log attached to my previous mail, I built the latest mainline kernel (445e60303883 at the time) with your patch applied. I used [this PKGBUILD](https://aur.archlinux.org/packages/linux-git) to build it; the file `config` contains the kernel configuration (I did not apply any other options), and there are no patches applied except yours. > Could you please help to collect 2 or more crash log? > So I can check if there are > more than one crash cases. When I am back at my system, I will reproduce the issue a few more times with this kernel, and attach the logs. In the meantime, you could have a look at the logs linked in my first mail. There are logs for most of the bad commits encountered in the bisection. On 20 June 2024 13:46:21 GMT+05:30, Ping-Ke Shih <pkshih@realtek.com> wrote: >Forty Five <mathewegeorge@gmail.com> wrote: >> > Please try [1] that fixed "wifi: rtw89: add wait/completion for abort" for >> > certain cases. >> > >> > [1] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/ >> >> Doesn't fix the issue. I get a freeze on the first suspend+resume. I've attached the crash log. > >I tried 4.10-rc4 + the patch on ubuntu. Never reproduce the symptom. >Please share Arch Linux image you are using. > >Could you please help to collect 2 or more crash log? So I can check if there are >more than one crash cases. > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-20 9:18 ` Mathew George @ 2024-06-20 9:33 ` Ping-Ke Shih 2024-06-20 10:05 ` Mathew George 2024-06-20 13:41 ` Forty Five 1 sibling, 1 reply; 28+ messages in thread From: Ping-Ke Shih @ 2024-06-20 9:33 UTC (permalink / raw) To: Mathew George Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org, Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev Mathew George <mathewegeorge@gmail.com> wrote: > > Please share Arch Linux image you are using. > > Not sure what you mean by 'image'. I feel this problem may be easier to reproduce on Arch Linux, so I would like to know Arch Linux iso file you installed. > As you can see in the crash log attached to my > previous mail, I built the latest mainline kernel (445e60303883 at the time) with your > patch applied. I used [this PKGBUILD](https://aur.archlinux.org/packages/linux-git) > to build it; the file `config` contains the kernel configuration (I did not apply any other > options), and there are no patches applied except yours. I will do it as your side. > > > Could you please help to collect 2 or more crash log? > > So I can check if there are > > more than one crash cases. > > When I am back at my system, I will reproduce the issue a few more times with > this kernel, and attach the logs. In the meantime, you could have a look at the logs > linked in my first mail. There are logs for most of the bad commits encountered in the > bisection. I have seen that, but no clear idea for now, so I will install Arch Linux as yours in my side. ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-20 9:33 ` Ping-Ke Shih @ 2024-06-20 10:05 ` Mathew George 2024-06-20 11:41 ` Ping-Ke Shih 0 siblings, 1 reply; 28+ messages in thread From: Mathew George @ 2024-06-20 10:05 UTC (permalink / raw) To: Ping-Ke Shih Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org, Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev > I have seen that, but no clear idea for now, so I will install Arch Linux as yours > in my side. I really appreciate your dedication here, but I'm not sure that this is related to my OS. I feel it might be a corner case that manifests only on certain hardware configurations, otherwise it would probably have been encountered by other people by now. I can't say this with any confidence, since this is my first kernel bug, and I don't have any factual basis for this feeling; I just don't want you to burn yourself out with the Arch installation process when it might not help in diagnosing the issue. Ultimately we'll go by whatever you think is best, though; you're the expert here, not me. > I feel this problem may be easier to reproduce on Arch Linux, so I would like > to know Arch Linux iso file you installed. I don't remember the iso version that I used (it was years ago), and I don't know of any way to check, but it shouldn't matter. AFAIK the Arch iso is only used to bootstrap the system, so its version should not be of any consequence to my current configuration. You might want to look at https://wiki.archlinux.org/title/Installation_guide to get an idea of what the process is like; as you'll see it's very manual and takes a fair bit of effort. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-20 10:05 ` Mathew George @ 2024-06-20 11:41 ` Ping-Ke Shih 2024-06-20 11:58 ` Johannes Berg 2024-06-20 13:05 ` Forty Five 0 siblings, 2 replies; 28+ messages in thread From: Ping-Ke Shih @ 2024-06-20 11:41 UTC (permalink / raw) To: mathewegeorge@gmail.com Cc: linux-wireless@vger.kernel.org, kvalo@kernel.org, linux-pm@vger.kernel.org, Bernie Huang, regressions@lists.linux.dev On Thu, 2024-06-20 at 15:35 +0530, Mathew George wrote: > > > I feel this problem may be easier to reproduce on Arch Linux, so I would like > > to know Arch Linux iso file you installed. > > I don't remember the iso version that I used (it was years ago), and I don't know of any > way to check, but it shouldn't matter. AFAIK the Arch iso is only used to bootstrap the > system, so its version should not be of any consequence to my current configuration. > You might want to look at https://wiki.archlinux.org/title/Installation_guide > to get an idea of what the process is like; as you'll see it's very manual and takes a fair bit > of effort. Please provide output of 'cat /etc/lsb-release', which Arch Linux version should be there, to me. I hope using the same version as yours makes the symptom reproducible. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] Freeze on resume from S3 (bisected)
  2024-06-20 11:41 ` Ping-Ke Shih
@ 2024-06-20 11:58 ` Johannes Berg
  2024-06-20 13:05 ` Forty Five
  1 sibling, 0 replies; 28+ messages in thread
From: Johannes Berg @ 2024-06-20 11:58 UTC (permalink / raw)
  To: Ping-Ke Shih, mathewegeorge@gmail.com
  Cc: linux-wireless@vger.kernel.org, kvalo@kernel.org, linux-pm@vger.kernel.org, Bernie Huang, regressions@lists.linux.dev

I don't really know any of this here, but ...

+       ret = rtw89_hw_scan_offload(rtwdev, vif, false);
+       if (ret)
+               rtw89_hw_scan_complete(rtwdev, vif, true);

seems strange? You have to say that it was completed here, in the good
case, so maybe that was meant to be !ret?

It _looks_ like the crash is a use-after-free (the wiphy pointer in a
scan request cannot become NULL in normal flows), so maybe try with
KASAN rather than waiting for the crash. According to the logs, it
doesn't happen every time even for the reporter.

There possibly seems to be some issue between cfg80211 and mac80211 in
this code, we see the WARN_ON() in cfg80211_netdev_notifier_call() in
the NETDEV_DOWN case, which calls ___cfg80211_scan_done() which frees
the scan request. But shortly after the HW crashes, and we have
"ieee80211_restart_work called with hardware scan in progress", mac80211
wants to cancel the HW scan but the HW is dead ("wlo1: Failed
check-sdata-in-driver check, flags: 0x0"), and we see again "phy0: resume
with hardware scan still in progress" ... but this time once tasks are
restarted it crashes ...

So I think KASAN, possibly rtw debugs, and perhaps something like
https://p.sipsolutions.net/602684f34abfcf7c.txt will help debug it (yes
it adds a leak)

johannes

^ permalink raw reply [flat|nested] 28+ messages in thread
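To make the question about the `if (ret)` branch easier to reason about, here is a minimal toy model of the two abort paths visible in the revert patch earlier in this thread. It is only a sketch: the `toy_*` names and the struct are invented for illustration, the event handler is heavily simplified, and nothing here is the driver's real code. What it borrows from the patch is the shape of the logic: the reverted version always reports completion from the abort path, while bcbefbd032df sets an abort flag, reports completion from the abort path only when the stop command fails, and otherwise leaves it to the firmware's end-of-scan event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the driver's scan bookkeeping; not the real rtw89 types. */
struct toy_scan {
	bool pending;        /* scan request not yet reported as finished */
	bool abort;          /* mirrors scan_info->abort from bcbefbd032df */
	bool reported_abort; /* value passed upward when completion was reported */
};

/* Report completion upward (the role rtw89_hw_scan_complete() plays). */
static void toy_scan_complete(struct toy_scan *s, bool aborted)
{
	assert(s != NULL);
	s->pending = false;
	s->reported_abort = aborted;
	s->abort = false;
}

/* Hypothetical stand-in for sending the "stop scan" command: 0 on success. */
static int toy_scan_offload_stop(bool cmd_ok)
{
	return cmd_ok ? 0 : -1;
}

/* Shape of the abort path after the revert: always report completion. */
static void toy_abort_reverted(struct toy_scan *s, bool cmd_ok)
{
	(void)toy_scan_offload_stop(cmd_ok);
	toy_scan_complete(s, true);
}

/* Shape of the abort path in bcbefbd032df: report only if the command fails. */
static void toy_abort_with_wait(struct toy_scan *s, bool cmd_ok)
{
	s->abort = true;
	if (toy_scan_offload_stop(cmd_ok))
		toy_scan_complete(s, true);
}

/* Simplified role of the end-of-scan event handler in bcbefbd032df. */
static void toy_end_scan_event(struct toy_scan *s)
{
	if (s->pending)
		toy_scan_complete(s, s->abort);
}
```

In this model, `toy_abort_with_wait()` with a successful command leaves `pending` set until `toy_end_scan_event()` runs, so the request is still outstanding if that event never arrives (for example because the hardware has died). Whether that window is what the crash logs are showing is exactly the open question; the sketch cannot answer it.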
* Re: [REGRESSION] Freeze on resume from S3 (bisected) 2024-06-20 11:41 ` Ping-Ke Shih 2024-06-20 11:58 ` Johannes Berg @ 2024-06-20 13:05 ` Forty Five 1 sibling, 0 replies; 28+ messages in thread From: Forty Five @ 2024-06-20 13:05 UTC (permalink / raw) To: Ping-Ke Shih Cc: linux-wireless@vger.kernel.org, kvalo@kernel.org, linux-pm@vger.kernel.org, Bernie Huang, regressions@lists.linux.dev Ping-Ke Shih <pkshih@realtek.com> writes: > Please provide output of 'cat /etc/lsb-release', which Arch Linux version > should be there, to me. I don't have that file. But I do have `/etc/os-release`: NAME="Arch Linux" PRETTY_NAME="Arch Linux" ID=arch BUILD_ID=rolling ANSI_COLOR="38;2;23;147;209" HOME_URL="https://archlinux.org/" DOCUMENTATION_URL="https://wiki.archlinux.org/" SUPPORT_URL="https://bbs.archlinux.org/" BUG_REPORT_URL="https://gitlab.archlinux.org/groups/archlinux/-/issues" PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/" LOGO=archlinux-logo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] Freeze on resume from S3 (bisected)
  2024-06-20  9:18 ` Mathew George
  2024-06-20  9:33 ` Ping-Ke Shih
@ 2024-06-20 13:41 ` Forty Five
  2024-06-28  3:55 ` Ping-Ke Shih
  1 sibling, 1 reply; 28+ messages in thread
From: Forty Five @ 2024-06-20 13:41 UTC (permalink / raw)
  To: Forty Five
  Cc: Ping-Ke Shih, linux-wireless@vger.kernel.org,
      linux-pm@vger.kernel.org, Bernie Huang, kvalo@kernel.org,
      regressions@lists.linux.dev

[-- Attachment #1: Type: text/plain, Size: 344 bytes --]

Mathew George <mathewegeorge@gmail.com> writes:

> When I am back at my system, I will reproduce the issue a few more times with
> this kernel, and attach the logs. In the meantime, you could have a look at the logs
> linked in my first mail. There are logs for most of the bad commits encountered in the
> bisection.

I've attached more logs.

[-- Attachment #2: kdumpst-202406201323.zip --]
[-- Type: application/zip, Size: 36895 bytes --]

[-- Attachment #3: kdumpst-202406201330.zip --]
[-- Type: application/zip, Size: 31639 bytes --]

[-- Attachment #4: kdumpst-202406201333.zip --]
[-- Type: application/zip, Size: 36619 bytes --]

[-- Attachment #5: kdumpst-202406201335.zip --]
[-- Type: application/zip, Size: 31726 bytes --]

^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [REGRESSION] Freeze on resume from S3 (bisected)
  2024-06-20 13:41 ` Forty Five
@ 2024-06-28  3:55 ` Ping-Ke Shih
  0 siblings, 0 replies; 28+ messages in thread
From: Ping-Ke Shih @ 2024-06-28 3:55 UTC (permalink / raw)
  To: Forty Five
  Cc: linux-wireless@vger.kernel.org, linux-pm@vger.kernel.org,
      Bernie Huang, kvalo@kernel.org, regressions@lists.linux.dev

[-- Attachment #1: Type: text/plain, Size: 2054 bytes --]

Hi Mathew,

Forty Five <mathewegeorge@gmail.com> wrote:
>
> Mathew George <mathewegeorge@gmail.com> writes:
>
> > When I am back at my system, I will reproduce the issue a few more times with
> > this kernel, and attach the logs. In the meantime, you could have a look at the logs
> > linked in my first mail. There are logs for most of the bad commits encountered in the
> > bisection.
>
> I've attached more logs.

Thanks for the logs, which you met two kinds of problems. One is firmware
gets wrong during system resumes, and the other is to close a disappear
netdev. However I still can't dig why it gets wrong. I will focus on
latter one first.

The behavior between no commit [1] and the latest tree + [2] is the time
waiting for ACK from scan abort firmware command. The former one is
longer, and latter one is shorter. I also enable kernel debug KASAN
suggested by Johannes to dig problem, but in my side I can't see any
kernel warning and the crash.

Since I saw 'NetworkManager' and 'hostapd' in code trace, I would like to
know if you have two virtual interfaces, which for STA and AP modes?
(Please check this by 'iw dev')
If so, is it possible to remove hostapd (AP mode) to see if this is a
factor causing crash.

Attachment is a debug patch that add more messages and code trace, please
help to reproduce problem with patches of [2] and attachment.
If your kernel enables dynamic debug, need additional commands to have
debug message:
  sudo bash -c 'echo -n "module rtw89_core +p" > /sys/kernel/debug/dynamic_debug/control'
  sudo bash -c 'echo -n "module rtw89_pci +p" > /sys/kernel/debug/dynamic_debug/control'

Since there are more than one symptoms causing system freeze, please
collect four logs as before. Also please give me two logs that system can
normally suspend/resume, so I can compare their difference.

[1] bcbefbd032 wifi: rtw89: add wait/completion for abort scan
[2] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@realtek.com/

Thanks
Ping-ke

[-- Attachment #2: 0001-debug-scan-abort.patch --]
[-- Type: application/octet-stream, Size: 7181 bytes --]

From 0dec70207ab62c87e2f9b3a232166e07c3260f36 Mon Sep 17 00:00:00 2001
From: Ping-Ke Shih <pkshih@realtek.com>
Date: Fri, 28 Jun 2024 11:17:59 +0800
Subject: [PATCH] debug scan abort

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
---
 drivers/net/wireless/realtek/rtw89/debug.c    |  2 +-
 drivers/net/wireless/realtek/rtw89/fw.c       |  5 +++
 drivers/net/wireless/realtek/rtw89/mac.c      |  4 ++
 drivers/net/wireless/realtek/rtw89/mac80211.c | 42 ++++++++++++++++---
 drivers/net/wireless/realtek/rtw89/pci.c      |  4 ++
 5 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw89/debug.c b/drivers/net/wireless/realtek/rtw89/debug.c
index 49bbbd049316..4709d50b8fa1 100644
--- a/drivers/net/wireless/realtek/rtw89/debug.c
+++ b/drivers/net/wireless/realtek/rtw89/debug.c
@@ -14,7 +14,7 @@
 #include "sar.h"
 
 #ifdef CONFIG_RTW89_DEBUGMSG
-unsigned int rtw89_debug_mask;
+unsigned int rtw89_debug_mask = 0x80021000;
 EXPORT_SYMBOL(rtw89_debug_mask);
 module_param_named(debug_mask, rtw89_debug_mask, uint, 0644);
 MODULE_PARM_DESC(debug_mask, "Debugging mask");
diff --git a/drivers/net/wireless/realtek/rtw89/fw.c b/drivers/net/wireless/realtek/rtw89/fw.c
index fbe08c162b93..eecb43ad6735 100644
--- a/drivers/net/wireless/realtek/rtw89/fw.c
+++ b/drivers/net/wireless/realtek/rtw89/fw.c
@@ -4855,6 +4855,9 @@ int rtw89_fw_h2c_scan_offload(struct rtw89_dev *rtwdev,
 	else
 		cond = RTW89_SCANOFLD_WAIT_COND_STOP;
 
+	printk("pk> %s:%d start scan offload abort=%d\n", __func__, __LINE__,
+	       !option->enable);
+
 	ret = rtw89_h2c_tx_and_wait(rtwdev, skb, wait, cond);
 	if (ret) {
 		rtw89_debug(rtwdev, RTW89_DBG_FW, "failed to scan ofld\n");
@@ -6267,6 +6270,8 @@ void rtw89_hw_scan_complete(struct rtw89_dev *rtwdev, struct ieee80211_vif *vif,
 	if (!vif)
 		return;
 
+	printk("pk> %s:%d abort=%d\n", __func__, __LINE__, scan_info->abort);
+
 	rtw89_write32_mask(rtwdev,
 			   rtw89_mac_reg_by_idx(rtwdev, mac->rx_fltr, RTW89_MAC_0),
 			   B_AX_RX_FLTR_CFG_MASK,
diff --git a/drivers/net/wireless/realtek/rtw89/mac.c b/drivers/net/wireless/realtek/rtw89/mac.c
index 73462f3343e3..48ef521618e2 100644
--- a/drivers/net/wireless/realtek/rtw89/mac.c
+++ b/drivers/net/wireless/realtek/rtw89/mac.c
@@ -4757,6 +4757,8 @@ rtw89_mac_c2h_scanofld_rsp(struct rtw89_dev *rtwdev, struct sk_buff *skb,
 		}
 		return;
 	case RTW89_SCAN_END_SCAN_NOTIFY:
+		printk("pk> %s:%d scan end handler abort=%d\n", __func__, __LINE__,
+		       rtwdev->scan_info.abort);
 		if (rtwdev->scan_info.abort)
 			return;
 
@@ -4895,6 +4897,7 @@ rtw89_mac_c2h_done_ack(struct rtw89_dev *rtwdev, struct sk_buff *skb_c2h, u32 le
 			break;
 		case H2C_FUNC_SCANOFLD:
 			cond = RTW89_SCANOFLD_WAIT_COND_START;
+			printk("pk> %s:%d DACK for scan offload\n", __func__, __LINE__);
 			break;
 		case H2C_FUNC_SCANOFLD_BE:
 			cond = RTW89_SCANOFLD_BE_WAIT_COND_START;
@@ -5283,6 +5286,7 @@ static void rtw89_mac_c2h_scanofld_rsp_atomic(struct rtw89_dev *rtwdev,
 		else
 			cond = RTW89_SCANOFLD_WAIT_COND_STOP;
 
+		printk("pk> %s:%d scan end ISR\n", __func__, __LINE__);
 		rtw89_complete_cond(fw_ofld_wait, cond, &data);
 	}
 }
diff --git a/drivers/net/wireless/realtek/rtw89/mac80211.c b/drivers/net/wireless/realtek/rtw89/mac80211.c
index 41b286da3d59..9f2d42552ed2 100644
--- a/drivers/net/wireless/realtek/rtw89/mac80211.c
+++ b/drivers/net/wireless/realtek/rtw89/mac80211.c
@@ -60,6 +60,14 @@ static int rtw89_ops_start(struct ieee80211_hw *hw)
 	int ret;
 
 	mutex_lock(&rtwdev->mutex);
+
+	{
+		extern void dump_stack(void);
+
+		printk("pk> %s:%d\n", __func__, __LINE__);
+		dump_stack();
+	}
+
 	ret = rtw89_core_start(rtwdev);
 	mutex_unlock(&rtwdev->mutex);
 
@@ -71,6 +79,14 @@ static void rtw89_ops_stop(struct ieee80211_hw *hw)
 	struct rtw89_dev *rtwdev = hw->priv;
 
 	mutex_lock(&rtwdev->mutex);
+
+	{
+		extern void dump_stack(void);
+
+		printk("pk> %s:%d\n", __func__, __LINE__);
+		dump_stack();
+	}
+
 	rtw89_core_stop(rtwdev);
 	mutex_unlock(&rtwdev->mutex);
 }
@@ -112,8 +128,8 @@ static int rtw89_ops_add_interface(struct ieee80211_hw *hw,
 	struct rtw89_vif *rtwvif = (struct rtw89_vif *)vif->drv_priv;
 	int ret = 0;
 
-	rtw89_debug(rtwdev, RTW89_DBG_STATE, "add vif %pM type %d, p2p %d\n",
-		    vif->addr, vif->type, vif->p2p);
+	rtw89_debug(rtwdev, RTW89_DBG_STATE, "add vif %p %pM type %d, p2p %d\n",
+		    vif, vif->addr, vif->type, vif->p2p);
 
 	mutex_lock(&rtwdev->mutex);
 
@@ -175,8 +191,8 @@ static void rtw89_ops_remove_interface(struct ieee80211_hw *hw,
 	struct rtw89_dev *rtwdev = hw->priv;
 	struct rtw89_vif *rtwvif = (struct rtw89_vif *)vif->drv_priv;
 
-	rtw89_debug(rtwdev, RTW89_DBG_STATE, "remove vif %pM type %d p2p %d\n",
-		    vif->addr, vif->type, vif->p2p);
+	rtw89_debug(rtwdev, RTW89_DBG_STATE, "remove vif %p %pM type %d p2p %d\n",
+		    vif, vif->addr, vif->type, vif->p2p);
 
 	cancel_work_sync(&rtwvif->update_beacon_work);
 	cancel_delayed_work_sync(&rtwvif->roc.roc_work);
@@ -202,8 +218,8 @@ static int rtw89_ops_change_interface(struct ieee80211_hw *hw,
 
 	set_bit(RTW89_FLAG_CHANGING_INTERFACE, rtwdev->flags);
 
-	rtw89_debug(rtwdev, RTW89_DBG_STATE, "change vif %pM (%d)->(%d), p2p (%d)->(%d)\n",
-		    vif->addr, vif->type, type, vif->p2p, p2p);
+	rtw89_debug(rtwdev, RTW89_DBG_STATE, "change vif %p %pM (%d)->(%d), p2p (%d)->(%d)\n",
+		    vif, vif->addr, vif->type, type, vif->p2p, p2p);
 
 	rtw89_ops_remove_interface(hw, vif);
 
@@ -882,6 +898,13 @@ static int rtw89_ops_hw_scan(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
 	struct rtw89_vif *rtwvif = vif_to_rtwvif_safe(vif);
 	int ret = 0;
 
+	{
+		extern void dump_stack(void);
+
+		printk("pk> %s:%d vif=%p\n", __func__, __LINE__, vif);
+		dump_stack();
+	}
+
 	if (!RTW89_CHK_FW_FEATURE(SCAN_OFFLOAD, &rtwdev->fw))
 		return 1;
 
@@ -905,6 +928,13 @@ static void rtw89_ops_cancel_hw_scan(struct ieee80211_hw *hw,
 {
 	struct rtw89_dev *rtwdev = hw->priv;
 
+	{
+		extern void dump_stack(void);
+
+		printk("pk> %s:%d vif=%p\n", __func__, __LINE__, vif);
+		dump_stack();
+	}
+
 	if (!RTW89_CHK_FW_FEATURE(SCAN_OFFLOAD, &rtwdev->fw))
 		return;
 
diff --git a/drivers/net/wireless/realtek/rtw89/pci.c b/drivers/net/wireless/realtek/rtw89/pci.c
index 02afeb3acce4..c95fa9e66cc4 100644
--- a/drivers/net/wireless/realtek/rtw89/pci.c
+++ b/drivers/net/wireless/realtek/rtw89/pci.c
@@ -4160,6 +4160,8 @@ static int __maybe_unused rtw89_pci_suspend(struct device *dev)
 	struct rtw89_dev *rtwdev = hw->priv;
 	enum rtw89_core_chip_id chip_id = rtwdev->chip->chip_id;
 
+	printk("pk> %s:%d\n", __func__, __LINE__);
+
 	rtw89_write32_set(rtwdev, R_AX_RSV_CTRL, B_AX_WLOCK_1C_BIT6);
 	rtw89_write32_set(rtwdev, R_AX_RSV_CTRL, B_AX_R_DIS_PRST);
 	rtw89_write32_clr(rtwdev, R_AX_RSV_CTRL, B_AX_WLOCK_1C_BIT6);
@@ -4194,6 +4196,8 @@ static int __maybe_unused rtw89_pci_resume(struct device *dev)
 	struct rtw89_dev *rtwdev = hw->priv;
 	enum rtw89_core_chip_id chip_id = rtwdev->chip->chip_id;
 
+	printk("pk> %s:%d\n", __func__, __LINE__);
+
 	rtw89_write32_set(rtwdev, R_AX_RSV_CTRL, B_AX_WLOCK_1C_BIT6);
 	rtw89_write32_clr(rtwdev, R_AX_RSV_CTRL, B_AX_R_DIS_PRST);
 	rtw89_write32_clr(rtwdev, R_AX_RSV_CTRL, B_AX_WLOCK_1C_BIT6);
-- 
2.25.1

^ permalink raw reply related [flat|nested] 28+ messages in thread
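The debug patch above changes the default `rtw89_debug_mask` to 0x80021000. As a reading aid, here is a small userspace sketch that lists which bit positions such a mask sets. `set_bit_positions()` is a hypothetical helper written for this note and is not part of the driver. Which debug category each bit selects is defined in the driver sources and is deliberately not guessed at here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store the positions of the set bits of mask in out[] (lowest bit first)
 * and return how many were found. out must have room for 32 entries.
 */
static size_t set_bit_positions(uint32_t mask, unsigned int out[32])
{
	size_t n = 0;

	assert(out != NULL);
	for (unsigned int bit = 0; bit < 32; bit++) {
		if (mask & (UINT32_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}
```

Calling `set_bit_positions(0x80021000, bits)` returns 3 and fills in bits 12, 17 and 31, so the patch turns on three debug categories by default.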
end of thread, other threads:[~2024-07-12 0:59 UTC | newest]
Thread overview: 28+ messages
2024-07-08 15:55 [REGRESSION] Freeze on resume from S3 (bisected) Forty Five
[not found] <draft-87msmrdgkb.fsf@gmail.com>
2024-07-08 16:30 ` Forty Five
2024-07-09 1:26 ` Ping-Ke Shih
2024-07-09 4:10 ` Forty Five
2024-07-09 4:25 ` Ping-Ke Shih
2024-07-09 11:49 ` Forty Five
2024-07-11 7:54 ` Forty Five
2024-07-12 0:59 ` Ping-Ke Shih
-- strict thread matches above, loose matches on Subject: below --
2024-07-01 6:15 Forty Five
[not found] <875xtqjli4.fsf@gmail.com>
2024-06-30 19:20 ` Forty Five
2024-07-01 2:46 ` Ping-Ke Shih
2024-07-01 5:36 ` Ping-Ke Shih
2024-06-30 19:11 Forty Five
2024-07-03 7:39 ` Ping-Ke Shih
2024-06-19 4:39 Forty Five
2024-06-19 6:07 ` Ping-Ke Shih
2024-06-19 14:46 ` Forty Five
2024-06-20 8:16 ` Ping-Ke Shih
2024-06-20 8:56 ` Kalle Valo
2024-06-20 9:06 ` Ping-Ke Shih
2024-06-20 9:18 ` Mathew George
2024-06-20 9:33 ` Ping-Ke Shih
2024-06-20 10:05 ` Mathew George
2024-06-20 11:41 ` Ping-Ke Shih
2024-06-20 11:58 ` Johannes Berg
2024-06-20 13:05 ` Forty Five
2024-06-20 13:41 ` Forty Five
2024-06-28 3:55 ` Ping-Ke Shih