public inbox for linux-bluetooth@vger.kernel.org
* [PATCH v4] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR
@ 2026-03-27  8:32 Shuai Zhang
  2026-03-27  9:34 ` [v4] " bluez.test.bot
  2026-03-27 17:51 ` [PATCH v4] " Luiz Augusto von Dentz
  0 siblings, 2 replies; 3+ messages in thread
From: Shuai Zhang @ 2026-03-27  8:32 UTC
  To: Bartosz Golaszewski, Marcel Holtmann, Luiz Augusto von Dentz
  Cc: linux-arm-msm, linux-bluetooth, linux-kernel, cheng.jiang,
	quic_chezhou, wei.deng, jinwang.li, mengshi.wu, shuai.zhang,
	Shuai Zhang, Bartosz Golaszewski

From: Shuai Zhang <quic_shuaz@quicinc.com>

When the Bluetooth controller encounters a coredump, it triggers
the Subsystem Restart (SSR) mechanism. The controller first
reports the coredump data, and once the data upload is complete,
it sends a hw_error event. The host relies on this event to
proceed with subsequent recovery actions.

If the host has not finished processing the coredump data when the
hw_error event is received, it sets a timer and waits until either the
data processing is complete or the timeout expires before handling the
event.

The current implementation lacks a wakeup trigger. As a result,
even if the coredump data has already been processed, the host
continues to wait until the timer expires, causing unnecessary
delays in handling the hw_error event.

To fix this issue, replace `clear_bit()` with `clear_and_wake_up_bit()`
where the host finishes processing the coredump data. This ensures that
the waiting thread is promptly notified and can proceed to handle the
hw_error event without waiting for the timeout.
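
The effect of the missing wakeup can be reproduced in userspace. The sketch below is only an analogy, not driver code: a pthread condition variable stands in for the kernel bit waitqueue, and every name in it (`struct memdump_flag`, `collector`, `wait_for_memdump_ms`) is hypothetical. With `wake_on_clear` false it models a plain `clear_bit()`, so the waiter sleeps for the full timeout; with it true it models `clear_and_wake_up_bit()`, so the waiter returns soon after the flag is cleared.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the QCA_MEMDUMP_COLLECTION flag. */
struct memdump_flag {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool collecting;      /* true while the dump is being processed */
	bool wake_on_clear;   /* false: like clear_bit(); true: like clear_and_wake_up_bit() */
	long clear_delay_ms;  /* how long the "memdump work" takes */
};

/* Plays the role of the memdump worker: finish, clear the flag, maybe wake. */
static void *collector(void *arg)
{
	struct memdump_flag *f = arg;
	struct timespec delay = {
		.tv_sec = f->clear_delay_ms / 1000,
		.tv_nsec = (f->clear_delay_ms % 1000) * 1000000L,
	};

	nanosleep(&delay, NULL);

	pthread_mutex_lock(&f->lock);
	f->collecting = false;
	if (f->wake_on_clear)
		pthread_cond_broadcast(&f->cond);
	pthread_mutex_unlock(&f->lock);
	return NULL;
}

/* Plays the role of the hw_error path; returns how long it was blocked, in ms. */
long wait_for_memdump_ms(bool wake_on_clear, long clear_delay_ms, long timeout_ms)
{
	struct memdump_flag f = {
		.collecting = true,
		.wake_on_clear = wake_on_clear,
		.clear_delay_ms = clear_delay_ms,
	};
	struct timespec start, end, deadline;
	pthread_t tid;

	pthread_mutex_init(&f.lock, NULL);
	pthread_cond_init(&f.cond, NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_create(&tid, NULL, collector, &f);

	pthread_mutex_lock(&f.lock);
	/* The flag is rechecked only after a wakeup or when the timeout fires. */
	while (f.collecting) {
		if (pthread_cond_timedwait(&f.cond, &f.lock, &deadline) != 0)
			break; /* timed out */
	}
	pthread_mutex_unlock(&f.lock);

	clock_gettime(CLOCK_MONOTONIC, &end);

	pthread_join(tid, NULL);
	pthread_cond_destroy(&f.cond);
	pthread_mutex_destroy(&f.lock);

	return (end.tv_sec - start.tv_sec) * 1000L +
	       (end.tv_nsec - start.tv_nsec) / 1000000L;
}
```

Calling `wait_for_memdump_ms(false, 50, 600)` should block for roughly the whole 600 ms timeout even though the flag is cleared after about 50 ms, while `wait_for_memdump_ms(true, 50, 600)` should return shortly after the flag is cleared. Older toolchains may need `-pthread` when building.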

Test case:
- Trigger controller coredump using the command: `hcitool cmd 0x3f 0c 26`.
- Use `btmon` to capture HCI logs.
- Observe the time interval between receiving the hw_error event
  and the execution of the power-off sequence in the HCI log.

Signed-off-by: Shuai Zhang <quic_shuaz@quicinc.com>
Link: https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
---
Changes v4:
- add Acked-by signoff
- Link to v3
  https://lore.kernel.org/all/20251107033924.3707495-1-quic_shuaz@quicinc.com/

Changes v3:
- add Fixes tag
- Link to v2
  https://lore.kernel.org/all/20251106140103.1406081-1-quic_shuaz@quicinc.com/

Changes v2:
- Split timeout conversion into a separate patch.
- Clarified commit messages and added test case description.
- Link to v1
  https://lore.kernel.org/all/20251104112601.2670019-1-quic_shuaz@quicinc.com/
---
 drivers/bluetooth/hci_qca.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index c17a462ae..228a754a9 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1108,7 +1108,7 @@ static void qca_controller_memdump(struct work_struct *work)
 				qca->qca_memdump = NULL;
 				qca->memdump_state = QCA_MEMDUMP_COLLECTED;
 				cancel_delayed_work(&qca->ctrl_memdump_timeout);
-				clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
+				clear_and_wake_up_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
 				clear_bit(QCA_IBS_DISABLED, &qca->flags);
 				mutex_unlock(&qca->hci_memdump_lock);
 				return;
@@ -1186,7 +1186,7 @@ static void qca_controller_memdump(struct work_struct *work)
 			kfree(qca->qca_memdump);
 			qca->qca_memdump = NULL;
 			qca->memdump_state = QCA_MEMDUMP_COLLECTED;
-			clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
+			clear_and_wake_up_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
 		}
 
 		mutex_unlock(&qca->hci_memdump_lock);
-- 
2.34.1



* RE: [v4] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR
  2026-03-27  8:32 [PATCH v4] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR Shuai Zhang
@ 2026-03-27  9:34 ` bluez.test.bot
  2026-03-27 17:51 ` [PATCH v4] " Luiz Augusto von Dentz
  1 sibling, 0 replies; 3+ messages in thread
From: bluez.test.bot @ 2026-03-27  9:34 UTC
  To: linux-bluetooth, shuai.zhang

[-- Attachment #1: Type: text/plain, Size: 2833 bytes --]

This is an automated email; please do not reply to it.

Dear submitter,

Thank you for submitting the patches to the Linux Bluetooth mailing list.
These are the CI test results for your patch series:
PW Link: https://patchwork.kernel.org/project/bluetooth/list/?series=1073417

---Test result---

Test Summary:
CheckPatch                    PENDING   0.34 seconds
GitLint                       PENDING   0.23 seconds
SubjectPrefix                 PASS      0.12 seconds
BuildKernel                   PASS      26.06 seconds
CheckAllWarning               PASS      28.27 seconds
CheckSparse                   PASS      27.47 seconds
BuildKernel32                 PASS      25.09 seconds
TestRunnerSetup               PASS      568.15 seconds
TestRunner_l2cap-tester       PASS      28.15 seconds
TestRunner_iso-tester         FAIL      31.79 seconds
TestRunner_bnep-tester        PASS      6.39 seconds
TestRunner_mgmt-tester        FAIL      113.48 seconds
TestRunner_rfcomm-tester      PASS      9.45 seconds
TestRunner_sco-tester         FAIL      14.35 seconds
TestRunner_ioctl-tester       PASS      10.19 seconds
TestRunner_mesh-tester        FAIL      12.43 seconds
TestRunner_smp-tester         PASS      8.58 seconds
TestRunner_userchan-tester    PASS      6.87 seconds
IncrementalBuild              PENDING   0.58 seconds

Details
##############################
Test: CheckPatch - PENDING
Desc: Run checkpatch.pl script
Output:

##############################
Test: GitLint - PENDING
Desc: Run gitlint
Output:

##############################
Test: TestRunner_iso-tester - FAIL
Desc: Run iso-tester with test-runner
Output:
BUG: KASAN: slab-use-after-free in le_read_features_complete+0x7e/0x2b0
Total: 141, Passed: 141 (100.0%), Failed: 0, Not Run: 0
##############################
Test: TestRunner_mgmt-tester - FAIL
Desc: Run mgmt-tester with test-runner
Output:
Total: 494, Passed: 489 (99.0%), Failed: 1, Not Run: 4

Failed Test Cases
Read Exp Feature - Success                           Failed       0.105 seconds
##############################
Test: TestRunner_sco-tester - FAIL
Desc: Run sco-tester with test-runner
Output:
WARNING: possible circular locking dependency detected
BUG: sleeping function called from invalid context at net/core/sock.c:3782
Total: 30, Passed: 30 (100.0%), Failed: 0, Not Run: 0
##############################
Test: TestRunner_mesh-tester - FAIL
Desc: Run mesh-tester with test-runner
Output:
Total: 10, Passed: 8 (80.0%), Failed: 2, Not Run: 0

Failed Test Cases
Mesh - Send cancel - 1                               Timed out    2.747 seconds
Mesh - Send cancel - 2                               Timed out    1.997 seconds
##############################
Test: IncrementalBuild - PENDING
Desc: Incremental build with the patches in the series
Output:



---
Regards,
Linux Bluetooth



* Re: [PATCH v4] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR
  2026-03-27  8:32 [PATCH v4] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR Shuai Zhang
  2026-03-27  9:34 ` [v4] " bluez.test.bot
@ 2026-03-27 17:51 ` Luiz Augusto von Dentz
  1 sibling, 0 replies; 3+ messages in thread
From: Luiz Augusto von Dentz @ 2026-03-27 17:51 UTC
  To: Shuai Zhang
  Cc: Bartosz Golaszewski, Marcel Holtmann, linux-arm-msm,
	linux-bluetooth, linux-kernel, cheng.jiang, quic_chezhou,
	wei.deng, jinwang.li, mengshi.wu, Shuai Zhang,
	Bartosz Golaszewski

Hi Shuai,

On Fri, Mar 27, 2026 at 4:33 AM Shuai Zhang
<shuai.zhang@oss.qualcomm.com> wrote:
>
> From: Shuai Zhang <quic_shuaz@quicinc.com>
>
> When the Bluetooth controller encounters a coredump, it triggers
> the Subsystem Restart (SSR) mechanism. The controller first
> reports the coredump data, and once the data upload is complete,
> it sends a hw_error event. The host relies on this event to
> proceed with subsequent recovery actions.
>
> If the host has not finished processing the coredump data when the
> hw_error event is received, it sets a timer and waits until either the
> data processing is complete or the timeout expires before handling the
> event.
>
> The current implementation lacks a wakeup trigger. As a result,
> even if the coredump data has already been processed, the host
> continues to wait until the timer expires, causing unnecessary
> delays in handling the hw_error event.
>
> To fix this issue, replace `clear_bit()` with `clear_and_wake_up_bit()`
> where the host finishes processing the coredump data. This ensures that
> the waiting thread is promptly notified and can proceed to handle the
> hw_error event without waiting for the timeout.
>
> Test case:
> - Trigger controller coredump using the command: `hcitool cmd 0x3f 0c 26`.
> - Use `btmon` to capture HCI logs.
> - Observe the time interval between receiving the hw_error event
>   and the execution of the power-off sequence in the HCI log.
>
> Signed-off-by: Shuai Zhang <quic_shuaz@quicinc.com>
> Link: https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com
> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> ---
> Changes v4:
> - add Acked-by signoff
> - Link to v3
>   https://lore.kernel.org/all/20251107033924.3707495-1-quic_shuaz@quicinc.com/
>
> Changes v3:
> - add Fixes tag
> - Link to v2
>   https://lore.kernel.org/all/20251106140103.1406081-1-quic_shuaz@quicinc.com/
>
> Changes v2:
> - Split timeout conversion into a separate patch.
> - Clarified commit messages and added test case description.
> - Link to v1
>   https://lore.kernel.org/all/20251104112601.2670019-1-quic_shuaz@quicinc.com/
> ---
>  drivers/bluetooth/hci_qca.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index c17a462ae..228a754a9 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -1108,7 +1108,7 @@ static void qca_controller_memdump(struct work_struct *work)
>                                 qca->qca_memdump = NULL;
>                                 qca->memdump_state = QCA_MEMDUMP_COLLECTED;
>                                 cancel_delayed_work(&qca->ctrl_memdump_timeout);
> -                               clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
> +                               clear_and_wake_up_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
>                                 clear_bit(QCA_IBS_DISABLED, &qca->flags);
>                                 mutex_unlock(&qca->hci_memdump_lock);
>                                 return;
> @@ -1186,7 +1186,7 @@ static void qca_controller_memdump(struct work_struct *work)
>                         kfree(qca->qca_memdump);
>                         qca->qca_memdump = NULL;
>                         qca->memdump_state = QCA_MEMDUMP_COLLECTED;
> -                       clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
> +                       clear_and_wake_up_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
>                 }
>
>                 mutex_unlock(&qca->hci_memdump_lock);
> --
> 2.34.1

https://sashiko.dev/#/patchset/20260327083258.1398450-1-shuai.zhang%40oss.qualcomm.com

I am not saying the feedback is actually valid, but if other parts of
the code still use clear_bit(QCA_MEMDUMP_COLLECTION, ...), then perhaps
they should be updated as well?

-- 
Luiz Augusto von Dentz

