* [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
@ 2026-05-13 18:55 Heitor Alves de Siqueira
2026-05-13 20:45 ` bluez.test.bot
2026-05-14 2:04 ` [PATCH] " Hillf Danton
0 siblings, 2 replies; 8+ messages in thread
From: Heitor Alves de Siqueira @ 2026-05-13 18:55 UTC (permalink / raw)
To: Marcel Holtmann, Luiz Augusto von Dentz, Gustavo Padovan
Cc: linux-bluetooth, linux-kernel, kernel-dev,
syzbot+97721dd81f792e838ba0
Syzbot reported a warning when L2CAP calls queue_work() on the hdev
workqueue while it's being drained. This can happen during device reset or
close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
- hci_dev_close_sync() clears the HCI_UP bit before draining
- hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
Add these checks before queuing tx_work, and free the SKB if it's not
queued for transmission.
Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
---
net/bluetooth/hci_core.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index c46c1236ebfa..5d5f8ad7d1a8 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hci_queue_acl(chan, &chan->data_q, skb, flags);
queue_work(hdev->workqueue, &hdev->tx_work);
@@ -3291,6 +3297,12 @@ void hci_send_sco(struct hci_conn *conn, struct sk_buff *skb)
BT_DBG("%s len %d", hdev->name, skb->len);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hdr.handle = cpu_to_le16(conn->handle);
hdr.dlen = skb->len;
@@ -3374,6 +3386,12 @@ void hci_send_iso(struct hci_conn *conn, struct sk_buff *skb)
BT_DBG("%s len %d", hdev->name, skb->len);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hci_queue_iso(conn, &conn->data_q, skb);
queue_work(hdev->workqueue, &hdev->tx_work);
---
base-commit: 1f63dd8ca0dc05a8272bb8155f643c691d29bb11
change-id: 20260513-hci_send-640290de7acc
Best regards,
--
Heitor Alves de Siqueira <halves@igalia.com>
^ permalink raw reply related [flat|nested] 8+ messages in thread
* RE: Bluetooth: hci_core: Don't queue tx_work while draining workqueue
2026-05-13 18:55 [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue Heitor Alves de Siqueira
@ 2026-05-13 20:45 ` bluez.test.bot
2026-05-14 2:04 ` [PATCH] " Hillf Danton
1 sibling, 0 replies; 8+ messages in thread
From: bluez.test.bot @ 2026-05-13 20:45 UTC (permalink / raw)
To: linux-bluetooth, halves
[-- Attachment #1: Type: text/plain, Size: 2039 bytes --]
This is automated email and please do not reply to this email!
Dear submitter,
Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=1094429
---Test result---
Test Summary:
CheckPatch PASS 0.74 seconds
GitLint FAIL 0.34 seconds
SubjectPrefix PASS 0.22 seconds
BuildKernel PASS 25.24 seconds
CheckAllWarning PASS 27.80 seconds
CheckSparse PASS 26.72 seconds
BuildKernel32 PASS 24.56 seconds
TestRunnerSetup PASS 528.86 seconds
TestRunner_l2cap-tester PASS 374.94 seconds
TestRunner_iso-tester PASS 604.66 seconds
TestRunner_bnep-tester PASS 19.00 seconds
TestRunner_mgmt-tester PASS 2024.13 seconds
TestRunner_rfcomm-tester PASS 63.77 seconds
TestRunner_sco-tester PASS 141.62 seconds
TestRunner_ioctl-tester PASS 134.21 seconds
TestRunner_mesh-tester PASS 59.93 seconds
TestRunner_smp-tester PASS 18.06 seconds
TestRunner_userchan-tester PASS 19.33 seconds
TestRunner_6lowpan-tester PASS 51.00 seconds
IncrementalBuild PASS 24.65 seconds
Details
##############################
Test: GitLint - FAIL
Desc: Run gitlint
Output:
Bluetooth: hci_core: Don't queue tx_work while draining workqueue
WARNING: I3 - ignore-body-lines: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your ignore-body-lines.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search
27: B2 Line has trailing whitespace: "-- "
https://github.com/bluez/bluetooth-next/pull/185
---
Regards,
Linux Bluetooth
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
2026-05-13 18:55 [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue Heitor Alves de Siqueira
2026-05-13 20:45 ` bluez.test.bot
@ 2026-05-14 2:04 ` Hillf Danton
2026-05-14 14:53 ` Heitor Alves de Siqueira
1 sibling, 1 reply; 8+ messages in thread
From: Hillf Danton @ 2026-05-14 2:04 UTC (permalink / raw)
To: Heitor Alves de Siqueira
Cc: Marcel Holtmann, Luiz Augusto von Dentz, Gustavo Padovan,
linux-bluetooth, linux-kernel, kernel-dev, syzkaller-bugs,
syzbot+97721dd81f792e838ba0
On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> workqueue while it's being drained. This can happen during device reset or
> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>
> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> - hci_dev_close_sync() clears the HCI_UP bit before draining
> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>
> Add these checks before queuing tx_work, and free the SKB if it's not
> queued for transmission.
>
> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> ---
> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index c46c1236ebfa..5d5f8ad7d1a8 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>
> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>
> + if (!test_bit(HCI_UP, &hdev->flags) ||
> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> + kfree_skb(skb);
> + return;
> + }
> +
> hci_queue_acl(chan, &chan->data_q, skb, flags);
>
> queue_work(hdev->workqueue, &hdev->tx_work);
>
What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
2026-05-14 2:04 ` [PATCH] " Hillf Danton
@ 2026-05-14 14:53 ` Heitor Alves de Siqueira
2026-05-15 13:33 ` Luiz Augusto von Dentz
0 siblings, 1 reply; 8+ messages in thread
From: Heitor Alves de Siqueira @ 2026-05-14 14:53 UTC (permalink / raw)
To: Hillf Danton
Cc: Marcel Holtmann, Luiz Augusto von Dentz, Gustavo Padovan,
linux-bluetooth, linux-kernel, kernel-dev, syzkaller-bugs,
syzbot+97721dd81f792e838ba0
On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> workqueue while it's being drained. This can happen during device reset or
>> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>>
>> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>> - hci_dev_close_sync() clears the HCI_UP bit before draining
>> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>>
>> Add these checks before queuing tx_work, and free the SKB if it's not
>> queued for transmission.
>>
>> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> ---
>> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> --- a/net/bluetooth/hci_core.c
>> +++ b/net/bluetooth/hci_core.c
>> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>>
>> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>>
>> + if (!test_bit(HCI_UP, &hdev->flags) ||
>> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> + kfree_skb(skb);
>> + return;
>> + }
>> +
>> hci_queue_acl(chan, &chan->data_q, skb, flags);
>>
>> queue_work(hdev->workqueue, &hdev->tx_work);
>>
> What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
I see, I missed the RCU guards for the device flags. Sorry about that,
I'll add them to v2.
Thanks for the catch!
Best,
Heitor
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
2026-05-14 14:53 ` Heitor Alves de Siqueira
@ 2026-05-15 13:33 ` Luiz Augusto von Dentz
2026-05-18 15:21 ` Heitor Alves de Siqueira
0 siblings, 1 reply; 8+ messages in thread
From: Luiz Augusto von Dentz @ 2026-05-15 13:33 UTC (permalink / raw)
To: Heitor Alves de Siqueira
Cc: Hillf Danton, Marcel Holtmann, Gustavo Padovan, linux-bluetooth,
linux-kernel, kernel-dev, syzkaller-bugs,
syzbot+97721dd81f792e838ba0
Hi Heitor,
On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
<halves@igalia.com> wrote:
>
> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> >> workqueue while it's being drained. This can happen during device reset or
> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
> >>
> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
> >>
> >> Add these checks before queuing tx_work, and free the SKB if it's not
> >> queued for transmission.
> >>
> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> >> ---
> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> >> 1 file changed, 18 insertions(+)
> >>
> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
> >> --- a/net/bluetooth/hci_core.c
> >> +++ b/net/bluetooth/hci_core.c
> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
> >>
> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
> >>
> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> >> + kfree_skb(skb);
> >> + return;
> >> + }
> >> +
> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
> >>
> >> queue_work(hdev->workqueue, &hdev->tx_work);
> >>
> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>
> I see, I missed the RCU guards for the device flags. Sorry about that,
> I'll add them to v2.
> Thanks for the catch!
Actually this whole thing might be because we try to clean the
workqueue without actually closing the hdev, so I suspect that if we
just remove all the code from hci_dev_do_reset with
hci_dev_do_close+hci_dev_do_open, it might work better and align with
how things work over the MGMT interface:
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index c46c1236ebfa..6782bbc9b6a7 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
hci_req_sync_lock(hdev);
- /* Drop queues */
- skb_queue_purge(&hdev->rx_q);
- skb_queue_purge(&hdev->cmd_q);
-
- /* Cancel these to avoid queueing non-chained pending work */
- hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
- /* Wait for
- *
- * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
- * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
- *
- * inside RCU section to see the flag or complete scheduling.
- */
- synchronize_rcu();
- /* Explicitly cancel works in case scheduled after setting the flag. */
- cancel_delayed_work(&hdev->cmd_timer);
- cancel_delayed_work(&hdev->ncmd_timer);
-
- /* Avoid potential lockdep warnings from the *_flush() calls by
- * ensuring the workqueue is empty up front.
- */
- drain_workqueue(hdev->workqueue);
-
- hci_dev_lock(hdev);
- hci_inquiry_cache_flush(hdev);
- hci_conn_hash_flush(hdev);
- hci_dev_unlock(hdev);
-
- if (hdev->flush)
- hdev->flush(hdev);
-
- hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
-
- atomic_set(&hdev->cmd_cnt, 1);
- hdev->acl_cnt = 0;
- hdev->sco_cnt = 0;
- hdev->le_cnt = 0;
- hdev->iso_cnt = 0;
+ ret = hci_dev_close_sync(hdev);
+ if (ret)
+ goto done;
- ret = hci_reset_sync(hdev);
+ ret = hci_dev_open_sync(hdev);
+done:
hci_req_sync_unlock(hdev);
return ret;
}
Seem to work here and as a side effect we get notified over the MGMT
when a user uses hciconfig hci0 reset:
# tools/hciconfig hci0 reset
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
has been disabled
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
has been enabled
bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
> Best,
> Heitor
--
Luiz Augusto von Dentz
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
2026-05-15 13:33 ` Luiz Augusto von Dentz
@ 2026-05-18 15:21 ` Heitor Alves de Siqueira
2026-05-18 20:48 ` Luiz Augusto von Dentz
0 siblings, 1 reply; 8+ messages in thread
From: Heitor Alves de Siqueira @ 2026-05-18 15:21 UTC (permalink / raw)
To: Luiz Augusto von Dentz
Cc: Hillf Danton, Marcel Holtmann, Gustavo Padovan, linux-bluetooth,
linux-kernel, kernel-dev, syzkaller-bugs,
syzbot+97721dd81f792e838ba0
Hi Luiz,
On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
> Hi Heitor,
>
> On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
> <halves@igalia.com> wrote:
>>
>> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
>> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> >> workqueue while it's being drained. This can happen during device reset or
>> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>> >>
>> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
>> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>> >>
>> >> Add these checks before queuing tx_work, and free the SKB if it's not
>> >> queued for transmission.
>> >>
>> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> >> ---
>> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>> >> 1 file changed, 18 insertions(+)
>> >>
>> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> >> --- a/net/bluetooth/hci_core.c
>> >> +++ b/net/bluetooth/hci_core.c
>> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>> >>
>> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>> >>
>> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
>> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> >> + kfree_skb(skb);
>> >> + return;
>> >> + }
>> >> +
>> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
>> >>
>> >> queue_work(hdev->workqueue, &hdev->tx_work);
>> >>
>> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
>> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>>
>> I see, I missed the RCU guards for the device flags. Sorry about that,
>> I'll add them to v2.
>> Thanks for the catch!
>
> Actually this whole thing might be because we try to clean the
> workqueue without actually closing the hdev, so I suspect that if we
> just remove all the code from hci_dev_do_reset with
> hci_dev_do_close+hci_dev_do_open, it might work better and align with
> how things work over the MGMT interface:
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index c46c1236ebfa..6782bbc9b6a7 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
>
> hci_req_sync_lock(hdev);
>
> - /* Drop queues */
> - skb_queue_purge(&hdev->rx_q);
> - skb_queue_purge(&hdev->cmd_q);
> -
> - /* Cancel these to avoid queueing non-chained pending work */
> - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> - /* Wait for
> - *
> - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> - * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> - *
> - * inside RCU section to see the flag or complete scheduling.
> - */
> - synchronize_rcu();
> - /* Explicitly cancel works in case scheduled after setting the flag. */
> - cancel_delayed_work(&hdev->cmd_timer);
> - cancel_delayed_work(&hdev->ncmd_timer);
> -
> - /* Avoid potential lockdep warnings from the *_flush() calls by
> - * ensuring the workqueue is empty up front.
> - */
> - drain_workqueue(hdev->workqueue);
> -
> - hci_dev_lock(hdev);
> - hci_inquiry_cache_flush(hdev);
> - hci_conn_hash_flush(hdev);
> - hci_dev_unlock(hdev);
> -
> - if (hdev->flush)
> - hdev->flush(hdev);
> -
> - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> -
> - atomic_set(&hdev->cmd_cnt, 1);
> - hdev->acl_cnt = 0;
> - hdev->sco_cnt = 0;
> - hdev->le_cnt = 0;
> - hdev->iso_cnt = 0;
> + ret = hci_dev_close_sync(hdev);
> + if (ret)
> + goto done;
>
> - ret = hci_reset_sync(hdev);
> + ret = hci_dev_open_sync(hdev);
>
> +done:
> hci_req_sync_unlock(hdev);
> return ret;
> }
>
> Seem to work here and as a side effect we get notified over the MGMT
> when a user uses hciconfig hci0 reset:
>
> # tools/hciconfig hci0 reset
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
> bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
> bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
> has been disabled
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
> bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
> has been enabled
> bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
> bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
Thank you for the review, and for the suggestion! There seems to be a
number of similar syzbot reports for the hdev workqueue, so maybe the
close+open approach is a simpler solution indeed. I've originally
considered implementing a wrapper/helper for workqueue submission in
hci_core.c, but if we can eliminate the race condition altogether that'd
be even better.
My only concern is that there seem to be slight differences between
what hci_dev_do_reset() does now and what we'd get with
hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
suitable for the stable kernels? And if not, do you think the checks
from v1 would be appropriate, in that case?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
2026-05-18 15:21 ` Heitor Alves de Siqueira
@ 2026-05-18 20:48 ` Luiz Augusto von Dentz
2026-05-20 14:26 ` Heitor Alves de Siqueira
0 siblings, 1 reply; 8+ messages in thread
From: Luiz Augusto von Dentz @ 2026-05-18 20:48 UTC (permalink / raw)
To: Heitor Alves de Siqueira
Cc: Hillf Danton, Marcel Holtmann, Gustavo Padovan, linux-bluetooth,
linux-kernel, kernel-dev, syzkaller-bugs,
syzbot+97721dd81f792e838ba0
Hi Heitor,
On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira
<halves@igalia.com> wrote:
>
> Hi Luiz,
>
> On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
> > Hi Heitor,
> >
> > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
> > <halves@igalia.com> wrote:
> >>
> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> >> >> workqueue while it's being drained. This can happen during device reset or
> >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
> >> >>
> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> >> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
> >> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
> >> >>
> >> >> Add these checks before queuing tx_work, and free the SKB if it's not
> >> >> queued for transmission.
> >> >>
> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> >> >> ---
> >> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> >> >> 1 file changed, 18 insertions(+)
> >> >>
> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
> >> >> --- a/net/bluetooth/hci_core.c
> >> >> +++ b/net/bluetooth/hci_core.c
> >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
> >> >>
> >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
> >> >>
> >> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
> >> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> >> >> + kfree_skb(skb);
> >> >> + return;
> >> >> + }
> >> >> +
> >> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
> >> >>
> >> >> queue_work(hdev->workqueue, &hdev->tx_work);
> >> >>
> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
> >>
> >> I see, I missed the RCU guards for the device flags. Sorry about that,
> >> I'll add them to v2.
> >> Thanks for the catch!
> >
> > Actually this whole thing might be because we try to clean the
> > workqueue without actually closing the hdev, so I suspect that if we
> > just remove all the code from hci_dev_do_reset with
> > hci_dev_do_close+hci_dev_do_open, it might work better and align with
> > how things work over the MGMT interface:
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index c46c1236ebfa..6782bbc9b6a7 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
> >
> > hci_req_sync_lock(hdev);
> >
> > - /* Drop queues */
> > - skb_queue_purge(&hdev->rx_q);
> > - skb_queue_purge(&hdev->cmd_q);
> > -
> > - /* Cancel these to avoid queueing non-chained pending work */
> > - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > - /* Wait for
> > - *
> > - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > - * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> > - *
> > - * inside RCU section to see the flag or complete scheduling.
> > - */
> > - synchronize_rcu();
> > - /* Explicitly cancel works in case scheduled after setting the flag. */
> > - cancel_delayed_work(&hdev->cmd_timer);
> > - cancel_delayed_work(&hdev->ncmd_timer);
> > -
> > - /* Avoid potential lockdep warnings from the *_flush() calls by
> > - * ensuring the workqueue is empty up front.
> > - */
> > - drain_workqueue(hdev->workqueue);
> > -
> > - hci_dev_lock(hdev);
> > - hci_inquiry_cache_flush(hdev);
> > - hci_conn_hash_flush(hdev);
> > - hci_dev_unlock(hdev);
> > -
> > - if (hdev->flush)
> > - hdev->flush(hdev);
> > -
> > - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > -
> > - atomic_set(&hdev->cmd_cnt, 1);
> > - hdev->acl_cnt = 0;
> > - hdev->sco_cnt = 0;
> > - hdev->le_cnt = 0;
> > - hdev->iso_cnt = 0;
> > + ret = hci_dev_close_sync(hdev);
> > + if (ret)
> > + goto done;
> >
> > - ret = hci_reset_sync(hdev);
> > + ret = hci_dev_open_sync(hdev);
> >
> > +done:
> > hci_req_sync_unlock(hdev);
> > return ret;
> > }
> >
> > Seem to work here and as a side effect we get notified over the MGMT
> > when a user uses hciconfig hci0 reset:
> >
> > # tools/hciconfig hci0 reset
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> > bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
> > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
> > has been disabled
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
> > has been enabled
> > bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
> > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
>
> Thank you for the review, and for the suggestion! There seems to be a
> number of similar syzbot reports for the hdev workqueue, so maybe the
> close+open approach is a simpler solution indeed. I've originally
> considered implementing a wrapper/helper for workqueue submission in
> hci_core.c, but if we can eliminate the race condition altogether that'd
> be even better.
>
> My only concern is that there seem to be slight differences between
> what hci_dev_do_reset() does now and what we'd get with
> hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
> HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
> suitable for the stable kernels? And if not, do you think the checks
> from v1 would be appropriate, in that case?
I think, the HCI_Reset is a heavy hammer because it's a global reset.
Getting all the pieces to cooperate has been causing all sort of
issues, and in any case I think what hci_dev_do_reset does is simply
wrong or outdated; for instance it doesn't seem to affect LE, such as
cancelling scanning or advertising. This is likely because interfaces
like MGMT don't use it and require a power cycle to reset.
--
Luiz Augusto von Dentz
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
2026-05-18 20:48 ` Luiz Augusto von Dentz
@ 2026-05-20 14:26 ` Heitor Alves de Siqueira
0 siblings, 0 replies; 8+ messages in thread
From: Heitor Alves de Siqueira @ 2026-05-20 14:26 UTC (permalink / raw)
To: Luiz Augusto von Dentz, Heitor Alves de Siqueira
Cc: Hillf Danton, Marcel Holtmann, Gustavo Padovan, linux-bluetooth,
linux-kernel, kernel-dev, syzkaller-bugs,
syzbot+97721dd81f792e838ba0
On Mon May 18, 2026 at 5:48 PM -03, Luiz Augusto von Dentz wrote:
> Hi Heitor,
>
> On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira
> <halves@igalia.com> wrote:
>>
>> Hi Luiz,
>>
>> On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
>> > Hi Heitor,
>> >
>> > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
>> > <halves@igalia.com> wrote:
>> >>
>> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
>> >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> >> >> workqueue while it's being drained. This can happen during device reset or
>> >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>> >> >>
>> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>> >> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
>> >> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>> >> >>
>> >> >> Add these checks before queuing tx_work, and free the SKB if it's not
>> >> >> queued for transmission.
>> >> >>
>> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> >> >> ---
>> >> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>> >> >> 1 file changed, 18 insertions(+)
>> >> >>
>> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> >> >> --- a/net/bluetooth/hci_core.c
>> >> >> +++ b/net/bluetooth/hci_core.c
>> >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>> >> >>
>> >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>> >> >>
>> >> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
>> >> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> >> >> + kfree_skb(skb);
>> >> >> + return;
>> >> >> + }
>> >> >> +
>> >> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
>> >> >>
>> >> >> queue_work(hdev->workqueue, &hdev->tx_work);
>> >> >>
>> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
>> >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>> >>
>> >> I see, I missed the RCU guards for the device flags. Sorry about that,
>> >> I'll add them to v2.
>> >> Thanks for the catch!
>> >
>> > Actually this whole thing might be because we try to clean the
>> > workqueue without actually closing the hdev, so I suspect that if we
>> > just remove all the code from hci_dev_do_reset with
>> > hci_dev_do_close+hci_dev_do_open, it might work better and align with
>> > how things work over the MGMT interface:
>> >
>> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> > index c46c1236ebfa..6782bbc9b6a7 100644
>> > --- a/net/bluetooth/hci_core.c
>> > +++ b/net/bluetooth/hci_core.c
>> > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
>> >
>> > hci_req_sync_lock(hdev);
>> >
>> > - /* Drop queues */
>> > - skb_queue_purge(&hdev->rx_q);
>> > - skb_queue_purge(&hdev->cmd_q);
>> > -
>> > - /* Cancel these to avoid queueing non-chained pending work */
>> > - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>> > - /* Wait for
>> > - *
>> > - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
>> > - * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
>> > - *
>> > - * inside RCU section to see the flag or complete scheduling.
>> > - */
>> > - synchronize_rcu();
>> > - /* Explicitly cancel works in case scheduled after setting the flag. */
>> > - cancel_delayed_work(&hdev->cmd_timer);
>> > - cancel_delayed_work(&hdev->ncmd_timer);
>> > -
>> > - /* Avoid potential lockdep warnings from the *_flush() calls by
>> > - * ensuring the workqueue is empty up front.
>> > - */
>> > - drain_workqueue(hdev->workqueue);
>> > -
>> > - hci_dev_lock(hdev);
>> > - hci_inquiry_cache_flush(hdev);
>> > - hci_conn_hash_flush(hdev);
>> > - hci_dev_unlock(hdev);
>> > -
>> > - if (hdev->flush)
>> > - hdev->flush(hdev);
>> > -
>> > - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>> > -
>> > - atomic_set(&hdev->cmd_cnt, 1);
>> > - hdev->acl_cnt = 0;
>> > - hdev->sco_cnt = 0;
>> > - hdev->le_cnt = 0;
>> > - hdev->iso_cnt = 0;
>> > + ret = hci_dev_close_sync(hdev);
>> > + if (ret)
>> > + goto done;
>> >
>> > - ret = hci_reset_sync(hdev);
>> > + ret = hci_dev_open_sync(hdev);
>> >
>> > +done:
>> > hci_req_sync_unlock(hdev);
>> > return ret;
>> > }
>> >
>> > Seem to work here and as a side effect we get notified over the MGMT
>> > when a user uses hciconfig hci0 reset:
>> >
>> > # tools/hciconfig hci0 reset
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
>> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
>> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
>> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
>> > bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
>> > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
>> > has been disabled
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
>> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
>> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
>> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
>> > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
>> > has been enabled
>> > bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
>> > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
>>
>> Thank you for the review, and for the suggestion! There seems to be a
>> number of similar syzbot reports for the hdev workqueue, so maybe the
>> close+open approach is a simpler solution indeed. I've originally
>> considered implementing a wrapper/helper for workqueue submission in
>> hci_core.c, but if we can eliminate the race condition altogether that'd
>> be even better.
>>
>> My only concern is that there seem to be slight differences between
>> what hci_dev_do_reset() does now and what we'd get with
>> hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
>> HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
>> suitable for the stable kernels? And if not, do you think the checks
>> from v1 would be appropriate, in that case?
>
> I think, the HCI_Reset is a heavy hammer because it's a global reset.
> Getting all the pieces to cooperate has been causing all sort of
> issues, and in any case I think what hci_dev_do_reset does is simply
> wrong or outdated; for instance it doesn't seem to affect LE, such as
> cancelling scanning or advertising. This is likely because interfaces
> like MGMT don't use it and require a power cycle to reset.
ACK, that makes sense. Please disregard this patch, I'll send a
follow-up with the reset changes and any missing bits from the current
async reset path.
Thank you for the pointers, Luiz!
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-05-20 14:26 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-13 18:55 [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue Heitor Alves de Siqueira
2026-05-13 20:45 ` bluez.test.bot
2026-05-14 2:04 ` [PATCH] " Hillf Danton
2026-05-14 14:53 ` Heitor Alves de Siqueira
2026-05-15 13:33 ` Luiz Augusto von Dentz
2026-05-18 15:21 ` Heitor Alves de Siqueira
2026-05-18 20:48 ` Luiz Augusto von Dentz
2026-05-20 14:26 ` Heitor Alves de Siqueira
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.