* [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
@ 2026-05-13 18:55 Heitor Alves de Siqueira
2026-05-13 20:45 ` bluez.test.bot
2026-05-14 2:04 ` [PATCH] " Hillf Danton
0 siblings, 2 replies; 8+ messages in thread
From: Heitor Alves de Siqueira @ 2026-05-13 18:55 UTC (permalink / raw)
To: Marcel Holtmann, Luiz Augusto von Dentz, Gustavo Padovan
Cc: linux-bluetooth, linux-kernel, kernel-dev,
syzbot+97721dd81f792e838ba0
Syzbot reported a warning when L2CAP calls queue_work() on the hdev
workqueue while it's being drained. This can happen during device reset or
close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
- hci_dev_close_sync() clears the HCI_UP bit before draining
- hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
Add these checks before queuing tx_work, and free the SKB if it's not
queued for transmission.
Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
---
net/bluetooth/hci_core.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index c46c1236ebfa..5d5f8ad7d1a8 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hci_queue_acl(chan, &chan->data_q, skb, flags);
queue_work(hdev->workqueue, &hdev->tx_work);
@@ -3291,6 +3297,12 @@ void hci_send_sco(struct hci_conn *conn, struct sk_buff *skb)
BT_DBG("%s len %d", hdev->name, skb->len);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hdr.handle = cpu_to_le16(conn->handle);
hdr.dlen = skb->len;
@@ -3374,6 +3386,12 @@ void hci_send_iso(struct hci_conn *conn, struct sk_buff *skb)
BT_DBG("%s len %d", hdev->name, skb->len);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hci_queue_iso(conn, &conn->data_q, skb);
queue_work(hdev->workqueue, &hdev->tx_work);
---
base-commit: 1f63dd8ca0dc05a8272bb8155f643c691d29bb11
change-id: 20260513-hci_send-640290de7acc
Best regards,
--
Heitor Alves de Siqueira <halves@igalia.com>
^ permalink raw reply related [flat|nested] 8+ messages in thread* RE: Bluetooth: hci_core: Don't queue tx_work while draining workqueue 2026-05-13 18:55 [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue Heitor Alves de Siqueira @ 2026-05-13 20:45 ` bluez.test.bot 2026-05-14 2:04 ` [PATCH] " Hillf Danton 1 sibling, 0 replies; 8+ messages in thread From: bluez.test.bot @ 2026-05-13 20:45 UTC (permalink / raw) To: linux-bluetooth, halves [-- Attachment #1: Type: text/plain, Size: 2039 bytes --] This is automated email and please do not reply to this email! Dear submitter, Thank you for submitting the patches to the linux bluetooth mailing list. This is a CI test results with your patch series: PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=1094429 ---Test result--- Test Summary: CheckPatch PASS 0.74 seconds GitLint FAIL 0.34 seconds SubjectPrefix PASS 0.22 seconds BuildKernel PASS 25.24 seconds CheckAllWarning PASS 27.80 seconds CheckSparse PASS 26.72 seconds BuildKernel32 PASS 24.56 seconds TestRunnerSetup PASS 528.86 seconds TestRunner_l2cap-tester PASS 374.94 seconds TestRunner_iso-tester PASS 604.66 seconds TestRunner_bnep-tester PASS 19.00 seconds TestRunner_mgmt-tester PASS 2024.13 seconds TestRunner_rfcomm-tester PASS 63.77 seconds TestRunner_sco-tester PASS 141.62 seconds TestRunner_ioctl-tester PASS 134.21 seconds TestRunner_mesh-tester PASS 59.93 seconds TestRunner_smp-tester PASS 18.06 seconds TestRunner_userchan-tester PASS 19.33 seconds TestRunner_6lowpan-tester PASS 51.00 seconds IncrementalBuild PASS 24.65 seconds Details ############################## Test: GitLint - FAIL Desc: Run gitlint Output: Bluetooth: hci_core: Don't queue tx_work while draining workqueue WARNING: I3 - ignore-body-lines: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your ignore-body-lines.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search 27: B2 Line has trailing whitespace: "-- " https://github.com/bluez/bluetooth-next/pull/185 --- Regards, Linux Bluetooth ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue 2026-05-13 18:55 [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue Heitor Alves de Siqueira 2026-05-13 20:45 ` bluez.test.bot @ 2026-05-14 2:04 ` Hillf Danton 2026-05-14 14:53 ` Heitor Alves de Siqueira 1 sibling, 1 reply; 8+ messages in thread From: Hillf Danton @ 2026-05-14 2:04 UTC (permalink / raw) To: Heitor Alves de Siqueira Cc: Marcel Holtmann, Luiz Augusto von Dentz, Gustavo Padovan, linux-bluetooth, linux-kernel, kernel-dev, syzkaller-bugs, syzbot+97721dd81f792e838ba0 On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote: > Syzbot reported a warning when L2CAP calls queue_work() on the hdev > workqueue while it's being drained. This can happen during device reset or > close paths for hci_send_acl(), hci_send_sco() and hci_send_iso(). > > The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync(): > - hci_dev_close_sync() clears the HCI_UP bit before draining > - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining > > Add these checks before queuing tx_work, and free the SKB if it's not > queued for transmission. > > Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue") > Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0 > Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> > --- > net/bluetooth/hci_core.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > index c46c1236ebfa..5d5f8ad7d1a8 100644 > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags) > > BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags); > > + if (!test_bit(HCI_UP, &hdev->flags) || > + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) { > + kfree_skb(skb); > + return; > + } > + > hci_queue_acl(chan, &chan->data_q, skb, flags); > > queue_work(hdev->workqueue, &hdev->tx_work); > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is checked in hci_cmd_work(), and in hci_dev_do_reset() for why. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue 2026-05-14 2:04 ` [PATCH] " Hillf Danton @ 2026-05-14 14:53 ` Heitor Alves de Siqueira 2026-05-15 13:33 ` Luiz Augusto von Dentz 0 siblings, 1 reply; 8+ messages in thread From: Heitor Alves de Siqueira @ 2026-05-14 14:53 UTC (permalink / raw) To: Hillf Danton Cc: Marcel Holtmann, Luiz Augusto von Dentz, Gustavo Padovan, linux-bluetooth, linux-kernel, kernel-dev, syzkaller-bugs, syzbot+97721dd81f792e838ba0 On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote: > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote: >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev >> workqueue while it's being drained. This can happen during device reset or >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso(). >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync(): >> - hci_dev_close_sync() clears the HCI_UP bit before draining >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining >> >> Add these checks before queuing tx_work, and free the SKB if it's not >> queued for transmission. >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue") >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0 >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> >> --- >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c >> index c46c1236ebfa..5d5f8ad7d1a8 100644 >> --- a/net/bluetooth/hci_core.c >> +++ b/net/bluetooth/hci_core.c >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags) >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags); >> >> + if (!test_bit(HCI_UP, &hdev->flags) || >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) { >> + kfree_skb(skb); >> + return; >> + } >> + >> hci_queue_acl(chan, &chan->data_q, skb, flags); >> >> queue_work(hdev->workqueue, &hdev->tx_work); >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is > checked in hci_cmd_work(), and in hci_dev_do_reset() for why. I see, I missed the RCU guards for the device flags. Sorry about that, I'll add them to v2. Thanks for the catch! Best, Heitor ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue 2026-05-14 14:53 ` Heitor Alves de Siqueira @ 2026-05-15 13:33 ` Luiz Augusto von Dentz 2026-05-18 15:21 ` Heitor Alves de Siqueira 0 siblings, 1 reply; 8+ messages in thread From: Luiz Augusto von Dentz @ 2026-05-15 13:33 UTC (permalink / raw) To: Heitor Alves de Siqueira Cc: Hillf Danton, Marcel Holtmann, Gustavo Padovan, linux-bluetooth, linux-kernel, kernel-dev, syzkaller-bugs, syzbot+97721dd81f792e838ba0 Hi Heitor, On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira <halves@igalia.com> wrote: > > On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote: > > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote: > >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev > >> workqueue while it's being drained. This can happen during device reset or > >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso(). > >> > >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync(): > >> - hci_dev_close_sync() clears the HCI_UP bit before draining > >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining > >> > >> Add these checks before queuing tx_work, and free the SKB if it's not > >> queued for transmission. > >> > >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue") > >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com > >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0 > >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> > >> --- > >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++ > >> 1 file changed, 18 insertions(+) > >> > >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > >> index c46c1236ebfa..5d5f8ad7d1a8 100644 > >> --- a/net/bluetooth/hci_core.c > >> +++ b/net/bluetooth/hci_core.c > >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags) > >> > >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags); > >> > >> + if (!test_bit(HCI_UP, &hdev->flags) || > >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) { > >> + kfree_skb(skb); > >> + return; > >> + } > >> + > >> hci_queue_acl(chan, &chan->data_q, skb, flags); > >> > >> queue_work(hdev->workqueue, &hdev->tx_work); > >> > > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is > > checked in hci_cmd_work(), and in hci_dev_do_reset() for why. > > I see, I missed the RCU guards for the device flags. Sorry about that, > I'll add them to v2. > Thanks for the catch! Actually this whole thing might be because we try to clean the workqueue without actually closing the hdev, so I suspect that if we just remove all the code from hci_dev_do_reset with hci_dev_do_close+hci_dev_do_open, it might work better and align with how things work over the MGMT interface: diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index c46c1236ebfa..6782bbc9b6a7 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev) hci_req_sync_lock(hdev); - /* Drop queues */ - skb_queue_purge(&hdev->rx_q); - skb_queue_purge(&hdev->cmd_q); - - /* Cancel these to avoid queueing non-chained pending work */ - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); - /* Wait for - * - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) - * queue_delayed_work(&hdev->{cmd,ncmd}_timer) - * - * inside RCU section to see the flag or complete scheduling. - */ - synchronize_rcu(); - /* Explicitly cancel works in case scheduled after setting the flag. */ - cancel_delayed_work(&hdev->cmd_timer); - cancel_delayed_work(&hdev->ncmd_timer); - - /* Avoid potential lockdep warnings from the *_flush() calls by - * ensuring the workqueue is empty up front. - */ - drain_workqueue(hdev->workqueue); - - hci_dev_lock(hdev); - hci_inquiry_cache_flush(hdev); - hci_conn_hash_flush(hdev); - hci_dev_unlock(hdev); - - if (hdev->flush) - hdev->flush(hdev); - - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); - - atomic_set(&hdev->cmd_cnt, 1); - hdev->acl_cnt = 0; - hdev->sco_cnt = 0; - hdev->le_cnt = 0; - hdev->iso_cnt = 0; + ret = hci_dev_close_sync(hdev); + if (ret) + goto done; - ret = hci_reset_sync(hdev); + ret = hci_dev_open_sync(hdev); +done: hci_req_sync_unlock(hdev); return ret; } Seem to work here and as a side effect we get notified over the MGMT when a user uses hciconfig hci0 reset: # tools/hciconfig hci0 reset bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006 bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0 bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001 bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000 bluetoothd[34]: src/adapter.c:cancel_passive_scanning() bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0 has been disabled bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008 bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006 bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1 bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001 bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000 bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0 has been enabled bluetoothd[34]: src/adapter.c:trigger_passive_scanning() bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged > Best, > Heitor -- Luiz Augusto von Dentz ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue 2026-05-15 13:33 ` Luiz Augusto von Dentz @ 2026-05-18 15:21 ` Heitor Alves de Siqueira 2026-05-18 20:48 ` Luiz Augusto von Dentz 0 siblings, 1 reply; 8+ messages in thread From: Heitor Alves de Siqueira @ 2026-05-18 15:21 UTC (permalink / raw) To: Luiz Augusto von Dentz Cc: Hillf Danton, Marcel Holtmann, Gustavo Padovan, linux-bluetooth, linux-kernel, kernel-dev, syzkaller-bugs, syzbot+97721dd81f792e838ba0 Hi Luiz, On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote: > Hi Heitor, > > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira > <halves@igalia.com> wrote: >> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote: >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote: >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev >> >> workqueue while it's being drained. This can happen during device reset or >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso(). >> >> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync(): >> >> - hci_dev_close_sync() clears the HCI_UP bit before draining >> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining >> >> >> >> Add these checks before queuing tx_work, and free the SKB if it's not >> >> queued for transmission. >> >> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue") >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0 >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> >> >> --- >> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++ >> >> 1 file changed, 18 insertions(+) >> >> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644 >> >> --- a/net/bluetooth/hci_core.c >> >> +++ b/net/bluetooth/hci_core.c >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags) >> >> >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags); >> >> >> >> + if (!test_bit(HCI_UP, &hdev->flags) || >> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) { >> >> + kfree_skb(skb); >> >> + return; >> >> + } >> >> + >> >> hci_queue_acl(chan, &chan->data_q, skb, flags); >> >> >> >> queue_work(hdev->workqueue, &hdev->tx_work); >> >> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why. >> >> I see, I missed the RCU guards for the device flags. Sorry about that, >> I'll add them to v2. >> Thanks for the catch! > > Actually this whole thing might be because we try to clean the > workqueue without actually closing the hdev, so I suspect that if we > just remove all the code from hci_dev_do_reset with > hci_dev_do_close+hci_dev_do_open, it might work better and align with > how things work over the MGMT interface: > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > index c46c1236ebfa..6782bbc9b6a7 100644 > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev) > > hci_req_sync_lock(hdev); > > - /* Drop queues */ > - skb_queue_purge(&hdev->rx_q); > - skb_queue_purge(&hdev->cmd_q); > - > - /* Cancel these to avoid queueing non-chained pending work */ > - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); > - /* Wait for > - * > - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) > - * queue_delayed_work(&hdev->{cmd,ncmd}_timer) > - * > - * inside RCU section to see the flag or complete scheduling. > - */ > - synchronize_rcu(); > - /* Explicitly cancel works in case scheduled after setting the flag. */ > - cancel_delayed_work(&hdev->cmd_timer); > - cancel_delayed_work(&hdev->ncmd_timer); > - > - /* Avoid potential lockdep warnings from the *_flush() calls by > - * ensuring the workqueue is empty up front. > - */ > - drain_workqueue(hdev->workqueue); > - > - hci_dev_lock(hdev); > - hci_inquiry_cache_flush(hdev); > - hci_conn_hash_flush(hdev); > - hci_dev_unlock(hdev); > - > - if (hdev->flush) > - hdev->flush(hdev); > - > - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); > - > - atomic_set(&hdev->cmd_cnt, 1); > - hdev->acl_cnt = 0; > - hdev->sco_cnt = 0; > - hdev->le_cnt = 0; > - hdev->iso_cnt = 0; > + ret = hci_dev_close_sync(hdev); > + if (ret) > + goto done; > > - ret = hci_reset_sync(hdev); > + ret = hci_dev_open_sync(hdev); > > +done: > hci_req_sync_unlock(hdev); > return ret; > } > > Seem to work here and as a side effect we get notified over the MGMT > when a user uses hciconfig hci0 reset: > > # tools/hciconfig hci0 reset > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006 > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0 > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001 > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000 > bluetoothd[34]: src/adapter.c:cancel_passive_scanning() > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0 > has been disabled > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008 > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006 > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1 > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001 > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000 > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0 > has been enabled > bluetoothd[34]: src/adapter.c:trigger_passive_scanning() > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged Thank you for the review, and for the suggestion! There seems to be a number of similar syzbot reports for the hdev workqueue, so maybe the close+open approach is a simpler solution indeed. I've originally considered implementing a wrapper/helper for workqueue submission in hci_core.c, but if we can eliminate the race condition altogether that'd be even better. My only concern is that there seem to be slight differences between what hci_dev_do_reset() does now and what we'd get with hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be suitable for the stable kernels? And if not, do you think the checks from v1 would be appropriate, in that case? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue 2026-05-18 15:21 ` Heitor Alves de Siqueira @ 2026-05-18 20:48 ` Luiz Augusto von Dentz 2026-05-20 14:26 ` Heitor Alves de Siqueira 0 siblings, 1 reply; 8+ messages in thread From: Luiz Augusto von Dentz @ 2026-05-18 20:48 UTC (permalink / raw) To: Heitor Alves de Siqueira Cc: Hillf Danton, Marcel Holtmann, Gustavo Padovan, linux-bluetooth, linux-kernel, kernel-dev, syzkaller-bugs, syzbot+97721dd81f792e838ba0 Hi Heitor, On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira <halves@igalia.com> wrote: > > Hi Luiz, > > On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote: > > Hi Heitor, > > > > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira > > <halves@igalia.com> wrote: > >> > >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote: > >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote: > >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev > >> >> workqueue while it's being drained. This can happen during device reset or > >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso(). > >> >> > >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync(): > >> >> - hci_dev_close_sync() clears the HCI_UP bit before draining > >> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining > >> >> > >> >> Add these checks before queuing tx_work, and free the SKB if it's not > >> >> queued for transmission. > >> >> > >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue") > >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com > >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0 > >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> > >> >> --- > >> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++ > >> >> 1 file changed, 18 insertions(+) > >> >> > >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644 > >> >> --- a/net/bluetooth/hci_core.c > >> >> +++ b/net/bluetooth/hci_core.c > >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags) > >> >> > >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags); > >> >> > >> >> + if (!test_bit(HCI_UP, &hdev->flags) || > >> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) { > >> >> + kfree_skb(skb); > >> >> + return; > >> >> + } > >> >> + > >> >> hci_queue_acl(chan, &chan->data_q, skb, flags); > >> >> > >> >> queue_work(hdev->workqueue, &hdev->tx_work); > >> >> > >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is > >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why. > >> > >> I see, I missed the RCU guards for the device flags. Sorry about that, > >> I'll add them to v2. > >> Thanks for the catch! > > > > Actually this whole thing might be because we try to clean the > > workqueue without actually closing the hdev, so I suspect that if we > > just remove all the code from hci_dev_do_reset with > > hci_dev_do_close+hci_dev_do_open, it might work better and align with > > how things work over the MGMT interface: > > > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > index c46c1236ebfa..6782bbc9b6a7 100644 > > --- a/net/bluetooth/hci_core.c > > +++ b/net/bluetooth/hci_core.c > > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev) > > > > hci_req_sync_lock(hdev); > > > > - /* Drop queues */ > > - skb_queue_purge(&hdev->rx_q); > > - skb_queue_purge(&hdev->cmd_q); > > - > > - /* Cancel these to avoid queueing non-chained pending work */ > > - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); > > - /* Wait for > > - * > > - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) > > - * queue_delayed_work(&hdev->{cmd,ncmd}_timer) > > - * > > - * inside RCU section to see the flag or complete scheduling. > > - */ > > - synchronize_rcu(); > > - /* Explicitly cancel works in case scheduled after setting the flag. */ > > - cancel_delayed_work(&hdev->cmd_timer); > > - cancel_delayed_work(&hdev->ncmd_timer); > > - > > - /* Avoid potential lockdep warnings from the *_flush() calls by > > - * ensuring the workqueue is empty up front. > > - */ > > - drain_workqueue(hdev->workqueue); > > - > > - hci_dev_lock(hdev); > > - hci_inquiry_cache_flush(hdev); > > - hci_conn_hash_flush(hdev); > > - hci_dev_unlock(hdev); > > - > > - if (hdev->flush) > > - hdev->flush(hdev); > > - > > - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); > > - > > - atomic_set(&hdev->cmd_cnt, 1); > > - hdev->acl_cnt = 0; > > - hdev->sco_cnt = 0; > > - hdev->le_cnt = 0; > > - hdev->iso_cnt = 0; > > + ret = hci_dev_close_sync(hdev); > > + if (ret) > > + goto done; > > > > - ret = hci_reset_sync(hdev); > > + ret = hci_dev_open_sync(hdev); > > > > +done: > > hci_req_sync_unlock(hdev); > > return ret; > > } > > > > Seem to work here and as a side effect we get notified over the MGMT > > when a user uses hciconfig hci0 reset: > > > > # tools/hciconfig hci0 reset > > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006 > > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0 > > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001 > > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000 > > bluetoothd[34]: src/adapter.c:cancel_passive_scanning() > > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0 > > has been disabled > > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008 > > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006 > > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1 > > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001 > > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000 > > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0 > > has been enabled > > bluetoothd[34]: src/adapter.c:trigger_passive_scanning() > > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged > > Thank you for the review, and for the suggestion! There seems to be a > number of similar syzbot reports for the hdev workqueue, so maybe the > close+open approach is a simpler solution indeed. I've originally > considered implementing a wrapper/helper for workqueue submission in > hci_core.c, but if we can eliminate the race condition altogether that'd > be even better. > > My only concern is that there seem to be slight differences between > what hci_dev_do_reset() does now and what we'd get with > hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use > HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be > suitable for the stable kernels? And if not, do you think the checks > from v1 would be appropriate, in that case? I think, the HCI_Reset is a heavy hammer because it's a global reset. Getting all the pieces to cooperate has been causing all sort of issues, and in any case I think what hci_dev_do_reset does is simply wrong or outdated; for instance it doesn't seem to affect LE, such as cancelling scanning or advertising. This is likely because interfaces like MGMT don't use it and require a power cycle to reset. -- Luiz Augusto von Dentz ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue 2026-05-18 20:48 ` Luiz Augusto von Dentz @ 2026-05-20 14:26 ` Heitor Alves de Siqueira 0 siblings, 0 replies; 8+ messages in thread From: Heitor Alves de Siqueira @ 2026-05-20 14:26 UTC (permalink / raw) To: Luiz Augusto von Dentz, Heitor Alves de Siqueira Cc: Hillf Danton, Marcel Holtmann, Gustavo Padovan, linux-bluetooth, linux-kernel, kernel-dev, syzkaller-bugs, syzbot+97721dd81f792e838ba0 On Mon May 18, 2026 at 5:48 PM -03, Luiz Augusto von Dentz wrote: > Hi Heitor, > > On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira > <halves@igalia.com> wrote: >> >> Hi Luiz, >> >> On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote: >> > Hi Heitor, >> > >> > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira >> > <halves@igalia.com> wrote: >> >> >> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote: >> >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote: >> >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev >> >> >> workqueue while it's being drained. This can happen during device reset or >> >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso(). >> >> >> >> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync(): >> >> >> - hci_dev_close_sync() clears the HCI_UP bit before draining >> >> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining >> >> >> >> >> >> Add these checks before queuing tx_work, and free the SKB if it's not >> >> >> queued for transmission. >> >> >> >> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue") >> >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com >> >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0 >> >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> >> >> >> --- >> >> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++ >> >> >> 1 file changed, 18 insertions(+) >> >> >> >> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c >> >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644 >> >> >> --- a/net/bluetooth/hci_core.c >> >> >> +++ b/net/bluetooth/hci_core.c >> >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags) >> >> >> >> >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags); >> >> >> >> >> >> + if (!test_bit(HCI_UP, &hdev->flags) || >> >> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) { >> >> >> + kfree_skb(skb); >> >> >> + return; >> >> >> + } >> >> >> + >> >> >> hci_queue_acl(chan, &chan->data_q, skb, flags); >> >> >> >> >> >> queue_work(hdev->workqueue, &hdev->tx_work); >> >> >> >> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is >> >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why. >> >> >> >> I see, I missed the RCU guards for the device flags. Sorry about that, >> >> I'll add them to v2. >> >> Thanks for the catch! >> > >> > Actually this whole thing might be because we try to clean the >> > workqueue without actually closing the hdev, so I suspect that if we >> > just remove all the code from hci_dev_do_reset with >> > hci_dev_do_close+hci_dev_do_open, it might work better and align with >> > how things work over the MGMT interface: >> > >> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c >> > index c46c1236ebfa..6782bbc9b6a7 100644 >> > --- a/net/bluetooth/hci_core.c >> > +++ b/net/bluetooth/hci_core.c >> > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev) >> > >> > hci_req_sync_lock(hdev); >> > >> > - /* Drop queues */ >> > - skb_queue_purge(&hdev->rx_q); >> > - skb_queue_purge(&hdev->cmd_q); >> > - >> > - /* Cancel these to avoid queueing non-chained pending work */ >> > - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); >> > - /* Wait for >> > - * >> > - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) >> > - * queue_delayed_work(&hdev->{cmd,ncmd}_timer) >> > - * >> > - * inside RCU section to see the flag or complete scheduling. >> > - */ >> > - synchronize_rcu(); >> > - /* Explicitly cancel works in case scheduled after setting the flag. */ >> > - cancel_delayed_work(&hdev->cmd_timer); >> > - cancel_delayed_work(&hdev->ncmd_timer); >> > - >> > - /* Avoid potential lockdep warnings from the *_flush() calls by >> > - * ensuring the workqueue is empty up front. >> > - */ >> > - drain_workqueue(hdev->workqueue); >> > - >> > - hci_dev_lock(hdev); >> > - hci_inquiry_cache_flush(hdev); >> > - hci_conn_hash_flush(hdev); >> > - hci_dev_unlock(hdev); >> > - >> > - if (hdev->flush) >> > - hdev->flush(hdev); >> > - >> > - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); >> > - >> > - atomic_set(&hdev->cmd_cnt, 1); >> > - hdev->acl_cnt = 0; >> > - hdev->sco_cnt = 0; >> > - hdev->le_cnt = 0; >> > - hdev->iso_cnt = 0; >> > + ret = hci_dev_close_sync(hdev); >> > + if (ret) >> > + goto done; >> > >> > - ret = hci_reset_sync(hdev); >> > + ret = hci_dev_open_sync(hdev); >> > >> > +done: >> > hci_req_sync_unlock(hdev); >> > return ret; >> > } >> > >> > Seem to work here and as a side effect we get notified over the MGMT >> > when a user uses hciconfig hci0 reset: >> > >> > # tools/hciconfig hci0 reset >> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006 >> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0 >> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001 >> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000 >> > bluetoothd[34]: src/adapter.c:cancel_passive_scanning() >> > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0 >> > has been disabled >> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008 >> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006 >> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1 >> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001 >> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000 >> > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0 >> > has been enabled >> > bluetoothd[34]: src/adapter.c:trigger_passive_scanning() >> > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged >> >> Thank you for the review, and for the suggestion! There seems to be a >> number of similar syzbot reports for the hdev workqueue, so maybe the >> close+open approach is a simpler solution indeed. I've originally >> considered implementing a wrapper/helper for workqueue submission in >> hci_core.c, but if we can eliminate the race condition altogether that'd >> be even better. >> >> My only concern is that there seem to be slight differences between >> what hci_dev_do_reset() does now and what we'd get with >> hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use >> HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be >> suitable for the stable kernels? And if not, do you think the checks >> from v1 would be appropriate, in that case? > > I think, the HCI_Reset is a heavy hammer because it's a global reset. > Getting all the pieces to cooperate has been causing all sort of > issues, and in any case I think what hci_dev_do_reset does is simply > wrong or outdated; for instance it doesn't seem to affect LE, such as > cancelling scanning or advertising. This is likely because interfaces > like MGMT don't use it and require a power cycle to reset. ACK, that makes sense. Please disregard this patch, I'll send a follow-up with the reset changes and any missing bits from the current async reset path. Thank you for the pointers, Luiz! ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-05-20 14:26 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-13 18:55 [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue Heitor Alves de Siqueira 2026-05-13 20:45 ` bluez.test.bot 2026-05-14 2:04 ` [PATCH] " Hillf Danton 2026-05-14 14:53 ` Heitor Alves de Siqueira 2026-05-15 13:33 ` Luiz Augusto von Dentz 2026-05-18 15:21 ` Heitor Alves de Siqueira 2026-05-18 20:48 ` Luiz Augusto von Dentz 2026-05-20 14:26 ` Heitor Alves de Siqueira
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.