* [PATCH v1] Bluetooth: btintel_pcie: split coredump worker into per-trigger works
@ 2026-07-02 17:03 Kiran K
2026-07-02 18:48 ` [v1] " bluez.test.bot
0 siblings, 1 reply; 2+ messages in thread
From: Kiran K @ 2026-07-02 17:03 UTC (permalink / raw)
To: linux-bluetooth
Cc: ravishankar.srivatsa, chethan.tumkur.narayan,
chandrashekar.devegowda, Kiran K
btintel_pcie_coredump_worker() handled three unrelated jobs in one
work item: collect a DRAM trace coredump, read the hardware exception
event, and read the firmware-trigger event. The worker walked three
flag bits at runtime and each interrupt path mutated multiple bits
to communicate which sub-jobs the worker should run, which made the
ownership rules for those bits hard to reason about and entangled
the trigger reason with the in-progress accounting.
Replace the single combined worker with three single-purpose ones,
each owning exactly one flag:
coredump_work -> btintel_pcie_dump_traces()
guarded by COREDUMP_INPROGRESS
hwexp_work -> btintel_pcie_read_hwexp()
guarded by CORE_HALTED (already permanent until
re-probe; HWEXP_INPROGRESS is now redundant
and removed)
fwtrigger_work -> btintel_pcie_dump_fwtrigger_event()
guarded by FWTRIGGER_DUMP_INPROGRESS
All three workers are queued on a shared ordered workqueue (renamed
coredump_workqueue -> dump_workqueue) so a companion event reader
(hwexp/fwtrigger) and the coredump always run FIFO. Companion work
is queued before coredump_work so dmp_hdr.event_type/event_id are
populated by the time dump_traces() consumes them, preserving the
original ordering.
Introduce btintel_pcie_queue_coredump() to centralize the coredump
trigger contract: it is the single writer of COREDUMP_INPROGRESS and
of dmp_hdr.trigger_reason, sets both atomically against concurrent
triggers, and rolls back the bit if the workqueue is disabled
(reset/remove in progress) so a later trigger after re-probe can
succeed. All four trigger sites (HWEXP IRQ, FW-trigger IRQ,
devcoredump user trigger, resume() D0 error path) go through the
helper.
Per-work guard bits are now cleared at the tail of each worker
rather than in the middle of the combined worker, which closes a
subtle race where a duplicate IRQ could observe a cleared bit and
requeue while the previous pass was still finalizing
dev_coredumpv().
reset_work() and remove() now disable_work_sync() all three workers
and, on the FLR-failure path, enable_work() all three to keep their
disable counters balanced. The PLDR/FLR-success contract (re-probe
re-INIT_WORKs everything with counter 0) is preserved.
No functional change to the dump payloads; this is a pure
restructuring of the worker dispatch and its synchronization.
Signed-off-by: Kiran K <kiran.k@intel.com>
Assisted-by: GitHub-Copilot:claude-4.7-opus
---
drivers/bluetooth/btintel_pcie.c | 185 ++++++++++++++++++++-----------
drivers/bluetooth/btintel_pcie.h | 12 +-
2 files changed, 130 insertions(+), 67 deletions(-)
diff --git a/drivers/bluetooth/btintel_pcie.c b/drivers/bluetooth/btintel_pcie.c
index 2b7231be5973..013568197a39 100644
--- a/drivers/bluetooth/btintel_pcie.c
+++ b/drivers/bluetooth/btintel_pcie.c
@@ -1446,72 +1446,134 @@ static int btintel_pcie_dump_fwtrigger_event(struct btintel_pcie_data *data)
return err;
}
+/* Queue a coredump dump_traces() pass.
+ *
+ * Returns true if a new coredump was queued, false if one was already
+ * in-flight (the BTINTEL_PCIE_COREDUMP_INPROGRESS bit serves as the
+ * single-writer guard for the @coredump_work item) or the workqueue is
+ * disabled (reset / remove in progress).
+ *
+ * Always queue this AFTER any companion event-reader work (hwexp /
+ * fwtrigger) so that, on the ordered @dump_workqueue, the event reader
+ * runs first and populates dmp_hdr.event_type / event_id before
+ * dump_traces consumes them.
+ */
+static bool btintel_pcie_queue_coredump(struct btintel_pcie_data *data,
+ u16 trigger_reason)
+{
+ if (test_and_set_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags))
+ return false;
+
+ data->dmp_hdr.trigger_reason = trigger_reason;
+
+ if (queue_work(data->dump_workqueue, &data->coredump_work))
+ return true;
+
+ /* Workqueue is disabled (reset/remove drained it). Release the
+ * guard so a later trigger, after re-probe, can succeed.
+ */
+ clear_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags);
+ return false;
+}
+
static void btintel_pcie_msix_fw_trigger_handler(struct btintel_pcie_data *data)
{
bt_dev_dbg(data->hdev, "Received firmware smart trigger cause");
- if (test_and_set_bit(BTINTEL_PCIE_FWTRIGGER_DUMP_INPROGRESS, &data->flags))
+ /* Per-work guard: deduplicate concurrent FW-trigger interrupts.
+ * Cleared at the tail of btintel_pcie_fwtrigger_worker().
+ */
+ if (test_and_set_bit(BTINTEL_PCIE_FWTRIGGER_DUMP_INPROGRESS,
+ &data->flags))
return;
- /* Trigger device core dump when there is FW assert */
- if (!test_and_set_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags))
- data->dmp_hdr.trigger_reason = BTINTEL_PCIE_TRIGGER_REASON_FW_ASSERT;
+ if (!queue_work(data->dump_workqueue, &data->fwtrigger_work)) {
+ clear_bit(BTINTEL_PCIE_FWTRIGGER_DUMP_INPROGRESS, &data->flags);
+ return;
+ }
- queue_work(data->coredump_workqueue, &data->coredump_work);
+ /* Queue coredump after the fwtrigger event reader so dmp_hdr.event_*
+ * is populated before dump_traces consumes it.
+ */
+ btintel_pcie_queue_coredump(data, BTINTEL_PCIE_TRIGGER_REASON_FW_ASSERT);
}
static void btintel_pcie_msix_hw_exp_handler(struct btintel_pcie_data *data)
{
bt_dev_err(data->hdev, "Received hw exception interrupt");
+ /* CORE_HALTED is the single-writer guard for this handler. It is
+ * set once on first HW exception and cleared only by re-probe
+ * (data is reallocated), so it also serializes hwexp_work
+ * scheduling without needing a separate bit.
+ */
if (test_and_set_bit(BTINTEL_PCIE_CORE_HALTED, &data->flags))
return;
- if (test_and_set_bit(BTINTEL_PCIE_HWEXP_INPROGRESS, &data->flags))
- return;
-
- /* Trigger device core dump when there is HW exception */
- if (!test_and_set_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags))
- data->dmp_hdr.trigger_reason = BTINTEL_PCIE_TRIGGER_REASON_FW_ASSERT;
+ /* Queue companion coredump first so it is appended after hwexp_work
+ * on the ordered @dump_workqueue (preserves the original
+ * coredump-then-hwexp ordering).
+ */
+ btintel_pcie_queue_coredump(data, BTINTEL_PCIE_TRIGGER_REASON_FW_ASSERT);
- queue_work(data->coredump_workqueue, &data->coredump_work);
+ queue_work(data->dump_workqueue, &data->hwexp_work);
}
static void btintel_pcie_coredump_worker(struct work_struct *work)
{
struct btintel_pcie_data *data = container_of(work,
struct btintel_pcie_data, coredump_work);
- int err;
/* hdev is NULL until setup_hdev() succeeds, and is cleared on
* teardown after disable_work_sync() drains us; bail in that case.
*/
+ if (!data->hdev)
+ goto out;
+
+ btintel_pcie_dump_traces(data->hdev);
+out:
+ /* Release guard last so a new trigger can run only after this
+ * pass has fully completed (including dev_coredumpv()).
+ */
+ clear_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags);
+}
+
+static void btintel_pcie_hwexp_worker(struct work_struct *work)
+{
+ struct btintel_pcie_data *data = container_of(work,
+ struct btintel_pcie_data, hwexp_work);
+
if (!data->hdev)
return;
- if (test_bit(BTINTEL_PCIE_FWTRIGGER_DUMP_INPROGRESS, &data->flags)) {
- err = btintel_pcie_dump_fwtrigger_event(data);
- if (err)
- bt_dev_warn(data->hdev, "failed to log fwtrigger event");
- clear_bit(BTINTEL_PCIE_FWTRIGGER_DUMP_INPROGRESS, &data->flags);
- }
+ /* Unlike usb products, controller will not send hardware exception
+ * event on exception. Instead controller writes the hardware event
+ * to device memory along with optional debug events, raises MSIX
+ * and halts. Driver shall read the exception event from device
+ * memory and passes it to the stack for further processing.
+ *
+ * Re-entry is gated by BTINTEL_PCIE_CORE_HALTED in the IRQ
+ * handler, which is only cleared by re-probe; no per-work bit
+ * is needed here.
+ */
+ btintel_pcie_read_hwexp(data);
+}
- if (test_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags)) {
- btintel_pcie_dump_traces(data->hdev);
- clear_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags);
- }
+static void btintel_pcie_fwtrigger_worker(struct work_struct *work)
+{
+ struct btintel_pcie_data *data = container_of(work,
+ struct btintel_pcie_data, fwtrigger_work);
+ int err;
- if (test_bit(BTINTEL_PCIE_HWEXP_INPROGRESS, &data->flags)) {
- /* Unlike usb products, controller will not send hardware
- * exception event on exception. Instead controller writes the
- * hardware event to device memory along with optional debug
- * events, raises MSIX and halts. Driver shall read the
- * exception event from device memory and passes it stack for
- * further processing.
- */
- btintel_pcie_read_hwexp(data);
- clear_bit(BTINTEL_PCIE_HWEXP_INPROGRESS, &data->flags);
- }
+ if (!data->hdev)
+ goto out;
+
+ err = btintel_pcie_dump_fwtrigger_event(data);
+ if (err)
+ bt_dev_warn(data->hdev, "failed to log fwtrigger event");
+out:
+ /* Release guard last; matches set in fw_trigger handler. */
+ clear_bit(BTINTEL_PCIE_FWTRIGGER_DUMP_INPROGRESS, &data->flags);
}
static void btintel_pcie_rx_work(struct work_struct *work)
@@ -2650,20 +2712,22 @@ static void btintel_pcie_reset_work(struct work_struct *wk)
btintel_pcie_synchronize_irqs(data);
flush_work(&data->rx_work);
- /* Drain any in-flight coredump and block new ones across reset.
- * Safe from self-deadlock: coredump_work runs on a separate wq.
+ /* Drain any in-flight dump workers and block new ones across reset.
+ * Safe from self-deadlock: they all run on a separate wq.
*/
disable_work_sync(&data->coredump_work);
+ disable_work_sync(&data->hwexp_work);
+ disable_work_sync(&data->fwtrigger_work);
bt_dev_dbg(data->hdev, "Release bluetooth interface");
/* Both reset paths follow the same contract: on success they
* destroy 'data' via device_reprobe() (a fresh probe re-INIT_WORKs
- * the coredump_work with disable count 0), so enable_work() must
+ * the dump workers with disable count 0), so enable_work() must
* NOT be called on the success path. Only the FLR path can fail
* with 'data' still alive, in which case we balance the
- * disable_work_sync() above so a later successful reset is not
- * permanently blocked.
+ * disable_work_sync() calls above so a later successful reset is
+ * not permanently blocked.
*
* pci_lock_rescan_remove() (held above) serializes against PCI
* device addition/removal (hotplug), so no device can be added to
@@ -2674,8 +2738,11 @@ static void btintel_pcie_reset_work(struct work_struct *wk)
goto out;
}
- if (btintel_pcie_perform_flr(data))
+ if (btintel_pcie_perform_flr(data)) {
enable_work(&data->coredump_work);
+ enable_work(&data->hwexp_work);
+ enable_work(&data->fwtrigger_work);
+ }
out:
pci_dev_put(pdev);
@@ -2869,8 +2936,8 @@ static int btintel_pcie_probe(struct pci_dev *pdev,
if (!data->workqueue)
return -ENOMEM;
- data->coredump_workqueue = alloc_ordered_workqueue(KBUILD_MODNAME "_cd", 0);
- if (!data->coredump_workqueue) {
+ data->dump_workqueue = alloc_ordered_workqueue(KBUILD_MODNAME "_cd", 0);
+ if (!data->dump_workqueue) {
destroy_workqueue(data->workqueue);
return -ENOMEM;
}
@@ -2879,6 +2946,8 @@ static int btintel_pcie_probe(struct pci_dev *pdev,
INIT_WORK(&data->rx_work, btintel_pcie_rx_work);
INIT_WORK(&data->reset_work, btintel_pcie_reset_work);
INIT_WORK(&data->coredump_work, btintel_pcie_coredump_worker);
+ INIT_WORK(&data->hwexp_work, btintel_pcie_hwexp_worker);
+ INIT_WORK(&data->fwtrigger_work, btintel_pcie_fwtrigger_worker);
data->boot_stage_cache = 0x00;
data->img_resp_cache = 0x00;
@@ -2921,7 +2990,7 @@ static int btintel_pcie_probe(struct pci_dev *pdev,
/* reset device before exit */
btintel_pcie_reset_bt(data);
- destroy_workqueue(data->coredump_workqueue);
+ destroy_workqueue(data->dump_workqueue);
pci_clear_master(pdev);
@@ -2940,12 +3009,14 @@ static void btintel_pcie_remove(struct pci_dev *pdev)
return;
}
- /* Permanently block coredump triggers and drain the worker before
- * tearing down. Must run before cancel_work_sync(&reset_work) so
- * the disable counter stays >= 1 even after reset_work()'s
+ /* Permanently block all dump triggers and drain the workers before
+ * tearing down. Must run before disable_work_sync(&reset_work) so
+ * the disable counters stay >= 1 even after reset_work()'s
* balanced enable_work() (counter 2 -> 1, never reaching 0).
*/
disable_work_sync(&data->coredump_work);
+ disable_work_sync(&data->hwexp_work);
+ disable_work_sync(&data->fwtrigger_work);
/* Cancel pending reset work. Skip only when remove() is called from
* within the reset work itself (PLDR device_reprobe path) to avoid
@@ -2973,7 +3044,7 @@ static void btintel_pcie_remove(struct pci_dev *pdev)
btintel_pcie_release_hdev(data);
- destroy_workqueue(data->coredump_workqueue);
+ destroy_workqueue(data->dump_workqueue);
destroy_workqueue(data->workqueue);
btintel_pcie_free(data);
@@ -2992,16 +3063,8 @@ static void btintel_pcie_coredump(struct device *dev)
if (!data)
return;
- if (test_and_set_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags))
- return;
-
- data->dmp_hdr.trigger_reason = BTINTEL_PCIE_TRIGGER_REASON_USER_TRIGGER;
- /* queue_work() returns false if the work is disabled (reset or
- * remove in progress); clear the in-progress bit so a later
- * trigger can succeed once the work is re-enabled.
- */
- if (!queue_work(data->coredump_workqueue, &data->coredump_work))
- clear_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS, &data->flags);
+ btintel_pcie_queue_coredump(data,
+ BTINTEL_PCIE_TRIGGER_REASON_USER_TRIGGER);
}
#endif
@@ -3138,12 +3201,8 @@ static int btintel_pcie_resume(struct device *dev)
if (btintel_pcie_in_error(data) ||
btintel_pcie_in_device_halt(data)) {
bt_dev_err(data->hdev, "Controller in error state for D0 entry");
- if (!test_and_set_bit(BTINTEL_PCIE_COREDUMP_INPROGRESS,
- &data->flags)) {
- data->dmp_hdr.trigger_reason =
- BTINTEL_PCIE_TRIGGER_REASON_FW_ASSERT;
- queue_work(data->coredump_workqueue, &data->coredump_work);
- }
+ btintel_pcie_queue_coredump(data,
+ BTINTEL_PCIE_TRIGGER_REASON_FW_ASSERT);
set_bit(BTINTEL_PCIE_CORE_HALTED, &data->flags);
btintel_pcie_reset(data->hdev);
}
diff --git a/drivers/bluetooth/btintel_pcie.h b/drivers/bluetooth/btintel_pcie.h
index 7caee093e316..749369b24031 100644
--- a/drivers/bluetooth/btintel_pcie.h
+++ b/drivers/bluetooth/btintel_pcie.h
@@ -118,7 +118,6 @@ enum {
enum {
BTINTEL_PCIE_CORE_HALTED,
- BTINTEL_PCIE_HWEXP_INPROGRESS,
BTINTEL_PCIE_COREDUMP_INPROGRESS,
BTINTEL_PCIE_FWTRIGGER_DUMP_INPROGRESS,
BTINTEL_PCIE_RECOVERY_IN_PROGRESS,
@@ -466,8 +465,11 @@ struct btintel_pcie_dump_header {
* @workqueue: workqueue for RX work
* @rx_skb_q: SKB queue for RX packet
* @rx_work: RX work struct to process the RX packet in @rx_skb_q
- * @coredump_workqueue: dedicated workqueue for coredump collection
- * @coredump_work: work struct for coredump trace collection
+ * @dump_workqueue: dedicated ordered workqueue serializing the coredump,
+ * hardware exception, and firmware-trigger dump workers
+ * @coredump_work: work struct for DRAM trace coredump collection
+ * @hwexp_work: work struct for hardware exception event read
+ * @fwtrigger_work: work struct for firmware-triggered diagnostic event read
* @dma_pool: DMA pool for descriptors, index array and ci
* @dma_p_addr: DMA address for pool
* @dma_v_addr: address of pool
@@ -516,8 +518,10 @@ struct btintel_pcie_data {
struct work_struct rx_work;
struct work_struct reset_work;
- struct workqueue_struct *coredump_workqueue;
+ struct workqueue_struct *dump_workqueue;
struct work_struct coredump_work;
+ struct work_struct hwexp_work;
+ struct work_struct fwtrigger_work;
struct dma_pool *dma_pool;
dma_addr_t dma_p_addr;
--
2.54.0
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-07-02 18:48 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-02 17:03 [PATCH v1] Bluetooth: btintel_pcie: split coredump worker into per-trigger works Kiran K
2026-07-02 18:48 ` [v1] " bluez.test.bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox