All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
@ 2026-03-23  3:15 Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 01/10] PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset Koichiro Den
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

This series fixes doorbell bit/vector handling for the EPF-based NTB
pair (ntb_hw_epf <-> pci-epf-*ntb). Its primary goal is to enable safe
per-db-vector handling in the NTB core and clients (e.g. ntb_transport),
without changing the on-the-wire doorbell mapping.


Background / problem
====================

ntb_hw_epf historically applies an extra offset when ringing peer
doorbells: the link event uses the first interrupt slot, and doorbells
start from the third slot (i.e. a second slot is effectively unused).
pci-epf-vntb carries the matching offset on the EP side as well.

As long as db_vector_count()/db_vector_mask() are not implemented, this
mismatch is mostly masked. Doorbell events are effectively treated as
"can hit any QP" and the off-by-one vector numbering does not surface
clearly.

However, once per-vector handling is enabled, the current state becomes
problematic:

  - db_valid_mask exposes bits that do not correspond to real doorbells
    (link/unused slots leak into the mask).
  - ntb_db_event() is fed with 1-based/shifted vectors, while NTB core
    expects a 0-based db_vector for doorbells.
  - On pci-epf-vntb, .peer_db_set() may be called in atomic context, but
    it directly calls pci_epc_raise_irq(), which can sleep.


Why NOT fix the root offset?
============================

The natural "root" fix would be to remove the historical extra offset in
the peer_db_set() doorbell paths for ntb_hw_epf and pci-epf-vntb.
Unfortunately this would lead to interoperability issues when mixing old
and new kernel versions (old/new peers). A new side would ring a
different interrupt slot than what an old peer expects, leading to
missed or misrouted doorbells, once db_vector_count()/db_vector_mask()
are implemented.

Therefore this series intentionally keeps the legacy offset, and instead
fixes the surrounding pieces so the mapping is documented and handled
consistently in masks, vector numbering, and per-vector reporting.


What this series does
=====================

- pci-epf-vntb:

  - Document the legacy offset.
  - Defer MSI doorbell raises to process context to avoid sleeping in
    atomic context. This becomes relevant once multiple doorbells are
    raised concurrently at a high rate.
  - Report doorbell vectors as 0-based to ntb_db_event().
  - Fix db_valid_mask and implement db_vector_count()/db_vector_mask().

- ntb_hw_epf:

  - Document the legacy offset in ntb_epf_peer_db_set().
  - Fix db_valid_mask to cover only real doorbell bits.
  - Report 0-based db_vector to ntb_db_event() (accounting for the
    unused slot).
  - Keep db_val as a bitmask and fix db_read/db_clear semantics
    accordingly.
  - Implement db_vector_count()/db_vector_mask().


Compatibility
=============

By keeping the legacy offset intact, this series aims to remain
compatible across mixed kernel versions. The observable changes are
limited to correct mask/vector reporting and safer execution context
handling.

Patches 1-5 (PCI Endpoint) and 6-10 (NTB) are independent and can be
applied separately through the respective trees. They are sent together
in this v3 for convenience.

Once the remaining acks from NTB maintainers are collected, the plan is
to take the whole series through the PCI EP tree. See:
https://lore.kernel.org/linux-pci/rnzsnp5de4qf5w7smebkmqekpuaqckltx73rj6ha3q2nrby5yp@7hsgvdzvjkp6/

---
Changelog
=========

Changes since v2:
  - No functional changes.
  - Rebased onto current pci/endpoint
    e022f0c72c7f ("selftests: pci_endpoint: Skip reserved BARs").
    * Patch 2 needed a trivial context-only adjustment while rebasing, due
      to commit d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop
      cmd_handler work in epf_ntb_epc_cleanup").
  - Picked up additional Reviewed-by tags from Frank.
  - Fixed the incorrect v2 series title.

Changes since v1:
  - Addressed feedback from Dave (add a source code comment, introduce
    enum to eliminate magic numbers)
  - Updated source code comment in Patch 2.
  - No functional changes, so retained Reviewed-by tags by Frank and Dave.
    Thank you both for the review.

v2: https://lore.kernel.org/linux-pci/20260227084955.3184017-1-den@valinux.co.jp/
v1: https://lore.kernel.org/linux-pci/20260224133459.1741537-1-den@valinux.co.jp/


Best regards,
Koichiro


Koichiro Den (10):
  PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
  PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic
    context
  PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via
    ntb_db_event()
  PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
  PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for
    doorbells
  NTB: epf: Document legacy doorbell slot offset in
    ntb_epf_peer_db_set()
  NTB: epf: Make db_valid_mask cover only real doorbell bits
  NTB: epf: Report 0-based doorbell vector via ntb_db_event()
  NTB: epf: Fix doorbell bitmask handling in db_read/db_clear
  NTB: epf: Implement db_vector_count/mask for doorbells

 drivers/ntb/hw/epf/ntb_hw_epf.c               |  89 ++++++++++-
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 147 +++++++++++++++---
 2 files changed, 210 insertions(+), 26 deletions(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 01/10] PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context Koichiro Den
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

vntb_epf_peer_db_set() raises an MSI interrupt to notify the RC side of
a doorbell event. pci_epc_raise_irq(..., PCI_IRQ_MSI, interrupt_num)
takes a 1-based MSI interrupt number.

The ntb_hw_epf driver reserves MSI #1 for link events, so doorbells
would naturally start at MSI #2 (doorbell bit 0 -> MSI #2). However,
pci-epf-vntb has historically applied an extra offset and maps doorbell
bit 0 to MSI #3. This matches the legacy behavior of ntb_hw_epf and has
been preserved since commit e35f56bb0330 ("PCI: endpoint: Support NTB
transfer between RC and EP").

This offset has not surfaced as a functional issue because:
- ntb_hw_epf typically allocates enough MSI vectors, so the off-by-one
  still hits a valid MSI vector, and
- ntb_hw_epf does not implement .db_vector_count()/.db_vector_mask(), so
  client drivers such as ntb_transport effectively ignore the vector
  number and schedule all QPs.

Correcting the MSI number would break interoperability with peers
running older kernels.

Document the legacy offset to avoid confusion when enabling
per-db-vector handling.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 805353528967..a75f8a30f8dc 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1367,6 +1367,25 @@ static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
 	func_no = ntb->epf->func_no;
 	vfunc_no = ntb->epf->vfunc_no;
 
+	/*
+	 * pci_epc_raise_irq() for MSI expects a 1-based interrupt number.
+	 * ffs() returns a 1-based index (bit 0 -> 1). interrupt_num has already
+	 * been computed as ffs(db_bits) + 1 above. Adding one more +1 when
+	 * calling pci_epc_raise_irq() therefore results in:
+	 *
+	 *   doorbell bit 0 -> MSI #3
+	 *
+	 * Legacy mapping (kept for compatibility):
+	 *
+	 *   MSI #1 : link event (reserved)
+	 *   MSI #2 : unused (historical offset)
+	 *   MSI #3 : doorbell bit 0 (DB#0)
+	 *   MSI #4 : doorbell bit 1 (DB#1)
+	 *   ...
+	 *
+	 * Do not change this mapping to avoid breaking interoperability with
+	 * older peers.
+	 */
 	ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
 				PCI_IRQ_MSI, interrupt_num + 1);
 	if (ret)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 01/10] PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23 18:27   ` Frank Li
  2026-05-01 16:26   ` Manivannan Sadhasivam
  2026-03-23  3:15 ` [PATCH v3 03/10] PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via ntb_db_event() Koichiro Den
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

The NTB .peer_db_set() callback may be invoked from atomic context.
pci-epf-vntb currently calls pci_epc_raise_irq() directly, but
pci_epc_raise_irq() may sleep (it takes epc->lock).

Avoid sleeping in atomic context by coalescing doorbell bits into an
atomic64 pending mask and raising MSIs from a work item. Limit the
amount of work per run to avoid monopolizing the workqueue under a
doorbell storm.

Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
Changes since v2:
  - Resolved a trivial context-only conflict after
    d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop cmd_handler work in epf_ntb_epc_cleanup")
    landed in pci/endpoint.

 drivers/pci/endpoint/functions/pci-epf-vntb.c | 106 +++++++++++++-----
 1 file changed, 78 insertions(+), 28 deletions(-)

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index a75f8a30f8dc..bc3b3df53ddb 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -69,6 +69,9 @@ static struct workqueue_struct *kpcintb_workqueue;
 #define MAX_DB_COUNT			32
 #define MAX_MW				4
 
+/* Limit per-work execution to avoid monopolizing kworker on doorbell storms. */
+#define VNTB_PEER_DB_WORK_BUDGET	5
+
 enum epf_ntb_bar {
 	BAR_CONFIG,
 	BAR_DB,
@@ -129,6 +132,8 @@ struct epf_ntb {
 	u32 spad_count;
 	u64 mws_size[MAX_MW];
 	atomic64_t db;
+	atomic64_t peer_db_pending;
+	struct work_struct peer_db_work;
 	u32 vbus_number;
 	u16 vntb_pid;
 	u16 vntb_vid;
@@ -920,6 +925,8 @@ static int epf_ntb_epc_init(struct epf_ntb *ntb)
 	INIT_DELAYED_WORK(&ntb->cmd_handler, epf_ntb_cmd_handler);
 	queue_work(kpcintb_workqueue, &ntb->cmd_handler.work);
 
+	enable_work(&ntb->peer_db_work);
+
 	return 0;
 
 err_write_header:
@@ -943,6 +950,7 @@ static int epf_ntb_epc_init(struct epf_ntb *ntb)
 static void epf_ntb_epc_cleanup(struct epf_ntb *ntb)
 {
 	disable_delayed_work_sync(&ntb->cmd_handler);
+	disable_work_sync(&ntb->peer_db_work);
 	epf_ntb_mw_bar_clear(ntb, ntb->num_mws);
 	epf_ntb_db_bar_clear(ntb);
 	epf_ntb_config_sspad_bar_clear(ntb);
@@ -1357,41 +1365,79 @@ static int vntb_epf_peer_spad_write(struct ntb_dev *ndev, int pidx, int idx, u32
 	return 0;
 }
 
+static void vntb_epf_peer_db_work(struct work_struct *work)
+{
+	struct epf_ntb *ntb = container_of(work, struct epf_ntb, peer_db_work);
+	struct pci_epf *epf = ntb->epf;
+	unsigned int budget = VNTB_PEER_DB_WORK_BUDGET;
+	u8 func_no, vfunc_no;
+	u32 interrupt_num;
+	u64 db_bits;
+	int ret;
+
+	if (!epf || !epf->epc)
+		return;
+
+	func_no = epf->func_no;
+	vfunc_no = epf->vfunc_no;
+
+	/*
+	 * Drain doorbells from peer_db_pending in snapshots (atomic64_xchg()).
+	 * Limit the number of snapshots handled per run so we don't monopolize
+	 * the workqueue under a doorbell storm.
+	 */
+	while (budget--) {
+		db_bits = atomic64_xchg(&ntb->peer_db_pending, 0);
+		if (!db_bits)
+			return;
+
+		while (db_bits) {
+			/*
+			 * pci_epc_raise_irq() for MSI expects a 1-based
+			 * interrupt number. ffs() returns a 1-based index (bit
+			 * 0 -> 1). We historically add +2 to compute
+			 * interrupt_num.
+			 *
+			 * Legacy mapping (kept for compatibility):
+			 *
+			 *   MSI #1 : link event (reserved)
+			 *   MSI #2 : unused (historical offset)
+			 *   MSI #3 : doorbell bit 0 (DB#0)
+			 *   MSI #4 : doorbell bit 1 (DB#1)
+			 *   ...
+			 *
+			 * Do not change this mapping to avoid breaking
+			 * interoperability with older peers.
+			 */
+			interrupt_num = ffs(db_bits) + 2;
+			db_bits &= db_bits - 1;
+
+			ret = pci_epc_raise_irq(epf->epc, func_no, vfunc_no,
+						PCI_IRQ_MSI, interrupt_num);
+			if (ret)
+				dev_err(&ntb->ntb.dev,
+					"Failed to raise IRQ for interrupt_num %u: %d\n",
+					interrupt_num, ret);
+		}
+	}
+
+	if (atomic64_read(&ntb->peer_db_pending))
+		queue_work(kpcintb_workqueue, &ntb->peer_db_work);
+}
+
 static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
 {
-	u32 interrupt_num = ffs(db_bits) + 1;
 	struct epf_ntb *ntb = ntb_ndev(ndev);
-	u8 func_no, vfunc_no;
-	int ret;
-
-	func_no = ntb->epf->func_no;
-	vfunc_no = ntb->epf->vfunc_no;
 
 	/*
-	 * pci_epc_raise_irq() for MSI expects a 1-based interrupt number.
-	 * ffs() returns a 1-based index (bit 0 -> 1). interrupt_num has already
-	 * been computed as ffs(db_bits) + 1 above. Adding one more +1 when
-	 * calling pci_epc_raise_irq() therefore results in:
-	 *
-	 *   doorbell bit 0 -> MSI #3
-	 *
-	 * Legacy mapping (kept for compatibility):
-	 *
-	 *   MSI #1 : link event (reserved)
-	 *   MSI #2 : unused (historical offset)
-	 *   MSI #3 : doorbell bit 0 (DB#0)
-	 *   MSI #4 : doorbell bit 1 (DB#1)
-	 *   ...
-	 *
-	 * Do not change this mapping to avoid breaking interoperability with
-	 * older peers.
+	 * .peer_db_set() may be called from atomic context. pci_epc_raise_irq()
+	 * can sleep (it takes epc->lock), so defer MSI raising to process
+	 * context. Doorbell requests are coalesced in peer_db_pending.
 	 */
-	ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
-				PCI_IRQ_MSI, interrupt_num + 1);
-	if (ret)
-		dev_err(&ntb->ntb.dev, "Failed to raise IRQ\n");
+	atomic64_or(db_bits, &ntb->peer_db_pending);
+	queue_work(kpcintb_workqueue, &ntb->peer_db_work);
 
-	return ret;
+	return 0;
 }
 
 static u64 vntb_epf_db_read(struct ntb_dev *ndev)
@@ -1629,6 +1675,10 @@ static int epf_ntb_probe(struct pci_epf *epf,
 	ntb->epf = epf;
 	ntb->vbus_number = 0xff;
 
+	INIT_WORK(&ntb->peer_db_work, vntb_epf_peer_db_work);
+	disable_work(&ntb->peer_db_work);
+	atomic64_set(&ntb->peer_db_pending, 0);
+
 	/* Initially, no bar is assigned */
 	for (i = 0; i < VNTB_BAR_NUM; i++)
 		ntb->epf_ntb_bar[i] = NO_BAR;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 03/10] PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via ntb_db_event()
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 01/10] PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 04/10] PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask Koichiro Den
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

ntb_db_event() expects the vector number to be relative to the first
doorbell vector starting at 0.

pci-epf-vntb reserves vector 0 for link events and uses higher vector
indices for doorbells. By passing the raw slot index to ntb_db_event(),
it effectively assumes that doorbell 0 maps to vector 1.

However, because the host uses a legacy slot layout and writes doorbell
0 into the third slot, doorbell 0 ultimately appears as vector 2 from
the NTB core perspective.

Adjust pci-epf-vntb to:
  - skip the unused second slot, and
  - report doorbells as 0-based vectors (DB#0 -> vector 0).

This change does not introduce a behavioral difference until
.db_vector_count()/.db_vector_mask() are implemented, because without
those callbacks NTB clients effectively ignore the vector number.

Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index bc3b3df53ddb..e86dc530c08e 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -82,6 +82,12 @@ enum epf_ntb_bar {
 	VNTB_BAR_NUM,
 };
 
+enum epf_irq_slot {
+	EPF_IRQ_LINK = 0,
+	EPF_IRQ_RESERVED_DB, /* Historically skipped slot */
+	EPF_IRQ_DB_START,
+};
+
 /*
  * +--------------------------------------------------+ Base
  * |                                                  |
@@ -266,10 +272,11 @@ static void epf_ntb_cmd_handler(struct work_struct *work)
 
 	ntb = container_of(work, struct epf_ntb, cmd_handler.work);
 
-	for (i = 1; i < ntb->db_count && !ntb->msi_doorbell; i++) {
+	for (i = EPF_IRQ_DB_START; i < ntb->db_count && !ntb->msi_doorbell;
+	     i++) {
 		if (ntb->epf_db[i]) {
-			atomic64_or(1 << (i - 1), &ntb->db);
-			ntb_db_event(&ntb->ntb, i);
+			atomic64_or(1 << (i - EPF_IRQ_DB_START), &ntb->db);
+			ntb_db_event(&ntb->ntb, i - EPF_IRQ_DB_START);
 			ntb->epf_db[i] = 0;
 		}
 	}
@@ -335,10 +342,10 @@ static irqreturn_t epf_ntb_doorbell_handler(int irq, void *data)
 	struct epf_ntb *ntb = data;
 	int i;
 
-	for (i = 1; i < ntb->db_count; i++)
+	for (i = EPF_IRQ_DB_START; i < ntb->db_count; i++)
 		if (irq == ntb->epf->db_msg[i].virq) {
-			atomic64_or(1 << (i - 1), &ntb->db);
-			ntb_db_event(&ntb->ntb, i);
+			atomic64_or(1 << (i - EPF_IRQ_DB_START), &ntb->db);
+			ntb_db_event(&ntb->ntb, i - EPF_IRQ_DB_START);
 		}
 
 	return IRQ_HANDLED;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 04/10] PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
                   ` (2 preceding siblings ...)
  2026-03-23  3:15 ` [PATCH v3 03/10] PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via ntb_db_event() Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 05/10] PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for doorbells Koichiro Den
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

In pci-epf-vntb, db_count represents the total number of doorbell slots
exposed to the peer, including:
  - slot #0 reserved for link events, and
  - slot #1 historically unused (kept for compatibility).

Only the remaining slots correspond to actual doorbell bits. The current
db_valid_mask() exposes all slots as valid doorbells.

Limit db_valid_mask() to the real doorbell bits by returning
BIT_ULL(db_count - 2) - 1, and guard against db_count < 2.

Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index e86dc530c08e..d91033ab8e03 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1268,7 +1268,10 @@ static int vntb_epf_peer_mw_count(struct ntb_dev *ntb)
 
 static u64 vntb_epf_db_valid_mask(struct ntb_dev *ntb)
 {
-	return BIT_ULL(ntb_ndev(ntb)->db_count) - 1;
+	if (ntb_ndev(ntb)->db_count < EPF_IRQ_DB_START)
+		return 0;
+
+	return BIT_ULL(ntb_ndev(ntb)->db_count - EPF_IRQ_DB_START) - 1;
 }
 
 static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 05/10] PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for doorbells
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
                   ` (3 preceding siblings ...)
  2026-03-23  3:15 ` [PATCH v3 04/10] PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 06/10] NTB: epf: Document legacy doorbell slot offset in ntb_epf_peer_db_set() Koichiro Den
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

Implement .db_vector_count and .db_vector_mask so ntb core/clients can map
doorbell events to per-vector work and avoid the thundering-herd behavior.

pci-epf-vntb reserves two slots in db_count: slot 0 for link events and
slot 1 which is historically unused. Therefore the number of doorbell
vectors is (db_count - 2).

Report vectors as 0..N-1 and return BIT_ULL(db_vector) for the
corresponding doorbell bit. While at it, use vntb_epf_db_vector_mask()
to simplify vntb_epf_db_valid_mask().

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 36 +++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index d91033ab8e03..d83a9be1113f 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1266,12 +1266,42 @@ static int vntb_epf_peer_mw_count(struct ntb_dev *ntb)
 	return ntb_ndev(ntb)->num_mws;
 }
 
+static int vntb_epf_db_vector_count(struct ntb_dev *ntb)
+{
+	struct epf_ntb *ndev = ntb_ndev(ntb);
+
+	/*
+	 * ndev->db_count is the total number of doorbell slots exposed to
+	 * the peer, including:
+	 *   - slot #0 reserved for link events
+	 *   - slot #1 historically unused (kept for protocol compatibility)
+	 *
+	 * Report only usable per-vector doorbell interrupts.
+	 */
+	if (ndev->db_count < EPF_IRQ_DB_START)
+		return 0;
+
+	return ndev->db_count - EPF_IRQ_DB_START;
+}
+
 static u64 vntb_epf_db_valid_mask(struct ntb_dev *ntb)
 {
-	if (ntb_ndev(ntb)->db_count < EPF_IRQ_DB_START)
+	return BIT_ULL(vntb_epf_db_vector_count(ntb)) - 1;
+}
+
+static u64 vntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
+{
+	int nr_vec;
+
+	/*
+	 * Doorbell vectors are numbered [0 .. nr_vec - 1], where nr_vec
+	 * excludes the two reserved slots described above.
+	 */
+	nr_vec = vntb_epf_db_vector_count(ntb);
+	if (db_vector < 0 || db_vector >= nr_vec)
 		return 0;
 
-	return BIT_ULL(ntb_ndev(ntb)->db_count - EPF_IRQ_DB_START) - 1;
+	return BIT_ULL(db_vector);
 }
 
 static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
@@ -1508,6 +1538,8 @@ static const struct ntb_dev_ops vntb_epf_ops = {
 	.spad_count		= vntb_epf_spad_count,
 	.peer_mw_count		= vntb_epf_peer_mw_count,
 	.db_valid_mask		= vntb_epf_db_valid_mask,
+	.db_vector_count	= vntb_epf_db_vector_count,
+	.db_vector_mask		= vntb_epf_db_vector_mask,
 	.db_set_mask		= vntb_epf_db_set_mask,
 	.mw_set_trans		= vntb_epf_mw_set_trans,
 	.mw_clear_trans		= vntb_epf_mw_clear_trans,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 06/10] NTB: epf: Document legacy doorbell slot offset in ntb_epf_peer_db_set()
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
                   ` (4 preceding siblings ...)
  2026-03-23  3:15 ` [PATCH v3 05/10] PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for doorbells Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 07/10] NTB: epf: Make db_valid_mask cover only real doorbell bits Koichiro Den
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

ntb_epf_peer_db_set() uses ffs(db_bits) to select a doorbell to ring.
ffs() returns a 1-based bit index (bit 0 -> 1).

Entry 0 is reserved for link events, so doorbell bit 0 must map to entry
1. However, since the initial commit 812ce2f8d14e ("NTB: Add support for
EPF PCI Non-Transparent Bridge"), the implementation has been adding an
extra +1, ending up using entry 2 for bit 0. Fixing the extra increment
would break interoperability with peers running older kernels.

Keep the legacy behavior and document the offset and the resulting slot
layout to avoid confusion when enabling per-db-vector handling.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index d3ecf25a5162..bce7130fec39 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -43,6 +43,18 @@
 #define NTB_EPF_DB_DATA(n)	(0x34 + (n) * 4)
 #define NTB_EPF_DB_OFFSET(n)	(0xB4 + (n) * 4)
 
+/*
+ * Legacy doorbell slot layout when paired with pci-epf-*ntb:
+ *
+ *   slot 0 : reserved for link events
+ *   slot 1 : unused (historical extra offset)
+ *   slot 2 : DB#0
+ *   slot 3 : DB#1
+ *   ...
+ *
+ * Thus, NTB_EPF_MIN_DB_COUNT=3 means that we at least create vectors for
+ * doorbells DB#0 and DB#1.
+ */
 #define NTB_EPF_MIN_DB_COUNT	3
 #define NTB_EPF_MAX_DB_COUNT	31
 
@@ -473,6 +485,14 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
 static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
 {
 	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
+	/*
+	 * ffs() returns a 1-based bit index (bit 0 -> 1).
+	 *
+	 * With slot 0 reserved for link events, DB#0 would naturally map to
+	 * slot 1. Historically an extra +1 offset was added, so DB#0 maps to
+	 * slot 2 and slot 1 remains unused. Keep this mapping for
+	 * backward-compatibility.
+	 */
 	u32 interrupt_num = ffs(db_bits) + 1;
 	struct device *dev = ndev->dev;
 	u32 db_entry_size;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 07/10] NTB: epf: Make db_valid_mask cover only real doorbell bits
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
                   ` (5 preceding siblings ...)
  2026-03-23  3:15 ` [PATCH v3 06/10] NTB: epf: Document legacy doorbell slot offset in ntb_epf_peer_db_set() Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 08/10] NTB: epf: Report 0-based doorbell vector via ntb_db_event() Koichiro Den
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

ndev->db_count includes an unused doorbell slot due to the legacy extra
offset in the peer doorbell path. db_valid_mask must cover only the real
doorbell bits and exclude the unused slot.

Set db_valid_mask to BIT_ULL(db_count - 1) - 1.

Fixes: 812ce2f8d14e ("NTB: Add support for EPF PCI Non-Transparent Bridge")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index bce7130fec39..07dc97d3270b 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -580,7 +580,17 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
 		return ret;
 	}
 
-	ndev->db_valid_mask = BIT_ULL(ndev->db_count) - 1;
+	if (ndev->db_count < NTB_EPF_MIN_DB_COUNT) {
+		dev_err(dev, "db_count %u is less than %u\n", ndev->db_count,
+			NTB_EPF_MIN_DB_COUNT);
+		return -EINVAL;
+	}
+
+	/*
+	 * ndev->db_count includes an extra skipped slot due to the legacy
+	 * doorbell layout, hence -1.
+	 */
+	ndev->db_valid_mask = BIT_ULL(ndev->db_count - 1) - 1;
 	ndev->mw_count = readl(ndev->ctrl_reg + NTB_EPF_MW_COUNT);
 	ndev->spad_count = readl(ndev->ctrl_reg + NTB_EPF_SPAD_COUNT);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 08/10] NTB: epf: Report 0-based doorbell vector via ntb_db_event()
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
                   ` (6 preceding siblings ...)
  2026-03-23  3:15 ` [PATCH v3 07/10] NTB: epf: Make db_valid_mask cover only real doorbell bits Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23  3:15 ` [PATCH v3 09/10] NTB: epf: Fix doorbell bitmask handling in db_read/db_clear Koichiro Den
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

ntb_db_event() expects the vector number to be relative to the first
doorbell vector starting at 0.

Vector 0 is reserved for link events in the EPF driver, so doorbells
start at vector 1. However, both supported peers (ntb_hw_epf with
pci-epf-ntb, and pci-epf-vntb) have historically skipped vector 1 and
started doorbells at vector 2.

Pass (irq_no - 2) to ntb_db_event() so doorbells are reported as 0..N-1.
If irq_no == 1 is ever observed, treat it as DB#0 and emit a warning, as
this would indicate an unexpected change in the slot layout.

Fixes: 812ce2f8d14e ("NTB: Add support for EPF PCI Non-Transparent Bridge")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Suggested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 07dc97d3270b..67cdc5d729d5 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -81,6 +81,12 @@ enum epf_ntb_bar {
 	NTB_BAR_NUM,
 };
 
+enum epf_irq_slot {
+	EPF_IRQ_LINK = 0,
+	EPF_IRQ_RESERVED_DB, /* Historically skipped slot */
+	EPF_IRQ_DB_START,
+};
+
 #define NTB_EPF_MAX_MW_COUNT	(NTB_BAR_NUM - BAR_MW1)
 
 struct ntb_epf_dev {
@@ -333,10 +339,15 @@ static irqreturn_t ntb_epf_vec_isr(int irq, void *dev)
 	irq_no = irq - pci_irq_vector(ndev->ntb.pdev, 0);
 	ndev->db_val = irq_no + 1;
 
-	if (irq_no == 0)
+	if (irq_no == EPF_IRQ_LINK) {
 		ntb_link_event(&ndev->ntb);
-	else
-		ntb_db_event(&ndev->ntb, irq_no);
+	} else if (irq_no == EPF_IRQ_RESERVED_DB) {
+		dev_warn_ratelimited(ndev->dev,
+				     "Unexpected irq_no 1 received. Treat it as DB#0.\n");
+		ntb_db_event(&ndev->ntb, 0);
+	} else {
+		ntb_db_event(&ndev->ntb, irq_no - EPF_IRQ_DB_START);
+	}
 
 	return IRQ_HANDLED;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 09/10] NTB: epf: Fix doorbell bitmask handling in db_read/db_clear
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
                   ` (7 preceding siblings ...)
  2026-03-23  3:15 ` [PATCH v3 08/10] NTB: epf: Report 0-based doorbell vector via ntb_db_event() Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-05-01 16:29   ` Manivannan Sadhasivam
  2026-03-23  3:15 ` [PATCH v3 10/10] NTB: epf: Implement db_vector_count/mask for doorbells Koichiro Den
  2026-03-23 15:43 ` [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
  10 siblings, 1 reply; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

The EPF driver currently stores the incoming doorbell as a vector number
(irq_no + 1) in db_val and db_clear() clears all bits unconditionally.
This breaks db_read()/db_clear() semantics when multiple doorbells are
used.

Store doorbells as a bitmask (BIT_ULL(vector)) and make
db_clear(db_bits) clear only the specified bits. Use atomic64 operations
as db_val is updated from interrupt context.

Fixes: 812ce2f8d14e ("NTB: Add support for EPF PCI Non-Transparent Bridge")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Suggested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 67cdc5d729d5..741d30821390 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -6,6 +6,7 @@
  * Author: Kishon Vijay Abraham I <kishon@ti.com>
  */
 
+#include <linux/atomic.h>
 #include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -108,7 +109,7 @@ struct ntb_epf_dev {
 	unsigned int self_spad;
 	unsigned int peer_spad;
 
-	int db_val;
+	atomic64_t db_val;
 	u64 db_valid_mask;
 };
 
@@ -337,15 +338,16 @@ static irqreturn_t ntb_epf_vec_isr(int irq, void *dev)
 	int irq_no;
 
 	irq_no = irq - pci_irq_vector(ndev->ntb.pdev, 0);
-	ndev->db_val = irq_no + 1;
 
 	if (irq_no == EPF_IRQ_LINK) {
 		ntb_link_event(&ndev->ntb);
 	} else if (irq_no == EPF_IRQ_RESERVED_DB) {
 		dev_warn_ratelimited(ndev->dev,
 				     "Unexpected irq_no 1 received. Treat it as DB#0.\n");
+		atomic64_or(BIT_ULL(0), &ndev->db_val);
 		ntb_db_event(&ndev->ntb, 0);
 	} else {
+		atomic64_or(BIT_ULL(irq_no - EPF_IRQ_DB_START), &ndev->db_val);
 		ntb_db_event(&ndev->ntb, irq_no - EPF_IRQ_DB_START);
 	}
 
@@ -530,7 +532,7 @@ static u64 ntb_epf_db_read(struct ntb_dev *ntb)
 {
 	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
 
-	return ndev->db_val;
+	return atomic64_read(&ndev->db_val);
 }
 
 static int ntb_epf_db_clear_mask(struct ntb_dev *ntb, u64 db_bits)
@@ -542,7 +544,7 @@ static int ntb_epf_db_clear(struct ntb_dev *ntb, u64 db_bits)
 {
 	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
 
-	ndev->db_val = 0;
+	atomic64_and(~db_bits, &ndev->db_val);
 
 	return 0;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 10/10] NTB: epf: Implement db_vector_count/mask for doorbells
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
                   ` (8 preceding siblings ...)
  2026-03-23  3:15 ` [PATCH v3 09/10] NTB: epf: Fix doorbell bitmask handling in db_read/db_clear Koichiro Den
@ 2026-03-23  3:15 ` Koichiro Den
  2026-03-23 15:43 ` [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
  10 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23  3:15 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

Implement .db_vector_count and .db_vector_mask so ntb core/clients can
map doorbell events to per-vector work.

Report vectors as 0..(db_count - 2) (skipping the unused slot) and
return BIT_ULL(db_vector) for the corresponding doorbell bit.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 741d30821390..d420699ff7d6 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -420,6 +420,34 @@ static u64 ntb_epf_db_valid_mask(struct ntb_dev *ntb)
 	return ntb_ndev(ntb)->db_valid_mask;
 }
 
+static int ntb_epf_db_vector_count(struct ntb_dev *ntb)
+{
+	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
+
+	/*
+	 * ndev->db_count includes an extra skipped slot due to the legacy
+	 * doorbell layout. Expose only the real doorbell vectors.
+	 */
+	if (ndev->db_count < 1)
+		return 0;
+
+	return ntb_ndev(ntb)->db_count - 1;
+}
+
+static u64 ntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
+{
+	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
+
+	/*
+	 * ndev->db_count includes one skipped slot in the legacy layout. Valid
+	 * doorbell vectors are therefore [0 .. (db_count - 2)].
+	 */
+	if (db_vector < 0 || db_vector >= ndev->db_count - 1)
+		return 0;
+
+	return BIT_ULL(db_vector);
+}
+
 static int ntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
 {
 	return 0;
@@ -554,6 +582,8 @@ static const struct ntb_dev_ops ntb_epf_ops = {
 	.spad_count		= ntb_epf_spad_count,
 	.peer_mw_count		= ntb_epf_peer_mw_count,
 	.db_valid_mask		= ntb_epf_db_valid_mask,
+	.db_vector_count	= ntb_epf_db_vector_count,
+	.db_vector_mask		= ntb_epf_db_vector_mask,
 	.db_set_mask		= ntb_epf_db_set_mask,
 	.mw_set_trans		= ntb_epf_mw_set_trans,
 	.mw_clear_trans		= ntb_epf_mw_clear_trans,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
  2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
                   ` (9 preceding siblings ...)
  2026-03-23  3:15 ` [PATCH v3 10/10] NTB: epf: Implement db_vector_count/mask for doorbells Koichiro Den
@ 2026-03-23 15:43 ` Koichiro Den
  2026-03-25  6:23   ` Niklas Cassel
  2026-04-07 15:21   ` Koichiro Den
  10 siblings, 2 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-23 15:43 UTC (permalink / raw)
  To: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel
  Cc: linux-kernel, linux-pci, ntb

On Mon, Mar 23, 2026 at 12:15:34PM +0900, Koichiro Den wrote:
> This series fixes doorbell bit/vector handling for the EPF-based NTB
> pair (ntb_hw_epf <-> pci-epf-*ntb). Its primary goal is to enable safe
> per-db-vector handling in the NTB core and clients (e.g. ntb_transport),
> without changing the on-the-wire doorbell mapping.
> 
> 
> Background / problem
> ====================
> 
> ntb_hw_epf historically applies an extra offset when ringing peer
> doorbells: the link event uses the first interrupt slot, and doorbells
> start from the third slot (i.e. a second slot is effectively unused).
> pci-epf-vntb carries the matching offset on the EP side as well.
> 
> As long as db_vector_count()/db_vector_mask() are not implemented, this
> mismatch is mostly masked. Doorbell events are effectively treated as
> "can hit any QP" and the off-by-one vector numbering does not surface
> clearly.
> 
> However, once per-vector handling is enabled, the current state becomes
> problematic:
> 
>   - db_valid_mask exposes bits that do not correspond to real doorbells
>     (link/unused slots leak into the mask).
>   - ntb_db_event() is fed with 1-based/shifted vectors, while NTB core
>     expects a 0-based db_vector for doorbells.
>   - On pci-epf-vntb, .peer_db_set() may be called in atomic context, but
>     it directly calls pci_epc_raise_irq(), which can sleep.
> 
> 
> Why NOT fix the root offset?
> ============================
> 
> The natural "root" fix would be to remove the historical extra offset in
> the peer_db_set() doorbell paths for ntb_hw_epf and pci-epf-vntb.
> Unfortunately this would lead to interoperability issues when mixing old
> and new kernel versions (old/new peers). A new side would ring a
> different interrupt slot than what an old peer expects, leading to
> missed or misrouted doorbells, once db_vector_count()/db_vector_mask()
> are implemented.
> 
> Therefore this series intentionally keeps the legacy offset, and instead
> fixes the surrounding pieces so the mapping is documented and handled
> consistently in masks, vector numbering, and per-vector reporting.
> 
> 
> What this series does
> =====================
> 
> - pci-epf-vntb:
> 
>   - Document the legacy offset.
>   - Defer MSI doorbell raises to process context to avoid sleeping in
>     atomic context. This becomes relevant once multiple doorbells are
>     raised concurrently at a high rate.
>   - Report doorbell vectors as 0-based to ntb_db_event().
>   - Fix db_valid_mask and implement db_vector_count()/db_vector_mask().
> 
> - ntb_hw_epf:
> 
>   - Document the legacy offset in ntb_epf_peer_db_set().
>   - Fix db_valid_mask to cover only real doorbell bits.
>   - Report 0-based db_vector to ntb_db_event() (accounting for the
>     unused slot).
>   - Keep db_val as a bitmask and fix db_read/db_clear semantics
>     accordingly.
>   - Implement db_vector_count()/db_vector_mask().
> 
> 
> Compatibility
> =============
> 
> By keeping the legacy offset intact, this series aims to remain
> compatible across mixed kernel versions. The observable changes are
> limited to correct mask/vector reporting and safer execution context
> handling.
> 
> Patches 1-5 (PCI Endpoint) and 6-10 (NTB) are independent and can be
> applied separately through the respective trees. They are sent together
> in this v3 for convenience.
> 
> Once the remaining acks from NTB maintainers are collected, the plan is

Hi Dave,

When you have a chance, I'd appreciate another look at patches 6-10 (which are
unchanged since v2). If you do not see any blockers, Acked-by would be
greatly appreciated.


P.S. Regarding Sashiko's feedback [1], my understanding is that there are no
blockers, but there are a few points that would be better addressed separately
as orthogonal follow-ups:

- configfs knobs mutability and bounds checking, including (but not limited to)
  db_count.
  In my opinion, allowing updates after .bind() looks questionable, and
  returning -EBUSY once bound seems more appropriate. I'm leaning toward
  handling this as a separate hardening series.

- ntb_hw_epf IRQ unwind concern.
  This is what I was trying to address in [2], which I hope will land soon.

- Other lifecycle concerns.
  These are largely tied to the current vNTB implementation and were part of
  [3], for which I still plan to post a follow-up series that adds .remove()
  implementation to vntb_pci_driver.

[1] https://sashiko.dev/#/patchset/20260323031544.2598111-1-den%40valinux.co.jp
[2] https://lore.kernel.org/ntb/20260304083028.1391068-1-den@valinux.co.jp/
[3] https://lore.kernel.org/all/20260226084142.2226875-1-den@valinux.co.jp/


Best regards,
Koichiro

> to take the whole series through the PCI EP tree. See:
> https://lore.kernel.org/linux-pci/rnzsnp5de4qf5w7smebkmqekpuaqckltx73rj6ha3q2nrby5yp@7hsgvdzvjkp6/
> 
> ---
> Changelog
> =========
> 
> Changes since v2:
>   - No functional changes.
>   - Rebased onto current pci/endpoint
>     e022f0c72c7f ("selftests: pci_endpoint: Skip reserved BARs").
>     * Patch 2 needed a trivial context-only adjustment while rebasing, due
>       to commit d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop
>       cmd_handler work in epf_ntb_epc_cleanup").
>   - Picked up additional Reviewed-by tags from Frank.
>   - Fixed the incorrect v2 series title.
> 
> Changes since v1:
>   - Addressed feedback from Dave (add a source code comment, introduce
>     enum to eliminate magic numbers)
>   - Updated source code comment in Patch 2.
>   - No functional changes, so retained Reviewed-by tags by Frank and Dave.
>     Thank you both for the review.
> 
> v2: https://lore.kernel.org/linux-pci/20260227084955.3184017-1-den@valinux.co.jp/
> v1: https://lore.kernel.org/linux-pci/20260224133459.1741537-1-den@valinux.co.jp/
> 
> 
> Best regards,
> Koichiro
> 
> 
> Koichiro Den (10):
>   PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
>   PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic
>     context
>   PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via
>     ntb_db_event()
>   PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
>   PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for
>     doorbells
>   NTB: epf: Document legacy doorbell slot offset in
>     ntb_epf_peer_db_set()
>   NTB: epf: Make db_valid_mask cover only real doorbell bits
>   NTB: epf: Report 0-based doorbell vector via ntb_db_event()
>   NTB: epf: Fix doorbell bitmask handling in db_read/db_clear
>   NTB: epf: Implement db_vector_count/mask for doorbells
> 
>  drivers/ntb/hw/epf/ntb_hw_epf.c               |  89 ++++++++++-
>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 147 +++++++++++++++---
>  2 files changed, 210 insertions(+), 26 deletions(-)
> 
> -- 
> 2.51.0
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context
  2026-03-23  3:15 ` [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context Koichiro Den
@ 2026-03-23 18:27   ` Frank Li
  2026-05-01 16:26   ` Manivannan Sadhasivam
  1 sibling, 0 replies; 20+ messages in thread
From: Frank Li @ 2026-03-23 18:27 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Jerome Brunet, Lorenzo Pieralisi, Niklas Cassel, linux-kernel,
	linux-pci, ntb

On Mon, Mar 23, 2026 at 12:15:36PM +0900, Koichiro Den wrote:
> The NTB .peer_db_set() callback may be invoked from atomic context.
> pci-epf-vntb currently calls pci_epc_raise_irq() directly, but
> pci_epc_raise_irq() may sleep (it takes epc->lock).
>
> Avoid sleeping in atomic context by coalescing doorbell bits into an
> atomic64 pending mask and raising MSIs from a work item. Limit the
> amount of work per run to avoid monopolizing the workqueue under a
> doorbell storm.
>
> Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---

Reviewed-by: Frank Li <Frank.Li@nxp.com>

> Changes since v2:
>   - Resolved a trivial context-only conflict after
>     d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop cmd_handler work in epf_ntb_epc_cleanup")
>     landed in pci/endpoint.
>
>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 106 +++++++++++++-----
>  1 file changed, 78 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index a75f8a30f8dc..bc3b3df53ddb 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -69,6 +69,9 @@ static struct workqueue_struct *kpcintb_workqueue;
>  #define MAX_DB_COUNT			32
>  #define MAX_MW				4
>
> +/* Limit per-work execution to avoid monopolizing kworker on doorbell storms. */
> +#define VNTB_PEER_DB_WORK_BUDGET	5
> +
>  enum epf_ntb_bar {
>  	BAR_CONFIG,
>  	BAR_DB,
> @@ -129,6 +132,8 @@ struct epf_ntb {
>  	u32 spad_count;
>  	u64 mws_size[MAX_MW];
>  	atomic64_t db;
> +	atomic64_t peer_db_pending;
> +	struct work_struct peer_db_work;
>  	u32 vbus_number;
>  	u16 vntb_pid;
>  	u16 vntb_vid;
> @@ -920,6 +925,8 @@ static int epf_ntb_epc_init(struct epf_ntb *ntb)
>  	INIT_DELAYED_WORK(&ntb->cmd_handler, epf_ntb_cmd_handler);
>  	queue_work(kpcintb_workqueue, &ntb->cmd_handler.work);
>
> +	enable_work(&ntb->peer_db_work);
> +
>  	return 0;
>
>  err_write_header:
> @@ -943,6 +950,7 @@ static int epf_ntb_epc_init(struct epf_ntb *ntb)
>  static void epf_ntb_epc_cleanup(struct epf_ntb *ntb)
>  {
>  	disable_delayed_work_sync(&ntb->cmd_handler);
> +	disable_work_sync(&ntb->peer_db_work);
>  	epf_ntb_mw_bar_clear(ntb, ntb->num_mws);
>  	epf_ntb_db_bar_clear(ntb);
>  	epf_ntb_config_sspad_bar_clear(ntb);
> @@ -1357,41 +1365,79 @@ static int vntb_epf_peer_spad_write(struct ntb_dev *ndev, int pidx, int idx, u32
>  	return 0;
>  }
>
> +static void vntb_epf_peer_db_work(struct work_struct *work)
> +{
> +	struct epf_ntb *ntb = container_of(work, struct epf_ntb, peer_db_work);
> +	struct pci_epf *epf = ntb->epf;
> +	unsigned int budget = VNTB_PEER_DB_WORK_BUDGET;
> +	u8 func_no, vfunc_no;
> +	u32 interrupt_num;
> +	u64 db_bits;
> +	int ret;
> +
> +	if (!epf || !epf->epc)
> +		return;
> +
> +	func_no = epf->func_no;
> +	vfunc_no = epf->vfunc_no;
> +
> +	/*
> +	 * Drain doorbells from peer_db_pending in snapshots (atomic64_xchg()).
> +	 * Limit the number of snapshots handled per run so we don't monopolize
> +	 * the workqueue under a doorbell storm.
> +	 */
> +	while (budget--) {
> +		db_bits = atomic64_xchg(&ntb->peer_db_pending, 0);
> +		if (!db_bits)
> +			return;
> +
> +		while (db_bits) {
> +			/*
> +			 * pci_epc_raise_irq() for MSI expects a 1-based
> +			 * interrupt number. ffs() returns a 1-based index (bit
> +			 * 0 -> 1). We historically add +2 to compute
> +			 * interrupt_num.
> +			 *
> +			 * Legacy mapping (kept for compatibility):
> +			 *
> +			 *   MSI #1 : link event (reserved)
> +			 *   MSI #2 : unused (historical offset)
> +			 *   MSI #3 : doorbell bit 0 (DB#0)
> +			 *   MSI #4 : doorbell bit 1 (DB#1)
> +			 *   ...
> +			 *
> +			 * Do not change this mapping to avoid breaking
> +			 * interoperability with older peers.
> +			 */
> +			interrupt_num = ffs(db_bits) + 2;
> +			db_bits &= db_bits - 1;
> +
> +			ret = pci_epc_raise_irq(epf->epc, func_no, vfunc_no,
> +						PCI_IRQ_MSI, interrupt_num);
> +			if (ret)
> +				dev_err(&ntb->ntb.dev,
> +					"Failed to raise IRQ for interrupt_num %u: %d\n",
> +					interrupt_num, ret);
> +		}
> +	}
> +
> +	if (atomic64_read(&ntb->peer_db_pending))
> +		queue_work(kpcintb_workqueue, &ntb->peer_db_work);
> +}
> +
>  static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
>  {
> -	u32 interrupt_num = ffs(db_bits) + 1;
>  	struct epf_ntb *ntb = ntb_ndev(ndev);
> -	u8 func_no, vfunc_no;
> -	int ret;
> -
> -	func_no = ntb->epf->func_no;
> -	vfunc_no = ntb->epf->vfunc_no;
>
>  	/*
> -	 * pci_epc_raise_irq() for MSI expects a 1-based interrupt number.
> -	 * ffs() returns a 1-based index (bit 0 -> 1). interrupt_num has already
> -	 * been computed as ffs(db_bits) + 1 above. Adding one more +1 when
> -	 * calling pci_epc_raise_irq() therefore results in:
> -	 *
> -	 *   doorbell bit 0 -> MSI #3
> -	 *
> -	 * Legacy mapping (kept for compatibility):
> -	 *
> -	 *   MSI #1 : link event (reserved)
> -	 *   MSI #2 : unused (historical offset)
> -	 *   MSI #3 : doorbell bit 0 (DB#0)
> -	 *   MSI #4 : doorbell bit 1 (DB#1)
> -	 *   ...
> -	 *
> -	 * Do not change this mapping to avoid breaking interoperability with
> -	 * older peers.
> +	 * .peer_db_set() may be called from atomic context. pci_epc_raise_irq()
> +	 * can sleep (it takes epc->lock), so defer MSI raising to process
> +	 * context. Doorbell requests are coalesced in peer_db_pending.
>  	 */
> -	ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
> -				PCI_IRQ_MSI, interrupt_num + 1);
> -	if (ret)
> -		dev_err(&ntb->ntb.dev, "Failed to raise IRQ\n");
> +	atomic64_or(db_bits, &ntb->peer_db_pending);
> +	queue_work(kpcintb_workqueue, &ntb->peer_db_work);
>
> -	return ret;
> +	return 0;
>  }
>
>  static u64 vntb_epf_db_read(struct ntb_dev *ndev)
> @@ -1629,6 +1675,10 @@ static int epf_ntb_probe(struct pci_epf *epf,
>  	ntb->epf = epf;
>  	ntb->vbus_number = 0xff;
>
> +	INIT_WORK(&ntb->peer_db_work, vntb_epf_peer_db_work);
> +	disable_work(&ntb->peer_db_work);
> +	atomic64_set(&ntb->peer_db_pending, 0);
> +
>  	/* Initially, no bar is assigned */
>  	for (i = 0; i < VNTB_BAR_NUM; i++)
>  		ntb->epf_ntb_bar[i] = NO_BAR;
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
  2026-03-23 15:43 ` [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
@ 2026-03-25  6:23   ` Niklas Cassel
  2026-03-25  8:44     ` Koichiro Den
  2026-04-07 15:21   ` Koichiro Den
  1 sibling, 1 reply; 20+ messages in thread
From: Niklas Cassel @ 2026-03-25  6:23 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, linux-kernel,
	linux-pci, ntb

Hello Koichiro,

On Tue, Mar 24, 2026 at 12:43:53AM +0900, Koichiro Den wrote:
> 
> - configfs knobs mutability and bounds checking, including (but not limited to)
>   db_count.
>   In my opinion, allowing updates after .bind() looks questionable, and
>   returning -EBUSY once bound seems more appropriate. I'm leaning toward
>   handling this as a separate hardening series.

This is in line with what we did for pci-epf-test, see commit:
ffcc4850a161 ("PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes")

but we return EOPNOTSUPP instead of EBUSY.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
  2026-03-25  6:23   ` Niklas Cassel
@ 2026-03-25  8:44     ` Koichiro Den
  0 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-03-25  8:44 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jon Mason, Dave Jiang, Allen Hubbe, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas,
	Frank Li, Jerome Brunet, Lorenzo Pieralisi, linux-kernel,
	linux-pci, ntb

On Wed, Mar 25, 2026 at 07:23:37AM +0100, Niklas Cassel wrote:
> Hello Koichiro,
> 
> On Tue, Mar 24, 2026 at 12:43:53AM +0900, Koichiro Den wrote:
> > 
> > - configfs knobs mutability and bounds checking, including (but not limited to)
> >   db_count.
> >   In my opinion, allowing updates after .bind() looks questionable, and
> >   returning -EBUSY once bound seems more appropriate. I'm leaning toward
> >   handling this as a separate hardening series.
> 
> This is in line with what we did for pci-epf-test, see commit:
> ffcc4850a161 ("PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes")
> 
> but we return EOPNOTSUPP instead of EBUSY.

Yes, I remember you were discussing this with Mani in another thread. For
consistency, we should use -EOPNOTSUPP here as well. Thanks for the reminder.

Best regards,
Koichiro

> 
> 
> Kind regards,
> Niklas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
  2026-03-23 15:43 ` [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
  2026-03-25  6:23   ` Niklas Cassel
@ 2026-04-07 15:21   ` Koichiro Den
  1 sibling, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-04-07 15:21 UTC (permalink / raw)
  To: Dave Jiang, Manivannan Sadhasivam
  Cc: Jon Mason, Allen Hubbe, Krzysztof Wilczyński,
	Kishon Vijay Abraham I, Bjorn Helgaas, Frank Li, Jerome Brunet,
	Lorenzo Pieralisi, Niklas Cassel, linux-kernel, linux-pci, ntb

On Tue, Mar 24, 2026 at 12:43:55AM +0900, Koichiro Den wrote:
> On Mon, Mar 23, 2026 at 12:15:34PM +0900, Koichiro Den wrote:
> > This series fixes doorbell bit/vector handling for the EPF-based NTB
> > pair (ntb_hw_epf <-> pci-epf-*ntb). Its primary goal is to enable safe
> > per-db-vector handling in the NTB core and clients (e.g. ntb_transport),
> > without changing the on-the-wire doorbell mapping.
> > 
> > 
> > Background / problem
> > ====================
> > 
> > ntb_hw_epf historically applies an extra offset when ringing peer
> > doorbells: the link event uses the first interrupt slot, and doorbells
> > start from the third slot (i.e. a second slot is effectively unused).
> > pci-epf-vntb carries the matching offset on the EP side as well.
> > 
> > As long as db_vector_count()/db_vector_mask() are not implemented, this
> > mismatch is mostly masked. Doorbell events are effectively treated as
> > "can hit any QP" and the off-by-one vector numbering does not surface
> > clearly.
> > 
> > However, once per-vector handling is enabled, the current state becomes
> > problematic:
> > 
> >   - db_valid_mask exposes bits that do not correspond to real doorbells
> >     (link/unused slots leak into the mask).
> >   - ntb_db_event() is fed with 1-based/shifted vectors, while NTB core
> >     expects a 0-based db_vector for doorbells.
> >   - On pci-epf-vntb, .peer_db_set() may be called in atomic context, but
> >     it directly calls pci_epc_raise_irq(), which can sleep.
> > 
> > 
> > Why NOT fix the root offset?
> > ============================
> > 
> > The natural "root" fix would be to remove the historical extra offset in
> > the peer_db_set() doorbell paths for ntb_hw_epf and pci-epf-vntb.
> > Unfortunately this would lead to interoperability issues when mixing old
> > and new kernel versions (old/new peers). A new side would ring a
> > different interrupt slot than what an old peer expects, leading to
> > missed or misrouted doorbells, once db_vector_count()/db_vector_mask()
> > are implemented.
> > 
> > Therefore this series intentionally keeps the legacy offset, and instead
> > fixes the surrounding pieces so the mapping is documented and handled
> > consistently in masks, vector numbering, and per-vector reporting.
> > 
> > 
> > What this series does
> > =====================
> > 
> > - pci-epf-vntb:
> > 
> >   - Document the legacy offset.
> >   - Defer MSI doorbell raises to process context to avoid sleeping in
> >     atomic context. This becomes relevant once multiple doorbells are
> >     raised concurrently at a high rate.
> >   - Report doorbell vectors as 0-based to ntb_db_event().
> >   - Fix db_valid_mask and implement db_vector_count()/db_vector_mask().
> > 
> > - ntb_hw_epf:
> > 
> >   - Document the legacy offset in ntb_epf_peer_db_set().
> >   - Fix db_valid_mask to cover only real doorbell bits.
> >   - Report 0-based db_vector to ntb_db_event() (accounting for the
> >     unused slot).
> >   - Keep db_val as a bitmask and fix db_read/db_clear semantics
> >     accordingly.
> >   - Implement db_vector_count()/db_vector_mask().
> > 
> > 
> > Compatibility
> > =============
> > 
> > By keeping the legacy offset intact, this series aims to remain
> > compatible across mixed kernel versions. The observable changes are
> > limited to correct mask/vector reporting and safer execution context
> > handling.
> > 
> > Patches 1-5 (PCI Endpoint) and 6-10 (NTB) are independent and can be
> > applied separately through the respective trees. They are sent together
> > in this v3 for convenience.
> > 
> > Once the remaining acks from NTB maintainers are collected, the plan is
> 
> Hi Dave,
> 
> When you have a chance, I'd appreciate another look at patches 6-10 (which are
> unchanged since v2). If you do not see any blockers, Acked-by would be
> greatly appreciated.

Hi Dave (cc: Mani),

Just a gentle ping on this.

I'd appreciate your thoughts on patches 6-10 when you have a chance. If they
look fine to you, an Acked-by from the NTB side would be greatly appreciated.

Best regards,
Koichiro

> 
> 
> P.S. Regarding Sashiko's feedback [1], my understanding is that there are no
> blockers, but there are a few points that would be better addressed separately
> as orthogonal follow-ups:
> 
> - configfs knobs mutability and bounds checking, including (but not limited to)
>   db_count.
>   In my opinion, allowing updates after .bind() looks questionable, and
>   returning -EBUSY once bound seems more appropriate. I'm leaning toward
>   handling this as a separate hardening series.
> 
> - ntb_hw_epf IRQ unwind concern.
>   This is what I was trying to address in [2], which I hope will land soon.
> 
> - Other lifecycle concerns.
>   These are largely tied to the current vNTB implementation and were part of
>   [3], for which I still plan to post a follow-up series that adds .remove()
>   implementation to vntb_pci_driver.
> 
> [1] https://sashiko.dev/#/patchset/20260323031544.2598111-1-den%40valinux.co.jp
> [2] https://lore.kernel.org/ntb/20260304083028.1391068-1-den@valinux.co.jp/
> [3] https://lore.kernel.org/all/20260226084142.2226875-1-den@valinux.co.jp/
> 
> 
> Best regards,
> Koichiro
> 
> > to take the whole series through the PCI EP tree. See:
> > https://lore.kernel.org/linux-pci/rnzsnp5de4qf5w7smebkmqekpuaqckltx73rj6ha3q2nrby5yp@7hsgvdzvjkp6/
> > 
> > ---
> > Changelog
> > =========
> > 
> > Changes since v2:
> >   - No functional changes.
> >   - Rebased onto current pci/endpoint
> >     e022f0c72c7f ("selftests: pci_endpoint: Skip reserved BARs").
> >     * Patch 2 needed a trivial context-only adjustment while rebasing, due
> >       to commit d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop
> >       cmd_handler work in epf_ntb_epc_cleanup").
> >   - Picked up additional Reviewed-by tags from Frank.
> >   - Fixed the incorrect v2 series title.
> > 
> > Changes since v1:
> >   - Addressed feedback from Dave (add a source code comment, introduce
> >     enum to eliminate magic numbers)
> >   - Updated source code comment in Patch 2.
> >   - No functional changes, so retained Reviewed-by tags by Frank and Dave.
> >     Thank you both for the review.
> > 
> > v2: https://lore.kernel.org/linux-pci/20260227084955.3184017-1-den@valinux.co.jp/
> > v1: https://lore.kernel.org/linux-pci/20260224133459.1741537-1-den@valinux.co.jp/
> > 
> > 
> > Best regards,
> > Koichiro
> > 
> > 
> > Koichiro Den (10):
> >   PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
> >   PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic
> >     context
> >   PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via
> >     ntb_db_event()
> >   PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
> >   PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for
> >     doorbells
> >   NTB: epf: Document legacy doorbell slot offset in
> >     ntb_epf_peer_db_set()
> >   NTB: epf: Make db_valid_mask cover only real doorbell bits
> >   NTB: epf: Report 0-based doorbell vector via ntb_db_event()
> >   NTB: epf: Fix doorbell bitmask handling in db_read/db_clear
> >   NTB: epf: Implement db_vector_count/mask for doorbells
> > 
> >  drivers/ntb/hw/epf/ntb_hw_epf.c               |  89 ++++++++++-
> >  drivers/pci/endpoint/functions/pci-epf-vntb.c | 147 +++++++++++++++---
> >  2 files changed, 210 insertions(+), 26 deletions(-)
> > 
> > -- 
> > 2.51.0
> > 
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context
  2026-03-23  3:15 ` [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context Koichiro Den
  2026-03-23 18:27   ` Frank Li
@ 2026-05-01 16:26   ` Manivannan Sadhasivam
  2026-05-11  7:54     ` Koichiro Den
  1 sibling, 1 reply; 20+ messages in thread
From: Manivannan Sadhasivam @ 2026-05-01 16:26 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Jon Mason, Dave Jiang, Allen Hubbe, Krzysztof Wilczyński,
	Kishon Vijay Abraham I, Bjorn Helgaas, Frank Li, Jerome Brunet,
	Lorenzo Pieralisi, Niklas Cassel, linux-kernel, linux-pci, ntb

On Mon, Mar 23, 2026 at 12:15:36PM +0900, Koichiro Den wrote:
> The NTB .peer_db_set() callback may be invoked from atomic context.
> pci-epf-vntb currently calls pci_epc_raise_irq() directly, but
> pci_epc_raise_irq() may sleep (it takes epc->lock).
> 
> Avoid sleeping in atomic context by coalescing doorbell bits into an
> atomic64 pending mask and raising MSIs from a work item. Limit the
> amount of work per run to avoid monopolizing the workqueue under a
> doorbell storm.
> 
> Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> Changes since v2:
>   - Resolved a trivial context-only conflict after
>     d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop cmd_handler work in epf_ntb_epc_cleanup")
>     landed in pci/endpoint.
> 
>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 106 +++++++++++++-----
>  1 file changed, 78 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index a75f8a30f8dc..bc3b3df53ddb 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -69,6 +69,9 @@ static struct workqueue_struct *kpcintb_workqueue;
>  #define MAX_DB_COUNT			32
>  #define MAX_MW				4
>  
> +/* Limit per-work execution to avoid monopolizing kworker on doorbell storms. */
> +#define VNTB_PEER_DB_WORK_BUDGET	5
> +
>  enum epf_ntb_bar {
>  	BAR_CONFIG,
>  	BAR_DB,
> @@ -129,6 +132,8 @@ struct epf_ntb {
>  	u32 spad_count;
>  	u64 mws_size[MAX_MW];
>  	atomic64_t db;
> +	atomic64_t peer_db_pending;
> +	struct work_struct peer_db_work;
>  	u32 vbus_number;
>  	u16 vntb_pid;
>  	u16 vntb_vid;
> @@ -920,6 +925,8 @@ static int epf_ntb_epc_init(struct epf_ntb *ntb)
>  	INIT_DELAYED_WORK(&ntb->cmd_handler, epf_ntb_cmd_handler);
>  	queue_work(kpcintb_workqueue, &ntb->cmd_handler.work);
>  
> +	enable_work(&ntb->peer_db_work);
> +
>  	return 0;
>  
>  err_write_header:
> @@ -943,6 +950,7 @@ static int epf_ntb_epc_init(struct epf_ntb *ntb)
>  static void epf_ntb_epc_cleanup(struct epf_ntb *ntb)
>  {
>  	disable_delayed_work_sync(&ntb->cmd_handler);
> +	disable_work_sync(&ntb->peer_db_work);

Sashiko has flagged a couple of valid issues on this patch. Could you please
take a look?
https://sashiko.dev/#/patchset/20260323031544.2598111-1-den%40valinux.co.jp?part=2

- Mani

>  	epf_ntb_mw_bar_clear(ntb, ntb->num_mws);
>  	epf_ntb_db_bar_clear(ntb);
>  	epf_ntb_config_sspad_bar_clear(ntb);
> @@ -1357,41 +1365,79 @@ static int vntb_epf_peer_spad_write(struct ntb_dev *ndev, int pidx, int idx, u32
>  	return 0;
>  }
>  
> +static void vntb_epf_peer_db_work(struct work_struct *work)
> +{
> +	struct epf_ntb *ntb = container_of(work, struct epf_ntb, peer_db_work);
> +	struct pci_epf *epf = ntb->epf;
> +	unsigned int budget = VNTB_PEER_DB_WORK_BUDGET;
> +	u8 func_no, vfunc_no;
> +	u32 interrupt_num;
> +	u64 db_bits;
> +	int ret;
> +
> +	if (!epf || !epf->epc)
> +		return;
> +
> +	func_no = epf->func_no;
> +	vfunc_no = epf->vfunc_no;
> +
> +	/*
> +	 * Drain doorbells from peer_db_pending in snapshots (atomic64_xchg()).
> +	 * Limit the number of snapshots handled per run so we don't monopolize
> +	 * the workqueue under a doorbell storm.
> +	 */
> +	while (budget--) {
> +		db_bits = atomic64_xchg(&ntb->peer_db_pending, 0);
> +		if (!db_bits)
> +			return;
> +
> +		while (db_bits) {
> +			/*
> +			 * pci_epc_raise_irq() for MSI expects a 1-based
> +			 * interrupt number. ffs() returns a 1-based index (bit
> +			 * 0 -> 1). We historically add +2 to compute
> +			 * interrupt_num.
> +			 *
> +			 * Legacy mapping (kept for compatibility):
> +			 *
> +			 *   MSI #1 : link event (reserved)
> +			 *   MSI #2 : unused (historical offset)
> +			 *   MSI #3 : doorbell bit 0 (DB#0)
> +			 *   MSI #4 : doorbell bit 1 (DB#1)
> +			 *   ...
> +			 *
> +			 * Do not change this mapping to avoid breaking
> +			 * interoperability with older peers.
> +			 */
> +			interrupt_num = ffs(db_bits) + 2;
> +			db_bits &= db_bits - 1;
> +
> +			ret = pci_epc_raise_irq(epf->epc, func_no, vfunc_no,
> +						PCI_IRQ_MSI, interrupt_num);
> +			if (ret)
> +				dev_err(&ntb->ntb.dev,
> +					"Failed to raise IRQ for interrupt_num %u: %d\n",
> +					interrupt_num, ret);
> +		}
> +	}
> +
> +	if (atomic64_read(&ntb->peer_db_pending))
> +		queue_work(kpcintb_workqueue, &ntb->peer_db_work);
> +}
> +
>  static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
>  {
> -	u32 interrupt_num = ffs(db_bits) + 1;
>  	struct epf_ntb *ntb = ntb_ndev(ndev);
> -	u8 func_no, vfunc_no;
> -	int ret;
> -
> -	func_no = ntb->epf->func_no;
> -	vfunc_no = ntb->epf->vfunc_no;
>  
>  	/*
> -	 * pci_epc_raise_irq() for MSI expects a 1-based interrupt number.
> -	 * ffs() returns a 1-based index (bit 0 -> 1). interrupt_num has already
> -	 * been computed as ffs(db_bits) + 1 above. Adding one more +1 when
> -	 * calling pci_epc_raise_irq() therefore results in:
> -	 *
> -	 *   doorbell bit 0 -> MSI #3
> -	 *
> -	 * Legacy mapping (kept for compatibility):
> -	 *
> -	 *   MSI #1 : link event (reserved)
> -	 *   MSI #2 : unused (historical offset)
> -	 *   MSI #3 : doorbell bit 0 (DB#0)
> -	 *   MSI #4 : doorbell bit 1 (DB#1)
> -	 *   ...
> -	 *
> -	 * Do not change this mapping to avoid breaking interoperability with
> -	 * older peers.
> +	 * .peer_db_set() may be called from atomic context. pci_epc_raise_irq()
> +	 * can sleep (it takes epc->lock), so defer MSI raising to process
> +	 * context. Doorbell requests are coalesced in peer_db_pending.
>  	 */
> -	ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
> -				PCI_IRQ_MSI, interrupt_num + 1);
> -	if (ret)
> -		dev_err(&ntb->ntb.dev, "Failed to raise IRQ\n");
> +	atomic64_or(db_bits, &ntb->peer_db_pending);
> +	queue_work(kpcintb_workqueue, &ntb->peer_db_work);
>  
> -	return ret;
> +	return 0;
>  }
>  
>  static u64 vntb_epf_db_read(struct ntb_dev *ndev)
> @@ -1629,6 +1675,10 @@ static int epf_ntb_probe(struct pci_epf *epf,
>  	ntb->epf = epf;
>  	ntb->vbus_number = 0xff;
>  
> +	INIT_WORK(&ntb->peer_db_work, vntb_epf_peer_db_work);
> +	disable_work(&ntb->peer_db_work);
> +	atomic64_set(&ntb->peer_db_pending, 0);
> +
>  	/* Initially, no bar is assigned */
>  	for (i = 0; i < VNTB_BAR_NUM; i++)
>  		ntb->epf_ntb_bar[i] = NO_BAR;
> -- 
> 2.51.0
> 

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 09/10] NTB: epf: Fix doorbell bitmask handling in db_read/db_clear
  2026-03-23  3:15 ` [PATCH v3 09/10] NTB: epf: Fix doorbell bitmask handling in db_read/db_clear Koichiro Den
@ 2026-05-01 16:29   ` Manivannan Sadhasivam
  2026-05-11  9:15     ` Koichiro Den
  0 siblings, 1 reply; 20+ messages in thread
From: Manivannan Sadhasivam @ 2026-05-01 16:29 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Jon Mason, Dave Jiang, Allen Hubbe, Krzysztof Wilczyński,
	Kishon Vijay Abraham I, Bjorn Helgaas, Frank Li, Jerome Brunet,
	Lorenzo Pieralisi, Niklas Cassel, linux-kernel, linux-pci, ntb

On Mon, Mar 23, 2026 at 12:15:43PM +0900, Koichiro Den wrote:
> The EPF driver currently stores the incoming doorbell as a vector number
> (irq_no + 1) in db_val and db_clear() clears all bits unconditionally.
> This breaks db_read()/db_clear() semantics when multiple doorbells are
> used.
> 
> Store doorbells as a bitmask (BIT_ULL(vector)) and make
> db_clear(db_bits) clear only the specified bits. Use atomic64 operations
> as db_val is updated from interrupt context.
> 
> Fixes: 812ce2f8d14e ("NTB: Add support for EPF PCI Non-Transparent Bridge")
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> Suggested-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>

On this patch too, but some pre-existing issues as well:

https://sashiko.dev/#/patchset/20260323031544.2598111-1-den%40valinux.co.jp?part=9

- Mani

> ---
>  drivers/ntb/hw/epf/ntb_hw_epf.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
> index 67cdc5d729d5..741d30821390 100644
> --- a/drivers/ntb/hw/epf/ntb_hw_epf.c
> +++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
> @@ -6,6 +6,7 @@
>   * Author: Kishon Vijay Abraham I <kishon@ti.com>
>   */
>  
> +#include <linux/atomic.h>
>  #include <linux/delay.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> @@ -108,7 +109,7 @@ struct ntb_epf_dev {
>  	unsigned int self_spad;
>  	unsigned int peer_spad;
>  
> -	int db_val;
> +	atomic64_t db_val;
>  	u64 db_valid_mask;
>  };
>  
> @@ -337,15 +338,16 @@ static irqreturn_t ntb_epf_vec_isr(int irq, void *dev)
>  	int irq_no;
>  
>  	irq_no = irq - pci_irq_vector(ndev->ntb.pdev, 0);
> -	ndev->db_val = irq_no + 1;
>  
>  	if (irq_no == EPF_IRQ_LINK) {
>  		ntb_link_event(&ndev->ntb);
>  	} else if (irq_no == EPF_IRQ_RESERVED_DB) {
>  		dev_warn_ratelimited(ndev->dev,
>  				     "Unexpected irq_no 1 received. Treat it as DB#0.\n");
> +		atomic64_or(BIT_ULL(0), &ndev->db_val);
>  		ntb_db_event(&ndev->ntb, 0);
>  	} else {
> +		atomic64_or(BIT_ULL(irq_no - EPF_IRQ_DB_START), &ndev->db_val);
>  		ntb_db_event(&ndev->ntb, irq_no - EPF_IRQ_DB_START);
>  	}
>  
> @@ -530,7 +532,7 @@ static u64 ntb_epf_db_read(struct ntb_dev *ntb)
>  {
>  	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
>  
> -	return ndev->db_val;
> +	return atomic64_read(&ndev->db_val);
>  }
>  
>  static int ntb_epf_db_clear_mask(struct ntb_dev *ntb, u64 db_bits)
> @@ -542,7 +544,7 @@ static int ntb_epf_db_clear(struct ntb_dev *ntb, u64 db_bits)
>  {
>  	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
>  
> -	ndev->db_val = 0;
> +	atomic64_and(~db_bits, &ndev->db_val);
>  
>  	return 0;
>  }
> -- 
> 2.51.0
> 

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context
  2026-05-01 16:26   ` Manivannan Sadhasivam
@ 2026-05-11  7:54     ` Koichiro Den
  0 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-05-11  7:54 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Jon Mason, Dave Jiang, Allen Hubbe, Krzysztof Wilczyński,
	Kishon Vijay Abraham I, Bjorn Helgaas, Frank Li, Jerome Brunet,
	Lorenzo Pieralisi, Niklas Cassel, linux-kernel, linux-pci, ntb

On Fri, May 01, 2026 at 09:56:36PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Mar 23, 2026 at 12:15:36PM +0900, Koichiro Den wrote:
> > The NTB .peer_db_set() callback may be invoked from atomic context.
> > pci-epf-vntb currently calls pci_epc_raise_irq() directly, but
> > pci_epc_raise_irq() may sleep (it takes epc->lock).
> > 
> > Avoid sleeping in atomic context by coalescing doorbell bits into an
> > atomic64 pending mask and raising MSIs from a work item. Limit the
> > amount of work per run to avoid monopolizing the workqueue under a
> > doorbell storm.
> > 
> > Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> > Changes since v2:
> >   - Resolved a trivial context-only conflict after
> >     d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop cmd_handler work in epf_ntb_epc_cleanup")
> >     landed in pci/endpoint.
> > 
> >  drivers/pci/endpoint/functions/pci-epf-vntb.c | 106 +++++++++++++-----
> >  1 file changed, 78 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > index a75f8a30f8dc..bc3b3df53ddb 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > @@ -69,6 +69,9 @@ static struct workqueue_struct *kpcintb_workqueue;
> >  #define MAX_DB_COUNT			32
> >  #define MAX_MW				4
> >  
> > +/* Limit per-work execution to avoid monopolizing kworker on doorbell storms. */
> > +#define VNTB_PEER_DB_WORK_BUDGET	5
> > +
> >  enum epf_ntb_bar {
> >  	BAR_CONFIG,
> >  	BAR_DB,
> > @@ -129,6 +132,8 @@ struct epf_ntb {
> >  	u32 spad_count;
> >  	u64 mws_size[MAX_MW];
> >  	atomic64_t db;
> > +	atomic64_t peer_db_pending;
> > +	struct work_struct peer_db_work;
> >  	u32 vbus_number;
> >  	u16 vntb_pid;
> >  	u16 vntb_vid;
> > @@ -920,6 +925,8 @@ static int epf_ntb_epc_init(struct epf_ntb *ntb)
> >  	INIT_DELAYED_WORK(&ntb->cmd_handler, epf_ntb_cmd_handler);
> >  	queue_work(kpcintb_workqueue, &ntb->cmd_handler.work);
> >  
> > +	enable_work(&ntb->peer_db_work);
> > +
> >  	return 0;
> >  
> >  err_write_header:
> > @@ -943,6 +950,7 @@ static int epf_ntb_epc_init(struct epf_ntb *ntb)
> >  static void epf_ntb_epc_cleanup(struct epf_ntb *ntb)
> >  {
> >  	disable_delayed_work_sync(&ntb->cmd_handler);
> > +	disable_work_sync(&ntb->peer_db_work);
> 
> Sashiko has flagged a couple of valid issues on this patch. Could you please
> take a look?
> https://sashiko.dev/#/patchset/20260323031544.2598111-1-den%40valinux.co.jp?part=2

Hi Mani,

Thanks for the pointer, and sorry for the late response.

I revisited Sashiko's comments. My eralier reply [1] grouped the patch 2 reports
too broadly under follow-up lifecycle work. After going through them again, I now
think two of the issues are local enough to handle in this series, while the
driver-unbind/.remove part remains part of the broader vNTB lifecycle work.

For the individual Sashiko reports against patch 2:

1). peer_db_pending

    The EPC unbind/bind path is already not fully safe today, independently of
    this patch. Some of that was discussed and posted in [2], and the remaining
    .remove work is still pending (sorry about the delay).

    Nevertheless, patch 2 adds a new deferred doorbell state, so v4 will clear
    peer_db_pending after disabling peer_db_work in epf_ntb_epc_cleanup() and
    before enabling peer_db_work in epf_ntb_epc_init().  This is intended as
    local hygiene for the new state, not as a claim that the whole EPC/EPF
    lifecycle is fixed.

2). ffs(db_bits) on a u64 mask

    I agree this is worth cleaning up, although I would classify it as a
    pre-existing hardening issue rather than a newly reachable bug in patch 2.
    For valid doorbell masks, the bits are expected to be within the supported
    doorbell range, so the upper-32-bit case should not be a normal input path.

    Still, the callback takes a u64 mask and patch 2 stores/drains it in an
    atomic64_t, so relying on ffs() implicitly truncating the value to int is
    fragile for future changes. I will make the supported-mask invariant
    explicit in v4 by masking unsupported bits before walking the mask and by
    using a u64 bit iterator. This should not change the behaviour for valid
    doorbell masks while keeping the existing DB#0 -> MSI #3 legacy mapping
    unchanged.

3). possible peer_db_work UAF on driver unbind

    I still think this belongs to the broader vNTB lifecycle hardening work.
    Driver unbind while still bound to an EPC is unsafe independently of this
    patch, and the real fix needs a .remove implementation that unregisters the
    NTB device and drains/cancels the work items, including the newly added
    peer_db_work. I will handle this in the follow-up lifecycle series to [2]
    and call it out in the v4 cover letter.

So, I'm planning to include the following in v4 patch 2:
  - For 1), clear peer_db_pending in epc_cleanup/init around peer_db_work.
  - For 2), make the u64 doorbell mask handling explicit by masking unsupported
    bits and iterating the mask as u64.

Please let me know if you would prefer a different approach.

[1] https://lore.kernel.org/linux-pci/7jkibbdz3m3vuq2kq2xxnkt2rjdt5v7itgowc7qormtwgnf6tv@krl5brfjul5y/
[2] https://lore.kernel.org/all/20260226084142.2226875-1-den@valinux.co.jp/

Best regards,
Koichiro

> 
> - Mani
> 
> >  	epf_ntb_mw_bar_clear(ntb, ntb->num_mws);
> >  	epf_ntb_db_bar_clear(ntb);
> >  	epf_ntb_config_sspad_bar_clear(ntb);
> > @@ -1357,41 +1365,79 @@ static int vntb_epf_peer_spad_write(struct ntb_dev *ndev, int pidx, int idx, u32
> >  	return 0;
> >  }
> >  
> > +static void vntb_epf_peer_db_work(struct work_struct *work)
> > +{
> > +	struct epf_ntb *ntb = container_of(work, struct epf_ntb, peer_db_work);
> > +	struct pci_epf *epf = ntb->epf;
> > +	unsigned int budget = VNTB_PEER_DB_WORK_BUDGET;
> > +	u8 func_no, vfunc_no;
> > +	u32 interrupt_num;
> > +	u64 db_bits;
> > +	int ret;
> > +
> > +	if (!epf || !epf->epc)
> > +		return;
> > +
> > +	func_no = epf->func_no;
> > +	vfunc_no = epf->vfunc_no;
> > +
> > +	/*
> > +	 * Drain doorbells from peer_db_pending in snapshots (atomic64_xchg()).
> > +	 * Limit the number of snapshots handled per run so we don't monopolize
> > +	 * the workqueue under a doorbell storm.
> > +	 */
> > +	while (budget--) {
> > +		db_bits = atomic64_xchg(&ntb->peer_db_pending, 0);
> > +		if (!db_bits)
> > +			return;
> > +
> > +		while (db_bits) {
> > +			/*
> > +			 * pci_epc_raise_irq() for MSI expects a 1-based
> > +			 * interrupt number. ffs() returns a 1-based index (bit
> > +			 * 0 -> 1). We historically add +2 to compute
> > +			 * interrupt_num.
> > +			 *
> > +			 * Legacy mapping (kept for compatibility):
> > +			 *
> > +			 *   MSI #1 : link event (reserved)
> > +			 *   MSI #2 : unused (historical offset)
> > +			 *   MSI #3 : doorbell bit 0 (DB#0)
> > +			 *   MSI #4 : doorbell bit 1 (DB#1)
> > +			 *   ...
> > +			 *
> > +			 * Do not change this mapping to avoid breaking
> > +			 * interoperability with older peers.
> > +			 */
> > +			interrupt_num = ffs(db_bits) + 2;
> > +			db_bits &= db_bits - 1;
> > +
> > +			ret = pci_epc_raise_irq(epf->epc, func_no, vfunc_no,
> > +						PCI_IRQ_MSI, interrupt_num);
> > +			if (ret)
> > +				dev_err(&ntb->ntb.dev,
> > +					"Failed to raise IRQ for interrupt_num %u: %d\n",
> > +					interrupt_num, ret);
> > +		}
> > +	}
> > +
> > +	if (atomic64_read(&ntb->peer_db_pending))
> > +		queue_work(kpcintb_workqueue, &ntb->peer_db_work);
> > +}
> > +
> >  static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
> >  {
> > -	u32 interrupt_num = ffs(db_bits) + 1;
> >  	struct epf_ntb *ntb = ntb_ndev(ndev);
> > -	u8 func_no, vfunc_no;
> > -	int ret;
> > -
> > -	func_no = ntb->epf->func_no;
> > -	vfunc_no = ntb->epf->vfunc_no;
> >  
> >  	/*
> > -	 * pci_epc_raise_irq() for MSI expects a 1-based interrupt number.
> > -	 * ffs() returns a 1-based index (bit 0 -> 1). interrupt_num has already
> > -	 * been computed as ffs(db_bits) + 1 above. Adding one more +1 when
> > -	 * calling pci_epc_raise_irq() therefore results in:
> > -	 *
> > -	 *   doorbell bit 0 -> MSI #3
> > -	 *
> > -	 * Legacy mapping (kept for compatibility):
> > -	 *
> > -	 *   MSI #1 : link event (reserved)
> > -	 *   MSI #2 : unused (historical offset)
> > -	 *   MSI #3 : doorbell bit 0 (DB#0)
> > -	 *   MSI #4 : doorbell bit 1 (DB#1)
> > -	 *   ...
> > -	 *
> > -	 * Do not change this mapping to avoid breaking interoperability with
> > -	 * older peers.
> > +	 * .peer_db_set() may be called from atomic context. pci_epc_raise_irq()
> > +	 * can sleep (it takes epc->lock), so defer MSI raising to process
> > +	 * context. Doorbell requests are coalesced in peer_db_pending.
> >  	 */
> > -	ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
> > -				PCI_IRQ_MSI, interrupt_num + 1);
> > -	if (ret)
> > -		dev_err(&ntb->ntb.dev, "Failed to raise IRQ\n");
> > +	atomic64_or(db_bits, &ntb->peer_db_pending);
> > +	queue_work(kpcintb_workqueue, &ntb->peer_db_work);
> >  
> > -	return ret;
> > +	return 0;
> >  }
> >  
> >  static u64 vntb_epf_db_read(struct ntb_dev *ndev)
> > @@ -1629,6 +1675,10 @@ static int epf_ntb_probe(struct pci_epf *epf,
> >  	ntb->epf = epf;
> >  	ntb->vbus_number = 0xff;
> >  
> > +	INIT_WORK(&ntb->peer_db_work, vntb_epf_peer_db_work);
> > +	disable_work(&ntb->peer_db_work);
> > +	atomic64_set(&ntb->peer_db_pending, 0);
> > +
> >  	/* Initially, no bar is assigned */
> >  	for (i = 0; i < VNTB_BAR_NUM; i++)
> >  		ntb->epf_ntb_bar[i] = NO_BAR;
> > -- 
> > 2.51.0
> > 
> 
> -- 
> மணிவண்ணன் சதாசிவம்
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 09/10] NTB: epf: Fix doorbell bitmask handling in db_read/db_clear
  2026-05-01 16:29   ` Manivannan Sadhasivam
@ 2026-05-11  9:15     ` Koichiro Den
  0 siblings, 0 replies; 20+ messages in thread
From: Koichiro Den @ 2026-05-11  9:15 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Jon Mason, Dave Jiang, Allen Hubbe, Krzysztof Wilczyński,
	Kishon Vijay Abraham I, Bjorn Helgaas, Frank Li, Jerome Brunet,
	Lorenzo Pieralisi, Niklas Cassel, linux-kernel, linux-pci, ntb

On Fri, May 01, 2026 at 09:59:42PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Mar 23, 2026 at 12:15:43PM +0900, Koichiro Den wrote:
> > The EPF driver currently stores the incoming doorbell as a vector number
> > (irq_no + 1) in db_val and db_clear() clears all bits unconditionally.
> > This breaks db_read()/db_clear() semantics when multiple doorbells are
> > used.
> > 
> > Store doorbells as a bitmask (BIT_ULL(vector)) and make
> > db_clear(db_bits) clear only the specified bits. Use atomic64 operations
> > as db_val is updated from interrupt context.
> > 
> > Fixes: 812ce2f8d14e ("NTB: Add support for EPF PCI Non-Transparent Bridge")
> > Reviewed-by: Frank Li <Frank.Li@nxp.com>
> > Suggested-by: Dave Jiang <dave.jiang@intel.com>
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> 
> On this patch too, but some pre-existing issues as well:
> 
> https://sashiko.dev/#/patchset/20260323031544.2598111-1-den%40valinux.co.jp?part=9

Hi Mani,

I also revisited Sashiko's feedback against patch 9.

My current understanding is still that most of the underlying IRQ handling
concerns are pre-existing. However, I realized that patch 9 does make one of
them more relevant because this series starts giving the reported doorbell
vector real per-vector semantics, so I think this should be fixed in this
series.

For the individual Sashiko reports against patch 9:

1) irq_no derivation from the Linux IRQ number

   The reverse calculation itself is not introduced by patch 9. The existing
   code already derives irq_no from the Linux IRQ number and uses it as the
   doorbell event number. Patch 9 preserves that assumption.

   Still, I agree the assumption is fragile. Linux IRQ numbers are not the
   device vector numbers, and the code should not rely on arithmetic on Linux
   IRQ numbers to recover the device vector. Since I will respin in any case,
   I'll resolve this as well in v4.

   The IRQ series in [2] addresses the request_irq() unwind problem and avoids
   calling pci_irq_vector() from hardirq context. It is already acked, but is
   not visible in mainline yet. I will therefore make [2] an explicit
   prerequisite for v4 and call that out in the cover letter.

   So, on top of [2], v4 will replace the remaining irq_base-style vector
   derivation with a per-vector IRQ context passed as request_irq()'s dev_id, so
   the ISR receives the device vector index directly.

2) unvalidated irq_no

   This is also pre-existing in the sense that the current code already passes
   the derived irq_no to ntb_db_event() without validation.

   However, patch 9 additionally uses the derived value in BIT_ULL(), and this
   series implements db_vector_count/mask(), so an invalid vector could lead to
   both an invalid bit shift and wrong downstream per-vector doorbell handling.

   So v4 will validate the doorbell vector before updating db_val and reporting
   the doorbell event.

3) request_irq() unwind / probe failure cleanup

   This is the ntb_hw_epf IRQ lifetime issue I mentioned in [1], and what I was
   trying to address in [2]. I still think this is orthogonal to the doorbell
   bitmask semantics, so I'll keep [2] as a prerequisite rather than fold that
   cleanup into this series.

To summarize, I'm planning to include the following in v4:
  - list [2] as an explicit prerequisite and call it out in the cover letter,
  - stop deriving the device vector from Linux IRQ-number arithmetic by
    using per-vector IRQ context for ntb_epf_vec_isr(), and
  - validate the doorbell vector before BIT_ULL() and ntb_db_event().

Please let me know if you would prefer a different split.

[1] https://lore.kernel.org/linux-pci/7jkibbdz3m3vuq2kq2xxnkt2rjdt5v7itgowc7qormtwgnf6tv@krl5brfjul5y/
[2] https://lore.kernel.org/ntb/20260304083028.1391068-1-den@valinux.co.jp/

Best regards,
Koichiro

> 
> - Mani
> 
> > ---
> >  drivers/ntb/hw/epf/ntb_hw_epf.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
> > index 67cdc5d729d5..741d30821390 100644
> > --- a/drivers/ntb/hw/epf/ntb_hw_epf.c
> > +++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
> > @@ -6,6 +6,7 @@
> >   * Author: Kishon Vijay Abraham I <kishon@ti.com>
> >   */
> >  
> > +#include <linux/atomic.h>
> >  #include <linux/delay.h>
> >  #include <linux/module.h>
> >  #include <linux/pci.h>
> > @@ -108,7 +109,7 @@ struct ntb_epf_dev {
> >  	unsigned int self_spad;
> >  	unsigned int peer_spad;
> >  
> > -	int db_val;
> > +	atomic64_t db_val;
> >  	u64 db_valid_mask;
> >  };
> >  
> > @@ -337,15 +338,16 @@ static irqreturn_t ntb_epf_vec_isr(int irq, void *dev)
> >  	int irq_no;
> >  
> >  	irq_no = irq - pci_irq_vector(ndev->ntb.pdev, 0);
> > -	ndev->db_val = irq_no + 1;
> >  
> >  	if (irq_no == EPF_IRQ_LINK) {
> >  		ntb_link_event(&ndev->ntb);
> >  	} else if (irq_no == EPF_IRQ_RESERVED_DB) {
> >  		dev_warn_ratelimited(ndev->dev,
> >  				     "Unexpected irq_no 1 received. Treat it as DB#0.\n");
> > +		atomic64_or(BIT_ULL(0), &ndev->db_val);
> >  		ntb_db_event(&ndev->ntb, 0);
> >  	} else {
> > +		atomic64_or(BIT_ULL(irq_no - EPF_IRQ_DB_START), &ndev->db_val);
> >  		ntb_db_event(&ndev->ntb, irq_no - EPF_IRQ_DB_START);
> >  	}
> >  
> > @@ -530,7 +532,7 @@ static u64 ntb_epf_db_read(struct ntb_dev *ntb)
> >  {
> >  	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
> >  
> > -	return ndev->db_val;
> > +	return atomic64_read(&ndev->db_val);
> >  }
> >  
> >  static int ntb_epf_db_clear_mask(struct ntb_dev *ntb, u64 db_bits)
> > @@ -542,7 +544,7 @@ static int ntb_epf_db_clear(struct ntb_dev *ntb, u64 db_bits)
> >  {
> >  	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
> >  
> > -	ndev->db_val = 0;
> > +	atomic64_and(~db_bits, &ndev->db_val);
> >  
> >  	return 0;
> >  }
> > -- 
> > 2.51.0
> > 
> 
> -- 
> மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2026-05-11  9:15 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-23  3:15 [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
2026-03-23  3:15 ` [PATCH v3 01/10] PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset Koichiro Den
2026-03-23  3:15 ` [PATCH v3 02/10] PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context Koichiro Den
2026-03-23 18:27   ` Frank Li
2026-05-01 16:26   ` Manivannan Sadhasivam
2026-05-11  7:54     ` Koichiro Den
2026-03-23  3:15 ` [PATCH v3 03/10] PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via ntb_db_event() Koichiro Den
2026-03-23  3:15 ` [PATCH v3 04/10] PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask Koichiro Den
2026-03-23  3:15 ` [PATCH v3 05/10] PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for doorbells Koichiro Den
2026-03-23  3:15 ` [PATCH v3 06/10] NTB: epf: Document legacy doorbell slot offset in ntb_epf_peer_db_set() Koichiro Den
2026-03-23  3:15 ` [PATCH v3 07/10] NTB: epf: Make db_valid_mask cover only real doorbell bits Koichiro Den
2026-03-23  3:15 ` [PATCH v3 08/10] NTB: epf: Report 0-based doorbell vector via ntb_db_event() Koichiro Den
2026-03-23  3:15 ` [PATCH v3 09/10] NTB: epf: Fix doorbell bitmask handling in db_read/db_clear Koichiro Den
2026-05-01 16:29   ` Manivannan Sadhasivam
2026-05-11  9:15     ` Koichiro Den
2026-03-23  3:15 ` [PATCH v3 10/10] NTB: epf: Implement db_vector_count/mask for doorbells Koichiro Den
2026-03-23 15:43 ` [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling Koichiro Den
2026-03-25  6:23   ` Niklas Cassel
2026-03-25  8:44     ` Koichiro Den
2026-04-07 15:21   ` Koichiro Den

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.