* Re: [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends
@ 2026-04-28 20:36 Max Boone
2026-04-29 9:33 ` Niklas Cassel
0 siblings, 1 reply; 6+ messages in thread
From: Max Boone @ 2026-04-28 20:36 UTC (permalink / raw)
To: den, mani, frank.li
Cc: Frank.Li, allenbh, bhanuseshukumar, bhelgaas, cassel, dave.jiang,
jdmason, jingoohan1, kishon, kwilczynski, linux-kernel, linux-pci,
lpieralisi, mani, marco.crivellari, mmaddireddy, ntb, robh,
shinichiro.kawasaki, mboone
On Tue, 14 Apr 2026 23:15:11 +0900, Koichiro Den wrote:
> Prepare pci-ep-msi for non-MSI doorbell backends.
>
> Factor MSI doorbell allocation into a helper and extend struct
> pci_epf_doorbell_msg with:
>
> - irq_flags: required IRQ request flags (e.g. IRQF_SHARED for some
> backends)
> - type: doorbell backend type
> - bar/offset: pre-exposed doorbell target location, if any
>
> Initialize these fields for the existing MSI-backed doorbell
> implementation.
>
> Also add PCI_EPF_DOORBELL_EMBEDDED type, which is to be implemented in a
> follow-up patch.
>
> No functional changes.
I’m not very fond of keeping this implementation in the pci-ep-msi file,
as the platform MSI and this implementation are both iiuc specific to
the designware ep driver. Even more so because the MSI implementation
is enabled by config rather than through device tree.
Wouldn’t we want end-users to specify what kind of doorbell they want,
as it seems that a more specific doorbell BAR layout can be
programmed with eDMA, allowing native support for nvmet’s doorbell
BAR, for example?
Originally in a patchset by Frank Li the API that was proposed was more
generic, and the pci-epc-msi implementation was chosen because there
was only one implementation:
- https://lore.kernel.org/imx/20231019150441.GA7254@thinkpad/
- https://lore.kernel.org/imx/20231019172347.GC7254@thinkpad/
I’d personally prefer to see an abstraction that is woven into pci-epc-core
and pci-epf-core that can be implemented by drivers as they wish, while
still keeping the enum for different types.
That also gives room to pull a poll-mode doorbell into the pci-epc-core,
which deduplicates that code from the nvmet and vntb epfs, and allows
other functions to use RC->EP doorbells without needing to bother with
writing the polling mechanism.
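A minimal user-space sketch of the kind of abstraction described above, assuming a per-controller callback table indexed by doorbell type. Every name here (db_type, DB_POLL, db_backend_ops, db_alloc, db_poll_alloc) is hypothetical and not an existing kernel API; only the MSI and embedded types correspond to the enum in the patch.

```c
#include <errno.h>
#include <stddef.h>

/* MSI and EMBEDDED mirror the patch's enum; DB_POLL is a hypothetical addition. */
enum db_type { DB_MSI, DB_EMBEDDED, DB_POLL, DB_TYPE_COUNT };

/* Hypothetical per-controller callback table, one entry per doorbell type. */
struct db_backend_ops {
	int (*alloc)(unsigned int num_db);	/* NULL if the type is unsupported */
};

/* Allocate doorbells using exactly the backend the caller asked for. */
static int db_alloc(const struct db_backend_ops ops[DB_TYPE_COUNT],
		    enum db_type type, unsigned int num_db)
{
	if ((unsigned int)type >= DB_TYPE_COUNT || !num_db)
		return -EINVAL;
	if (!ops[type].alloc)
		return -EOPNOTSUPP;
	return ops[type].alloc(num_db);
}

/* Example backend: polling needs no hardware resources, so it always succeeds. */
static int db_poll_alloc(unsigned int num_db)
{
	(void)num_db;
	return 0;
}
```

With a table like this, a controller that only offers polling would fill in the DB_POLL slot and leave the others NULL, so a request for an unsupported type fails explicitly instead of silently falling back.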
P.S. I’ve been working on a vfio-user based EPC for development purposes,
and the last hurdle before I send it in for comments is support for
doorbells. I came across this patchset while checking whether there’s
any other activity in the space. Having an implementation-agnostic
doorbell API in the EPF/EPC core would be very helpful to me.
>
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> Tested-by: Niklas Cassel <cassel@kernel.org>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> drivers/pci/endpoint/pci-ep-msi.c | 54 ++++++++++++++++++++++---------
> include/linux/pci-epf.h | 23 +++++++++++--
> 2 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/pci/endpoint/pci-ep-msi.c b/drivers/pci/endpoint/pci-ep-msi.c
> index 1395919571f8..85fe46103220 100644
> --- a/drivers/pci/endpoint/pci-ep-msi.c
> +++ b/drivers/pci/endpoint/pci-ep-msi.c
> @@ -8,6 +8,7 @@
>
> #include <linux/device.h>
> #include <linux/export.h>
> +#include <linux/interrupt.h>
> #include <linux/irqdomain.h>
> #include <linux/module.h>
> #include <linux/msi.h>
> @@ -35,23 +36,13 @@ static void pci_epf_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> pci_epc_put(epc);
> }
>
> -int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
> +static int pci_epf_alloc_doorbell_msi(struct pci_epf *epf, u16 num_db)
> {
> - struct pci_epc *epc = epf->epc;
> + struct pci_epf_doorbell_msg *msg;
> struct device *dev = &epf->dev;
> + struct pci_epc *epc = epf->epc;
> struct irq_domain *domain;
> - void *msg;
> - int ret;
> - int i;
> -
> - /* TODO: Multi-EPF support */
> - if (list_first_entry_or_null(&epc->pci_epf, struct pci_epf, list) != epf) {
> - dev_err(dev, "MSI doorbell doesn't support multiple EPF\n");
> - return -EINVAL;
> - }
> -
> - if (epf->db_msg)
> - return -EBUSY;
> + int ret, i;
>
> domain = of_msi_map_get_device_domain(epc->dev.parent, 0,
> DOMAIN_BUS_PLATFORM_MSI);
> @@ -74,6 +65,12 @@ int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
> if (!msg)
> return -ENOMEM;
>
> + for (i = 0; i < num_db; i++)
> + msg[i] = (struct pci_epf_doorbell_msg) {
> + .type = PCI_EPF_DOORBELL_MSI,
> + .bar = NO_BAR,
> + };
> +
> epf->num_db = num_db;
> epf->db_msg = msg;
>
> @@ -90,13 +87,40 @@ int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
> for (i = 0; i < num_db; i++)
> epf->db_msg[i].virq = msi_get_virq(epc->dev.parent, i);
>
> + return 0;
> +}
> +
> +int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
> +{
> + struct pci_epc *epc = epf->epc;
> + struct device *dev = &epf->dev;
> + int ret;
> +
> + /* TODO: Multi-EPF support */
> + if (list_first_entry_or_null(&epc->pci_epf, struct pci_epf, list) != epf) {
> + dev_err(dev, "Doorbell doesn't support multiple EPF\n");
> + return -EINVAL;
> + }
> +
> + if (epf->db_msg)
> + return -EBUSY;
> +
> + ret = pci_epf_alloc_doorbell_msi(epf, num_db);
> + if (!ret)
> + return 0;
> +
> + dev_err(dev, "Failed to allocate doorbell: %d\n", ret);
> return ret;
> }
> EXPORT_SYMBOL_GPL(pci_epf_alloc_doorbell);
>
> void pci_epf_free_doorbell(struct pci_epf *epf)
> {
> - platform_device_msi_free_irqs_all(epf->epc->dev.parent);
> + if (!epf->db_msg)
> + return;
> +
> + if (epf->db_msg[0].type == PCI_EPF_DOORBELL_MSI)
> + platform_device_msi_free_irqs_all(epf->epc->dev.parent);
>
> kfree(epf->db_msg);
> epf->db_msg = NULL;
> diff --git a/include/linux/pci-epf.h b/include/linux/pci-epf.h
> index 7737a7c03260..cd747447a1ea 100644
> --- a/include/linux/pci-epf.h
> +++ b/include/linux/pci-epf.h
> @@ -152,14 +152,33 @@ struct pci_epf_bar {
> struct pci_epf_bar_submap *submap;
> };
>
> +enum pci_epf_doorbell_type {
> + PCI_EPF_DOORBELL_MSI = 0,
> + PCI_EPF_DOORBELL_EMBEDDED,
> +};
> +
> /**
> * struct pci_epf_doorbell_msg - represents doorbell message
> - * @msg: MSI message
> - * @virq: IRQ number of this doorbell MSI message
> + * @msg: Doorbell address/data pair to be mapped into BAR space.
> + * For MSI-backed doorbells this is the MSI message, while for
> + * "embedded" doorbells this represents an MMIO write that asserts
> + * an interrupt on the EP side.
> + * @virq: IRQ number of this doorbell message
> + * @irq_flags: Required flags for request_irq()/request_threaded_irq().
> + * Callers may OR-in additional flags (e.g. IRQF_ONESHOT).
> + * @type: Doorbell type.
> + * @bar: BAR number where the doorbell target is already exposed to the RC
> + * (NO_BAR if not)
> + * @offset: offset within @bar for the doorbell target (valid iff
> + * @bar != NO_BAR)
> */
> struct pci_epf_doorbell_msg {
> struct msi_msg msg;
> int virq;
> + unsigned long irq_flags;
> + enum pci_epf_doorbell_type type;
> + enum pci_barno bar;
> + resource_size_t offset;
> };
>
> /**
> --
> 2.51.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends
2026-04-28 20:36 [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends Max Boone
@ 2026-04-29 9:33 ` Niklas Cassel
2026-04-29 11:11 ` Max Boone
0 siblings, 1 reply; 6+ messages in thread
From: Niklas Cassel @ 2026-04-29 9:33 UTC (permalink / raw)
To: Max Boone
Cc: den, mani, frank.li, allenbh, bhanuseshukumar, bhelgaas,
dave.jiang, jdmason, jingoohan1, kishon, kwilczynski,
linux-kernel, linux-pci, lpieralisi, marco.crivellari,
mmaddireddy, ntb, robh, shinichiro.kawasaki, mboone
On Tue, Apr 28, 2026 at 10:36:15PM +0200, Max Boone wrote:
>
> I’m not very fond of keeping this implementation in the pci-ep-msi file,
> as the platform MSI and this implementation are both iiuc specific to
> the designware ep driver. Even more so because the MSI implementation
> is enabled by config rather than through device tree.
Why do you think that the current code with DOMAIN_BUS_PLATFORM_MSI is
designware EPC specific?
I don't see anything that is designware EPC specific.
Sure, it relies on GIC ITS, but I don't see why non-designware EPCs can't
use GIC ITS.
>
> Wouldn’t we want end-users to specify what kind of doorbell they want,
> as it seems to be that a more specific doorbell BAR layout can be
> programmed with eDMA, allowing native support for nvmet’s doorbell
> BAR for example.
I also wanted to use my designware based EPC for doorbells in nvmet-pci-epf,
specifically the support that Koichiro added for inbound subrange mapping.
However, most designware EPCs have a strict alignment requirement
(CX_ATU_MIN_REGION_SIZE), which is often 4k.
This alignment requirement applies both to the PCI address (the address within
the BAR) and to the physical memory address (the target address).
I thought that we could use the inbound subrange mapping and put the doorbells
in a separate inbound iATU, so we could remove polling in the nvmet-pci-epf
driver, just like they have done in the vNTB driver. However, it works in vNTB
because they have a register telling the host exactly which BAR, and which
offset in that BAR, holds the doorbells.
In the NVMe PCIe Transport specification, the offset for the start of the
doorbells is fixed, at offset 0x1000 (4k) and the only thing you can change is
the stride between the doorbells.
Currently, a doorbell is a single 32-bit value. Sure, we could call
pci_epf_alloc_doorbell() with ("number of I/O queues" + 1 (admin queue)) * 2
(submission queue and completion queue).
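The fixed layout described above can be sketched as follows, assuming the usual NVMe over PCIe register layout in which the stride between doorbells is 4 << CAP.DSTRD bytes. The helper names are illustrative and do not come from nvmet-pci-epf.

```c
#include <stdint.h>

#define NVME_DB_BASE 0x1000u	/* fixed start of the doorbell registers */

/* Doorbell stride in bytes for a given CAP.DSTRD value. */
static uint32_t nvme_db_stride(uint32_t dstrd)
{
	return 4u << dstrd;
}

/* BAR offset of the submission queue tail doorbell for queue ID qid. */
static uint32_t nvme_sq_db_offset(uint32_t qid, uint32_t dstrd)
{
	return NVME_DB_BASE + (2 * qid) * nvme_db_stride(dstrd);
}

/* BAR offset of the completion queue head doorbell for queue ID qid. */
static uint32_t nvme_cq_db_offset(uint32_t qid, uint32_t dstrd)
{
	return NVME_DB_BASE + (2 * qid + 1) * nvme_db_stride(dstrd);
}

/* One SQ and one CQ doorbell per queue, with the admin queue included. */
static uint32_t nvme_db_count(uint32_t nr_io_queues)
{
	return (nr_io_queues + 1) * 2;
}
```

With DSTRD set to 0 the doorbells are packed 4 bytes apart, so the admin queue pair sits at 0x1000 and 0x1004 and the first I/O queue pair at 0x1008 and 0x100c.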
However, the address which we get from pci_epf_alloc_doorbell() might not be
4k aligned.
We have the function pci_epf_align_inbound_addr() which can split this
non-aligned address to a 4k aligned base + offset from that base.
However, that would also require the host side driver to write to this offset
from the start address. (See e.g. doorbell_offset in pci-epf-test.c).
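The split into an aligned base plus an offset can be sketched in plain C as below. This is only an illustration of the arithmetic, not the kernel helper itself; split_inbound_addr() and the example address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split addr into a base aligned to align and the remaining offset. */
static void split_inbound_addr(uint64_t addr, uint64_t align,
			       uint64_t *base, uint64_t *offset)
{
	assert(align && (align & (align - 1)) == 0);	/* power of two only */
	*base = addr & ~(align - 1);
	*offset = addr & (align - 1);
}
```

For a made-up doorbell address of 0xfe650040 and a 4k granule, the base is 0xfe650000 and the offset is 0x40. The host would have to write at that 0x40 offset inside the mapped window, which is exactly what a fixed doorbell location does not allow.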
So, basically, with the current limitation that the doorbells must start at
0x1000, together with the fact that the doorbells returned from
pci_epf_alloc_doorbell() might have an arbitrary alignment, I don't see how
we could add support for doorbells in nvmet-pci-epf.
If we could supply an alignment requirement to pci_epf_alloc_doorbell(),
e.g. 4k, and the API is guaranteed to return an address that satisfies this
alignment requirement, then we would be good.
However, right now, we don't have such an API. We simply get an address
somewhere within the GIC ITS MMIO region.
>
> Originally in a patchset by Frank Li the API that was proposed was more
> generic, and the pci-epc-msi implementation was chosen because there
> was only one implementation:
> - https://lore.kernel.org/imx/20231019150441.GA7254@thinkpad/
> - https://lore.kernel.org/imx/20231019172347.GC7254@thinkpad/
>
> I’d personally prefer to see an abstraction that is weaved into pci-epc-core
> and pci-epf-core that can be implemented by drivers as they wish. While
> still keeping the enum for different types.
>
> That also gives room to pull a poll-mode doorbell into the pci-epc-core,
> which deduplicates that code from the nvmet and vntb epfs, and allows
> other functions to use RC->EP doorbells without needing to bother with
> writing the polling mechanism.
Sounds like a good idea.
>
> P.S. I’ve been working on a vfio-user based epc for development purposes
> personally, and the last hurdle before I want to send it in for comments is
> support for doorbells, and came across this patchset checking if there’s
> any other activity in the space. Having an implementation-agnostic
> doorbell API in the EPF/EPC core would be very helpful to me.
I have looked at adding doorbell support to nvmet-pci-epf, but got stuck on
pci_epf_alloc_doorbell() returning an address that is not 4k aligned.
(Since the NVMe PCIe transport specification has the doorbells at a fixed
location, we can't change that.)
But if we could provide an "alignment" parameter to pci_epf_alloc_doorbell(),
then I think it is possible.
Sure, the GIC ITS MMIO area might be quite small, so it might not be able to
satisfy such a request. E.g. on rk3588, the its1 MMIO region is 0x20000 (128k):
https://github.com/torvalds/linux/blob/v7.1-rc1/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi#L2414
However, I have no idea how much of this region the GIC driver uses for
actual registers, and how much of that region it can actually dedicate to
doorbells.
Kind regards,
Niklas
* Re: [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends
2026-04-29 9:33 ` Niklas Cassel
@ 2026-04-29 11:11 ` Max Boone
2026-04-29 14:52 ` Niklas Cassel
0 siblings, 1 reply; 6+ messages in thread
From: Max Boone @ 2026-04-29 11:11 UTC (permalink / raw)
To: Niklas Cassel
Cc: den, mani, frank.li, allenbh, bhanuseshukumar, bhelgaas,
dave.jiang, jdmason, jingoohan1, kishon, kwilczynski,
linux-kernel, linux-pci, lpieralisi, marco.crivellari,
mmaddireddy, ntb, robh, shinichiro.kawasaki
> On Apr 29, 2026, at 11:33 AM, Niklas Cassel <cassel@kernel.org> wrote:
>
> On Tue, Apr 28, 2026 at 10:36:15PM +0200, Max Boone wrote:
>>
>> I’m not very fond of keeping this implementation in the pci-ep-msi file,
>> as the platform MSI and this implementation are both iiuc specific to
>> the designware ep driver. Even more so because the MSI implementation
>> is enabled by config rather than through device tree.
>
> Why do you think that the current code with DOMAIN_BUS_PLATFORM_MSI is
> designware EPC specific?
>
> I don't see anything that is designware EPC specific.
>
> Sure, it relies on GIC ITS, but I don't see why non-designware EPCs can't
> use GIC ITS.
Good point - looking through the device trees I only saw the msi-map / platform
msi set for the imx95 and assumed designware was the only EPC supporting
this (also because the code uses that of-node specifically), but I indeed don’t see
a reason that other chips can’t use this.
I’m a bit confused on the configuration, fwiw it’s probably me being unfamiliar
with PCIe, but it doesn’t seem right to configure the MSI and eDMA DBs through
a kconfig option rather than inferring it from the device tree and/or having the EP
driver enable the capability and expose an operation to realize it.
On the other hand, that way, we would probably end up with identical DB
implementations in multiple drivers.
>> Wouldn’t we want end-users to specify what kind of doorbell they want,
>> as it seems to be that a more specific doorbell BAR layout can be
>> programmed with eDMA, allowing native support for nvmet’s doorbell
>> BAR for example.
>
> I also wanted to use my designware based EPC for doorbells in nvmet-pci-epf,
> specifically the support that Koichiro added for inbound subrange mapping.
>
> However, most designware EPC have a strict alignment requirement
> (CX_ATU_MIN_REGION_SIZE), which is often 4k.
>
> This alignment requirement is there both on the PCI address (address within
> the BAR, and for the physical memory address (target address)).
>
> I thought that we could use the inbound subrange mapping and put the doorbells
> in a separate inbound iATU, so we could remove polling in the nvmet-pci-epf
> driver, just like they have done in the vNTB driver. However, it works in vNTB
> because they have a register telling exactly which BAR and offset in that BAR
> where the doorbells are.
> In the NVMe PCIe Transport specification, the offset for the start of the
> doorbells is fixed, at offset 0x1000 (4k) and the only thing you can change is
> the stride between the doorbells.
>
> Currently, a doorbell is a single 32-bit data, sure we could call
> pci_epf_alloc_doorbell() with ("number of I/O queues" + 1 (admin queue)) * 2
> (submission queue and completion queue).
>
> However, the address which we get from pci_epf_alloc_doorbell() might not be
> 4k aligned.
>
> We have the function pci_epf_align_inbound_addr() which can split this
> non-aligned address to a 4k aligned base + offset from that base.
>
> However, that would also require the host side driver to write to this offset
> from the start address. (See e.g. doorbell_offset in pci-epf-test.c).
>
> So, basically, with the current limitation that the doorbells must start at
> 0x1000, together with the fact that the doorbells returned from
> pci_epf_alloc_doorbell() might have an arbitrary alignment, I don't see how
> we could add support for doorbells in nvmet-pci-epf.
>
> If we could supply an alignment requirement to pci_epf_alloc_doorbell(),
> e.g. 4k, and the API is guaranteed to return an address that satisfies this
> alignment requirement, then we would be good.
>
> However, right now, we don't have such an API. We simple get an address
> somewhere within the GIC ITS MMIO region.
Check, thanks for the write-up, this is also what I’m looking to get working,
coincidentally on the RK3588. I had imagined that it would be possible to build
a sufficient API by passing in a base offset and stride for the doorbell allocation,
but an alignment param sounds better. Can we program the resulting doorbells
at an arbitrary offset in a BAR, or would we waste the first allocated
doorbell that’s going to be located at 0x0000 - 0x1000?
In any case, I think it would be preferable for users of the alloc_doorbell function
to pass in what kind of doorbell they want instead of using a fallback mechanism.
It seems to me that the alignment and possibly a larger amount of doorbells are
possible with the eDMA doorbell mechanism. Or am I misunderstanding eDMA
here and is that bounded by mapping / size / alignment of the GIC ITS?
>> Originally in a patchset by Frank Li the API that was proposed was more
>> generic, and the pci-epc-msi implementation was chosen because there
>> was only one implementation:
>> - https://lore.kernel.org/imx/20231019150441.GA7254@thinkpad/
>> - https://lore.kernel.org/imx/20231019172347.GC7254@thinkpad/
>>
>> I’d personally prefer to see an abstraction that is weaved into pci-epc-core
>> and pci-epf-core that can be implemented by drivers as they wish. While
>> still keeping the enum for different types.
>>
>> That also gives room to pull a poll-mode doorbell into the pci-epc-core,
>> which deduplicates that code from the nvmet and vntb epfs, and allows
>> other functions to use RC->EP doorbells without needing to bother with
>> writing the polling mechanism.
>
> Sounds like a good idea.
I’ll refactor my local branch and include this patchset and send it in with RFC,
will probably not work on this for another couple days though.
>> P.S. I’ve been working on a vfio-user based epc for development purposes
>> personally, and the last hurdle before I want to send it in for comments is
>> support for doorbells, and came across this patchset checking if there’s
>> any other activity in the space. Having an implementation-agnostic
>> doorbell API in the EPF/EPC core would be very helpful to me.
>
> I have looked at adding doorbell support to nvmet-pci-epf, but got stuck on
> pci_epf_alloc_doorbell() returning an address that is not 4k aligned.
>
> (Since the NVMe PCIe transport specification has the doorbells at a fixed
> location, we can't change that.)
>
> But if we could provide an "alignment" parameter to pci_epf_alloc_doorbell(),
> then I think it is possible.
>
> Sure, the GIC ITS MMIO area might be quite small, so it might not be able
> satisfy such a request. E.g. on rk3588, the its1 MMIO region is 0x20000 (128k):
> https://github.com/torvalds/linux/blob/v7.1-rc1/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi#L2414
Hrm, I think I’m misunderstanding the eDMA mechanism that is proposed in this
patch. Is the fixed eDMA register block (e.g. BAR4 for the RK3588) translated to
a space in the GIC ITS MMIO area - or is the restriction specifically on adding alignment
to the platform MSI doorbell implementation?
> However, I have not idea of how much of this region the GIC driver uses for
> actual registers, and how much of that region it can actually dedicate to
> doorbells.
>
>
> Kind regards,
> Niklas
* Re: [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends
2026-04-29 11:11 ` Max Boone
@ 2026-04-29 14:52 ` Niklas Cassel
0 siblings, 0 replies; 6+ messages in thread
From: Niklas Cassel @ 2026-04-29 14:52 UTC (permalink / raw)
To: Max Boone
Cc: den, mani, frank.li, allenbh, bhanuseshukumar, bhelgaas,
dave.jiang, jdmason, jingoohan1, kishon, kwilczynski,
linux-kernel, linux-pci, lpieralisi, marco.crivellari,
mmaddireddy, ntb, robh, shinichiro.kawasaki
On Wed, Apr 29, 2026 at 01:11:12PM +0200, Max Boone wrote:
> Good point - looking through the device trees I only saw the msi-map / platform
> msi set for the imx95 and assumed designware was the only EPC supporting
> this (also because the code uses that of-node specifically), but I indeed don’t see
> a reason that other chips can’t use this.
>
> I’m a bit confused on the configuration, fwiw it’s probably me being unfamiliar
> with PCIe, but it doesn’t seem right to configure the MSI and eDMA DBs through
> a kconfig option rather than inferring it from the device tree and/or having the EP
> driver enable the capability and expose an operation to realize it.
I don't see anything that is configured using Kconfig options.
But in order to enable PCIe EP doorbell support in the vmlinux binary
you need to build with CONFIG_PCI_ENDPOINT_MSI_DOORBELL=y.
To make use of the GIC-ITS MSI support, you need msi-map defined in the
device tree. I did send such a patch for rk3588:
https://lore.kernel.org/linux-rockchip/20250908162400.535441-2-cassel@kernel.org/
But, AFAICT, considering that the rk3588 does not have any way to map RID to
a predictable SID (which is possible on e.g. imx95), I don't think it is wise
to add msi-map to rk3588.
It is similar to the problem why we can't run with the IOMMU enabled when
running the PCIe controller in endpoint mode.
For more info see:
https://lore.kernel.org/all/20250207143900.2047949-2-cassel@kernel.org/
Basically, the problem is that the host assigns a BDF to each Root Complex
on the host side, and then assigns a BDF to each Endpoint it finds connected
to that Root Complex.
Thus the PCI Endpoint controller will have no idea which BDF the Root Complex
will have.
The Requester ID will be that matching the BDF of the Root Complex.
But the problem is that the Endpoint side cannot insert a mapping for this
Requester ID, because it does not know which Requester ID the RC will have.
I guess it could theoretically insert all possible Requester IDs in the IOMMU,
but that is not going to fly according to Robin (ARM SMMU maintainer):
"Yeah, that one pretty much settles it - we can certainly expect host
root ports with nonzero device numbers, so that's at least 13 bits of
the StreamID space to cover, which isn't going to fly."
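To put a number on the quoted "13 bits", here is a small sketch of how a Requester ID is packed from bus, device and function. It assumes the 13 bits refer to the 8-bit bus number plus the 5-bit device number of the host root port; that reading is an interpretation of the quote, and the helper names are illustrative.

```c
#include <stdint.h>

/* Requester ID layout: bus[15:8], device[7:3], function[2:0]. */
static uint16_t pci_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Number of distinct IDs to cover when unknown_bits bits of the ID are unknown. */
static uint32_t rid_candidates(unsigned int unknown_bits)
{
	return 1u << unknown_bits;
}
```

Under that assumption, covering every possible bus and device number means 8192 mappings, before the function number is even considered.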
Note that there are some PCI endpoint controllers that can run with the IOMMU
enabled, by using a look up table and sideband signals, see e.g. imx6:
commit ce4c4301728541db7e5f571a5688a3a236d9e488
Author: Frank Li <Frank.Li@nxp.com>
Date: Tue Jan 14 15:37:09 2025 -0500
PCI: imx6: Add IOMMU and ITS MSI support for i.MX95
I'm not sure if it is possible to configure a LUT on the RK3588 as well,
in order to keep the IOMMU enabled also in Endpoint mode. If it is, then
it wasn't clear from the RK3588 TRM.
(Having the IOMMU enabled when running in Root Complex mode is no problem,
as the Linux driver core automatically will add/insert a (single) Stream ID
matching the BDF it assigns to each Endpoint.)
> Check, thanks for the write-up, this is also what I’m looking to get working,
> coindicentally on the RK3588. I had imagined that it would be possible to build
> a sufficient API by passing in a base offset and stride for the doorbell allocation,
> but an alignment param sounds better. Can we program the resulting doorbells
> at an arbitrary offset in a BAR, or would we waste the first allocated
> doorbell that’s going to be located at 0x0000 - 0x1000?
I'm not sure if I follow.
If you use subrange inbound mapping, you split the BAR (BAR0) into two.
The first range, 0x0-0xfff would use one inbound iATU and would have inbound
address translation that points to allocated memory by nvmet-pci-epf.
This is the regular nvme registers in range 0x0-0xfff.
The second range 0x1000-XXXX (depends on how many I/O queues the nvmet-pci-epf
allocates) would point to a physical address that is used for doorbells
(the address returned by pci_epf_alloc_doorbell()). This would use another
inbound iATU.
Since there are no NVMe registers after the doorbells, we don't need a third
inbound iATU.
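The two-subrange split of BAR0 can be sketched as below, assuming a 4k iATU granule and packed 4-byte doorbells. The struct and helper names are hypothetical and are not the kernel's submap types.

```c
#include <stdint.h>

#define NVME_DB_BASE		0x1000u	/* doorbells start here */
#define ATU_MIN_REGION_SIZE	0x1000u	/* assumed CX_ATU_MIN_REGION_SIZE */

struct bar_subrange {
	uint32_t offset;	/* offset within BAR0 */
	uint32_t size;		/* bytes, a multiple of ATU_MIN_REGION_SIZE */
};

/* Round x up to the next multiple of the power-of-two iATU granule. */
static uint32_t atu_align_up(uint32_t x)
{
	return (x + ATU_MIN_REGION_SIZE - 1) & ~(ATU_MIN_REGION_SIZE - 1);
}

/* Fill in the two subranges: out[0] is the registers, out[1] the doorbells. */
static void nvme_bar0_layout(uint32_t nr_io_queues, struct bar_subrange out[2])
{
	/* Two 4-byte doorbells per queue, admin queue included. */
	uint32_t db_bytes = (nr_io_queues + 1) * 2 * 4;

	out[0].offset = 0;
	out[0].size = NVME_DB_BASE;
	out[1].offset = NVME_DB_BASE;
	out[1].size = atu_align_up(db_bytes);
}
```

Even a handful of queues rounds the doorbell window up to one full 4k granule, so the second inbound iATU always covers at least 0x1000-0x1fff.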
I think we can use stride == 0.
Another possible way would be to use stride == CX_ATU_MIN_REGION_SIZE, and then
use one iATU per doorbell, but considering that most DWC EPCs have a very
limited amount of inbound iATUs (rk3588 has 16 inbound iATUs, but some SoCs have
much fewer), I'm not sure if this approach is the best idea. One iATU per I/O
submission queue, and one iATU per I/O completion queue, then one iATU for Admin
submission queue, and one iATU for admin completion queue... I'm not sure if
this is a good approach. Stride == 0 and one iATU seems better.
(I don't really see any advantage of using one iATU per doorbell. We will still
have the problem that each address returned by pci_epf_alloc_doorbell() needs to
be aligned to CX_ATU_MIN_REGION_SIZE anyway.)
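The inbound iATU cost of the two approaches can be compared with a few lines of arithmetic. The sketch assumes one iATU for the NVMe registers and that every remaining inbound iATU is free for doorbells, which a real function sharing iATUs with other BARs would not get; the helper names are illustrative.

```c
#include <stdint.h>

/* One iATU for the registers plus one window covering all doorbells. */
static uint32_t iatus_single_window(void)
{
	return 1 + 1;
}

/* One iATU for the registers plus one iATU per doorbell. */
static uint32_t iatus_per_doorbell(uint32_t nr_io_queues)
{
	return 1 + (nr_io_queues + 1) * 2;
}

/*
 * Largest number of I/O queues that fits in avail inbound iATUs when each
 * doorbell has its own iATU. Returns 0 if at most the admin queue fits.
 */
static uint32_t max_io_queues_per_doorbell(uint32_t avail)
{
	uint32_t pairs;

	if (avail < 3)
		return 0;
	pairs = (avail - 1) / 2;	/* queue pairs, admin queue included */
	return pairs - 1;
}
```

With the 16 inbound iATUs mentioned for rk3588, the per-doorbell scheme tops out at 6 I/O queues, while the single-window scheme needs 2 iATUs regardless of the queue count.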
> In any case, I think it would be preferable for users of the alloc_doorbell function
> to pass in what kind of doorbell they want instead of using a fallback mechanism.
> It seems to me that the alignment and possibly a larger amount of doorbells are
> possible with the eDMA doorbell mechanism. Or am I misunderstanding eDMA
> here and is that bounded by mapping / size / alignment of the GIC ITS?
The GIC-ITS MSI way will return a physical address by pci_epf_alloc_doorbell().
(This option does not really seem feasible on rk3588.)
The alternative is to use the DWC eDMA hardware itself to emulate doorbells.
For the DWC eDMA option, there are two ways:
a) The PCIe EPC controller was synthesized to expose the eDMA registers in a
BAR at a fixed offset.
b) The PCIe EPC controller was not synthesized to expose the eDMA registers in
a BAR at a fixed offset.
For case a), we will get a physical address that is within the DWC eDMA MMIO
space. Here we will need to call pci_epf_alloc_doorbell() and set up an iATU
for inbound translation to the DWC eDMA MMIO address.
For case b), at least when I was testing, setting up an inbound iATU that
translates a region in e.g. BAR0 to the DWC eDMA MMIO addresses did NOT work.
Feel free to try this yourself. I don't fully understand why this does not work.
My theory is that when the DWC EPC was configured with the eDMA registers
exposed in a fixed location, e.g. in BAR4 on rk3588, the hardware has some
internal fixed translation for BAR4 to the eDMA MMIO addresses, and because of
that, setting up an inbound iATU which also translates inbound PCI TLPs, from
another BAR, e.g. BAR0 to the same DWC eDMA MMIO addresses, does not work.
Again, please feel free to try yourself, perhaps I missed something.
Thus, for pci-epf-test, we simply fill in DB_BAR and DB_OFFSET with the
BAR + offset in that BAR where the eDMA regs are exposed, rather than using
an inbound iATU to translate the inbound PCI TLPs to the DWC eDMA MMIO
addresses.
Regardless, for the DWC eDMA case, I'm not sure if it is possible to support an
"align" parameter to pci_epf_alloc_doorbell(), because I think it will always
return a single address (a specific address inside the DWC eDMA that can be used
to emulate doorbells). For GIC-ITS, I think pci_epf_alloc_doorbell() might
return a different address each time it is called.
Koichiro, please correct me if I am wrong.
So, my suggestion to add an "align" parameter to pci_epf_alloc_doorbell() will
probably only work for the GIC-ITS case, unfortunately.
> Hrm, I think I’m misunderstanding the eDMA mechanism that is proposed in this
> patch. Is the fixed eDMA register block (e.g. BAR4 for the RK3588) translated to
> a space in the GIC ITS MMIO area - or is restriction specifically on adding alignment
> to the platform MSI doorbell implementation?
The iATU alignment requirement, that the base and target address must be aligned
to CX_ATU_MIN_REGION_SIZE, is always there when using an iATU.
So for "GIC-ITS + iATU" or "DWC eDMA + iATU".
The difference is with e.g. rk3588, pci-epf-test does not use an inbound iATU
mapping to read/access the DWC eDMA regs (using the DWC eDMA MMIO address).
We simply fill in the DB_BAR and DB_OFFSET to point to the BAR which has the eDMA
registers exposed by default. So we read the eDMA regs from the "fixed resource
BAR" (BAR4), rather than setting up an iATU mapping in e.g. BAR0, which translates
to the eDMA MMIO address.
(And because NVMe has the doorbells in a fixed location, as long as we can't set
up an inbound iATU that points to the eDMA MMIO regs, I don't see how we will
get nvmet-pci-epf to work with doorbells on rk3588).
Kind regards,
Niklas
* [PATCH v14 0/7] PCI: endpoint: pci-ep-msi: Add embedded doorbell fallback
@ 2026-04-14 14:15 Koichiro Den
2026-04-14 14:15 ` [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends Koichiro Den
0 siblings, 1 reply; 6+ messages in thread
From: Koichiro Den @ 2026-04-14 14:15 UTC (permalink / raw)
To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
Kishon Vijay Abraham I, Jon Mason, Dave Jiang, Allen Hubbe,
Niklas Cassel, Frank Li, Bhanu Seshu Kumar Valluri,
Marco Crivellari, Shin'ichiro Kawasaki, Manikanta Maddireddy
Cc: linux-pci, linux-kernel, ntb
Hi,
Some endpoint platforms cannot use a GIC ITS-backed MSI domain for
EP-side doorbells. In those cases, endpoint function (EPF) drivers
cannot provide a doorbell to the root complex (RC), and features such as
vNTB may fall back to polling with significantly higher latency.
This series adds an alternate doorbell backend based on the PCIe
endpoint controller (EPC)'s integrated eDMA interrupt-emulation feature.
The RC rings the doorbell by doing a single 32-bit MMIO write to an eDMA
doorbell location exposed in a BAR window. The EP side receives a Linux
IRQ that EPF drivers can use as a doorbell interrupt, without relying on
MSI message writes reaching the ITS.
The DesignWare eDMA interrupt-emulation doorbell is wired up as one user
of the generic EPC aux-resource API. Other vendors can support their
MMIO-based doorbells by implementing the EPC aux-resource callbacks:
pci_epc_ops.get_aux_resources_count() / pci_epc_ops.get_aux_resources().
Dependencies
============
(1). [PATCH 0/2] dmaengine: dw-edma: Interrupt-emulation doorbell support
https://lore.kernel.org/dmaengine/20260215152216.3393561-1-den@valinux.co.jp/
Note: already landed in dmaengine/next.
Tested on
=========
v14 re-tested on:
(1). R-Car S4 Spider: EP <-> RC
(2). RK3588 Rock 5B (EP) <-> CIX CD8180 Orion O6 (RC)
The EP in both scenarios prints the following in dmesg when running
DOORBELL_TEST:
pci_epf_test pci_epf_test.0: Can't find MSI domain for EPC
pci_epf_test pci_epf_test.0: Using embedded (DMA) doorbell fallback
With this series applied, the DOORBELL_TEST succeeds:
$ ./pci_endpoint_test -t DOORBELL_TEST
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN pcie_ep_doorbell.DOORBELL_TEST ...
# OK pcie_ep_doorbell.DOORBELL_TEST
ok 1 pcie_ep_doorbell.DOORBELL_TEST
# PASSED: 1 / 1 tests passed.
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
IOMMU coverage tested:
On R-Car S4 Spider EP, DOORBELL_TEST passes with the EP IOMMU both
enabled and disabled.
On Rock 5B EP, DOORBELL_TEST passes with the EP IOMMU disabled. The
enabled case is not applicable, as the EP IOMMU is explicitly disabled
upstream on this platform.
Performance test: vNTB ping latency
===================================
(*) Re-tested with v14 only to confirm that no regression was introduced
in v13->v14. No performance difference is expected.
Prerequisite:
To test on 2x R-Car S4 Spider boards, the following two commits are
needed:
- 13f55a7ca773 ("PCI: dwc: rcar-gen4: Change EPC BAR alignment to 4K as
per the documentation")
- f761e0deb4d9 ("PCI: dwc: rcar-gen4: Mark BAR0 and BAR2 as Resizable
BARs in endpoint mode")
Note: these already landed in pci/controller/dwc-rcar-gen4-ep.
Setup:
- configfs (R-Car S4 Spider in EP mode):
cd /sys/kernel/config/pci_ep/
mkdir functions/pci_epf_vntb/func1
echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
echo 1 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
echo 0x100000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
echo 0x1912 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
echo 0x0030 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
echo 0 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
echo 1 > controllers/e65d0000.pcie-ep/start
- ensure ntb_transport/ntb_netdev are loaded on both sides
Results:
- Without this series (pci/endpoint)
$ ping -c 10 10.0.0.11
PING 10.0.0.11 (10.0.0.11) 56(84) bytes of data.
64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.1 ms
64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.17 ms
64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=12.2 ms
64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=6.10 ms
64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=12.1 ms
64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=9.96 ms
64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=4.04 ms
64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=10.2 ms
64 bytes from 10.0.0.11: icmp_seq=9 ttl=64 time=4.13 ms
64 bytes from 10.0.0.11: icmp_seq=10 ttl=64 time=10.0 ms
- With this series (on top of pci/endpoint + Dependency (1))
$ ping -c 10 10.0.0.11
PING 10.0.0.11 (10.0.0.11) 56(84) bytes of data.
64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=0.828 ms
64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=0.822 ms
64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.848 ms
64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.869 ms
64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=0.775 ms
64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.889 ms
64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.893 ms
64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.964 ms
64 bytes from 10.0.0.11: icmp_seq=9 ttl=64 time=0.890 ms
64 bytes from 10.0.0.11: icmp_seq=10 ttl=64 time=1.08 ms
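
To summarize the two runs above, the arithmetic mean of the ten samples can be computed with a small helper. This is a sketch for illustration only; mean_ms() and the array names are hypothetical, and the values are copied from the ping output above.

```c
#include <assert.h>
#include <stddef.h>

/* Round-trip times in ms, copied from the ping output above. */
static const double rtt_without_ms[] = {
	12.1, 6.17, 12.2, 6.10, 12.1, 9.96, 4.04, 10.2, 4.13, 10.0,
};
static const double rtt_with_ms[] = {
	0.828, 0.822, 0.848, 0.869, 0.775, 0.889, 0.893, 0.964, 0.890, 1.08,
};

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean_ms(const double *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(samples && n > 0);
	for (i = 0; i < n; i++)
		sum += samples[i];

	return sum / (double)n;
}
```

The samples sum to 87.00 ms and 8.858 ms respectively, so the means are about 8.70 ms without the series and about 0.886 ms with it, roughly a 10x reduction in this ten-sample run.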
---
Changelog
---------
* v13->v14 changes:
- Rebased onto the current pci/endpoint:
1d3225cb5d82 ("selftests: pci_endpoint: Skip BAR subrange test on -ENOSPC")
- Dropped -EOPNOTSUPP to -ENODEV conversions.
- Fixed whitespace and alignment issues reported by checkpatch.pl.
* v12->v13 changes:
- Rebased onto the current pci/endpoint:
5ab7a225888b ("misc: pci_endpoint_test: Add Tegra194 and Tegra234 device table entries")
- Renamed pci_epc_count_aux_resources() to
pci_epc_get_aux_resources_count() and have it return the count directly.
* v11->v12 changes:
- Rebased onto the current pci/endpoint:
185596ad93f5 ("PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()")
- Split the EPC auxiliary-resource API into
pci_epc_count_aux_resources() and pci_epc_get_aux_resources(), both
returning 0 on success.
- Added the corresponding count/get callbacks in pci_epc_ops and in the
DesignWare EPC provider.
- Updated the embedded doorbell fallback to use the new count/get
helpers.
- Stopped warning on duplicate DOORBELL_MMIO resources for now; use the
first matching resource and add a TODO for future multi-resource
support.
* v10->v11 changes:
- Rebased onto the current pci/endpoint:
e022f0c72c7f ("selftests: pci_endpoint: Skip reserved BARs")
- Dropped PCI_EPC_AUX_DMA_CTRL_MMIO and PCI_EPC_AUX_DMA_CHAN_DESC from
this series; they will be added later together with their first real
consumer.
- Picked up tags from Frank and Niklas for patches 5/6/7.
- Dropped tags for patches 1 and 3 due to code changes.
- Revised the commit message of Patch 2 to better explain the purpose.
* v9->v10 changes:
- Patch 7/7: report the dma_map_resource() DMA address instead of the
raw physical address, so EPF drivers do not need to perform any
additional IOMMU mapping and the semantics match the MSI doorbell
case.
- Rebased onto the latest pci/endpoint, and updated dependency references.
- Re-ran functional tests and vNTB ping-latency measurements, and added
Rock 5B (EP) <-> Orion O6 (RC) to the test matrix.
* v8->v9 changes:
- Add a new dependency series (3), which moved the BAR reserved-subregion
framework + the RK3588 BAR4 example out of v8 (dropping the corresponding
patches from this series).
- pci-epf-vntb: rename the duplicate-IRQ helper and invert the return value,
per Frank's review.
- pci-epf-test: drop the extra size_add() doorbell-offset check, per Niklas'
review.
- pci-ep-msi: add a DWORD alignment check for DOORBELL_MMIO, per Niklas'
review.
- Carry over Reviewed-by tags for unchanged patches + drop Reviewed-by tags
where code changed.
- Rename the last patch subject (drop 'eDMA' word).
* v7->v8 changes:
- Deduplicate request_irq()/free_irq() calls based on virq (shared
IRQ) rather than doorbell type, as suggested during review of v7
Patch #7.
- Clean up the pci_epf_alloc_doorbell() error path, as suggested
during review of v7 Patch #9.
- Use range_end_overflows_t() instead of an open-coded overflow check,
following discussion during review of v7 Patch #5.
- Add a write-data field to the DOORBELL_MMIO aux-resource metadata
and plumb it through to the embedded doorbell backend (DesignWare
uses data=0).
* v6->v7 changes:
- Split out preparatory patches to keep the series below 10 patches.
- Add support for platforms where the eDMA register block is fixed
within a reserved BAR window (e.g. RK3588 BAR4) and must be reused
as-is.
- Introduce a dedicated virtual IRQ and irq_chip (using
handle_level_irq) for interrupt-emulation doorbells instead of
reusing per-channel IRQs. This avoids delivery via different IRQs on
platforms with chip->nr_irqs > 1.
* v5->v6 changes:
- Fix a double-free in v5 Patch 8/8 caused by mixing __free(kfree) with
an explicit kfree(). This is a functional bug (detectable by KASAN),
hence the respin solely for this fix. Sorry for the noise. No other
changes.
* v4->v5 changes:
- Change the series subject now that the series has evolved into a
consumer-driven set focused on the embedded doorbell fallback and its
in-tree users (epf-test and epf-vntb).
- Drop [PATCH v4 01/09] (dw-edma per-channel interrupt routing control)
from this series for now, so the series focuses on what's needed by the
current consumer (i.e. the doorbell fallback implementation).
- Replace the v4 embedded-doorbell "test variant + host/kselftest
plumbing" with a generic embedded-doorbell fallback in
pci_epf_alloc_doorbell(), including exposing required IRQ request flags
to EPF drivers.
- Two preparatory fix patches (Patch 6/8 and 7/8) to clean up error
handling and state management ahead of Patch 8/8.
- Rename *_get_remote_resource() to *_get_aux_resources() and adjust
relevant variable namings and kernel docs. Discussion may continue.
- Rework dw-edma per-channel metadata exposure to cache the needed info
in dw_edma_chip (IRQ number + emulation doorbell offset) and consume it
from the DesignWare EPC auxiliary resource provider without calling back
to dw-edma.
* v3->v4 changes:
- Drop dma_slave_caps.hw_id and the dmaengine selfirq callback
registration API. Instead, add a dw-edma specific dw_edma_chan_info()
helper and extend the EPC remote resource metadata accordingly.
- Add explicit acking for eDMA interrupt emulation and adjust the
dw-edma IRQ path for embedded-doorbell usage.
- Replace the previous EPC API smoke test with an embedded doorbell
test variant (pci-epf-test + pci_endpoint_test/selftests).
- Rebase onto pci.git controller/dwc commit 43d324eeb08c.
* v2->v3 changes:
- Replace DWC-specific helpers with a generic EPC remote resource query API.
- Add pci-epf-test smoke test and host/kselftest support for the new API.
- Drop the dw-edma-specific notify-only channel and polling approach
([PATCH v2 4/7] and [PATCH v2 5/7]), and rework notification handling
around a generic dmaengine_(un)register_selfirq() API implemented
by dw-edma.
* v1->v2 changes:
- Combine the two previously posted series into a single set (per Frank's
suggestion). Order dmaengine/dw-edma patches first so hw_id support
lands before the PCI LL-region helper, which assumes
dma_slave_caps.hw_id availability.
v13: https://lore.kernel.org/linux-pci/20260406155717.880246-1-den@valinux.co.jp/
v12: https://lore.kernel.org/linux-pci/20260327035422.4020455-1-den@valinux.co.jp/
v11: https://lore.kernel.org/linux-pci/20260324083728.3744734-1-den@valinux.co.jp/
v10: https://lore.kernel.org/linux-pci/20260302071427.534158-1-den@valinux.co.jp/
v9: https://lore.kernel.org/linux-pci/20260219081318.4156901-1-den@valinux.co.jp/
v8: https://lore.kernel.org/linux-pci/20260217080601.3808847-1-den@valinux.co.jp/
v7: https://lore.kernel.org/linux-pci/20260215163847.3522572-1-den@valinux.co.jp/
v6: https://lore.kernel.org/all/20260209125316.2132589-1-den@valinux.co.jp/
v5: https://lore.kernel.org/all/20260209062952.2049053-1-den@valinux.co.jp/
v4: https://lore.kernel.org/all/20260206172646.1556847-1-den@valinux.co.jp/
v3: https://lore.kernel.org/all/20260204145440.950609-1-den@valinux.co.jp/
v2: https://lore.kernel.org/all/20260127033420.3460579-1-den@valinux.co.jp/
v1: https://lore.kernel.org/dmaengine/20260126073652.3293564-1-den@valinux.co.jp/
+
https://lore.kernel.org/linux-pci/20260126071550.3233631-1-den@valinux.co.jp/
Best regards,
Koichiro
Koichiro Den (7):
PCI: endpoint: Add auxiliary resource query API
PCI: dwc: Record integrated eDMA register window
PCI: dwc: ep: Expose integrated eDMA resources via EPC aux-resource
API
PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new
backends
PCI: endpoint: pci-epf-vntb: Reuse pre-exposed doorbells and IRQ flags
PCI: endpoint: pci-epf-test: Reuse pre-exposed doorbell targets
PCI: endpoint: pci-ep-msi: Add embedded doorbell fallback
.../pci/controller/dwc/pcie-designware-ep.c | 119 ++++++++++++
drivers/pci/controller/dwc/pcie-designware.c | 4 +
drivers/pci/controller/dwc/pcie-designware.h | 2 +
drivers/pci/endpoint/functions/pci-epf-test.c | 84 +++++---
drivers/pci/endpoint/functions/pci-epf-vntb.c | 61 +++++-
drivers/pci/endpoint/pci-ep-msi.c | 179 ++++++++++++++++--
drivers/pci/endpoint/pci-epc-core.c | 80 ++++++++
include/linux/pci-epc.h | 54 ++++++
include/linux/pci-epf.h | 31 ++-
9 files changed, 567 insertions(+), 47 deletions(-)
--
2.51.0
* [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends
2026-04-14 14:15 [PATCH v14 0/7] PCI: endpoint: pci-ep-msi: Add embedded doorbell fallback Koichiro Den
@ 2026-04-14 14:15 ` Koichiro Den
2026-04-29 8:58 ` Max Boone
0 siblings, 1 reply; 6+ messages in thread
From: Koichiro Den @ 2026-04-14 14:15 UTC (permalink / raw)
To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
Kishon Vijay Abraham I, Jon Mason, Dave Jiang, Allen Hubbe,
Niklas Cassel, Frank Li, Bhanu Seshu Kumar Valluri,
Marco Crivellari, Shin'ichiro Kawasaki, Manikanta Maddireddy
Cc: linux-pci, linux-kernel, ntb
Prepare pci-ep-msi for non-MSI doorbell backends.
Factor MSI doorbell allocation into a helper and extend struct
pci_epf_doorbell_msg with:
- irq_flags: required IRQ request flags (e.g. IRQF_SHARED for some
backends)
- type: doorbell backend type
- bar/offset: pre-exposed doorbell target location, if any
Initialize these fields for the existing MSI-backed doorbell
implementation.
Also add PCI_EPF_DOORBELL_EMBEDDED type, which is to be implemented in a
follow-up patch.
No functional changes.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Tested-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/pci/endpoint/pci-ep-msi.c | 54 ++++++++++++++++++++++---------
include/linux/pci-epf.h | 23 +++++++++++--
2 files changed, 60 insertions(+), 17 deletions(-)
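
For illustration only (not part of the patch): a minimal userspace sketch of how an EPF driver might consume the new fields, assuming the semantics stated in the kernel-doc below. Required irq_flags are ORed with the caller's own flags, and a pre-exposed bar/offset is reused instead of mapping the doorbell address into a free BAR. All names (db_msg, db_plan, db_plan_for(), the FLAG_* values) are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel names used by the patch; values are illustrative. */
enum db_type { DB_MSI = 0, DB_EMBEDDED };
#define DB_NO_BAR	(-1)
#define FLAG_SHARED	0x1UL	/* stand-in for IRQF_SHARED */
#define FLAG_ONESHOT	0x2UL	/* stand-in for IRQF_ONESHOT */

/* Reduced mirror of the fields added to struct pci_epf_doorbell_msg. */
struct db_msg {
	int virq;
	unsigned long irq_flags;
	enum db_type type;
	int bar;
	uint64_t offset;
};

/* What the consumer decides to do with one doorbell. */
struct db_plan {
	int bar;			/* BAR the RC will write to */
	uint64_t offset;		/* offset within that BAR */
	int needs_mapping;		/* 1 if the driver must map the target itself */
	unsigned long irq_flags;	/* flags to pass when requesting the IRQ */
};

static struct db_plan db_plan_for(const struct db_msg *msg, int free_bar,
				  unsigned long extra_flags)
{
	struct db_plan plan;

	assert(msg);

	/* Required flags come from the backend; callers may OR in more. */
	plan.irq_flags = msg->irq_flags | extra_flags;

	if (msg->bar != DB_NO_BAR) {
		/* Target already exposed to the RC: reuse it as-is. */
		plan.bar = msg->bar;
		plan.offset = msg->offset;
		plan.needs_mapping = 0;
	} else {
		/* No pre-exposed target: map the address into a free BAR. */
		plan.bar = free_bar;
		plan.offset = 0;
		plan.needs_mapping = 1;
	}

	return plan;
}
```

With the MSI backend initialized as in this patch (bar = NO_BAR, irq_flags = 0), such a consumer would take the second branch, which matches the existing behaviour.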
diff --git a/drivers/pci/endpoint/pci-ep-msi.c b/drivers/pci/endpoint/pci-ep-msi.c
index 1395919571f8..85fe46103220 100644
--- a/drivers/pci/endpoint/pci-ep-msi.c
+++ b/drivers/pci/endpoint/pci-ep-msi.c
@@ -8,6 +8,7 @@
#include <linux/device.h>
#include <linux/export.h>
+#include <linux/interrupt.h>
#include <linux/irqdomain.h>
#include <linux/module.h>
#include <linux/msi.h>
@@ -35,23 +36,13 @@ static void pci_epf_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
pci_epc_put(epc);
}
-int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
+static int pci_epf_alloc_doorbell_msi(struct pci_epf *epf, u16 num_db)
{
- struct pci_epc *epc = epf->epc;
+ struct pci_epf_doorbell_msg *msg;
struct device *dev = &epf->dev;
+ struct pci_epc *epc = epf->epc;
struct irq_domain *domain;
- void *msg;
- int ret;
- int i;
-
- /* TODO: Multi-EPF support */
- if (list_first_entry_or_null(&epc->pci_epf, struct pci_epf, list) != epf) {
- dev_err(dev, "MSI doorbell doesn't support multiple EPF\n");
- return -EINVAL;
- }
-
- if (epf->db_msg)
- return -EBUSY;
+ int ret, i;
domain = of_msi_map_get_device_domain(epc->dev.parent, 0,
DOMAIN_BUS_PLATFORM_MSI);
@@ -74,6 +65,12 @@ int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
if (!msg)
return -ENOMEM;
+ for (i = 0; i < num_db; i++)
+ msg[i] = (struct pci_epf_doorbell_msg) {
+ .type = PCI_EPF_DOORBELL_MSI,
+ .bar = NO_BAR,
+ };
+
epf->num_db = num_db;
epf->db_msg = msg;
@@ -90,13 +87,40 @@ int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
for (i = 0; i < num_db; i++)
epf->db_msg[i].virq = msi_get_virq(epc->dev.parent, i);
+ return 0;
+}
+
+int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
+{
+ struct pci_epc *epc = epf->epc;
+ struct device *dev = &epf->dev;
+ int ret;
+
+ /* TODO: Multi-EPF support */
+ if (list_first_entry_or_null(&epc->pci_epf, struct pci_epf, list) != epf) {
+ dev_err(dev, "Doorbell doesn't support multiple EPF\n");
+ return -EINVAL;
+ }
+
+ if (epf->db_msg)
+ return -EBUSY;
+
+ ret = pci_epf_alloc_doorbell_msi(epf, num_db);
+ if (!ret)
+ return 0;
+
+ dev_err(dev, "Failed to allocate doorbell: %d\n", ret);
return ret;
}
EXPORT_SYMBOL_GPL(pci_epf_alloc_doorbell);
void pci_epf_free_doorbell(struct pci_epf *epf)
{
- platform_device_msi_free_irqs_all(epf->epc->dev.parent);
+ if (!epf->db_msg)
+ return;
+
+ if (epf->db_msg[0].type == PCI_EPF_DOORBELL_MSI)
+ platform_device_msi_free_irqs_all(epf->epc->dev.parent);
kfree(epf->db_msg);
epf->db_msg = NULL;
diff --git a/include/linux/pci-epf.h b/include/linux/pci-epf.h
index 7737a7c03260..cd747447a1ea 100644
--- a/include/linux/pci-epf.h
+++ b/include/linux/pci-epf.h
@@ -152,14 +152,33 @@ struct pci_epf_bar {
struct pci_epf_bar_submap *submap;
};
+enum pci_epf_doorbell_type {
+ PCI_EPF_DOORBELL_MSI = 0,
+ PCI_EPF_DOORBELL_EMBEDDED,
+};
+
/**
* struct pci_epf_doorbell_msg - represents doorbell message
- * @msg: MSI message
- * @virq: IRQ number of this doorbell MSI message
+ * @msg: Doorbell address/data pair to be mapped into BAR space.
+ * For MSI-backed doorbells this is the MSI message, while for
+ * "embedded" doorbells this represents an MMIO write that asserts
+ * an interrupt on the EP side.
+ * @virq: IRQ number of this doorbell message
+ * @irq_flags: Required flags for request_irq()/request_threaded_irq().
+ * Callers may OR-in additional flags (e.g. IRQF_ONESHOT).
+ * @type: Doorbell type.
+ * @bar: BAR number where the doorbell target is already exposed to the RC
+ * (NO_BAR if not)
+ * @offset: offset within @bar for the doorbell target (valid iff
+ * @bar != NO_BAR)
*/
struct pci_epf_doorbell_msg {
struct msi_msg msg;
int virq;
+ unsigned long irq_flags;
+ enum pci_epf_doorbell_type type;
+ enum pci_barno bar;
+ resource_size_t offset;
};
/**
--
2.51.0
* Re: [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends
2026-04-14 14:15 ` [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends Koichiro Den
@ 2026-04-29 8:58 ` Max Boone
0 siblings, 0 replies; 6+ messages in thread
From: Max Boone @ 2026-04-29 8:58 UTC (permalink / raw)
To: Koichiro Den, mani, Frank Li
Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
Kishon Vijay Abraham I, Jon Mason, Dave Jiang, Allen Hubbe,
Niklas Cassel, Frank Li, Bhanu Seshu Kumar Valluri,
Marco Crivellari, Shin'ichiro Kawasaki, Manikanta Maddireddy,
linux-pci, linux-kernel, ntb, mboone
> On Apr 14, 2026, at 4:15 PM, Koichiro Den <den@valinux.co.jp> wrote:
>
> Prepare pci-ep-msi for non-MSI doorbell backends.
>
> Factor MSI doorbell allocation into a helper and extend struct
> pci_epf_doorbell_msg with:
>
> - irq_flags: required IRQ request flags (e.g. IRQF_SHARED for some
> backends)
> - type: doorbell backend type
> - bar/offset: pre-exposed doorbell target location, if any
>
> Initialize these fields for the existing MSI-backed doorbell
> implementation.
>
> Also add PCI_EPF_DOORBELL_EMBEDDED type, which is to be implemented in a
> follow-up patch.
I’m not very fond of keeping this implementation in the pci-ep-msi file,
as the platform MSI and this implementation are both, iiuc, specific to
the DesignWare EP driver. Even more so because the MSI implementation
is enabled by config rather than through device tree.
Wouldn’t we want end-users to specify what kind of doorbell they want?
It seems that a more specific doorbell BAR layout can be programmed
with eDMA, allowing native support for nvmet’s doorbell BAR, for
example.
Originally, in a patchset by Frank Li, the proposed API was more
generic, and the pci-ep-msi implementation was chosen because there
was only one implementation:
- https://lore.kernel.org/imx/20231019150441.GA7254@thinkpad/
- https://lore.kernel.org/imx/20231019172347.GC7254@thinkpad/
I’d personally prefer to see an abstraction that is woven into pci-epc-core
and pci-epf-core and that drivers can implement as they wish, while
still keeping the enum for different types.
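
As a rough illustration of that idea (hypothetical names throughout, not an existing kernel API), a core helper could walk a list of backend ops and keep the first one that succeeds, recording its type. The two stub allocators exist only to make the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

enum db_backend_type {
	DB_BACKEND_MSI,
	DB_BACKEND_EMBEDDED,
	DB_BACKEND_POLL,
};

/* Hypothetical per-backend ops that an EPC driver could provide. */
struct db_backend_ops {
	enum db_backend_type type;
	int (*alloc)(void *epc_priv, unsigned int num_db);
};

/* Stub backends for demonstration: one always fails, one always works. */
static int db_stub_unavailable(void *epc_priv, unsigned int num_db)
{
	(void)epc_priv;
	(void)num_db;
	return -1;
}

static int db_stub_ok(void *epc_priv, unsigned int num_db)
{
	(void)epc_priv;
	(void)num_db;
	return 0;
}

/*
 * Try each backend in order. On success, store the chosen type and
 * return 0; otherwise return the last backend's error (or -1 if the
 * list is empty).
 */
static int db_alloc_any(const struct db_backend_ops *const *backends,
			size_t n, void *epc_priv, unsigned int num_db,
			enum db_backend_type *chosen)
{
	int ret = -1;
	size_t i;

	assert(backends && chosen);

	for (i = 0; i < n; i++) {
		ret = backends[i]->alloc(epc_priv, num_db);
		if (!ret) {
			*chosen = backends[i]->type;
			return 0;
		}
	}

	return ret;
}
```

The order of the list is where an end-user preference could plug in, for example by putting a requested backend first.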
That also gives room to pull a poll-mode doorbell into pci-epc-core,
which deduplicates that code from the nvmet and vntb EPFs, and allows
other functions to use RC->EP doorbells without needing to bother with
writing the polling mechanism.
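
A minimal userspace sketch of what one pass of such a shared poll-mode doorbell could look like follows. It is purely illustrative and does not reflect how the nvmet or vntb EPFs poll today; db_poll_once(), db_record() and the one-bit-per-doorbell register layout are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*db_handler_t)(unsigned int db, void *ctx);

/* Example handler: records each doorbell number as a bit in *ctx. */
static void db_record(unsigned int db, void *ctx)
{
	*(uint32_t *)ctx |= UINT32_C(1) << db;
}

/*
 * One polling pass over a 32-bit doorbell register in BAR memory,
 * where bit N set means doorbell N was rung by the RC.
 * Returns the number of doorbells dispatched.
 *
 * Note: the read-then-clear below is racy if the RC writes in
 * between; a real implementation needs an agreed RC/EP protocol.
 */
static unsigned int db_poll_once(volatile uint32_t *db_reg,
				 db_handler_t handler, void *ctx)
{
	uint32_t pending;
	unsigned int db, count = 0;

	assert(db_reg && handler);

	pending = *db_reg;
	if (!pending)
		return 0;

	*db_reg = 0;	/* acknowledge everything seen in this pass */

	for (db = 0; db < 32; db++) {
		if (pending & (UINT32_C(1) << db)) {
			handler(db, ctx);
			count++;
		}
	}

	return count;
}
```

In a kernel implementation a pass like this would typically be driven from periodic deferred work, with the polling interval bounding the doorbell latency.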
P.S. I’ve been working on a vfio-user based EPC for development purposes,
and the last hurdle before I send it in for comments is support for
doorbells. I came across this patchset while checking whether there’s
any other activity in this space. Having an implementation-agnostic
doorbell API in the EPF/EPC core would be very helpful to me.
>
> No functional changes.
>
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> Tested-by: Niklas Cassel <cassel@kernel.org>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> drivers/pci/endpoint/pci-ep-msi.c | 54 ++++++++++++++++++++++---------
> include/linux/pci-epf.h | 23 +++++++++++--
> 2 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/pci/endpoint/pci-ep-msi.c b/drivers/pci/endpoint/pci-ep-msi.c
> index 1395919571f8..85fe46103220 100644
> --- a/drivers/pci/endpoint/pci-ep-msi.c
> +++ b/drivers/pci/endpoint/pci-ep-msi.c
> @@ -8,6 +8,7 @@
>
> #include <linux/device.h>
> #include <linux/export.h>
> +#include <linux/interrupt.h>
> #include <linux/irqdomain.h>
> #include <linux/module.h>
> #include <linux/msi.h>
> @@ -35,23 +36,13 @@ static void pci_epf_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> pci_epc_put(epc);
> }
>
> -int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
> +static int pci_epf_alloc_doorbell_msi(struct pci_epf *epf, u16 num_db)
> {
> - struct pci_epc *epc = epf->epc;
> + struct pci_epf_doorbell_msg *msg;
> struct device *dev = &epf->dev;
> + struct pci_epc *epc = epf->epc;
> struct irq_domain *domain;
> - void *msg;
> - int ret;
> - int i;
> -
> - /* TODO: Multi-EPF support */
> - if (list_first_entry_or_null(&epc->pci_epf, struct pci_epf, list) != epf) {
> - dev_err(dev, "MSI doorbell doesn't support multiple EPF\n");
> - return -EINVAL;
> - }
> -
> - if (epf->db_msg)
> - return -EBUSY;
> + int ret, i;
>
> domain = of_msi_map_get_device_domain(epc->dev.parent, 0,
> DOMAIN_BUS_PLATFORM_MSI);
> @@ -74,6 +65,12 @@ int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
> if (!msg)
> return -ENOMEM;
>
> + for (i = 0; i < num_db; i++)
> + msg[i] = (struct pci_epf_doorbell_msg) {
> + .type = PCI_EPF_DOORBELL_MSI,
> + .bar = NO_BAR,
> + };
> +
> epf->num_db = num_db;
> epf->db_msg = msg;
>
> @@ -90,13 +87,40 @@ int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
> for (i = 0; i < num_db; i++)
> epf->db_msg[i].virq = msi_get_virq(epc->dev.parent, i);
>
> + return 0;
> +}
> +
> +int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
> +{
> + struct pci_epc *epc = epf->epc;
> + struct device *dev = &epf->dev;
> + int ret;
> +
> + /* TODO: Multi-EPF support */
> + if (list_first_entry_or_null(&epc->pci_epf, struct pci_epf, list) != epf) {
> + dev_err(dev, "Doorbell doesn't support multiple EPF\n");
> + return -EINVAL;
> + }
> +
> + if (epf->db_msg)
> + return -EBUSY;
> +
> + ret = pci_epf_alloc_doorbell_msi(epf, num_db);
> + if (!ret)
> + return 0;
> +
> + dev_err(dev, "Failed to allocate doorbell: %d\n", ret);
> return ret;
> }
> EXPORT_SYMBOL_GPL(pci_epf_alloc_doorbell);
>
> void pci_epf_free_doorbell(struct pci_epf *epf)
> {
> - platform_device_msi_free_irqs_all(epf->epc->dev.parent);
> + if (!epf->db_msg)
> + return;
> +
> + if (epf->db_msg[0].type == PCI_EPF_DOORBELL_MSI)
> + platform_device_msi_free_irqs_all(epf->epc->dev.parent);
>
> kfree(epf->db_msg);
> epf->db_msg = NULL;
> diff --git a/include/linux/pci-epf.h b/include/linux/pci-epf.h
> index 7737a7c03260..cd747447a1ea 100644
> --- a/include/linux/pci-epf.h
> +++ b/include/linux/pci-epf.h
> @@ -152,14 +152,33 @@ struct pci_epf_bar {
> struct pci_epf_bar_submap *submap;
> };
>
> +enum pci_epf_doorbell_type {
> + PCI_EPF_DOORBELL_MSI = 0,
> + PCI_EPF_DOORBELL_EMBEDDED,
> +};
> +
> /**
> * struct pci_epf_doorbell_msg - represents doorbell message
> - * @msg: MSI message
> - * @virq: IRQ number of this doorbell MSI message
> + * @msg: Doorbell address/data pair to be mapped into BAR space.
> + * For MSI-backed doorbells this is the MSI message, while for
> + * "embedded" doorbells this represents an MMIO write that asserts
> + * an interrupt on the EP side.
> + * @virq: IRQ number of this doorbell message
> + * @irq_flags: Required flags for request_irq()/request_threaded_irq().
> + * Callers may OR-in additional flags (e.g. IRQF_ONESHOT).
> + * @type: Doorbell type.
> + * @bar: BAR number where the doorbell target is already exposed to the RC
> + * (NO_BAR if not)
> + * @offset: offset within @bar for the doorbell target (valid iff
> + * @bar != NO_BAR)
> */
> struct pci_epf_doorbell_msg {
> struct msi_msg msg;
> int virq;
> + unsigned long irq_flags;
> + enum pci_epf_doorbell_type type;
> + enum pci_barno bar;
> + resource_size_t offset;
> };
>
> /**
> --
> 2.51.0
>
>
--
Max
P.P.S. Sorry for the duplicate mail; the mailto link from lore didn’t work properly. At least this should put it in-thread.
end of thread, other threads:[~2026-04-29 14:52 UTC | newest]
Thread overview: 6+ messages
2026-04-28 20:36 [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends Max Boone
2026-04-29 9:33 ` Niklas Cassel
2026-04-29 11:11 ` Max Boone
2026-04-29 14:52 ` Niklas Cassel
-- strict thread matches above, loose matches on Subject: below --
2026-04-14 14:15 [PATCH v14 0/7] PCI: endpoint: pci-ep-msi: Add embedded doorbell fallback Koichiro Den
2026-04-14 14:15 ` [PATCH v14 4/7] PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends Koichiro Den
2026-04-29 8:58 ` Max Boone