* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
@ 2017-01-05 10:20 Max Gurtovoy
2017-01-05 11:02 ` Haggai Eran
0 siblings, 1 reply; 8+ messages in thread
From: Max Gurtovoy @ 2017-01-05 10:20 UTC (permalink / raw)
Hi guys,
we have noticed that in drivers/nvme/host/pci.c we have the following lines:
"
dma_addr = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + offset;
cmb = ioremap_wc(dma_addr, size);
if (!cmb)
return NULL;
dev->cmb_dma_addr = dma_addr;
dev->cmb_size = size;
"
in the nvme_map_cmb() function. pci_resource_start() returns a
resource_size_t (phys_addr_t), not a dma_addr_t. I don't have the HW to
check this code and propose a fix, but it seems buggy to me. If we need
a DMA address, we should map it.
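A minimal sketch of what I mean (untested, I don't have the HW; variable
names are hypothetical):

```c
/* Hypothetical sketch, untested: pci_resource_start() yields a CPU
 * physical address (resource_size_t / phys_addr_t), so hold it in a
 * phys_addr_t rather than a dma_addr_t. */
phys_addr_t cmb_phys;

cmb_phys = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + offset;
cmb = ioremap_wc(cmb_phys, size);  /* ioremap_wc() takes a phys_addr_t */
if (!cmb)
        return NULL;
```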
thanks,
Max.
^ permalink raw reply [flat|nested] 8+ messages in thread
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
2017-01-05 10:20 phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr Max Gurtovoy
@ 2017-01-05 11:02 ` Haggai Eran
2017-01-05 18:39 ` Jon Derrick
0 siblings, 1 reply; 8+ messages in thread
From: Haggai Eran @ 2017-01-05 11:02 UTC (permalink / raw)
On 1/5/2017 12:20 PM, Max Gurtovoy wrote:
> dma_addr = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + offset;
> cmb = ioremap_wc(dma_addr, size);
> if (!cmb)
> return NULL;
>
> dev->cmb_dma_addr = dma_addr;
> dev->cmb_size = size;
>
> in the nvme_map_cmb() function. pci_resource_start() returns a resource_size_t (phys_addr_t), not a dma_addr_t. I don't have the HW to check this code and propose a fix, but it seems buggy to me. If we need a DMA address, we should map it.
Perhaps I'm mistaken, but shouldn't the code use pcibios_resource_to_bus()
in this case to convert the resource to bus addresses? I see cmb_dma_addr
is later passed directly to the device as the sq_dma_addr.
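Something along these lines, perhaps (just an untested sketch of the idea;
pci_bus_address() may be a simpler alternative if a single BAR address is
all that's needed):

```c
/* Hypothetical sketch, untested: translate the BAR's CPU physical
 * address into a PCI bus address before it is programmed into the
 * device as the submission queue base. */
struct pci_bus_region region;
int bar = NVME_CMB_BIR(dev->cmbloc);

pcibios_resource_to_bus(pdev->bus, &region, &pdev->resource[bar]);
dev->cmb_dma_addr = region.start + offset; /* bus address, as the device sees it */
```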
Haggai
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
2017-01-05 11:02 ` Haggai Eran
@ 2017-01-05 18:39 ` Jon Derrick
2017-01-08 8:55 ` Haggai Eran
0 siblings, 1 reply; 8+ messages in thread
From: Jon Derrick @ 2017-01-05 18:39 UTC (permalink / raw)
On Thu, Jan 05, 2017 at 01:02:45PM +0200, Haggai Eran wrote:
> On 1/5/2017 12:20 PM, Max Gurtovoy wrote:
> > dma_addr = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + offset;
> > cmb = ioremap_wc(dma_addr, size);
> > if (!cmb)
> > return NULL;
> >
> > dev->cmb_dma_addr = dma_addr;
> > dev->cmb_size = size;
> >
> > in the nvme_map_cmb() function. pci_resource_start() returns a resource_size_t (phys_addr_t), not a dma_addr_t. I don't have the HW to check this code and propose a fix, but it seems buggy to me. If we need a DMA address, we should map it.
Good catch. It does look wrong to pass a dma_addr_t to ioremap_wc() and
to the Create I/O SQ command's PRP1.
>
> Perhaps I'm mistaken, but shouldn't the code use pcibios_resource_to_bus()
> in this case to convert the resource to bus addresses? I see cmb_dma_addr
> is later passed directly to the device as the sq_dma_addr.
>
That gets us a region from a window within a larger region, but it
looks to me like resource_contains() would fail to match if the CMB
region went beyond the window.
There's another option - pci_bus_addr_t/pci_bus_region takes the larger
of phys_addr_t's width and dma_addr_t's width. So in the cases where
those two types differ, it should still be able to hold a valid
physical address, which is what both the resource API and Create I/O SQ
expect.
But I'd rather see full integration into the pci resource tree so that
it is aware of the CMB window. It isn't currently because the offset is
given in an nvme register. I'd like to see the resource tree aware of
all of this, and made a poor attempt to do that and sysfs integration at
the same time. But I think Haggai's genalloc modifications and
integration into the resource tree would mesh really well :)
I'll follow up with a patch shortly that I think fixes a few basic
issues expressed here.
Cheers
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
2017-01-05 18:39 ` Jon Derrick
@ 2017-01-08 8:55 ` Haggai Eran
2017-01-09 21:54 ` Jon Derrick
0 siblings, 1 reply; 8+ messages in thread
From: Haggai Eran @ 2017-01-08 8:55 UTC (permalink / raw)
On 1/5/2017 8:39 PM, Jon Derrick wrote:
>> Perhaps I'm mistaken, but shouldn't the code use pcibios_resource_to_bus()
>> in this case to convert the resource to bus addresses? I see cmb_dma_addr
>> is later passed directly to the device as the sq_dma_addr.
>>
> That gets us a region from a window within a larger region, but it
> looks to me like resource_contains() would fail to match if the CMB
> region went beyond the window.
I thought that the CMB must fit in its BAR, and therefore in the window that
contains it. Isn't it so?
> There's another option - pci_bus_addr_t/pci_bus_region takes the largest
> of phys_addr_t's width and dma_addr_t's width. So in the cases where
> those two types might differ it should still be able to hold a valid
> physical address, which is what both the resource API and Create-SQes
> expect.
I don't think the issue is just the width of the types. What happens on
architectures where phys_addr_t addresses are translated before going to
the PCIe bus?
Regards,
Haggai
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
2017-01-08 8:55 ` Haggai Eran
@ 2017-01-09 21:54 ` Jon Derrick
2017-01-11 8:15 ` Haggai Eran
0 siblings, 1 reply; 8+ messages in thread
From: Jon Derrick @ 2017-01-09 21:54 UTC (permalink / raw)
On Sun, Jan 08, 2017 at 10:55:28AM +0200, Haggai Eran wrote:
> On 1/5/2017 8:39 PM, Jon Derrick wrote:
> >> Perhaps I'm mistaken, but shouldn't the code use pcibios_resource_to_bus()
> >> in this case to convert the resource to bus addresses? I see cmb_dma_addr
> >> is later passed directly to the device as the sq_dma_addr.
> >>
> > That gets us a region from a window within a larger region, but it
> > looks to me like resource_contains() would fail to match if the CMB
> > region went beyond the window.
> I thought that the CMB must fit in its BAR, and therefore in the window that
> contains it. Isn't it so?
>
The spec is unclear if it's the host's responsibility to stay within the
BAR, or the device's to reduce CMBLOC and CMBSZ to fit:
"If the Offset + Size exceeds the length of
the indicated BAR, the size available to the host is limited by the
length of the BAR."
I think this would only happen if we're behind a bridge with a smaller
window than the BAR.
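If it turns out to be the host's responsibility, a defensive clamp in
nvme_map_cmb would look something like this (untested sketch, placement
hypothetical):

```c
/* Hypothetical sketch, untested: honour the spec wording that "the
 * size available to the host is limited by the length of the BAR"
 * when Offset + Size exceeds it. */
resource_size_t bar_size;

bar_size = pci_resource_len(pdev, NVME_CMB_BIR(dev->cmbloc));
if (offset > bar_size)
        return NULL;              /* CMB lies entirely outside the BAR */
if (size > bar_size - offset)
        size = bar_size - offset; /* truncate to what the BAR exposes */
```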
> > There's another option - pci_bus_addr_t/pci_bus_region takes the largest
> > of phys_addr_t's width and dma_addr_t's width. So in the cases where
> > those two types might differ it should still be able to hold a valid
> > physical address, which is what both the resource API and Create-SQes
> > expect.
> I don't think the issue is just the width of the types. What happens on
> architectures where phy_addr_t addresses are translated before going to
> the PCIe bus?
If we have a DMA translation, we get the host-side addresses from the
ioremapping, and I believe the device is still expecting the untranslated
addresses, since it needs to DMA over the fabric. Do archs exist that
don't fit this model?
>
> Regards,
> Haggai
Cheers
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
2017-01-09 21:54 ` Jon Derrick
@ 2017-01-11 8:15 ` Haggai Eran
2017-01-11 9:06 ` hch
2017-01-11 15:36 ` Jon Derrick
0 siblings, 2 replies; 8+ messages in thread
From: Haggai Eran @ 2017-01-11 8:15 UTC (permalink / raw)
On Mon, 2017-01-09 at 14:54 -0700, Jon Derrick wrote:
> On Sun, Jan 08, 2017 at 10:55:28AM +0200, Haggai Eran wrote:
> >
> > On 1/5/2017 8:39 PM, Jon Derrick wrote:
> > >
> > > >
> > > > Perhaps I'm mistaken, but shouldn't the code use
> > > > pcibios_resource_to_bus()
> > > > in this case to convert the resource to bus addresses? I see
> > > > cmb_dma_addr
> > > > is later passed directly to the device as the sq_dma_addr.
> > > >
> > > That gets us a region from a window within a larger region, but
> > > it looks to me like resource_contains() would fail to match if
> > > the CMB region went beyond the window.
> > I thought that the CMB must fit in its BAR, and therefore in the
> > window that contains it. Isn't it so?
> >
> The spec is unclear if it's the host's responsibility to stay within
> the
> BAR, or the device's to reduce CMBLOC and CMBSZ to fit:
>
> "If the Offset + Size exceeds the length of
> the indicated BAR, the size available to the host is limited by the
> length of the BAR."
If the BAR is smaller than (offset + size) then any address that is
outside the BAR must be treated by the device as if it is not in the
CMB (otherwise some other devices / host memory will simply be
inaccessible to the NVMe device).
>
> I think this would only happen if we're behind a bridge with a
> smaller
> window than BAR.
I'm pretty sure that the bridge window must contain the underlying
device BARs. If it can't contain them, they can be simply left
disabled.
The situation can still happen in case the NVMe device exposes a
smaller BAR than the CMB, or if it supports the Resizable BAR PCIe
capability and the BIOS resized it to a smaller size (although I
haven't heard of any device or BIOS that supports that).
>
>
> >
> > >
> > > There's another option - pci_bus_addr_t/pci_bus_region takes the
> > > largest
> > > of phys_addr_t's width and dma_addr_t's width. So in the cases
> > > where
> > > those two types might differ it should still be able to hold a
> > > valid
> > > physical address, which is what both the resource API and Create-
> > > SQes
> > > expect.
> > I don't think the issue is just the width of the types. What happens
> > on architectures where phys_addr_t addresses are translated before
> > going to the PCIe bus?
> If we have a DMA translation, we get the host side addresses from the
> ioremapping and I believe the device is still expecting the
> untranslated
> addresses, since it needs to DMA over the fabric. Do archs exists
> that
> don't fit this model?
I'm not talking about DMA translation. I'm talking about MMIO
translation. From what I understand this can happen on POWER systems.
The physical addresses for MMIO that are used by the CPU are different
from the ones that are used on the PCIe bus.
Regards,
Haggai
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
2017-01-11 8:15 ` Haggai Eran
@ 2017-01-11 9:06 ` hch
2017-01-11 15:36 ` Jon Derrick
1 sibling, 0 replies; 8+ messages in thread
From: hch @ 2017-01-11 9:06 UTC (permalink / raw)
On Wed, Jan 11, 2017 at 08:15:23AM +0000, Haggai Eran wrote:
> If the BAR is smaller than (offset + size) then any address that is
> outside the BAR must be treated by the device as if it is not in the
> CMB (otherwise some other devices / host memory will simply be
> inaccessible to the NVMe device).
Yes. I so wish the CMB design wasn't so messed up and we'd just have
a separate BAR with a relative index for it. Maybe we'll need to
propose a CMBv2 in the working group, at least for the proposed new
extensions :)
> > I think this would only happen if we're behind a bridge with a
> > smaller
> > window than BAR.
>
> I'm pretty sure that the bridge window must contain the underlying
> device BARs. If it can't contain them, they can be simply left
> disabled.
Exactly.
> I'm not talking about DMA translation. I'm talking about MMIO
> translation. From what I understand this can happen on POWER systems.
> The physical addresses for MMIO that are used by the CPU are different
> from the ones that are used on the PCIe bus.
As far as I know, MMIO translation is perfectly normal for IOMMUs.
That being said, my knowledge of IOMMUs is mostly from before the Intel
and AMD chipsets added them, and thus from the RISC/IA64 world.
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
2017-01-11 8:15 ` Haggai Eran
2017-01-11 9:06 ` hch
@ 2017-01-11 15:36 ` Jon Derrick
1 sibling, 0 replies; 8+ messages in thread
From: Jon Derrick @ 2017-01-11 15:36 UTC (permalink / raw)
On Wed, Jan 11, 2017 at 08:15:23AM +0000, Haggai Eran wrote:
> On Mon, 2017-01-09 at 14:54 -0700, Jon Derrick wrote:
> > On Sun, Jan 08, 2017 at 10:55:28AM +0200, Haggai Eran wrote:
> > >
> > > On 1/5/2017 8:39 PM, Jon Derrick wrote:
> > > >
> > > > >
> > > > > Perhaps I'm mistaken, but shouldn't the code use
> > > > > pcibios_resource_to_bus()
> > > > > in this case to convert the resource to bus addresses? I see
> > > > > cmb_dma_addr
> > > > > is later passed directly to the device as the sq_dma_addr.
> > > > >
> > > > That gets us a region from a window within a larger region, but
> > > > it looks to me like resource_contains() would fail to match if
> > > > the CMB region went beyond the window.
> > > I thought that the CMB must fit in its BAR, and therefore in the
> > > window that contains it. Isn't it so?
> > >
> > The spec is unclear if it's the host's responsibility to stay within
> > the
> > BAR, or the device's to reduce CMBLOC and CMBSZ to fit:
> >
> > "If the Offset + Size exceeds the length of
> > the indicated BAR, the size available to the host is limited by the
> > length of the BAR."
>
> If the BAR is smaller than (offset + size) then any address that is
> outside the BAR must be treated by the device as if it is not in the
> CMB (otherwise some other devices / host memory will simply be
> inaccessible to the NVMe device).
> >
> > I think this would only happen if we're behind a bridge with a
> > smaller
> > window than BAR.
>
> I'm pretty sure that the bridge window must contain the underlying
> device BARs. If it can't contain them, they can be simply left
> disabled.
>
Oh good. I wasn't aware of those restrictions. That should make
pcibios_resource_to_bus() a possibility.
> The situation can still happen in case the NVMe device exposes a
> smaller BAR than the CMB, or if it supports the Resizable BAR PCIe
> capability and the BIOS resized it to a smaller size (although I
> haven't heard of any device or BIOS that supports that).
>
> >
> >
> > >
> > > >
> > > > There's another option - pci_bus_addr_t/pci_bus_region takes the
> > > > largest
> > > > of phys_addr_t's width and dma_addr_t's width. So in the cases
> > > > where
> > > > those two types might differ it should still be able to hold a
> > > > valid
> > > > physical address, which is what both the resource API and Create-
> > > > SQes
> > > > expect.
> > > I don't think the issue is just the width of the types. What
> > > happens on architectures where phys_addr_t addresses are
> > > translated before going to the PCIe bus?
> > If we have a DMA translation, we get the host side addresses from the
> > ioremapping and I believe the device is still expecting the
> > untranslated
> > addresses, since it needs to DMA over the fabric. Do archs exists
> > that
> > don't fit this model?
> I'm not talking about DMA translation. I'm talking about MMIO
> translation. From what I understand this can happen on POWER systems.
> The physical addresses for MMIO that are used by the CPU are different
> from the ones that are used on the PCIe bus.
>
I hadn't considered those, but the address given in the Create I/O SQ
command should still be in the range of the addresses in the device's
BARs. I'm guessing we may have to go through the IOMMU subsystem to
untranslate those.
> Regards,
> Haggai