* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr
@ 2017-01-05 10:20 Max Gurtovoy
2017-01-05 11:02 ` Haggai Eran
0 siblings, 1 reply; 8+ messages in thread
From: Max Gurtovoy @ 2017-01-05 10:20 UTC (permalink / raw)
hi Guys,
we have noticed that in drivers/nvme/host/pci.c we have the following lines:
"
dma_addr = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + offset;
cmb = ioremap_wc(dma_addr, size);
if (!cmb)
return NULL;
dev->cmb_dma_addr = dma_addr;
dev->cmb_size = size;
"
in nvme_map_cmb func. pci_resource_start should return resource_size_t
(phys_addr_t) and not dma_addr_t. I don't have the HW to check this code
and propose a fix but it's seems buggy for me. In case we need dma
address we should map it.
thanks,
Max.
^ permalink raw reply [flat|nested] 8+ messages in thread* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr 2017-01-05 10:20 phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr Max Gurtovoy @ 2017-01-05 11:02 ` Haggai Eran 2017-01-05 18:39 ` Jon Derrick 0 siblings, 1 reply; 8+ messages in thread From: Haggai Eran @ 2017-01-05 11:02 UTC (permalink / raw) On 1/5/2017 12:20 PM, Max Gurtovoy wrote: > dma_addr = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + offset; > cmb = ioremap_wc(dma_addr, size); > if (!cmb) > return NULL; > > dev->cmb_dma_addr = dma_addr; > dev->cmb_size = size; > > in nvme_map_cmb func. pci_resource_start should return resource_size_t (phys_addr_t) and not dma_addr_t. I don't have the HW to check this code and propose a fix but it's seems buggy for me. In case we need dma address we should map it. Perhaps I'm mistaken, but shouldn't the code use pcibios_resource_to_bus() in this case to convert the resource to bus addresses? I see cmb_dma_addr is later passed directly to the device as the sq_dma_addr. Haggai ^ permalink raw reply [flat|nested] 8+ messages in thread
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr 2017-01-05 11:02 ` Haggai Eran @ 2017-01-05 18:39 ` Jon Derrick 2017-01-08 8:55 ` Haggai Eran 0 siblings, 1 reply; 8+ messages in thread From: Jon Derrick @ 2017-01-05 18:39 UTC (permalink / raw) On Thu, Jan 05, 2017@01:02:45PM +0200, Haggai Eran wrote: > On 1/5/2017 12:20 PM, Max Gurtovoy wrote: > > dma_addr = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + offset; > > cmb = ioremap_wc(dma_addr, size); > > if (!cmb) > > return NULL; > > > > dev->cmb_dma_addr = dma_addr; > > dev->cmb_size = size; > > > > in nvme_map_cmb func. pci_resource_start should return resource_size_t (phys_addr_t) and not dma_addr_t. I don't have the HW to check this code and propose a fix but it's seems buggy for me. In case we need dma address we should map it. Good catch. It does look wrong to pass a dma_addr_t to ioremap_wc and Create-SQes's PRP1. > > Perhaps I'm mistaken, but shouldn't the code use pcibios_resource_to_bus() > in this case to convert the resource to bus addresses? I see cmb_dma_addr > is later passed directly to the device as the sq_dma_addr. > That gets us a region from a window within a larger region, but to me it looks to me like resource_contains() would fail to match if the CMB region went beyond the window. There's another option - pci_bus_addr_t/pci_bus_region takes the largest of phys_addr_t's width and dma_addr_t's width. So in the cases where those two types might differ it should still be able to hold a valid physical address, which is what both the resource API and Create-SQes expect. But I'd rather see full integration into the pci resource tree so that it is aware of the CMB window. It isn't currently because the offset is given in an nvme register. I'd like to see the resource tree aware of all of this, and made a poor attempt to do that and sysfs integration at the same time. But I think Haggai's genalloc modifications and integration into the resource tree would mesh really well :) I'll follow up with a patch shortly that I think fixes a few basic issues expressed here. Cheers ^ permalink raw reply [flat|nested] 8+ messages in thread
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr 2017-01-05 18:39 ` Jon Derrick @ 2017-01-08 8:55 ` Haggai Eran 2017-01-09 21:54 ` Jon Derrick 0 siblings, 1 reply; 8+ messages in thread From: Haggai Eran @ 2017-01-08 8:55 UTC (permalink / raw) On 1/5/2017 8:39 PM, Jon Derrick wrote: >> Perhaps I'm mistaken, but shouldn't the code use pcibios_resource_to_bus() >> in this case to convert the resource to bus addresses? I see cmb_dma_addr >> is later passed directly to the device as the sq_dma_addr. >> > That gets us a region from a window within a larger region, but to me it > looks to me like resource_contains() would fail to match if the CMB > region went beyond the window. I thought that the CMB must fit in its BAR, and therefore in the window that contains it. Isn't it so? > There's another option - pci_bus_addr_t/pci_bus_region takes the largest > of phys_addr_t's width and dma_addr_t's width. So in the cases where > those two types might differ it should still be able to hold a valid > physical address, which is what both the resource API and Create-SQes > expect. I don't think the issue is just the width of the types. What happens on architectures where phy_addr_t addresses are translated before going to the PCIe bus? Regards, Haggai ^ permalink raw reply [flat|nested] 8+ messages in thread
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr 2017-01-08 8:55 ` Haggai Eran @ 2017-01-09 21:54 ` Jon Derrick 2017-01-11 8:15 ` Haggai Eran 0 siblings, 1 reply; 8+ messages in thread From: Jon Derrick @ 2017-01-09 21:54 UTC (permalink / raw) On Sun, Jan 08, 2017@10:55:28AM +0200, Haggai Eran wrote: > On 1/5/2017 8:39 PM, Jon Derrick wrote: > >> Perhaps I'm mistaken, but shouldn't the code use pcibios_resource_to_bus() > >> in this case to convert the resource to bus addresses? I see cmb_dma_addr > >> is later passed directly to the device as the sq_dma_addr. > >> > > That gets us a region from a window within a larger region, but to me it > > looks to me like resource_contains() would fail to match if the CMB > > region went beyond the window. > I thought that the CMB must fit in its BAR, and therefore in the window that > contains it. Isn't it so? > The spec is unclear if it's the host's responsibility to stay within the BAR, or the device's to reduce CMBLOC and CMBSZ to fit: "If the Offset + Size exceeds the length of the indicated BAR, the size available to the host is limited by the length of the BAR." I think this would only happen if we're behind a bridge with a smaller window than BAR. > > There's another option - pci_bus_addr_t/pci_bus_region takes the largest > > of phys_addr_t's width and dma_addr_t's width. So in the cases where > > those two types might differ it should still be able to hold a valid > > physical address, which is what both the resource API and Create-SQes > > expect. > I don't think the issue is just the width of the types. What happens on > architectures where phy_addr_t addresses are translated before going to > the PCIe bus? If we have a DMA translation, we get the host side addresses from the ioremapping and I believe the device is still expecting the untranslated addresses, since it needs to DMA over the fabric. Do archs exists that don't fit this model? > > Regards, > Haggai Cheers ^ permalink raw reply [flat|nested] 8+ messages in thread
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr 2017-01-09 21:54 ` Jon Derrick @ 2017-01-11 8:15 ` Haggai Eran 2017-01-11 9:06 ` hch 2017-01-11 15:36 ` Jon Derrick 0 siblings, 2 replies; 8+ messages in thread From: Haggai Eran @ 2017-01-11 8:15 UTC (permalink / raw) On Mon, 2017-01-09@14:54 -0700, Jon Derrick wrote: > On Sun, Jan 08, 2017@10:55:28AM +0200, Haggai Eran wrote: > > > > On 1/5/2017 8:39 PM, Jon Derrick wrote: > > > > > > > > > > > Perhaps I'm mistaken, but shouldn't the code use > > > > pcibios_resource_to_bus() > > > > in this case to convert the resource to bus addresses? I see > > > > cmb_dma_addr? > > > > is later passed directly to the device as the sq_dma_addr. > > > > > > > That gets us a region from a window within a larger region, but > > > to me it > > > looks to me like resource_contains() would fail to match if the > > > CMB > > > region went beyond the window. > > I thought that the CMB must fit in its BAR, and therefore in the > > window that? > > contains it. Isn't it so? > > > The spec is unclear if it's the host's responsibility to stay within > the > BAR, or the device's to reduce CMBLOC and CMBSZ to fit: > > "If the Offset + Size exceeds the length of > the indicated BAR, the size available to the host is limited by the > length of the BAR." If the BAR is smaller than (offset + size) then any address that is outside the BAR must be treated by the device as if it is not in the CMB (otherwise some other devices / host memory will simply be inaccessible by the NVMe device).? > > I think this would only happen if we're behind a bridge with a > smaller > window than BAR. I'm pretty sure that the bridge window must contain the underlying device BARs. If it can't contain them, they can be simply left disabled. The situation can still happen in case the NVMe device exposes a smaller BAR than the CMB, or if it supports the resizeable BAR PCIe capability and the BIOS resized it to a smaller size (although I haven't heard of any device or BIOS that supports that).? > > > > > > > > > > There's another option - pci_bus_addr_t/pci_bus_region takes the > > > largest > > > of phys_addr_t's width and dma_addr_t's width. So in the cases > > > where > > > those two types might differ it should still be able to hold a > > > valid > > > physical address, which is what both the resource API and Create- > > > SQes > > > expect. > > I don't think the issue is just the width of the types. What > > happens on? > > architectures where phy_addr_t addresses are translated before > > going to? > > the PCIe bus? > If we have a DMA translation, we get the host side addresses from the > ioremapping and I believe the device is still expecting the > untranslated > addresses, since it needs to DMA over the fabric. Do archs exists > that > don't fit this model? I'm not talking about DMA translation. I'm talking about MMIO translation. From what I understand this can happen on POWER systems. The physical addresses for MMIO that are used by the CPU are different from the ones that are used on the PCIe bus. Regards, Haggai ^ permalink raw reply [flat|nested] 8+ messages in thread
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr 2017-01-11 8:15 ` Haggai Eran @ 2017-01-11 9:06 ` hch 2017-01-11 15:36 ` Jon Derrick 1 sibling, 0 replies; 8+ messages in thread From: hch @ 2017-01-11 9:06 UTC (permalink / raw) On Wed, Jan 11, 2017@08:15:23AM +0000, Haggai Eran wrote: > If the BAR is smaller than (offset + size) then any address that is > outside the BAR must be treated by the device as if it is not in the > CMB (otherwise some other devices / host memory will simply be > inaccessible by the NVMe device).? Yes. I so wish the CMB design wasn't so messed up and we'd just have a separate BAR with a relative index for it. Maybe we'll need to propose a CMBv2 in the working group, at least for the proposed new extensions :) > > I think this would only happen if we're behind a bridge with a > > smaller > > window than BAR. > > I'm pretty sure that the bridge window must contain the underlying > device BARs. If it can't contain them, they can be simply left > disabled. Exactly. > I'm not talking about DMA translation. I'm talking about MMIO > translation. From what I understand this can happen on POWER systems. > The physical addresses for MMIO that are used by the CPU are different > from the ones that are used on the PCIe bus. As far as I know MMIO translation is absolutely usual for IOMMUs. That being said my knowledge of IOMMUs is mostly form before Intel and AMD chipset added them and thus from the RISC/IA64 world. ^ permalink raw reply [flat|nested] 8+ messages in thread
* phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr 2017-01-11 8:15 ` Haggai Eran 2017-01-11 9:06 ` hch @ 2017-01-11 15:36 ` Jon Derrick 1 sibling, 0 replies; 8+ messages in thread From: Jon Derrick @ 2017-01-11 15:36 UTC (permalink / raw) On Wed, Jan 11, 2017@08:15:23AM +0000, Haggai Eran wrote: > On Mon, 2017-01-09@14:54 -0700, Jon Derrick wrote: > > On Sun, Jan 08, 2017@10:55:28AM +0200, Haggai Eran wrote: > > > > > > On 1/5/2017 8:39 PM, Jon Derrick wrote: > > > > > > > > > > > > > > Perhaps I'm mistaken, but shouldn't the code use > > > > > pcibios_resource_to_bus() > > > > > in this case to convert the resource to bus addresses? I see > > > > > cmb_dma_addr? > > > > > is later passed directly to the device as the sq_dma_addr. > > > > > > > > > That gets us a region from a window within a larger region, but > > > > to me it > > > > looks to me like resource_contains() would fail to match if the > > > > CMB > > > > region went beyond the window. > > > I thought that the CMB must fit in its BAR, and therefore in the > > > window that? > > > contains it. Isn't it so? > > > > > The spec is unclear if it's the host's responsibility to stay within > > the > > BAR, or the device's to reduce CMBLOC and CMBSZ to fit: > > > > "If the Offset + Size exceeds the length of > > the indicated BAR, the size available to the host is limited by the > > length of the BAR." > > If the BAR is smaller than (offset + size) then any address that is > outside the BAR must be treated by the device as if it is not in the > CMB (otherwise some other devices / host memory will simply be > inaccessible by the NVMe device).? > > > > I think this would only happen if we're behind a bridge with a > > smaller > > window than BAR. > > I'm pretty sure that the bridge window must contain the underlying > device BARs. If it can't contain them, they can be simply left > disabled. > Oh good. I wasn't aware of those restrictions. That should make pcibus_resource_to_bus a possibility. > The situation can still happen in case the NVMe device exposes a > smaller BAR than the CMB, or if it supports the resizeable BAR PCIe > capability and the BIOS resized it to a smaller size (although I > haven't heard of any device or BIOS that supports that).? > > > > > > > > > > > > > > > > There's another option - pci_bus_addr_t/pci_bus_region takes the > > > > largest > > > > of phys_addr_t's width and dma_addr_t's width. So in the cases > > > > where > > > > those two types might differ it should still be able to hold a > > > > valid > > > > physical address, which is what both the resource API and Create- > > > > SQes > > > > expect. > > > I don't think the issue is just the width of the types. What > > > happens on? > > > architectures where phy_addr_t addresses are translated before > > > going to? > > > the PCIe bus? > > If we have a DMA translation, we get the host side addresses from the > > ioremapping and I believe the device is still expecting the > > untranslated > > addresses, since it needs to DMA over the fabric. Do archs exists > > that > > don't fit this model? > I'm not talking about DMA translation. I'm talking about MMIO > translation. From what I understand this can happen on POWER systems. > The physical addresses for MMIO that are used by the CPU are different > from the ones that are used on the PCIe bus. > I hadn't considered those but the address given in the Create SQes command still should be in the range of the addresses in the device's BARs. I'm guessing we'll may have to go through the IOMMU subsystem to untranslate those. > Regards, > Haggai ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-01-11 15:36 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-05 10:20 phys_addr_t instead of dma_addr_t for nvme_dev->cmb_dma_addr Max Gurtovoy 2017-01-05 11:02 ` Haggai Eran 2017-01-05 18:39 ` Jon Derrick 2017-01-08 8:55 ` Haggai Eran 2017-01-09 21:54 ` Jon Derrick 2017-01-11 8:15 ` Haggai Eran 2017-01-11 9:06 ` hch 2017-01-11 15:36 ` Jon Derrick
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.