* [RFC] add __iomem cookie for EPF BAR
@ 2023-08-07 12:28 Li Chen
  2023-08-08  7:03 ` mani
  0 siblings, 1 reply; 5+ messages in thread
From: Li Chen @ 2023-08-07 12:28 UTC
To: linux-pci, Lorenzo Pieralisi, mani, Kishon Vijay Abraham I,
    Bjorn Helgaas, Arnd Bergmann

Hi All

Currently, the EPF's bar is allocated by pci_epf_alloc_space, which
internally uses dma_alloc_coherent, and the caching behavior of
dma_alloc_coherent may vary depending on platforms.

The bar space is exported to RC, which means that RC may modify it
without EP being aware of it, so EP still reads from the cache and gets
stalled data. To address this issue, the bar space should be treated as
iomem instead and forced to use io read/write APIs, which enforces
volatile.

If you agree, I would create patches for existing EPF and EPF/EPC core
and submit them for review.

Regards,
Li

* Re: [RFC] add __iomem cookie for EPF BAR
  2023-08-07 12:28 [RFC] add __iomem cookie for EPF BAR Li Chen
@ 2023-08-08  7:03 ` mani
  2023-08-08  7:44   ` Arnd Bergmann
  0 siblings, 1 reply; 5+ messages in thread
From: mani @ 2023-08-08 7:03 UTC
To: Li Chen
Cc: linux-pci, Lorenzo Pieralisi, Kishon Vijay Abraham I,
    Bjorn Helgaas, Arnd Bergmann

On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
> Hi All
>
> Currently, the EPF's bar is allocated by pci_epf_alloc_space, which
> internally uses dma_alloc_coherent, and the caching behavior of
> dma_alloc_coherent may vary depending on platforms.
>
> The bar space is exported to RC, which means that RC may modify it
> without EP being aware of it, so EP still reads from the cache and gets
> stalled data. To address this issue, the bar space should be treated as
> iomem instead and forced to use io read/write APIs, which enforces
> volatile.
>

We already had a similar discussion on using volatile for BAR space and
settled with {WRITE/READ}_ONCE macros in EPF Test driver [1].

Since the BAR space allocated in endpoint is not a MMIO, I don't think
it should be forced as iomem. Rather EPF drivers should use _ONCE macros
to access the fields to avoid coherency issues as suggested by Arnd
earlier.

- Mani

[1] https://lore.kernel.org/linux-pci/c49df2f9-9848-45aa-9d64-9e4e9841440f@app.fastmail.com/

> If you agree, I would create patches for existing EPF and EPF/EPC core
> and submit them for review.
>
> Regards,
> Li

--
மணிவண்ணன் சதாசிவம்

* Re: [RFC] add __iomem cookie for EPF BAR
  2023-08-08  7:03 ` mani
@ 2023-08-08  7:44   ` Arnd Bergmann
  2023-08-09  1:09     ` Li Chen
  0 siblings, 1 reply; 5+ messages in thread
From: Arnd Bergmann @ 2023-08-08 7:44 UTC
To: Manivannan Sadhasivam, Li Chen
Cc: linux-pci, Lorenzo Pieralisi, Kishon Vijay Abraham I, Bjorn Helgaas

On Tue, Aug 8, 2023, at 09:03, mani wrote:
> On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
>>
>> Currently, the EPF's bar is allocated by pci_epf_alloc_space, which
>> internally uses dma_alloc_coherent, and the caching behavior of
>> dma_alloc_coherent may vary depending on platforms.
>>
>> The bar space is exported to RC, which means that RC may modify it
>> without EP being aware of it, so EP still reads from the cache and
>> gets stalled data. To address this issue, the bar space should be
>> treated as iomem instead and forced to use io read/write APIs, which
>> enforces volatile.
>>
>
> We already had a similar discussion on using volatile for BAR space and settled
> with {WRITE/READ}_ONCE macros in EPF Test driver [1].
>
> Since the BAR space allocated in endpoint is not a MMIO, I don't think it should
> be forced as iomem. Rather EPF drivers should use _ONCE macros to access the
> fields to avoid coherency issues as suggested by Arnd earlier.

Using readl/writel is clearly the wrong solution here as I explained
before, but I assume that Li Chen is dealing with a real problem.

If the cache is coherent with the device, then reading from the cache
is clearly the right thing to do, but the mentioned "stall" problem may
be related to the store buffers, where a dma_wmb() after the
WRITE_ONCE() is missing. Similarly, a dma_rmb() might be missing before
a READ_ONCE() to prevent prefetching during out-of-order execution.

With readl()/writel(), you already get very heavy barriers, so it may
end up working by accident, but these barriers are at the other side
of the access (before writel and after readl) and may be the wrong
type of barrier depending on the CPU.

      Arnd
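
The ordering described in the message above can be sketched in userspace
C11. This is only an analogue under stated assumptions, not kernel code:
the struct bar_mailbox layout and the mailbox_publish()/mailbox_poll()
helpers are hypothetical, relaxed atomic accesses stand in for
WRITE_ONCE()/READ_ONCE(), and the release/acquire fences stand in for
dma_wmb()/dma_rmb(). C11 fences only order accesses between CPU threads;
they do not give the device-visible ordering that the kernel's DMA
barriers are meant to provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a small mailbox kept in shared (BAR-like) memory. */
struct bar_mailbox {
	_Atomic uint32_t payload; /* data handed to the other side */
	_Atomic uint32_t ready;   /* flag: payload is valid */
};

/* Writer: store the payload, fence, then set the flag. */
static void mailbox_publish(struct bar_mailbox *mb, uint32_t value)
{
	assert(mb != NULL);
	/* Plays the role of WRITE_ONCE(mb->payload, value). */
	atomic_store_explicit(&mb->payload, value, memory_order_relaxed);
	/* Plays the role of dma_wmb(): payload is ordered before the flag. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&mb->ready, 1, memory_order_relaxed);
}

/* Reader: check the flag, fence, then load the payload. */
static bool mailbox_poll(struct bar_mailbox *mb, uint32_t *value)
{
	assert(mb != NULL && value != NULL);
	/* Plays the role of READ_ONCE(mb->ready). */
	if (!atomic_load_explicit(&mb->ready, memory_order_relaxed))
		return false;
	/* Plays the role of dma_rmb(): the payload load cannot move ahead of the flag load. */
	atomic_thread_fence(memory_order_acquire);
	*value = atomic_load_explicit(&mb->payload, memory_order_relaxed);
	return true;
}
```

Both fences sit between the data access and the flag access. That is the
placement the message asks for, as opposed to the barriers implied by
readl()/writel(), which sit before the write and after the read.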

* Re: [RFC] add __iomem cookie for EPF BAR
  2023-08-08  7:44   ` Arnd Bergmann
@ 2023-08-09  1:09     ` Li Chen
  2023-08-09  7:10       ` Arnd Bergmann
  0 siblings, 1 reply; 5+ messages in thread
From: Li Chen @ 2023-08-09 1:09 UTC
To: Arnd Bergmann
Cc: Manivannan Sadhasivam, linux-pci, Lorenzo Pieralisi,
    Kishon Vijay Abraham I, Bjorn Helgaas

On Tue, 08 Aug 2023 15:44:44 +0800,
Arnd Bergmann wrote:

Hi Arnd,

>
> On Tue, Aug 8, 2023, at 09:03, mani wrote:
> > On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
> >>
> >> Currently, the EPF's bar is allocated by pci_epf_alloc_space, which
> >> internally uses dma_alloc_coherent, and the caching behavior of
> >> dma_alloc_coherent may vary depending on platforms.
> >>
> >> The bar space is exported to RC, which means that RC may modify it
> >> without EP being aware of it, so EP still reads from the cache and
> >> gets stalled data. To address this issue, the bar space should be
> >> treated as iomem instead and forced to use io read/write APIs, which
> >> enforces volatile.
> >>
> >
> > We already had a similar discussion on using volatile for BAR space and settled
> > with {WRITE/READ}_ONCE macros in EPF Test driver [1].
> >
> > Since the BAR space allocated in endpoint is not a MMIO, I don't think it should
> > be forced as iomem. Rather EPF drivers should use _ONCE macros to access the
> > fields to avoid coherency issues as suggested by Arnd earlier.
>
> Using readl/writel is clearly the wrong solution here as I explained
> before, but I assume that Li Chen is dealing with a real problem.

Thanks, I learnt much from your mail. Actually, I'm not dealing with a
real problem.

> If the cache is coherent with the device, then reading from the cache
> is clearly the right thing to do,

I guess that even SoCs with CCI support might not handle cache for RC
access if specific bus interfaces are not connected.

> but the mentioned "stall" problem may
> be related to the store buffers, where a dma_wmb() after the
> WRITE_ONCE() is missing. Similarly, a dma_rmb() might be missing before
> a READ_ONCE() to prevent prefetching during out-of-order execution.
>
> With readl()/writel(), you already get very heavy barriers, so it may
> end up working by accident, but these barriers are at the other side
> of the access (before writel and after readl) and may be the wrong
> type of barrier depending on the CPU.

For systems that aren't cache-coherent, is it accurate to say that the
store buffer might still be utilized, and that there might still be a
need for dma_wmb and dma_rmb?

* Re: [RFC] add __iomem cookie for EPF BAR
  2023-08-09  1:09     ` Li Chen
@ 2023-08-09  7:10       ` Arnd Bergmann
  0 siblings, 0 replies; 5+ messages in thread
From: Arnd Bergmann @ 2023-08-09 7:10 UTC
To: Li Chen
Cc: Manivannan Sadhasivam, linux-pci, Lorenzo Pieralisi,
    Kishon Vijay Abraham I, Bjorn Helgaas

On Wed, Aug 9, 2023, at 03:09, Li Chen wrote:
> On Tue, 08 Aug 2023 15:44:44 +0800,
> Arnd Bergmann wrote:
>> On Tue, Aug 8, 2023, at 09:03, mani wrote:
>> > On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
>
>> If the cache is coherent with the device, then reading from the cache
>> is clearly the right thing to do,
>
> I guess that even SoCs with CCI support might not handle cache for RC
> access if specific bus interfaces are not connected.

Correct, each device in the system can be cache-coherent or noncoherent,
independent of the others, and needs to be marked correctly in the DT.

The dma_alloc_coherent() call will allocate either cacheable or
noncacheable memory based on what the kernel thinks is required for the
particular device.

>> but the mentioned "stall" problem may
>> be related to the store buffers, where a dma_wmb() after the
>> WRITE_ONCE() is missing. Similarly, a dma_rmb() might be missing before
>> a READ_ONCE() to prevent prefetching during out-of-order execution.
>>
>> With readl()/writel(), you already get very heavy barriers, so it may
>> end up working by accident, but these barriers are at the other side
>> of the access (before writel and after readl) and may be the wrong
>> type of barrier depending on the CPU.
>
> For systems that aren't cache-coherent, is it accurate to say that the
> store buffer might still be utilized, and that there might still be a
> need for dma_wmb and dma_rmb?

Yes, the ordering is really independent of the cache, so these will be
needed for portable code either way, the same way you need
smp_wmb()/smp_rmb() between CPUs accessing shared memory locally.

      Arnd