* [RFC] add __iomem cookie for EPF BAR
From: Li Chen @ 2023-08-07 12:28 UTC (permalink / raw)
To: linux-pci, Lorenzo Pieralisi, mani, Kishon Vijay Abraham I,
Bjorn Helgaas, Arnd Bergmann
Hi All
Currently, the EPF's BAR is allocated by pci_epf_alloc_space(), which internally uses dma_alloc_coherent(). The caching behavior of dma_alloc_coherent() may vary between platforms.
The BAR space is exported to the RC, which means the RC may modify it without the EP being aware of it, so the EP may still read from its cache and get stale data. To address this issue, the BAR space should be treated as iomem instead and accessed only through the I/O read/write APIs, which enforce volatile semantics.
If you agree, I would create patches for existing EPF and EPF/EPC core and submit them for review.
Regards,
Li
* Re: [RFC] add __iomem cookie for EPF BAR
From: mani @ 2023-08-08 7:03 UTC (permalink / raw)
To: Li Chen
Cc: linux-pci, Lorenzo Pieralisi, Kishon Vijay Abraham I,
Bjorn Helgaas, Arnd Bergmann
On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
> Hi All
>
> Currently, the EPF's bar is allocated by pci_epf_alloc_space, which internally uses dma_alloc_coherent and the caching behavior of dma_alloc_coherent may vary depending on platforms.
>
> The bar space is exported to RC, which means that RC may modify it without EP being aware of it, so EP still read from the cache and get stalled data. To address this issue, the bar space should be treated as iomem instead and forced to use io read/write APIs, which enforces volatile.
>
We already had a similar discussion on using volatile for BAR space and settled
on the {READ,WRITE}_ONCE macros in the EPF test driver [1].
Since the BAR space allocated in the endpoint is not MMIO, I don't think it
should be forced to be iomem. Rather, EPF drivers should use the _ONCE macros to
access the fields to avoid coherency issues, as suggested by Arnd earlier.
- Mani
[1] https://lore.kernel.org/linux-pci/c49df2f9-9848-45aa-9d64-9e4e9841440f@app.fastmail.com/
> If you agree, I would create patches for existing EPF and EPF/EPC core and submit them for review.
>
> Regards,
> Li
--
மணிவண்ணன் சதாசிவம்
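For readers following along, the _ONCE approach described above can be
sketched roughly as follows. This is a userspace model, not kernel code:
READ_ONCE()/WRITE_ONCE() are redefined here as simplified volatile-access
stand-ins for the kernel macros, and struct demo_bar_regs, the DEMO_*
constants and demo_handle_command() are hypothetical names made up for
illustration. They are not taken from any existing EPF driver.

```c
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * They force exactly one volatile access per use (GCC/Clang __typeof__).
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical register layout placed in BAR memory shared with the RC. */
struct demo_bar_regs {
	uint32_t command;
	uint32_t status;
};

#define DEMO_CMD_NONE    0u
#define DEMO_CMD_PING    1u
#define DEMO_STATUS_DONE 1u

/*
 * One polling step on the endpoint side. The command field is read once
 * into a local, so the compiler cannot re-read or cache it across the
 * checks below. Returns 1 if a command was handled, 0 otherwise.
 */
static int demo_handle_command(struct demo_bar_regs *regs)
{
	uint32_t cmd = READ_ONCE(regs->command);

	if (cmd == DEMO_CMD_NONE)
		return 0;

	WRITE_ONCE(regs->command, DEMO_CMD_NONE);
	WRITE_ONCE(regs->status, DEMO_STATUS_DONE);
	return 1;
}
```

Note that the _ONCE accessors only constrain the compiler. They do not
order accesses against the other side of the link, and they do not make a
noncoherent mapping coherent.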
* Re: [RFC] add __iomem cookie for EPF BAR
From: Arnd Bergmann @ 2023-08-08 7:44 UTC (permalink / raw)
To: Manivannan Sadhasivam, Li Chen
Cc: linux-pci, Lorenzo Pieralisi, Kishon Vijay Abraham I,
Bjorn Helgaas
On Tue, Aug 8, 2023, at 09:03, mani wrote:
> On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
>>
>> Currently, the EPF's bar is allocated by pci_epf_alloc_space, which internally uses dma_alloc_coherent and the caching behavior of dma_alloc_coherent may vary depending on platforms.
>>
>> The bar space is exported to RC, which means that RC may modify it without EP being aware of it, so EP still read from the cache and get stalled data. To address this issue, the bar space should be treated as iomem instead and forced to use io read/write APIs, which enforces volatile.
>>
>
> We already had a similar discussion on using volatile for BAR space and settled
> with {WRITE/READ}_ONCE macros in EPF Test driver [1].
>
> Since the BAR space allocated in endpoint is not a MMIO, I don't think it should
> be forced as iomem. Rather EPF drivers should use _ONCE macros to access the
> fields to avoid coherency issues as suggested by Arnd earlier.
Using readl/writel is clearly the wrong solution here as I explained
before, but I assume that Li Chen is dealing with a real problem.
If the cache is coherent with the device, then reading from the cache
is clearly the right thing to do, but the mentioned "stall" problem may
be related to the store buffers, where a dma_wmb() after the
WRITE_ONCE() is missing. Similarly, a dma_rmb() might be missing before
a READ_ONCE() to prevent prefetching during out-of-order execution.
With readl()/writel(), you already get very heavy barriers, so it may
end up working by accident, but these barriers are at the other side
of the access (before writel and after readl) and may be the wrong
type of barrier depending on the CPU.
Arnd
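The barrier placement described above might look roughly like the sketch
below. It is a userspace model under stated assumptions: dma_wmb() and
dma_rmb() are mapped to C11 release/acquire fences purely as stand-ins
(the kernel versions are architecture-specific), READ_ONCE()/WRITE_ONCE()
are simplified volatile accessors, and struct demo_mailbox,
demo_publish() and demo_consume() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel accessors. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Stand-ins only: the real dma_wmb()/dma_rmb() are defined per architecture. */
#define dma_wmb() atomic_thread_fence(memory_order_release)
#define dma_rmb() atomic_thread_fence(memory_order_acquire)

/* Hypothetical mailbox living in memory shared with the other side. */
struct demo_mailbox {
	uint32_t payload;
	uint32_t ready;
};

/* Writer: the payload must be visible before the flag that announces it. */
static void demo_publish(struct demo_mailbox *mb, uint32_t value)
{
	WRITE_ONCE(mb->payload, value);
	dma_wmb();	/* barrier after the data write, before the flag write */
	WRITE_ONCE(mb->ready, 1);
}

/* Reader: the payload must not be fetched before the flag is observed. */
static int demo_consume(struct demo_mailbox *mb, uint32_t *out)
{
	if (!READ_ONCE(mb->ready))
		return 0;

	dma_rmb();	/* barrier after the flag read, before the data read */
	*out = READ_ONCE(mb->payload);
	WRITE_ONCE(mb->ready, 0);
	return 1;
}
```

With writel()/readl() the implied barriers sit before the store and after
the load respectively, which is not where this pattern needs them between
the two accesses on each side.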
* Re: [RFC] add __iomem cookie for EPF BAR
From: Li Chen @ 2023-08-09 1:09 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Manivannan Sadhasivam, linux-pci, Lorenzo Pieralisi,
Kishon Vijay Abraham I, Bjorn Helgaas
On Tue, 08 Aug 2023 15:44:44 +0800,
Arnd Bergmann wrote:
Hi Arnd,
>
> On Tue, Aug 8, 2023, at 09:03, mani wrote:
> > On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
> >>
> >> Currently, the EPF's bar is allocated by pci_epf_alloc_space, which internally uses dma_alloc_coherent and the caching behavior of dma_alloc_coherent may vary depending on platforms.
> >>
> >> The bar space is exported to RC, which means that RC may modify it without EP being aware of it, so EP still read from the cache and get stalled data. To address this issue, the bar space should be treated as iomem instead and forced to use io read/write APIs, which enforces volatile.
> >>
> >
> > We already had a similar discussion on using volatile for BAR space and settled
> > with {WRITE/READ}_ONCE macros in EPF Test driver [1].
> >
> > Since the BAR space allocated in endpoint is not a MMIO, I don't think it should
> > be forced as iomem. Rather EPF drivers should use _ONCE macros to access the
> > fields to avoid coherency issues as suggested by Arnd earlier.
>
> Using readl/writel is clearly the wrong solution here as I explained
> before, but I assume that Li Chen is dealing with a real problem.
Thanks, I learnt much from your mail.
Actually, I'm not dealing with a real problem.
> If the cache is coherent with the device, then reading from the cache
> is clearly the right thing to do,
I guess that even SoCs with CCI support might not handle cache for RC
access if specific bus interfaces are not connected.
> but the mentioned "stall" problem may
> be related to the store buffers, where an dma_wmb() after the
> WRITE_ONCE() is missing. Similarly, a dma_rmb() might be missing before
> a READ_ONCE() to prevent prefetching during out-of-order execution.
>
> With readl()/writel(), you already get very heavy barriers, so it may
> end up working by accident, but these barriers are at the other side
> of the access (before writel and after readl) and may be the wrong
> type of barrier depending on the CPU.
For systems that aren't cache-coherent, is it accurate to say that the store
buffer might still be utilized, and that there might still be a need for dma_wmb and dma_rmb?
* Re: [RFC] add __iomem cookie for EPF BAR
From: Arnd Bergmann @ 2023-08-09 7:10 UTC (permalink / raw)
To: Li Chen
Cc: Manivannan Sadhasivam, linux-pci, Lorenzo Pieralisi,
Kishon Vijay Abraham I, Bjorn Helgaas
On Wed, Aug 9, 2023, at 03:09, Li Chen wrote:
> On Tue, 08 Aug 2023 15:44:44 +0800,
> Arnd Bergmann wrote:
>> On Tue, Aug 8, 2023, at 09:03, mani wrote:
>> > On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
>
>> If the cache is coherent with the device, then reading from the cache
>> is clearly the right thing to do,
>
> I guess that even SoCs with CCI support might not handle cache for RC
> access if specific bus interfaces are not connected.
Correct, each device in the system can be cache-coherent or
noncoherent, independent of the others, and needs to be marked
correctly in the DT. The dma_alloc_coherent() call will allocate
either cacheable or noncacheable memory based on what the
kernel thinks is required for the particular device.
>> but the mentioned "stall" problem may
>> be related to the store buffers, where an dma_wmb() after the
>> WRITE_ONCE() is missing. Similarly, a dma_rmb() might be missing before
>> a READ_ONCE() to prevent prefetching during out-of-order execution.
>>
>> With readl()/writel(), you already get very heavy barriers, so it may
>> end up working by accident, but these barriers are at the other side
>> of the access (before writel and after readl) and may be the wrong
>> type of barrier depending on the CPU.
>
> For systems that aren't cache-coherent, is it accurate to say that the
> store
> buffer might still be utilized, and that there might still be a need
> for dma_wmb and dma_rmb?
Yes, the ordering is really independent of the cache, so these will
be needed for portable code either way, the same way you need
smp_wmb()/smp_rmb() between CPUs accessing shared memory locally.
Arnd