From: Robin Murphy <robin.murphy@arm.com>
To: Michael Cheng <michael.cheng@intel.com>, intel-gfx@lists.freedesktop.org
Cc: Catalin Marinas <catalin.marinas@arm.com>,
lucas.demarchi@intel.com, dri-devel@lists.freedesktop.org,
Will Deacon <will@kernel.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [Intel-gfx] [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
Date: Wed, 2 Mar 2022 19:10:29 +0000 [thread overview]
Message-ID: <9d09be2d-34f2-212c-70c0-c698a1d8145a@arm.com> (raw)
In-Reply-To: <49e4a740-c127-0d5d-e2e8-3ff482fdce3c@intel.com>
On 2022-03-02 15:55, Michael Cheng wrote:
> Thanks for the feedback Robin!
>
> Sorry my choices of word weren't that great, but what I meant is to
> understand how ARM flushes a range of dcache for device drivers, and not
> an equal to x86 clflush.
>
> I believe the concern is if the CPU writes an update, that update might
> only be sitting in the CPU cache and never make it to device memory
> where the device can see it; there are specific places that we are
> supposed to flush the CPU caches to make sure our updates are visible to
> the hardware.
Ah, OK, if it's more about ordering, and it's actually write buffers
rather than caches that you care about flushing, then we might be a lot
safer, phew!
For a very simple overview, in a case where the device itself needs to
observe memory writes in the correct order, e.g.:
data_descriptor.valid = 1;
clflush(&data_descriptor);
command_descriptor.data = &data_descriptor
writel(/* control register to read command to then read data */)
then dma_wmb() between the first two writes should be the right tool to
ensure that the command does not observe the command update while the
data update is still sat somewhere in a CPU write buffer.
If you want a slightly stronger notion that, at a given point, all prior
writes have actually been issued and should now be visible (rather than
just that they won't become visible in the wrong order whenever they
do), then wmb() should suffice on arm64.
Note that wioth arm64 memory types, a Non-Cacheable mapping of DRAM for
a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can still
be write-buffered, so barriers still matter even when actual cache
maintenance ops don't (and as before if you're trying to perform cache
maintenance outside the DMA API then you've already lost anyway). MMIO
registers should be mapped as Device memory via ioremap(), which is not
bufferable, hence the barrier implicit in writel() effectively pushes
out any prior buffered writes ahead of a register write, which is why we
don't need to worry about this most of the time.
This is only a very rough overview, though, and I'm not familiar enough
with x86 semantics, your hardware, or the exact use-case to be able to
say whether barriers alone are anywhere near the right answer or not.
Robin.
>
> +Matt Roper
>
> Matt, Lucas, any feed back here?
>
> On 2022-03-02 4:49 a.m., Robin Murphy wrote:
>> On 2022-02-25 19:27, Michael Cheng wrote:
>>> Hi Robin,
>>>
>>> [ +arm64 maintainers for their awareness, which would have been a
>>> good thing to do from the start ]
>>>
>>> * Thanks for adding the arm64 maintainer and sorry I didn't rope them
>>> in sooner.
>>>
>>> Why does i915 need to ensure the CPU's instruction cache is coherent
>>> with its data cache? Is it a self-modifying driver?
>>>
>>> * Also thanks for pointing this out. Initially I was using
>>> dcache_clean_inval_poc, which seem to be the equivalently to what
>>> x86 is doing for dcache flushing, but it was giving me build errors
>>> since its not on the global list of kernel symbols. And after
>>> revisiting the documentation for caches_clean_inval_pou, it won't
>>> fly for what we are trying to do. Moving forward, what would you (or
>>> someone in the ARM community) suggest we do? Could it be possible to
>>> export dcache_clean_inval_poc as a global symbol?
>>
>> Unlikely, unless something with a legitimate need for CPU-centric
>> cache maintenance like kexec or CPU hotplug ever becomes modular.
>>
>> In the case of a device driver, it's not even the basic issues of
>> assuming to find direct equivalents to x86 semantics in other CPU
>> architectures, or effectively reinventing parts of the DMA API, it's
>> even bigger than that. Once you move from being integrated in a single
>> vendor's system architecture to being on a discrete card, you
>> fundamentally *no longer have any control over cache coherency*.
>> Whether the host CPU architecture happens to be AArch64, RISC-V, or
>> whatever doesn't really matter, you're at the mercy of 3rd-party PCIe
>> and interconnect IP vendors, and SoC integrators. You'll find yourself
>> in systems where PCIe simply cannot snoop any caches, where you'd
>> better have the correct DMA API calls in place to have any hope of
>> even the most basic functionality working properly; you'll find
>> yourself in systems where even if the PCIe root complex claims to
>> support No Snoop, your uncached traffic will still end up snooping
>> stale data that got prefetched back into caches you thought you'd
>> invalidated; you'll find yourself in systems where your memory
>> attributes may or may not get forcibly rewritten by an IOMMU depending
>> on the kernel config and/or command line.
>>
>> It's not about simply finding a substitute for clflush, it's that the
>> reasons you have for using clflush in the first place can no longer be
>> assumed to be valid.
>>
>> Robin.
>>
>>> On 2022-02-25 10:24 a.m., Robin Murphy wrote:
>>>> [ +arm64 maintainers for their awareness, which would have been a
>>>> good thing to do from the start ]
>>>>
>>>> On 2022-02-25 03:24, Michael Cheng wrote:
>>>>> Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
>>>>> performs a flush by first performing a clean, follow by an
>>>>> invalidation
>>>>> operation.
>>>>>
>>>>> v2 (Michael Cheng): Use correct macro for cleaning and invalidation
>>>>> the
>>>>> dcache. Thanks Tvrtko for the suggestion.
>>>>>
>>>>> v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h
>>>>>
>>>>> v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a
>>>>> symbol that could be use by other modules, thus use
>>>>> caches_clean_inval_pou instead. Also this version
>>>>> removes include for cacheflush, since its already
>>>>> included base on architecture type.
>>>>>
>>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>>> Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/drm_cache.c | 5 +++++
>>>>> 1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
>>>>> index c3e6e615bf09..81c28714f930 100644
>>>>> --- a/drivers/gpu/drm/drm_cache.c
>>>>> +++ b/drivers/gpu/drm/drm_cache.c
>>>>> @@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned
>>>>> long length)
>>>>> if (wbinvd_on_all_cpus())
>>>>> pr_err("Timed out waiting for cache flush\n");
>>>>> +
>>>>> +#elif defined(CONFIG_ARM64)
>>>>> + void *end = addr + length;
>>>>> + caches_clean_inval_pou((unsigned long)addr, (unsigned long)end);
>>>>
>>>> Why does i915 need to ensure the CPU's instruction cache is coherent
>>>> with its data cache? Is it a self-modifying driver?
>>>>
>>>> Robin.
>>>>
>>>> (Note that the above is somewhat of a loaded question, and I do
>>>> actually have half an idea of what you're trying to do here and why
>>>> it won't fly, but I'd like to at least assume you've read the
>>>> documentation of the function you decided was OK to use)
>>>>
>>>>> +
>>>>> #else
>>>>> WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
>>>>> #endif
next prev parent reply other threads:[~2022-03-03 18:07 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-25 3:24 [Intel-gfx] [PATCH v12 0/6] Use drm_clflush* instead of clflush Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range Michael Cheng
2022-02-25 16:28 ` Tvrtko Ursulin
2022-02-25 16:52 ` Michael Cheng
2022-02-25 17:33 ` Tvrtko Ursulin
2022-02-25 17:40 ` Michael Cheng
2022-02-25 18:19 ` Tvrtko Ursulin
2022-02-25 18:23 ` Michael Cheng
2022-02-25 18:42 ` Tvrtko Ursulin
2022-02-25 18:58 ` Matthew Wilcox
2022-02-25 18:24 ` Robin Murphy
2022-02-25 19:27 ` Michael Cheng
2022-03-02 12:49 ` Robin Murphy
2022-03-02 15:55 ` Michael Cheng
2022-03-02 17:06 ` Alex Deucher
2022-03-02 19:10 ` Robin Murphy [this message]
2022-03-07 16:52 ` Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 2/6] drm/i915/gt: Re-work intel_write_status_page Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 3/6] drm/i915/gt: Drop invalidate_csb_entries Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 4/6] drm/i915/gt: Re-work reset_csb Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 5/6] drm/i915/: Re-work clflush_write32 Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 6/6] drm/i915/gt: replace cache_clflush_range Michael Cheng
2022-02-25 7:28 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Use drm_clflush* instead of clflush Patchwork
2022-02-25 7:29 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-02-25 7:59 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-02-26 1:56 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9d09be2d-34f2-212c-70c0-c698a1d8145a@arm.com \
--to=robin.murphy@arm.com \
--cc=catalin.marinas@arm.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=lucas.demarchi@intel.com \
--cc=michael.cheng@intel.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox