From: Michael Cheng <michael.cheng@intel.com>
To: Robin Murphy <robin.murphy@arm.com>, <intel-gfx@lists.freedesktop.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
lucas.demarchi@intel.com, dri-devel@lists.freedesktop.org,
Will Deacon <will@kernel.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [Intel-gfx] [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
Date: Mon, 7 Mar 2022 08:52:06 -0800 [thread overview]
Message-ID: <9e3f9e41-1086-3636-b8a5-4dd2ca9f38ac@intel.com> (raw)
In-Reply-To: <9d09be2d-34f2-212c-70c0-c698a1d8145a@arm.com>
Ah Thanks for the great feedback!
@Lucas or @Matt, could you please chime in?
Michael Cheng
On 2022-03-02 11:10 a.m., Robin Murphy wrote:
> On 2022-03-02 15:55, Michael Cheng wrote:
>> Thanks for the feedback Robin!
>>
>> Sorry my choices of word weren't that great, but what I meant is to
>> understand how ARM flushes a range of dcache for device drivers, and
>> not an equal to x86 clflush.
>>
>> I believe the concern is if the CPU writes an update, that update
>> might only be sitting in the CPU cache and never make it to device
>> memory where the device can see it; there are specific places that we
>> are supposed to flush the CPU caches to make sure our updates are
>> visible to the hardware.
>
> Ah, OK, if it's more about ordering, and it's actually write buffers
> rather than caches that you care about flushing, then we might be a
> lot safer, phew!
>
> For a very simple overview, in a case where the device itself needs to
> observe memory writes in the correct order, e.g.:
>
> data_descriptor.valid = 1;
>
> clflush(&data_descriptor);
>
> command_descriptor.data = &data_descriptor
>
> writel(/* control register to read command to then read data */)
>
> then dma_wmb() between the first two writes should be the right tool
> to ensure that the command does not observe the command update while
> the data update is still sat somewhere in a CPU write buffer.
>
> If you want a slightly stronger notion that, at a given point, all
> prior writes have actually been issued and should now be visible
> (rather than just that they won't become visible in the wrong order
> whenever they do), then wmb() should suffice on arm64.
>
> Note that wioth arm64 memory types, a Non-Cacheable mapping of DRAM
> for a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can
> still be write-buffered, so barriers still matter even when actual
> cache maintenance ops don't (and as before if you're trying to perform
> cache maintenance outside the DMA API then you've already lost
> anyway). MMIO registers should be mapped as Device memory via
> ioremap(), which is not bufferable, hence the barrier implicit in
> writel() effectively pushes out any prior buffered writes ahead of a
> register write, which is why we don't need to worry about this most of
> the time.
>
> This is only a very rough overview, though, and I'm not familiar
> enough with x86 semantics, your hardware, or the exact use-case to be
> able to say whether barriers alone are anywhere near the right answer
> or not.
>
> Robin.
>
>>
>> +Matt Roper
>>
>> Matt, Lucas, any feed back here?
>>
>> On 2022-03-02 4:49 a.m., Robin Murphy wrote:
>>> On 2022-02-25 19:27, Michael Cheng wrote:
>>>> Hi Robin,
>>>>
>>>> [ +arm64 maintainers for their awareness, which would have been a
>>>> good thing to do from the start ]
>>>>
>>>> * Thanks for adding the arm64 maintainer and sorry I didn't rope
>>>> them
>>>> in sooner.
>>>>
>>>> Why does i915 need to ensure the CPU's instruction cache is
>>>> coherent with its data cache? Is it a self-modifying driver?
>>>>
>>>> * Also thanks for pointing this out. Initially I was using
>>>> dcache_clean_inval_poc, which seem to be the equivalently to what
>>>> x86 is doing for dcache flushing, but it was giving me build
>>>> errors
>>>> since its not on the global list of kernel symbols. And after
>>>> revisiting the documentation for caches_clean_inval_pou, it won't
>>>> fly for what we are trying to do. Moving forward, what would
>>>> you (or
>>>> someone in the ARM community) suggest we do? Could it be
>>>> possible to
>>>> export dcache_clean_inval_poc as a global symbol?
>>>
>>> Unlikely, unless something with a legitimate need for CPU-centric
>>> cache maintenance like kexec or CPU hotplug ever becomes modular.
>>>
>>> In the case of a device driver, it's not even the basic issues of
>>> assuming to find direct equivalents to x86 semantics in other CPU
>>> architectures, or effectively reinventing parts of the DMA API, it's
>>> even bigger than that. Once you move from being integrated in a
>>> single vendor's system architecture to being on a discrete card, you
>>> fundamentally *no longer have any control over cache coherency*.
>>> Whether the host CPU architecture happens to be AArch64, RISC-V, or
>>> whatever doesn't really matter, you're at the mercy of 3rd-party
>>> PCIe and interconnect IP vendors, and SoC integrators. You'll find
>>> yourself in systems where PCIe simply cannot snoop any caches, where
>>> you'd better have the correct DMA API calls in place to have any
>>> hope of even the most basic functionality working properly; you'll
>>> find yourself in systems where even if the PCIe root complex claims
>>> to support No Snoop, your uncached traffic will still end up
>>> snooping stale data that got prefetched back into caches you thought
>>> you'd invalidated; you'll find yourself in systems where your memory
>>> attributes may or may not get forcibly rewritten by an IOMMU
>>> depending on the kernel config and/or command line.
>>>
>>> It's not about simply finding a substitute for clflush, it's that
>>> the reasons you have for using clflush in the first place can no
>>> longer be assumed to be valid.
>>>
>>> Robin.
>>>
>>>> On 2022-02-25 10:24 a.m., Robin Murphy wrote:
>>>>> [ +arm64 maintainers for their awareness, which would have been a
>>>>> good thing to do from the start ]
>>>>>
>>>>> On 2022-02-25 03:24, Michael Cheng wrote:
>>>>>> Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
>>>>>> performs a flush by first performing a clean, follow by an
>>>>>> invalidation
>>>>>> operation.
>>>>>>
>>>>>> v2 (Michael Cheng): Use correct macro for cleaning and
>>>>>> invalidation the
>>>>>> dcache. Thanks Tvrtko for the suggestion.
>>>>>>
>>>>>> v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h
>>>>>>
>>>>>> v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc
>>>>>> as a
>>>>>> symbol that could be use by other modules, thus use
>>>>>> caches_clean_inval_pou instead. Also this version
>>>>>> removes include for cacheflush, since its already
>>>>>> included base on architecture type.
>>>>>>
>>>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>>>> Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/drm_cache.c | 5 +++++
>>>>>> 1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_cache.c
>>>>>> b/drivers/gpu/drm/drm_cache.c
>>>>>> index c3e6e615bf09..81c28714f930 100644
>>>>>> --- a/drivers/gpu/drm/drm_cache.c
>>>>>> +++ b/drivers/gpu/drm/drm_cache.c
>>>>>> @@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned
>>>>>> long length)
>>>>>> if (wbinvd_on_all_cpus())
>>>>>> pr_err("Timed out waiting for cache flush\n");
>>>>>> +
>>>>>> +#elif defined(CONFIG_ARM64)
>>>>>> + void *end = addr + length;
>>>>>> + caches_clean_inval_pou((unsigned long)addr, (unsigned
>>>>>> long)end);
>>>>>
>>>>> Why does i915 need to ensure the CPU's instruction cache is
>>>>> coherent with its data cache? Is it a self-modifying driver?
>>>>>
>>>>> Robin.
>>>>>
>>>>> (Note that the above is somewhat of a loaded question, and I do
>>>>> actually have half an idea of what you're trying to do here and
>>>>> why it won't fly, but I'd like to at least assume you've read the
>>>>> documentation of the function you decided was OK to use)
>>>>>
>>>>>> +
>>>>>> #else
>>>>>> WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
>>>>>> #endif
next prev parent reply other threads:[~2022-03-07 16:53 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-25 3:24 [Intel-gfx] [PATCH v12 0/6] Use drm_clflush* instead of clflush Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range Michael Cheng
2022-02-25 16:28 ` Tvrtko Ursulin
2022-02-25 16:52 ` Michael Cheng
2022-02-25 17:33 ` Tvrtko Ursulin
2022-02-25 17:40 ` Michael Cheng
2022-02-25 18:19 ` Tvrtko Ursulin
2022-02-25 18:23 ` Michael Cheng
2022-02-25 18:42 ` Tvrtko Ursulin
2022-02-25 18:58 ` Matthew Wilcox
2022-02-25 18:24 ` Robin Murphy
2022-02-25 19:27 ` Michael Cheng
2022-03-02 12:49 ` Robin Murphy
2022-03-02 15:55 ` Michael Cheng
2022-03-02 17:06 ` Alex Deucher
2022-03-02 19:10 ` Robin Murphy
2022-03-07 16:52 ` Michael Cheng [this message]
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 2/6] drm/i915/gt: Re-work intel_write_status_page Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 3/6] drm/i915/gt: Drop invalidate_csb_entries Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 4/6] drm/i915/gt: Re-work reset_csb Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 5/6] drm/i915/: Re-work clflush_write32 Michael Cheng
2022-02-25 3:24 ` [Intel-gfx] [PATCH v12 6/6] drm/i915/gt: replace cache_clflush_range Michael Cheng
2022-02-25 7:28 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Use drm_clflush* instead of clflush Patchwork
2022-02-25 7:29 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-02-25 7:59 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-02-26 1:56 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9e3f9e41-1086-3636-b8a5-4dd2ca9f38ac@intel.com \
--to=michael.cheng@intel.com \
--cc=catalin.marinas@arm.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=lucas.demarchi@intel.com \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox