public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Michael Cheng <michael.cheng@intel.com>
To: Robin Murphy <robin.murphy@arm.com>, <intel-gfx@lists.freedesktop.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	lucas.demarchi@intel.com, dri-devel@lists.freedesktop.org,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [Intel-gfx] [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
Date: Mon, 7 Mar 2022 08:52:06 -0800	[thread overview]
Message-ID: <9e3f9e41-1086-3636-b8a5-4dd2ca9f38ac@intel.com> (raw)
In-Reply-To: <9d09be2d-34f2-212c-70c0-c698a1d8145a@arm.com>

Ah Thanks for the great feedback!

@Lucas or @Matt, could you please chime in?

Michael Cheng

On 2022-03-02 11:10 a.m., Robin Murphy wrote:
> On 2022-03-02 15:55, Michael Cheng wrote:
>> Thanks for the feedback Robin!
>>
>> Sorry my choices of word weren't that great, but what I meant is to 
>> understand how ARM flushes a range of dcache for device drivers, and 
>> not an equal to x86 clflush.
>>
>> I believe the concern is if the CPU writes an update, that update 
>> might only be sitting in the CPU cache and never make it to device 
>> memory where the device can see it; there are specific places that we 
>> are supposed to flush the CPU caches to make sure our updates are 
>> visible to the hardware.
>
> Ah, OK, if it's more about ordering, and it's actually write buffers 
> rather than caches that you care about flushing, then we might be a 
> lot safer, phew!
>
> For a very simple overview, in a case where the device itself needs to 
> observe memory writes in the correct order, e.g.:
>
>     data_descriptor.valid = 1;
>
>     clflush(&data_descriptor);
>
>     command_descriptor.data = &data_descriptor
>
>     writel(/* control register to read command to then read data */)
>
> then dma_wmb() between the first two writes should be the right tool 
> to ensure that the command does not observe the command update while 
> the data update is still sat somewhere in a CPU write buffer.
>
> If you want a slightly stronger notion that, at a given point, all 
> prior writes have actually been issued and should now be visible 
> (rather than just that they won't become visible in the wrong order 
> whenever they do), then wmb() should suffice on arm64.
>
> Note that wioth arm64 memory types, a Non-Cacheable mapping of DRAM 
> for a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can 
> still be write-buffered, so barriers still matter even when actual 
> cache maintenance ops don't (and as before if you're trying to perform 
> cache maintenance outside the DMA API then you've already lost 
> anyway). MMIO registers should be mapped as Device memory via 
> ioremap(), which is not bufferable, hence the barrier implicit in 
> writel() effectively pushes out any prior buffered writes ahead of a 
> register write, which is why we don't need to worry about this most of 
> the time.
>
> This is only a very rough overview, though, and I'm not familiar 
> enough with x86 semantics, your hardware, or the exact use-case to be 
> able to say whether barriers alone are anywhere near the right answer 
> or not.
>
> Robin.
>
>>
>> +Matt Roper
>>
>> Matt, Lucas, any feed back here?
>>
>> On 2022-03-02 4:49 a.m., Robin Murphy wrote:
>>> On 2022-02-25 19:27, Michael Cheng wrote:
>>>> Hi Robin,
>>>>
>>>> [ +arm64 maintainers for their awareness, which would have been a 
>>>> good thing to do from the start ]
>>>>
>>>>   * Thanks for adding the arm64 maintainer and sorry I didn't rope 
>>>> them
>>>>     in sooner.
>>>>
>>>> Why does i915 need to ensure the CPU's instruction cache is 
>>>> coherent with its data cache? Is it a self-modifying driver?
>>>>
>>>>   * Also thanks for pointing this out. Initially I was using
>>>>     dcache_clean_inval_poc, which seem to be the equivalently to what
>>>>     x86 is doing for dcache flushing, but it was giving me build 
>>>> errors
>>>>     since its not on the global list of kernel symbols. And after
>>>>     revisiting the documentation for caches_clean_inval_pou, it won't
>>>>     fly for what we are trying to do. Moving forward, what would 
>>>> you (or
>>>>     someone in the ARM community) suggest we do? Could it be 
>>>> possible to
>>>>     export dcache_clean_inval_poc as a global symbol?
>>>
>>> Unlikely, unless something with a legitimate need for CPU-centric 
>>> cache maintenance like kexec or CPU hotplug ever becomes modular.
>>>
>>> In the case of a device driver, it's not even the basic issues of 
>>> assuming to find direct equivalents to x86 semantics in other CPU 
>>> architectures, or effectively reinventing parts of the DMA API, it's 
>>> even bigger than that. Once you move from being integrated in a 
>>> single vendor's system architecture to being on a discrete card, you 
>>> fundamentally *no longer have any control over cache coherency*. 
>>> Whether the host CPU architecture happens to be AArch64, RISC-V, or 
>>> whatever doesn't really matter, you're at the mercy of 3rd-party 
>>> PCIe and interconnect IP vendors, and SoC integrators. You'll find 
>>> yourself in systems where PCIe simply cannot snoop any caches, where 
>>> you'd better have the correct DMA API calls in place to have any 
>>> hope of even the most basic functionality working properly; you'll 
>>> find yourself in systems where even if the PCIe root complex claims 
>>> to support No Snoop, your uncached traffic will still end up 
>>> snooping stale data that got prefetched back into caches you thought 
>>> you'd invalidated; you'll find yourself in systems where your memory 
>>> attributes may or may not get forcibly rewritten by an IOMMU 
>>> depending on the kernel config and/or command line.
>>>
>>> It's not about simply finding a substitute for clflush, it's that 
>>> the reasons you have for using clflush in the first place can no 
>>> longer be assumed to be valid.
>>>
>>> Robin.
>>>
>>>> On 2022-02-25 10:24 a.m., Robin Murphy wrote:
>>>>> [ +arm64 maintainers for their awareness, which would have been a 
>>>>> good thing to do from the start ]
>>>>>
>>>>> On 2022-02-25 03:24, Michael Cheng wrote:
>>>>>> Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
>>>>>> performs a flush by first performing a clean, follow by an 
>>>>>> invalidation
>>>>>> operation.
>>>>>>
>>>>>> v2 (Michael Cheng): Use correct macro for cleaning and 
>>>>>> invalidation the
>>>>>>             dcache. Thanks Tvrtko for the suggestion.
>>>>>>
>>>>>> v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h
>>>>>>
>>>>>> v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc 
>>>>>> as a
>>>>>>             symbol that could be use by other modules, thus use
>>>>>>             caches_clean_inval_pou instead. Also this version
>>>>>>                 removes include for cacheflush, since its already
>>>>>>             included base on architecture type.
>>>>>>
>>>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>>>> Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/drm_cache.c | 5 +++++
>>>>>>   1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_cache.c 
>>>>>> b/drivers/gpu/drm/drm_cache.c
>>>>>> index c3e6e615bf09..81c28714f930 100644
>>>>>> --- a/drivers/gpu/drm/drm_cache.c
>>>>>> +++ b/drivers/gpu/drm/drm_cache.c
>>>>>> @@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned 
>>>>>> long length)
>>>>>>         if (wbinvd_on_all_cpus())
>>>>>>           pr_err("Timed out waiting for cache flush\n");
>>>>>> +
>>>>>> +#elif defined(CONFIG_ARM64)
>>>>>> +    void *end = addr + length;
>>>>>> +    caches_clean_inval_pou((unsigned long)addr, (unsigned 
>>>>>> long)end);
>>>>>
>>>>> Why does i915 need to ensure the CPU's instruction cache is 
>>>>> coherent with its data cache? Is it a self-modifying driver?
>>>>>
>>>>> Robin.
>>>>>
>>>>> (Note that the above is somewhat of a loaded question, and I do 
>>>>> actually have half an idea of what you're trying to do here and 
>>>>> why it won't fly, but I'd like to at least assume you've read the 
>>>>> documentation of the function you decided was OK to use)
>>>>>
>>>>>> +
>>>>>>   #else
>>>>>>       WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
>>>>>>   #endif

  reply	other threads:[~2022-03-07 16:53 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-25  3:24 [Intel-gfx] [PATCH v12 0/6] Use drm_clflush* instead of clflush Michael Cheng
2022-02-25  3:24 ` [Intel-gfx] [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range Michael Cheng
2022-02-25 16:28   ` Tvrtko Ursulin
2022-02-25 16:52     ` Michael Cheng
2022-02-25 17:33       ` Tvrtko Ursulin
2022-02-25 17:40         ` Michael Cheng
2022-02-25 18:19           ` Tvrtko Ursulin
2022-02-25 18:23             ` Michael Cheng
2022-02-25 18:42               ` Tvrtko Ursulin
2022-02-25 18:58                 ` Matthew Wilcox
2022-02-25 18:24   ` Robin Murphy
2022-02-25 19:27     ` Michael Cheng
2022-03-02 12:49       ` Robin Murphy
2022-03-02 15:55         ` Michael Cheng
2022-03-02 17:06           ` Alex Deucher
2022-03-02 19:10           ` Robin Murphy
2022-03-07 16:52             ` Michael Cheng [this message]
2022-02-25  3:24 ` [Intel-gfx] [PATCH v12 2/6] drm/i915/gt: Re-work intel_write_status_page Michael Cheng
2022-02-25  3:24 ` [Intel-gfx] [PATCH v12 3/6] drm/i915/gt: Drop invalidate_csb_entries Michael Cheng
2022-02-25  3:24 ` [Intel-gfx] [PATCH v12 4/6] drm/i915/gt: Re-work reset_csb Michael Cheng
2022-02-25  3:24 ` [Intel-gfx] [PATCH v12 5/6] drm/i915/: Re-work clflush_write32 Michael Cheng
2022-02-25  3:24 ` [Intel-gfx] [PATCH v12 6/6] drm/i915/gt: replace cache_clflush_range Michael Cheng
2022-02-25  7:28 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Use drm_clflush* instead of clflush Patchwork
2022-02-25  7:29 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-02-25  7:59 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-02-26  1:56 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e3f9e41-1086-3636-b8a5-4dd2ca9f38ac@intel.com \
    --to=michael.cheng@intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=lucas.demarchi@intel.com \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox