* [PATCH 00/21] dma-mapping: unify support for cache flushes
@ 2023-03-27 12:12 ` Arnd Bergmann
From: Arnd Bergmann @ 2023-03-27 12:12 UTC
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
After a long discussion about adding SoC-specific semantics for when
to flush caches in drivers/soc/ drivers, an approach that we determined
to be fundamentally flawed[1], I volunteered to try to move that logic
into architecture-independent code and make all existing architectures
do the same thing.
As we had determined earlier, the behavior is wildly different across
architectures, but most of the differences come down to either bugs
(when required flushes are missing) or extra flushes that are harmless
but might hurt performance.
I finally found the time to come up with an implementation of this, which
starts by replacing every outlier with one of the three common options:
1. architectures without speculative prefetching (hexagon, m68k,
openrisc, sh, sparc, and certain armv4 and xtensa implementations)
only flush their caches before a DMA, by cleaning write-back caches
(if any) before a DMA to the device, and by invalidating the caches
before a DMA from the device.
2. arc, microblaze, mips, nios2, sh and later xtensa now follow the
normal 32-bit arm model and invalidate their write-back caches
again after a DMA from the device, to remove stale cache lines
that got prefetched during the DMA. arc, csky and mips used to
also invalidate buffers before a bidirectional DMA, but this
is now skipped whenever we know the buffer gets invalidated again
after the DMA.
3. parisc, powerpc and riscv already flushed buffers before
a DMA_FROM_DEVICE, and these are moved to the arm64 behavior,
which writes back before and invalidates after both
DMA_FROM_DEVICE and DMA_BIDIRECTIONAL, in order to avoid
accidentally leaking stale data if the DMA does not actually
happen[2].
The last patch in the series replaces the architecture-specific code
with a shared version that implements all three options based on
architecture-specific parameters that are almost always determined at compile time.
The difference between cases 1 and 2 is hardware-specific, while between
2 and 3 we need to decide which semantics we want; I explicitly
avoid that question in this series and leave it to be decided later.
Another difference that I do not address here is what cache invalidation
does for partial cache lines. On arm32, arm64 and powerpc, a partial
cache line always gets written back before invalidation in order to
ensure that data before or after the buffer is not discarded. On all
other architectures, the assumption is that cache lines are never shared
between a DMA buffer and data that is accessed by the CPU. If we end up
always writing back dirty cache lines before a DMA (option 3 above),
then this point becomes moot; otherwise we should probably address this
in a follow-up series to document one behavior or the other and implement
it consistently.
Please review!
Arnd
[1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/
[2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
Arnd Bergmann (21):
openrisc: dma-mapping: flush bidirectional mappings
xtensa: dma-mapping: use normal cache invalidation rules
sparc32: flush caches in dma_sync_*for_device
microblaze: dma-mapping: skip extra DMA flushes
powerpc: dma-mapping: split out cache operation logic
powerpc: dma-mapping: minimize for_cpu flushing
powerpc: dma-mapping: always clean cache in _for_device() op
riscv: dma-mapping: only invalidate after DMA, not flush
riscv: dma-mapping: skip invalidation before bidirectional DMA
csky: dma-mapping: skip invalidating before DMA from device
mips: dma-mapping: skip invalidating before bidirectional DMA
mips: dma-mapping: split out cache operation logic
arc: dma-mapping: skip invalidating before bidirectional DMA
parisc: dma-mapping: use regular flush/invalidate ops
ARM: dma-mapping: always invalidate WT caches before DMA
ARM: dma-mapping: bring back dmac_{clean,inv}_range
ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
ARM: drop SMP support for ARM11MPCore
ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
ARM: dma-mapping: split out arch_dma_mark_clean() helper
dma-mapping: replace custom code with generic implementation
arch/arc/mm/dma.c | 66 ++------
arch/arm/Kconfig | 4 +
arch/arm/include/asm/cacheflush.h | 21 +++
arch/arm/include/asm/glue-cache.h | 4 +
arch/arm/mach-oxnas/Kconfig | 4 -
arch/arm/mach-oxnas/Makefile | 1 -
arch/arm/mach-oxnas/headsmp.S | 23 ---
arch/arm/mach-oxnas/platsmp.c | 96 -----------
arch/arm/mach-versatile/platsmp-realview.c | 4 -
arch/arm/mm/Kconfig | 19 ---
arch/arm/mm/cache-fa.S | 4 +-
arch/arm/mm/cache-nop.S | 6 +
arch/arm/mm/cache-v4.S | 13 +-
arch/arm/mm/cache-v4wb.S | 4 +-
arch/arm/mm/cache-v4wt.S | 22 ++-
arch/arm/mm/cache-v6.S | 35 +---
arch/arm/mm/cache-v7.S | 6 +-
arch/arm/mm/cache-v7m.S | 4 +-
arch/arm/mm/dma-mapping-nommu.c | 36 ++--
arch/arm/mm/dma-mapping.c | 181 ++++++++++-----------
arch/arm/mm/proc-arm1020.S | 4 +-
arch/arm/mm/proc-arm1020e.S | 4 +-
arch/arm/mm/proc-arm1022.S | 4 +-
arch/arm/mm/proc-arm1026.S | 4 +-
arch/arm/mm/proc-arm920.S | 4 +-
arch/arm/mm/proc-arm922.S | 4 +-
arch/arm/mm/proc-arm925.S | 4 +-
arch/arm/mm/proc-arm926.S | 4 +-
arch/arm/mm/proc-arm940.S | 4 +-
arch/arm/mm/proc-arm946.S | 4 +-
arch/arm/mm/proc-feroceon.S | 8 +-
arch/arm/mm/proc-macros.S | 2 +
arch/arm/mm/proc-mohawk.S | 4 +-
arch/arm/mm/proc-xsc3.S | 4 +-
arch/arm/mm/proc-xscale.S | 6 +-
arch/arm64/mm/dma-mapping.c | 28 ++--
arch/csky/mm/dma-mapping.c | 46 +++---
arch/hexagon/kernel/dma.c | 44 ++---
arch/m68k/kernel/dma.c | 43 +++--
arch/microblaze/kernel/dma.c | 38 ++---
arch/mips/mm/dma-noncoherent.c | 75 +++------
arch/nios2/mm/dma-mapping.c | 57 +++----
arch/openrisc/kernel/dma.c | 62 ++++---
arch/parisc/include/asm/cacheflush.h | 6 +-
arch/parisc/kernel/pci-dma.c | 33 +++-
arch/powerpc/mm/dma-noncoherent.c | 76 +++++----
arch/riscv/mm/dma-noncoherent.c | 51 +++---
arch/sh/kernel/dma-coherent.c | 43 +++--
arch/sparc/Kconfig | 2 +-
arch/sparc/kernel/ioport.c | 38 +++--
arch/xtensa/Kconfig | 1 -
arch/xtensa/include/asm/cacheflush.h | 6 +-
arch/xtensa/kernel/pci-dma.c | 47 +++---
include/linux/dma-sync.h | 107 ++++++++++++
54 files changed, 721 insertions(+), 699 deletions(-)
delete mode 100644 arch/arm/mach-oxnas/headsmp.S
delete mode 100644 arch/arm/mach-oxnas/platsmp.c
create mode 100644 include/linux/dma-sync.h
--
2.39.2
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-oxnas@groups.io
Cc: linux-csky@vger.kernel.org
Cc: linux-hexagon@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-openrisc@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
^ permalink raw reply [flat|nested] 456+ messages in thread* [PATCH 00/21] dma-mapping: unify support for cache flushes @ 2023-03-27 12:12 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> After a long discussion about adding SoC specific semantics for when to flush caches in drivers/soc/ drivers that we determined to be fundamentally flawed[1], I volunteered to try to move that logic into architecture-independent code and make all existing architectures do the same thing. As we had determined earlier, the behavior is wildly different across architectures, but most of the differences come down to either bugs (when required flushes are missing) or extra flushes that are harmless but might hurt performance. I finally found the time to come up with an implementation of this, which starts by replacing every outlier with one of the three common options: 1. architectures without speculative prefetching (hegagon, m68k, openrisc, sh, sparc, and certain armv4 and xtensa implementations) only flush their caches before a DMA, by cleaning write-back caches (if any) before a DMA to the device, and by invalidating the caches before a DMA from a device 2. 
arc, microblaze, mips, nios2, sh and later xtensa now follow the normal 32-bit arm model and invalidate their writeback caches again after a DMA from the device, to remove stale cache lines that got prefetched during the DMA. arc, csky and mips used to invalidate buffers also before the bidirectional DMA, but this is now skipped whenever we know it gets invalidated again after the DMA. 3. parisc, powerpc and riscv already flushed buffers before a DMA_FROM_DEVICE, and these get moved to the arm64 behavior that does the writeback before and invalidate after both DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the problem of accidentally leaking stale data if the DMA does not actually happen[2]. The last patch in the series replaces the architecture specific code with a shared version that implements all three based on architecture specific parameters that are almost always determined at compile time. The difference between cases 1. and 2. is hardware specific, while between 2. and 3. we need to decide which semantics we want, but I explicitly avoid this question in my series and leave it to be decided later. Another difference that I do not address here is what cache invalidation does for partical cache lines. On arm32, arm64 and powerpc, a partial cache line always gets written back before invalidation in order to ensure that data before or after the buffer is not discarded. On all other architectures, the assumption is cache lines are never shared between DMA buffer and data that is accessed by the CPU. If we end up always writing back dirty cache lines before a DMA (option 3 above), then this point becomes moot, otherwise we should probably address this in a follow-up series to document one behavior or the other and implement it consistently. Please review! 
Arnd [1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ [2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ Arnd Bergmann (21): openrisc: dma-mapping: flush bidirectional mappings xtensa: dma-mapping: use normal cache invalidation rules sparc32: flush caches in dma_sync_*for_device microblaze: dma-mapping: skip extra DMA flushes powerpc: dma-mapping: split out cache operation logic powerpc: dma-mapping: minimize for_cpu flushing powerpc: dma-mapping: always clean cache in _for_device() op riscv: dma-mapping: only invalidate after DMA, not flush riscv: dma-mapping: skip invalidation before bidirectional DMA csky: dma-mapping: skip invalidating before DMA from device mips: dma-mapping: skip invalidating before bidirectional DMA mips: dma-mapping: split out cache operation logic arc: dma-mapping: skip invalidating before bidirectional DMA parisc: dma-mapping: use regular flush/invalidate ops ARM: dma-mapping: always invalidate WT caches before DMA ARM: dma-mapping: bring back dmac_{clean,inv}_range ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally ARM: drop SMP support for ARM11MPCore ARM: dma-mapping: use generic form of arch_sync_dma_* helpers ARM: dma-mapping: split out arch_dma_mark_clean() helper dma-mapping: replace custom code with generic implementation arch/arc/mm/dma.c | 66 ++------ arch/arm/Kconfig | 4 + arch/arm/include/asm/cacheflush.h | 21 +++ arch/arm/include/asm/glue-cache.h | 4 + arch/arm/mach-oxnas/Kconfig | 4 - arch/arm/mach-oxnas/Makefile | 1 - arch/arm/mach-oxnas/headsmp.S | 23 --- arch/arm/mach-oxnas/platsmp.c | 96 ----------- arch/arm/mach-versatile/platsmp-realview.c | 4 - arch/arm/mm/Kconfig | 19 --- arch/arm/mm/cache-fa.S | 4 +- arch/arm/mm/cache-nop.S | 6 + arch/arm/mm/cache-v4.S | 13 +- arch/arm/mm/cache-v4wb.S | 4 +- arch/arm/mm/cache-v4wt.S | 22 ++- arch/arm/mm/cache-v6.S | 35 +--- arch/arm/mm/cache-v7.S | 6 +- arch/arm/mm/cache-v7m.S | 4 +- 
arch/arm/mm/dma-mapping-nommu.c | 36 ++-- arch/arm/mm/dma-mapping.c | 181 ++++++++++----------- arch/arm/mm/proc-arm1020.S | 4 +- arch/arm/mm/proc-arm1020e.S | 4 +- arch/arm/mm/proc-arm1022.S | 4 +- arch/arm/mm/proc-arm1026.S | 4 +- arch/arm/mm/proc-arm920.S | 4 +- arch/arm/mm/proc-arm922.S | 4 +- arch/arm/mm/proc-arm925.S | 4 +- arch/arm/mm/proc-arm926.S | 4 +- arch/arm/mm/proc-arm940.S | 4 +- arch/arm/mm/proc-arm946.S | 4 +- arch/arm/mm/proc-feroceon.S | 8 +- arch/arm/mm/proc-macros.S | 2 + arch/arm/mm/proc-mohawk.S | 4 +- arch/arm/mm/proc-xsc3.S | 4 +- arch/arm/mm/proc-xscale.S | 6 +- arch/arm64/mm/dma-mapping.c | 28 ++-- arch/csky/mm/dma-mapping.c | 46 +++--- arch/hexagon/kernel/dma.c | 44 ++--- arch/m68k/kernel/dma.c | 43 +++-- arch/microblaze/kernel/dma.c | 38 ++--- arch/mips/mm/dma-noncoherent.c | 75 +++------ arch/nios2/mm/dma-mapping.c | 57 +++---- arch/openrisc/kernel/dma.c | 62 ++++--- arch/parisc/include/asm/cacheflush.h | 6 +- arch/parisc/kernel/pci-dma.c | 33 +++- arch/powerpc/mm/dma-noncoherent.c | 76 +++++---- arch/riscv/mm/dma-noncoherent.c | 51 +++--- arch/sh/kernel/dma-coherent.c | 43 +++-- arch/sparc/Kconfig | 2 +- arch/sparc/kernel/ioport.c | 38 +++-- arch/xtensa/Kconfig | 1 - arch/xtensa/include/asm/cacheflush.h | 6 +- arch/xtensa/kernel/pci-dma.c | 47 +++--- include/linux/dma-sync.h | 107 ++++++++++++ 54 files changed, 721 insertions(+), 699 deletions(-) delete mode 100644 arch/arm/mach-oxnas/headsmp.S delete mode 100644 arch/arm/mach-oxnas/platsmp.c create mode 100644 include/linux/dma-sync.h -- 2.39.2 Cc: Vineet Gupta <vgupta@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Neil Armstrong <neil.armstrong@linaro.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@quicinc.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer 
<tsbogend@alpha.franken.de> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Cc: Conor Dooley <conor.dooley@microchip.com> Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-oxnas@groups.io Cc: linux-csky@vger.kernel.org Cc: linux-hexagon@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-openrisc@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* [PATCH 00/21] dma-mapping: unify support for cache flushes @ 2023-03-27 12:12 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw) To: linux-kernel Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Sime k, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy, David S. Miller From: Arnd Bergmann <arnd@arndb.de> After a long discussion about adding SoC specific semantics for when to flush caches in drivers/soc/ drivers that we determined to be fundamentally flawed[1], I volunteered to try to move that logic into architecture-independent code and make all existing architectures do the same thing. As we had determined earlier, the behavior is wildly different across architectures, but most of the differences come down to either bugs (when required flushes are missing) or extra flushes that are harmless but might hurt performance. I finally found the time to come up with an implementation of this, which starts by replacing every outlier with one of the three common options: 1. architectures without speculative prefetching (hegagon, m68k, openrisc, sh, sparc, and certain armv4 and xtensa implementations) only flush their caches before a DMA, by cleaning write-back caches (if any) before a DMA to the device, and by invalidating the caches before a DMA from a device 2. 
arc, microblaze, mips, nios2, sh and later xtensa now follow the normal 32-bit arm model and invalidate their writeback caches again after a DMA from the device, to remove stale cache lines that got prefetched during the DMA. arc, csky and mips used to invalidate buffers also before the bidirectional DMA, but this is now skipped whenever we know it gets invalidated again after the DMA. 3. parisc, powerpc and riscv already flushed buffers before a DMA_FROM_DEVICE, and these get moved to the arm64 behavior that does the writeback before and invalidate after both DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the problem of accidentally leaking stale data if the DMA does not actually happen[2]. The last patch in the series replaces the architecture specific code with a shared version that implements all three based on architecture specific parameters that are almost always determined at compile time. The difference between cases 1. and 2. is hardware specific, while between 2. and 3. we need to decide which semantics we want, but I explicitly avoid this question in my series and leave it to be decided later. Another difference that I do not address here is what cache invalidation does for partical cache lines. On arm32, arm64 and powerpc, a partial cache line always gets written back before invalidation in order to ensure that data before or after the buffer is not discarded. On all other architectures, the assumption is cache lines are never shared between DMA buffer and data that is accessed by the CPU. If we end up always writing back dirty cache lines before a DMA (option 3 above), then this point becomes moot, otherwise we should probably address this in a follow-up series to document one behavior or the other and implement it consistently. Please review! 
Arnd [1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ [2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ Arnd Bergmann (21): openrisc: dma-mapping: flush bidirectional mappings xtensa: dma-mapping: use normal cache invalidation rules sparc32: flush caches in dma_sync_*for_device microblaze: dma-mapping: skip extra DMA flushes powerpc: dma-mapping: split out cache operation logic powerpc: dma-mapping: minimize for_cpu flushing powerpc: dma-mapping: always clean cache in _for_device() op riscv: dma-mapping: only invalidate after DMA, not flush riscv: dma-mapping: skip invalidation before bidirectional DMA csky: dma-mapping: skip invalidating before DMA from device mips: dma-mapping: skip invalidating before bidirectional DMA mips: dma-mapping: split out cache operation logic arc: dma-mapping: skip invalidating before bidirectional DMA parisc: dma-mapping: use regular flush/invalidate ops ARM: dma-mapping: always invalidate WT caches before DMA ARM: dma-mapping: bring back dmac_{clean,inv}_range ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally ARM: drop SMP support for ARM11MPCore ARM: dma-mapping: use generic form of arch_sync_dma_* helpers ARM: dma-mapping: split out arch_dma_mark_clean() helper dma-mapping: replace custom code with generic implementation arch/arc/mm/dma.c | 66 ++------ arch/arm/Kconfig | 4 + arch/arm/include/asm/cacheflush.h | 21 +++ arch/arm/include/asm/glue-cache.h | 4 + arch/arm/mach-oxnas/Kconfig | 4 - arch/arm/mach-oxnas/Makefile | 1 - arch/arm/mach-oxnas/headsmp.S | 23 --- arch/arm/mach-oxnas/platsmp.c | 96 ----------- arch/arm/mach-versatile/platsmp-realview.c | 4 - arch/arm/mm/Kconfig | 19 --- arch/arm/mm/cache-fa.S | 4 +- arch/arm/mm/cache-nop.S | 6 + arch/arm/mm/cache-v4.S | 13 +- arch/arm/mm/cache-v4wb.S | 4 +- arch/arm/mm/cache-v4wt.S | 22 ++- arch/arm/mm/cache-v6.S | 35 +--- arch/arm/mm/cache-v7.S | 6 +- arch/arm/mm/cache-v7m.S | 4 +- 
arch/arm/mm/dma-mapping-nommu.c | 36 ++-- arch/arm/mm/dma-mapping.c | 181 ++++++++++----------- arch/arm/mm/proc-arm1020.S | 4 +- arch/arm/mm/proc-arm1020e.S | 4 +- arch/arm/mm/proc-arm1022.S | 4 +- arch/arm/mm/proc-arm1026.S | 4 +- arch/arm/mm/proc-arm920.S | 4 +- arch/arm/mm/proc-arm922.S | 4 +- arch/arm/mm/proc-arm925.S | 4 +- arch/arm/mm/proc-arm926.S | 4 +- arch/arm/mm/proc-arm940.S | 4 +- arch/arm/mm/proc-arm946.S | 4 +- arch/arm/mm/proc-feroceon.S | 8 +- arch/arm/mm/proc-macros.S | 2 + arch/arm/mm/proc-mohawk.S | 4 +- arch/arm/mm/proc-xsc3.S | 4 +- arch/arm/mm/proc-xscale.S | 6 +- arch/arm64/mm/dma-mapping.c | 28 ++-- arch/csky/mm/dma-mapping.c | 46 +++--- arch/hexagon/kernel/dma.c | 44 ++--- arch/m68k/kernel/dma.c | 43 +++-- arch/microblaze/kernel/dma.c | 38 ++--- arch/mips/mm/dma-noncoherent.c | 75 +++------ arch/nios2/mm/dma-mapping.c | 57 +++---- arch/openrisc/kernel/dma.c | 62 ++++--- arch/parisc/include/asm/cacheflush.h | 6 +- arch/parisc/kernel/pci-dma.c | 33 +++- arch/powerpc/mm/dma-noncoherent.c | 76 +++++---- arch/riscv/mm/dma-noncoherent.c | 51 +++--- arch/sh/kernel/dma-coherent.c | 43 +++-- arch/sparc/Kconfig | 2 +- arch/sparc/kernel/ioport.c | 38 +++-- arch/xtensa/Kconfig | 1 - arch/xtensa/include/asm/cacheflush.h | 6 +- arch/xtensa/kernel/pci-dma.c | 47 +++--- include/linux/dma-sync.h | 107 ++++++++++++ 54 files changed, 721 insertions(+), 699 deletions(-) delete mode 100644 arch/arm/mach-oxnas/headsmp.S delete mode 100644 arch/arm/mach-oxnas/platsmp.c create mode 100644 include/linux/dma-sync.h -- 2.39.2 Cc: Vineet Gupta <vgupta@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Neil Armstrong <neil.armstrong@linaro.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@quicinc.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer 
<tsbogend@alpha.franken.de> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Cc: Conor Dooley <conor.dooley@microchip.com> Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-oxnas@groups.io Cc: linux-csky@vger.kernel.org Cc: linux-hexagon@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-openrisc@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org ^ permalink raw reply [flat|nested] 456+ messages in thread
* [PATCH 00/21] dma-mapping: unify support for cache flushes @ 2023-03-27 12:12 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> After a long discussion about adding SoC specific semantics for when to flush caches in drivers/soc/ drivers that we determined to be fundamentally flawed[1], I volunteered to try to move that logic into architecture-independent code and make all existing architectures do the same thing. As we had determined earlier, the behavior is wildly different across architectures, but most of the differences come down to either bugs (when required flushes are missing) or extra flushes that are harmless but might hurt performance. I finally found the time to come up with an implementation of this, which starts by replacing every outlier with one of the three common options: 1. architectures without speculative prefetching (hegagon, m68k, openrisc, sh, sparc, and certain armv4 and xtensa implementations) only flush their caches before a DMA, by cleaning write-back caches (if any) before a DMA to the device, and by invalidating the caches before a DMA from a device 2. 
arc, microblaze, mips, nios2, sh and later xtensa now follow the normal 32-bit arm model and invalidate their writeback caches again after a DMA from the device, to remove stale cache lines that got prefetched during the DMA. arc, csky and mips used to invalidate buffers also before the bidirectional DMA, but this is now skipped whenever we know it gets invalidated again after the DMA. 3. parisc, powerpc and riscv already flushed buffers before a DMA_FROM_DEVICE, and these get moved to the arm64 behavior that does the writeback before and invalidate after both DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the problem of accidentally leaking stale data if the DMA does not actually happen[2]. The last patch in the series replaces the architecture specific code with a shared version that implements all three based on architecture specific parameters that are almost always determined at compile time. The difference between cases 1. and 2. is hardware specific, while between 2. and 3. we need to decide which semantics we want, but I explicitly avoid this question in my series and leave it to be decided later. Another difference that I do not address here is what cache invalidation does for partical cache lines. On arm32, arm64 and powerpc, a partial cache line always gets written back before invalidation in order to ensure that data before or after the buffer is not discarded. On all other architectures, the assumption is cache lines are never shared between DMA buffer and data that is accessed by the CPU. If we end up always writing back dirty cache lines before a DMA (option 3 above), then this point becomes moot, otherwise we should probably address this in a follow-up series to document one behavior or the other and implement it consistently. Please review! 
Arnd [1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ [2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ Arnd Bergmann (21): openrisc: dma-mapping: flush bidirectional mappings xtensa: dma-mapping: use normal cache invalidation rules sparc32: flush caches in dma_sync_*for_device microblaze: dma-mapping: skip extra DMA flushes powerpc: dma-mapping: split out cache operation logic powerpc: dma-mapping: minimize for_cpu flushing powerpc: dma-mapping: always clean cache in _for_device() op riscv: dma-mapping: only invalidate after DMA, not flush riscv: dma-mapping: skip invalidation before bidirectional DMA csky: dma-mapping: skip invalidating before DMA from device mips: dma-mapping: skip invalidating before bidirectional DMA mips: dma-mapping: split out cache operation logic arc: dma-mapping: skip invalidating before bidirectional DMA parisc: dma-mapping: use regular flush/invalidate ops ARM: dma-mapping: always invalidate WT caches before DMA ARM: dma-mapping: bring back dmac_{clean,inv}_range ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally ARM: drop SMP support for ARM11MPCore ARM: dma-mapping: use generic form of arch_sync_dma_* helpers ARM: dma-mapping: split out arch_dma_mark_clean() helper dma-mapping: replace custom code with generic implementation arch/arc/mm/dma.c | 66 ++------ arch/arm/Kconfig | 4 + arch/arm/include/asm/cacheflush.h | 21 +++ arch/arm/include/asm/glue-cache.h | 4 + arch/arm/mach-oxnas/Kconfig | 4 - arch/arm/mach-oxnas/Makefile | 1 - arch/arm/mach-oxnas/headsmp.S | 23 --- arch/arm/mach-oxnas/platsmp.c | 96 ----------- arch/arm/mach-versatile/platsmp-realview.c | 4 - arch/arm/mm/Kconfig | 19 --- arch/arm/mm/cache-fa.S | 4 +- arch/arm/mm/cache-nop.S | 6 + arch/arm/mm/cache-v4.S | 13 +- arch/arm/mm/cache-v4wb.S | 4 +- arch/arm/mm/cache-v4wt.S | 22 ++- arch/arm/mm/cache-v6.S | 35 +--- arch/arm/mm/cache-v7.S | 6 +- arch/arm/mm/cache-v7m.S | 4 +- 
 arch/arm/mm/dma-mapping-nommu.c            |  36 ++--
 arch/arm/mm/dma-mapping.c                  | 181 ++++++++++-----------
 arch/arm/mm/proc-arm1020.S                 |   4 +-
 arch/arm/mm/proc-arm1020e.S                |   4 +-
 arch/arm/mm/proc-arm1022.S                 |   4 +-
 arch/arm/mm/proc-arm1026.S                 |   4 +-
 arch/arm/mm/proc-arm920.S                  |   4 +-
 arch/arm/mm/proc-arm922.S                  |   4 +-
 arch/arm/mm/proc-arm925.S                  |   4 +-
 arch/arm/mm/proc-arm926.S                  |   4 +-
 arch/arm/mm/proc-arm940.S                  |   4 +-
 arch/arm/mm/proc-arm946.S                  |   4 +-
 arch/arm/mm/proc-feroceon.S                |   8 +-
 arch/arm/mm/proc-macros.S                  |   2 +
 arch/arm/mm/proc-mohawk.S                  |   4 +-
 arch/arm/mm/proc-xsc3.S                    |   4 +-
 arch/arm/mm/proc-xscale.S                  |   6 +-
 arch/arm64/mm/dma-mapping.c                |  28 ++--
 arch/csky/mm/dma-mapping.c                 |  46 +++---
 arch/hexagon/kernel/dma.c                  |  44 ++---
 arch/m68k/kernel/dma.c                     |  43 +++--
 arch/microblaze/kernel/dma.c               |  38 ++---
 arch/mips/mm/dma-noncoherent.c             |  75 +++------
 arch/nios2/mm/dma-mapping.c                |  57 +++----
 arch/openrisc/kernel/dma.c                 |  62 ++++---
 arch/parisc/include/asm/cacheflush.h       |   6 +-
 arch/parisc/kernel/pci-dma.c               |  33 +++-
 arch/powerpc/mm/dma-noncoherent.c          |  76 +++++----
 arch/riscv/mm/dma-noncoherent.c            |  51 +++---
 arch/sh/kernel/dma-coherent.c              |  43 +++--
 arch/sparc/Kconfig                         |   2 +-
 arch/sparc/kernel/ioport.c                 |  38 +++--
 arch/xtensa/Kconfig                        |   1 -
 arch/xtensa/include/asm/cacheflush.h       |   6 +-
 arch/xtensa/kernel/pci-dma.c               |  47 +++---
 include/linux/dma-sync.h                   | 107 ++++++++++++
 54 files changed, 721 insertions(+), 699 deletions(-)
 delete mode 100644 arch/arm/mach-oxnas/headsmp.S
 delete mode 100644 arch/arm/mach-oxnas/platsmp.c
 create mode 100644 include/linux/dma-sync.h

-- 
2.39.2

Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-oxnas@groups.io
Cc: linux-csky@vger.kernel.org
Cc: linux-hexagon@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-openrisc@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org

_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

^ permalink raw reply	[flat|nested] 456+ messages in thread
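The three options in the cover letter can be read as a table of cache operations before the DMA (the for_device step) and after it (the for_cpu step). The following C sketch encodes that reading of the text. Every name in it (`ops_before_dma()`, `ops_after_dma()`, `print_policy()`, the `OP_*` flags and both enums) is hypothetical and does not reflect the actual helpers in include/linux/dma-sync.h. It also assumes that option 1 flushes (writes back and invalidates) bidirectional buffers before the DMA, as the openrisc patch in this series does.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative model only: all names here are hypothetical. */
enum model_dir { M_TO_DEVICE, M_FROM_DEVICE, M_BIDIRECTIONAL };

enum model_policy {
	POLICY_NO_PREFETCH = 1,	/* option 1: maintenance only before DMA */
	POLICY_ARM32 = 2,	/* option 2: also invalidate after DMA */
	POLICY_ARM64 = 3,	/* option 3: clean before, invalidate after */
};

#define OP_NONE  0u
#define OP_CLEAN 1u			/* write back dirty lines */
#define OP_INV   2u			/* discard lines */
#define OP_FLUSH (OP_CLEAN | OP_INV)	/* write back, then discard */

/* Cache maintenance before the device touches the buffer (for_device). */
unsigned int ops_before_dma(enum model_policy p, enum model_dir d)
{
	switch (d) {
	case M_TO_DEVICE:
		return OP_CLEAN;
	case M_FROM_DEVICE:
		/* Option 3 cleans instead of invalidating, so a DMA that never
		 * happens cannot expose stale memory contents to the CPU. */
		return p == POLICY_ARM64 ? OP_CLEAN : OP_INV;
	case M_BIDIRECTIONAL:
		/* Skip the invalidate when it is repeated after the DMA. */
		return p == POLICY_NO_PREFETCH ? OP_FLUSH : OP_CLEAN;
	}
	return OP_NONE;
}

/* Cache maintenance after the device is done (for_cpu). */
unsigned int ops_after_dma(enum model_policy p, enum model_dir d)
{
	/* Without speculative prefetching, the CPU does not pull buffer
	 * lines back into the cache while the DMA is running. */
	if (p == POLICY_NO_PREFETCH || d == M_TO_DEVICE)
		return OP_NONE;
	/* Discard lines that may have been prefetched during the DMA. */
	return OP_INV;
}

/* Print the before/after operations of one policy for each direction. */
void print_policy(enum model_policy p)
{
	static const char *const dirs[] = { "TO_DEVICE", "FROM_DEVICE",
					    "BIDIRECTIONAL" };
	static const char *const ops[] = { "-", "clean", "inv", "flush" };

	assert(p >= POLICY_NO_PREFETCH && p <= POLICY_ARM64);
	for (int d = M_TO_DEVICE; d <= M_BIDIRECTIONAL; d++)
		printf("option %d %-13s before: %-5s after: %s\n", (int)p,
		       dirs[d], ops[ops_before_dma(p, (enum model_dir)d)],
		       ops[ops_after_dma(p, (enum model_dir)d)]);
}
```

Calling `print_policy()` for each option prints one line per direction. In this model, options 2 and 3 differ only in the pre-DMA step for DMA_FROM_DEVICE (invalidate versus clean), which is the semantic question the cover letter leaves open.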
* [PATCH 01/21] openrisc: dma-mapping: flush bidirectional mappings
  2023-03-27 12:12 ` Arnd Bergmann
                   ` (3 preceding siblings ...)
  (?)
@ 2023-03-27 12:12 ` Arnd Bergmann
  -1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

From: Arnd Bergmann <arnd@arndb.de>

The cache management operations on DMA are different from the other
architectures:

- on DMA_TO_DEVICE, Openrisc currently invalidates the cache after the
  writeback, where a simple writeback without invalidation should be
  sufficient.

- on DMA_BIDIRECTIONAL, Openrisc does nothing, while most architectures
  either flush before DMA, or writeback before and invalidate after DMA.

The separate invalidation for DMA_BIDIRECTIONAL/DMA_FROM_DEVICE is
only required on CPUs that can do speculative prefetches.

Change both to have the normal set of operations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/openrisc/kernel/dma.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index b3edbb33b621..91a00d09ffad 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -103,10 +103,10 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
 
 	switch (dir) {
 	case DMA_TO_DEVICE:
-		/* Flush the dcache for the requested range */
+		/* Write back the dcache for the requested range */
 		for (cl = addr; cl < addr + size;
 		     cl += cpuinfo->dcache_block_size)
-			mtspr(SPR_DCBFR, cl);
+			mtspr(SPR_DCBWR, cl);
 		break;
 	case DMA_FROM_DEVICE:
 		/* Invalidate the dcache for the requested range */
@@ -114,12 +114,13 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
 		     cl += cpuinfo->dcache_block_size)
 			mtspr(SPR_DCBIR, cl);
 		break;
+	case DMA_BIDIRECTIONAL:
+		/* Flush the dcache for the requested range */
+		for (cl = addr; cl < addr + size;
+		     cl += cpuinfo->dcache_block_size)
+			mtspr(SPR_DCBFR, cl);
+		break;
 	default:
-		/*
-		 * NOTE: If dir == DMA_BIDIRECTIONAL then there's no need to
-		 * flush nor invalidate the cache here as the area will need
-		 * to be manually synced anyway.
-		 */
 		break;
 	}
 }
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 456+ messages in thread
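To make the new per-direction behavior of the openrisc patch concrete, the following user-space model mirrors the patched switch in arch_sync_dma_for_device() and counts the cache block register writes instead of executing them. The SPR numbers are placeholders, and `mtspr()`, `for_each_block()` and `model_sync_for_device()` are stand-ins written for this sketch; only the register names and the direction-to-operation mapping come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder SPR numbers for this model; the kernel takes the real
 * values from the OpenRISC SPR definitions. */
enum { SPR_DCBFR = 1, SPR_DCBIR = 2, SPR_DCBWR = 3 };

/* Same names and values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

static unsigned int spr_writes[4];	/* write count per placeholder SPR */

/* Stand-in for mtspr(): count register writes instead of issuing them. */
static void mtspr(int spr, uintptr_t value)
{
	(void)value;
	spr_writes[spr]++;
}

/* Write one cache block register for each block of [addr, addr + size). */
static void for_each_block(int spr, uintptr_t addr, size_t size,
			   size_t block_size)
{
	for (uintptr_t cl = addr; cl < addr + size; cl += block_size)
		mtspr(spr, cl);
}

/* Model of arch_sync_dma_for_device() after this patch. */
void model_sync_for_device(uintptr_t addr, size_t size,
			   enum dma_data_direction dir, size_t block_size)
{
	assert(block_size != 0);

	switch (dir) {
	case DMA_TO_DEVICE:
		/* Write back only; the cached copy stays valid for the CPU. */
		for_each_block(SPR_DCBWR, addr, size, block_size);
		break;
	case DMA_FROM_DEVICE:
		/* Invalidate so later CPU reads fetch what the device wrote. */
		for_each_block(SPR_DCBIR, addr, size, block_size);
		break;
	case DMA_BIDIRECTIONAL:
		/* Flush: write back CPU data, then invalidate. */
		for_each_block(SPR_DCBFR, addr, size, block_size);
		break;
	default:
		break;
	}
}
```

For a 64-byte buffer with 16-byte cache blocks, DMA_TO_DEVICE now issues four write-back (DCBWR) operations and no flushes, while DMA_BIDIRECTIONAL issues four flush (DCBFR) operations where the old code issued none.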
* [PATCH 01/21] openrisc: dma-mapping: flush bidirectional mappings @ 2023-03-27 12:12 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The cache management operations on DMA are different from the other architectures: - on DMA_TO_DEVICE, Openrisc currently invalidates the cache after the writeback, where a simple writeback without invalidation should be sufficient. - on DMA_BIDIRECTIONAL, Openrisc does nothing, while most architectures either flush before DMA, or writeback before and invalidate after DMA. The separate invalidation for DMA_BIDIRECTIONAL/DMA_FROM_DEVICE is only required on CPUs that can do speculative prefetches. Change both to have the normal set of operations. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/openrisc/kernel/dma.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index b3edbb33b621..91a00d09ffad 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -103,10 +103,10 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size, switch (dir) { case DMA_TO_DEVICE: - /* Flush the dcache for the requested range */ + /* Write back the dcache for the requested range */ for (cl = addr; cl < addr + size; cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBFR, cl); + mtspr(SPR_DCBWR, cl); break; case DMA_FROM_DEVICE: /* Invalidate the dcache for the requested range */ @@ -114,12 +114,13 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size, cl += cpuinfo->dcache_block_size) mtspr(SPR_DCBIR, cl); break; + case DMA_BIDIRECTIONAL: + /* Flush the dcache for the requested range */ + for (cl = addr; cl < addr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBFR, cl); + break; default: - /* - * NOTE: If dir == DMA_BIDIRECTIONAL then there's no need to - * flush nor invalidate the cache here as the area will need - * to be manually synced anyway. - */ break; } } -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 01/21] openrisc: dma-mapping: flush bidirectional mappings @ 2023-03-27 12:12 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw) To: linux-kernel Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Sime k, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy, David S. Miller From: Arnd Bergmann <arnd@arndb.de> The cache management operations on DMA are different from the other architectures: - on DMA_TO_DEVICE, Openrisc currently invalidates the cache after the writeback, where a simple writeback without invalidation should be sufficient. - on DMA_BIDIRECTIONAL, Openrisc does nothing, while most architectures either flush before DMA, or writeback before and invalidate after DMA. The separate invalidation for DMA_BIDIRECTIONAL/DMA_FROM_DEVICE is only required on CPUs that can do speculative prefetches. Change both to have the normal set of operations. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/openrisc/kernel/dma.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index b3edbb33b621..91a00d09ffad 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -103,10 +103,10 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size, switch (dir) { case DMA_TO_DEVICE: - /* Flush the dcache for the requested range */ + /* Write back the dcache for the requested range */ for (cl = addr; cl < addr + size; cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBFR, cl); + mtspr(SPR_DCBWR, cl); break; case DMA_FROM_DEVICE: /* Invalidate the dcache for the requested range */ @@ -114,12 +114,13 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size, cl += cpuinfo->dcache_block_size) mtspr(SPR_DCBIR, cl); break; + case DMA_BIDIRECTIONAL: + /* Flush the dcache for the requested range */ + for (cl = addr; cl < addr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBFR, cl); + break; default: - /* - * NOTE: If dir == DMA_BIDIRECTIONAL then there's no need to - * flush nor invalidate the cache here as the area will need - * to be manually synced anyway. - */ break; } } -- 2.39.2 ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 01/21] openrisc: dma-mapping: flush bidirectional mappings @ 2023-03-27 12:12 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The cache management operations on DMA are different from the other architectures: - on DMA_TO_DEVICE, Openrisc currently invalidates the cache after the writeback, where a simple writeback without invalidation should be sufficient. - on DMA_BIDIRECTIONAL, Openrisc does nothing, while most architectures either flush before DMA, or writeback before and invalidate after DMA. The separate invalidation for DMA_BIDIRECTIONAL/DMA_FROM_DEVICE is only required on CPUs that can do speculative prefetches. Change both to have the normal set of operations. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/openrisc/kernel/dma.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index b3edbb33b621..91a00d09ffad 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -103,10 +103,10 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size, switch (dir) { case DMA_TO_DEVICE: - /* Flush the dcache for the requested range */ + /* Write back the dcache for the requested range */ for (cl = addr; cl < addr + size; cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBFR, cl); + mtspr(SPR_DCBWR, cl); break; case DMA_FROM_DEVICE: /* Invalidate the dcache for the requested range */ @@ -114,12 +114,13 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size, cl += cpuinfo->dcache_block_size) mtspr(SPR_DCBIR, cl); break; + case DMA_BIDIRECTIONAL: + /* Flush the dcache for the requested range */ + for (cl = addr; cl < addr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBFR, cl); + break; default: - /* - * NOTE: If dir == DMA_BIDIRECTIONAL then there's no need to - * flush nor invalidate the cache here as the area will need - * to be manually synced anyway. - */ break; } } -- 2.39.2 _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 02/21] xtensa: dma-mapping: use normal cache invalidation rules
@ 2023-03-27 12:12 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
xtensa is one of the platforms that has both write-back and write-through
caches, and needs to account for both in its DMA mapping operations.
It does this through a set of operations that is different from that of
any other architecture. This is not a problem by itself, but it makes it
rather hard to figure out whether this is correct or not, and to unify
this implementation with the others.
Change the semantics to the usual ones for non-speculating CPUs:
- On DMA_TO_DEVICE, call __flush_dcache_range() to perform the
writeback even on write-through caches, where this is a nop.
- On DMA_FROM_DEVICE, invalidate the mapping before the DMA rather
than afterwards.
- On DMA_BIDIRECTIONAL, combine the pre-writeback with the
post-invalidate into a call to __flush_invalidate_dcache_range()
that turns into a simple invalidate on write-through caches.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/xtensa/Kconfig                  |  1 -
 arch/xtensa/include/asm/cacheflush.h |  6 +++---
 arch/xtensa/kernel/pci-dma.c         | 29 +++++-----------------------
 3 files changed, 8 insertions(+), 28 deletions(-)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index bcb0c5d2abc2..b938bacbb9af 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -8,7 +8,6 @@ config XTENSA
 	select ARCH_HAS_DMA_PREP_COHERENT if MMU
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV
-	select ARCH_HAS_SYNC_DMA_FOR_CPU if MMU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE if MMU
 	select ARCH_HAS_DMA_SET_UNCACHED if MMU
 	select ARCH_HAS_STRNCPY_FROM_USER if !KASAN
diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 7b4359312c25..2f645d25565a 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -61,9 +61,9 @@ static inline void __flush_dcache_page(unsigned long va)
 static inline void __flush_dcache_range(unsigned long va, unsigned long sz)
 {
 }
-# define __flush_invalidate_dcache_all()	__invalidate_dcache_all()
-# define __flush_invalidate_dcache_page(p)	__invalidate_dcache_page(p)
-# define __flush_invalidate_dcache_range(p,s)	__invalidate_dcache_range(p,s)
+# define __flush_invalidate_dcache_all		__invalidate_dcache_all
+# define __flush_invalidate_dcache_page		__invalidate_dcache_page
+# define __flush_invalidate_dcache_range	__invalidate_dcache_range
 #endif
 
 #if defined(CONFIG_MMU) && (DCACHE_WAY_SIZE > PAGE_SIZE)
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 94955caa4488..ff3bf015eca4 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -43,38 +43,19 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
 	}
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
 	switch (dir) {
-	case DMA_BIDIRECTIONAL:
+	case DMA_TO_DEVICE:
+		do_cache_op(paddr, size, __flush_dcache_range);
+		break;
 	case DMA_FROM_DEVICE:
 		do_cache_op(paddr, size, __invalidate_dcache_range);
 		break;
-
-	case DMA_NONE:
-		BUG();
-		break;
-
-	default:
-		break;
-	}
-}
-
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
-{
-	switch (dir) {
 	case DMA_BIDIRECTIONAL:
-	case DMA_TO_DEVICE:
-		if (XCHAL_DCACHE_IS_WRITEBACK)
-			do_cache_op(paddr, size, __flush_dcache_range);
+		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
 		break;
-
-	case DMA_NONE:
-		BUG();
-		break;
-
 	default:
 		break;
 	}
-- 
2.39.2
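The new arch_sync_dma_for_device() passes the cache helpers to do_cache_op() as function pointers. The user-space sketch below mimics that for a write-through configuration, under stated assumptions: flush_dcache_range(), invalidate_dcache_range(), enum cache_op and last_op are hypothetical stand-ins for the kernel's `__`-prefixed helpers, and this do_cache_op() simply forwards the range instead of translating the physical address as the kernel helper must. It also illustrates a likely reason the cacheflush.h aliases became object-like macros: a function-like macro name is only expanded when it is followed by `(`, so the bare name used as a function-pointer argument would otherwise be an undeclared identifier.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-ins for the xtensa cache helpers in a
 * write-through configuration; they only record which operation ran.
 */
enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };
enum cache_op { OP_NONE, OP_NOP_WRITEBACK, OP_INVALIDATE };

static enum cache_op last_op = OP_NONE;

/* Stand-in for __flush_dcache_range(): empty on write-through caches. */
static void flush_dcache_range(unsigned long va, unsigned long sz)
{
	(void)va;
	(void)sz;
	last_op = OP_NOP_WRITEBACK;
}

/* Stand-in for __invalidate_dcache_range(). */
static void invalidate_dcache_range(unsigned long va, unsigned long sz)
{
	(void)va;
	(void)sz;
	last_op = OP_INVALIDATE;
}

/*
 * Object-like alias, as in the patch. The old function-like form,
 * "#define flush_invalidate_dcache_range(p, s) ...", would not expand
 * where the bare name is passed to do_cache_op() below.
 */
#define flush_invalidate_dcache_range invalidate_dcache_range

/* Simplified do_cache_op(): just forwards the range to the helper. */
static void do_cache_op(unsigned long paddr, size_t size,
			void (*fn)(unsigned long, unsigned long))
{
	fn(paddr, size);
}

/* Direction handling as in the patched arch_sync_dma_for_device(). */
static void sync_for_device(unsigned long paddr, size_t size,
			    enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		do_cache_op(paddr, size, flush_dcache_range);
		break;
	case DMA_FROM_DEVICE:
		do_cache_op(paddr, size, invalidate_dcache_range);
		break;
	case DMA_BIDIRECTIONAL:
		do_cache_op(paddr, size, flush_invalidate_dcache_range);
		break;
	}
}
```

On a write-through cache, DMA_BIDIRECTIONAL therefore performs the same invalidation as DMA_FROM_DEVICE, while DMA_TO_DEVICE calls an empty function; on a write-back cache the real `__flush_invalidate_dcache_range()` would write back and invalidate instead.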
* Re: [PATCH 02/21] xtensa: dma-mapping: use normal cache invalidation rules
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 15:42 ` Max Filippov
-1 siblings, 0 replies; 456+ messages in thread
From: Max Filippov @ 2023-03-27 15:42 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren,
Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Christoph Hellwig,
Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc,
linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k,
linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv,
linux-sh, sparclinux, linux-xtensa
On Mon, Mar 27, 2023 at 5:14 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> xtensa is one of the platforms that has both write-back and write-through
> caches, and needs to account for both in its DMA mapping operations.
>
> It does this through a set of operations that is different from any
> other architecture. This is not a problem by itself, but it makes it
> rather hard to figure out whether this is correct or not, and to unify
> this implementation with the others.
>
> Change the semantics to the usual ones for non-speculating CPUs:
>
>  - On DMA_TO_DEVICE, call __flush_dcache_range() to perform the
>    writeback even on writethrough caches, where this is a nop.
>
>  - On DMA_FROM_DEVICE, invalidate the mapping before the DMA rather
>    than afterwards.
>
>  - On DMA_BIDIRECTIONAL, combine the pre-writeback with the
>    post-invalidate into a call to __flush_invalidate_dcache_range()
>    that turns into a simple invalidate on writethrough caches.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/xtensa/Kconfig                  |  1 -
>  arch/xtensa/include/asm/cacheflush.h |  6 +++---
>  arch/xtensa/kernel/pci-dma.c         | 29 +++++-----------------------
>  3 files changed, 8 insertions(+), 28 deletions(-)
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
-- 
Thanks.
-- Max
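The direction-to-operation rules in the xtensa commit message (the usual ones for CPUs without speculative prefetching) can be modelled as a small user-space C sketch. It is not kernel code: the `enum dma_data_direction` values mirror the kernel's `include/linux/dma-direction.h`, but `enum cache_op`, `nonspec_sync_for_device()` and `nonspec_sync_for_cpu()` are hypothetical names used only for illustration, and the functions merely report which operation would be issued instead of touching a cache.

```c
#include <assert.h>

/* Same values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical labels for the cache maintenance a sync hook would issue. */
enum cache_op {
	OP_NONE,          /* no cache maintenance needed */
	OP_WRITEBACK,     /* like __flush_dcache_range(): clean dirty lines */
	OP_INVALIDATE,    /* like __invalidate_dcache_range(): drop lines */
	OP_WB_INVALIDATE, /* like __flush_invalidate_dcache_range(): both */
};

/* Called before the device accesses the buffer. */
static enum cache_op nonspec_sync_for_device(enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		/* Make CPU writes visible in memory; a nop on write-through. */
		return OP_WRITEBACK;
	case DMA_FROM_DEVICE:
		/* Drop lines now so no stale copy hides the device's writes. */
		return OP_INVALIDATE;
	case DMA_BIDIRECTIONAL:
		/* Write back CPU data and drop the lines in one pass. */
		return OP_WB_INVALIDATE;
	default:
		/* The kernel calls BUG() for DMA_NONE; this sketch asserts. */
		assert(0 && "invalid DMA direction");
		return OP_NONE;
	}
}

/* Called after the device is done. Without speculative prefetching the
 * CPU cannot have refilled the lines during the transfer, so there is
 * nothing left to do. */
static enum cache_op nonspec_sync_for_cpu(enum dma_data_direction dir)
{
	assert(dir != DMA_NONE);
	(void)dir;
	return OP_NONE;
}
```

Returning an operation code instead of calling the cache primitives keeps the mapping testable outside the kernel; an actual implementation would apply the chosen operation to the physical range `paddr .. paddr + size`.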
* [PATCH 03/21] sparc32: flush caches in dma_sync_*for_device
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:12 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:12 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
Leon has a very minimalistic cache that has no range operations
and requires being flushed entirely to deal with noncoherent DMA.
Most in-order architectures do their cache management in the
dma_sync_*for_device() operations rather than dma_sync_*for_cpu.
Since the cache is write-through only, both should have the same
effect, so change it for consistency with the other architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/sparc/Kconfig         | 2 +-
 arch/sparc/kernel/ioport.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 84437a4c6545..637da50e236c 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -51,7 +51,7 @@ config SPARC
 config SPARC32
 	def_bool !64BIT
 	select ARCH_32BIT_OFF_T
-	select ARCH_HAS_SYNC_DMA_FOR_CPU
+	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select CLZ_TAB
 	select DMA_DIRECT_REMAP
 	select GENERIC_ATOMIC64
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 4e4f3d3263e4..4f3d26066ec2 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -306,7 +306,7 @@ arch_initcall(sparc_register_ioport);
  * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
  * make DMA to cacheable memory coherent.
  */
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 			      enum dma_data_direction dir)
 {
 	if (dir != DMA_TO_DEVICE &&
-- 
2.39.2
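Why moving the LEON whole-cache flush from arch_sync_dma_for_cpu() to arch_sync_dma_for_device() should not change behaviour on a write-through cache can be checked with a toy model. This is a hypothetical single-line simulation written only for illustration (`struct wt_cache` and the helper functions are not kernel or LEON APIs), and it assumes, as the patch does, that the CPU neither touches nor speculatively prefetches the buffer while the DMA is in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one cached word in front of one word of memory. */
struct wt_cache {
	int mem;    /* backing memory; the "device" accesses only this */
	int line;   /* the CPU's cached copy */
	bool valid; /* whether the cached copy may be used */
};

/* Write-through store: cache and memory are updated together. */
static void cpu_write(struct wt_cache *c, int v)
{
	c->line = v;
	c->valid = true;
	c->mem = v;
}

/* Load: a hit returns the cached copy, a miss refills from memory. */
static int cpu_read(struct wt_cache *c)
{
	if (!c->valid) {
		c->line = c->mem;
		c->valid = true;
	}
	return c->line;
}

/* Full flush: with no dirty data possible, this just drops the line. */
static void flush_all(struct wt_cache *c)
{
	c->valid = false;
}

/* A device write goes to memory only, bypassing the CPU cache. */
static void device_write(struct wt_cache *c, int v)
{
	c->mem = v;
}

/* A device read fetches from memory. With write-through, any valid
 * cached copy already matches memory, so no flush is needed first. */
static int device_read(const struct wt_cache *c)
{
	assert(!c->valid || c->line == c->mem);
	return c->mem;
}

/* DMA_TO_DEVICE: the CPU fills the buffer, then the device reads it. */
static int dma_to_device(int v)
{
	struct wt_cache c = { .mem = 0, .line = 0, .valid = false };

	cpu_write(&c, v);
	return device_read(&c);
}

/* DMA_FROM_DEVICE with the flush placed before the transfer (the
 * _for_device hook), after it (the _for_cpu hook), or nowhere.
 * Returns the value the CPU sees once the transfer is complete. */
static int dma_from_device(bool flush_before, bool flush_after)
{
	struct wt_cache c = { .mem = 0, .line = 0, .valid = false };

	cpu_write(&c, 1);        /* the CPU used the buffer earlier */
	if (flush_before)
		flush_all(&c);
	device_write(&c, 2);     /* the DMA; the CPU stays away meanwhile */
	if (flush_after)
		flush_all(&c);
	return cpu_read(&c);
}
```

In this model `dma_from_device(false, false)` returns the stale value 1, while flushing in either hook returns the device's value 2, and `dma_to_device()` never needs a flush because memory is always current. That matches why the kernel function skips `DMA_TO_DEVICE`; the equivalence would no longer hold if the CPU could refill the line during the transfer.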
* [PATCH 04/21] microblaze: dma-mapping: skip extra DMA flushes
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
The microblaze dma_sync_* implementation uses the same function
for both _for_cpu() and _for_device(), which is inconsistent
with other architectures and slightly more expensive.
Split it up into separate functions and skip the parts that
are not needed:
 - on dma_sync_*_for_cpu(..., DMA_TO_DEVICE), skip the second
   writeback, which does nothing.
 - on dma_sync_*_for_cpu(..., DMA_BIDIRECTIONAL), only invalidate
   the cache to clear out cache lines that got loaded speculatively,
   but skip the extraneous writeback.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/microblaze/kernel/dma.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index 04d091ade417..b4c4e45fd45e 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -14,8 +14,8 @@
 #include <linux/bug.h>
 #include <asm/cacheflush.h>
 
-static void __dma_sync(phys_addr_t paddr, size_t size,
-		enum dma_data_direction direction)
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
 {
 	switch (direction) {
 	case DMA_TO_DEVICE:
@@ -30,14 +30,16 @@ static void __dma_sync(phys_addr_t paddr, size_t size,
 	}
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
-{
-	__dma_sync(paddr, size, dir);
-}
-
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	__dma_sync(paddr, size, dir);
-}
+	switch (direction) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_BIDIRECTIONAL:
+	case DMA_FROM_DEVICE:
+		invalidate_dcache_range(paddr, paddr + size);
+		break;
+	default:
+		BUG();
+	}}
-- 
2.39.2
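The split described in the microblaze commit message can be summarised as a per-direction table of cache operations. The user-space sketch below is illustrative only: `OP_FLUSH`, `OP_INVALIDATE` and the three functions are hypothetical names, and the `sync_for_device()` mapping (flush for `DMA_TO_DEVICE` and `DMA_BIDIRECTIONAL`, invalidate for `DMA_FROM_DEVICE`) is an assumption, because most of the old `__dma_sync()` body is elided from the hunk. Only the new `_for_cpu()` behaviour is taken directly from the diff. Note also that, as posted, the hunk renames the first function's parameter to `dir` while its unchanged body still switches on `direction`, and the new `arch_sync_dma_for_cpu()` body switches on `direction` although its parameter is `dir`; the sketch uses `dir` throughout.

```c
#include <assert.h>

/* Same values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical bit flags naming the cache calls a hook would make. */
#define OP_FLUSH      (1u << 0) /* flush_dcache_range(): includes writeback */
#define OP_INVALIDATE (1u << 1) /* invalidate_dcache_range() */

/* Assumed behaviour of the shared helper that the patch turns into
 * arch_sync_dma_for_device(). */
static unsigned int sync_for_device(enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
	case DMA_BIDIRECTIONAL:
		return OP_FLUSH;
	case DMA_FROM_DEVICE:
		return OP_INVALIDATE;
	default:
		assert(0 && "invalid DMA direction"); /* kernel: BUG() */
		return 0;
	}
}

/* Before the patch, the _for_cpu hook reused the same helper. */
static unsigned int old_sync_for_cpu(enum dma_data_direction dir)
{
	return sync_for_device(dir);
}

/* After the patch: no writeback on the _for_cpu side, only dropping
 * lines that may have been loaded speculatively during the DMA. */
static unsigned int new_sync_for_cpu(enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		return 0;
	case DMA_BIDIRECTIONAL:
	case DMA_FROM_DEVICE:
		return OP_INVALIDATE;
	default:
		assert(0 && "invalid DMA direction"); /* kernel: BUG() */
		return 0;
	}
}
```

Comparing `old_sync_for_cpu()` with `new_sync_for_cpu()` per direction shows the two savings listed in the commit message: the redundant second writeback for `DMA_TO_DEVICE` disappears, and `DMA_BIDIRECTIONAL` keeps only the invalidation.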
* [PATCH 04/21] microblaze: dma-mapping: skip extra DMA flushes @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The microblaze dma_sync_* implementation uses the same function for both _for_cpu() and _for_device(), which is inconsistent with other architectures and slightly more expensive. Split it up into separate functions and skip the parts that are not needed: - on dma_sync_*_for_cpu(..., DMA_TO_DEVICE), skip the second writeback, which does nothing. - on dma_sync_*_for_cpu(..., DMA_BIDIRECTIONAL), only invalidate the cache to clear out cache lines that got loaded speculatively, but skip the extraneous writeback. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/microblaze/kernel/dma.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c index 04d091ade417..b4c4e45fd45e 100644 --- a/arch/microblaze/kernel/dma.c +++ b/arch/microblaze/kernel/dma.c @@ -14,8 +14,8 @@ #include <linux/bug.h> #include <asm/cacheflush.h> -static void __dma_sync(phys_addr_t paddr, size_t size, - enum dma_data_direction direction) +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { switch (direction) { case DMA_TO_DEVICE: @@ -30,14 +30,16 @@ static void __dma_sync(phys_addr_t paddr, size_t size, } } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - __dma_sync(paddr, size, dir); -} - void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - __dma_sync(paddr, size, dir); -} + switch (direction) { + case DMA_TO_DEVICE: + break; + case DMA_BIDIRECTIONAL: + case DMA_FROM_DEVICE: + invalidate_dcache_range(paddr, paddr + size); + break; + default: + BUG(); + }} -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 05/21] powerpc: dma-mapping: split out cache operation logic
From: Arnd Bergmann @ 2023-03-27 12:13 UTC
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>

The powerpc arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions
behave differently from all other architectures, at least for some of
the operations.

As a preparation for making the behavior more consistent, reorder the
logic in which they decide whether to flush, invalidate or clean the
cache. No change in behavior is intended.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/powerpc/mm/dma-noncoherent.c | 91 +++++++++++++++++++++----------
 1 file changed, 63 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 30260b5d146d..f10869d27de5 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -16,31 +16,28 @@
 #include <asm/tlbflush.h>
 #include <asm/dma.h>
 
+enum dma_cache_op {
+	DMA_CACHE_CLEAN,
+	DMA_CACHE_INVAL,
+	DMA_CACHE_FLUSH,
+};
+
 /*
  * make an area consistent.
  */
-static void __dma_sync(void *vaddr, size_t size, int direction)
+static void __dma_op(void *vaddr, size_t size, enum dma_cache_op op)
 {
 	unsigned long start = (unsigned long)vaddr;
 	unsigned long end = start + size;
 
-	switch (direction) {
-	case DMA_NONE:
-		BUG();
-	case DMA_FROM_DEVICE:
-		/*
-		 * invalidate only when cache-line aligned otherwise there is
-		 * the potential for discarding uncommitted data from the cache
-		 */
-		if ((start | end) & (L1_CACHE_BYTES - 1))
-			flush_dcache_range(start, end);
-		else
-			invalidate_dcache_range(start, end);
-		break;
-	case DMA_TO_DEVICE: /* writeback only */
+	switch (op) {
+	case DMA_CACHE_CLEAN:
 		clean_dcache_range(start, end);
 		break;
-	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
+	case DMA_CACHE_INVAL:
+		invalidate_dcache_range(start, end);
+		break;
+	case DMA_CACHE_FLUSH:
 		flush_dcache_range(start, end);
 		break;
 	}
@@ -48,16 +45,16 @@ static void __dma_sync(void *vaddr, size_t size, int direction)
 
 #ifdef CONFIG_HIGHMEM
 /*
- * __dma_sync_page() implementation for systems using highmem.
+ * __dma_highmem_op() implementation for systems using highmem.
  * In this case, each page of a buffer must be kmapped/kunmapped
- * in order to have a virtual address for __dma_sync(). This must
+ * in order to have a virtual address for __dma_op(). This must
  * not sleep so kmap_atomic()/kunmap_atomic() are used.
  *
  * Note: yes, it is possible and correct to have a buffer extend
  * beyond the first page.
  */
-static inline void __dma_sync_page_highmem(struct page *page,
-		unsigned long offset, size_t size, int direction)
+static inline void __dma_highmem_op(struct page *page,
+		unsigned long offset, size_t size, enum dma_cache_op op)
 {
 	size_t seg_size = min((size_t)(PAGE_SIZE - offset), size);
 	size_t cur_size = seg_size;
@@ -71,7 +68,7 @@ static inline void __dma_sync_page_highmem(struct page *page,
 		start = (unsigned long)kmap_atomic(page + seg_nr) + seg_offset;
 
 		/* Sync this buffer segment */
-		__dma_sync((void *)start, seg_size, direction);
+		__dma_op((void *)start, seg_size, op);
 		kunmap_atomic((void *)start);
 		seg_nr++;
 
@@ -88,32 +85,70 @@ static inline void __dma_sync_page_highmem(struct page *page,
 #endif /* CONFIG_HIGHMEM */
 
 /*
- * __dma_sync_page makes memory consistent. identical to __dma_sync, but
- * takes a struct page instead of a virtual address
+ * __dma_phys_op makes memory consistent. identical to __dma_op, but
+ * takes a phys_addr_t instead of a virtual address
  */
-static void __dma_sync_page(phys_addr_t paddr, size_t size, int dir)
+static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
 {
 	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
 	unsigned offset = paddr & ~PAGE_MASK;
 
 #ifdef CONFIG_HIGHMEM
-	__dma_sync_page_highmem(page, offset, size, dir);
+	__dma_highmem_op(page, offset, size, op);
 #else
 	unsigned long start = (unsigned long)page_address(page) + offset;
-	__dma_sync((void *)start, size, dir);
+	__dma_op((void *)start, size, op);
 #endif
 }
 
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	__dma_sync_page(paddr, size, dir);
+	switch (dir) {
+	case DMA_NONE:
+		BUG();
+	case DMA_FROM_DEVICE:
+		/*
+		 * invalidate only when cache-line aligned otherwise there is
+		 * the potential for discarding uncommitted data from the cache
+		 */
+		if ((paddr | size) & (L1_CACHE_BYTES - 1))
+			__dma_phys_op(paddr, size, DMA_CACHE_FLUSH);
+		else
+			__dma_phys_op(paddr, size, DMA_CACHE_INVAL);
+		break;
+	case DMA_TO_DEVICE: /* writeback only */
+		__dma_phys_op(paddr, size, DMA_CACHE_CLEAN);
+		break;
+	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
+		__dma_phys_op(paddr, size, DMA_CACHE_FLUSH);
+		break;
+	}
 }
 
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	__dma_sync_page(paddr, size, dir);
+	switch (dir) {
+	case DMA_NONE:
+		BUG();
+	case DMA_FROM_DEVICE:
+		/*
+		 * invalidate only when cache-line aligned otherwise there is
+		 * the potential for discarding uncommitted data from the cache
+		 */
+		if ((paddr | size) & (L1_CACHE_BYTES - 1))
+			__dma_phys_op(paddr, size, DMA_CACHE_FLUSH);
+		else
+			__dma_phys_op(paddr, size, DMA_CACHE_INVAL);
+		break;
+	case DMA_TO_DEVICE: /* writeback only */
+		__dma_phys_op(paddr, size, DMA_CACHE_CLEAN);
+		break;
+	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
+		__dma_phys_op(paddr, size, DMA_CACHE_FLUSH);
+		break;
+	}
 }
 
 void arch_dma_prep_coherent(struct page *page, size_t size)
--
2.39.2
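The direction-to-operation policy that this patch moves into the two sync hooks can be illustrated with a small userspace model. This is a sketch only: `pick_op()`, `unaligned_by_end()`, `SKETCH_L1_CACHE_BYTES` and the 32-byte line size are assumptions for illustration (the real `L1_CACHE_BYTES` depends on the platform), and the enums merely mirror the names used in the patch. The original `__dma_sync()` tests `(start | end)` with `end = start + size`; the sketch tests `(paddr | size)` instead, which is equivalent because, once the start is line-aligned, the end is aligned exactly when the size is. `unaligned_by_end()` keeps the original start/end form so the two can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size for this sketch only. */
#define SKETCH_L1_CACHE_BYTES 32u

/* Hypothetical stand-ins mirroring the names used in the patch. */
enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };
enum dma_cache_op { DMA_CACHE_CLEAN, DMA_CACHE_INVAL, DMA_CACHE_FLUSH };

/* Original form of the alignment test, on start and end addresses. */
static int unaligned_by_end(uint64_t start, size_t size)
{
	uint64_t end = start + size;

	return ((start | end) & (SKETCH_L1_CACHE_BYTES - 1)) != 0;
}

/*
 * One policy for both hooks, as in this patch: it only decides which
 * operation to run; performing it is left to a separate primitive.
 */
static enum dma_cache_op pick_op(uint64_t paddr, size_t size, enum dma_dir dir)
{
	switch (dir) {
	case DMA_FROM_DEVICE:
		/*
		 * A partially covered line may also hold unrelated dirty
		 * data, so write it back instead of discarding it.
		 */
		if ((paddr | size) & (SKETCH_L1_CACHE_BYTES - 1))
			return DMA_CACHE_FLUSH;
		return DMA_CACHE_INVAL;
	case DMA_TO_DEVICE:		/* writeback only */
		return DMA_CACHE_CLEAN;
	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
		return DMA_CACHE_FLUSH;
	}
	assert(!"invalid DMA direction");	/* stands in for BUG() */
	return DMA_CACHE_FLUSH;
}
```

Separating the choice of operation from the primitive that performs it means the two hooks can later pick different operations without touching `__dma_op()` or the highmem walk.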
* [PATCH 05/21] powerpc: dma-mapping: split out cache operation logic @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The powerpc arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave differently from all other architectures, at least for some of the operations. As a preparation for making the behavior more consistent, reorder the logic in which they decide whether to flush, invalidate or clean the. No change in behavior is intended. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/powerpc/mm/dma-noncoherent.c | 91 +++++++++++++++++++++---------- 1 file changed, 63 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 30260b5d146d..f10869d27de5 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -16,31 +16,28 @@ #include <asm/tlbflush.h> #include <asm/dma.h> +enum dma_cache_op { + DMA_CACHE_CLEAN, + DMA_CACHE_INVAL, + DMA_CACHE_FLUSH, +}; + /* * make an area consistent. 
*/ -static void __dma_sync(void *vaddr, size_t size, int direction) +static void __dma_op(void *vaddr, size_t size, enum dma_cache_op op) { unsigned long start = (unsigned long)vaddr; unsigned long end = start + size; - switch (direction) { - case DMA_NONE: - BUG(); - case DMA_FROM_DEVICE: - /* - * invalidate only when cache-line aligned otherwise there is - * the potential for discarding uncommitted data from the cache - */ - if ((start | end) & (L1_CACHE_BYTES - 1)) - flush_dcache_range(start, end); - else - invalidate_dcache_range(start, end); - break; - case DMA_TO_DEVICE: /* writeback only */ + switch (op) { + case DMA_CACHE_CLEAN: clean_dcache_range(start, end); break; - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ + case DMA_CACHE_INVAL: + invalidate_dcache_range(start, end); + break; + case DMA_CACHE_FLUSH: flush_dcache_range(start, end); break; } @@ -48,16 +45,16 @@ static void __dma_sync(void *vaddr, size_t size, int direction) #ifdef CONFIG_HIGHMEM /* - * __dma_sync_page() implementation for systems using highmem. + * __dma_highmem_op() implementation for systems using highmem. * In this case, each page of a buffer must be kmapped/kunmapped - * in order to have a virtual address for __dma_sync(). This must + * in order to have a virtual address for __dma_op(). This must * not sleep so kmap_atomic()/kunmap_atomic() are used. * * Note: yes, it is possible and correct to have a buffer extend * beyond the first page. 
*/ -static inline void __dma_sync_page_highmem(struct page *page, - unsigned long offset, size_t size, int direction) +static inline void __dma_highmem_op(struct page *page, + unsigned long offset, size_t size, enum dma_cache_op op) { size_t seg_size = min((size_t)(PAGE_SIZE - offset), size); size_t cur_size = seg_size; @@ -71,7 +68,7 @@ static inline void __dma_sync_page_highmem(struct page *page, start = (unsigned long)kmap_atomic(page + seg_nr) + seg_offset; /* Sync this buffer segment */ - __dma_sync((void *)start, seg_size, direction); + __dma_op((void *)start, seg_size, op); kunmap_atomic((void *)start); seg_nr++; @@ -88,32 +85,70 @@ static inline void __dma_sync_page_highmem(struct page *page, #endif /* CONFIG_HIGHMEM */ /* - * __dma_sync_page makes memory consistent. identical to __dma_sync, but - * takes a struct page instead of a virtual address + * __dma_phys_op makes memory consistent. identical to __dma_op, but + * takes a phys_addr_t instead of a virtual address */ -static void __dma_sync_page(phys_addr_t paddr, size_t size, int dir) +static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) { struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); unsigned offset = paddr & ~PAGE_MASK; #ifdef CONFIG_HIGHMEM - __dma_sync_page_highmem(page, offset, size, dir); + __dma_highmem_op(page, offset, size, op); #else unsigned long start = (unsigned long)page_address(page) + offset; - __dma_sync((void *)start, size, dir); + __dma_op((void *)start, size, op); #endif } void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - __dma_sync_page(paddr, size, dir); + switch (direction) { + case DMA_NONE: + BUG(); + case DMA_FROM_DEVICE: + /* + * invalidate only when cache-line aligned otherwise there is + * the potential for discarding uncommitted data from the cache + */ + if ((start | end) & (L1_CACHE_BYTES - 1)) + __dma_phys_op(start, end, DMA_CACHE_FLUSH); + else + __dma_phys_op(start, end, DMA_CACHE_INVAL); 
+ break; + case DMA_TO_DEVICE: /* writeback only */ + __dma_phys_op(start, end, DMA_CACHE_CLEAN); + break; + case DMA_BIDIRECTIONAL: /* writeback and invalidate */ + __dma_phys_op(start, end, DMA_CACHE_FLUSH); + break; + } } void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - __dma_sync_page(paddr, size, dir); + switch (direction) { + case DMA_NONE: + BUG(); + case DMA_FROM_DEVICE: + /* + * invalidate only when cache-line aligned otherwise there is + * the potential for discarding uncommitted data from the cache + */ + if ((start | end) & (L1_CACHE_BYTES - 1)) + __dma_phys_op(start, end, DMA_CACHE_FLUSH); + else + __dma_phys_op(start, end, DMA_CACHE_INVAL); + break; + case DMA_TO_DEVICE: /* writeback only */ + __dma_phys_op(start, end, DMA_CACHE_CLEAN); + break; + case DMA_BIDIRECTIONAL: /* writeback and invalidate */ + __dma_phys_op(start, end, DMA_CACHE_FLUSH); + break; + } } void arch_dma_prep_coherent(struct page *page, size_t size) -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 05/21] powerpc: dma-mapping: split out cache operation logic @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Sime k, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy, David S. Miller From: Arnd Bergmann <arnd@arndb.de> The powerpc arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave differently from all other architectures, at least for some of the operations. As a preparation for making the behavior more consistent, reorder the logic in which they decide whether to flush, invalidate or clean the. No change in behavior is intended. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/powerpc/mm/dma-noncoherent.c | 91 +++++++++++++++++++++---------- 1 file changed, 63 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 30260b5d146d..f10869d27de5 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -16,31 +16,28 @@ #include <asm/tlbflush.h> #include <asm/dma.h> +enum dma_cache_op { + DMA_CACHE_CLEAN, + DMA_CACHE_INVAL, + DMA_CACHE_FLUSH, +}; + /* * make an area consistent. 
*/ -static void __dma_sync(void *vaddr, size_t size, int direction) +static void __dma_op(void *vaddr, size_t size, enum dma_cache_op op) { unsigned long start = (unsigned long)vaddr; unsigned long end = start + size; - switch (direction) { - case DMA_NONE: - BUG(); - case DMA_FROM_DEVICE: - /* - * invalidate only when cache-line aligned otherwise there is - * the potential for discarding uncommitted data from the cache - */ - if ((start | end) & (L1_CACHE_BYTES - 1)) - flush_dcache_range(start, end); - else - invalidate_dcache_range(start, end); - break; - case DMA_TO_DEVICE: /* writeback only */ + switch (op) { + case DMA_CACHE_CLEAN: clean_dcache_range(start, end); break; - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ + case DMA_CACHE_INVAL: + invalidate_dcache_range(start, end); + break; + case DMA_CACHE_FLUSH: flush_dcache_range(start, end); break; } @@ -48,16 +45,16 @@ static void __dma_sync(void *vaddr, size_t size, int direction) #ifdef CONFIG_HIGHMEM /* - * __dma_sync_page() implementation for systems using highmem. + * __dma_highmem_op() implementation for systems using highmem. * In this case, each page of a buffer must be kmapped/kunmapped - * in order to have a virtual address for __dma_sync(). This must + * in order to have a virtual address for __dma_op(). This must * not sleep so kmap_atomic()/kunmap_atomic() are used. * * Note: yes, it is possible and correct to have a buffer extend * beyond the first page. 
*/ -static inline void __dma_sync_page_highmem(struct page *page, - unsigned long offset, size_t size, int direction) +static inline void __dma_highmem_op(struct page *page, + unsigned long offset, size_t size, enum dma_cache_op op) { size_t seg_size = min((size_t)(PAGE_SIZE - offset), size); size_t cur_size = seg_size; @@ -71,7 +68,7 @@ static inline void __dma_sync_page_highmem(struct page *page, start = (unsigned long)kmap_atomic(page + seg_nr) + seg_offset; /* Sync this buffer segment */ - __dma_sync((void *)start, seg_size, direction); + __dma_op((void *)start, seg_size, op); kunmap_atomic((void *)start); seg_nr++; @@ -88,32 +85,70 @@ static inline void __dma_sync_page_highmem(struct page *page, #endif /* CONFIG_HIGHMEM */ /* - * __dma_sync_page makes memory consistent. identical to __dma_sync, but - * takes a struct page instead of a virtual address + * __dma_phys_op makes memory consistent. identical to __dma_op, but + * takes a phys_addr_t instead of a virtual address */ -static void __dma_sync_page(phys_addr_t paddr, size_t size, int dir) +static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) { struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); unsigned offset = paddr & ~PAGE_MASK; #ifdef CONFIG_HIGHMEM - __dma_sync_page_highmem(page, offset, size, dir); + __dma_highmem_op(page, offset, size, op); #else unsigned long start = (unsigned long)page_address(page) + offset; - __dma_sync((void *)start, size, dir); + __dma_op((void *)start, size, op); #endif } void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - __dma_sync_page(paddr, size, dir); + switch (direction) { + case DMA_NONE: + BUG(); + case DMA_FROM_DEVICE: + /* + * invalidate only when cache-line aligned otherwise there is + * the potential for discarding uncommitted data from the cache + */ + if ((start | end) & (L1_CACHE_BYTES - 1)) + __dma_phys_op(start, end, DMA_CACHE_FLUSH); + else + __dma_phys_op(start, end, DMA_CACHE_INVAL); 
+ break; + case DMA_TO_DEVICE: /* writeback only */ + __dma_phys_op(start, end, DMA_CACHE_CLEAN); + break; + case DMA_BIDIRECTIONAL: /* writeback and invalidate */ + __dma_phys_op(start, end, DMA_CACHE_FLUSH); + break; + } } void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - __dma_sync_page(paddr, size, dir); + switch (direction) { + case DMA_NONE: + BUG(); + case DMA_FROM_DEVICE: + /* + * invalidate only when cache-line aligned otherwise there is + * the potential for discarding uncommitted data from the cache + */ + if ((start | end) & (L1_CACHE_BYTES - 1)) + __dma_phys_op(start, end, DMA_CACHE_FLUSH); + else + __dma_phys_op(start, end, DMA_CACHE_INVAL); + break; + case DMA_TO_DEVICE: /* writeback only */ + __dma_phys_op(start, end, DMA_CACHE_CLEAN); + break; + case DMA_BIDIRECTIONAL: /* writeback and invalidate */ + __dma_phys_op(start, end, DMA_CACHE_FLUSH); + break; + } } void arch_dma_prep_coherent(struct page *page, size_t size) -- 2.39.2 ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 05/21] powerpc: dma-mapping: split out cache operation logic @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The powerpc arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave differently from all other architectures, at least for some of the operations. As a preparation for making the behavior more consistent, reorder the logic in which they decide whether to flush, invalidate or clean the. No change in behavior is intended. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/powerpc/mm/dma-noncoherent.c | 91 +++++++++++++++++++++---------- 1 file changed, 63 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 30260b5d146d..f10869d27de5 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -16,31 +16,28 @@ #include <asm/tlbflush.h> #include <asm/dma.h> +enum dma_cache_op { + DMA_CACHE_CLEAN, + DMA_CACHE_INVAL, + DMA_CACHE_FLUSH, +}; + /* * make an area consistent. 
*/ -static void __dma_sync(void *vaddr, size_t size, int direction) +static void __dma_op(void *vaddr, size_t size, enum dma_cache_op op) { unsigned long start = (unsigned long)vaddr; unsigned long end = start + size; - switch (direction) { - case DMA_NONE: - BUG(); - case DMA_FROM_DEVICE: - /* - * invalidate only when cache-line aligned otherwise there is - * the potential for discarding uncommitted data from the cache - */ - if ((start | end) & (L1_CACHE_BYTES - 1)) - flush_dcache_range(start, end); - else - invalidate_dcache_range(start, end); - break; - case DMA_TO_DEVICE: /* writeback only */ + switch (op) { + case DMA_CACHE_CLEAN: clean_dcache_range(start, end); break; - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ + case DMA_CACHE_INVAL: + invalidate_dcache_range(start, end); + break; + case DMA_CACHE_FLUSH: flush_dcache_range(start, end); break; } @@ -48,16 +45,16 @@ static void __dma_sync(void *vaddr, size_t size, int direction) #ifdef CONFIG_HIGHMEM /* - * __dma_sync_page() implementation for systems using highmem. + * __dma_highmem_op() implementation for systems using highmem. * In this case, each page of a buffer must be kmapped/kunmapped - * in order to have a virtual address for __dma_sync(). This must + * in order to have a virtual address for __dma_op(). This must * not sleep so kmap_atomic()/kunmap_atomic() are used. * * Note: yes, it is possible and correct to have a buffer extend * beyond the first page. 
 */
-static inline void __dma_sync_page_highmem(struct page *page,
-		unsigned long offset, size_t size, int direction)
+static inline void __dma_highmem_op(struct page *page,
+		unsigned long offset, size_t size, enum dma_cache_op op)
 {
 	size_t seg_size = min((size_t)(PAGE_SIZE - offset), size);
 	size_t cur_size = seg_size;
@@ -71,7 +68,7 @@ static inline void __dma_sync_page_highmem(struct page *page,
 		start = (unsigned long)kmap_atomic(page + seg_nr) + seg_offset;

 		/* Sync this buffer segment */
-		__dma_sync((void *)start, seg_size, direction);
+		__dma_op((void *)start, seg_size, op);
 		kunmap_atomic((void *)start);
 		seg_nr++;

@@ -88,32 +85,70 @@ static inline void __dma_sync_page_highmem(struct page *page,
 #endif /* CONFIG_HIGHMEM */

 /*
- * __dma_sync_page makes memory consistent. identical to __dma_sync, but
- * takes a struct page instead of a virtual address
+ * __dma_phys_op makes memory consistent. identical to __dma_op, but
+ * takes a phys_addr_t instead of a virtual address
  */
-static void __dma_sync_page(phys_addr_t paddr, size_t size, int dir)
+static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
 {
 	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
 	unsigned offset = paddr & ~PAGE_MASK;

 #ifdef CONFIG_HIGHMEM
-	__dma_sync_page_highmem(page, offset, size, dir);
+	__dma_highmem_op(page, offset, size, op);
 #else
 	unsigned long start = (unsigned long)page_address(page) + offset;
-	__dma_sync((void *)start, size, dir);
+	__dma_op((void *)start, size, op);
 #endif
 }

 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	__dma_sync_page(paddr, size, dir);
+	switch (direction) {
+	case DMA_NONE:
+		BUG();
+	case DMA_FROM_DEVICE:
+		/*
+		 * invalidate only when cache-line aligned otherwise there is
+		 * the potential for discarding uncommitted data from the cache
+		 */
+		if ((start | end) & (L1_CACHE_BYTES - 1))
+			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+		else
+			__dma_phys_op(start, end, DMA_CACHE_INVAL);
+		break;
+	case DMA_TO_DEVICE: /* writeback only */
+		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
+		break;
+	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
+		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+		break;
+	}
 }

 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	__dma_sync_page(paddr, size, dir);
+	switch (direction) {
+	case DMA_NONE:
+		BUG();
+	case DMA_FROM_DEVICE:
+		/*
+		 * invalidate only when cache-line aligned otherwise there is
+		 * the potential for discarding uncommitted data from the cache
+		 */
+		if ((start | end) & (L1_CACHE_BYTES - 1))
+			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+		else
+			__dma_phys_op(start, end, DMA_CACHE_INVAL);
+		break;
+	case DMA_TO_DEVICE: /* writeback only */
+		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
+		break;
+	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
+		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+		break;
+	}
 }

 void arch_dma_prep_coherent(struct page *page, size_t size)
-- 
2.39.2
_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc
^ permalink raw reply related	[flat|nested] 456+ messages in thread
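The direction-to-operation mapping that this patch moves into arch_sync_dma_for_device() can be modeled as a small pure function. The following is a userspace sketch, not kernel code. As posted, the new switch statements test `direction` and pass `start`/`end`, none of which is declared in arch_sync_dma_for_device() or arch_sync_dma_for_cpu(); the parameters are `paddr`, `size` and `dir`. The sketch therefore uses the parameter names and tests alignment on `paddr` and `paddr + size`. The 32-byte line size, the function name `for_device_op` and the extra `DMA_CACHE_NONE` value are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same member order as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL,
	DMA_TO_DEVICE,
	DMA_FROM_DEVICE,
	DMA_NONE,
};

/* CLEAN/INVAL/FLUSH as in the patch; DMA_CACHE_NONE is a hypothetical extra. */
enum dma_cache_op {
	DMA_CACHE_NONE,
	DMA_CACHE_CLEAN,
	DMA_CACHE_INVAL,
	DMA_CACHE_FLUSH,
};

/* Assumed line size; the real L1_CACHE_BYTES depends on the CPU. */
#define L1_CACHE_BYTES 32u

/* Hypothetical model of the choice made by arch_sync_dma_for_device(). */
static enum dma_cache_op for_device_op(uint64_t paddr, size_t size,
				       enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_FROM_DEVICE:
		/*
		 * A partial line at either end may also hold unrelated dirty
		 * CPU data, so write it back instead of discarding it.
		 */
		if ((paddr | (paddr + size)) & (L1_CACHE_BYTES - 1))
			return DMA_CACHE_FLUSH;
		return DMA_CACHE_INVAL;
	case DMA_TO_DEVICE:
		return DMA_CACHE_CLEAN;	/* writeback only */
	case DMA_BIDIRECTIONAL:
		return DMA_CACHE_FLUSH;	/* writeback and invalidate */
	default:
		/* The kernel calls BUG() for DMA_NONE; assert() stands in. */
		assert(!"invalid DMA direction");
		return DMA_CACHE_NONE;
	}
}
```

Under the 32-byte assumption, a 64-byte receive buffer at physical address 0x1000 is only invalidated, while the same buffer at 0x1004 begins and ends in partial lines and is flushed instead.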
* [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing
@ 2023-03-27 12:13 ` Arnd Bergmann
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
architectures. Reduce it to what everyone else does:
- No flush is needed after data has been sent to a device
- When data has been received from a device, the cache only needs to
be invalidated to clear out cache lines that were speculatively
prefetched.
In particular, the second flushing of partial cache lines of bidirectional
buffers is actively harmful -- if a single cache line is written by both
the CPU and the device, flushing it again does not maintain coherency
but instead overwrites the data that was just received from the device.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/powerpc/mm/dma-noncoherent.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index f10869d27de5..e108cacf877f 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -132,21 +132,11 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 	switch (direction) {
 	case DMA_NONE:
 		BUG();
-	case DMA_FROM_DEVICE:
-		/*
-		 * invalidate only when cache-line aligned otherwise there is
-		 * the potential for discarding uncommitted data from the cache
-		 */
-		if ((start | end) & (L1_CACHE_BYTES - 1))
-			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
-		else
-			__dma_phys_op(start, end, DMA_CACHE_INVAL);
-		break;
-	case DMA_TO_DEVICE: /* writeback only */
-		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
+	case DMA_TO_DEVICE:
 		break;
-	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
-		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		__dma_phys_op(start, end, DMA_CACHE_INVAL);
 		break;
 	}
 }
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 456+ messages in thread
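After this patch the for_cpu decision no longer depends on alignment, only on the direction, so it reduces to the small table sketched below. This is a userspace model, not kernel code; the function name `for_cpu_op` and the extra `DMA_CACHE_NONE` value ("do nothing") are hypothetical.

```c
#include <assert.h>

/* Same member order as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL,
	DMA_TO_DEVICE,
	DMA_FROM_DEVICE,
	DMA_NONE,
};

/* CLEAN/INVAL/FLUSH as in the patch; DMA_CACHE_NONE is a hypothetical extra. */
enum dma_cache_op {
	DMA_CACHE_NONE,
	DMA_CACHE_CLEAN,
	DMA_CACHE_INVAL,
	DMA_CACHE_FLUSH,
};

/* Hypothetical model of arch_sync_dma_for_cpu() after this patch. */
static enum dma_cache_op for_cpu_op(enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		/* The device only read the buffer; the cached copy is still valid. */
		return DMA_CACHE_NONE;
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		/* Discard lines the CPU may have prefetched while the DMA ran. */
		return DMA_CACHE_INVAL;
	default:
		assert(!"invalid DMA direction");	/* BUG() in the kernel */
		return DMA_CACHE_NONE;
	}
}
```

Because paddr and size no longer influence the result, the model takes only the direction.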
* Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing
@ 2023-03-27 12:56 ` Christophe Leroy
From: Christophe Leroy @ 2023-03-27 12:56 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
linux-xtensa@linux-xtensa.org
On 27/03/2023 at 14:13, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
> architectures. Reduce it to what everyone else does:
>
> - No flush is needed after data has been sent to a device
>
> - When data has been received from a device, the cache only needs to
>   be invalidated to clear out cache lines that were speculatively
>   prefetched.
>
> In particular, the second flushing of partial cache lines of bidirectional
> buffers is actively harmful -- if a single cache line is written by both
> the CPU and the device, flushing it again does not maintain coherency
> but instead overwrites the data that was just received from the device.
Hmm... who is right?
That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent
memory corruption due to cache invalidation of unaligned DMA buffer").
I think your commit log should explain why that commit was wrong, and
maybe say that your patch is a revert of that commit?
Christophe
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/powerpc/mm/dma-noncoherent.c | 18 ++++--------------
>  1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index f10869d27de5..e108cacf877f 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -132,21 +132,11 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>  	switch (direction) {
>  	case DMA_NONE:
>  		BUG();
> -	case DMA_FROM_DEVICE:
> -		/*
> -		 * invalidate only when cache-line aligned otherwise there is
> -		 * the potential for discarding uncommitted data from the cache
> -		 */
> -		if ((start | end) & (L1_CACHE_BYTES - 1))
> -			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> -		else
> -			__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	case DMA_TO_DEVICE: /* writeback only */
> -		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
> +	case DMA_TO_DEVICE:
>  		break;
> -	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
> -		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  		break;
>  	}
>  }
^ permalink raw reply	[flat|nested] 456+ messages in thread
* Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing
2023-03-27 12:56 ` Christophe Leroy
@ 2023-03-27 13:02 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 13:02 UTC (permalink / raw)
To: Christophe Leroy, Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij,
Catalin Marinas, Will Deacon, guoren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S . Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
linux-oxnas@groups.io, linux-csky@vger.kernel.org,
linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org,
linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org,
sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org

On Mon, Mar 27, 2023, at 14:56, Christophe Leroy wrote:
> On 27/03/2023 at 14:13, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
>> architectures. Reduce it to what everyone else does:
>>
>>  - No flush is needed after data has been sent to a device
>>
>>  - When data has been received from a device, the cache only needs to
>>    be invalidated to clear out cache lines that were speculatively
>>    prefetched.
>>
>> In particular, the second flushing of partial cache lines of bidirectional
>> buffers is actively harmful -- if a single cache line is written by both
>> the CPU and the device, flushing it again does not maintain coherency
>> but instead overwrite the data that was just received from the device.
>
> Hum ..... Who is right ?
>
> That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent
> memory corruption due to cache invalidation of unaligned DMA buffer")
>
> I think your commit log should explain why that commit was wrong, and
> maybe say that your patch is a revert of that commit ?

Ok, I'll try to explain this better.

To clarify here: the __dma_sync() function in commit 03d70617b8a7
is used both before and after a DMA, but my patch 05/21 splits this
in two, and patch 06/21 only changes the part that gets called after
the DMA-from-device. It leaves the part before DMA-from-device
unchanged, and that is the part Andrew's patch addressed.

As I mentioned in the cover letter, it is still unclear whether we
want to consider this the expected behavior, as the documentation is
ambiguous here, but my series does not attempt to answer that question.

      Arnd
* Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing @ 2023-03-27 13:02 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 13:02 UTC (permalink / raw) To: Christophe Leroy, Arnd Bergmann, linux-kernel@vger.kernel.org Cc: Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org On Mon, Mar 27, 2023, at 14:56, Christophe Leroy wrote: > Le 27/03/2023 à 14:13, Arnd Bergmann a écrit : >> From: Arnd Bergmann <arnd@arndb.de> >> >> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other >> architectures. Reduce it to what everyone else does: >> >> - No flush is needed after data has been sent to a device >> >> - When data has been received from a device, the cache only needs to >> be invalidated to clear out cache lines that were speculatively >> prefetched. >> >> In particular, the second flushing of partial cache lines of bidirectional >> buffers is actively harmful -- if a single cache line is written by both >> the CPU and the device, flushing it again does not maintain coherency >> but instead overwrite the data that was just received from the device. > > Hum ..... Who is right ? 
> > That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent > memory corruption due to cache invalidation of unaligned DMA buffer") > > I think your commit log should explain why that commit was wrong, and > maybe say that your patch is a revert of that commit ? Ok, I'll try to explain this better. To clarify here: the __dma_sync() function in commit 03d70617b8a7 is used both before and after a DMA, but my patch 05/21 splits this in two, and patch 06/21 only changes the part that gets called after the DMA-from-device but leaves the part before DMA-from-device unchanged, which Andrew's patch addressed. As I mentioned in the cover letter, it is still unclear whether we want to consider this the expected behavior as the documentation seems unclear, but my series does not attempt to answer that question. Arnd _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing @ 2023-03-27 13:02 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 13:02 UTC (permalink / raw) To: Christophe Leroy, Arnd Bergmann, linux-kernel@vger.kernel.org Cc: Rich Felker, linux-sh@vger.kernel.org, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor.Dooley, guoren, sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, linux-csky@vger.kernel.org, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc@lists.infradead.org, linux-xtensa@linux-xtensa.org, Brian Cain, Lad, Prabhakar, linux-m68k@lists.linux-m68k.org, Paul Walmsley, Stafford Horne, linux-arm-kernel@lists.infradead.org, Neil Armstrong, Michal Simek, Thomas Bogendoerfer, linux-parisc@vger.kernel.org, linux-openrisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-mips@vger.kernel.org, Dinh Nguyen, Palmer Dabbelt, linux-hexagon@vger.kernel.org, linux-oxnas@groups.io, Robin Murphy, David S . Miller On Mon, Mar 27, 2023, at 14:56, Christophe Leroy wrote: > Le 27/03/2023 à 14:13, Arnd Bergmann a écrit : >> From: Arnd Bergmann <arnd@arndb.de> >> >> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other >> architectures. Reduce it to what everyone else does: >> >> - No flush is needed after data has been sent to a device >> >> - When data has been received from a device, the cache only needs to >> be invalidated to clear out cache lines that were speculatively >> prefetched. >> >> In particular, the second flushing of partial cache lines of bidirectional >> buffers is actively harmful -- if a single cache line is written by both >> the CPU and the device, flushing it again does not maintain coherency >> but instead overwrite the data that was just received from the device. > > Hum ..... Who is right ? 
> > That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent > memory corruption due to cache invalidation of unaligned DMA buffer") > > I think your commit log should explain why that commit was wrong, and > maybe say that your patch is a revert of that commit ? Ok, I'll try to explain this better. To clarify here: the __dma_sync() function in commit 03d70617b8a7 is used both before and after a DMA, but my patch 05/21 splits this in two, and patch 06/21 only changes the part that gets called after the DMA-from-device but leaves the part before DMA-from-device unchanged, which Andrew's patch addressed. As I mentioned in the cover letter, it is still unclear whether we want to consider this the expected behavior as the documentation seems unclear, but my series does not attempt to answer that question. Arnd ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing @ 2023-03-27 13:02 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 13:02 UTC (permalink / raw) To: Christophe Leroy, Arnd Bergmann, linux-kernel@vger.kernel.org Cc: Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org On Mon, Mar 27, 2023, at 14:56, Christophe Leroy wrote: > Le 27/03/2023 à 14:13, Arnd Bergmann a écrit : >> From: Arnd Bergmann <arnd@arndb.de> >> >> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other >> architectures. Reduce it to what everyone else does: >> >> - No flush is needed after data has been sent to a device >> >> - When data has been received from a device, the cache only needs to >> be invalidated to clear out cache lines that were speculatively >> prefetched. >> >> In particular, the second flushing of partial cache lines of bidirectional >> buffers is actively harmful -- if a single cache line is written by both >> the CPU and the device, flushing it again does not maintain coherency >> but instead overwrite the data that was just received from the device. > > Hum ..... Who is right ? 
> > That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent > memory corruption due to cache invalidation of unaligned DMA buffer") > > I think your commit log should explain why that commit was wrong, and > maybe say that your patch is a revert of that commit ? Ok, I'll try to explain this better. To clarify here: the __dma_sync() function in commit 03d70617b8a7 is used both before and after a DMA, but my patch 05/21 splits this in two, and patch 06/21 only changes the part that gets called after the DMA-from-device but leaves the part before DMA-from-device unchanged, which Andrew's patch addressed. As I mentioned in the cover letter, it is still unclear whether we want to consider this the expected behavior as the documentation seems unclear, but my series does not attempt to answer that question. Arnd _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply [flat|nested] 456+ messages in thread
* [PATCH 07/21] powerpc: dma-mapping: always clean cache in _for_device() op
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>

The powerpc implementation of arch_sync_dma_for_device() is unique in that
it sometimes performs a full flush for the
arch_sync_dma_for_device(paddr, size, DMA_FROM_DEVICE) operation when the
address is unaligned, but otherwise invalidates the caches.

Since the _for_cpu() counterpart has to invalidate the cache already
in order to avoid stale data from prefetching, this operation only really
needs to ensure that there are no dirty cache lines, which can be done
using either invalidation or cleaning the cache, but not necessarily both.

Most architectures traditionally go for invalidation here, but as Will
Deacon points out, this can leak old data to user space if a DMA is
started but the device ends up not actually filling the entire buffer,
see the link below. The same argument applies to DMA_BIDIRECTIONAL
transfers. Using a cache-clean operation is the safe choice here, followed
by invalidating the cache after the DMA to get rid of stale data that was
prefetched before the completion of the DMA.

Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/powerpc/mm/dma-noncoherent.c | 21 +--------------------
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index e108cacf877f..00e59a4faa2b 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -104,26 +104,7 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	switch (direction) {
-	case DMA_NONE:
-		BUG();
-	case DMA_FROM_DEVICE:
-		/*
-		 * invalidate only when cache-line aligned otherwise there is
-		 * the potential for discarding uncommitted data from the cache
-		 */
-		if ((start | end) & (L1_CACHE_BYTES - 1))
-			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
-		else
-			__dma_phys_op(start, end, DMA_CACHE_INVAL);
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
-		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
-		break;
-	case DMA_BIDIRECTIONAL:	/* writeback and invalidate */
-		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
-		break;
-	}
+	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
 }
 
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 456+ messages in thread
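To make the Will Deacon argument in PATCH 07/21 above concrete, here is a toy user-space simulation in C. It is a sketch under stated assumptions, not kernel code: `struct line`, `memory`, `cache_clean()`, `cache_inval()` and `dma_from_device()` are hypothetical names, the model has a single write-back cache line, and it ignores speculative prefetching. The driver zeroes the buffer (the zeroes only reach the cache), the for_device() step either invalidates or cleans, and the device then writes only part of the buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model, not kernel code: one write-back cache line of
 * LINE bytes in front of main memory. No speculative prefetch. */
#define LINE 8

struct line {
    unsigned char data[LINE];
    int valid, dirty;
};

static unsigned char memory[LINE];

/* Clean: write dirty data back to memory, keep the (now clean) line. */
static void cache_clean(struct line *c)
{
    if (c->valid && c->dirty)
        memcpy(memory, c->data, LINE);
    c->dirty = 0;
}

/* Invalidate: drop the line, discarding any dirty data in it. */
static void cache_inval(struct line *c)
{
    c->valid = 0;
    c->dirty = 0;
}

/*
 * Simulate one DMA_FROM_DEVICE transfer into a buffer that the driver
 * zeroed first, where the device writes only the first `filled` bytes.
 * `before` stands in for the for_device() maintenance; the for_cpu()
 * side always invalidates. The CPU's view of the buffer goes to `out`.
 */
static void dma_from_device(void (*before)(struct line *), int filled,
                            unsigned char out[LINE])
{
    struct line c = {{0}, 0, 0};

    assert(filled >= 0 && filled <= LINE);
    memset(memory, 0xAA, LINE);     /* stale data from a previous user  */
    memset(c.data, 0x00, LINE);     /* driver zeroes the buffer ...      */
    c.valid = c.dirty = 1;          /* ... which only reaches the cache  */
    before(&c);                     /* for_device(): clean or invalidate */
    memset(memory, 0x55, filled);   /* device fills part of the buffer   */
    cache_inval(&c);                /* for_cpu(): drop prefetched lines  */
    memcpy(out, memory, LINE);      /* CPU load misses, refills from RAM */
}
```

With an 8-byte line and `filled = 4`, calling `dma_from_device(cache_inval, 4, buf)` leaves the stale 0xAA bytes from the previous user visible in the unfilled tail, whereas `dma_from_device(cache_clean, 4, buf)` leaves the driver's zeroes there. In this model that is the leak the commit message describes, and why cleaning is the safer choice for the for_device() step.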
* [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>

No other architecture intentionally writes back dirty cache lines into
a buffer that a device has just finished writing into. If the cache is
clean, this has no effect at all, but if a cacheline in the buffer has
actually been written by the CPU, there is a driver bug that is likely
made worse by overwriting that buffer.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index d919efab6eba..640f4c496d26 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
 		break;
 	default:
 		break;
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 456+ messages in thread
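The failure mode PATCH 08/21 above describes can be sketched with a similar toy model in C. Again this is hypothetical and not the riscv `ALT_CMO_OP()` code: one write-back cache line, invented helper names (`struct line`, `memory`, `cache_flush()`, `dma_with_stray_cpu_write()` and friends), and a buggy driver that stores into the buffer while the device owns it. The for_cpu() step either flushes (clean followed by invalidate) or only invalidates.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model, not kernel code: one write-back cache line of
 * LINE bytes in front of main memory. */
#define LINE 8

struct line {
    unsigned char data[LINE];
    int valid, dirty;
};

static unsigned char memory[LINE];

/* Clean: write dirty data back to memory, keep the (now clean) line. */
static void cache_clean(struct line *c)
{
    if (c->valid && c->dirty)
        memcpy(memory, c->data, LINE);
    c->dirty = 0;
}

/* Invalidate: drop the line, discarding any dirty data in it. */
static void cache_inval(struct line *c)
{
    c->valid = 0;
    c->dirty = 0;
}

/* Flush: clean (write back) followed by invalidate. */
static void cache_flush(struct line *c)
{
    cache_clean(c);
    cache_inval(c);
}

/*
 * Simulate a DMA_FROM_DEVICE transfer during which a buggy driver stores
 * into the buffer while the device owns it. `after` stands in for the
 * for_cpu() maintenance. Returns the first byte the CPU reads afterwards.
 */
static unsigned char dma_with_stray_cpu_write(void (*after)(struct line *))
{
    struct line c = {{0}, 0, 0};

    assert(after != NULL);
    memset(memory, 0x00, LINE);
    cache_clean(&c);                /* for_device(): nothing dirty yet  */
    memset(c.data, 0x11, LINE);     /* stray CPU store during the DMA   */
    c.valid = c.dirty = 1;
    memset(memory, 0x55, LINE);     /* device writes the whole buffer   */
    after(&c);                      /* for_cpu(): flush or invalidate   */
    if (!c.valid) {                 /* load misses: refill from memory  */
        memcpy(c.data, memory, LINE);
        c.valid = 1;
    }
    return c.data[0];
}
```

In this model, `dma_with_stray_cpu_write(cache_flush)` returns 0x11 and leaves 0x11 in memory: the flush wrote the stray CPU store back over the 0x55 the device had delivered. `dma_with_stray_cpu_write(cache_inval)` returns 0x55, because invalidation discards the stray store and the device data survives. The driver is still buggy either way; invalidating just avoids destroying the data that actually arrived.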
* [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> No other architecture intentionally writes back dirty cache lines into a buffer that a device has just finished writing into. If the cache is clean, this has no effect at all, but if a cacheline in the buffer has actually been written by the CPU, there is a drive bug that is likely made worse by overwriting that buffer. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/riscv/mm/dma-noncoherent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index d919efab6eba..640f4c496d26 100644 --- a/arch/riscv/mm/dma-noncoherent.c +++ b/arch/riscv/mm/dma-noncoherent.c @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, break; case DMA_FROM_DEVICE: case DMA_BIDIRECTIONAL: - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); break; default: break; -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Sime k, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy, David S. Miller From: Arnd Bergmann <arnd@arndb.de> No other architecture intentionally writes back dirty cache lines into a buffer that a device has just finished writing into. If the cache is clean, this has no effect at all, but if a cacheline in the buffer has actually been written by the CPU, there is a drive bug that is likely made worse by overwriting that buffer. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/riscv/mm/dma-noncoherent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index d919efab6eba..640f4c496d26 100644 --- a/arch/riscv/mm/dma-noncoherent.c +++ b/arch/riscv/mm/dma-noncoherent.c @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, break; case DMA_FROM_DEVICE: case DMA_BIDIRECTIONAL: - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); break; default: break; -- 2.39.2 ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> No other architecture intentionally writes back dirty cache lines into a buffer that a device has just finished writing into. If the cache is clean, this has no effect at all, but if a cacheline in the buffer has actually been written by the CPU, there is a drive bug that is likely made worse by overwriting that buffer. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/riscv/mm/dma-noncoherent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index d919efab6eba..640f4c496d26 100644 --- a/arch/riscv/mm/dma-noncoherent.c +++ b/arch/riscv/mm/dma-noncoherent.c @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, break; case DMA_FROM_DEVICE: case DMA_BIDIRECTIONAL: - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); break; default: break; -- 2.39.2 _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply related [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush
From: Conor Dooley @ 2023-03-29 20:48 UTC
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, samuel

On Mon, Mar 27, 2023 at 02:13:04PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> No other architecture intentionally writes back dirty cache lines into
> a buffer that a device has just finished writing into. If the cache is
> clean, this has no effect at all, but

> if a cacheline in the buffer has
> actually been written by the CPU, there is a drive bug that is likely
> made worse by overwriting that buffer.

So does this need a
Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using
zicbom extension")
then, even if the cacheline really should not have been touched by the
CPU?
Also, minor typo, s/drive/driver/.

In the thread we had that sparked this, I went digging for the source of
the flushes, and it came from a review comment:
https://lore.kernel.org/linux-riscv/342e3c12-ebb0-badf-7d4c-c444a2b842b2@sholland.org/

But *surely* if no other arch needs to do that, then we are safe to also
not do it... Your logic seems right by me at least, especially given the
lack of flushes elsewhere.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Cheers,
Conor.

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/riscv/mm/dma-noncoherent.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index d919efab6eba..640f4c496d26 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>  		break;
>  	case DMA_FROM_DEVICE:
>  	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  		break;
>  	default:
>  		break;
> -- 
> 2.39.2
>
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush
From: Arnd Bergmann @ 2023-03-30 7:10 UTC
To: Conor Dooley, Arnd Bergmann
Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S . Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa, Samuel Holland

On Wed, Mar 29, 2023, at 22:48, Conor Dooley wrote:
> On Mon, Mar 27, 2023 at 02:13:04PM +0200, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> No other architecture intentionally writes back dirty cache lines into
>> a buffer that a device has just finished writing into. If the cache is
>> clean, this has no effect at all, but
>
>> if a cacheline in the buffer has
>> actually been written by the CPU, there is a drive bug that is likely
>> made worse by overwriting that buffer.
>
> So does this need a
> Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using
> zicbom extension")
> then, even if the cacheline really should not have been touched by the
> CPU?
> Also, minor typo, s/drive/driver/.

done

> In the thread we had that sparked this, I went digging for the source of
> the flushes, and it came from a review comment:
> https://lore.kernel.org/linux-riscv/342e3c12-ebb0-badf-7d4c-c444a2b842b2@sholland.org/

Ah, so the comment that led to it was

"For arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL), we expect the CPU to have
written to the buffer, so this should flush, not invalidate."

which sounds like Samuel just misunderstood what "bidirectional" means:
the comment implies that both the cpu and the device access the buffer
before arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL), but this is not allowed.
Instead, the point is that the device may both read and write the buffer,
requiring that we must do a writeback at
arch_sync_dma_for_device(DMA_BIDIRECTIONAL) and an invalidate at
arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL).

The comment about arch_sync_dma_for_device(DMA_FROM_DEVICE) (in the same
email) seems equally confused. It's of course easy to misunderstand these,
and many others have gotten confused in similar ways before.

> But *surely* if no other arch needs to do that, then we are safe to also
> not do it... Your logic seems right by me at least, especially given the
> lack of flushes elsewhere.

Right, I remove the extra writeback from powerpc, parisc and microblaze
for the same reason. Those appear to only be there because they used the
same function for _for_device() as for _for_cpu(), not because someone
thought they were required.

> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Thanks!

     Arnd
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-30 7:10 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-30 7:10 UTC (permalink / raw) To: Conor Dooley, Arnd Bergmann Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa, Samuel Holland On Wed, Mar 29, 2023, at 22:48, Conor Dooley wrote: > On Mon, Mar 27, 2023 at 02:13:04PM +0200, Arnd Bergmann wrote: >> From: Arnd Bergmann <arnd@arndb.de> >> >> No other architecture intentionally writes back dirty cache lines into >> a buffer that a device has just finished writing into. If the cache is >> clean, this has no effect at all, but > >> if a cacheline in the buffer has >> actually been written by the CPU, there is a drive bug that is likely >> made worse by overwriting that buffer. > > So does this need a > Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using > zicbom extension") > then, even if the cacheline really should not have been touched by the > CPU? > Also, minor typo, s/drive/driver/. 
done > In the thread we had that sparked this, I went digging for the source of > the flushes, and it came from a review comment: > https://lore.kernel.org/linux-riscv/342e3c12-ebb0-badf-7d4c-c444a2b842b2@sholland.org/ Ah, so the comment that led to it was "For arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL), we expect the CPU to have written to the buffer, so this should flush, not invalidate." which sounds like Samuel just misunderstood what "bidirectional" means: the comment implies that both the cpu and the device access the buffer before arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL), but this is not allowed. Instead, the point is that the device may both read and write the buffer, requiring that we must do a writeback at arch_sync_dma_for_device(DMA_BIDIRECTIONAL) and an invalidate at arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL). The comment about arch_sync_dma_for_device(DMA_FROM_DEVICE) (in the same email) seems equally confused. It's of course easy to misunderstand these, and many others have gotten confused in similar ways before. > But *surely* if no other arch needs to do that, then we are safe to also > not do it... Your logic seems right by me at least, especially given the > lack of flushes elsewhere. Right, I remove the extra writeback from powerpc, parisc and microblaze for the same reason. Those appear to only be there because they used the same function for _for_device() as for _for_cpu(), not because someone thought they were required. > Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Thanks! Arnd _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-30 7:10 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-30 7:10 UTC (permalink / raw) To: Conor Dooley, Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor.Dooley, guoren, linux-csky@vger.kernel.org, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Samuel Holland, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Brian Cain, Lad, Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Simek, Thomas Bogendoerfer, linux-parisc, linux-openrisc@vger.kernel.org, linuxppc-dev, linux-kernel, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas@groups.io, Robin Murphy, David S . Miller On Wed, Mar 29, 2023, at 22:48, Conor Dooley wrote: > On Mon, Mar 27, 2023 at 02:13:04PM +0200, Arnd Bergmann wrote: >> From: Arnd Bergmann <arnd@arndb.de> >> >> No other architecture intentionally writes back dirty cache lines into >> a buffer that a device has just finished writing into. If the cache is >> clean, this has no effect at all, but > >> if a cacheline in the buffer has >> actually been written by the CPU, there is a drive bug that is likely >> made worse by overwriting that buffer. > > So does this need a > Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using > zicbom extension") > then, even if the cacheline really should not have been touched by the > CPU? > Also, minor typo, s/drive/driver/. 
done > In the thread we had that sparked this, I went digging for the source of > the flushes, and it came from a review comment: > https://lore.kernel.org/linux-riscv/342e3c12-ebb0-badf-7d4c-c444a2b842b2@sholland.org/ Ah, so the comment that led to it was "For arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL), we expect the CPU to have written to the buffer, so this should flush, not invalidate." which sounds like Samuel just misunderstood what "bidirectional" means: the comment implies that both the cpu and the device access the buffer before arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL), but this is not allowed. Instead, the point is that the device may both read and write the buffer, requiring that we must do a writeback at arch_sync_dma_for_device(DMA_BIDIRECTIONAL) and an invalidate at arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL). The comment about arch_sync_dma_for_device(DMA_FROM_DEVICE) (in the same email) seems equally confused. It's of course easy to misunderstand these, and many others have gotten confused in similar ways before. > But *surely* if no other arch needs to do that, then we are safe to also > not do it... Your logic seems right by me at least, especially given the > lack of flushes elsewhere. Right, I remove the extra writeback from powerpc, parisc and microblaze for the same reason. Those appear to only be there because they used the same function for _for_device() as for _for_cpu(), not because someone thought they were required. > Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Thanks! Arnd ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-30 7:10 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-30 7:10 UTC (permalink / raw) To: Conor Dooley, Arnd Bergmann Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa, Samuel Holland On Wed, Mar 29, 2023, at 22:48, Conor Dooley wrote: > On Mon, Mar 27, 2023 at 02:13:04PM +0200, Arnd Bergmann wrote: >> From: Arnd Bergmann <arnd@arndb.de> >> >> No other architecture intentionally writes back dirty cache lines into >> a buffer that a device has just finished writing into. If the cache is >> clean, this has no effect at all, but > >> if a cacheline in the buffer has >> actually been written by the CPU, there is a drive bug that is likely >> made worse by overwriting that buffer. > > So does this need a > Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using > zicbom extension") > then, even if the cacheline really should not have been touched by the > CPU? > Also, minor typo, s/drive/driver/. 
done > In the thread we had that sparked this, I went digging for the source of > the flushes, and it came from a review comment: > https://lore.kernel.org/linux-riscv/342e3c12-ebb0-badf-7d4c-c444a2b842b2@sholland.org/ Ah, so the comment that led to it was "For arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL), we expect the CPU to have written to the buffer, so this should flush, not invalidate." which sounds like Samuel just misunderstood what "bidirectional" means: the comment implies that both the cpu and the device access the buffer before arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL), but this is not allowed. Instead, the point is that the device may both read and write the buffer, requiring that we must do a writeback at arch_sync_dma_for_device(DMA_BIDIRECTIONAL) and an invalidate at arch_sync_dma_for_cpu(DMA_BIDIRECTIONAL). The comment about arch_sync_dma_for_device(DMA_FROM_DEVICE) (in the same email) seems equally confused. It's of course easy to misunderstand these, and many others have gotten confused in similar ways before. > But *surely* if no other arch needs to do that, then we are safe to also > not do it... Your logic seems right by me at least, especially given the > lack of flushes elsewhere. Right, I remove the extra writeback from powerpc, parisc and microblaze for the same reason. Those appear to only be there because they used the same function for _for_device() as for _for_cpu(), not because someone thought they were required. > Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Thanks! Arnd _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-29 21:51 ` Jessica Clarke
-1 siblings, 0 replies; 456+ messages in thread
From: Jessica Clarke @ 2023-03-29 21:51 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Linux Kernel Mailing List, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren,
Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, Linux ARM, linux-oxnas, linux-csky, linux-hexagon,
linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa
On 27 Mar 2023, at 13:13, Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> No other architecture intentionally writes back dirty cache lines into
> a buffer that a device has just finished writing into. If the cache is
> clean, this has no effect at all, but if a cacheline in the buffer has
> actually been written by the CPU, there is a drive bug that is likely
> made worse by overwriting that buffer.
FYI [1] proposed this same change a while ago but its justification was
flawed (which was my objection at the time, not the diff itself).
Jess
[1] https://lore.kernel.org/all/20220818165105.99746-1-s.miroshnichenko@yadro.com
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> arch/riscv/mm/dma-noncoherent.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index d919efab6eba..640f4c496d26 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> 		break;
> 	case DMA_FROM_DEVICE:
> 	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> 		break;
> 	default:
> 		break;
> --
> 2.39.2
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 456+ messages in thread
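The commit message quoted above claims that the post-DMA flush has no effect for a clean cache line but overwrites device data when the CPU left the line dirty. A toy single-line cache model makes that concrete. This is a hypothetical C sketch, not the kernel's ALT_CMO_OP machinery: the 8-byte line, `struct cache_line`, and the `cmo_*()`, `cpu_read()` and `read_after_dma()` helpers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 8 /* toy cache-line size in bytes (assumption) */

/* A single cache line shadowing a LINE_SIZE-byte DMA buffer. */
struct cache_line {
	bool valid;
	bool dirty;
	unsigned char data[LINE_SIZE];
};

/* Clean: write a dirty line back to memory and keep it cached. */
static void cmo_clean(struct cache_line *c, unsigned char *mem)
{
	if (c->valid && c->dirty) {
		memcpy(mem, c->data, LINE_SIZE);
		c->dirty = false;
	}
}

/* Invalidate: discard the line without writing it back. */
static void cmo_inval(struct cache_line *c)
{
	c->valid = false;
	c->dirty = false;
}

/* Flush: clean, then invalidate. */
static void cmo_flush(struct cache_line *c, unsigned char *mem)
{
	cmo_clean(c, mem);
	cmo_inval(c);
}

/* CPU load: refill the line from memory on a miss. */
static unsigned char cpu_read(struct cache_line *c, const unsigned char *mem,
			      int i)
{
	assert(i >= 0 && i < LINE_SIZE);
	if (!c->valid) {
		memcpy(c->data, mem, LINE_SIZE);
		c->valid = true;
		c->dirty = false;
	}
	return c->data[i];
}

/*
 * The CPU holds 0xAA for the buffer (dirty only if it wrote it, which
 * would be a driver bug); the device then DMA-writes 0x55 to memory.
 * Returns the byte the CPU sees after the post-DMA sync.
 */
static unsigned char read_after_dma(bool cpu_dirtied, bool use_flush)
{
	unsigned char mem[LINE_SIZE];
	struct cache_line c = { .valid = true, .dirty = cpu_dirtied };

	memset(c.data, 0xAA, LINE_SIZE);
	memset(mem, 0x55, LINE_SIZE); /* device write, bypassing the cache */

	if (use_flush)
		cmo_flush(&c, mem); /* riscv behaviour before this patch */
	else
		cmo_inval(&c);      /* behaviour after this patch */
	return cpu_read(&c, mem, 0);
}
```

With a clean line both choices return the device's 0x55, the "no effect at all" case. Only in the buggy dirty-line case does the flush replace the device's data with the CPU's stale 0xAA. The model ignores evictions that could happen during the transfer itself.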
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-30 12:59 ` Lad, Prabhakar
-1 siblings, 0 replies; 456+ messages in thread
From: Lad, Prabhakar @ 2023-03-30 12:59 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren,
Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
On Mon, Mar 27, 2023 at 1:16 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> No other architecture intentionally writes back dirty cache lines into
> a buffer that a device has just finished writing into. If the cache is
> clean, this has no effect at all, but if a cacheline in the buffer has
> actually been written by the CPU, there is a drive bug that is likely
> made worse by overwriting that buffer.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> arch/riscv/mm/dma-noncoherent.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Cheers,
Prabhakar
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index d919efab6eba..640f4c496d26 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> 		break;
> 	case DMA_FROM_DEVICE:
> 	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> 		break;
> 	default:
> 		break;
> --
> 2.39.2
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-30 12:59 ` Lad, Prabhakar 0 siblings, 0 replies; 456+ messages in thread From: Lad, Prabhakar @ 2023-03-30 12:59 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Mon, Mar 27, 2023 at 1:16 PM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > No other architecture intentionally writes back dirty cache lines into > a buffer that a device has just finished writing into. If the cache is > clean, this has no effect at all, but if a cacheline in the buffer has > actually been written by the CPU, there is a drive bug that is likely > made worse by overwriting that buffer. 
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/riscv/mm/dma-noncoherent.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Cheers, Prabhakar > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > index d919efab6eba..640f4c496d26 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > break; > case DMA_FROM_DEVICE: > case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > break; > default: > break; > -- > 2.39.2 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-30 12:59 ` Lad, Prabhakar 0 siblings, 0 replies; 456+ messages in thread From: Lad, Prabhakar @ 2023-03-30 12:59 UTC (permalink / raw) To: Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong <neil.armstr On Mon, Mar 27, 2023 at 1:16 PM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > No other architecture intentionally writes back dirty cache lines into > a buffer that a device has just finished writing into. If the cache is > clean, this has no effect at all, but if a cacheline in the buffer has > actually been written by the CPU, there is a drive bug that is likely > made worse by overwriting that buffer. 
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/riscv/mm/dma-noncoherent.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Cheers, Prabhakar > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > index d919efab6eba..640f4c496d26 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > break; > case DMA_FROM_DEVICE: > case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > break; > default: > break; > -- > 2.39.2 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-30 12:59 ` Lad, Prabhakar 0 siblings, 0 replies; 456+ messages in thread From: Lad, Prabhakar @ 2023-03-30 12:59 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Mon, Mar 27, 2023 at 1:16 PM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > No other architecture intentionally writes back dirty cache lines into > a buffer that a device has just finished writing into. If the cache is > clean, this has no effect at all, but if a cacheline in the buffer has > actually been written by the CPU, there is a drive bug that is likely > made worse by overwriting that buffer. 
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/riscv/mm/dma-noncoherent.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Cheers, Prabhakar > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > index d919efab6eba..640f4c496d26 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > break; > case DMA_FROM_DEVICE: > case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > break; > default: > break; > -- > 2.39.2 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush @ 2023-03-30 12:59 ` Lad, Prabhakar 0 siblings, 0 replies; 456+ messages in thread From: Lad, Prabhakar @ 2023-03-30 12:59 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Mon, Mar 27, 2023 at 1:16 PM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > No other architecture intentionally writes back dirty cache lines into > a buffer that a device has just finished writing into. If the cache is > clean, this has no effect at all, but if a cacheline in the buffer has > actually been written by the CPU, there is a drive bug that is likely > made worse by overwriting that buffer. 
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/riscv/mm/dma-noncoherent.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Cheers, Prabhakar > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > index d919efab6eba..640f4c496d26 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > break; > case DMA_FROM_DEVICE: > case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > break; > default: > break; > -- > 2.39.2 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush
From: Palmer Dabbelt @ 2023-04-19 14:22 UTC
To: arnd
Cc: linux-kernel, Arnd Bergmann, vgupta, linux, neil.armstrong,
linus.walleij, Catalin Marinas, Will Deacon, guoren, bcain, geert,
monstr, tsbogend, dinguyen, shorne, deller, mpe, christophe.leroy,
Paul Walmsley, dalias, glaubitz, davem, jcmvbkbc, Christoph Hellwig,
robin.murphy, prabhakar.mahadev-lad.rj, Conor Dooley, linux-snps-arc,
linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k,
linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv,
linux-sh, sparclinux, linux-xtensa

On Mon, 27 Mar 2023 05:13:04 PDT (-0700), arnd@kernel.org wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> No other architecture intentionally writes back dirty cache lines into
> a buffer that a device has just finished writing into. If the cache is
> clean, this has no effect at all, but if a cacheline in the buffer has
> actually been written by the CPU, there is a driver bug that is likely
> made worse by overwriting that buffer.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/riscv/mm/dma-noncoherent.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index d919efab6eba..640f4c496d26 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>  		break;
>  	case DMA_FROM_DEVICE:
>  	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  		break;
>  	default:
>  		break;

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
* [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
From: Arnd Bergmann @ 2023-03-27 12:13 UTC
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

From: Arnd Bergmann <arnd@arndb.de>

For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned
first to let the device see data written by the CPU, and invalidated
after the transfer to let the CPU see data written by the device.

riscv also invalidates the caches before the transfer, which does
not appear to serve any purpose.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index 640f4c496d26..69c80b2155a1 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -25,7 +25,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
 		break;
 	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
 		break;
 	default:
 		break;
-- 
2.39.2
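The clean-before / invalidate-after argument can be checked with a toy C model. This is a hypothetical sketch, not kernel code, and it only demonstrates the reasoning under its own assumptions: one non-coherent write-back cache line over a 4-byte buffer, with clean meaning "write a dirty line back" and inval meaning "discard the line". The names struct toy_system, cmo_clean(), cmo_inval() and bidir_scenario() are invented for illustration. The flush_before flag models the old riscv behaviour for DMA_BIDIRECTIONAL in arch_sync_dma_for_device() (clean plus invalidate), and the prefetch flag models a speculative refill while the device owns the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model (illustration only, not kernel code): one write-back cache
 * line over a 4-byte buffer shared with a non-coherent device. */
#define LINE_SIZE 4

struct toy_system {
	unsigned char mem[LINE_SIZE];  /* RAM, as seen by the device */
	unsigned char line[LINE_SIZE]; /* the CPU's cached copy */
	bool valid, dirty;
};

/* Load the line from RAM (a normal miss or a speculative prefetch). */
static void refill(struct toy_system *s)
{
	memcpy(s->line, s->mem, LINE_SIZE);
	s->valid = true;
	s->dirty = false;
}

static void cpu_write(struct toy_system *s, int i, unsigned char v)
{
	assert(i >= 0 && i < LINE_SIZE);
	if (!s->valid)
		refill(s);
	s->line[i] = v;
	s->dirty = true;
}

static unsigned char cpu_read(struct toy_system *s, int i)
{
	assert(i >= 0 && i < LINE_SIZE);
	if (!s->valid)
		refill(s);
	return s->line[i];
}

/* clean: write a dirty line back to RAM, keep it cached. */
static void cmo_clean(struct toy_system *s)
{
	if (s->valid && s->dirty) {
		memcpy(s->mem, s->line, LINE_SIZE);
		s->dirty = false;
	}
}

/* inval: drop the cached copy without writing it back. */
static void cmo_inval(struct toy_system *s)
{
	s->valid = false;
	s->dirty = false;
}

/* One DMA_BIDIRECTIONAL round trip. The CPU writes a request to byte 0,
 * the device reads it and writes a reply to byte 1, and the CPU then reads
 * byte idx. flush_before selects clean plus invalidate before the transfer
 * instead of clean only; prefetch refills the line during the transfer. */
static unsigned char bidir_scenario(bool flush_before, bool prefetch, int idx)
{
	struct toy_system s;

	memset(&s, 0, sizeof(s));
	cpu_write(&s, 0, 0x10);

	/* sync for device: clean, optionally followed by invalidate */
	cmo_clean(&s);
	if (flush_before)
		cmo_inval(&s);

	if (prefetch)
		refill(&s);         /* line refilled with pre-DMA contents */

	s.mem[1] = (unsigned char)(s.mem[0] + 1);   /* device reply */

	/* sync for CPU: invalidate after the transfer */
	cmo_inval(&s);

	return cpu_read(&s, idx);
}
```

Every combination of flush_before and prefetch returns 0x11 for byte 1. In this model the clean is what makes the CPU's request visible to the device, and the invalidate after the transfer is what makes the device's reply visible to the CPU; an extra invalidate before the transfer does not change the outcome, since any line cached or refetched during the DMA is discarded afterwards anyway.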
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
From: Conor Dooley @ 2023-03-29 20:16 UTC
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 02:13:05PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned
> first to let the device see data written by the CPU, and invalidated
> after the transfer to let the CPU see data written by the device.
>
> riscv also invalidates the caches before the transfer, which does
> not appear to serve any purpose.

Rationale makes sense to me..
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Thanks for working on all of this Arnd!
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-03-29 20:16 ` Conor Dooley 0 siblings, 0 replies; 456+ messages in thread From: Conor Dooley @ 2023-03-29 20:16 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa [-- Attachment #1.1: Type: text/plain, Size: 556 bytes --] On Mon, Mar 27, 2023 at 02:13:05PM +0200, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned > first to let the device see data written by the CPU, and invalidated > after the transfer to let the CPU see data written by the device. > > riscv also invalidates the caches before the transfer, which does > not appear to serve any purpose. Rationale makes sense to me.. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Thanks for working on all of this Arnd! [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] [-- Attachment #2: Type: text/plain, Size: 176 bytes --] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-03-29 20:16 ` Conor Dooley 0 siblings, 0 replies; 456+ messages in thread From: Conor Dooley @ 2023-03-29 20:16 UTC (permalink / raw) To: Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong <neil.armstr [-- Attachment #1: Type: text/plain, Size: 556 bytes --] On Mon, Mar 27, 2023 at 02:13:05PM +0200, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned > first to let the device see data written by the CPU, and invalidated > after the transfer to let the CPU see data written by the device. > > riscv also invalidates the caches before the transfer, which does > not appear to serve any purpose. Rationale makes sense to me.. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Thanks for working on all of this Arnd! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
  2023-03-27 12:13 ` Arnd Bergmann
                   ` (3 preceding siblings ...)
  (?)
@ 2023-03-30 13:26 ` Lad, Prabhakar
  -1 siblings, 0 replies; 456+ messages in thread
From: Lad, Prabhakar @ 2023-03-30 13:26 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 1:16 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned
> first to let the device see data written by the CPU, and invalidated
> after the transfer to let the CPU see data written by the device.
>
> riscv also invalidates the caches before the transfer, which does
> not appear to serve any purpose.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/riscv/mm/dma-noncoherent.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

Cheers,
Prabhakar

> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 640f4c496d26..69c80b2155a1 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -25,7 +25,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>                 break;
>         case DMA_BIDIRECTIONAL:
> -               ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +               ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>                 break;
>         default:
>                 break;
> --
> 2.39.2
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 456+ messages in thread
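The one-line change in the diff above can also be read as a lookup table from DMA direction to the cache operation that arch_sync_dma_for_device() issues. The sketch below encodes that table with local stand-in enums. The direction names mirror the kernel's enum dma_data_direction, but the numeric values, the enum cmo type and riscv_for_device() are hypothetical. The TO_DEVICE and FROM_DEVICE entries are an assumption based on the surrounding riscv code shown in the diffs in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; names mirror the kernel's enum dma_data_direction. */
enum dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_NONE };

/* Hypothetical encoding of which ALT_CMO_OP() variant is issued. */
enum cmo { CMO_NONE, CMO_CLEAN, CMO_INVAL, CMO_FLUSH };

/*
 * Cache maintenance done before handing a buffer to the device.
 * `patched` selects the behaviour after PATCH 09/21, which only
 * changes the DMA_BIDIRECTIONAL case from flush to clean.
 */
static enum cmo riscv_for_device(enum dir d, bool patched)
{
	switch (d) {
	case DMA_TO_DEVICE:
	case DMA_FROM_DEVICE:
		return CMO_CLEAN;
	case DMA_BIDIRECTIONAL:
		return patched ? CMO_CLEAN : CMO_FLUSH;
	default:
		return CMO_NONE; /* any other direction: no maintenance */
	}
}
```

After the patch, every direction cleans before the transfer. Any invalidation is then left to the for_cpu side, after the device has finished.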
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
  2023-03-27 12:13 ` Arnd Bergmann
                   ` (3 preceding siblings ...)
  (?)
@ 2023-04-19 14:22 ` Palmer Dabbelt
  -1 siblings, 0 replies; 456+ messages in thread
From: Palmer Dabbelt @ 2023-04-19 14:22 UTC (permalink / raw)
To: arnd
Cc: linux-kernel, Arnd Bergmann, vgupta, linux, neil.armstrong,
linus.walleij, Catalin Marinas, Will Deacon, guoren, bcain, geert,
monstr, tsbogend, dinguyen, shorne, deller, mpe, christophe.leroy,
Paul Walmsley, dalias, glaubitz, davem, jcmvbkbc, Christoph Hellwig,
robin.murphy, prabhakar.mahadev-lad.rj, Conor Dooley, linux-snps-arc,
linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k,
linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv,
linux-sh, sparclinux, linux-xtensa

On Mon, 27 Mar 2023 05:13:05 PDT (-0700), arnd@kernel.org wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned
> first to let the device see data written by the CPU, and invalidated
> after the transfer to let the CPU see data written by the device.
>
> riscv also invalidates the caches before the transfer, which does
> not appear to serve any purpose.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/riscv/mm/dma-noncoherent.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 640f4c496d26..69c80b2155a1 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -25,7 +25,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>                 break;
>         case DMA_BIDIRECTIONAL:
> -               ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +               ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>                 break;
>         default:
>                 break;

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

^ permalink raw reply	[flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
  2023-03-27 12:13 ` Arnd Bergmann
                   ` (3 preceding siblings ...)
  (?)
@ 2023-05-05  5:47 ` Guo Ren
  -1 siblings, 0 replies; 456+ messages in thread
From: Guo Ren @ 2023-05-05  5:47 UTC (permalink / raw)
To: Arnd Bergmann, Arnd Bergmann, Christoph Hellwig
Cc: linux-kernel, Vineet Gupta, Will Deacon, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Brian Cain, Geert Uytterhoeven,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S. Miller, Max Filippov, Robin Murphy, Lad Prabhakar,
Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas,
linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned
> first to let the device see data written by the CPU, and invalidated
> after the transfer to let the CPU see data written by the device.
>
> riscv also invalidates the caches before the transfer, which does
> not appear to serve any purpose.
Yes, we can't guarantee that the CPU won't preload cache lines at random
while the DMA is in progress. But I have two reasons for keeping the
invalidation before the DMA transfer:
 - It explicitly tells the CPU that these cache lines are invalid, so the
   cache replacement algorithm would use these invalid slots first
   instead of evicting valid ones.
 - Invalidating is very cheap. In fact, flush and clean have the same
   performance on our machine.

So, how about:

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index d919efab6eba..2c52fbc15064 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
                break;
        case DMA_FROM_DEVICE:
-               ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
-               break;
        case DMA_BIDIRECTIONAL:
                ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
                break;
@@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
                break;
        case DMA_FROM_DEVICE:
        case DMA_BIDIRECTIONAL: /* I'm not sure all drivers have guaranteed cacheline alignment. If not, this inval would cause problems */
-               ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+               ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
                break;
        default:
                break;

>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/riscv/mm/dma-noncoherent.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 640f4c496d26..69c80b2155a1 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -25,7 +25,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>                 break;
>         case DMA_BIDIRECTIONAL:
> -               ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +               ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>                 break;
>         default:
>                 break;
> --
> 2.39.2
>


--
Best Regards
 Guo Ren

^ permalink raw reply related	[flat|nested] 456+ messages in thread
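Guo Ren's inline comment in the diff above points at a real hazard. Cache maintenance operates on whole blocks of riscv_cbom_block_size bytes. If a DMA buffer starts or ends partway through a block, an invalidate-only operation on that block also discards any dirty CPU data that shares the block. A flush writes that data back first. The helpers below check whether a buffer has such partial edge blocks and count how many blocks a maintenance operation over it would touch. The helper names are hypothetical, not kernel API, and the 64-byte block size used in the examples is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if [paddr, paddr + size) begins or ends partway through a cache
 * block, i.e. its first or last block may also hold unrelated data that
 * an invalidate-only operation would throw away.
 */
static bool has_partial_block(uint64_t paddr, size_t size, size_t block)
{
	assert(block > 0 && size > 0);
	return paddr % block != 0 || (paddr + size) % block != 0;
}

/* Number of cache blocks a maintenance operation over the range touches. */
static size_t blocks_touched(uint64_t paddr, size_t size, size_t block)
{
	assert(block > 0 && size > 0);
	uint64_t first = paddr / block;
	uint64_t last = (paddr + size - 1) / block;
	return (size_t)(last - first + 1);
}
```

For example, a 2-byte buffer at 0x103f straddles two 64-byte blocks. has_partial_block(0x103f, 2, 64) is therefore true and blocks_touched(0x103f, 2, 64) is 2, while a 128-byte buffer at 0x1000 has no partial blocks. This is why an inval on the for_cpu side is only safe when drivers guarantee block-aligned buffers.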
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
2023-05-05 5:47 ` Guo Ren
` (3 preceding siblings ...)
(?)
@ 2023-05-05 13:18 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-05-05 13:18 UTC (permalink / raw)
To: guoren, Arnd Bergmann, Christoph Hellwig
Cc: linux-kernel, Vineet Gupta, Will Deacon, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Brian Cain, Geert Uytterhoeven,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S . Miller, Max Filippov, Robin Murphy, Lad, Prabhakar,
Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa

On Fri, May 5, 2023, at 07:47, Guo Ren wrote:
> On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
>>
>> riscv also invalidates the caches before the transfer, which does
>> not appear to serve any purpose.
> Yes, we can't guarantee the CPU pre-load cache lines randomly during
> dma working.
>
> But I've two purposes to keep invalidates before dma transfer:
> - We clearly tell the CPU these cache lines are invalid. The caching
> algorithm would use these invalid slots first instead of replacing
> valid ones.
> - Invalidating is very cheap. Actually, flush and clean have the same
> performance in our machine.

The main purpose of the series was to get consistent behavior on
all machines, so I really don't want a custom optimization on
one architecture. You make a good point about cacheline reuse
after invalidation, but if we do that, I'd suggest doing this
across all architectures.
> So, how about:
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index d919efab6eba..2c52fbc15064 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>  		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>  		break;
>  	case DMA_FROM_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
>  	case DMA_BIDIRECTIONAL:
>  		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
>  		break;

This is something we can consider. Unfortunately, this is something
that no architecture (except pa-risc, which has other problems)
does at the moment, so we'd probably need to have a proper debate
about this.

We already have two conflicting ways to handle DMA_FROM_DEVICE,
either invalidate/invalidate, or clean/invalidate. I can see
that flush/invalidate may be a sensible option as well, but I'd
want to have that discussion after the series is complete, so
we can come to a generic solution that has the same documented
behavior across all architectures.

In particular, if we end up moving arm64 and riscv back to the
traditional invalidate/invalidate for DMA_FROM_DEVICE and
document that drivers must not rely on buffers getting cleaned
before a partial DMA_FROM_DEVICE, the choice between clean and
flush becomes moot as well.

> @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>  		break;
>  	case DMA_FROM_DEVICE:
>  	case DMA_BIDIRECTIONAL:
> 		/* I'm not sure all drivers have guaranteed cacheline
> alignment. If not, this inval would cause problems */
> -		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  		break;

This is my original patch, and I would not mix it with the other
change.
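The invalidate/invalidate and clean/invalidate conventions for DMA_FROM_DEVICE differ only when the device writes less than the whole buffer. The sketch below models that case under these assumptions: a write-back cache, a two-line buffer that the driver pre-fills before mapping, and no speculative refills. The 0x55/0xaa patterns and every name in it (`struct buf`, `do_cmo()`, `partial_from_device()`) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLINES 2

struct line {
	bool valid, dirty;
	int data;
};

/* A two-line DMA buffer: cached copies plus the DRAM behind them. */
struct buf {
	struct line cache[NLINES];
	int mem[NLINES];
};

enum cmo { CMO_INVAL, CMO_CLEAN, CMO_FLUSH };

static void do_cmo(struct buf *b, enum cmo op)
{
	for (size_t i = 0; i < NLINES; i++) {
		struct line *l = &b->cache[i];

		/* clean and flush write dirty data back to DRAM */
		if (op != CMO_INVAL && l->valid && l->dirty) {
			b->mem[i] = l->data;
			l->dirty = false;
		}
		/* inval and flush drop the line */
		if (op != CMO_CLEAN) {
			l->valid = false;
			l->dirty = false;
		}
	}
}

static int cpu_read(struct buf *b, size_t i)
{
	assert(i < NLINES);
	struct line *l = &b->cache[i];

	if (!l->valid)
		*l = (struct line){ .valid = true, .dirty = false, .data = b->mem[i] };
	return l->data;
}

/*
 * DMA_FROM_DEVICE where the driver pre-fills the buffer with 0x55 and the
 * device then writes 0xaa into the first line only. "before" is the
 * for_device maintenance; the for_cpu step always invalidates.
 */
static void partial_from_device(enum cmo before, int out[NLINES])
{
	struct buf b = { .mem = { 0, 0 } };	/* stale DRAM contents */

	for (size_t i = 0; i < NLINES; i++)
		b.cache[i] = (struct line){ .valid = true, .dirty = true, .data = 0x55 };

	do_cmo(&b, before);	/* arch_sync_dma_for_device() */
	b.mem[0] = 0xaa;	/* partial device write */
	do_cmo(&b, CMO_INVAL);	/* arch_sync_dma_for_cpu() */

	for (size_t i = 0; i < NLINES; i++)
		out[i] = cpu_read(&b, i);
}
```

With clean or flush before the transfer, the untouched second line keeps the driver's 0x55. With invalidate before the transfer, it reverts to whatever was in DRAM (0 here). Drivers would have to tolerate that behavior if invalidate/invalidate became the documented convention.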
The problem with non-aligned DMA_BIDIRECTIONAL buffers is that
both flush and inval would be wrong if you get simultaneous
writes from the device and the CPU to the same cache line, so
there is no way to win this. Using inval instead of flush would
at least work if the CPU data in the cacheline is read-only from
the CPU, so that seems better than something that is always wrong.

The documented API does not allow sharing the cache line at
all, so anything that would observe a difference between the
two is also a bug. One idea that we have considered already is
that we could overwrite the unused bits of the cacheline with
poison values and/or mark them as invalid using KASAN for
debugging purposes, to find drivers that already violate this.

     Arnd
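The "no way to win" case for a buffer that shares a cache line with CPU data can be reproduced in a toy model. The sketch assumes a write-back cache that always writes back the whole line, never a single word, and it ignores speculative refills. `struct sys`, `shared_line()` and the word values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * One cache line holding two words: word 0 is the tail of a non-aligned
 * DMA_BIDIRECTIONAL buffer, word 1 is an unrelated field owned by the CPU.
 */
struct sys {
	bool valid, dirty;
	int line[2];	/* cached copy of the whole line */
	int mem[2];	/* DRAM */
};

static void fill(struct sys *s)
{
	if (!s->valid) {
		s->line[0] = s->mem[0];
		s->line[1] = s->mem[1];
		s->valid = true;
		s->dirty = false;
	}
}

static void cpu_write(struct sys *s, int word, int v)
{
	assert(word == 0 || word == 1);
	fill(s);
	s->line[word] = v;
	s->dirty = true;
}

/* Write-back always covers the whole line. */
static void clean(struct sys *s)
{
	if (s->valid && s->dirty) {
		s->mem[0] = s->line[0];
		s->mem[1] = s->line[1];
		s->dirty = false;
	}
}

static void inval(struct sys *s)
{
	s->valid = false;
	s->dirty = false;
}

struct result {
	int dma_word;	/* should be 11, written by the device */
	int cpu_word;	/* should be 21 if the CPU updated it, else 20 */
};

static struct result shared_line(bool flush_after, bool cpu_writes)
{
	struct sys s = { .mem = { 0, 20 } };
	struct result r;

	cpu_write(&s, 0, 10);		/* driver fills its part of the line */
	clean(&s);			/* arch_sync_dma_for_device() */
	if (cpu_writes)
		cpu_write(&s, 1, 21);	/* CPU updates its field mid-DMA */
	s.mem[0] = 11;			/* device writes its part in DRAM */
	if (flush_after)		/* arch_sync_dma_for_cpu(): */
		clean(&s);		/* flush = clean + inval, */
	inval(&s);			/* otherwise inval only */

	fill(&s);
	r.dma_word = s.line[0];
	r.cpu_word = s.line[1];
	return r;
}
```

Suppose the CPU writes its field during the transfer. The flush variant then writes the stale line back and loses the device's value (10 instead of 11). The inval variant discards the CPU's update instead (20 instead of 21). If the CPU only reads that field, the inval variant returns both values intact, which matches the case described above as at least working.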
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-05 13:18 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-05-05 13:18 UTC (permalink / raw) To: guoren, Arnd Bergmann, Christoph Hellwig Cc: linux-kernel, Vineet Gupta, Will Deacon, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Fri, May 5, 2023, at 07:47, Guo Ren wrote: > On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote: >> >> riscv also invalidates the caches before the transfer, which does >> not appear to serve any purpose. > Yes, we can't guarantee the CPU pre-load cache lines randomly during > dma working. > > But I've two purposes to keep invalidates before dma transfer: > - We clearly tell the CPU these cache lines are invalid. The caching > algorithm would use these invalid slots first instead of replacing > valid ones. > - Invalidating is very cheap. Actually, flush and clean have the same > performance in our machine. The main purpose of the series was to get consistent behavior on all machines, so I really don't want a custom optimization on one architecture. You make a good point about cacheline reuse after invalidation, but if we do that, I'd suggest doing this across all architectures. 
> So, how about: > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > index d919efab6eba..2c52fbc15064 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > break; > case DMA_FROM_DEVICE: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > case DMA_BIDIRECTIONAL: > ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > break; This is something we can consider. Unfortunately, this is something that no architecture (except pa-risc, which has other problems) does at the moment, so we'd probably need to have a proper debate about this. We already have two conflicting ways to handle DMA_FROM_DEVICE, either invalidate/invalidate, or clean/invalidate. I can see that flush/invalidate may be a sensible option as well, but I'd want to have that discussion after the series is complete, so we can come to a generic solution that has the same documented behavior across all architectures. In particular, if we end up moving arm64 and riscv back to the traditional invalidate/invalidate for DMA_FROM_DEVICE and document that driver must not rely on buffers getting cleaned before a partial DMA_FROM_DEVICE, the question between clean or flush becomes moot as well. > @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > break; > case DMA_FROM_DEVICE: > case DMA_BIDIRECTIONAL: > /* I'm not sure all drivers have guaranteed cacheline > alignment. If not, this inval would cause problems */ > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > break; This is my original patch, and I would not mix it with the other change. 
The problem with non-aligned DMA_BIDIRECTIONAL buffers in is that both flush and inval would be wrong if you get simultaneous writes from device and cpu to the same cache line, so there is no way to win this. Using inval instead of flush would at least work if the CPU data in the cacheline is read-only from the CPU, so that seems better than something that is always wrong. The documented API is that sharing the cache line is not allowed at all, so anything that would observe a difference between the two is also a bug. One idea that we have considered already is that we could overwrite the unused bits of the cacheline with poison values and/or mark them as invalid using KASAN for debugging purposes, to find drivers that already violate this. Arnd _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-05 13:18 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-05-05 13:18 UTC (permalink / raw) To: guoren, Arnd Bergmann, Christoph Hellwig Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor.Dooley, linux-csky@vger.kernel.org, sparclinux, linux-riscv, Will Deacon, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Brian Cain, Lad, Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Simek, Thomas Bogendoerfer, linux-parisc, linux-openrisc@vger.kernel.org, linuxppc-dev, linux-kernel, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas@groups.io, Robin Murphy, David S . Miller On Fri, May 5, 2023, at 07:47, Guo Ren wrote: > On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote: >> >> riscv also invalidates the caches before the transfer, which does >> not appear to serve any purpose. > Yes, we can't guarantee the CPU pre-load cache lines randomly during > dma working. > > But I've two purposes to keep invalidates before dma transfer: > - We clearly tell the CPU these cache lines are invalid. The caching > algorithm would use these invalid slots first instead of replacing > valid ones. > - Invalidating is very cheap. Actually, flush and clean have the same > performance in our machine. The main purpose of the series was to get consistent behavior on all machines, so I really don't want a custom optimization on one architecture. You make a good point about cacheline reuse after invalidation, but if we do that, I'd suggest doing this across all architectures. 
> So, how about: > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > index d919efab6eba..2c52fbc15064 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > break; > case DMA_FROM_DEVICE: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > case DMA_BIDIRECTIONAL: > ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > break; This is something we can consider. Unfortunately, this is something that no architecture (except pa-risc, which has other problems) does at the moment, so we'd probably need to have a proper debate about this. We already have two conflicting ways to handle DMA_FROM_DEVICE, either invalidate/invalidate, or clean/invalidate. I can see that flush/invalidate may be a sensible option as well, but I'd want to have that discussion after the series is complete, so we can come to a generic solution that has the same documented behavior across all architectures. In particular, if we end up moving arm64 and riscv back to the traditional invalidate/invalidate for DMA_FROM_DEVICE and document that driver must not rely on buffers getting cleaned before a partial DMA_FROM_DEVICE, the question between clean or flush becomes moot as well. > @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > break; > case DMA_FROM_DEVICE: > case DMA_BIDIRECTIONAL: > /* I'm not sure all drivers have guaranteed cacheline > alignment. If not, this inval would cause problems */ > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > break; This is my original patch, and I would not mix it with the other change. 
The problem with non-aligned DMA_BIDIRECTIONAL buffers in is that both flush and inval would be wrong if you get simultaneous writes from device and cpu to the same cache line, so there is no way to win this. Using inval instead of flush would at least work if the CPU data in the cacheline is read-only from the CPU, so that seems better than something that is always wrong. The documented API is that sharing the cache line is not allowed at all, so anything that would observe a difference between the two is also a bug. One idea that we have considered already is that we could overwrite the unused bits of the cacheline with poison values and/or mark them as invalid using KASAN for debugging purposes, to find drivers that already violate this. Arnd ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
2023-05-05 13:18 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-05-06 7:25 ` Guo Ren
-1 siblings, 0 replies; 456+ messages in thread
From: Guo Ren @ 2023-05-06 7:25 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Arnd Bergmann, Christoph Hellwig, linux-kernel, Vineet Gupta,
Will Deacon, Russell King, Neil Armstrong, Linus Walleij,
Catalin Marinas, Brian Cain, Geert Uytterhoeven, Michal Simek,
Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller,
Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt,
Rich Felker, John Paul Adrian Glaubitz, David S . Miller,
Max Filippov, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa

On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Fri, May 5, 2023, at 07:47, Guo Ren wrote:
> > On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >>
> >> riscv also invalidates the caches before the transfer, which does
> >> not appear to serve any purpose.
> > Yes, we can't guarantee the CPU pre-load cache lines randomly during
> > dma working.
> >
> > But I've two purposes to keep invalidates before dma transfer:
> >  - We clearly tell the CPU these cache lines are invalid. The caching
> > algorithm would use these invalid slots first instead of replacing
> > valid ones.
> >  - Invalidating is very cheap. Actually, flush and clean have the same
> > performance in our machine.
>
> The main purpose of the series was to get consistent behavior on
> all machines, so I really don't want a custom optimization on
> one architecture. You make a good point about cacheline reuse
> after invalidation, but if we do that, I'd suggest doing this
> across all architectures.
Yes, invalidation of DMA_FROM_DEVICE-for_device is a proposal for all
architectures.

>
> > So, how about:
> >
> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > index d919efab6eba..2c52fbc15064 100644
> > --- a/arch/riscv/mm/dma-noncoherent.c
> > +++ b/arch/riscv/mm/dma-noncoherent.c
> > @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> >                  ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> >                  break;
> >          case DMA_FROM_DEVICE:
> > -                ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> > -                break;
> >          case DMA_BIDIRECTIONAL:
> >                  ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> >                  break;
>
> This is something we can consider. Unfortunately, this is something
> that no architecture (except pa-risc, which has other problems)
> does at the moment, so we'd probably need to have a proper debate
> about this.
>
> We already have two conflicting ways to handle DMA_FROM_DEVICE,
> either invalidate/invalidate, or clean/invalidate. I can see
I vote for invalidate/invalidate.

My key point is to let DMA_FROM_DEVICE-for_device invalidate, and
DMA_BIDIRECTIONAL contains DMA_FROM_DEVICE.
So I also agree:
@@ -22,8 +22,8 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
                 break;
         case DMA_FROM_DEVICE:
-                ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
+                ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
                 break;
         case DMA_BIDIRECTIONAL:
                 ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
                 break;

> that flush/invalidate may be a sensible option as well, but I'd
> want to have that discussion after the series is complete, so
> we can come to a generic solution that has the same documented
> behavior across all architectures.
Yes, I agree to unify them into a generic solution first. My proposal
could be another topic in the future.
For that purpose:

Acked-by: Guo Ren <guoren@kernel.org>

>
> In particular, if we end up moving arm64 and riscv back to the
> traditional invalidate/invalidate for DMA_FROM_DEVICE and
> document that driver must not rely on buffers getting cleaned
After invalidation, the cache lines are also cleaned, right? So why do
we need to document it additionally?

> before a partial DMA_FROM_DEVICE, the question between clean
> or flush becomes moot as well.
>
> > @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> >                  break;
> >          case DMA_FROM_DEVICE:
> >          case DMA_BIDIRECTIONAL:
> >                  /* I'm not sure all drivers have guaranteed cacheline
> >                     alignment. If not, this inval would cause problems */
> > -                ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> > +                ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> >                  break;
>
> This is my original patch, and I would not mix it with the other
> change. The problem with non-aligned DMA_BIDIRECTIONAL buffers
> is that both flush and inval would be wrong if you get simultaneous
> writes from device and cpu to the same cache line, so there is
> no way to win this. Using inval instead of flush would at least
> work if the CPU data in the cacheline is read-only from the CPU,
> so that seems better than something that is always wrong.
If CPU data in the cacheline is read-only, the cacheline would never
be dirty. Yes, it's always safe.
Okay, I agree we must keep buffers cache-line-aligned. I commented on
it here only because I worry that some dirty drivers couldn't work with
the "invalid mechanism" because of the CPU data corruption, and device
data in the cacheline is useless.

>
> The documented API is that sharing the cache line is not allowed
> at all, so anything that would observe a difference between the
> two is also a bug. One idea that we have considered already is
> that we could overwrite the unused bits of the cacheline with
> poison values and/or mark them as invalid using KASAN for debugging
> purposes, to find drivers that already violate this.
>
>      Arnd

--
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 456+ messages in thread
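This message compares two ways of handling DMA_FROM_DEVICE, "invalidate/invalidate" and "clean/invalidate". Writing both out as one lookup table can make the difference easier to see. The sketch below is illustrative, not the kernel's implementation, and all `toy_*` names are hypothetical stand-ins for `enum dma_data_direction` and the riscv `clean`/`inval`/`flush` operations. The DMA_BIDIRECTIONAL entry before the transfer is assumed to be `clean`, which is what the patch subject "skip invalidation before bidirectional DMA" implies. The pre-series riscv code flushes at that point.

```c
/* Hypothetical stand-ins for enum dma_data_direction and the riscv CMO ops. */
enum toy_dir { TOY_TO_DEVICE, TOY_FROM_DEVICE, TOY_BIDIRECTIONAL };
enum toy_cmo { TOY_NONE, TOY_CLEAN, TOY_INVAL, TOY_FLUSH };

/* The two existing DMA_FROM_DEVICE conventions named in the thread. */
enum toy_policy {
	TOY_INVAL_INVAL,	/* invalidate before and after the transfer */
	TOY_CLEAN_INVAL,	/* clean before, invalidate after */
};

/* Operation performed when the buffer is handed to the device. */
enum toy_cmo toy_for_device(enum toy_policy policy, enum toy_dir dir)
{
	switch (dir) {
	case TOY_TO_DEVICE:
		/* write dirty CPU data back so the device reads it */
		return TOY_CLEAN;
	case TOY_FROM_DEVICE:
		/*
		 * inval discards CPU writes to the buffer, so under
		 * TOY_INVAL_INVAL a driver cannot rely on bytes that a
		 * partial transfer leaves untouched keeping CPU data.
		 */
		return policy == TOY_INVAL_INVAL ? TOY_INVAL : TOY_CLEAN;
	case TOY_BIDIRECTIONAL:
		/* assumption: clean, per the patch subject (TOY_FLUSH before) */
		return TOY_CLEAN;
	}
	return TOY_NONE;
}

/* Operation performed when the buffer is handed back to the CPU. */
enum toy_cmo toy_for_cpu(enum toy_dir dir)
{
	switch (dir) {
	case TOY_TO_DEVICE:
		return TOY_NONE;	/* the device only read the buffer */
	case TOY_FROM_DEVICE:
	case TOY_BIDIRECTIONAL:
		/* drop lines the CPU may have pre-loaded during the DMA */
		return TOY_INVAL;
	}
	return TOY_NONE;
}
```

The two policies differ only in the DMA_FROM_DEVICE row before the transfer. Guo Ren's follow-up diff corresponds to `TOY_INVAL_INVAL`. Keeping a single table like this for every architecture is the consistency the series is after.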
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-06 7:25 ` Guo Ren 0 siblings, 0 replies; 456+ messages in thread From: Guo Ren @ 2023-05-06 7:25 UTC (permalink / raw) To: Arnd Bergmann Cc: Arnd Bergmann, Christoph Hellwig, linux-kernel, Vineet Gupta, Will Deacon, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, May 5, 2023, at 07:47, Guo Ren wrote: > > On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote: > > >> > >> riscv also invalidates the caches before the transfer, which does > >> not appear to serve any purpose. > > Yes, we can't guarantee the CPU pre-load cache lines randomly during > > dma working. > > > > But I've two purposes to keep invalidates before dma transfer: > > - We clearly tell the CPU these cache lines are invalid. The caching > > algorithm would use these invalid slots first instead of replacing > > valid ones. > > - Invalidating is very cheap. Actually, flush and clean have the same > > performance in our machine. > > The main purpose of the series was to get consistent behavior on > all machines, so I really don't want a custom optimization on > one architecture. You make a good point about cacheline reuse > after invalidation, but if we do that, I'd suggest doing this > across all architectures. 
Yes, invalidation of DMA_FROM_DEVICE-for_device is a proposal for all architectures. > > > So, how about: > > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > > index d919efab6eba..2c52fbc15064 100644 > > --- a/arch/riscv/mm/dma-noncoherent.c > > +++ b/arch/riscv/mm/dma-noncoherent.c > > @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > > ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > > break; > > case DMA_FROM_DEVICE: > > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > > - break; > > case DMA_BIDIRECTIONAL: > > ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > > break; > > This is something we can consider. Unfortunately, this is something > that no architecture (except pa-risc, which has other problems) > does at the moment, so we'd probably need to have a proper debate > about this. > > We already have two conflicting ways to handle DMA_FROM_DEVICE, > either invalidate/invalidate, or clean/invalidate. I can see I vote to invalidate/invalidate. My key point is to let DMA_FROM_DEVICE-for_device invalidate, and DMA_BIDIRECTIONAL contains DMA_FROM_DEVICE. So I also agree: @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); break; case DMA_FROM_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); + ALT_CMO_OP(invalidate, vaddr, size, riscv_cbom_block_size); break; case DMA_BIDIRECTIONAL: ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); break; > that flush/invalidate may be a sensible option as well, but I'd > want to have that discussion after the series is complete, so > we can come to a generic solution that has the same documented > behavior across all architectures. Yes, I agree to unify them into a generic solution first. My proposal could be another topic in the future. 
For that purpose, I give Acked-by: Guo Ren <guoren@kernel.org> > > In particular, if we end up moving arm64 and riscv back to the > traditional invalidate/invalidate for DMA_FROM_DEVICE and > document that driver must not rely on buffers getting cleaned After invalidation, the cache lines are also cleaned, right? So why do we need to document it additionally? > before a partial DMA_FROM_DEVICE, the question between clean > or flush becomes moot as well. > > > @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > > break; > > case DMA_FROM_DEVICE: > > case DMA_BIDIRECTIONAL: > > /* I'm not sure all drivers have guaranteed cacheline > > alignment. If not, this inval would cause problems */ > > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > > break; > > This is my original patch, and I would not mix it with the other > change. The problem with non-aligned DMA_BIDIRECTIONAL buffers in > is that both flush and inval would be wrong if you get simultaneous > writes from device and cpu to the same cache line, so there is > no way to win this. Using inval instead of flush would at least > work if the CPU data in the cacheline is read-only from the CPU, > so that seems better than something that is always wrong. If CPU data in the cacheline is read-only, the cacheline would never be dirty. Yes, It's always safe. Okay, I agree we must keep cache-line-aligned. I comment it here, just worry some dirty drivers couldn't work with the "invalid mechanism" because of the CPU data corruption, and device data in the cacheline is useless. > > The documented API is that sharing the cache line is not allowed > at all, so anything that would observe a difference between the > two is also a bug. 
One idea that we have considered already is > that we could overwrite the unused bits of the cacheline with > poison values and/or mark them as invalid using KASAN for debugging > purposes, to find drivers that already violate this. > > Arnd -- Best Regards Guo Ren _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-06 7:25 ` Guo Ren 0 siblings, 0 replies; 456+ messages in thread From: Guo Ren @ 2023-05-06 7:25 UTC (permalink / raw) To: Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor.Dooley, linux-csky@vger.kernel.org, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Neil Armstrong, Lad, Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Brian Cain, Arnd Be rgmann, Michal Simek, Thomas Bogendoerfer, linux-parisc, linux-openrisc@vger.kernel.org, linuxppc-dev, linux-kernel, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas@groups.io, Robin Murphy, David S . Miller On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, May 5, 2023, at 07:47, Guo Ren wrote: > > On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote: > > >> > >> riscv also invalidates the caches before the transfer, which does > >> not appear to serve any purpose. > > Yes, we can't guarantee the CPU pre-load cache lines randomly during > > dma working. > > > > But I've two purposes to keep invalidates before dma transfer: > > - We clearly tell the CPU these cache lines are invalid. The caching > > algorithm would use these invalid slots first instead of replacing > > valid ones. > > - Invalidating is very cheap. Actually, flush and clean have the same > > performance in our machine. > > The main purpose of the series was to get consistent behavior on > all machines, so I really don't want a custom optimization on > one architecture. You make a good point about cacheline reuse > after invalidation, but if we do that, I'd suggest doing this > across all architectures. Yes, invalidation of DMA_FROM_DEVICE-for_device is a proposal for all architectures. 
> > > So, how about: > > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > > index d919efab6eba..2c52fbc15064 100644 > > --- a/arch/riscv/mm/dma-noncoherent.c > > +++ b/arch/riscv/mm/dma-noncoherent.c > > @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > > ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > > break; > > case DMA_FROM_DEVICE: > > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > > - break; > > case DMA_BIDIRECTIONAL: > > ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > > break; > > This is something we can consider. Unfortunately, this is something > that no architecture (except pa-risc, which has other problems) > does at the moment, so we'd probably need to have a proper debate > about this. > > We already have two conflicting ways to handle DMA_FROM_DEVICE, > either invalidate/invalidate, or clean/invalidate. I can see I vote to invalidate/invalidate. My key point is to let DMA_FROM_DEVICE-for_device invalidate, and DMA_BIDIRECTIONAL contains DMA_FROM_DEVICE. So I also agree: @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); break; case DMA_FROM_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); + ALT_CMO_OP(invalidate, vaddr, size, riscv_cbom_block_size); break; case DMA_BIDIRECTIONAL: ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); break; > that flush/invalidate may be a sensible option as well, but I'd > want to have that discussion after the series is complete, so > we can come to a generic solution that has the same documented > behavior across all architectures. Yes, I agree to unify them into a generic solution first. My proposal could be another topic in the future. 
For that purpose, I give Acked-by: Guo Ren <guoren@kernel.org> > > In particular, if we end up moving arm64 and riscv back to the > traditional invalidate/invalidate for DMA_FROM_DEVICE and > document that driver must not rely on buffers getting cleaned After invalidation, the cache lines are also cleaned, right? So why do we need to document it additionally? > before a partial DMA_FROM_DEVICE, the question between clean > or flush becomes moot as well. > > > @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > > break; > > case DMA_FROM_DEVICE: > > case DMA_BIDIRECTIONAL: > > /* I'm not sure all drivers have guaranteed cacheline > > alignment. If not, this inval would cause problems */ > > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > > break; > > This is my original patch, and I would not mix it with the other > change. The problem with non-aligned DMA_BIDIRECTIONAL buffers in > is that both flush and inval would be wrong if you get simultaneous > writes from device and cpu to the same cache line, so there is > no way to win this. Using inval instead of flush would at least > work if the CPU data in the cacheline is read-only from the CPU, > so that seems better than something that is always wrong. If CPU data in the cacheline is read-only, the cacheline would never be dirty. Yes, It's always safe. Okay, I agree we must keep cache-line-aligned. I comment it here, just worry some dirty drivers couldn't work with the "invalid mechanism" because of the CPU data corruption, and device data in the cacheline is useless. > > The documented API is that sharing the cache line is not allowed > at all, so anything that would observe a difference between the > two is also a bug. 
One idea that we have considered already is > that we could overwrite the unused bits of the cacheline with > poison values and/or mark them as invalid using KASAN for debugging > purposes, to find drivers that already violate this. > > Arnd -- Best Regards Guo Ren ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-06 7:25 ` Guo Ren 0 siblings, 0 replies; 456+ messages in thread From: Guo Ren @ 2023-05-06 7:25 UTC (permalink / raw) To: Arnd Bergmann Cc: Arnd Bergmann, Christoph Hellwig, linux-kernel, Vineet Gupta, Will Deacon, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, May 5, 2023, at 07:47, Guo Ren wrote: > > On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote: > > >> > >> riscv also invalidates the caches before the transfer, which does > >> not appear to serve any purpose. > > Yes, we can't guarantee the CPU pre-load cache lines randomly during > > dma working. > > > > But I've two purposes to keep invalidates before dma transfer: > > - We clearly tell the CPU these cache lines are invalid. The caching > > algorithm would use these invalid slots first instead of replacing > > valid ones. > > - Invalidating is very cheap. Actually, flush and clean have the same > > performance in our machine. > > The main purpose of the series was to get consistent behavior on > all machines, so I really don't want a custom optimization on > one architecture. You make a good point about cacheline reuse > after invalidation, but if we do that, I'd suggest doing this > across all architectures. 
Yes, invalidation of DMA_FROM_DEVICE-for_device is a proposal for all architectures. > > > So, how about: > > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > > index d919efab6eba..2c52fbc15064 100644 > > --- a/arch/riscv/mm/dma-noncoherent.c > > +++ b/arch/riscv/mm/dma-noncoherent.c > > @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > > ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > > break; > > case DMA_FROM_DEVICE: > > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > > - break; > > case DMA_BIDIRECTIONAL: > > ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > > break; > > This is something we can consider. Unfortunately, this is something > that no architecture (except pa-risc, which has other problems) > does at the moment, so we'd probably need to have a proper debate > about this. > > We already have two conflicting ways to handle DMA_FROM_DEVICE, > either invalidate/invalidate, or clean/invalidate. I can see I vote to invalidate/invalidate. My key point is to let DMA_FROM_DEVICE-for_device invalidate, and DMA_BIDIRECTIONAL contains DMA_FROM_DEVICE. So I also agree: @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); break; case DMA_FROM_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); + ALT_CMO_OP(invalidate, vaddr, size, riscv_cbom_block_size); break; case DMA_BIDIRECTIONAL: ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); break; > that flush/invalidate may be a sensible option as well, but I'd > want to have that discussion after the series is complete, so > we can come to a generic solution that has the same documented > behavior across all architectures. Yes, I agree to unify them into a generic solution first. My proposal could be another topic in the future. 
For that purpose:

Acked-by: Guo Ren <guoren@kernel.org>

>
> In particular, if we end up moving arm64 and riscv back to the
> traditional invalidate/invalidate for DMA_FROM_DEVICE and
> document that driver must not rely on buffers getting cleaned
After invalidation, the cache lines are also cleaned, right? So why do
we need to document it additionally?
> before a partial DMA_FROM_DEVICE, the question between clean
> or flush becomes moot as well.
>
> > @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> >  		break;
> >  	case DMA_FROM_DEVICE:
> >  	case DMA_BIDIRECTIONAL:
> >  		/* I'm not sure all drivers have guaranteed cacheline
> >  		   alignment. If not, this inval would cause problems */
> > -		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> > +		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> >  		break;
>
> This is my original patch, and I would not mix it with the other
> change. The problem with non-aligned DMA_BIDIRECTIONAL buffers in
> is that both flush and inval would be wrong if you get simultaneous
> writes from device and cpu to the same cache line, so there is
> no way to win this. Using inval instead of flush would at least
> work if the CPU data in the cacheline is read-only from the CPU,
> so that seems better than something that is always wrong.
If the CPU data in the cache line is read-only, the line can never
become dirty, so yes, it's always safe.

Okay, I agree that buffers must stay cache-line-aligned. I only
commented on it here because I worry that some buggy drivers may not
work with the "inval" mechanism: the CPU data would be corrupted, and
the device data in that cache line is useless to them.
>
> The documented API is that sharing the cache line is not allowed
> at all, so anything that would observe a difference between the
> two is also a bug. One idea that we have considered already is
> that we could overwrite the unused bits of the cacheline with
> poison values and/or mark them as invalid using KASAN for debugging
> purposes, to find drivers that already violate this.
>
>      Arnd

-- 
Best Regards
 Guo Ren
_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

^ permalink raw reply	[flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
@ 2023-05-06 7:53 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-05-06 7:53 UTC (permalink / raw)
To: guoren
Cc: Arnd Bergmann, Christoph Hellwig, linux-kernel, Vineet Gupta,
Will Deacon, Russell King, Neil Armstrong, Linus Walleij,
Catalin Marinas, Brian Cain, Geert Uytterhoeven, Michal Simek,
Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller,
Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt,
Rich Felker, John Paul Adrian Glaubitz, David S . Miller,
Max Filippov, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa

On Sat, May 6, 2023, at 09:25, Guo Ren wrote:
> On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> This is something we can consider. Unfortunately, this is something
>> that no architecture (except pa-risc, which has other problems)
>> does at the moment, so we'd probably need to have a proper debate
>> about this.
>>
>> We already have two conflicting ways to handle DMA_FROM_DEVICE,
>> either invalidate/invalidate, or clean/invalidate. I can see
> I vote to invalidate/invalidate.
> ...
>
>> that flush/invalidate may be a sensible option as well, but I'd
>> want to have that discussion after the series is complete, so
>> we can come to a generic solution that has the same documented
>> behavior across all architectures.
> Yes, I agree to unify them into a generic solution first. My proposal
> could be another topic in the future.

Right, I was explicitly trying to exclude that question from my
series, and left it as an architecture-specific Kconfig option
based on the current behavior.

>> In particular, if we end up moving arm64 and riscv back to the
>> traditional invalidate/invalidate for DMA_FROM_DEVICE and
>> document that driver must not rely on buffers getting cleaned
> After invalidation, the cache lines are also cleaned, right? So why do
> we need to document it additionally?

I mentioned the debate in the cover letter; the full explanation
is archived at
https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/

In short, the problem that is addressed here is leaking sensitive
kernel data to user space or a device, as in this sequence:

1. A DMA buffer is allocated in the kernel and contains stale data
   that is no longer needed but must not be exposed to untrusted
   userspace, e.g. encryption keys or user file pages.
2. The allocator uses memset() to clear out the buffer.
3. The buffer gets mapped for a device with DMA_FROM_DEVICE.
4. The write-back cache gets invalidated, uncovering the sensitive
   data by discarding the zeros.
5. The device returns less data than expected.
6. The buffer is unmapped.
7. The whole buffer is mapped or copied to user space.

Will added his patch for arm64 to prevent this scenario by using
'clean' instead of 'invalidate' in step 4, and the same behavior
was copied to riscv, but not to most of the other architectures.

The dma-mapping documentation does not say anything about this
case. An alternative approach would be to document that device
drivers must watch out for short reads in step 5, or that kzalloc()
should clean the cache in step 2. Both of these come at a cost
as well.

     Arnd

^ permalink raw reply	[flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-06 7:53 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-05-06 7:53 UTC (permalink / raw) To: guoren Cc: Arnd Bergmann, Christoph Hellwig, linux-kernel, Vineet Gupta, Will Deacon, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Sat, May 6, 2023, at 09:25, Guo Ren wrote: > On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote: >> >> This is something we can consider. Unfortunately, this is something >> that no architecture (except pa-risc, which has other problems) >> does at the moment, so we'd probably need to have a proper debate >> about this. >> >> We already have two conflicting ways to handle DMA_FROM_DEVICE, >> either invalidate/invalidate, or clean/invalidate. I can see > I vote to invalidate/invalidate. > ... > >> that flush/invalidate may be a sensible option as well, but I'd >> want to have that discussion after the series is complete, so >> we can come to a generic solution that has the same documented >> behavior across all architectures. > Yes, I agree to unify them into a generic solution first. My proposal > could be another topic in the future. Right, I was explicitly trying to exclude that question from my series, and left it as an architecture specific Kconfig option based on the current behavior. 
>> In particular, if we end up moving arm64 and riscv back to the >> traditional invalidate/invalidate for DMA_FROM_DEVICE and >> document that driver must not rely on buffers getting cleaned > After invalidation, the cache lines are also cleaned, right? So why do > we need to document it additionally? I mentioned the debate in the cover letter, the full explanation is archived at https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ In short, the problem that is addressed here is leaking sensitive kernel data to user space or a device as in this sequence: 1. A DMA buffer is allocated in the kernel and contains stale data that is no longer needed but must not be exposed to untrusted userspace, i.e. encryption keys or user file pages 2. allocator uses memset() to clear out the buffer 3. buffer gets mapped into a device for DMA_FROM_DEVICE 4. writeback cache gets invalidated, uncovering the sensitive data by discarding the zeros 5. device returns less data than expected 6. buffer is unmapped 7. whole buffer is mapped or copied to user space Will added his patch for arm64 to prevent this scenario by using 'clean' instead of 'invalidate' in step 4, and the same behavior got copied to riscv but not most of the other architectures. The dma-mapping documentation does not say anything about this case, and an alternative approach would be to document that device drivers must watch out for short reads in step 5, or that kzalloc() should clean the cache in step 2. Both of these come at a cost as well. Arnd _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-06 7:53 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-05-06 7:53 UTC (permalink / raw) To: guoren Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor.Dooley, linux-csky@vger.kernel.org, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Neil Armstrong, Lad, Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Brian Cain, Arnd Bergmann, Michal Simek, Thomas Bogendoerfer, linux-parisc, linux-openrisc@vger.kernel.org, linuxppc-dev, linux-kernel, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas@groups.io, Robin Murphy, David S . Miller On Sat, May 6, 2023, at 09:25, Guo Ren wrote: > On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote: >> >> This is something we can consider. Unfortunately, this is something >> that no architecture (except pa-risc, which has other problems) >> does at the moment, so we'd probably need to have a proper debate >> about this. >> >> We already have two conflicting ways to handle DMA_FROM_DEVICE, >> either invalidate/invalidate, or clean/invalidate. I can see > I vote to invalidate/invalidate. > ... > >> that flush/invalidate may be a sensible option as well, but I'd >> want to have that discussion after the series is complete, so >> we can come to a generic solution that has the same documented >> behavior across all architectures. > Yes, I agree to unify them into a generic solution first. My proposal > could be another topic in the future. Right, I was explicitly trying to exclude that question from my series, and left it as an architecture specific Kconfig option based on the current behavior. 
>> In particular, if we end up moving arm64 and riscv back to the >> traditional invalidate/invalidate for DMA_FROM_DEVICE and >> document that driver must not rely on buffers getting cleaned > After invalidation, the cache lines are also cleaned, right? So why do > we need to document it additionally? I mentioned the debate in the cover letter, the full explanation is archived at https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ In short, the problem that is addressed here is leaking sensitive kernel data to user space or a device as in this sequence: 1. A DMA buffer is allocated in the kernel and contains stale data that is no longer needed but must not be exposed to untrusted userspace, i.e. encryption keys or user file pages 2. allocator uses memset() to clear out the buffer 3. buffer gets mapped into a device for DMA_FROM_DEVICE 4. writeback cache gets invalidated, uncovering the sensitive data by discarding the zeros 5. device returns less data than expected 6. buffer is unmapped 7. whole buffer is mapped or copied to user space Will added his patch for arm64 to prevent this scenario by using 'clean' instead of 'invalidate' in step 4, and the same behavior got copied to riscv but not most of the other architectures. The dma-mapping documentation does not say anything about this case, and an alternative approach would be to document that device drivers must watch out for short reads in step 5, or that kzalloc() should clean the cache in step 2. Both of these come at a cost as well. Arnd ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-06 7:53 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-05-06 7:53 UTC (permalink / raw) To: guoren Cc: Arnd Bergmann, Christoph Hellwig, linux-kernel, Vineet Gupta, Will Deacon, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Sat, May 6, 2023, at 09:25, Guo Ren wrote: > On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote: >> >> This is something we can consider. Unfortunately, this is something >> that no architecture (except pa-risc, which has other problems) >> does at the moment, so we'd probably need to have a proper debate >> about this. >> >> We already have two conflicting ways to handle DMA_FROM_DEVICE, >> either invalidate/invalidate, or clean/invalidate. I can see > I vote to invalidate/invalidate. > ... > >> that flush/invalidate may be a sensible option as well, but I'd >> want to have that discussion after the series is complete, so >> we can come to a generic solution that has the same documented >> behavior across all architectures. > Yes, I agree to unify them into a generic solution first. My proposal > could be another topic in the future. Right, I was explicitly trying to exclude that question from my series, and left it as an architecture specific Kconfig option based on the current behavior. 
>> In particular, if we end up moving arm64 and riscv back to the >> traditional invalidate/invalidate for DMA_FROM_DEVICE and >> document that driver must not rely on buffers getting cleaned > After invalidation, the cache lines are also cleaned, right? So why do > we need to document it additionally? I mentioned the debate in the cover letter, the full explanation is archived at https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ In short, the problem that is addressed here is leaking sensitive kernel data to user space or a device as in this sequence: 1. A DMA buffer is allocated in the kernel and contains stale data that is no longer needed but must not be exposed to untrusted userspace, i.e. encryption keys or user file pages 2. allocator uses memset() to clear out the buffer 3. buffer gets mapped into a device for DMA_FROM_DEVICE 4. writeback cache gets invalidated, uncovering the sensitive data by discarding the zeros 5. device returns less data than expected 6. buffer is unmapped 7. whole buffer is mapped or copied to user space Will added his patch for arm64 to prevent this scenario by using 'clean' instead of 'invalidate' in step 4, and the same behavior got copied to riscv but not most of the other architectures. The dma-mapping documentation does not say anything about this case, and an alternative approach would be to document that device drivers must watch out for short reads in step 5, or that kzalloc() should clean the cache in step 2. Both of these come at a cost as well. Arnd _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA @ 2023-05-06 7:53 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-05-06 7:53 UTC (permalink / raw) To: guoren Cc: Arnd Bergmann, Christoph Hellwig, linux-kernel, Vineet Gupta, Will Deacon, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Sat, May 6, 2023, at 09:25, Guo Ren wrote: > On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@arndb.de> wrote: >> >> This is something we can consider. Unfortunately, this is something >> that no architecture (except pa-risc, which has other problems) >> does at the moment, so we'd probably need to have a proper debate >> about this. >> >> We already have two conflicting ways to handle DMA_FROM_DEVICE, >> either invalidate/invalidate, or clean/invalidate. I can see > I vote to invalidate/invalidate. > ... > >> that flush/invalidate may be a sensible option as well, but I'd >> want to have that discussion after the series is complete, so >> we can come to a generic solution that has the same documented >> behavior across all architectures. > Yes, I agree to unify them into a generic solution first. My proposal > could be another topic in the future. Right, I was explicitly trying to exclude that question from my series, and left it as an architecture specific Kconfig option based on the current behavior. 
>> In particular, if we end up moving arm64 and riscv back to the >> traditional invalidate/invalidate for DMA_FROM_DEVICE and >> document that driver must not rely on buffers getting cleaned > After invalidation, the cache lines are also cleaned, right? So why do > we need to document it additionally? I mentioned the debate in the cover letter, the full explanation is archived at https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ In short, the problem that is addressed here is leaking sensitive kernel data to user space or a device as in this sequence: 1. A DMA buffer is allocated in the kernel and contains stale data that is no longer needed but must not be exposed to untrusted userspace, i.e. encryption keys or user file pages 2. allocator uses memset() to clear out the buffer 3. buffer gets mapped into a device for DMA_FROM_DEVICE 4. writeback cache gets invalidated, uncovering the sensitive data by discarding the zeros 5. device returns less data than expected 6. buffer is unmapped 7. whole buffer is mapped or copied to user space Will added his patch for arm64 to prevent this scenario by using 'clean' instead of 'invalidate' in step 4, and the same behavior got copied to riscv but not most of the other architectures. The dma-mapping documentation does not say anything about this case, and an alternative approach would be to document that device drivers must watch out for short reads in step 5, or that kzalloc() should clean the cache in step 2. Both of these come at a cost as well. Arnd _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 456+ messages in thread
* [PATCH 10/21] csky: dma-mapping: skip invalidating before DMA from device
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

From: Arnd Bergmann <arnd@arndb.de>

csky is the only architecture that does a full flush for the
dma_sync_*_for_device(..., DMA_FROM_DEVICE) operation. The requirement
is only to make sure there are no dirty cache lines for the buffer,
which can be done either through an invalidate operation (as on most
architectures including arm32, mips and arc), or a writeback (as on
arm64 and riscv). The cache also has to be invalidated eventually but
csky already does that after the transfer.

Use a 'clean' operation here for consistency with arm64 and riscv.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/csky/mm/dma-mapping.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
index 82447029feb4..c90f912e2822 100644
--- a/arch/csky/mm/dma-mapping.c
+++ b/arch/csky/mm/dma-mapping.c
@@ -60,11 +60,9 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 {
 	switch (dir) {
 	case DMA_TO_DEVICE:
-		cache_op(paddr, size, dma_wb_range);
-		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
-		cache_op(paddr, size, dma_wbinv_range);
+		cache_op(paddr, size, dma_wb_range);
 		break;
 	default:
 		BUG();
--
2.39.2

^ permalink raw reply related	[flat|nested] 456+ messages in thread
* Re: [PATCH 10/21] csky: dma-mapping: skip invalidating before DMA from device
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 13:37 ` Guo Ren
-1 siblings, 0 replies; 456+ messages in thread
From: Guo Ren @ 2023-03-27 13:37 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon,
Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> csky is the only architecture that does a full flush for the
> dma_sync_*_for_device(..., DMA_FROM_DEVICE) operation. The requirement
> is only to make sure there are no dirty cache lines for the buffer,
> which can be done either through an invalidate operation (as on most
> architectures including arm32, mips and arc), or a writeback (as on
> arm64 and riscv). The cache also has to be invalidated eventually but
> csky already does that after the transfer.
>
> Use a 'clean' operation here for consistency with arm64 and riscv.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/csky/mm/dma-mapping.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index 82447029feb4..c90f912e2822 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -60,11 +60,9 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>  {
>  	switch (dir) {
>  	case DMA_TO_DEVICE:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
>  	case DMA_FROM_DEVICE:
>  	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wbinv_range);
> +		cache_op(paddr, size, dma_wb_range);
Reviewed-by: Guo Ren <guoren@kernel.org>

>  		break;
>  	default:
>  		BUG();
> --
> 2.39.2
>

--
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 456+ messages in thread
* [PATCH 11/21] mips: dma-mapping: skip invalidating before bidirectional DMA
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

From: Arnd Bergmann <arnd@arndb.de>

Some architectures that need to invalidate buffers after bidirectional
DMA because of speculative prefetching only do a simpler writeback
before that DMA, while architectures that don't need to do the second
invalidate tend to have a combined writeback+invalidate before the
DMA.

The behavior on mips is slightly inconsistent, as it always does the
invalidation before bidirectional DMA and conditionally does it a
second time.

In order to make the behavior the same as the rest, change it so that
there is exactly one invalidation here, either before or after the
DMA.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/mips/mm/dma-noncoherent.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 3c4fc97b9f39..b4350faf4f1e 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -65,7 +65,11 @@ static inline void dma_sync_virt_for_device(void *addr, size_t size,
 		dma_cache_inv((unsigned long)addr, size);
 		break;
 	case DMA_BIDIRECTIONAL:
-		dma_cache_wback_inv((unsigned long)addr, size);
+		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		    cpu_needs_post_dma_flush())
+			dma_cache_wback((unsigned long)addr, size);
+		else
+			dma_cache_wback_inv((unsigned long)addr, size);
 		break;
 	default:
 		BUG();
--
2.39.2

^ permalink raw reply related	[flat|nested] 456+ messages in thread
* [PATCH 12/21] mips: dma-mapping: split out cache operation logic
@ 2023-03-27 12:13 ` Arnd Bergmann
From: Arnd Bergmann @ 2023-03-27 12:13 UTC
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>

The mips arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions
behave the same way as on other architectures, but in order to unify
the implementations, the code needs to be rearranged to pick the type
of cache operation in the outermost function.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/mips/mm/dma-noncoherent.c | 75 ++++++++++++++--------------------
 1 file changed, 30 insertions(+), 45 deletions(-)

diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index b4350faf4f1e..b9d68bcc5d53 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -54,50 +54,13 @@ void *arch_dma_set_uncached(void *addr, size_t size)
 	return (void *)(__pa(addr) + UNCAC_BASE);
 }
 
-static inline void dma_sync_virt_for_device(void *addr, size_t size,
-		enum dma_data_direction dir)
-{
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		dma_cache_wback((unsigned long)addr, size);
-		break;
-	case DMA_FROM_DEVICE:
-		dma_cache_inv((unsigned long)addr, size);
-		break;
-	case DMA_BIDIRECTIONAL:
-		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
-		    cpu_needs_post_dma_flush())
-			dma_cache_wback((unsigned long)addr, size);
-		else
-			dma_cache_wback_inv((unsigned long)addr, size);
-		break;
-	default:
-		BUG();
-	}
-}
-
-static inline void dma_sync_virt_for_cpu(void *addr, size_t size,
-		enum dma_data_direction dir)
-{
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		dma_cache_inv((unsigned long)addr, size);
-		break;
-	default:
-		BUG();
-	}
-}
-
 /*
  * A single sg entry may refer to multiple physically contiguous pages. But
  * we still need to process highmem pages individually. If highmem is not
  * configured then the bulk of this loop gets optimized out.
  */
 static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir, bool for_device)
+		void(*cache_op)(unsigned long start, unsigned long size))
 {
 	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
 	unsigned long offset = paddr & ~PAGE_MASK;
@@ -113,10 +76,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
 		}
 
 		addr = kmap_atomic(page);
-		if (for_device)
-			dma_sync_virt_for_device(addr + offset, len, dir);
-		else
-			dma_sync_virt_for_cpu(addr + offset, len, dir);
+		cache_op((unsigned long)addr + offset, len);
 		kunmap_atomic(addr);
 
 		offset = 0;
@@ -128,15 +88,40 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	dma_sync_phys(paddr, size, dir, true);
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		dma_sync_phys(paddr, size, _dma_cache_wback);
+		break;
+	case DMA_FROM_DEVICE:
+		dma_sync_phys(paddr, size, _dma_cache_inv);
+		break;
+	case DMA_BIDIRECTIONAL:
+		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		    cpu_needs_post_dma_flush())
+			dma_sync_phys(paddr, size, _dma_cache_wback);
+		else
+			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
+		break;
+	default:
+		break;
+	}
 }
 
 #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	if (cpu_needs_post_dma_flush())
-		dma_sync_phys(paddr, size, dir, false);
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		if (cpu_needs_post_dma_flush())
+			dma_sync_phys(paddr, size, _dma_cache_inv);
+		break;
+	default:
+		break;
+	}
 }
 #endif
 
-- 
2.39.2
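The shape of this refactoring can be sketched in user space: the DMA direction is examined once, in the outermost function, and the chosen cache operation is passed down as a function pointer that the page-walking loop applies to each chunk. This is a hypothetical model, not the kernel code: `sync_phys()`, `sync_for_device()`, `sync_for_cpu()`, the recording stubs and `MODEL_PAGE_SIZE` are illustrative names, `needs_post_flush` is assumed to stand in for `cpu_needs_post_dma_flush()` on a kernel built with `CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU`, and every page is assumed to need separate handling, as highmem pages do in `dma_sync_phys()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for the model */

enum dma_dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };

/* Same shape as the cache_op parameter introduced by the patch. */
typedef void (*cache_op_fn)(unsigned long start, unsigned long size);

/* Recording stubs standing in for _dma_cache_wback/_inv/_wback_inv. */
static int n_wback, n_inv, n_wback_inv;	/* calls per operation */
static unsigned long bytes_done;	/* total bytes passed to any op */

static void wback(unsigned long start, unsigned long size)
{
	(void)start;
	n_wback++;
	bytes_done += size;
}

static void inv(unsigned long start, unsigned long size)
{
	(void)start;
	n_inv++;
	bytes_done += size;
}

static void wback_inv(unsigned long start, unsigned long size)
{
	(void)start;
	n_wback_inv++;
	bytes_done += size;
}

/* Walk [paddr, paddr + size) page by page, applying cache_op to each chunk. */
static void sync_phys(unsigned long paddr, size_t size, cache_op_fn cache_op)
{
	unsigned long offset = paddr % MODEL_PAGE_SIZE;
	unsigned long page_base = paddr - offset;
	size_t left = size;

	assert(cache_op != NULL);
	do {
		size_t len = left;

		if (offset + len > MODEL_PAGE_SIZE)
			len = MODEL_PAGE_SIZE - offset;
		cache_op(page_base + offset, len);
		offset = 0;
		page_base += MODEL_PAGE_SIZE;
		left -= len;
	} while (left);
}

/* The direction decides the operation once, before walking the pages. */
static void sync_for_device(unsigned long paddr, size_t size,
			    enum dma_dir dir, bool needs_post_flush)
{
	switch (dir) {
	case TO_DEVICE:
		sync_phys(paddr, size, wback);
		break;
	case FROM_DEVICE:
		sync_phys(paddr, size, inv);
		break;
	case BIDIRECTIONAL:
		sync_phys(paddr, size, needs_post_flush ? wback : wback_inv);
		break;
	}
}

static void sync_for_cpu(unsigned long paddr, size_t size,
			 enum dma_dir dir, bool needs_post_flush)
{
	if (dir != TO_DEVICE && needs_post_flush)
		sync_phys(paddr, size, inv);
}
```

For example, a 5000-byte buffer starting 100 bytes before a page boundary is handled as three chunks of 100, 4096 and 804 bytes, all with the same operation. Passing the operation instead of a direction plus a `for_device` flag keeps the per-chunk loop free of any policy decisions.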
* [PATCH 12/21] mips: dma-mapping: split out cache operation logic @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The mips arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave the same way as on other architectures, but in order to unify the implementations, the code needs to be rearranged to pick the type of cache operation in the outermost function. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/mips/mm/dma-noncoherent.c | 75 ++++++++++++++-------------------- 1 file changed, 30 insertions(+), 45 deletions(-) diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index b4350faf4f1e..b9d68bcc5d53 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -54,50 +54,13 @@ void *arch_dma_set_uncached(void *addr, size_t size) return (void *)(__pa(addr) + UNCAC_BASE); } -static inline void dma_sync_virt_for_device(void *addr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback((unsigned long)addr, size); - break; - case DMA_FROM_DEVICE: - dma_cache_inv((unsigned long)addr, size); - break; - case DMA_BIDIRECTIONAL: - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && - cpu_needs_post_dma_flush()) - dma_cache_wback((unsigned long)addr, size); - else - dma_cache_wback_inv((unsigned long)addr, size); - break; - default: - BUG(); - } -} - -static inline void dma_sync_virt_for_cpu(void *addr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - dma_cache_inv((unsigned long)addr, size); - break; - default: - BUG(); - } -} - /* * A single sg entry may refer to multiple physically contiguous pages. But * we still need to process highmem pages individually. If highmem is not * configured then the bulk of this loop gets optimized out. 
*/ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, - enum dma_data_direction dir, bool for_device) + void(*cache_op)(unsigned long start, unsigned long size)) { struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); unsigned long offset = paddr & ~PAGE_MASK; @@ -113,10 +76,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, } addr = kmap_atomic(page); - if (for_device) - dma_sync_virt_for_device(addr + offset, len, dir); - else - dma_sync_virt_for_cpu(addr + offset, len, dir); + cache_op((unsigned long)addr + offset, len); kunmap_atomic(addr); offset = 0; @@ -128,15 +88,40 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - dma_sync_phys(paddr, size, dir, true); + switch (dir) { + case DMA_TO_DEVICE: + dma_sync_phys(paddr, size, _dma_cache_wback); + break; + case DMA_FROM_DEVICE: + dma_sync_phys(paddr, size, _dma_cache_inv); + break; + case DMA_BIDIRECTIONAL: + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + cpu_needs_post_dma_flush()) + dma_sync_phys(paddr, size, _dma_cache_wback); + else + dma_sync_phys(paddr, size, _dma_cache_wback_inv); + break; + default: + break; + } } #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - if (cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, dir, false); + switch (dir) { + case DMA_TO_DEVICE: + break; + case DMA_FROM_DEVICE: + case DMA_BIDIRECTIONAL: + if (cpu_needs_post_dma_flush()) + dma_sync_phys(paddr, size, _dma_cache_inv); + break; + default: + break; + } } #endif -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 12/21] mips: dma-mapping: split out cache operation logic @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Sime k, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy, David S. Miller From: Arnd Bergmann <arnd@arndb.de> The mips arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave the same way as on other architectures, but in order to unify the implementations, the code needs to be rearranged to pick the type of cache operation in the outermost function. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/mips/mm/dma-noncoherent.c | 75 ++++++++++++++-------------------- 1 file changed, 30 insertions(+), 45 deletions(-) diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index b4350faf4f1e..b9d68bcc5d53 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -54,50 +54,13 @@ void *arch_dma_set_uncached(void *addr, size_t size) return (void *)(__pa(addr) + UNCAC_BASE); } -static inline void dma_sync_virt_for_device(void *addr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback((unsigned long)addr, size); - break; - case DMA_FROM_DEVICE: - dma_cache_inv((unsigned long)addr, size); - break; - case DMA_BIDIRECTIONAL: - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && - cpu_needs_post_dma_flush()) - dma_cache_wback((unsigned long)addr, size); - else - dma_cache_wback_inv((unsigned long)addr, size); - break; - default: - BUG(); - } -} - -static inline void dma_sync_virt_for_cpu(void *addr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - dma_cache_inv((unsigned long)addr, size); - break; - default: - BUG(); - } -} - /* * A single sg entry may refer to multiple physically contiguous pages. But * we still need to process highmem pages individually. If highmem is not * configured then the bulk of this loop gets optimized out. 
*/ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, - enum dma_data_direction dir, bool for_device) + void(*cache_op)(unsigned long start, unsigned long size)) { struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); unsigned long offset = paddr & ~PAGE_MASK; @@ -113,10 +76,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, } addr = kmap_atomic(page); - if (for_device) - dma_sync_virt_for_device(addr + offset, len, dir); - else - dma_sync_virt_for_cpu(addr + offset, len, dir); + cache_op((unsigned long)addr + offset, len); kunmap_atomic(addr); offset = 0; @@ -128,15 +88,40 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - dma_sync_phys(paddr, size, dir, true); + switch (dir) { + case DMA_TO_DEVICE: + dma_sync_phys(paddr, size, _dma_cache_wback); + break; + case DMA_FROM_DEVICE: + dma_sync_phys(paddr, size, _dma_cache_inv); + break; + case DMA_BIDIRECTIONAL: + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + cpu_needs_post_dma_flush()) + dma_sync_phys(paddr, size, _dma_cache_wback); + else + dma_sync_phys(paddr, size, _dma_cache_wback_inv); + break; + default: + break; + } } #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - if (cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, dir, false); + switch (dir) { + case DMA_TO_DEVICE: + break; + case DMA_FROM_DEVICE: + case DMA_BIDIRECTIONAL: + if (cpu_needs_post_dma_flush()) + dma_sync_phys(paddr, size, _dma_cache_inv); + break; + default: + break; + } } #endif -- 2.39.2 ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 12/21] mips: dma-mapping: split out cache operation logic @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The mips arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave the same way as on other architectures, but in order to unify the implementations, the code needs to be rearranged to pick the type of cache operation in the outermost function. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/mips/mm/dma-noncoherent.c | 75 ++++++++++++++-------------------- 1 file changed, 30 insertions(+), 45 deletions(-) diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index b4350faf4f1e..b9d68bcc5d53 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -54,50 +54,13 @@ void *arch_dma_set_uncached(void *addr, size_t size) return (void *)(__pa(addr) + UNCAC_BASE); } -static inline void dma_sync_virt_for_device(void *addr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback((unsigned long)addr, size); - break; - case DMA_FROM_DEVICE: - dma_cache_inv((unsigned long)addr, size); - break; - case DMA_BIDIRECTIONAL: - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && - cpu_needs_post_dma_flush()) - dma_cache_wback((unsigned long)addr, size); - else - dma_cache_wback_inv((unsigned long)addr, size); - break; - default: - BUG(); - } -} - -static inline void dma_sync_virt_for_cpu(void *addr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - dma_cache_inv((unsigned long)addr, size); - break; - default: - BUG(); - } -} - /* * A single sg entry may refer to multiple physically contiguous pages. But * we still need to process highmem pages individually. If highmem is not * configured then the bulk of this loop gets optimized out. 
*/ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, - enum dma_data_direction dir, bool for_device) + void(*cache_op)(unsigned long start, unsigned long size)) { struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); unsigned long offset = paddr & ~PAGE_MASK; @@ -113,10 +76,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, } addr = kmap_atomic(page); - if (for_device) - dma_sync_virt_for_device(addr + offset, len, dir); - else - dma_sync_virt_for_cpu(addr + offset, len, dir); + cache_op((unsigned long)addr + offset, len); kunmap_atomic(addr); offset = 0; @@ -128,15 +88,40 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - dma_sync_phys(paddr, size, dir, true); + switch (dir) { + case DMA_TO_DEVICE: + dma_sync_phys(paddr, size, _dma_cache_wback); + break; + case DMA_FROM_DEVICE: + dma_sync_phys(paddr, size, _dma_cache_inv); + break; + case DMA_BIDIRECTIONAL: + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + cpu_needs_post_dma_flush()) + dma_sync_phys(paddr, size, _dma_cache_wback); + else + dma_sync_phys(paddr, size, _dma_cache_wback_inv); + break; + default: + break; + } } #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - if (cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, dir, false); + switch (dir) { + case DMA_TO_DEVICE: + break; + case DMA_FROM_DEVICE: + case DMA_BIDIRECTIONAL: + if (cpu_needs_post_dma_flush()) + dma_sync_phys(paddr, size, _dma_cache_inv); + break; + default: + break; + } } #endif -- 2.39.2 _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 12/21] mips: dma-mapping: split out cache operation logic @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The mips arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave the same way as on other architectures, but in order to unify the implementations, the code needs to be rearranged to pick the type of cache operation in the outermost function. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/mips/mm/dma-noncoherent.c | 75 ++++++++++++++--------------------
 1 file changed, 30 insertions(+), 45 deletions(-)

diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index b4350faf4f1e..b9d68bcc5d53 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -54,50 +54,13 @@ void *arch_dma_set_uncached(void *addr, size_t size)
 	return (void *)(__pa(addr) + UNCAC_BASE);
 }
 
-static inline void dma_sync_virt_for_device(void *addr, size_t size,
-		enum dma_data_direction dir)
-{
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		dma_cache_wback((unsigned long)addr, size);
-		break;
-	case DMA_FROM_DEVICE:
-		dma_cache_inv((unsigned long)addr, size);
-		break;
-	case DMA_BIDIRECTIONAL:
-		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
-		    cpu_needs_post_dma_flush())
-			dma_cache_wback((unsigned long)addr, size);
-		else
-			dma_cache_wback_inv((unsigned long)addr, size);
-		break;
-	default:
-		BUG();
-	}
-}
-
-static inline void dma_sync_virt_for_cpu(void *addr, size_t size,
-		enum dma_data_direction dir)
-{
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		dma_cache_inv((unsigned long)addr, size);
-		break;
-	default:
-		BUG();
-	}
-}
-
 /*
  * A single sg entry may refer to multiple physically contiguous pages. But
  * we still need to process highmem pages individually. If highmem is not
  * configured then the bulk of this loop gets optimized out.
  */
 static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir, bool for_device)
+		void(*cache_op)(unsigned long start, unsigned long size))
 {
 	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
 	unsigned long offset = paddr & ~PAGE_MASK;
@@ -113,10 +76,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
 		}
 
 		addr = kmap_atomic(page);
-		if (for_device)
-			dma_sync_virt_for_device(addr + offset, len, dir);
-		else
-			dma_sync_virt_for_cpu(addr + offset, len, dir);
+		cache_op((unsigned long)addr + offset, len);
 		kunmap_atomic(addr);
 
 		offset = 0;
@@ -128,15 +88,40 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	dma_sync_phys(paddr, size, dir, true);
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		dma_sync_phys(paddr, size, _dma_cache_wback);
+		break;
+	case DMA_FROM_DEVICE:
+		dma_sync_phys(paddr, size, _dma_cache_inv);
+		break;
+	case DMA_BIDIRECTIONAL:
+		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		    cpu_needs_post_dma_flush())
+			dma_sync_phys(paddr, size, _dma_cache_wback);
+		else
+			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
+		break;
+	default:
+		break;
+	}
 }
 
 #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	if (cpu_needs_post_dma_flush())
-		dma_sync_phys(paddr, size, dir, false);
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		if (cpu_needs_post_dma_flush())
+			dma_sync_phys(paddr, size, _dma_cache_inv);
+		break;
+	default:
+		break;
+	}
 }
 #endif
 
-- 
2.39.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 456+ messages in thread
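To see the shape of this refactoring outside the kernel, here is a minimal user-space sketch in C. It is not the mips code: the page size, the sim_* helpers and the recording stubs are hypothetical stand-ins, there is no highmem mapping, and the post_dma_inv flag stands in for IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && cpu_needs_post_dma_flush(). It shows the direction being resolved once in the outer function, with the chosen cache operation passed as a callback into a loop that splits the range at page boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 4 KiB page size for the simulation. */
#define SIM_PAGE_SIZE 4096UL

/* Same names and order as the kernel's enum dma_data_direction. */
enum sim_dma_dir { SIM_BIDIRECTIONAL, SIM_TO_DEVICE, SIM_FROM_DEVICE, SIM_NONE };

/* Callback type matching the new dma_sync_phys() parameter. */
typedef void (*cache_op_t)(unsigned long start, unsigned long size);

/* Recording stubs in place of real cache maintenance routines. */
enum sim_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };
static enum sim_op last_op = OP_NONE;
static int op_calls;

static void record(enum sim_op op)
{
	last_op = op;
	op_calls++;
}

static void sim_wback(unsigned long start, unsigned long size)
{
	(void)start; (void)size;
	record(OP_WBACK);
}

static void sim_inv(unsigned long start, unsigned long size)
{
	(void)start; (void)size;
	record(OP_INV);
}

static void sim_wback_inv(unsigned long start, unsigned long size)
{
	(void)start; (void)size;
	record(OP_WBACK_INV);
}

/* Apply cache_op to [paddr, paddr + size) one page-sized chunk at a time. */
static void sim_sync_phys(unsigned long paddr, size_t size, cache_op_t cache_op)
{
	unsigned long offset = paddr % SIM_PAGE_SIZE;
	unsigned long page = paddr - offset;

	assert(cache_op != NULL);
	while (size > 0) {
		size_t len = size;

		if (offset + len > SIM_PAGE_SIZE)
			len = SIM_PAGE_SIZE - offset;
		cache_op(page + offset, len);
		page += SIM_PAGE_SIZE;
		offset = 0;
		size -= len;
	}
}

/* The direction is decided here, once, not inside the page loop. */
static void sim_sync_for_device(unsigned long paddr, size_t size,
				enum sim_dma_dir dir, bool post_dma_inv)
{
	switch (dir) {
	case SIM_TO_DEVICE:
		sim_sync_phys(paddr, size, sim_wback);
		break;
	case SIM_FROM_DEVICE:
		sim_sync_phys(paddr, size, sim_inv);
		break;
	case SIM_BIDIRECTIONAL:
		/* A later invalidate makes a plain writeback sufficient here. */
		sim_sync_phys(paddr, size, post_dma_inv ? sim_wback : sim_wback_inv);
		break;
	default:
		break;
	}
}

static void sim_sync_for_cpu(unsigned long paddr, size_t size,
			     enum sim_dma_dir dir, bool post_dma_inv)
{
	switch (dir) {
	case SIM_FROM_DEVICE:
	case SIM_BIDIRECTIONAL:
		if (post_dma_inv)
			sim_sync_phys(paddr, size, sim_inv);
		break;
	default:
		/* SIM_TO_DEVICE: the device only read the buffer. */
		break;
	}
}
```

Passing the operation down as a function pointer keeps the page-walking loop free of any knowledge of DMA directions, so each entry point holds its own direction logic in one switch.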
* [PATCH 13/21] arc: dma-mapping: skip invalidating before bidirectional DMA
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
Some architectures that need to invalidate buffers after bidirectional
DMA because of speculative prefetching only do a simpler writeback
before that DMA, while architectures that don't need to do the second
invalidate tend to have a combined writeback+invalidate before the
DMA.
arc is one of the architectures that does both, which seems unnecessary.
Change it to behave like arm/arm64/xtensa instead, and use just a
writeback before the DMA when we do the invalidate afterwards.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arc/mm/dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 2a7fbbb83b70..ddb96786f765 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -40,7 +40,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
  *             |----------------------------------------------------------------
  *   TO_DEV    |   writeback        writeback       |   none          none
  *   FROM_DEV  |   invalidate       invalidate      |   invalidate*   invalidate*
- *   BIDIR     |   writeback+inv    writeback+inv   |   invalidate    invalidate
+ *   BIDIR     |   writeback        writeback       |   invalidate    invalidate
  *
  *   [*] needed for CPU speculative prefetches
  *
@@ -61,7 +61,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		break;
 
 	case DMA_BIDIRECTIONAL:
-		dma_cache_wback_inv(paddr, size);
+		dma_cache_wback(paddr, size);
 		break;
 
 	default:
-- 
2.39.2
^ permalink raw reply related [flat|nested] 456+ messages in thread
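The comment table in arch/arc/mm/dma.c can be read as two lookups: the cache operation before the DMA (the arch_sync_dma_for_device side) and the one after it (the arch_sync_dma_for_cpu side). The C sketch below is not kernel code, and its names are hypothetical. It encodes the table as it reads after this patch and checks three properties the commit message relies on: CPU data is written back before the device reads the buffer, no dirty line is left behind to be evicted over data the device writes, and lines the CPU may have speculatively prefetched during the DMA are invalidated afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the kernel uses enum dma_data_direction. */
enum sk_dir { SK_BIDIR, SK_TO_DEV, SK_FROM_DEV };

/* Cache operations as bit flags, so "writeback+inv" sets both bits. */
enum sk_op { SK_NONE = 0, SK_WBACK = 1, SK_INV = 2, SK_WBACK_INV = 3 };

/* Before the DMA (the for_device half of the table). */
static enum sk_op op_before_dma(enum sk_dir d)
{
	switch (d) {
	case SK_TO_DEV:
		return SK_WBACK;
	case SK_FROM_DEV:
		return SK_INV;
	case SK_BIDIR:
		return SK_WBACK;	/* was SK_WBACK_INV before this patch */
	}
	return SK_NONE;
}

/* After the DMA (the for_cpu half of the table). */
static enum sk_op op_after_dma(enum sk_dir d)
{
	switch (d) {
	case SK_TO_DEV:
		return SK_NONE;
	case SK_FROM_DEV:
	case SK_BIDIR:
		return SK_INV;	/* drops lines prefetched during the DMA */
	}
	return SK_NONE;
}

static bool table_is_safe(void)
{
	const enum sk_dir dirs[] = { SK_BIDIR, SK_TO_DEV, SK_FROM_DEV };

	for (size_t i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++) {
		enum sk_dir d = dirs[i];
		bool dev_reads = (d == SK_TO_DEV || d == SK_BIDIR);
		bool dev_writes = (d == SK_FROM_DEV || d == SK_BIDIR);

		/* CPU data must reach memory before the device reads it. */
		if (dev_reads && !(op_before_dma(d) & SK_WBACK))
			return false;
		/* No dirty line may remain to be evicted over device data. */
		if (dev_writes && op_before_dma(d) == SK_NONE)
			return false;
		/* Stale or prefetched lines must go once the device has written. */
		if (dev_writes && !(op_after_dma(d) & SK_INV))
			return false;
	}
	return true;
}
```

The check would pass equally with SK_WBACK_INV in the BIDIR before-DMA slot, which fits the commit message's framing: the extra invalidate is redundant, not incorrect, once the invalidate after the DMA is present.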
* Re: [PATCH 13/21] arc: dma-mapping: skip invalidating before bidirectional DMA
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-04-02 6:52 ` Vineet Gupta
-1 siblings, 0 replies; 456+ messages in thread
From: Vineet Gupta @ 2023-04-02 6:52 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Shahab Vahedi
CC Shahab
On 3/27/23 17:43, Arnd Bergmann wrote:
> From: Arnd Bergmann<arnd@arndb.de>
>
> Some architectures that need to invalidate buffers after bidirectional
> DMA because of speculative prefetching only do a simpler writeback
> before that DMA, while architectures that don't need to do the second
> invalidate tend to have a combined writeback+invalidate before the
> DMA.
>
> arc is one of the architectures that does both, which seems unnecessary.
>
> Change it to behave like arm/arm64/xtensa instead, and use just a
> writeback before the DMA when we do the invalidate afterwards.
>
> Signed-off-by: Arnd Bergmann<arnd@arndb.de>
Reviewed-by: Vineet Gupta <vgupta@kernel.org>
Shahab can you give this a spin on hsdk - run glibc testsuite over ssh and make sure nothing strange happens.
Thx,
-Vineet
^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 13/21] arc: dma-mapping: skip invalidating before bidirectional DMA
2023-04-02 6:52 ` Vineet Gupta
` (3 preceding siblings ...)
(?)
@ 2023-04-04 8:27 ` Shahab Vahedi
-1 siblings, 0 replies; 456+ messages in thread
From: Shahab Vahedi @ 2023-04-04 8:27 UTC (permalink / raw)
To: Vineet Gupta, Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Arnd Bergmann, Russell King, Neil Armstrong, Linus Walleij,
Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
linux-xtensa@linux-xtensa.org, Shahab Vahedi
On 4/2/23 08:52, Vineet Gupta wrote:
> CC Shahab
>
> On 3/27/23 17:43, Arnd Bergmann wrote:
>> From: Arnd Bergmann<arnd@arndb.de>
>>
>> Some architectures that need to invalidate buffers after bidirectional
>> DMA because of speculative prefetching only do a simpler writeback
>> before that DMA, while architectures that don't need to do the second
>> invalidate tend to have a combined writeback+invalidate before the
>> DMA.
>>
>> arc is one of the architectures that does both, which seems unnecessary.
>>
>> Change it to behave like arm/arm64/xtensa instead, and use just a
>> writeback before the DMA when we do the invalidate afterwards.
>>
>> Signed-off-by: Arnd Bergmann<arnd@arndb.de>
>
> Reviewed-by: Vineet Gupta <vgupta@kernel.org>
>
> Shahab can you give this a spin on hsdk - run glibc testsuite over ssh and make sure nothing strange happens.
>
> Thx,
> -Vineet
On it.
--
Shahab
^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 13/21] arc: dma-mapping: skip invalidating before bidirectional DMA
2023-04-02 6:52 ` Vineet Gupta
` (3 preceding siblings ...)
(?)
@ 2023-04-06 9:01 ` Shahab Vahedi
-1 siblings, 0 replies; 456+ messages in thread
From: Shahab Vahedi @ 2023-04-06 9:01 UTC (permalink / raw)
To: Vineet Gupta, Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Arnd Bergmann, Russell King, Neil Armstrong, Linus Walleij,
Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy,
Lad Prabhakar, Conor Dooley, linux-snps-arc@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
linux-xtensa@linux-xtensa.org, Shahab Vahedi

On 4/2/23 08:52, Vineet Gupta wrote:
> CC Shahab
>
> On 3/27/23 17:43, Arnd Bergmann wrote:
>> From: Arnd Bergmann<arnd@arndb.de>
>>
>> Some architectures that need to invalidate buffers after bidirectional
>> DMA because of speculative prefetching only do a simpler writeback
>> before that DMA, while architectures that don't need to do the second
>> invalidate tend to have a combined writeback+invalidate before the
>> DMA.
>>
>> arc is one of the architectures that does both, which seems unnecessary.
>>
>> Change it to behave like arm/arm64/xtensa instead, and use just a
>> writeback before the DMA when we do the invalidate afterwards.
>>
>> Signed-off-by: Arnd Bergmann<arnd@arndb.de>
>
> Reviewed-by: Vineet Gupta <vgupta@kernel.org>
>
> Shahab can you give this a spin on hsdk - run glibc testsuite over ssh
> and make sure nothing strange happens.
>
> Thx,
> -Vineet

Tested-by: Shahab Vahedi <shahab@synopsys.com>

No regression was observed for the ARC target before and after applying
these 21 patches. The test environment and its summary follow.

board: ARC HSDK
base:
  repo: linux-next
  tag: next-20230403
  commit: 31bd35b66249 Add linux-next specific files for 20230403
  hotfix: net: stmmac: check fwnode for phy device before scanning for phy [1]
glibc: 2.37

Summary of test results:
   20 FAIL
 4227 PASS
   38 UNSUPPORTED
   16 XFAIL
    2 XPASS

[1] https://lore.kernel.org/lkml/20230405093945.3549491-1-michael.wei.hong.sit@intel.com/#r

-- 
Shahab

^ permalink raw reply [flat|nested] 456+ messages in thread
* [PATCH 14/21] parisc: dma-mapping: use regular flush/invalidate ops
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

From: Arnd Bergmann <arnd@arndb.de>

Non-coherent devices on parisc traditionally use a full flush+invalidate
before and after each DMA, which is more expensive than what we do on
other architectures.

Before transfers to a device, the cache only has to be written back,
but apparently there is no operation for this on parisc. There is no
need to flush it again after the transfer, though.

After transfers from a device, the second writeback can be skipped
because the CPU was not allowed to write to the buffer anyway; instead,
a purge (invalidate without flush) can be used.

The DMA_FROM_DEVICE case before the transfer is handled differently
across architectures: most use only an invalidate (purge) operation,
but some have moved to flush in order to preserve dirty data when the
device does not write to the buffer, see the link below. As parisc
already did the full flush here, keep that behavior.
Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I'm not really sure I understand the semantics of the 'flush' and
'purge' operations on parisc correctly, please double-check that this
makes sense in the context of this architecture.
---
 arch/parisc/include/asm/cacheflush.h |  6 +++++-
 arch/parisc/kernel/pci-dma.c         | 25 +++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 0bdee6724132..a4c5042f1821 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -33,8 +33,12 @@ void flush_cache_mm(struct mm_struct *mm);
 
 void flush_kernel_dcache_page_addr(const void *addr);
 
+#define clean_kernel_dcache_range(start,size) \
+	flush_kernel_dcache_range((start), (size))
 #define flush_kernel_dcache_range(start,size) \
-	flush_kernel_dcache_range_asm((start), (start)+(size));
+	flush_kernel_dcache_range_asm((start), (start)+(size))
+#define purge_kernel_dcache_range(start,size) \
+	purge_kernel_dcache_range_asm((start), (start)+(size))
 
 #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
 void flush_kernel_vmap_range(void *vaddr, int size);
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index ba87f791323b..6d3d3cffb316 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -446,11 +446,32 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
+	unsigned long virt = (unsigned long)phys_to_virt(paddr);
+
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		clean_kernel_dcache_range(virt, size);
+		break;
+	case DMA_FROM_DEVICE:
+		clean_kernel_dcache_range(virt, size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		flush_kernel_dcache_range(virt, size);
+		break;
+	}
 }
 
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
+	unsigned long virt = (unsigned long)phys_to_virt(paddr);
+
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		purge_kernel_dcache_range(virt, size);
+		break;
+	}
 }
-- 
2.39.2

^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 14/21] parisc: dma-mapping: use regular flush/invalidate ops @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> non-coherent devices on parisc traditionally use a full flush+invalidate before and after each DMA, which is more expensive that what we do on other architectures. Before transfers to a device, the cache only has to be written back, but apparently there is no operation for this on parisc. There is no need to flush it again after the transfer though. After transfers from a device, the second writeback can be skipped because the CPU was not allowed to write to the buffer anyway, instead a purge (invalidate without flush) can be used. The DMA_FROM_DEVICE is handled differently across architectures, most use only an invalidate (purge) operation, but some have moved to flush in order to preserve dirty data when the device does not write to the buffer, see the link below. As parisc already did the full flush here, keep that behavior. 
Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I'm not really sure I understand the semantics of the 'flush' and
'purge' operations on parisc correctly; please double-check that
this makes sense in the context of this architecture.
---
 arch/parisc/include/asm/cacheflush.h |  6 +++++-
 arch/parisc/kernel/pci-dma.c         | 25 +++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 0bdee6724132..a4c5042f1821 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -33,8 +33,12 @@ void flush_cache_mm(struct mm_struct *mm);

 void flush_kernel_dcache_page_addr(const void *addr);

+#define clean_kernel_dcache_range(start,size) \
+	flush_kernel_dcache_range((start), (size))
 #define flush_kernel_dcache_range(start,size) \
-	flush_kernel_dcache_range_asm((start), (start)+(size));
+	flush_kernel_dcache_range_asm((start), (start)+(size))
+#define purge_kernel_dcache_range(start,size) \
+	purge_kernel_dcache_range_asm((start), (start)+(size))

 #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
 void flush_kernel_vmap_range(void *vaddr, int size);
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index ba87f791323b..6d3d3cffb316 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -446,11 +446,32 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
+	unsigned long virt = (unsigned long)phys_to_virt(paddr);
+
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		clean_kernel_dcache_range(virt, size);
+		break;
+	case DMA_FROM_DEVICE:
+		clean_kernel_dcache_range(virt, size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		flush_kernel_dcache_range(virt, size);
+		break;
+	}
 }

 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
+	unsigned long virt = (unsigned long)phys_to_virt(paddr);
+
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		purge_kernel_dcache_range(virt, size);
+		break;
+	}
 }
-- 
2.39.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 456+ messages in thread
* [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
Most ARM CPUs can have write-back caches that require cache
management to be done in the dma_sync_*_for_device() operation.
This is typically done in both writeback and writethrough mode.
The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S (arm920t,
arm940t) implementations are the exception here, and only do the
cache management after the DMA is complete, in the
dma_sync_*_for_cpu() operation.
Change this for consistency with the other platforms. This should
have no user-visible effect.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/mm/cache-v4.S   | 8 ++++----
 arch/arm/mm/cache-v4wt.S | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
index 7787057e4990..e2b104876340 100644
--- a/arch/arm/mm/cache-v4.S
+++ b/arch/arm/mm/cache-v4.S
@@ -117,23 +117,23 @@ ENTRY(v4_dma_flush_range)
 	ret	lr

 /*
- *	dma_unmap_area(start, size, dir)
+ *	dma_map_area(start, size, dir)
  *	- start	- kernel virtual start address
  *	- size	- size of region
  *	- dir	- DMA direction
  */
-ENTRY(v4_dma_unmap_area)
+ENTRY(v4_dma_map_area)
 	teq	r2, #DMA_TO_DEVICE
 	bne	v4_dma_flush_range
 	/* FALLTHROUGH */

 /*
- *	dma_map_area(start, size, dir)
+ *	dma_unmap_area(start, size, dir)
  *	- start	- kernel virtual start address
  *	- size	- size of region
  *	- dir	- DMA direction
  */
-ENTRY(v4_dma_map_area)
+ENTRY(v4_dma_unmap_area)
 	ret	lr
 ENDPROC(v4_dma_unmap_area)
 ENDPROC(v4_dma_map_area)
diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S
index 0b290c25a99d..652218752f88 100644
--- a/arch/arm/mm/cache-v4wt.S
+++ b/arch/arm/mm/cache-v4wt.S
@@ -172,24 +172,24 @@ v4wt_dma_inv_range:
 	.equ	v4wt_dma_flush_range, v4wt_dma_inv_range

 /*
- *	dma_unmap_area(start, size, dir)
+ *	dma_map_area(start, size, dir)
  *	- start	- kernel virtual start address
  *	- size	- size of region
  *	- dir	- DMA direction
  */
-ENTRY(v4wt_dma_unmap_area)
+ENTRY(v4wt_dma_map_area)
 	add	r1, r1, r0
 	teq	r2, #DMA_TO_DEVICE
 	bne	v4wt_dma_inv_range
 	/* FALLTHROUGH */

 /*
- *	dma_map_area(start, size, dir)
+ *	dma_unmap_area(start, size, dir)
  *	- start	- kernel virtual start address
  *	- size	- size of region
  *	- dir	- DMA direction
  */
-ENTRY(v4wt_dma_map_area)
+ENTRY(v4wt_dma_unmap_area)
 	ret	lr
 ENDPROC(v4wt_dma_unmap_area)
 ENDPROC(v4wt_dma_map_area)
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 456+ messages in thread
* [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> Most ARM CPUs can have write-back caches and that require cache management to be done in the dma_sync_*_for_device() operation. This is typically done in both writeback and writethrough mode. The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S (arm920t, arm940t) implementations are the exception here, and only do the cache management after the DMA is complete, in the dma_sync_*_for_cpu() operation. Change this for consistency with the other platforms. This should have no user visible effect. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/mm/cache-v4.S | 8 ++++---- arch/arm/mm/cache-v4wt.S | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index 7787057e4990..e2b104876340 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -117,23 +117,23 @@ ENTRY(v4_dma_flush_range) ret lr /* - * dma_unmap_area(start, size, dir) + * dma_map_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4_dma_unmap_area) +ENTRY(v4_dma_map_area) teq r2, #DMA_TO_DEVICE bne v4_dma_flush_range /* FALLTHROUGH */ /* - * dma_map_area(start, size, dir) + * dma_unmap_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4_dma_map_area) +ENTRY(v4_dma_unmap_area) ret lr ENDPROC(v4_dma_unmap_area) ENDPROC(v4_dma_map_area) diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 0b290c25a99d..652218752f88 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -172,24 +172,24 @@ v4wt_dma_inv_range: .equ v4wt_dma_flush_range, v4wt_dma_inv_range /* - * dma_unmap_area(start, size, dir) + * dma_map_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4wt_dma_unmap_area) +ENTRY(v4wt_dma_map_area) add r1, r1, r0 teq r2, #DMA_TO_DEVICE bne v4wt_dma_inv_range /* FALLTHROUGH */ /* - * dma_map_area(start, size, dir) + * dma_unmap_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4wt_dma_map_area) +ENTRY(v4wt_dma_unmap_area) ret lr ENDPROC(v4wt_dma_unmap_area) ENDPROC(v4wt_dma_map_area) -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply 
related [flat|nested] 456+ messages in thread
* [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Sime k, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy, David S. Miller From: Arnd Bergmann <arnd@arndb.de> Most ARM CPUs can have write-back caches and that require cache management to be done in the dma_sync_*_for_device() operation. This is typically done in both writeback and writethrough mode. The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S (arm920t, arm940t) implementations are the exception here, and only do the cache management after the DMA is complete, in the dma_sync_*_for_cpu() operation. Change this for consistency with the other platforms. This should have no user visible effect. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/mm/cache-v4.S | 8 ++++---- arch/arm/mm/cache-v4wt.S | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index 7787057e4990..e2b104876340 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -117,23 +117,23 @@ ENTRY(v4_dma_flush_range) ret lr /* - * dma_unmap_area(start, size, dir) + * dma_map_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4_dma_unmap_area) +ENTRY(v4_dma_map_area) teq r2, #DMA_TO_DEVICE bne v4_dma_flush_range /* FALLTHROUGH */ /* - * dma_map_area(start, size, dir) + * dma_unmap_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4_dma_map_area) +ENTRY(v4_dma_unmap_area) ret lr ENDPROC(v4_dma_unmap_area) ENDPROC(v4_dma_map_area) diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 0b290c25a99d..652218752f88 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -172,24 +172,24 @@ v4wt_dma_inv_range: .equ v4wt_dma_flush_range, v4wt_dma_inv_range /* - * dma_unmap_area(start, size, dir) + * dma_map_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4wt_dma_unmap_area) +ENTRY(v4wt_dma_map_area) add r1, r1, r0 teq r2, #DMA_TO_DEVICE bne v4wt_dma_inv_range /* FALLTHROUGH */ /* - * dma_map_area(start, size, dir) + * dma_unmap_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4wt_dma_map_area) +ENTRY(v4wt_dma_unmap_area) ret lr ENDPROC(v4wt_dma_unmap_area) ENDPROC(v4wt_dma_map_area) -- 2.39.2 ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> Most ARM CPUs can have write-back caches and that require cache management to be done in the dma_sync_*_for_device() operation. This is typically done in both writeback and writethrough mode. The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S (arm920t, arm940t) implementations are the exception here, and only do the cache management after the DMA is complete, in the dma_sync_*_for_cpu() operation. Change this for consistency with the other platforms. This should have no user visible effect. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/mm/cache-v4.S | 8 ++++---- arch/arm/mm/cache-v4wt.S | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index 7787057e4990..e2b104876340 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -117,23 +117,23 @@ ENTRY(v4_dma_flush_range) ret lr /* - * dma_unmap_area(start, size, dir) + * dma_map_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4_dma_unmap_area) +ENTRY(v4_dma_map_area) teq r2, #DMA_TO_DEVICE bne v4_dma_flush_range /* FALLTHROUGH */ /* - * dma_map_area(start, size, dir) + * dma_unmap_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4_dma_map_area) +ENTRY(v4_dma_unmap_area) ret lr ENDPROC(v4_dma_unmap_area) ENDPROC(v4_dma_map_area) diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 0b290c25a99d..652218752f88 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -172,24 +172,24 @@ v4wt_dma_inv_range: .equ v4wt_dma_flush_range, v4wt_dma_inv_range /* - * dma_unmap_area(start, size, dir) + * dma_map_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4wt_dma_unmap_area) +ENTRY(v4wt_dma_map_area) add r1, r1, r0 teq r2, #DMA_TO_DEVICE bne v4wt_dma_inv_range /* FALLTHROUGH */ /* - * dma_map_area(start, size, dir) + * dma_unmap_area(start, size, dir) * - start - kernel virtual start address * - size - size of region * - dir - DMA direction */ -ENTRY(v4wt_dma_map_area) +ENTRY(v4wt_dma_unmap_area) ret lr ENDPROC(v4wt_dma_unmap_area) ENDPROC(v4wt_dma_map_area) -- 2.39.2 _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply related 
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
@ 2023-03-31 9:01 ` Linus Walleij
0 siblings, 0 replies; 456+ messages in thread
From: Linus Walleij @ 2023-03-31 9:01 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 2:16 PM Arnd Bergmann <arnd@kernel.org> wrote:

> From: Arnd Bergmann <arnd@arndb.de>
>
> Most ARM CPUs can have write-back caches that require
> cache management to be done in the dma_sync_*_for_device()
> operation. This is typically done in both writeback and
> writethrough mode.
>
> The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S
> (arm920t, arm940t) implementations are the exception here,
> and only do the cache management after the DMA is complete,
> in the dma_sync_*_for_cpu() operation.
>
> Change this for consistency with the other platforms. This
> should have no user-visible effect.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Looks good to me.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
@ 2023-03-31 9:07 ` Russell King (Oracle)
0 siblings, 0 replies; 456+ messages in thread
From: Russell King (Oracle) @ 2023-03-31 9:07 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 02:13:11PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> Most ARM CPUs can have write-back caches that require
> cache management to be done in the dma_sync_*_for_device()
> operation. This is typically done in both writeback and
> writethrough mode.
>
> The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S
> (arm920t, arm940t) implementations are the exception here,
> and only do the cache management after the DMA is complete,
> in the dma_sync_*_for_cpu() operation.
>
> Change this for consistency with the other platforms. This
> should have no user-visible effect.

NAK...

The reason we do cache management _after_ is to ensure that there
is no stale data. The kernel _has_ (at the very least in the past)
performed DMA to data structures that are embedded within other
data structures, resulting in cache lines being shared. If one of
those cache lines is touched while DMA is progressing, then we
must do cache management _after_ the DMA operation has completed.
Doing it before is no good.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
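The shared-cache-line hazard described in the NAK can be reproduced in a toy model. This is a minimal single-line simulation, not kernel code, assuming a write-through, read-allocate cache that does not snoop DMA traffic and a CPU that does not prefetch speculatively (as on the v4/v4wt cores discussed here). The 32-byte line size, the byte layout, and all names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define LINE_BYTES 32	/* assumed cache line size for the model */

/* One line of a write-through, read-allocate cache that does not snoop DMA. */
struct toy_line {
	unsigned char ram[LINE_BYTES];		/* memory as the DMA master sees it */
	unsigned char cached[LINE_BYTES];	/* the CPU's cached copy */
	bool valid;
};

/* CPU load: on a miss the whole line is fetched from RAM. */
static unsigned char cpu_read(struct toy_line *l, int off)
{
	if (!l->valid) {
		memcpy(l->cached, l->ram, LINE_BYTES);
		l->valid = true;
	}
	return l->cached[off];
}

static void invalidate(struct toy_line *l)
{
	l->valid = false;	/* write-through: no dirty data to write back */
}

/* Device write: lands in RAM only; the cached copy is not updated. */
static void dma_write(struct toy_line *l, int off, unsigned char v)
{
	l->ram[off] = v;
}

/*
 * Bytes 0..15: a CPU-owned field. Bytes 16..31: the DMA receive buffer.
 * Both share one cache line. Returns what the CPU reads from the buffer
 * once the transfer is complete.
 */
static unsigned char receive(bool inv_before, bool inv_after)
{
	struct toy_line l;

	memset(&l, 0, sizeof(l));
	(void)cpu_read(&l, 16);		/* line already cached before mapping */
	if (inv_before)
		invalidate(&l);		/* maintenance at map time */
	(void)cpu_read(&l, 0);		/* CPU touches its field while DMA is pending */
	dma_write(&l, 16, 0xAB);	/* device completes the transfer */
	if (inv_after)
		invalidate(&l);		/* maintenance at unmap time */
	return cpu_read(&l, 16);
}
```

With only the map-time invalidate, `receive(true, false)` returns the stale `0x00`, because the CPU's access to its own field re-allocated the line before the device wrote RAM. Any variant that invalidates after the transfer returns the fresh `0xAB`.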
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
@ 2023-03-31 9:35 ` Russell King (Oracle)
0 siblings, 0 replies; 456+ messages in thread
From: Russell King (Oracle) @ 2023-03-31 9:35 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Fri, Mar 31, 2023 at 10:07:28AM +0100, Russell King (Oracle) wrote:
> On Mon, Mar 27, 2023 at 02:13:11PM +0200, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > Most ARM CPUs can have write-back caches that require
> > cache management to be done in the dma_sync_*_for_device()
> > operation. This is typically done in both writeback and
> > writethrough mode.
> >
> > The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S
> > (arm920t, arm940t) implementations are the exception here,
> > and only do the cache management after the DMA is complete,
> > in the dma_sync_*_for_cpu() operation.
> >
> > Change this for consistency with the other platforms. This
> > should have no user-visible effect.
>
> NAK...
>
> The reason we do cache management _after_ is to ensure that there
> is no stale data. The kernel _has_ (at the very least in the past)
> performed DMA to data structures that are embedded within other
> data structures, resulting in cache lines being shared. If one of
> those cache lines is touched while DMA is progressing, then we
> must do cache management _after_ the DMA operation has completed.
> Doing it before is no good.

It looks like the main offender of "touching cache lines shared with
DMA" has now been resolved - that was the SCSI sense buffer, and was
fixed some time ago:

commit de25deb18016f66dcdede165d07654559bb332bc
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date:   Wed Jan 16 13:32:17 2008 +0900

/if/ that is the one and only case, then we're probably fine, but
having been through an era where this kind of thing was the norm and
requests to fix it did not get great responses from subsystem
maintainers, I just don't trust the kernel not to want to DMA to
overlapping cache lines.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
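The concern about DMA to overlapping cache lines goes away for a given buffer if no CPU-owned data shares a line with it. The following is a standalone C11 sketch of that layout rule, assuming 32-byte lines (check the actual CPU's TRM). The struct and helper names are hypothetical and the field sizes are only illustrative. In-kernel code would more typically use `____cacheline_aligned` or give the buffer its own allocation instead of hand-rolled checks like these.

```c
#include <assert.h>	/* static_assert */
#include <stdalign.h>	/* alignas, alignof */
#include <stdbool.h>
#include <stddef.h>	/* offsetof, size_t */
#include <stdint.h>

#define ASSUMED_LINE 32	/* assumption: cache line size in bytes */

/* True if [a_off, a_off + a_len) and [b_off, b_off + b_len) touch a common line. */
static bool shares_line(size_t a_off, size_t a_len, size_t b_off, size_t b_len)
{
	size_t a_first = a_off / ASSUMED_LINE, a_last = (a_off + a_len - 1) / ASSUMED_LINE;
	size_t b_first = b_off / ASSUMED_LINE, b_last = (b_off + b_len - 1) / ASSUMED_LINE;

	return a_first <= b_last && b_first <= a_last;
}

/* Hypothetical command block with the DMA target embedded next to CPU fields. */
struct cmd_embedded {
	uint32_t status;	/* CPU-owned */
	uint8_t sense[18];	/* device writes here via DMA */
	uint32_t flags;		/* CPU-owned */
};

/* Same fields, but the DMA buffer gets a cache line to itself. */
struct cmd_isolated {
	uint32_t status;
	alignas(ASSUMED_LINE) uint8_t sense[ASSUMED_LINE];	/* padded to a full line */
	uint32_t flags;		/* therefore starts on the next line */
};

static_assert(offsetof(struct cmd_isolated, sense) % ASSUMED_LINE == 0,
	      "DMA buffer must start on a line boundary");
static_assert(offsetof(struct cmd_isolated, flags) % ASSUMED_LINE == 0,
	      "field after the DMA buffer must start on a new line");
```

The offsets are relative to the start of the struct, so the object itself must also be line-aligned. The `alignas` member raises the struct's alignment to 32, which C11 `aligned_alloc()` can honor where plain `malloc()` may not.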
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
2023-03-31 9:35 ` Russell King (Oracle)
@ 2023-03-31 10:38 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 10:38 UTC (permalink / raw)
To: Russell King, Arnd Bergmann
Cc: linux-kernel, Vineet Gupta, Neil Armstrong, Linus Walleij,
Catalin Marinas, Will Deacon, guoren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S . Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa

On Fri, Mar 31, 2023, at 11:35, Russell King (Oracle) wrote:
> On Fri, Mar 31, 2023 at 10:07:28AM +0100, Russell King (Oracle) wrote:
>> On Mon, Mar 27, 2023 at 02:13:11PM +0200, Arnd Bergmann wrote:
>> > From: Arnd Bergmann <arnd@arndb.de>
>> >
>> > Most ARM CPUs can have write-back caches that require
>> > cache management to be done in the dma_sync_*_for_device()
>> > operation. This is typically done in both writeback and
>> > writethrough mode.
>> >
>> > The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S
>> > (arm920t, arm940t) implementations are the exception here,
>> > and only do the cache management after the DMA is complete,
>> > in the dma_sync_*_for_cpu() operation.
>> >
>> > Change this for consistency with the other platforms. This
>> > should have no user visible effect.
>>
>> NAK...
>>
>> The reason we do cache management _after_ is to ensure that there
>> is no stale data. The kernel _has_ (at the very least in the past)
>> performed DMA to data structures that are embedded within other
>> data structures, resulting in cache lines being shared. If one of
>> those cache lines is touched while DMA is progressing, then we
>> must do cache management _after_ the DMA operation has completed.
>> Doing it before is no good.

What I'm trying to address here is the inconsistency between
implementations. If we decide that we always want to invalidate
after FROM_DEVICE, I can do that as part of the series, but then
I have to change most of the other arm implementations.

Right now, the only WT cache implementations that do the
invalidation after the DMA are cache-v4.S (arm720 integrator and
clps711x), cache-v4wt.S (arm920/arm922 at91rm9200, clps711x,
ep93xx, omap15xx, imx1 and integrator), some sparc32 leon3 and
early xtensa.

Most architectures that have write-through caches (m68k,
microblaze) or write-back caches but no speculation (all other
armv4/armv5, hexagon, openrisc, sh, most mips, later xtensa)
only invalidate before DMA but not after.

OTOH, most machines that are actually in use today (armv6+,
powerpc, later mips, microblaze, riscv, nios2) also have to
deal with speculative accesses, so they end up having to
invalidate or flush both before and after a DMA_FROM_DEVICE
and DMA_BIDIRECTIONAL.

> It looks like the main offender of "touching cache lines shared
> with DMA" has now been resolved - that was the SCSI sense buffer,
> and was fixed some time ago:
>
> commit de25deb18016f66dcdede165d07654559bb332bc
> Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date: Wed Jan 16 13:32:17 2008 +0900
>
> /if/ that is the one and only case, then we're probably fine, but
> having been through an era where this kind of thing was the norm
> and requests to fix it did not get great responses from subsystem
> maintainers, I just don't trust the kernel not to want to DMA to
> overlapping cache lines.
Thanks for digging that out, that is very useful.

It looks like this was around the same time as 03d70617b8a7
("powerpc: Prevent memory corruption due to cache invalidation
of unaligned DMA buffer"), so it may well have been related.

I know we also had more recent problems with USB drivers trying
to DMA to the stack, which would also cause problems on non-coherent
machines, but some of these were only found after we introduced
VMAP_STACK.

It would be nice to use KASAN to prevent reads on cache lines
that have in-flight DMA.

     Arnd

^ permalink raw reply [flat|nested] 456+ messages in thread
* RE: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
2023-03-31 10:38 ` Arnd Bergmann
` (2 preceding siblings ...)
(?)
@ 2023-03-31 11:01 ` David Laight
-1 siblings, 0 replies; 456+ messages in thread
From: David Laight @ 2023-03-31 11:01 UTC (permalink / raw)
To: 'Arnd Bergmann', Russell King, Arnd Bergmann
Cc: Rich Felker, linux-sh@vger.kernel.org, Catalin Marinas,
Linus Walleij, John Paul Adrian Glaubitz, linux-mips@vger.kernel.org,
Max Filippov, Conor.Dooley, guoren, linux-csky@vger.kernel.org,
sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org,
Will Deacon, Christoph Hellwig, Helge Deller, Geert Uytterhoeven,
Vineet Gupta, linux-snps-arc@lists.infradead.org,
linux-xtensa@linux-xtensa.org, Brian Cain, Lad, Prabhakar,
linux-m68k@lists.linux-m68k.org, Paul Walmsley, Stafford Horne,
linux-arm-kernel@lists.infradead.org, Neil Armstrong, Michal Simek,
Thomas Bogendoerfer, linux-parisc@vger.kernel.org,
linux-openrisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, Dinh Nguyen, Palmer Dabbelt,
linux-hexagon@vger.kernel.org, linux-oxnas@groups.io, Robin Murphy,
David S . Miller

From: Arnd Bergmann
> Sent: 31 March 2023 11:39
...
> Most architectures that have write-through caches (m68k,
> microblaze) or write-back caches but no speculation (all other
> armv4/armv5, hexagon, openrisc, sh, most mips, later xtensa)
> only invalidate before DMA but not after.
>
> OTOH, most machines that are actually in use today (armv6+,
> powerpc, later mips, microblaze, riscv, nios2) also have to
> deal with speculative accesses, so they end up having to
> invalidate or flush both before and after a DMA_FROM_DEVICE
> and DMA_BIDIRECTIONAL.

nios2 is a simple in-order cpu with a short pipeline
(it is a soft-cpu made from normal fpga logic elements).
Definitely doesn't do speculative accesses.

OTOH anyone trying to run Linux on it needs their head examined.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 456+ messages in thread
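The categories in the quoted summary reduce to a simple rule for DMA_FROM_DEVICE and DMA_BIDIRECTIONAL buffers: invalidate before the transfer, and invalidate again afterwards only if the CPU can speculatively refetch lines while the device owns the buffer. By that rule, an in-order, non-speculative core such as the nios2 David describes would need only the first step. The sketch below is a hypothetical illustration of the rule as Arnd states it; the struct, enum and function names are invented, and it does not model Russell's position that write-through caches should invalidate after the DMA instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a CPU's data cache; not a kernel structure. */
struct cpu_cache {
	const char *name;
	bool write_back;   /* false means write-through */
	bool speculative;  /* may refetch lines while the device owns the buffer */
};

/* Bit flags for where invalidation is needed around a DMA_FROM_DEVICE
 * (or DMA_BIDIRECTIONAL) transfer. */
enum {
	INVAL_BEFORE = 1u << 0,
	INVAL_AFTER  = 1u << 1,
};

/* Rule as summarized in the quoted text: always invalidate before the
 * transfer; speculative CPUs must invalidate again once it completes. */
static unsigned int from_device_invalidate(const struct cpu_cache *c)
{
	unsigned int points = INVAL_BEFORE;

	assert(c != NULL);
	if (c->speculative)
		points |= INVAL_AFTER;
	return points;
}

/* For DMA_TO_DEVICE, only a write-back cache can hold data that has not
 * reached memory yet, so only it needs a clean before the transfer. */
static bool to_device_needs_clean(const struct cpu_cache *c)
{
	assert(c != NULL);
	return c->write_back;
}
```

Keeping the decision in one place per cache type is the trade-off the thread is arguing about: a single generic rule is easy to audit, but it charges non-speculative CPUs for the extra post-DMA pass only if the `speculative` flag is set conservatively.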
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
2023-03-31 10:38 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-31 11:08 ` Russell King (Oracle)
-1 siblings, 0 replies; 456+ messages in thread
From: Russell King (Oracle) @ 2023-03-31 11:08 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Arnd Bergmann, linux-kernel, Vineet Gupta, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S . Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa

On Fri, Mar 31, 2023 at 12:38:45PM +0200, Arnd Bergmann wrote:
> On Fri, Mar 31, 2023, at 11:35, Russell King (Oracle) wrote:
> > On Fri, Mar 31, 2023 at 10:07:28AM +0100, Russell King (Oracle) wrote:
> >> On Mon, Mar 27, 2023 at 02:13:11PM +0200, Arnd Bergmann wrote:
> >> > From: Arnd Bergmann <arnd@arndb.de>
> >> >
> >> > Most ARM CPUs can have write-back caches and that require
> >> > cache management to be done in the dma_sync_*_for_device()
> >> > operation. This is typically done in both writeback and
> >> > writethrough mode.
> >> >
> >> > The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S
> >> > (arm920t, arm940t) implementations are the exception here,
> >> > and only do the cache management after the DMA is complete,
> >> > in the dma_sync_*_for_cpu() operation.
> >> >
> >> > Change this for consistency with the other platforms. This
> >> > should have no user visible effect.
> >>
> >> NAK...
> >>
> >> The reason we do cache management _after_ is to ensure that there
> >> is no stale data. The kernel _has_ (at the very least in the past)
> >> performed DMA to data structures that are embedded within other
> >> data structures, resulting in cache lines being shared. If one of
> >> those cache lines is touched while DMA is progressing, then we
> >> must to cache management _after_ the DMA operation has completed.
> >> Doing it before is no good.
>
> What I'm trying to address here is the inconsistency between
> implementations. If we decide that we always want to invalidate
> after FROM_DEVICE, I can do that as part of the series, but then
> I have to change most of the other arm implementations.

Why?

First thing to say is that DMA to buffers whose cache lines are
shared with data the CPU may be accessing needs to be outlawed - it
is a recipe for data corruption, and always has been. Sadly, some
folk don't see it that way because of a past "x86 just works and we
demand that all architectures behave like x86!" attitude. The SCSI
sense buffer has historically been a big culprit for that.

For WT, FROM_DEVICE, invalidating after DMA is the right thing to do,
because we want to ensure that the DMA'd data is properly readable
upon completion of the DMA. If overlapping cache lines have been
touched while DMA is progressing, and we invalidate before DMA, then
the cache will contain stale data that will remain in the cache after
DMA has completed. Invalidating a WT cache does not destroy any data,
so is safe to do. So the safest approach is to invalidate after DMA
has completed in this instance.

For WB, FROM_DEVICE, we have the problem of dirty cache lines which
we have to get rid of. For the overlapping cache lines, we have to
clean those before DMA begins to ensure that data written to the
non-DMA-buffer part is preserved. All other cache lines need to be
invalidated before DMA begins to ensure that writebacks do not
corrupt data from the device. Hence why it's different.

And hence why the ARM implementation is based around buffer
ownership. And hence why they're called dma_map_area()/
dma_unmap_area() rather than the cache operations themselves. This
is an intentional change, one that was done when ARMv6 came along.

> OTOH, most machines that are actually in use today (armv6+,
> powerpc, later mips, microblaze, riscv, nios2) also have to
> deal with speculative accesses, so they end up having to
> invalidate or flush both before and after a DMA_FROM_DEVICE
> and DMA_BIDIRECTIONAL.

Again, these are implementation details of the cache, and this is
precisely why having the map/unmap interface is so much better than
having generic code explicitly call "clean" and "invalidate"
interfaces into arch code.

If we treat everything as a speculative cache, then we're doing
needless extra work for those caches that aren't speculative. So,
ARM would have to step through every cache line for every DMA buffer
at 32-byte intervals performing cache maintenance whether the cache
is speculative or not. That is expensive, and hurts performance.

I put a lot of thought into this when I updated the ARM DMA
implementation when we started seeing these different cache types,
particularly when ARMv6 came along. I really don't want that work
wrecked.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 456+ messages in thread
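Russell's write-back, DMA_FROM_DEVICE case hinges on which cache lines a buffer only partly covers. The following hypothetical C sketch splits a buffer into partially covered head and tail lines, which must be cleaned before the DMA so neighbouring CPU data is preserved, and fully covered lines, which can simply be invalidated before the DMA. It also counts the lines touched, i.e. the length of the per-buffer maintenance loop whose cost he is concerned about. The names, the struct and the 32-byte line size are assumptions (the size is taken from his "32-byte intervals" remark). For a write-through cache, by his argument, the whole touched range would instead be invalidated after the DMA completes, since invalidating write-through lines loses no data; that variant is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE ((uintptr_t)32)  /* assumed cache line size in bytes */

/* Hypothetical plan for a write-back cache and a DMA_FROM_DEVICE buffer. */
struct wb_from_device_plan {
	bool clean_head;      /* first line is shared with other data: clean it */
	bool clean_tail;      /* last line is shared with other data: clean it */
	uintptr_t inv_start;  /* fully covered lines to invalidate: */
	uintptr_t inv_end;    /* [inv_start, inv_end), possibly empty */
};

static uintptr_t line_down(uintptr_t a) { return a & ~(LINE - 1); }
static uintptr_t line_up(uintptr_t a)   { return (a + LINE - 1) & ~(LINE - 1); }

static struct wb_from_device_plan plan_wb_from_device(uintptr_t start, size_t size)
{
	assert(size > 0);
	uintptr_t end = start + size;      /* exclusive */
	uintptr_t first = line_down(start);
	uintptr_t limit = line_up(end);
	struct wb_from_device_plan p;

	p.clean_head = start != first;
	p.clean_tail = end != limit;
	p.inv_start = p.clean_head ? first + LINE : first;
	p.inv_end = p.clean_tail ? limit - LINE : limit;
	/* A buffer inside a single line has no fully covered lines;
	 * clean_head and clean_tail then refer to the same line. */
	if (p.inv_end < p.inv_start)
		p.inv_end = p.inv_start;
	return p;
}

/* Number of cache lines a maintenance loop must step through. */
static size_t lines_touched(uintptr_t start, size_t size)
{
	assert(size > 0);
	return (size_t)((line_up(start + size) - line_down(start)) / LINE);
}
```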
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA @ 2023-03-31 11:08 ` Russell King (Oracle) 0 siblings, 0 replies; 456+ messages in thread From: Russell King (Oracle) @ 2023-03-31 11:08 UTC (permalink / raw) To: Arnd Bergmann Cc: Arnd Bergmann, linux-kernel, Vineet Gupta, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Fri, Mar 31, 2023 at 12:38:45PM +0200, Arnd Bergmann wrote: > On Fri, Mar 31, 2023, at 11:35, Russell King (Oracle) wrote: > > On Fri, Mar 31, 2023 at 10:07:28AM +0100, Russell King (Oracle) wrote: > >> On Mon, Mar 27, 2023 at 02:13:11PM +0200, Arnd Bergmann wrote: > >> > From: Arnd Bergmann <arnd@arndb.de> > >> > > >> > Most ARM CPUs can have write-back caches and that require > >> > cache management to be done in the dma_sync_*_for_device() > >> > operation. This is typically done in both writeback and > >> > writethrough mode. > >> > > >> > The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S > >> > (arm920t, arm940t) implementations are the exception here, > >> > and only do the cache management after the DMA is complete, > >> > in the dma_sync_*_for_cpu() operation. > >> > > >> > Change this for consistency with the other platforms. This > >> > should have no user visible effect. > >> > >> NAK... > >> > >> The reason we do cache management _after_ is to ensure that there > >> is no stale data. 
The kernel _has_ (at the very least in the past) > >> performed DMA to data structures that are embedded within other > >> data structures, resulting in cache lines being shared. If one of > >> those cache lines is touched while DMA is progressing, then we > >> must to cache management _after_ the DMA operation has completed. > >> Doing it before is no good. > > What I'm trying to address here is the inconsistency between > implementations. If we decide that we always want to invalidate > after FROM_DEVICE, I can do that as part of the series, but then > I have to change most of the other arm implementations. Why? First thing to say is that DMA to buffers where the cache lines are shared with data the CPU may be accessing need to be outlawed - they are a recipe for data corruption - always have been. Sadly, some folk don't see it that way because of a passed "x86 just works and we demand that all architectures behave like x86!" attitude. The SCSI sense buffer has historically been a big culpret for that. For WT, FROM_DEVICE, invalidating after DMA is the right thing to do, because we want to ensure that the DMA'd data is properly readable upon completion of the DMA. If overlapping cache lines have been touched while DMA is progressing, and we invalidate before DMA, then the cache will contain stale data that will remain in the cache after DMA has completed. Invalidating a WT cache does not destroy any data, so is safe to do. So the safest approach is to invalidate after DMA has completed in this instance. For WB, FROM_DEVICE, we have the problem of dirty cache lines which we have to get rid of. For the overlapping cache lines, we have to clean those before DMA begins to ensure that data written to the non-DMA-buffer part is preserved. All other cache lines need to be invalidated before DMA begins to ensure that writebacks do not corrupt data from the device. Hence why it's different. And hence why the ARM implementation is based around buffer ownership. 
And hence why they're called dma_map_area()/dma_unmap_area() rather than the cache operations themselves. This is an intentional change, one that was done when ARMv6 came along. > OTOH, most machines that are actually in use today (armv6+, > powerpc, later mips, microblaze, riscv, nios2) also have to > deal with speculative accesses, so they end up having to > invalidate or flush both before and after a DMA_FROM_DEVICE > and DMA_BIDIRECTIONAL. Again, these are implementation details of the cache, and this is precisely why having the map/unmap interface is so much better than having generic code explicitly call "clean" and "invalidate" interfaces into arch code. If we treat everything as a speculative cache, then we're doing needless extra work for those caches that aren't speculative. So, ARM would have to step through every cache line for every DMA buffer at 32-byte intervals performing cache maintenance whether the cache is speculative or not. That is expensive, and hurts performance. I put a lot of thought into this when I updated the ARM DMA implementation when we started seeing these different cache types particularly when ARMv6 came along. I really don't want that work wrecked. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA @ 2023-03-31 11:08 ` Russell King (Oracle) 0 siblings, 0 replies; 456+ messages in thread From: Russell King (Oracle) @ 2023-03-31 11:08 UTC (permalink / raw) To: Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor.Dooley, guoren, linux-csky@vger.kernel.org, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Neil Armstrong, Lad, Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Brian Cain, Arnd Bergmann <ar On Fri, Mar 31, 2023 at 12:38:45PM +0200, Arnd Bergmann wrote: > On Fri, Mar 31, 2023, at 11:35, Russell King (Oracle) wrote: > > On Fri, Mar 31, 2023 at 10:07:28AM +0100, Russell King (Oracle) wrote: > >> On Mon, Mar 27, 2023 at 02:13:11PM +0200, Arnd Bergmann wrote: > >> > From: Arnd Bergmann <arnd@arndb.de> > >> > > >> > Most ARM CPUs can have write-back caches and that require > >> > cache management to be done in the dma_sync_*_for_device() > >> > operation. This is typically done in both writeback and > >> > writethrough mode. > >> > > >> > The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S > >> > (arm920t, arm940t) implementations are the exception here, > >> > and only do the cache management after the DMA is complete, > >> > in the dma_sync_*_for_cpu() operation. > >> > > >> > Change this for consistency with the other platforms. This > >> > should have no user visible effect. > >> > >> NAK... > >> > >> The reason we do cache management _after_ is to ensure that there > >> is no stale data. The kernel _has_ (at the very least in the past) > >> performed DMA to data structures that are embedded within other > >> data structures, resulting in cache lines being shared. 
If one of > >> those cache lines is touched while DMA is progressing, then we > >> must to cache management _after_ the DMA operation has completed. > >> Doing it before is no good. > > What I'm trying to address here is the inconsistency between > implementations. If we decide that we always want to invalidate > after FROM_DEVICE, I can do that as part of the series, but then > I have to change most of the other arm implementations. Why? First thing to say is that DMA to buffers where the cache lines are shared with data the CPU may be accessing need to be outlawed - they are a recipe for data corruption - always have been. Sadly, some folk don't see it that way because of a passed "x86 just works and we demand that all architectures behave like x86!" attitude. The SCSI sense buffer has historically been a big culpret for that. For WT, FROM_DEVICE, invalidating after DMA is the right thing to do, because we want to ensure that the DMA'd data is properly readable upon completion of the DMA. If overlapping cache lines have been touched while DMA is progressing, and we invalidate before DMA, then the cache will contain stale data that will remain in the cache after DMA has completed. Invalidating a WT cache does not destroy any data, so is safe to do. So the safest approach is to invalidate after DMA has completed in this instance. For WB, FROM_DEVICE, we have the problem of dirty cache lines which we have to get rid of. For the overlapping cache lines, we have to clean those before DMA begins to ensure that data written to the non-DMA-buffer part is preserved. All other cache lines need to be invalidated before DMA begins to ensure that writebacks do not corrupt data from the device. Hence why it's different. And hence why the ARM implementation is based around buffer ownership. And hence why they're called dma_map_area()/dma_unmap_area() rather than the cache operations themselves. This is an intentional change, one that was done when ARMv6 came along. 
> OTOH, most machines that are actually in use today (armv6+,
> powerpc, later mips, microblaze, riscv, nios2) also have to
> deal with speculative accesses, so they end up having to
> invalidate or flush both before and after a DMA_FROM_DEVICE
> and DMA_BIDIRECTIONAL.
Again, these are implementation details of the cache, and this is
precisely why having the map/unmap interface is so much better than
having generic code explicitly call "clean" and "invalidate"
interfaces into arch code.
If we treat everything as a speculative cache, then we're doing
needless extra work for those caches that aren't speculative. So,
ARM would have to step through every cache line for every DMA
buffer at 32-byte intervals performing cache maintenance whether
the cache is speculative or not. That is expensive, and hurts
performance.
I put a lot of thought into this when I updated the ARM DMA
implementation when we started seeing these different cache types,
particularly when ARMv6 came along. I really don't want that work
wrecked.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 456+ messages in thread
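Some back-of-the-envelope arithmetic for the cost argument about stepping through every line at 32-byte intervals. This is a sketch that makes a simplifying assumption: each maintenance pass issues exactly one operation per cache line that the buffer touches. It treats a non-speculative cache as needing one pass and a speculative one as needing two (before and after the transfer). The helper names are illustrative.

```python
CACHE_LINE = 32   # the 32-byte interval from the message; real line sizes depend on the core


def lines_covered(start, size, line=CACHE_LINE):
    """Number of cache lines touched by the byte range [start, start + size)."""
    if size == 0:
        return 0
    return (start + size - 1) // line - start // line + 1


def maintenance_ops(start, size, passes):
    """Per-line operations when the whole range is walked `passes` times."""
    return passes * lines_covered(start, size)
```

For a 1500-byte buffer starting on a line boundary, `maintenance_ops(0, 1500, passes=1)` gives 47 operations and `passes=2` gives 94. A start address 16 bytes into a line touches 48 lines, so the doubling applies to every mapping, not just to unusual ones.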
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
2023-03-31 11:08 ` Russell King (Oracle)
` (3 preceding siblings ...)
(?)
@ 2023-03-31 12:32 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 12:32 UTC (permalink / raw)
To: Russell King
Cc: Arnd Bergmann, linux-kernel, Vineet Gupta, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S . Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa
On Fri, Mar 31, 2023, at 13:08, Russell King (Oracle) wrote:
> On Fri, Mar 31, 2023 at 12:38:45PM +0200, Arnd Bergmann wrote:
>> On Fri, Mar 31, 2023, at 11:35, Russell King (Oracle) wrote:
>> > On Fri, Mar 31, 2023 at 10:07:28AM +0100, Russell King (Oracle) wrote:
>> >> On Mon, Mar 27, 2023 at 02:13:11PM +0200, Arnd Bergmann wrote:
>> >> > From: Arnd Bergmann <arnd@arndb.de>
>> >> >
>> >> > Most ARM CPUs can have write-back caches and that require
>> >> > cache management to be done in the dma_sync_*_for_device()
>> >> > operation. This is typically done in both writeback and
>> >> > writethrough mode.
>> >> >
>> >> > The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S
>> >> > (arm920t, arm940t) implementations are the exception here,
>> >> > and only do the cache management after the DMA is complete,
>> >> > in the dma_sync_*_for_cpu() operation.
>> >> >
>> >> > Change this for consistency with the other platforms. This
>> >> > should have no user visible effect.
>> >>
>> >> NAK...
>> >>
>> >> The reason we do cache management _after_ is to ensure that there
>> >> is no stale data. The kernel _has_ (at the very least in the past)
>> >> performed DMA to data structures that are embedded within other
>> >> data structures, resulting in cache lines being shared. If one of
>> >> those cache lines is touched while DMA is progressing, then we
>> >> must do cache management _after_ the DMA operation has completed.
>> >> Doing it before is no good.
>>
>> What I'm trying to address here is the inconsistency between
>> implementations. If we decide that we always want to invalidate
>> after FROM_DEVICE, I can do that as part of the series, but then
>> I have to change most of the other arm implementations.
>
> Why?
>
> First thing to say is that DMA to buffers where the cache lines are
> shared with data the CPU may be accessing needs to be outlawed - they
> are a recipe for data corruption - always have been. Sadly, some folk
> don't see it that way because of a past "x86 just works and we demand
> that all architectures behave like x86!" attitude. The SCSI sense
> buffer has historically been a big culprit for that.
I think that part is pretty much agreed on by everyone; the difference
between architectures is to what extent they try to work around
drivers that get it wrong.
> For WT, FROM_DEVICE, invalidating after DMA is the right thing to do,
> because we want to ensure that the DMA'd data is properly readable upon
> completion of the DMA. If overlapping cache lines have been touched
> while DMA is progressing, and we invalidate before DMA, then the cache
> will contain stale data that will remain in the cache after DMA has
> completed. Invalidating a WT cache does not destroy any data, so is
> safe to do. So the safest approach is to invalidate after DMA has
> completed in this instance.
Does that mean you take back your NAK on this patch then?
> For WB, FROM_DEVICE, we have the problem of dirty cache lines which
> we have to get rid of. For the overlapping cache lines, we have to
> clean those before DMA begins to ensure that data written to the
> non-DMA-buffer part is preserved. All other cache lines need to be
> invalidated before DMA begins to ensure that writebacks do not
> corrupt data from the device. Hence why it's different.
I don't see how WB and WT caches being different implies that we
should give (broken) drivers extra guarantees on these WT caches
that WT caches on other architectures do not give. Always doing it
first in the absence of prefetching avoids a special case in the
generic implementation and makes the driver interface on
Arm/sparc32/xtensa WT caches no different from what everything
else provides.
The writeback before DMA_FROM_DEVICE is another issue that we have
to address at some point, as there are clearly incompatible
expectations here. It makes no sense that a device driver can rely
on the entire buffer to be written back on a 64-bit arm kernel but
not on a 32-bit kernel.
> And hence why the ARM implementation is based around buffer ownership.
> And hence why they're called dma_map_area()/dma_unmap_area() rather
> than the cache operations themselves. This is an intentional change,
> one that was done when ARMv6 came along.
The bit that has changed in the meantime, though, is that the buffer
ownership interfaces have moved up in the stack and are now handled
mostly in the common kernel/dma/*.c code that multiplexes between the
direct/iommu/swiotlb dma_map_ops, except for the bit about noncoherent
devices. Right now, we have 37 implementations that are mostly
identical, and all the differences are either bugs or disagreements
about the API guarantees, not architecture-specific requirements.
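A tiny illustration of why the writeback-before-DMA_FROM_DEVICE question is about incompatible driver expectations. This is a hypothetical scenario, not kernel code: the CPU has filled a 4-byte buffer through a write-back cache, maps it DMA_FROM_DEVICE, and the device then writes only the first two bytes. Whether the CPU's earlier data survives in the untouched part depends only on whether the implementation cleaned the buffer before invalidating it.

```python
def from_device_partial(clean_before):
    """Toy model: the device writes only 2 of the 4 buffer bytes."""
    memory = bytearray(4)              # memory still holds old zeros
    cached = bytearray(b"\x55" * 4)    # CPU's newer data, dirty in the cache
    if clean_before:
        memory[:] = cached             # write back before handing over the buffer
    # In both cases the cached copy is then invalidated, i.e. discarded unwritten.
    memory[0:2] = b"\xaa\xaa"          # short transfer from the device
    return bytes(memory)               # what the CPU reads after the DMA completes
```

With `clean_before=True` the result is `aa aa 55 55`; without the clean it is `aa aa 00 00`. A driver that silently depends on the first behaviour works on one kernel and sees stale bytes on another.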
>> OTOH, most machines that are actually in use today (armv6+,
>> powerpc, later mips, microblaze, riscv, nios2) also have to
>> deal with speculative accesses, so they end up having to
>> invalidate or flush both before and after a DMA_FROM_DEVICE
>> and DMA_BIDIRECTIONAL.
>
> Again, these are implementation details of the cache, and this is
> precisely why having the map/unmap interface is so much better than
> having generic code explicitly call "clean" and "invalidate"
> interfaces into arch code.
>
> If we treat everything as a speculative cache, then we're doing
> needless extra work for those caches that aren't speculative. So,
> ARM would have to step through every cache line for every DMA
> buffer at 32-byte intervals performing cache maintenance whether
> the cache is speculative or not. That is expensive, and hurts
> performance.
Does that mean that you agree with this patch 15 then after all?
If you think we don't need an invalidation after DMA_FROM_DEVICE
on non-speculating CPUs, it should be fine to make the WT case
consistent with the rest.
Arnd
^ permalink raw reply [flat|nested] 456+ messages in thread
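One way to write down the model argued for here is shown below. It loosely follows options 1 and 2 of the cover letter: non-speculating CPUs do all their maintenance before the DMA, and only speculating CPUs add an invalidation afterwards. This is a hypothetical Python sketch. It deliberately ignores partial-line cleaning and the DMA_FROM_DEVICE writeback question, and `before_dma()`/`after_dma()` are illustrative names, not the helpers used in the series.

```python
from enum import Enum


class Direction(Enum):
    TO_DEVICE = 1
    FROM_DEVICE = 2
    BIDIRECTIONAL = 3


def before_dma(direction):
    """Maintenance when the buffer is handed to the device."""
    if direction is Direction.TO_DEVICE:
        return ["clean"]                 # make CPU writes visible in memory
    if direction is Direction.FROM_DEVICE:
        return ["invalidate"]            # no later writeback can clobber device data
    return ["clean", "invalidate"]       # BIDIRECTIONAL needs both


def after_dma(direction, speculative):
    """Maintenance when the buffer is handed back to the CPU."""
    if direction is Direction.TO_DEVICE or not speculative:
        # Without speculation, only a (broken) driver touching the buffer
        # during the transfer could have refilled these lines.
        return []
    return ["invalidate"]                # drop lines prefetched during the transfer
```

In this sketch, patch 15 amounts to treating the cache-v4wt.S CPUs as non-speculative: `after_dma()` does nothing for them, and their FROM_DEVICE invalidation moves into `before_dma()`.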
* Re: [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA @ 2023-03-31 12:32 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-31 12:32 UTC (permalink / raw) To: Russell King Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor.Dooley, guoren, linux-csky@vger.kernel.org, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Neil Armstrong, Lad, Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Brian Cain, Arnd Bergmann, Michal Simek, Thomas Bogendoerfer, linux-parisc, linux-openrisc@vger.kernel.org, linuxppc-dev, linux-kernel, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas@groups.io, Robin Murphy, David S . Miller On Fri, Mar 31, 2023, at 13:08, Russell King (Oracle) wrote: > On Fri, Mar 31, 2023 at 12:38:45PM +0200, Arnd Bergmann wrote: >> On Fri, Mar 31, 2023, at 11:35, Russell King (Oracle) wrote: >> > On Fri, Mar 31, 2023 at 10:07:28AM +0100, Russell King (Oracle) wrote: >> >> On Mon, Mar 27, 2023 at 02:13:11PM +0200, Arnd Bergmann wrote: >> >> > From: Arnd Bergmann <arnd@arndb.de> >> >> > >> >> > Most ARM CPUs can have write-back caches that require >> >> > cache management to be done in the dma_sync_*_for_device() >> >> > operation. This is typically done in both writeback and >> >> > writethrough mode. >> >> > >> >> > The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S >> >> > (arm920t, arm940t) implementations are the exception here, >> >> > and only do the cache management after the DMA is complete, >> >> > in the dma_sync_*_for_cpu() operation. >> >> > >> >> > Change this for consistency with the other platforms. This >> >> > should have no user visible effect. >> >> >> >> NAK. >> >> >> >> The reason we do cache management _after_ is to ensure that there >> >> is no stale data.
The kernel _has_ (at the very least in the past) >> >> performed DMA to data structures that are embedded within other >> >> data structures, resulting in cache lines being shared. If one of >> >> those cache lines is touched while DMA is progressing, then we >> >> must do cache management _after_ the DMA operation has completed. >> >> Doing it before is no good. >> >> What I'm trying to address here is the inconsistency between >> implementations. If we decide that we always want to invalidate >> after FROM_DEVICE, I can do that as part of the series, but then >> I have to change most of the other arm implementations. > > Why? > > First thing to say is that DMA to buffers where the cache lines are > shared with data the CPU may be accessing need to be outlawed - they > are a recipe for data corruption - always have been. Sadly, some folk > don't see it that way because of a past "x86 just works and we demand > that all architectures behave like x86!" attitude. The SCSI sense > buffer has historically been a big culprit for that. I think that part is pretty much agreed on by everyone; the difference between architectures is to what extent they try to work around drivers that get it wrong. > For WT, FROM_DEVICE, invalidating after DMA is the right thing to do, > because we want to ensure that the DMA'd data is properly readable upon > completion of the DMA. If overlapping cache lines have been touched > while DMA is progressing, and we invalidate before DMA, then the cache > will contain stale data that will remain in the cache after DMA has > completed. Invalidating a WT cache does not destroy any data, so is > safe to do. So the safest approach is to invalidate after DMA has > completed in this instance.
> For WB, FROM_DEVICE, we have the problem of dirty cache lines which > we have to get rid of. For the overlapping cache lines, we have to > clean those before DMA begins to ensure that data written to the > non-DMA-buffer part is preserved. All other cache lines need to be > invalidated before DMA begins to ensure that writebacks do not > corrupt data from the device. Hence why it's different. I don't see how WB and WT caches being different implies that we should give (broken) drivers extra guarantees on WT caches that other architectures do not provide. Always invalidating first in the absence of prefetching avoids a special case in the generic implementation and makes the driver interface on Arm/sparc32/xtensa WT caches no different from what everything else provides. The writeback before DMA_FROM_DEVICE is another issue that we have to address at some point, as there are clearly incompatible expectations here. It makes no sense that a device driver can rely on the entire buffer being written back on a 64-bit arm kernel but not on a 32-bit kernel. > And hence why the ARM implementation is based around buffer ownership. > And hence why they're called dma_map_area()/dma_unmap_area() rather > than the cache operations themselves. This is an intentional change, > one that was done when ARMv6 came along. The bit that has changed in the meantime though is that the buffer ownership interfaces have moved up in the stack and are now handled mostly in the common kernel/dma/*.c that multiplexes between the direct/iommu/swiotlb dma_map_ops, except for the bit about noncoherent devices. Right now, we have 37 implementations that are mostly identical, and all the differences are either bugs or disagreements about the API guarantees, not architecture-specific requirements. >> OTOH, most machines that are actually in use today (armv6+, >> powerpc, later mips, microblaze, riscv, nios2) also have to >> deal with speculative accesses, so they end up having to >> invalidate or flush both before and after a DMA_FROM_DEVICE >> and DMA_BIDIRECTIONAL.
> > Again, these are implementation details of the cache, and this is > precisely why having the map/unmap interface is so much better than > having generic code explicitly call "clean" and "invalidate" > interfaces into arch code. > > If we treat everything as a speculative cache, then we're doing > needless extra work for those caches that aren't speculative. So, > ARM would have to step through every cache line for every DMA > buffer at 32-byte intervals performing cache maintenance whether > the cache is speculative or not. That is expensive, and hurts > performance. Does that mean that you agree with this patch 15 then after all? If you think we don't need an invalidation after DMA_FROM_DEVICE on non-speculating CPUs, it should be fine to make the WT case consistent with the rest. Arnd ^ permalink raw reply [flat|nested] 456+ messages in thread
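Russell's description of the write-back DMA_FROM_DEVICE case (clean the partially covered cache lines, invalidate the rest) can be modelled in plain C as a per-line plan. This is a user-space sketch for illustration only: addresses are treated as integers, nothing is actually flushed, and `LINE_SIZE`, `enum line_op` and `plan_inv_range()` are hypothetical. The 32-byte line size matches the per-line stepping mentioned in the thread, but a real kernel determines it per CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size (must be a power of two). */
#define LINE_SIZE 32u

enum line_op { LINE_INV, LINE_CLEAN_INV };

/* Hypothetical model of an "invalidate range" on a write-back cache:
 * lines only partly covered by [start, end) may hold unrelated CPU
 * data, so they are cleaned before being invalidated; fully covered
 * lines are simply discarded. Fills ops[] with one entry per line and
 * returns the number of lines, or 0 if max_ops is too small. */
static size_t plan_inv_range(uintptr_t start, uintptr_t end,
			     enum line_op *ops, size_t max_ops)
{
	assert(start <= end);

	uintptr_t mask = LINE_SIZE - 1;
	uintptr_t first = start & ~mask;        /* round start down */
	uintptr_t last = (end + mask) & ~mask;  /* round end up (exclusive) */
	size_t n = (last - first) / LINE_SIZE;

	if (n > max_ops)
		return 0;
	for (size_t i = 0; i < n; i++) {
		uintptr_t line = first + i * LINE_SIZE;
		/* Partial if the range begins after the line start or
		 * ends before the line end. */
		int partial = start > line || end < line + LINE_SIZE;
		ops[i] = partial ? LINE_CLEAN_INV : LINE_INV;
	}
	return n;
}
```

For a buffer covering 0x1004 to 0x1064, the first and last of the four lines are cleaned and invalidated, while the two middle lines are only invalidated. Those edge lines are where the shared-cache-line problem bites: a CPU write to them while the transfer is in flight can clobber device data or be lost.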
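The point everyone agrees on, that DMA buffers must not share cache lines with data the CPU may be accessing, is usually met by aligning and padding the buffer. The sketch below uses C11 `_Alignas` with an assumed 32-byte line. In-kernel code would use ARCH_DMA_MINALIGN or `____cacheline_aligned` instead, and the guarantee only holds if the containing object is itself allocated with at least that alignment. `struct my_cmd` and `owns_whole_lines()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size, for illustration only. */
#define DMA_ALIGN 32

/* Hypothetical command structure: the CPU updates 'status' and
 * 'next_field' while the device writes 'sense', so 'sense' is kept
 * on cache lines of its own on both sides. */
struct my_cmd {
	uint32_t status;                          /* CPU-owned */
	_Alignas(DMA_ALIGN) uint8_t sense[96];    /* device-owned during DMA */
	_Alignas(DMA_ALIGN) uint32_t next_field;  /* CPU-owned */
};

/* Returns nonzero if [off, off + len) covers whole cache lines only,
 * i.e. shares no line with neighbouring fields (assuming the enclosing
 * object starts on a line boundary). */
static int owns_whole_lines(size_t off, size_t len)
{
	return off % DMA_ALIGN == 0 && len % DMA_ALIGN == 0;
}

/* Compile-time checks that the layout keeps 'sense' isolated. */
static_assert(offsetof(struct my_cmd, sense) % DMA_ALIGN == 0,
	      "sense buffer must start on a cache line");
static_assert(offsetof(struct my_cmd, next_field) % DMA_ALIGN == 0,
	      "field after sense must start on a new cache line");
```

With this layout, no cache maintenance on `sense` can touch `status` or `next_field`, so the before/after ordering debated above no longer matters for correctness, only for performance.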
* [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range 2023-03-27 12:12 ` Arnd Bergmann ` (3 preceding siblings ...) (?) @ 2023-03-27 12:13 ` Arnd Bergmann -1 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping: remove dmac_clean_range and dmac_inv_range") in an effort to sanitize the dma-mapping API. Now this logic is getting moved into the generic dma-mapping implementation in order to give architectures less control over it, which requires reverting that earlier work.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/include/asm/cacheflush.h | 21 +++++++++++++++++++++ arch/arm/include/asm/glue-cache.h | 4 ++++ arch/arm/mm/cache-fa.S | 4 ++-- arch/arm/mm/cache-nop.S | 6 ++++++ arch/arm/mm/cache-v4.S | 5 +++++ arch/arm/mm/cache-v4wb.S | 4 ++-- arch/arm/mm/cache-v4wt.S | 14 +++++++++++++- arch/arm/mm/cache-v6.S | 4 ++-- arch/arm/mm/cache-v7.S | 6 ++++-- arch/arm/mm/cache-v7m.S | 4 ++-- arch/arm/mm/proc-arm1020.S | 4 ++-- arch/arm/mm/proc-arm1020e.S | 4 ++-- arch/arm/mm/proc-arm1022.S | 4 ++-- arch/arm/mm/proc-arm1026.S | 4 ++-- arch/arm/mm/proc-arm920.S | 4 ++-- arch/arm/mm/proc-arm922.S | 4 ++-- arch/arm/mm/proc-arm925.S | 4 ++-- arch/arm/mm/proc-arm926.S | 4 ++-- arch/arm/mm/proc-arm940.S | 4 ++-- arch/arm/mm/proc-arm946.S | 4 ++-- arch/arm/mm/proc-feroceon.S | 8 ++++---- arch/arm/mm/proc-macros.S | 2 ++ arch/arm/mm/proc-mohawk.S | 4 ++-- arch/arm/mm/proc-xsc3.S | 4 ++-- arch/arm/mm/proc-xscale.S | 6 ++++-- 25 files changed, 95 insertions(+), 41 deletions(-) diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index a094f964c869..04462bfe9130 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -91,6 +91,21 @@ * DMA Cache Coherency * =================== * + * dma_inv_range(start, end) + * + * Invalidate (discard) the specified virtual address range. + * May not write back any entries. If 'start' or 'end' + * are not cache line aligned, those lines must be written + * back. + * - start - virtual start address + * - end - virtual end address + * + * dma_clean_range(start, end) + * + * Clean (write back) the specified virtual address range. + * - start - virtual start address + * - end - virtual end address + * * dma_flush_range(start, end) * * Clean and invalidate the specified virtual address range. 
@@ -112,6 +127,8 @@ struct cpu_cache_fns { void (*dma_map_area)(const void *, size_t, int); void (*dma_unmap_area)(const void *, size_t, int); + void (*dma_clean_range)(const void *, const void *); + void (*dma_inv_range)(const void *, const void *); void (*dma_flush_range)(const void *, const void *); } __no_randomize_layout; @@ -137,6 +154,8 @@ extern struct cpu_cache_fns cpu_cache; * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +#define dmac_clean_range cpu_cache.dma_clean_range +#define dmac_inv_range cpu_cache.dma_inv_range #define dmac_flush_range cpu_cache.dma_flush_range #else @@ -156,6 +175,8 @@ extern void __cpuc_flush_dcache_area(void *, size_t); * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +extern void dmac_clean_range(const void *, const void *); +extern void dmac_inv_range(const void *, const void *); extern void dmac_flush_range(const void *, const void *); #endif diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h index 724f8dac1e5b..d8c93b483adf 100644 --- a/arch/arm/include/asm/glue-cache.h +++ b/arch/arm/include/asm/glue-cache.h @@ -139,6 +139,8 @@ static inline int nop_coherent_user_range(unsigned long a, unsigned long b) { return 0; } static inline void nop_flush_kern_dcache_area(void *a, size_t s) { } +static inline void nop_dma_clean_range(const void *a, const void *b) { } +static inline void nop_dma_inv_range(const void *a, const void *b) { } static inline void nop_dma_flush_range(const void *a, const void *b) { } static inline void nop_dma_map_area(const void *s, size_t l, int f) { } @@ -155,6 +157,8 @@ static inline void nop_dma_unmap_area(const void *s, size_t l, int f) { } #define __cpuc_coherent_user_range __glue(_CACHE,_coherent_user_range) #define __cpuc_flush_dcache_area __glue(_CACHE,_flush_kern_dcache_area) +#define dmac_clean_range __glue(_CACHE,_dma_clean_range) +#define dmac_inv_range __glue(_CACHE,_dma_inv_range) 
#define dmac_flush_range __glue(_CACHE,_dma_flush_range) #endif diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 3a464d1649b4..abc3d58948dd 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -166,7 +166,7 @@ ENTRY(fa_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -fa_dma_inv_range: +ENTRY(fa_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c14, 1 @ clean & invalidate D entry @@ -189,7 +189,7 @@ fa_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -fa_dma_clean_range: +ENTRY(fa_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-nop.S b/arch/arm/mm/cache-nop.S index 72d939ef8798..a058544d6c2b 100644 --- a/arch/arm/mm/cache-nop.S +++ b/arch/arm/mm/cache-nop.S @@ -32,6 +32,12 @@ ENDPROC(nop_coherent_user_range) .globl nop_flush_kern_dcache_area .equ nop_flush_kern_dcache_area, nop_flush_icache_all + .globl nop_dma_clean_range + .equ nop_dma_clean_range, nop_flush_icache_all + + .globl nop_dma_inv_range + .equ nop_dma_inv_range, nop_flush_icache_all + .globl nop_dma_flush_range .equ nop_dma_flush_range, nop_flush_icache_all diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index e2b104876340..b747e591109c 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -103,17 +103,22 @@ ENTRY(v4_flush_kern_dcache_area) /* * dma_flush_range(start, end) + * dma_inv_range(start, end) * * Clean and invalidate the specified virtual address range. + * As only write-through caches are supported here, this is the + * same as invalidate, while the clean operation does nothing. 
* * - start - virtual start address * - end - virtual end address */ +ENTRY(v4_dma_inv_range) ENTRY(v4_dma_flush_range) #ifdef CONFIG_CPU_CP15 mov r0, #0 mcr p15, 0, r0, c7, c7, 0 @ flush ID cache #endif +ENTRY(v4_dma_clean_range) ret lr /* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index 905ac2fa2b1e..55f609eae38d 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -183,7 +183,7 @@ ENTRY(v4wb_coherent_user_range) * - start - virtual start address * - end - virtual end address */ -v4wb_dma_inv_range: +ENTRY(v4wb_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -204,7 +204,7 @@ v4wb_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -v4wb_dma_clean_range: +ENTRY(v4wb_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 652218752f88..1a88627ec09b 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -152,7 +152,7 @@ ENTRY(v4wt_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -v4wt_dma_inv_range: +ENTRY(v4wt_dma_inv_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c6, 1 @ invalidate D entry add r0, r0, #CACHE_DLINESIZE @@ -171,6 +171,18 @@ v4wt_dma_inv_range: .globl v4wt_dma_flush_range .equ v4wt_dma_flush_range, v4wt_dma_inv_range +/* + * dma_clean_range(start, end) + * + * Clean the specified virtual address range. + * Empty implementation for writethrough caches. 
+ * + * - start - virtual start address + * - end - virtual end address + */ + .globl v4wt_dma_clean_range + .equ v4wt_dma_clean_range, v4wt_dma_unmap_area + /* * dma_map_area(start, size, dir) * - start - kernel virtual start address diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 250c83bf7158..abae7ff5defc 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -200,7 +200,7 @@ ENTRY(v6_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_inv_range: +ENTRY(v6_dma_inv_range) #ifdef CONFIG_DMA_CACHE_RWFO ldrb r2, [r0] @ read for ownership strb r2, [r0] @ write for ownership @@ -245,7 +245,7 @@ v6_dma_inv_range: * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_clean_range: +ENTRY(v6_dma_clean_range) bic r0, r0, #D_CACHE_LINE_SIZE - 1 1: #ifdef CONFIG_DMA_CACHE_RWFO diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index 127afe2096ba..b16a0d2a7cce 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -361,7 +361,7 @@ ENDPROC(v7_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_inv_range: +ENTRY(v7_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -391,7 +391,7 @@ ENDPROC(v7_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_clean_range: +ENTRY(v7_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 @@ -477,6 +477,8 @@ ENDPROC(v7_dma_unmap_area) globl_equ b15_dma_map_area, v7_dma_map_area globl_equ b15_dma_unmap_area, v7_dma_unmap_area + globl_equ b15_dma_clean_range, v7_dma_clean_range + globl_equ b15_dma_inv_range, v7_dma_inv_range globl_equ b15_dma_flush_range, v7_dma_flush_range define_cache_functions b15 diff --git a/arch/arm/mm/cache-v7m.S b/arch/arm/mm/cache-v7m.S index eb60b5e5e2ad..4fc6e0028e40 100644 --- a/arch/arm/mm/cache-v7m.S 
+++ b/arch/arm/mm/cache-v7m.S @@ -364,7 +364,7 @@ ENDPROC(v7m_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_inv_range: +ENTRY(v7m_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -390,7 +390,7 @@ ENDPROC(v7m_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_clean_range: +ENTRY(v7m_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S index 6837cf7a4812..0089e366f4e8 100644 --- a/arch/arm/mm/proc-arm1020.S +++ b/arch/arm/mm/proc-arm1020.S @@ -263,7 +263,7 @@ ENTRY(arm1020_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020_dma_inv_range: +ENTRY(arm1020_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -293,7 +293,7 @@ arm1020_dma_inv_range: * * (same as v4wb) */ -arm1020_dma_clean_range: +ENTRY(arm1020_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S index df49b10250b8..c662e55a76fa 100644 --- a/arch/arm/mm/proc-arm1020e.S +++ b/arch/arm/mm/proc-arm1020e.S @@ -256,7 +256,7 @@ ENTRY(arm1020e_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020e_dma_inv_range: +ENTRY(arm1020e_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1020e_dma_inv_range: * * (same as v4wb) */ -arm1020e_dma_clean_range: +ENTRY(arm1020e_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S index e89ce467f672..e77328906bc5 100644 --- a/arch/arm/mm/proc-arm1022.S +++ b/arch/arm/mm/proc-arm1022.S @@ -256,7 +256,7 @@ ENTRY(arm1022_flush_kern_dcache_area) * * (same as v4wb) */ -arm1022_dma_inv_range: +ENTRY(arm1022_dma_inv_range) mov 
ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1022_dma_inv_range: * * (same as v4wb) */ -arm1022_dma_clean_range: +ENTRY(arm1022_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S index 7fdd1a205e8e..a23f9fa28d07 100644 --- a/arch/arm/mm/proc-arm1026.S +++ b/arch/arm/mm/proc-arm1026.S @@ -250,7 +250,7 @@ ENTRY(arm1026_flush_kern_dcache_area) * * (same as v4wb) */ -arm1026_dma_inv_range: +ENTRY(arm1026_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -276,7 +276,7 @@ arm1026_dma_inv_range: * * (same as v4wb) */ -arm1026_dma_clean_range: +ENTRY(arm1026_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S index a234cd8ba5e6..4c918ab106f3 100644 --- a/arch/arm/mm/proc-arm920.S +++ b/arch/arm/mm/proc-arm920.S @@ -232,7 +232,7 @@ ENTRY(arm920_flush_kern_dcache_area) * * (same as v4wb) */ -arm920_dma_inv_range: +ENTRY(arm920_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -255,7 +255,7 @@ arm920_dma_inv_range: * * (same as v4wb) */ -arm920_dma_clean_range: +ENTRY(arm920_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S index 53c029dcfd83..6ac7bb7d94a4 100644 --- a/arch/arm/mm/proc-arm922.S +++ b/arch/arm/mm/proc-arm922.S @@ -234,7 +234,7 @@ ENTRY(arm922_flush_kern_dcache_area) * * (same as v4wb) */ -arm922_dma_inv_range: +ENTRY(arm922_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -257,7 +257,7 @@ arm922_dma_inv_range: * * (same as v4wb) */ -arm922_dma_clean_range: 
+ENTRY(arm922_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S index 0bfad62ea858..860f0074ff81 100644 --- a/arch/arm/mm/proc-arm925.S +++ b/arch/arm/mm/proc-arm925.S @@ -280,7 +280,7 @@ ENTRY(arm925_flush_kern_dcache_area) * * (same as v4wb) */ -arm925_dma_inv_range: +ENTRY(arm925_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -305,7 +305,7 @@ arm925_dma_inv_range: * * (same as v4wb) */ -arm925_dma_clean_range: +ENTRY(arm925_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S index 0487a2c3439b..519f62e023c5 100644 --- a/arch/arm/mm/proc-arm926.S +++ b/arch/arm/mm/proc-arm926.S @@ -243,7 +243,7 @@ ENTRY(arm926_flush_kern_dcache_area) * * (same as v4wb) */ -arm926_dma_inv_range: +ENTRY(arm926_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -268,7 +268,7 @@ arm926_dma_inv_range: * * (same as v4wb) */ -arm926_dma_clean_range: +ENTRY(arm926_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S index cf9bfcc825ca..14dda5c5ee4a 100644 --- a/arch/arm/mm/proc-arm940.S +++ b/arch/arm/mm/proc-arm940.S @@ -177,7 +177,7 @@ ENTRY(arm940_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -arm940_dma_inv_range: +ENTRY(arm940_dma_inv_range) mov ip, #0 mov r1, #(CACHE_DSEGMENTS - 1) << 4 @ 4 segments 1: orr r3, r1, #(CACHE_DENTRIES - 1) << 26 @ 64 entries @@ -198,7 +198,7 @@ arm940_dma_inv_range: * - start - virtual start address * - end - 
virtual end address */ -arm940_dma_clean_range: +ENTRY(arm940_dma_clean_range) ENTRY(cpu_arm940_dcache_clean_area) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S index 6fb3898ad1cd..91f62a7d334b 100644 --- a/arch/arm/mm/proc-arm946.S +++ b/arch/arm/mm/proc-arm946.S @@ -222,7 +222,7 @@ ENTRY(arm946_flush_kern_dcache_area) * - end - virtual end address * (same as arm926) */ -arm946_dma_inv_range: +ENTRY(arm946_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -247,7 +247,7 @@ arm946_dma_inv_range: * * (same as arm926) */ -arm946_dma_clean_range: +ENTRY(arm946_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S index 61ce82aca6f0..86122bad6d9b 100644 --- a/arch/arm/mm/proc-feroceon.S +++ b/arch/arm/mm/proc-feroceon.S @@ -271,7 +271,7 @@ ENTRY(feroceon_range_flush_kern_dcache_area) * (same as v4wb) */ .align 5 -feroceon_dma_inv_range: +ENTRY(feroceon_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -285,7 +285,7 @@ feroceon_dma_inv_range: ret lr .align 5 -feroceon_range_dma_inv_range: +ENTRY(feroceon_range_dma_inv_range) mrs r2, cpsr tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -311,7 +311,7 @@ feroceon_range_dma_inv_range: * (same as v4wb) */ .align 5 -feroceon_dma_clean_range: +ENTRY(feroceon_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE @@ -321,7 +321,7 @@ feroceon_dma_clean_range: ret lr .align 5 -feroceon_range_dma_clean_range: +ENTRY(feroceon_range_dma_clean_range) mrs r2, cpsr cmp r1, r0 subne r1, r1, #1 @ top address is inclusive diff --git a/arch/arm/mm/proc-macros.S 
b/arch/arm/mm/proc-macros.S index e43f6d716b4b..c1328955fd2a 100644 --- a/arch/arm/mm/proc-macros.S +++ b/arch/arm/mm/proc-macros.S @@ -334,6 +334,8 @@ ENTRY(\name\()_cache_fns) .long \name\()_flush_kern_dcache_area .long \name\()_dma_map_area .long \name\()_dma_unmap_area + .long \name\()_dma_clean_range + .long \name\()_dma_inv_range .long \name\()_dma_flush_range .size \name\()_cache_fns, . - \name\()_cache_fns .endm diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S index 1645ccaffe96..db3a2f00372a 100644 --- a/arch/arm/mm/proc-mohawk.S +++ b/arch/arm/mm/proc-mohawk.S @@ -216,7 +216,7 @@ ENTRY(mohawk_flush_kern_dcache_area) * * (same as v4wb) */ -mohawk_dma_inv_range: +ENTRY(mohawk_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry tst r1, #CACHE_DLINESIZE - 1 @@ -239,7 +239,7 @@ mohawk_dma_inv_range: * * (same as v4wb) */ -mohawk_dma_clean_range: +ENTRY(mohawk_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S index a17afe7e195a..6db611a945f3 100644 --- a/arch/arm/mm/proc-xsc3.S +++ b/arch/arm/mm/proc-xsc3.S @@ -263,7 +263,7 @@ ENTRY(xsc3_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -xsc3_dma_inv_range: +ENTRY(xsc3_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean L1 D line @@ -284,7 +284,7 @@ xsc3_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xsc3_dma_clean_range: +ENTRY(xsc3_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean L1 D line add r0, r0, #CACHELINESIZE diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S index d82590aa71c0..291dec830714 100644 --- a/arch/arm/mm/proc-xscale.S +++ b/arch/arm/mm/proc-xscale.S @@ -323,7 +323,7 @@ ENTRY(xscale_flush_kern_dcache_area) * - start - 
virtual start address * - end - virtual end address */ -xscale_dma_inv_range: +ENTRY(xscale_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -344,7 +344,7 @@ xscale_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xscale_dma_clean_range: +ENTRY(xscale_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHELINESIZE @@ -445,6 +445,8 @@ ENDPROC(xscale_dma_unmap_area) a0_alias coherent_kern_range a0_alias coherent_user_range a0_alias flush_kern_dcache_area + a0_alias dma_clean_range + a0_alias dma_inv_range a0_alias dma_flush_range a0_alias dma_unmap_area -- 2.39.2 ^ permalink raw reply related [flat|nested] 456+ messages in thread
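Many of the write-back `*_dma_inv_range` routines exported by this patch (v4wb and arm920, for example) follow one pattern. A cache line only partly covered by the buffer may also hold unrelated dirty data, so that line is cleaned first, and then every line in the range is invalidated. The C sketch below models that loop on plain addresses instead of issuing CP15 operations. It is an illustration only: the `model_*` functions, the 32-byte `LINE_SIZE` and the operation log are assumptions, not kernel interfaces. Write-through variants such as v4wt skip the edge cleaning entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed D-cache line size; plays the role of CACHE_DLINESIZE. */
#define LINE_SIZE ((uintptr_t)32)

enum cache_op_kind { OP_CLEAN, OP_INV };

struct cache_op {
    enum cache_op_kind kind;
    uintptr_t line;            /* line-aligned address the op targets */
};

/*
 * Model of a v4wb-style dma_inv_range(start, end): partial first and
 * last lines are written back, then all lines touching [start, end)
 * are invalidated. Returns the number of operations recorded.
 */
static size_t model_dma_inv_range(uintptr_t start, uintptr_t end,
                                  struct cache_op *ops, size_t max)
{
    const uintptr_t mask = LINE_SIZE - 1;
    size_t n = 0;

    assert(start <= end);
    if ((start & mask) && n < max)      /* unaligned start: clean its line */
        ops[n++] = (struct cache_op){ OP_CLEAN, start & ~mask };
    if ((end & mask) && n < max)        /* unaligned end: clean its line */
        ops[n++] = (struct cache_op){ OP_CLEAN, end & ~mask };
    for (uintptr_t a = start & ~mask; a < end && n < max; a += LINE_SIZE)
        ops[n++] = (struct cache_op){ OP_INV, a };
    return n;
}

/* Model of dma_clean_range(start, end): write back each touched line. */
static size_t model_dma_clean_range(uintptr_t start, uintptr_t end,
                                    struct cache_op *ops, size_t max)
{
    const uintptr_t mask = LINE_SIZE - 1;
    size_t n = 0;

    assert(start <= end);
    for (uintptr_t a = start & ~mask; a < end && n < max; a += LINE_SIZE)
        ops[n++] = (struct cache_op){ OP_CLEAN, a };
    return n;
}
```

For `start = 0x1004` and `end = 0x1044`, the inv model records two edge cleans (lines 0x1000 and 0x1040) followed by three invalidates; a fully aligned buffer produces invalidates only.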
* [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping:
remove dmac_clean_range and dmac_inv_range") in an effort to sanitize
the dma-mapping API. Now this logic is getting moved into the generic
dma-mapping implementation in order to give architectures less control
over it, which requires reverting that earlier work.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/cacheflush.h | 21 +++++++++++++++++++++
 arch/arm/include/asm/glue-cache.h |  4 ++++
 arch/arm/mm/cache-fa.S            |  4 ++--
 arch/arm/mm/cache-nop.S           |  6 ++++++
 arch/arm/mm/cache-v4.S            |  5 +++++
 arch/arm/mm/cache-v4wb.S          |  4 ++--
 arch/arm/mm/cache-v4wt.S          | 14 +++++++++++++-
 arch/arm/mm/cache-v6.S            |  4 ++--
 arch/arm/mm/cache-v7.S            |  6 ++++--
 arch/arm/mm/cache-v7m.S           |  4 ++--
 arch/arm/mm/proc-arm1020.S        |  4 ++--
 arch/arm/mm/proc-arm1020e.S       |  4 ++--
 arch/arm/mm/proc-arm1022.S        |  4 ++--
 arch/arm/mm/proc-arm1026.S        |  4 ++--
 arch/arm/mm/proc-arm920.S         |  4 ++--
 arch/arm/mm/proc-arm922.S         |  4 ++--
 arch/arm/mm/proc-arm925.S         |  4 ++--
 arch/arm/mm/proc-arm926.S         |  4 ++--
 arch/arm/mm/proc-arm940.S         |  4 ++--
 arch/arm/mm/proc-arm946.S         |  4 ++--
 arch/arm/mm/proc-feroceon.S       |  8 ++++----
 arch/arm/mm/proc-macros.S         |  2 ++
 arch/arm/mm/proc-mohawk.S         |  4 ++--
 arch/arm/mm/proc-xsc3.S           |  4 ++--
 arch/arm/mm/proc-xscale.S         |  6 ++++--
 25 files changed, 95 insertions(+), 41 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index a094f964c869..04462bfe9130 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -91,6 +91,21 @@
  * DMA Cache Coherency
  * ===================
  *
+ * dma_inv_range(start, end)
+ *
+ * Invalidate (discard) the specified virtual address range.
+ * May not write back any entries. If 'start' or 'end'
+ * are not cache line aligned, those lines must be written
+ * back.
+ * - start - virtual start address
+ * - end - virtual end address
+ *
+ * dma_clean_range(start, end)
+ *
+ * Clean (write back) the specified virtual address range.
+ * - start - virtual start address
+ * - end - virtual end address
+ *
  * dma_flush_range(start, end)
  *
  * Clean and invalidate the specified virtual address range.
@@ -112,6 +127,8 @@ struct cpu_cache_fns { void (*dma_map_area)(const void *, size_t, int); void (*dma_unmap_area)(const void *, size_t, int); + void (*dma_clean_range)(const void *, const void *); + void (*dma_inv_range)(const void *, const void *); void (*dma_flush_range)(const void *, const void *); } __no_randomize_layout; @@ -137,6 +154,8 @@ extern struct cpu_cache_fns cpu_cache; * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +#define dmac_clean_range cpu_cache.dma_clean_range +#define dmac_inv_range cpu_cache.dma_inv_range #define dmac_flush_range cpu_cache.dma_flush_range #else @@ -156,6 +175,8 @@ extern void __cpuc_flush_dcache_area(void *, size_t); * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +extern void dmac_clean_range(const void *, const void *); +extern void dmac_inv_range(const void *, const void *); extern void dmac_flush_range(const void *, const void *); #endif diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h index 724f8dac1e5b..d8c93b483adf 100644 --- a/arch/arm/include/asm/glue-cache.h +++ b/arch/arm/include/asm/glue-cache.h @@ -139,6 +139,8 @@ static inline int nop_coherent_user_range(unsigned long a, unsigned long b) { return 0; } static inline void nop_flush_kern_dcache_area(void *a, size_t s) { } +static inline void nop_dma_clean_range(const void *a, const void *b) { } +static inline void nop_dma_inv_range(const void *a, const void *b) { } static inline void nop_dma_flush_range(const void *a, const void *b) { } static inline void nop_dma_map_area(const void *s, size_t l, int f) { } @@ -155,6 +157,8 @@ static inline void nop_dma_unmap_area(const void *s, size_t l, int f) { } #define __cpuc_coherent_user_range __glue(_CACHE,_coherent_user_range) #define __cpuc_flush_dcache_area __glue(_CACHE,_flush_kern_dcache_area) +#define dmac_clean_range __glue(_CACHE,_dma_clean_range) +#define dmac_inv_range __glue(_CACHE,_dma_inv_range) 
#define dmac_flush_range __glue(_CACHE,_dma_flush_range) #endif diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 3a464d1649b4..abc3d58948dd 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -166,7 +166,7 @@ ENTRY(fa_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -fa_dma_inv_range: +ENTRY(fa_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c14, 1 @ clean & invalidate D entry @@ -189,7 +189,7 @@ fa_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -fa_dma_clean_range: +ENTRY(fa_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-nop.S b/arch/arm/mm/cache-nop.S index 72d939ef8798..a058544d6c2b 100644 --- a/arch/arm/mm/cache-nop.S +++ b/arch/arm/mm/cache-nop.S @@ -32,6 +32,12 @@ ENDPROC(nop_coherent_user_range) .globl nop_flush_kern_dcache_area .equ nop_flush_kern_dcache_area, nop_flush_icache_all + .globl nop_dma_clean_range + .equ nop_dma_clean_range, nop_flush_icache_all + + .globl nop_dma_inv_range + .equ nop_dma_inv_range, nop_flush_icache_all + .globl nop_dma_flush_range .equ nop_dma_flush_range, nop_flush_icache_all diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index e2b104876340..b747e591109c 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -103,17 +103,22 @@ ENTRY(v4_flush_kern_dcache_area) /* * dma_flush_range(start, end) + * dma_inv_range(start, end) * * Clean and invalidate the specified virtual address range. + * As only write-through caches are supported here, this is the + * same as invalidate, while the clean operation does nothing. 
* * - start - virtual start address * - end - virtual end address */ +ENTRY(v4_dma_inv_range) ENTRY(v4_dma_flush_range) #ifdef CONFIG_CPU_CP15 mov r0, #0 mcr p15, 0, r0, c7, c7, 0 @ flush ID cache #endif +ENTRY(v4_dma_clean_range) ret lr /* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index 905ac2fa2b1e..55f609eae38d 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -183,7 +183,7 @@ ENTRY(v4wb_coherent_user_range) * - start - virtual start address * - end - virtual end address */ -v4wb_dma_inv_range: +ENTRY(v4wb_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -204,7 +204,7 @@ v4wb_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -v4wb_dma_clean_range: +ENTRY(v4wb_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 652218752f88..1a88627ec09b 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -152,7 +152,7 @@ ENTRY(v4wt_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -v4wt_dma_inv_range: +ENTRY(v4wt_dma_inv_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c6, 1 @ invalidate D entry add r0, r0, #CACHE_DLINESIZE @@ -171,6 +171,18 @@ v4wt_dma_inv_range: .globl v4wt_dma_flush_range .equ v4wt_dma_flush_range, v4wt_dma_inv_range +/* + * dma_clean_range(start, end) + * + * Clean the specified virtual address range. + * Empty implementation for writethrough caches. 
+ * + * - start - virtual start address + * - end - virtual end address + */ + .globl v4wt_dma_clean_range + .equ v4wt_dma_clean_range, v4wt_dma_unmap_area + /* * dma_map_area(start, size, dir) * - start - kernel virtual start address diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 250c83bf7158..abae7ff5defc 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -200,7 +200,7 @@ ENTRY(v6_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_inv_range: +ENTRY(v6_dma_inv_range) #ifdef CONFIG_DMA_CACHE_RWFO ldrb r2, [r0] @ read for ownership strb r2, [r0] @ write for ownership @@ -245,7 +245,7 @@ v6_dma_inv_range: * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_clean_range: +ENTRY(v6_dma_clean_range) bic r0, r0, #D_CACHE_LINE_SIZE - 1 1: #ifdef CONFIG_DMA_CACHE_RWFO diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index 127afe2096ba..b16a0d2a7cce 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -361,7 +361,7 @@ ENDPROC(v7_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_inv_range: +ENTRY(v7_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -391,7 +391,7 @@ ENDPROC(v7_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_clean_range: +ENTRY(v7_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 @@ -477,6 +477,8 @@ ENDPROC(v7_dma_unmap_area) globl_equ b15_dma_map_area, v7_dma_map_area globl_equ b15_dma_unmap_area, v7_dma_unmap_area + globl_equ b15_dma_clean_range, v7_dma_clean_range + globl_equ b15_dma_inv_range, v7_dma_inv_range globl_equ b15_dma_flush_range, v7_dma_flush_range define_cache_functions b15 diff --git a/arch/arm/mm/cache-v7m.S b/arch/arm/mm/cache-v7m.S index eb60b5e5e2ad..4fc6e0028e40 100644 --- a/arch/arm/mm/cache-v7m.S 
+++ b/arch/arm/mm/cache-v7m.S @@ -364,7 +364,7 @@ ENDPROC(v7m_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_inv_range: +ENTRY(v7m_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -390,7 +390,7 @@ ENDPROC(v7m_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_clean_range: +ENTRY(v7m_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S index 6837cf7a4812..0089e366f4e8 100644 --- a/arch/arm/mm/proc-arm1020.S +++ b/arch/arm/mm/proc-arm1020.S @@ -263,7 +263,7 @@ ENTRY(arm1020_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020_dma_inv_range: +ENTRY(arm1020_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -293,7 +293,7 @@ arm1020_dma_inv_range: * * (same as v4wb) */ -arm1020_dma_clean_range: +ENTRY(arm1020_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S index df49b10250b8..c662e55a76fa 100644 --- a/arch/arm/mm/proc-arm1020e.S +++ b/arch/arm/mm/proc-arm1020e.S @@ -256,7 +256,7 @@ ENTRY(arm1020e_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020e_dma_inv_range: +ENTRY(arm1020e_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1020e_dma_inv_range: * * (same as v4wb) */ -arm1020e_dma_clean_range: +ENTRY(arm1020e_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S index e89ce467f672..e77328906bc5 100644 --- a/arch/arm/mm/proc-arm1022.S +++ b/arch/arm/mm/proc-arm1022.S @@ -256,7 +256,7 @@ ENTRY(arm1022_flush_kern_dcache_area) * * (same as v4wb) */ -arm1022_dma_inv_range: +ENTRY(arm1022_dma_inv_range) mov 
ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1022_dma_inv_range: * * (same as v4wb) */ -arm1022_dma_clean_range: +ENTRY(arm1022_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S index 7fdd1a205e8e..a23f9fa28d07 100644 --- a/arch/arm/mm/proc-arm1026.S +++ b/arch/arm/mm/proc-arm1026.S @@ -250,7 +250,7 @@ ENTRY(arm1026_flush_kern_dcache_area) * * (same as v4wb) */ -arm1026_dma_inv_range: +ENTRY(arm1026_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -276,7 +276,7 @@ arm1026_dma_inv_range: * * (same as v4wb) */ -arm1026_dma_clean_range: +ENTRY(arm1026_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S index a234cd8ba5e6..4c918ab106f3 100644 --- a/arch/arm/mm/proc-arm920.S +++ b/arch/arm/mm/proc-arm920.S @@ -232,7 +232,7 @@ ENTRY(arm920_flush_kern_dcache_area) * * (same as v4wb) */ -arm920_dma_inv_range: +ENTRY(arm920_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -255,7 +255,7 @@ arm920_dma_inv_range: * * (same as v4wb) */ -arm920_dma_clean_range: +ENTRY(arm920_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S index 53c029dcfd83..6ac7bb7d94a4 100644 --- a/arch/arm/mm/proc-arm922.S +++ b/arch/arm/mm/proc-arm922.S @@ -234,7 +234,7 @@ ENTRY(arm922_flush_kern_dcache_area) * * (same as v4wb) */ -arm922_dma_inv_range: +ENTRY(arm922_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -257,7 +257,7 @@ arm922_dma_inv_range: * * (same as v4wb) */ -arm922_dma_clean_range: 
+ENTRY(arm922_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S index 0bfad62ea858..860f0074ff81 100644 --- a/arch/arm/mm/proc-arm925.S +++ b/arch/arm/mm/proc-arm925.S @@ -280,7 +280,7 @@ ENTRY(arm925_flush_kern_dcache_area) * * (same as v4wb) */ -arm925_dma_inv_range: +ENTRY(arm925_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -305,7 +305,7 @@ arm925_dma_inv_range: * * (same as v4wb) */ -arm925_dma_clean_range: +ENTRY(arm925_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S index 0487a2c3439b..519f62e023c5 100644 --- a/arch/arm/mm/proc-arm926.S +++ b/arch/arm/mm/proc-arm926.S @@ -243,7 +243,7 @@ ENTRY(arm926_flush_kern_dcache_area) * * (same as v4wb) */ -arm926_dma_inv_range: +ENTRY(arm926_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -268,7 +268,7 @@ arm926_dma_inv_range: * * (same as v4wb) */ -arm926_dma_clean_range: +ENTRY(arm926_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S index cf9bfcc825ca..14dda5c5ee4a 100644 --- a/arch/arm/mm/proc-arm940.S +++ b/arch/arm/mm/proc-arm940.S @@ -177,7 +177,7 @@ ENTRY(arm940_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -arm940_dma_inv_range: +ENTRY(arm940_dma_inv_range) mov ip, #0 mov r1, #(CACHE_DSEGMENTS - 1) << 4 @ 4 segments 1: orr r3, r1, #(CACHE_DENTRIES - 1) << 26 @ 64 entries @@ -198,7 +198,7 @@ arm940_dma_inv_range: * - start - virtual start address * - end - 
virtual end address */ -arm940_dma_clean_range: +ENTRY(arm940_dma_clean_range) ENTRY(cpu_arm940_dcache_clean_area) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S index 6fb3898ad1cd..91f62a7d334b 100644 --- a/arch/arm/mm/proc-arm946.S +++ b/arch/arm/mm/proc-arm946.S @@ -222,7 +222,7 @@ ENTRY(arm946_flush_kern_dcache_area) * - end - virtual end address * (same as arm926) */ -arm946_dma_inv_range: +ENTRY(arm946_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -247,7 +247,7 @@ arm946_dma_inv_range: * * (same as arm926) */ -arm946_dma_clean_range: +ENTRY(arm946_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S index 61ce82aca6f0..86122bad6d9b 100644 --- a/arch/arm/mm/proc-feroceon.S +++ b/arch/arm/mm/proc-feroceon.S @@ -271,7 +271,7 @@ ENTRY(feroceon_range_flush_kern_dcache_area) * (same as v4wb) */ .align 5 -feroceon_dma_inv_range: +ENTRY(feroceon_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -285,7 +285,7 @@ feroceon_dma_inv_range: ret lr .align 5 -feroceon_range_dma_inv_range: +ENTRY(feroceon_range_dma_inv_range) mrs r2, cpsr tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -311,7 +311,7 @@ feroceon_range_dma_inv_range: * (same as v4wb) */ .align 5 -feroceon_dma_clean_range: +ENTRY(feroceon_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE @@ -321,7 +321,7 @@ feroceon_dma_clean_range: ret lr .align 5 -feroceon_range_dma_clean_range: +ENTRY(feroceon_range_dma_clean_range) mrs r2, cpsr cmp r1, r0 subne r1, r1, #1 @ top address is inclusive diff --git a/arch/arm/mm/proc-macros.S 
b/arch/arm/mm/proc-macros.S index e43f6d716b4b..c1328955fd2a 100644 --- a/arch/arm/mm/proc-macros.S +++ b/arch/arm/mm/proc-macros.S @@ -334,6 +334,8 @@ ENTRY(\name\()_cache_fns) .long \name\()_flush_kern_dcache_area .long \name\()_dma_map_area .long \name\()_dma_unmap_area + .long \name\()_dma_clean_range + .long \name\()_dma_inv_range .long \name\()_dma_flush_range .size \name\()_cache_fns, . - \name\()_cache_fns .endm diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S index 1645ccaffe96..db3a2f00372a 100644 --- a/arch/arm/mm/proc-mohawk.S +++ b/arch/arm/mm/proc-mohawk.S @@ -216,7 +216,7 @@ ENTRY(mohawk_flush_kern_dcache_area) * * (same as v4wb) */ -mohawk_dma_inv_range: +ENTRY(mohawk_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry tst r1, #CACHE_DLINESIZE - 1 @@ -239,7 +239,7 @@ mohawk_dma_inv_range: * * (same as v4wb) */ -mohawk_dma_clean_range: +ENTRY(mohawk_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S index a17afe7e195a..6db611a945f3 100644 --- a/arch/arm/mm/proc-xsc3.S +++ b/arch/arm/mm/proc-xsc3.S @@ -263,7 +263,7 @@ ENTRY(xsc3_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -xsc3_dma_inv_range: +ENTRY(xsc3_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean L1 D line @@ -284,7 +284,7 @@ xsc3_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xsc3_dma_clean_range: +ENTRY(xsc3_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean L1 D line add r0, r0, #CACHELINESIZE diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S index d82590aa71c0..291dec830714 100644 --- a/arch/arm/mm/proc-xscale.S +++ b/arch/arm/mm/proc-xscale.S @@ -323,7 +323,7 @@ ENTRY(xscale_flush_kern_dcache_area) * - start - 
virtual start address * - end - virtual end address */ -xscale_dma_inv_range: +ENTRY(xscale_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -344,7 +344,7 @@ xscale_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xscale_dma_clean_range: +ENTRY(xscale_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHELINESIZE @@ -445,6 +445,8 @@ ENDPROC(xscale_dma_unmap_area) a0_alias coherent_kern_range a0_alias coherent_user_range a0_alias flush_kern_dcache_area + a0_alias dma_clean_range + a0_alias dma_inv_range a0_alias dma_flush_range a0_alias dma_unmap_area -- 2.39.2
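The two `.long` lines added to `proc-macros.S` must sit between `dma_unmap_area` and `dma_flush_range`, matching the order of the new `dma_clean_range` and `dma_inv_range` members in `struct cpu_cache_fns`, because on `MULTI_CACHE` kernels the `dmac_*` names expand to calls through `cpu_cache`. The C sketch below is a hypothetical, cut-down model of that dispatch, not kernel code: `cache_fns_model`, the counters and `select_cache_fns()` are invented for illustration. It also shows, under the same assumptions, why a write-through table can use an empty clean hook and reuse invalidate for flush, as the `v4wt` aliases in this patch do.

```c
#include <assert.h>

/* Hypothetical stand-in for struct cpu_cache_fns: only the three DMA
 * range hooks are modelled, in the same relative order as the patch. */
struct cache_fns_model {
	void (*dma_clean_range)(const void *, const void *);
	void (*dma_inv_range)(const void *, const void *);
	void (*dma_flush_range)(const void *, const void *);
};

/* Counters standing in for real cache maintenance instructions. */
static int cleans, invalidates;

static void do_clean(const void *s, const void *e) { (void)s; (void)e; cleans++; }
static void do_inv(const void *s, const void *e)   { (void)s; (void)e; invalidates++; }
static void do_flush(const void *s, const void *e) { do_clean(s, e); do_inv(s, e); }
static void do_nothing(const void *s, const void *e) { (void)s; (void)e; }

/* Write-back cache: clean, invalidate and flush all do real work. */
static const struct cache_fns_model wb_fns = {
	.dma_clean_range = do_clean,
	.dma_inv_range   = do_inv,
	.dma_flush_range = do_flush,
};

/* Write-through cache: no line is ever dirty, so clean is empty and
 * flush is the same as invalidate. */
static const struct cache_fns_model wt_fns = {
	.dma_clean_range = do_nothing,
	.dma_inv_range   = do_inv,
	.dma_flush_range = do_inv,
};

static struct cache_fns_model cpu_cache;

/* Same shape as the MULTI_CACHE macros added to cacheflush.h. */
#define dmac_clean_range cpu_cache.dma_clean_range
#define dmac_inv_range   cpu_cache.dma_inv_range
#define dmac_flush_range cpu_cache.dma_flush_range

/* Hypothetical boot-time selection of the table for the detected CPU. */
static void select_cache_fns(int write_through)
{
	cpu_cache = write_through ? wt_fns : wb_fns;
	/* Every slot must be filled. In the assembly tables a missing
	 * .long would instead shift all later entries into the wrong slot. */
	assert(cpu_cache.dma_clean_range && cpu_cache.dma_inv_range &&
	       cpu_cache.dma_flush_range);
	cleans = invalidates = 0;
}
```

In this model, after `select_cache_fns(1)` a call to `dmac_clean_range()` leaves both counters at zero and `dmac_flush_range()` only increments `invalidates`, while the write-back table performs a clean followed by an invalidate for a flush. Designated initializers make the C tables order-independent; the assembly tables have no such protection, which is why the ordering in `proc-macros.S` matters.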
* [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range
@ 2023-03-27 12:13 ` Arnd Bergmann
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij,
John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren,
linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig,
Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta,
linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain,
Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne,
linux-arm-kernel, Neil Armstrong, Michal Simek, Thomas Bogendoerfer,
linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen,
Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy,
David S. Miller
From: Arnd Bergmann <arnd@arndb.de>
These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping:
remove dmac_clean_range and dmac_inv_range") in an effort to sanitize
the dma-mapping API. Now this logic is getting moved into the generic
dma-mapping implementation in order to give architectures less control
over it, which requires reverting that earlier work.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/include/asm/cacheflush.h | 21 +++++++++++++++++++++ arch/arm/include/asm/glue-cache.h | 4 ++++ arch/arm/mm/cache-fa.S | 4 ++-- arch/arm/mm/cache-nop.S | 6 ++++++ arch/arm/mm/cache-v4.S | 5 +++++ arch/arm/mm/cache-v4wb.S | 4 ++-- arch/arm/mm/cache-v4wt.S | 14 +++++++++++++- arch/arm/mm/cache-v6.S | 4 ++-- arch/arm/mm/cache-v7.S | 6 ++++-- arch/arm/mm/cache-v7m.S | 4 ++-- arch/arm/mm/proc-arm1020.S | 4 ++-- arch/arm/mm/proc-arm1020e.S | 4 ++-- arch/arm/mm/proc-arm1022.S | 4 ++-- arch/arm/mm/proc-arm1026.S | 4 ++-- arch/arm/mm/proc-arm920.S | 4 ++-- arch/arm/mm/proc-arm922.S | 4 ++-- arch/arm/mm/proc-arm925.S | 4 ++-- arch/arm/mm/proc-arm926.S | 4 ++-- arch/arm/mm/proc-arm940.S | 4 ++-- arch/arm/mm/proc-arm946.S | 4 ++-- arch/arm/mm/proc-feroceon.S | 8 ++++---- arch/arm/mm/proc-macros.S | 2 ++ arch/arm/mm/proc-mohawk.S | 4 ++-- arch/arm/mm/proc-xsc3.S | 4 ++-- arch/arm/mm/proc-xscale.S | 6 ++++-- 25 files changed, 95 insertions(+), 41 deletions(-) diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index a094f964c869..04462bfe9130 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -91,6 +91,21 @@ * DMA Cache Coherency * =================== * + * dma_inv_range(start, end) + * + * Invalidate (discard) the specified virtual address range. + * May not write back any entries. If 'start' or 'end' + * are not cache line aligned, those lines must be written + * back. + * - start - virtual start address + * - end - virtual end address + * + * dma_clean_range(start, end) + * + * Clean (write back) the specified virtual address range. + * - start - virtual start address + * - end - virtual end address + * * dma_flush_range(start, end) * * Clean and invalidate the specified virtual address range. 
@@ -112,6 +127,8 @@ struct cpu_cache_fns { void (*dma_map_area)(const void *, size_t, int); void (*dma_unmap_area)(const void *, size_t, int); + void (*dma_clean_range)(const void *, const void *); + void (*dma_inv_range)(const void *, const void *); void (*dma_flush_range)(const void *, const void *); } __no_randomize_layout; @@ -137,6 +154,8 @@ extern struct cpu_cache_fns cpu_cache; * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +#define dmac_clean_range cpu_cache.dma_clean_range +#define dmac_inv_range cpu_cache.dma_inv_range #define dmac_flush_range cpu_cache.dma_flush_range #else @@ -156,6 +175,8 @@ extern void __cpuc_flush_dcache_area(void *, size_t); * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +extern void dmac_clean_range(const void *, const void *); +extern void dmac_inv_range(const void *, const void *); extern void dmac_flush_range(const void *, const void *); #endif diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h index 724f8dac1e5b..d8c93b483adf 100644 --- a/arch/arm/include/asm/glue-cache.h +++ b/arch/arm/include/asm/glue-cache.h @@ -139,6 +139,8 @@ static inline int nop_coherent_user_range(unsigned long a, unsigned long b) { return 0; } static inline void nop_flush_kern_dcache_area(void *a, size_t s) { } +static inline void nop_dma_clean_range(const void *a, const void *b) { } +static inline void nop_dma_inv_range(const void *a, const void *b) { } static inline void nop_dma_flush_range(const void *a, const void *b) { } static inline void nop_dma_map_area(const void *s, size_t l, int f) { } @@ -155,6 +157,8 @@ static inline void nop_dma_unmap_area(const void *s, size_t l, int f) { } #define __cpuc_coherent_user_range __glue(_CACHE,_coherent_user_range) #define __cpuc_flush_dcache_area __glue(_CACHE,_flush_kern_dcache_area) +#define dmac_clean_range __glue(_CACHE,_dma_clean_range) +#define dmac_inv_range __glue(_CACHE,_dma_inv_range) 
#define dmac_flush_range __glue(_CACHE,_dma_flush_range) #endif diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 3a464d1649b4..abc3d58948dd 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -166,7 +166,7 @@ ENTRY(fa_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -fa_dma_inv_range: +ENTRY(fa_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c14, 1 @ clean & invalidate D entry @@ -189,7 +189,7 @@ fa_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -fa_dma_clean_range: +ENTRY(fa_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-nop.S b/arch/arm/mm/cache-nop.S index 72d939ef8798..a058544d6c2b 100644 --- a/arch/arm/mm/cache-nop.S +++ b/arch/arm/mm/cache-nop.S @@ -32,6 +32,12 @@ ENDPROC(nop_coherent_user_range) .globl nop_flush_kern_dcache_area .equ nop_flush_kern_dcache_area, nop_flush_icache_all + .globl nop_dma_clean_range + .equ nop_dma_clean_range, nop_flush_icache_all + + .globl nop_dma_inv_range + .equ nop_dma_inv_range, nop_flush_icache_all + .globl nop_dma_flush_range .equ nop_dma_flush_range, nop_flush_icache_all diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index e2b104876340..b747e591109c 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -103,17 +103,22 @@ ENTRY(v4_flush_kern_dcache_area) /* * dma_flush_range(start, end) + * dma_inv_range(start, end) * * Clean and invalidate the specified virtual address range. + * As only write-through caches are supported here, this is the + * same as invalidate, while the clean operation does nothing. 
* * - start - virtual start address * - end - virtual end address */ +ENTRY(v4_dma_inv_range) ENTRY(v4_dma_flush_range) #ifdef CONFIG_CPU_CP15 mov r0, #0 mcr p15, 0, r0, c7, c7, 0 @ flush ID cache #endif +ENTRY(v4_dma_clean_range) ret lr /* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index 905ac2fa2b1e..55f609eae38d 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -183,7 +183,7 @@ ENTRY(v4wb_coherent_user_range) * - start - virtual start address * - end - virtual end address */ -v4wb_dma_inv_range: +ENTRY(v4wb_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -204,7 +204,7 @@ v4wb_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -v4wb_dma_clean_range: +ENTRY(v4wb_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 652218752f88..1a88627ec09b 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -152,7 +152,7 @@ ENTRY(v4wt_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -v4wt_dma_inv_range: +ENTRY(v4wt_dma_inv_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c6, 1 @ invalidate D entry add r0, r0, #CACHE_DLINESIZE @@ -171,6 +171,18 @@ v4wt_dma_inv_range: .globl v4wt_dma_flush_range .equ v4wt_dma_flush_range, v4wt_dma_inv_range +/* + * dma_clean_range(start, end) + * + * Clean the specified virtual address range. + * Empty implementation for writethrough caches. 
+ * + * - start - virtual start address + * - end - virtual end address + */ + .globl v4wt_dma_clean_range + .equ v4wt_dma_clean_range, v4wt_dma_unmap_area + /* * dma_map_area(start, size, dir) * - start - kernel virtual start address diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 250c83bf7158..abae7ff5defc 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -200,7 +200,7 @@ ENTRY(v6_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_inv_range: +ENTRY(v6_dma_inv_range) #ifdef CONFIG_DMA_CACHE_RWFO ldrb r2, [r0] @ read for ownership strb r2, [r0] @ write for ownership @@ -245,7 +245,7 @@ v6_dma_inv_range: * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_clean_range: +ENTRY(v6_dma_clean_range) bic r0, r0, #D_CACHE_LINE_SIZE - 1 1: #ifdef CONFIG_DMA_CACHE_RWFO diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index 127afe2096ba..b16a0d2a7cce 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -361,7 +361,7 @@ ENDPROC(v7_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_inv_range: +ENTRY(v7_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -391,7 +391,7 @@ ENDPROC(v7_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_clean_range: +ENTRY(v7_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 @@ -477,6 +477,8 @@ ENDPROC(v7_dma_unmap_area) globl_equ b15_dma_map_area, v7_dma_map_area globl_equ b15_dma_unmap_area, v7_dma_unmap_area + globl_equ b15_dma_clean_range, v7_dma_clean_range + globl_equ b15_dma_inv_range, v7_dma_inv_range globl_equ b15_dma_flush_range, v7_dma_flush_range define_cache_functions b15 diff --git a/arch/arm/mm/cache-v7m.S b/arch/arm/mm/cache-v7m.S index eb60b5e5e2ad..4fc6e0028e40 100644 --- a/arch/arm/mm/cache-v7m.S 
+++ b/arch/arm/mm/cache-v7m.S @@ -364,7 +364,7 @@ ENDPROC(v7m_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_inv_range: +ENTRY(v7m_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -390,7 +390,7 @@ ENDPROC(v7m_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_clean_range: +ENTRY(v7m_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S index 6837cf7a4812..0089e366f4e8 100644 --- a/arch/arm/mm/proc-arm1020.S +++ b/arch/arm/mm/proc-arm1020.S @@ -263,7 +263,7 @@ ENTRY(arm1020_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020_dma_inv_range: +ENTRY(arm1020_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -293,7 +293,7 @@ arm1020_dma_inv_range: * * (same as v4wb) */ -arm1020_dma_clean_range: +ENTRY(arm1020_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S index df49b10250b8..c662e55a76fa 100644 --- a/arch/arm/mm/proc-arm1020e.S +++ b/arch/arm/mm/proc-arm1020e.S @@ -256,7 +256,7 @@ ENTRY(arm1020e_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020e_dma_inv_range: +ENTRY(arm1020e_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1020e_dma_inv_range: * * (same as v4wb) */ -arm1020e_dma_clean_range: +ENTRY(arm1020e_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S index e89ce467f672..e77328906bc5 100644 --- a/arch/arm/mm/proc-arm1022.S +++ b/arch/arm/mm/proc-arm1022.S @@ -256,7 +256,7 @@ ENTRY(arm1022_flush_kern_dcache_area) * * (same as v4wb) */ -arm1022_dma_inv_range: +ENTRY(arm1022_dma_inv_range) mov 
ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1022_dma_inv_range: * * (same as v4wb) */ -arm1022_dma_clean_range: +ENTRY(arm1022_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S index 7fdd1a205e8e..a23f9fa28d07 100644 --- a/arch/arm/mm/proc-arm1026.S +++ b/arch/arm/mm/proc-arm1026.S @@ -250,7 +250,7 @@ ENTRY(arm1026_flush_kern_dcache_area) * * (same as v4wb) */ -arm1026_dma_inv_range: +ENTRY(arm1026_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -276,7 +276,7 @@ arm1026_dma_inv_range: * * (same as v4wb) */ -arm1026_dma_clean_range: +ENTRY(arm1026_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S index a234cd8ba5e6..4c918ab106f3 100644 --- a/arch/arm/mm/proc-arm920.S +++ b/arch/arm/mm/proc-arm920.S @@ -232,7 +232,7 @@ ENTRY(arm920_flush_kern_dcache_area) * * (same as v4wb) */ -arm920_dma_inv_range: +ENTRY(arm920_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -255,7 +255,7 @@ arm920_dma_inv_range: * * (same as v4wb) */ -arm920_dma_clean_range: +ENTRY(arm920_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S index 53c029dcfd83..6ac7bb7d94a4 100644 --- a/arch/arm/mm/proc-arm922.S +++ b/arch/arm/mm/proc-arm922.S @@ -234,7 +234,7 @@ ENTRY(arm922_flush_kern_dcache_area) * * (same as v4wb) */ -arm922_dma_inv_range: +ENTRY(arm922_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -257,7 +257,7 @@ arm922_dma_inv_range: * * (same as v4wb) */ -arm922_dma_clean_range: 
+ENTRY(arm922_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S index 0bfad62ea858..860f0074ff81 100644 --- a/arch/arm/mm/proc-arm925.S +++ b/arch/arm/mm/proc-arm925.S @@ -280,7 +280,7 @@ ENTRY(arm925_flush_kern_dcache_area) * * (same as v4wb) */ -arm925_dma_inv_range: +ENTRY(arm925_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -305,7 +305,7 @@ arm925_dma_inv_range: * * (same as v4wb) */ -arm925_dma_clean_range: +ENTRY(arm925_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S index 0487a2c3439b..519f62e023c5 100644 --- a/arch/arm/mm/proc-arm926.S +++ b/arch/arm/mm/proc-arm926.S @@ -243,7 +243,7 @@ ENTRY(arm926_flush_kern_dcache_area) * * (same as v4wb) */ -arm926_dma_inv_range: +ENTRY(arm926_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -268,7 +268,7 @@ arm926_dma_inv_range: * * (same as v4wb) */ -arm926_dma_clean_range: +ENTRY(arm926_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S index cf9bfcc825ca..14dda5c5ee4a 100644 --- a/arch/arm/mm/proc-arm940.S +++ b/arch/arm/mm/proc-arm940.S @@ -177,7 +177,7 @@ ENTRY(arm940_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -arm940_dma_inv_range: +ENTRY(arm940_dma_inv_range) mov ip, #0 mov r1, #(CACHE_DSEGMENTS - 1) << 4 @ 4 segments 1: orr r3, r1, #(CACHE_DENTRIES - 1) << 26 @ 64 entries @@ -198,7 +198,7 @@ arm940_dma_inv_range: * - start - virtual start address * - end - 
virtual end address */ -arm940_dma_clean_range: +ENTRY(arm940_dma_clean_range) ENTRY(cpu_arm940_dcache_clean_area) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S index 6fb3898ad1cd..91f62a7d334b 100644 --- a/arch/arm/mm/proc-arm946.S +++ b/arch/arm/mm/proc-arm946.S @@ -222,7 +222,7 @@ ENTRY(arm946_flush_kern_dcache_area) * - end - virtual end address * (same as arm926) */ -arm946_dma_inv_range: +ENTRY(arm946_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -247,7 +247,7 @@ arm946_dma_inv_range: * * (same as arm926) */ -arm946_dma_clean_range: +ENTRY(arm946_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S index 61ce82aca6f0..86122bad6d9b 100644 --- a/arch/arm/mm/proc-feroceon.S +++ b/arch/arm/mm/proc-feroceon.S @@ -271,7 +271,7 @@ ENTRY(feroceon_range_flush_kern_dcache_area) * (same as v4wb) */ .align 5 -feroceon_dma_inv_range: +ENTRY(feroceon_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -285,7 +285,7 @@ feroceon_dma_inv_range: ret lr .align 5 -feroceon_range_dma_inv_range: +ENTRY(feroceon_range_dma_inv_range) mrs r2, cpsr tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -311,7 +311,7 @@ feroceon_range_dma_inv_range: * (same as v4wb) */ .align 5 -feroceon_dma_clean_range: +ENTRY(feroceon_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE @@ -321,7 +321,7 @@ feroceon_dma_clean_range: ret lr .align 5 -feroceon_range_dma_clean_range: +ENTRY(feroceon_range_dma_clean_range) mrs r2, cpsr cmp r1, r0 subne r1, r1, #1 @ top address is inclusive diff --git a/arch/arm/mm/proc-macros.S 
b/arch/arm/mm/proc-macros.S index e43f6d716b4b..c1328955fd2a 100644 --- a/arch/arm/mm/proc-macros.S +++ b/arch/arm/mm/proc-macros.S @@ -334,6 +334,8 @@ ENTRY(\name\()_cache_fns) .long \name\()_flush_kern_dcache_area .long \name\()_dma_map_area .long \name\()_dma_unmap_area + .long \name\()_dma_clean_range + .long \name\()_dma_inv_range .long \name\()_dma_flush_range .size \name\()_cache_fns, . - \name\()_cache_fns .endm diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S index 1645ccaffe96..db3a2f00372a 100644 --- a/arch/arm/mm/proc-mohawk.S +++ b/arch/arm/mm/proc-mohawk.S @@ -216,7 +216,7 @@ ENTRY(mohawk_flush_kern_dcache_area) * * (same as v4wb) */ -mohawk_dma_inv_range: +ENTRY(mohawk_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry tst r1, #CACHE_DLINESIZE - 1 @@ -239,7 +239,7 @@ mohawk_dma_inv_range: * * (same as v4wb) */ -mohawk_dma_clean_range: +ENTRY(mohawk_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S index a17afe7e195a..6db611a945f3 100644 --- a/arch/arm/mm/proc-xsc3.S +++ b/arch/arm/mm/proc-xsc3.S @@ -263,7 +263,7 @@ ENTRY(xsc3_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -xsc3_dma_inv_range: +ENTRY(xsc3_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean L1 D line @@ -284,7 +284,7 @@ xsc3_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xsc3_dma_clean_range: +ENTRY(xsc3_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean L1 D line add r0, r0, #CACHELINESIZE diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S index d82590aa71c0..291dec830714 100644 --- a/arch/arm/mm/proc-xscale.S +++ b/arch/arm/mm/proc-xscale.S @@ -323,7 +323,7 @@ ENTRY(xscale_flush_kern_dcache_area) * - start - 
virtual start address * - end - virtual end address */ -xscale_dma_inv_range: +ENTRY(xscale_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -344,7 +344,7 @@ xscale_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xscale_dma_clean_range: +ENTRY(xscale_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHELINESIZE @@ -445,6 +445,8 @@ ENDPROC(xscale_dma_unmap_area) a0_alias coherent_kern_range a0_alias coherent_user_range a0_alias flush_kern_dcache_area + a0_alias dma_clean_range + a0_alias dma_inv_range a0_alias dma_flush_range a0_alias dma_unmap_area -- 2.39.2
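The comment this patch adds to `cacheflush.h` separates `dma_inv_range()`, which discards cache lines but must first write back any partially covered line at an unaligned `start` or `end`, from `dma_clean_range()`, which only writes dirty lines back to memory. The toy model below is a hypothetical illustration of those two rules, not kernel code: `LINE`, `cpu_write()`, `inv_range()` and the other names are invented, and the "cache" is simply one slot per memory line. The boundary handling roughly follows the clean-then-invalidate sequence of the `*_dma_inv_range` routines in the patch, such as `arm920_dma_inv_range`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINE   8                    /* hypothetical cache line size */
#define NLINES 4
#define MEMSZ  (LINE * NLINES)

static unsigned char mem[MEMSZ];    /* RAM, as seen by the DMA device */
static unsigned char cache[MEMSZ];  /* CPU's cached copy of each line */
static bool valid[NLINES], dirty[NLINES];

/* Load a line from memory into the cache if it is not present. */
static void fill_line(size_t l)
{
	if (!valid[l]) {
		memcpy(&cache[l * LINE], &mem[l * LINE], LINE);
		valid[l] = true;
	}
}

/* Write a dirty line back to memory and mark it clean. */
static void writeback_line(size_t l)
{
	assert(l < NLINES);
	if (valid[l] && dirty[l]) {
		memcpy(&mem[l * LINE], &cache[l * LINE], LINE);
		dirty[l] = false;
	}
}

static void cpu_write(size_t a, unsigned char v)
{
	fill_line(a / LINE);
	cache[a] = v;
	dirty[a / LINE] = true;
}

static unsigned char cpu_read(size_t a)
{
	fill_line(a / LINE);
	return cache[a];
}

/* dma_clean_range model: write back every line overlapping [start, end). */
static void clean_range(size_t start, size_t end)
{
	for (size_t l = start / LINE; l * LINE < end; l++)
		writeback_line(l);
}

/* dma_inv_range model: write back only partial lines at an unaligned
 * start or end, then discard every line overlapping [start, end). */
static void inv_range(size_t start, size_t end)
{
	if (start % LINE)
		writeback_line(start / LINE);
	if (end % LINE)
		writeback_line(end / LINE);
	for (size_t l = start / LINE; l * LINE < end; l++)
		valid[l] = dirty[l] = false;
}
```

In the sequence `cpu_write(21, 0xAA); cpu_write(10, 0x11); inv_range(8, 20);` followed by a simulated device write to `mem[8]` through `mem[19]`, a later `cpu_read(10)` sees the device's data, while `cpu_read(21)`, which lies outside the buffer but shares its last cache line, still returns `0xAA`. Without the boundary write-back that byte would be lost when the line is discarded.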
* [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range
@ 2023-03-27 12:13 ` Arnd Bergmann
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping:
remove dmac_clean_range and dmac_inv_range") in an effort to sanitize
the dma-mapping API. Now this logic is getting moved into the generic
dma-mapping implementation in order to give architectures less control
over it, which requires reverting that earlier work.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/include/asm/cacheflush.h | 21 +++++++++++++++++++++ arch/arm/include/asm/glue-cache.h | 4 ++++ arch/arm/mm/cache-fa.S | 4 ++-- arch/arm/mm/cache-nop.S | 6 ++++++ arch/arm/mm/cache-v4.S | 5 +++++ arch/arm/mm/cache-v4wb.S | 4 ++-- arch/arm/mm/cache-v4wt.S | 14 +++++++++++++- arch/arm/mm/cache-v6.S | 4 ++-- arch/arm/mm/cache-v7.S | 6 ++++-- arch/arm/mm/cache-v7m.S | 4 ++-- arch/arm/mm/proc-arm1020.S | 4 ++-- arch/arm/mm/proc-arm1020e.S | 4 ++-- arch/arm/mm/proc-arm1022.S | 4 ++-- arch/arm/mm/proc-arm1026.S | 4 ++-- arch/arm/mm/proc-arm920.S | 4 ++-- arch/arm/mm/proc-arm922.S | 4 ++-- arch/arm/mm/proc-arm925.S | 4 ++-- arch/arm/mm/proc-arm926.S | 4 ++-- arch/arm/mm/proc-arm940.S | 4 ++-- arch/arm/mm/proc-arm946.S | 4 ++-- arch/arm/mm/proc-feroceon.S | 8 ++++---- arch/arm/mm/proc-macros.S | 2 ++ arch/arm/mm/proc-mohawk.S | 4 ++-- arch/arm/mm/proc-xsc3.S | 4 ++-- arch/arm/mm/proc-xscale.S | 6 ++++-- 25 files changed, 95 insertions(+), 41 deletions(-) diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index a094f964c869..04462bfe9130 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -91,6 +91,21 @@ * DMA Cache Coherency * =================== * + * dma_inv_range(start, end) + * + * Invalidate (discard) the specified virtual address range. + * May not write back any entries. If 'start' or 'end' + * are not cache line aligned, those lines must be written + * back. + * - start - virtual start address + * - end - virtual end address + * + * dma_clean_range(start, end) + * + * Clean (write back) the specified virtual address range. + * - start - virtual start address + * - end - virtual end address + * * dma_flush_range(start, end) * * Clean and invalidate the specified virtual address range. 
@@ -112,6 +127,8 @@ struct cpu_cache_fns { void (*dma_map_area)(const void *, size_t, int); void (*dma_unmap_area)(const void *, size_t, int); + void (*dma_clean_range)(const void *, const void *); + void (*dma_inv_range)(const void *, const void *); void (*dma_flush_range)(const void *, const void *); } __no_randomize_layout; @@ -137,6 +154,8 @@ extern struct cpu_cache_fns cpu_cache; * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +#define dmac_clean_range cpu_cache.dma_clean_range +#define dmac_inv_range cpu_cache.dma_inv_range #define dmac_flush_range cpu_cache.dma_flush_range #else @@ -156,6 +175,8 @@ extern void __cpuc_flush_dcache_area(void *, size_t); * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +extern void dmac_clean_range(const void *, const void *); +extern void dmac_inv_range(const void *, const void *); extern void dmac_flush_range(const void *, const void *); #endif diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h index 724f8dac1e5b..d8c93b483adf 100644 --- a/arch/arm/include/asm/glue-cache.h +++ b/arch/arm/include/asm/glue-cache.h @@ -139,6 +139,8 @@ static inline int nop_coherent_user_range(unsigned long a, unsigned long b) { return 0; } static inline void nop_flush_kern_dcache_area(void *a, size_t s) { } +static inline void nop_dma_clean_range(const void *a, const void *b) { } +static inline void nop_dma_inv_range(const void *a, const void *b) { } static inline void nop_dma_flush_range(const void *a, const void *b) { } static inline void nop_dma_map_area(const void *s, size_t l, int f) { } @@ -155,6 +157,8 @@ static inline void nop_dma_unmap_area(const void *s, size_t l, int f) { } #define __cpuc_coherent_user_range __glue(_CACHE,_coherent_user_range) #define __cpuc_flush_dcache_area __glue(_CACHE,_flush_kern_dcache_area) +#define dmac_clean_range __glue(_CACHE,_dma_clean_range) +#define dmac_inv_range __glue(_CACHE,_dma_inv_range) 
#define dmac_flush_range __glue(_CACHE,_dma_flush_range) #endif diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 3a464d1649b4..abc3d58948dd 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -166,7 +166,7 @@ ENTRY(fa_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -fa_dma_inv_range: +ENTRY(fa_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c14, 1 @ clean & invalidate D entry @@ -189,7 +189,7 @@ fa_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -fa_dma_clean_range: +ENTRY(fa_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-nop.S b/arch/arm/mm/cache-nop.S index 72d939ef8798..a058544d6c2b 100644 --- a/arch/arm/mm/cache-nop.S +++ b/arch/arm/mm/cache-nop.S @@ -32,6 +32,12 @@ ENDPROC(nop_coherent_user_range) .globl nop_flush_kern_dcache_area .equ nop_flush_kern_dcache_area, nop_flush_icache_all + .globl nop_dma_clean_range + .equ nop_dma_clean_range, nop_flush_icache_all + + .globl nop_dma_inv_range + .equ nop_dma_inv_range, nop_flush_icache_all + .globl nop_dma_flush_range .equ nop_dma_flush_range, nop_flush_icache_all diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index e2b104876340..b747e591109c 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -103,17 +103,22 @@ ENTRY(v4_flush_kern_dcache_area) /* * dma_flush_range(start, end) + * dma_inv_range(start, end) * * Clean and invalidate the specified virtual address range. + * As only write-through caches are supported here, this is the + * same as invalidate, while the clean operation does nothing. 
* * - start - virtual start address * - end - virtual end address */ +ENTRY(v4_dma_inv_range) ENTRY(v4_dma_flush_range) #ifdef CONFIG_CPU_CP15 mov r0, #0 mcr p15, 0, r0, c7, c7, 0 @ flush ID cache #endif +ENTRY(v4_dma_clean_range) ret lr /* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index 905ac2fa2b1e..55f609eae38d 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -183,7 +183,7 @@ ENTRY(v4wb_coherent_user_range) * - start - virtual start address * - end - virtual end address */ -v4wb_dma_inv_range: +ENTRY(v4wb_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -204,7 +204,7 @@ v4wb_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -v4wb_dma_clean_range: +ENTRY(v4wb_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 652218752f88..1a88627ec09b 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -152,7 +152,7 @@ ENTRY(v4wt_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -v4wt_dma_inv_range: +ENTRY(v4wt_dma_inv_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c6, 1 @ invalidate D entry add r0, r0, #CACHE_DLINESIZE @@ -171,6 +171,18 @@ v4wt_dma_inv_range: .globl v4wt_dma_flush_range .equ v4wt_dma_flush_range, v4wt_dma_inv_range +/* + * dma_clean_range(start, end) + * + * Clean the specified virtual address range. + * Empty implementation for writethrough caches. 
+ * + * - start - virtual start address + * - end - virtual end address + */ + .globl v4wt_dma_clean_range + .equ v4wt_dma_clean_range, v4wt_dma_unmap_area + /* * dma_map_area(start, size, dir) * - start - kernel virtual start address diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 250c83bf7158..abae7ff5defc 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -200,7 +200,7 @@ ENTRY(v6_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_inv_range: +ENTRY(v6_dma_inv_range) #ifdef CONFIG_DMA_CACHE_RWFO ldrb r2, [r0] @ read for ownership strb r2, [r0] @ write for ownership @@ -245,7 +245,7 @@ v6_dma_inv_range: * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_clean_range: +ENTRY(v6_dma_clean_range) bic r0, r0, #D_CACHE_LINE_SIZE - 1 1: #ifdef CONFIG_DMA_CACHE_RWFO diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index 127afe2096ba..b16a0d2a7cce 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -361,7 +361,7 @@ ENDPROC(v7_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_inv_range: +ENTRY(v7_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -391,7 +391,7 @@ ENDPROC(v7_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_clean_range: +ENTRY(v7_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 @@ -477,6 +477,8 @@ ENDPROC(v7_dma_unmap_area) globl_equ b15_dma_map_area, v7_dma_map_area globl_equ b15_dma_unmap_area, v7_dma_unmap_area + globl_equ b15_dma_clean_range, v7_dma_clean_range + globl_equ b15_dma_inv_range, v7_dma_inv_range globl_equ b15_dma_flush_range, v7_dma_flush_range define_cache_functions b15 diff --git a/arch/arm/mm/cache-v7m.S b/arch/arm/mm/cache-v7m.S index eb60b5e5e2ad..4fc6e0028e40 100644 --- a/arch/arm/mm/cache-v7m.S 
+++ b/arch/arm/mm/cache-v7m.S @@ -364,7 +364,7 @@ ENDPROC(v7m_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_inv_range: +ENTRY(v7m_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -390,7 +390,7 @@ ENDPROC(v7m_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_clean_range: +ENTRY(v7m_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S index 6837cf7a4812..0089e366f4e8 100644 --- a/arch/arm/mm/proc-arm1020.S +++ b/arch/arm/mm/proc-arm1020.S @@ -263,7 +263,7 @@ ENTRY(arm1020_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020_dma_inv_range: +ENTRY(arm1020_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -293,7 +293,7 @@ arm1020_dma_inv_range: * * (same as v4wb) */ -arm1020_dma_clean_range: +ENTRY(arm1020_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S index df49b10250b8..c662e55a76fa 100644 --- a/arch/arm/mm/proc-arm1020e.S +++ b/arch/arm/mm/proc-arm1020e.S @@ -256,7 +256,7 @@ ENTRY(arm1020e_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020e_dma_inv_range: +ENTRY(arm1020e_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1020e_dma_inv_range: * * (same as v4wb) */ -arm1020e_dma_clean_range: +ENTRY(arm1020e_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S index e89ce467f672..e77328906bc5 100644 --- a/arch/arm/mm/proc-arm1022.S +++ b/arch/arm/mm/proc-arm1022.S @@ -256,7 +256,7 @@ ENTRY(arm1022_flush_kern_dcache_area) * * (same as v4wb) */ -arm1022_dma_inv_range: +ENTRY(arm1022_dma_inv_range) mov 
ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1022_dma_inv_range: * * (same as v4wb) */ -arm1022_dma_clean_range: +ENTRY(arm1022_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S index 7fdd1a205e8e..a23f9fa28d07 100644 --- a/arch/arm/mm/proc-arm1026.S +++ b/arch/arm/mm/proc-arm1026.S @@ -250,7 +250,7 @@ ENTRY(arm1026_flush_kern_dcache_area) * * (same as v4wb) */ -arm1026_dma_inv_range: +ENTRY(arm1026_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -276,7 +276,7 @@ arm1026_dma_inv_range: * * (same as v4wb) */ -arm1026_dma_clean_range: +ENTRY(arm1026_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S index a234cd8ba5e6..4c918ab106f3 100644 --- a/arch/arm/mm/proc-arm920.S +++ b/arch/arm/mm/proc-arm920.S @@ -232,7 +232,7 @@ ENTRY(arm920_flush_kern_dcache_area) * * (same as v4wb) */ -arm920_dma_inv_range: +ENTRY(arm920_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -255,7 +255,7 @@ arm920_dma_inv_range: * * (same as v4wb) */ -arm920_dma_clean_range: +ENTRY(arm920_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S index 53c029dcfd83..6ac7bb7d94a4 100644 --- a/arch/arm/mm/proc-arm922.S +++ b/arch/arm/mm/proc-arm922.S @@ -234,7 +234,7 @@ ENTRY(arm922_flush_kern_dcache_area) * * (same as v4wb) */ -arm922_dma_inv_range: +ENTRY(arm922_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -257,7 +257,7 @@ arm922_dma_inv_range: * * (same as v4wb) */ -arm922_dma_clean_range: 
+ENTRY(arm922_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S index 0bfad62ea858..860f0074ff81 100644 --- a/arch/arm/mm/proc-arm925.S +++ b/arch/arm/mm/proc-arm925.S @@ -280,7 +280,7 @@ ENTRY(arm925_flush_kern_dcache_area) * * (same as v4wb) */ -arm925_dma_inv_range: +ENTRY(arm925_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -305,7 +305,7 @@ arm925_dma_inv_range: * * (same as v4wb) */ -arm925_dma_clean_range: +ENTRY(arm925_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S index 0487a2c3439b..519f62e023c5 100644 --- a/arch/arm/mm/proc-arm926.S +++ b/arch/arm/mm/proc-arm926.S @@ -243,7 +243,7 @@ ENTRY(arm926_flush_kern_dcache_area) * * (same as v4wb) */ -arm926_dma_inv_range: +ENTRY(arm926_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -268,7 +268,7 @@ arm926_dma_inv_range: * * (same as v4wb) */ -arm926_dma_clean_range: +ENTRY(arm926_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S index cf9bfcc825ca..14dda5c5ee4a 100644 --- a/arch/arm/mm/proc-arm940.S +++ b/arch/arm/mm/proc-arm940.S @@ -177,7 +177,7 @@ ENTRY(arm940_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -arm940_dma_inv_range: +ENTRY(arm940_dma_inv_range) mov ip, #0 mov r1, #(CACHE_DSEGMENTS - 1) << 4 @ 4 segments 1: orr r3, r1, #(CACHE_DENTRIES - 1) << 26 @ 64 entries @@ -198,7 +198,7 @@ arm940_dma_inv_range: * - start - virtual start address * - end - 
virtual end address */ -arm940_dma_clean_range: +ENTRY(arm940_dma_clean_range) ENTRY(cpu_arm940_dcache_clean_area) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S index 6fb3898ad1cd..91f62a7d334b 100644 --- a/arch/arm/mm/proc-arm946.S +++ b/arch/arm/mm/proc-arm946.S @@ -222,7 +222,7 @@ ENTRY(arm946_flush_kern_dcache_area) * - end - virtual end address * (same as arm926) */ -arm946_dma_inv_range: +ENTRY(arm946_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -247,7 +247,7 @@ arm946_dma_inv_range: * * (same as arm926) */ -arm946_dma_clean_range: +ENTRY(arm946_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S index 61ce82aca6f0..86122bad6d9b 100644 --- a/arch/arm/mm/proc-feroceon.S +++ b/arch/arm/mm/proc-feroceon.S @@ -271,7 +271,7 @@ ENTRY(feroceon_range_flush_kern_dcache_area) * (same as v4wb) */ .align 5 -feroceon_dma_inv_range: +ENTRY(feroceon_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -285,7 +285,7 @@ feroceon_dma_inv_range: ret lr .align 5 -feroceon_range_dma_inv_range: +ENTRY(feroceon_range_dma_inv_range) mrs r2, cpsr tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -311,7 +311,7 @@ feroceon_range_dma_inv_range: * (same as v4wb) */ .align 5 -feroceon_dma_clean_range: +ENTRY(feroceon_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE @@ -321,7 +321,7 @@ feroceon_dma_clean_range: ret lr .align 5 -feroceon_range_dma_clean_range: +ENTRY(feroceon_range_dma_clean_range) mrs r2, cpsr cmp r1, r0 subne r1, r1, #1 @ top address is inclusive diff --git a/arch/arm/mm/proc-macros.S 
b/arch/arm/mm/proc-macros.S index e43f6d716b4b..c1328955fd2a 100644 --- a/arch/arm/mm/proc-macros.S +++ b/arch/arm/mm/proc-macros.S @@ -334,6 +334,8 @@ ENTRY(\name\()_cache_fns) .long \name\()_flush_kern_dcache_area .long \name\()_dma_map_area .long \name\()_dma_unmap_area + .long \name\()_dma_clean_range + .long \name\()_dma_inv_range .long \name\()_dma_flush_range .size \name\()_cache_fns, . - \name\()_cache_fns .endm diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S index 1645ccaffe96..db3a2f00372a 100644 --- a/arch/arm/mm/proc-mohawk.S +++ b/arch/arm/mm/proc-mohawk.S @@ -216,7 +216,7 @@ ENTRY(mohawk_flush_kern_dcache_area) * * (same as v4wb) */ -mohawk_dma_inv_range: +ENTRY(mohawk_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry tst r1, #CACHE_DLINESIZE - 1 @@ -239,7 +239,7 @@ mohawk_dma_inv_range: * * (same as v4wb) */ -mohawk_dma_clean_range: +ENTRY(mohawk_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S index a17afe7e195a..6db611a945f3 100644 --- a/arch/arm/mm/proc-xsc3.S +++ b/arch/arm/mm/proc-xsc3.S @@ -263,7 +263,7 @@ ENTRY(xsc3_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -xsc3_dma_inv_range: +ENTRY(xsc3_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean L1 D line @@ -284,7 +284,7 @@ xsc3_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xsc3_dma_clean_range: +ENTRY(xsc3_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean L1 D line add r0, r0, #CACHELINESIZE diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S index d82590aa71c0..291dec830714 100644 --- a/arch/arm/mm/proc-xscale.S +++ b/arch/arm/mm/proc-xscale.S @@ -323,7 +323,7 @@ ENTRY(xscale_flush_kern_dcache_area) * - start - 
virtual start address * - end - virtual end address */ -xscale_dma_inv_range: +ENTRY(xscale_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -344,7 +344,7 @@ xscale_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xscale_dma_clean_range: +ENTRY(xscale_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHELINESIZE @@ -445,6 +445,8 @@ ENDPROC(xscale_dma_unmap_area) a0_alias coherent_kern_range a0_alias coherent_user_range a0_alias flush_kern_dcache_area + a0_alias dma_clean_range + a0_alias dma_inv_range a0_alias dma_flush_range a0_alias dma_unmap_area -- 2.39.2 _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
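The patch above restores separate dma_clean_range and dma_inv_range slots next to dma_flush_range in struct cpu_cache_fns, and lets each CPU variant alias them. cache-v4.S places v4_dma_clean_range directly on the final ret lr, so clean does nothing while invalidate and flush share the whole-cache flush. cache-v4wt.S describes its clean as an empty implementation for write-through caches and makes flush an alias of invalidate. The user-space C sketch below models such a table with counting stubs instead of CP15 operations. The struct, the stubs, the sketch_dir enum and sketch_sync() are hypothetical names, and the direction-to-operation mapping is an assumption modelled on option 1 of the cover letter (clean before a DMA to the device, invalidate before a DMA from the device), with flush assumed for bidirectional buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the three DMA slots in struct
 * cpu_cache_fns; the real slots point at assembly routines. */
struct dma_cache_ops {
	void (*dma_clean_range)(const void *start, const void *end);
	void (*dma_inv_range)(const void *start, const void *end);
	void (*dma_flush_range)(const void *start, const void *end);
};

/* Counters stand in for cache maintenance so the aliasing is observable. */
static unsigned int n_clean, n_inv;

static void do_clean(const void *s, const void *e) { (void)s; (void)e; n_clean++; }
static void do_inv(const void *s, const void *e) { (void)s; (void)e; n_inv++; }
static void do_nothing(const void *s, const void *e) { (void)s; (void)e; }

/* Write-back cache: flush is clean followed by invalidate. */
static void do_flush(const void *s, const void *e)
{
	do_clean(s, e);
	do_inv(s, e);
}

const struct dma_cache_ops writeback_ops = {
	.dma_clean_range = do_clean,
	.dma_inv_range   = do_inv,
	.dma_flush_range = do_flush,
};

/* Write-through cache: memory already holds every CPU write, so clean
 * is empty and flush collapses to invalidate (as in the v4/v4wt aliases). */
const struct dma_cache_ops writethrough_ops = {
	.dma_clean_range = do_nothing,
	.dma_inv_range   = do_inv,
	.dma_flush_range = do_inv,
};

/* Hypothetical direction enum; the mapping below is an assumption. */
enum sketch_dir { SKETCH_TO_DEVICE, SKETCH_FROM_DEVICE, SKETCH_BIDIRECTIONAL };

void sketch_sync(const struct dma_cache_ops *ops, const void *buf,
		 size_t len, enum sketch_dir dir)
{
	const char *start = buf;
	const char *end = start + len;

	assert(ops != NULL && buf != NULL);
	switch (dir) {
	case SKETCH_TO_DEVICE:
		ops->dma_clean_range(start, end);  /* push CPU writes to memory */
		break;
	case SKETCH_FROM_DEVICE:
		ops->dma_inv_range(start, end);    /* drop lines the device will overwrite */
		break;
	default:
		ops->dma_flush_range(start, end);  /* clean and invalidate */
		break;
	}
}
```

Keeping the primitives in the per-CPU table and the choice of primitive in the caller mirrors the split the commit message describes: the architecture supplies clean, invalidate and flush, while the generic dma-mapping code decides which one a transfer needs.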
* [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range
@ 2023-03-27 12:13 ` Arnd Bergmann
From: Arnd Bergmann @ 2023-03-27 12:13 UTC
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping:
remove dmac_clean_range and dmac_inv_range") in an effort to sanitize
the dma-mapping API. Now this logic is being moved into the generic
dma-mapping implementation in order to give architectures less control
over it, which requires reverting that earlier work.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/include/asm/cacheflush.h | 21 +++++++++++++++++++++ arch/arm/include/asm/glue-cache.h | 4 ++++ arch/arm/mm/cache-fa.S | 4 ++-- arch/arm/mm/cache-nop.S | 6 ++++++ arch/arm/mm/cache-v4.S | 5 +++++ arch/arm/mm/cache-v4wb.S | 4 ++-- arch/arm/mm/cache-v4wt.S | 14 +++++++++++++- arch/arm/mm/cache-v6.S | 4 ++-- arch/arm/mm/cache-v7.S | 6 ++++-- arch/arm/mm/cache-v7m.S | 4 ++-- arch/arm/mm/proc-arm1020.S | 4 ++-- arch/arm/mm/proc-arm1020e.S | 4 ++-- arch/arm/mm/proc-arm1022.S | 4 ++-- arch/arm/mm/proc-arm1026.S | 4 ++-- arch/arm/mm/proc-arm920.S | 4 ++-- arch/arm/mm/proc-arm922.S | 4 ++-- arch/arm/mm/proc-arm925.S | 4 ++-- arch/arm/mm/proc-arm926.S | 4 ++-- arch/arm/mm/proc-arm940.S | 4 ++-- arch/arm/mm/proc-arm946.S | 4 ++-- arch/arm/mm/proc-feroceon.S | 8 ++++---- arch/arm/mm/proc-macros.S | 2 ++ arch/arm/mm/proc-mohawk.S | 4 ++-- arch/arm/mm/proc-xsc3.S | 4 ++-- arch/arm/mm/proc-xscale.S | 6 ++++-- 25 files changed, 95 insertions(+), 41 deletions(-) diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index a094f964c869..04462bfe9130 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -91,6 +91,21 @@ * DMA Cache Coherency * =================== * + * dma_inv_range(start, end) + * + * Invalidate (discard) the specified virtual address range. + * May not write back any entries. If 'start' or 'end' + * are not cache line aligned, those lines must be written + * back. + * - start - virtual start address + * - end - virtual end address + * + * dma_clean_range(start, end) + * + * Clean (write back) the specified virtual address range. + * - start - virtual start address + * - end - virtual end address + * * dma_flush_range(start, end) * * Clean and invalidate the specified virtual address range. 
@@ -112,6 +127,8 @@ struct cpu_cache_fns { void (*dma_map_area)(const void *, size_t, int); void (*dma_unmap_area)(const void *, size_t, int); + void (*dma_clean_range)(const void *, const void *); + void (*dma_inv_range)(const void *, const void *); void (*dma_flush_range)(const void *, const void *); } __no_randomize_layout; @@ -137,6 +154,8 @@ extern struct cpu_cache_fns cpu_cache; * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +#define dmac_clean_range cpu_cache.dma_clean_range +#define dmac_inv_range cpu_cache.dma_inv_range #define dmac_flush_range cpu_cache.dma_flush_range #else @@ -156,6 +175,8 @@ extern void __cpuc_flush_dcache_area(void *, size_t); * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +extern void dmac_clean_range(const void *, const void *); +extern void dmac_inv_range(const void *, const void *); extern void dmac_flush_range(const void *, const void *); #endif diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h index 724f8dac1e5b..d8c93b483adf 100644 --- a/arch/arm/include/asm/glue-cache.h +++ b/arch/arm/include/asm/glue-cache.h @@ -139,6 +139,8 @@ static inline int nop_coherent_user_range(unsigned long a, unsigned long b) { return 0; } static inline void nop_flush_kern_dcache_area(void *a, size_t s) { } +static inline void nop_dma_clean_range(const void *a, const void *b) { } +static inline void nop_dma_inv_range(const void *a, const void *b) { } static inline void nop_dma_flush_range(const void *a, const void *b) { } static inline void nop_dma_map_area(const void *s, size_t l, int f) { } @@ -155,6 +157,8 @@ static inline void nop_dma_unmap_area(const void *s, size_t l, int f) { } #define __cpuc_coherent_user_range __glue(_CACHE,_coherent_user_range) #define __cpuc_flush_dcache_area __glue(_CACHE,_flush_kern_dcache_area) +#define dmac_clean_range __glue(_CACHE,_dma_clean_range) +#define dmac_inv_range __glue(_CACHE,_dma_inv_range) 
#define dmac_flush_range __glue(_CACHE,_dma_flush_range) #endif diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 3a464d1649b4..abc3d58948dd 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -166,7 +166,7 @@ ENTRY(fa_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -fa_dma_inv_range: +ENTRY(fa_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c14, 1 @ clean & invalidate D entry @@ -189,7 +189,7 @@ fa_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -fa_dma_clean_range: +ENTRY(fa_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-nop.S b/arch/arm/mm/cache-nop.S index 72d939ef8798..a058544d6c2b 100644 --- a/arch/arm/mm/cache-nop.S +++ b/arch/arm/mm/cache-nop.S @@ -32,6 +32,12 @@ ENDPROC(nop_coherent_user_range) .globl nop_flush_kern_dcache_area .equ nop_flush_kern_dcache_area, nop_flush_icache_all + .globl nop_dma_clean_range + .equ nop_dma_clean_range, nop_flush_icache_all + + .globl nop_dma_inv_range + .equ nop_dma_inv_range, nop_flush_icache_all + .globl nop_dma_flush_range .equ nop_dma_flush_range, nop_flush_icache_all diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index e2b104876340..b747e591109c 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -103,17 +103,22 @@ ENTRY(v4_flush_kern_dcache_area) /* * dma_flush_range(start, end) + * dma_inv_range(start, end) * * Clean and invalidate the specified virtual address range. + * As only write-through caches are supported here, this is the + * same as invalidate, while the clean operation does nothing. 
* * - start - virtual start address * - end - virtual end address */ +ENTRY(v4_dma_inv_range) ENTRY(v4_dma_flush_range) #ifdef CONFIG_CPU_CP15 mov r0, #0 mcr p15, 0, r0, c7, c7, 0 @ flush ID cache #endif +ENTRY(v4_dma_clean_range) ret lr /* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index 905ac2fa2b1e..55f609eae38d 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -183,7 +183,7 @@ ENTRY(v4wb_coherent_user_range) * - start - virtual start address * - end - virtual end address */ -v4wb_dma_inv_range: +ENTRY(v4wb_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -204,7 +204,7 @@ v4wb_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -v4wb_dma_clean_range: +ENTRY(v4wb_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 652218752f88..1a88627ec09b 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -152,7 +152,7 @@ ENTRY(v4wt_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -v4wt_dma_inv_range: +ENTRY(v4wt_dma_inv_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c6, 1 @ invalidate D entry add r0, r0, #CACHE_DLINESIZE @@ -171,6 +171,18 @@ v4wt_dma_inv_range: .globl v4wt_dma_flush_range .equ v4wt_dma_flush_range, v4wt_dma_inv_range +/* + * dma_clean_range(start, end) + * + * Clean the specified virtual address range. + * Empty implementation for writethrough caches. 
+ * + * - start - virtual start address + * - end - virtual end address + */ + .globl v4wt_dma_clean_range + .equ v4wt_dma_clean_range, v4wt_dma_unmap_area + /* * dma_map_area(start, size, dir) * - start - kernel virtual start address diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 250c83bf7158..abae7ff5defc 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -200,7 +200,7 @@ ENTRY(v6_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_inv_range: +ENTRY(v6_dma_inv_range) #ifdef CONFIG_DMA_CACHE_RWFO ldrb r2, [r0] @ read for ownership strb r2, [r0] @ write for ownership @@ -245,7 +245,7 @@ v6_dma_inv_range: * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_clean_range: +ENTRY(v6_dma_clean_range) bic r0, r0, #D_CACHE_LINE_SIZE - 1 1: #ifdef CONFIG_DMA_CACHE_RWFO diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index 127afe2096ba..b16a0d2a7cce 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -361,7 +361,7 @@ ENDPROC(v7_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_inv_range: +ENTRY(v7_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -391,7 +391,7 @@ ENDPROC(v7_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_clean_range: +ENTRY(v7_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 @@ -477,6 +477,8 @@ ENDPROC(v7_dma_unmap_area) globl_equ b15_dma_map_area, v7_dma_map_area globl_equ b15_dma_unmap_area, v7_dma_unmap_area + globl_equ b15_dma_clean_range, v7_dma_clean_range + globl_equ b15_dma_inv_range, v7_dma_inv_range globl_equ b15_dma_flush_range, v7_dma_flush_range define_cache_functions b15 diff --git a/arch/arm/mm/cache-v7m.S b/arch/arm/mm/cache-v7m.S index eb60b5e5e2ad..4fc6e0028e40 100644 --- a/arch/arm/mm/cache-v7m.S 
+++ b/arch/arm/mm/cache-v7m.S @@ -364,7 +364,7 @@ ENDPROC(v7m_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_inv_range: +ENTRY(v7m_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -390,7 +390,7 @@ ENDPROC(v7m_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_clean_range: +ENTRY(v7m_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S index 6837cf7a4812..0089e366f4e8 100644 --- a/arch/arm/mm/proc-arm1020.S +++ b/arch/arm/mm/proc-arm1020.S @@ -263,7 +263,7 @@ ENTRY(arm1020_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020_dma_inv_range: +ENTRY(arm1020_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -293,7 +293,7 @@ arm1020_dma_inv_range: * * (same as v4wb) */ -arm1020_dma_clean_range: +ENTRY(arm1020_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S index df49b10250b8..c662e55a76fa 100644 --- a/arch/arm/mm/proc-arm1020e.S +++ b/arch/arm/mm/proc-arm1020e.S @@ -256,7 +256,7 @@ ENTRY(arm1020e_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020e_dma_inv_range: +ENTRY(arm1020e_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1020e_dma_inv_range: * * (same as v4wb) */ -arm1020e_dma_clean_range: +ENTRY(arm1020e_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S index e89ce467f672..e77328906bc5 100644 --- a/arch/arm/mm/proc-arm1022.S +++ b/arch/arm/mm/proc-arm1022.S @@ -256,7 +256,7 @@ ENTRY(arm1022_flush_kern_dcache_area) * * (same as v4wb) */ -arm1022_dma_inv_range: +ENTRY(arm1022_dma_inv_range) mov 
ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1022_dma_inv_range: * * (same as v4wb) */ -arm1022_dma_clean_range: +ENTRY(arm1022_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S index 7fdd1a205e8e..a23f9fa28d07 100644 --- a/arch/arm/mm/proc-arm1026.S +++ b/arch/arm/mm/proc-arm1026.S @@ -250,7 +250,7 @@ ENTRY(arm1026_flush_kern_dcache_area) * * (same as v4wb) */ -arm1026_dma_inv_range: +ENTRY(arm1026_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -276,7 +276,7 @@ arm1026_dma_inv_range: * * (same as v4wb) */ -arm1026_dma_clean_range: +ENTRY(arm1026_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S index a234cd8ba5e6..4c918ab106f3 100644 --- a/arch/arm/mm/proc-arm920.S +++ b/arch/arm/mm/proc-arm920.S @@ -232,7 +232,7 @@ ENTRY(arm920_flush_kern_dcache_area) * * (same as v4wb) */ -arm920_dma_inv_range: +ENTRY(arm920_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -255,7 +255,7 @@ arm920_dma_inv_range: * * (same as v4wb) */ -arm920_dma_clean_range: +ENTRY(arm920_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S index 53c029dcfd83..6ac7bb7d94a4 100644 --- a/arch/arm/mm/proc-arm922.S +++ b/arch/arm/mm/proc-arm922.S @@ -234,7 +234,7 @@ ENTRY(arm922_flush_kern_dcache_area) * * (same as v4wb) */ -arm922_dma_inv_range: +ENTRY(arm922_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -257,7 +257,7 @@ arm922_dma_inv_range: * * (same as v4wb) */ -arm922_dma_clean_range: 
+ENTRY(arm922_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S index 0bfad62ea858..860f0074ff81 100644 --- a/arch/arm/mm/proc-arm925.S +++ b/arch/arm/mm/proc-arm925.S @@ -280,7 +280,7 @@ ENTRY(arm925_flush_kern_dcache_area) * * (same as v4wb) */ -arm925_dma_inv_range: +ENTRY(arm925_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -305,7 +305,7 @@ arm925_dma_inv_range: * * (same as v4wb) */ -arm925_dma_clean_range: +ENTRY(arm925_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S index 0487a2c3439b..519f62e023c5 100644 --- a/arch/arm/mm/proc-arm926.S +++ b/arch/arm/mm/proc-arm926.S @@ -243,7 +243,7 @@ ENTRY(arm926_flush_kern_dcache_area) * * (same as v4wb) */ -arm926_dma_inv_range: +ENTRY(arm926_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -268,7 +268,7 @@ arm926_dma_inv_range: * * (same as v4wb) */ -arm926_dma_clean_range: +ENTRY(arm926_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S index cf9bfcc825ca..14dda5c5ee4a 100644 --- a/arch/arm/mm/proc-arm940.S +++ b/arch/arm/mm/proc-arm940.S @@ -177,7 +177,7 @@ ENTRY(arm940_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -arm940_dma_inv_range: +ENTRY(arm940_dma_inv_range) mov ip, #0 mov r1, #(CACHE_DSEGMENTS - 1) << 4 @ 4 segments 1: orr r3, r1, #(CACHE_DENTRIES - 1) << 26 @ 64 entries @@ -198,7 +198,7 @@ arm940_dma_inv_range: * - start - virtual start address * - end - 
virtual end address */ -arm940_dma_clean_range: +ENTRY(arm940_dma_clean_range) ENTRY(cpu_arm940_dcache_clean_area) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S index 6fb3898ad1cd..91f62a7d334b 100644 --- a/arch/arm/mm/proc-arm946.S +++ b/arch/arm/mm/proc-arm946.S @@ -222,7 +222,7 @@ ENTRY(arm946_flush_kern_dcache_area) * - end - virtual end address * (same as arm926) */ -arm946_dma_inv_range: +ENTRY(arm946_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -247,7 +247,7 @@ arm946_dma_inv_range: * * (same as arm926) */ -arm946_dma_clean_range: +ENTRY(arm946_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S index 61ce82aca6f0..86122bad6d9b 100644 --- a/arch/arm/mm/proc-feroceon.S +++ b/arch/arm/mm/proc-feroceon.S @@ -271,7 +271,7 @@ ENTRY(feroceon_range_flush_kern_dcache_area) * (same as v4wb) */ .align 5 -feroceon_dma_inv_range: +ENTRY(feroceon_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -285,7 +285,7 @@ feroceon_dma_inv_range: ret lr .align 5 -feroceon_range_dma_inv_range: +ENTRY(feroceon_range_dma_inv_range) mrs r2, cpsr tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -311,7 +311,7 @@ feroceon_range_dma_inv_range: * (same as v4wb) */ .align 5 -feroceon_dma_clean_range: +ENTRY(feroceon_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE @@ -321,7 +321,7 @@ feroceon_dma_clean_range: ret lr .align 5 -feroceon_range_dma_clean_range: +ENTRY(feroceon_range_dma_clean_range) mrs r2, cpsr cmp r1, r0 subne r1, r1, #1 @ top address is inclusive diff --git a/arch/arm/mm/proc-macros.S 
b/arch/arm/mm/proc-macros.S index e43f6d716b4b..c1328955fd2a 100644 --- a/arch/arm/mm/proc-macros.S +++ b/arch/arm/mm/proc-macros.S @@ -334,6 +334,8 @@ ENTRY(\name\()_cache_fns) .long \name\()_flush_kern_dcache_area .long \name\()_dma_map_area .long \name\()_dma_unmap_area + .long \name\()_dma_clean_range + .long \name\()_dma_inv_range .long \name\()_dma_flush_range .size \name\()_cache_fns, . - \name\()_cache_fns .endm diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S index 1645ccaffe96..db3a2f00372a 100644 --- a/arch/arm/mm/proc-mohawk.S +++ b/arch/arm/mm/proc-mohawk.S @@ -216,7 +216,7 @@ ENTRY(mohawk_flush_kern_dcache_area) * * (same as v4wb) */ -mohawk_dma_inv_range: +ENTRY(mohawk_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry tst r1, #CACHE_DLINESIZE - 1 @@ -239,7 +239,7 @@ mohawk_dma_inv_range: * * (same as v4wb) */ -mohawk_dma_clean_range: +ENTRY(mohawk_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S index a17afe7e195a..6db611a945f3 100644 --- a/arch/arm/mm/proc-xsc3.S +++ b/arch/arm/mm/proc-xsc3.S @@ -263,7 +263,7 @@ ENTRY(xsc3_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -xsc3_dma_inv_range: +ENTRY(xsc3_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean L1 D line @@ -284,7 +284,7 @@ xsc3_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xsc3_dma_clean_range: +ENTRY(xsc3_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean L1 D line add r0, r0, #CACHELINESIZE diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S index d82590aa71c0..291dec830714 100644 --- a/arch/arm/mm/proc-xscale.S +++ b/arch/arm/mm/proc-xscale.S @@ -323,7 +323,7 @@ ENTRY(xscale_flush_kern_dcache_area) * - start - 
virtual start address * - end - virtual end address */ -xscale_dma_inv_range: +ENTRY(xscale_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -344,7 +344,7 @@ xscale_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xscale_dma_clean_range: +ENTRY(xscale_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHELINESIZE @@ -445,6 +445,8 @@ ENDPROC(xscale_dma_unmap_area) a0_alias coherent_kern_range a0_alias coherent_user_range a0_alias flush_kern_dcache_area + a0_alias dma_clean_range + a0_alias dma_inv_range a0_alias dma_flush_range a0_alias dma_unmap_area -- 2.39.2 _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
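The dma_inv_range() comment added to cacheflush.h says that if start or end is not cache-line aligned, the partially covered line must be written back rather than simply discarded, since that line may also hold data outside the buffer. The write-back implementations in this patch, such as mohawk and xscale, do this with tst on r0 and r1, a bic to align the start address, and a conditional clean (mcrne ... c7, c10, 1) ahead of the invalidate loop. The C sketch below only computes that plan in user space; the inv_plan struct and plan_dma_inv_range() are hypothetical names, and the caller-supplied power-of-two line size stands in for CACHE_DLINESIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plan for one dma_inv_range(start, end) call. */
struct inv_plan {
	uintptr_t first_line;  /* start rounded down to a line boundary */
	size_t nlines;         /* lines touched by [start, end) */
	bool clean_first;      /* start not aligned: write back first line */
	bool clean_last;       /* end not aligned: write back last line */
};

struct inv_plan plan_dma_inv_range(uintptr_t start, uintptr_t end,
				   uintptr_t line_size)
{
	const uintptr_t mask = line_size - 1;
	struct inv_plan p;

	/* Assumption: power-of-two line size and a non-empty range. */
	assert(line_size != 0 && (line_size & mask) == 0);
	assert(start < end);

	p.clean_first = (start & mask) != 0;  /* like tst r0, #LINESIZE - 1 */
	p.clean_last  = (end & mask) != 0;    /* like tst r1, #LINESIZE - 1 */
	p.first_line  = start & ~mask;        /* like bic r0, r0, #LINESIZE - 1 */

	/* Invalidate every line from first_line up to end rounded up. */
	p.nlines = (size_t)((((end + mask) & ~mask) - p.first_line) / line_size);
	return p;
}
```

For example, with 32-byte lines the range 0x1010 to 0x1050 touches three lines starting at 0x1000, and both edge lines need a clean before they are invalidated; an aligned range such as 0x2000 to 0x2040 needs no extra cleaning.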
* [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov
From: Arnd Bergmann <arnd@arndb.de>
These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping:
remove dmac_clean_range and dmac_inv_range") in an effort to sanitize
the dma-mapping API. Now this logic is getting moved into the generic
dma-mapping implementation in order to give architectures less control
over it, which requires reverting that earlier work.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/cacheflush.h | 21 +++++++++++++++++++++
 arch/arm/include/asm/glue-cache.h |  4 ++++
 arch/arm/mm/cache-fa.S            |  4 ++--
 arch/arm/mm/cache-nop.S           |  6 ++++++
 arch/arm/mm/cache-v4.S            |  5 +++++
 arch/arm/mm/cache-v4wb.S          |  4 ++--
 arch/arm/mm/cache-v4wt.S          | 14 +++++++++++++-
 arch/arm/mm/cache-v6.S            |  4 ++--
 arch/arm/mm/cache-v7.S            |  6 ++++--
 arch/arm/mm/cache-v7m.S           |  4 ++--
 arch/arm/mm/proc-arm1020.S        |  4 ++--
 arch/arm/mm/proc-arm1020e.S       |  4 ++--
 arch/arm/mm/proc-arm1022.S        |  4 ++--
 arch/arm/mm/proc-arm1026.S        |  4 ++--
 arch/arm/mm/proc-arm920.S         |  4 ++--
 arch/arm/mm/proc-arm922.S         |  4 ++--
 arch/arm/mm/proc-arm925.S         |  4 ++--
 arch/arm/mm/proc-arm926.S         |  4 ++--
 arch/arm/mm/proc-arm940.S         |  4 ++--
 arch/arm/mm/proc-arm946.S         |  4 ++--
 arch/arm/mm/proc-feroceon.S       |  8 ++++----
 arch/arm/mm/proc-macros.S         |  2 ++
 arch/arm/mm/proc-mohawk.S         |  4 ++--
 arch/arm/mm/proc-xsc3.S           |  4 ++--
 arch/arm/mm/proc-xscale.S         |  6 ++++--
 25 files changed, 95 insertions(+), 41 deletions(-)
diff --git
a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index a094f964c869..04462bfe9130 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -91,6 +91,21 @@ * DMA Cache Coherency * =================== * + * dma_inv_range(start, end) + * + * Invalidate (discard) the specified virtual address range. + * May not write back any entries. If 'start' or 'end' + * are not cache line aligned, those lines must be written + * back. + * - start - virtual start address + * - end - virtual end address + * + * dma_clean_range(start, end) + * + * Clean (write back) the specified virtual address range. + * - start - virtual start address + * - end - virtual end address + * * dma_flush_range(start, end) * * Clean and invalidate the specified virtual address range. @@ -112,6 +127,8 @@ struct cpu_cache_fns { void (*dma_map_area)(const void *, size_t, int); void (*dma_unmap_area)(const void *, size_t, int); + void (*dma_clean_range)(const void *, const void *); + void (*dma_inv_range)(const void *, const void *); void (*dma_flush_range)(const void *, const void *); } __no_randomize_layout; @@ -137,6 +154,8 @@ extern struct cpu_cache_fns cpu_cache; * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +#define dmac_clean_range cpu_cache.dma_clean_range +#define dmac_inv_range cpu_cache.dma_inv_range #define dmac_flush_range cpu_cache.dma_flush_range #else @@ -156,6 +175,8 @@ extern void __cpuc_flush_dcache_area(void *, size_t); * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. 
*/ +extern void dmac_clean_range(const void *, const void *); +extern void dmac_inv_range(const void *, const void *); extern void dmac_flush_range(const void *, const void *); #endif diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h index 724f8dac1e5b..d8c93b483adf 100644 --- a/arch/arm/include/asm/glue-cache.h +++ b/arch/arm/include/asm/glue-cache.h @@ -139,6 +139,8 @@ static inline int nop_coherent_user_range(unsigned long a, unsigned long b) { return 0; } static inline void nop_flush_kern_dcache_area(void *a, size_t s) { } +static inline void nop_dma_clean_range(const void *a, const void *b) { } +static inline void nop_dma_inv_range(const void *a, const void *b) { } static inline void nop_dma_flush_range(const void *a, const void *b) { } static inline void nop_dma_map_area(const void *s, size_t l, int f) { } @@ -155,6 +157,8 @@ static inline void nop_dma_unmap_area(const void *s, size_t l, int f) { } #define __cpuc_coherent_user_range __glue(_CACHE,_coherent_user_range) #define __cpuc_flush_dcache_area __glue(_CACHE,_flush_kern_dcache_area) +#define dmac_clean_range __glue(_CACHE,_dma_clean_range) +#define dmac_inv_range __glue(_CACHE,_dma_inv_range) #define dmac_flush_range __glue(_CACHE,_dma_flush_range) #endif diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 3a464d1649b4..abc3d58948dd 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -166,7 +166,7 @@ ENTRY(fa_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -fa_dma_inv_range: +ENTRY(fa_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c14, 1 @ clean & invalidate D entry @@ -189,7 +189,7 @@ fa_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -fa_dma_clean_range: +ENTRY(fa_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git 
a/arch/arm/mm/cache-nop.S b/arch/arm/mm/cache-nop.S index 72d939ef8798..a058544d6c2b 100644 --- a/arch/arm/mm/cache-nop.S +++ b/arch/arm/mm/cache-nop.S @@ -32,6 +32,12 @@ ENDPROC(nop_coherent_user_range) .globl nop_flush_kern_dcache_area .equ nop_flush_kern_dcache_area, nop_flush_icache_all + .globl nop_dma_clean_range + .equ nop_dma_clean_range, nop_flush_icache_all + + .globl nop_dma_inv_range + .equ nop_dma_inv_range, nop_flush_icache_all + .globl nop_dma_flush_range .equ nop_dma_flush_range, nop_flush_icache_all diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index e2b104876340..b747e591109c 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -103,17 +103,22 @@ ENTRY(v4_flush_kern_dcache_area) /* * dma_flush_range(start, end) + * dma_inv_range(start, end) * * Clean and invalidate the specified virtual address range. + * As only write-through caches are supported here, this is the + * same as invalidate, while the clean operation does nothing. * * - start - virtual start address * - end - virtual end address */ +ENTRY(v4_dma_inv_range) ENTRY(v4_dma_flush_range) #ifdef CONFIG_CPU_CP15 mov r0, #0 mcr p15, 0, r0, c7, c7, 0 @ flush ID cache #endif +ENTRY(v4_dma_clean_range) ret lr /* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index 905ac2fa2b1e..55f609eae38d 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -183,7 +183,7 @@ ENTRY(v4wb_coherent_user_range) * - start - virtual start address * - end - virtual end address */ -v4wb_dma_inv_range: +ENTRY(v4wb_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -204,7 +204,7 @@ v4wb_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -v4wb_dma_clean_range: +ENTRY(v4wb_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/cache-v4wt.S 
b/arch/arm/mm/cache-v4wt.S index 652218752f88..1a88627ec09b 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -152,7 +152,7 @@ ENTRY(v4wt_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -v4wt_dma_inv_range: +ENTRY(v4wt_dma_inv_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c6, 1 @ invalidate D entry add r0, r0, #CACHE_DLINESIZE @@ -171,6 +171,18 @@ v4wt_dma_inv_range: .globl v4wt_dma_flush_range .equ v4wt_dma_flush_range, v4wt_dma_inv_range +/* + * dma_clean_range(start, end) + * + * Clean the specified virtual address range. + * Empty implementation for writethrough caches. + * + * - start - virtual start address + * - end - virtual end address + */ + .globl v4wt_dma_clean_range + .equ v4wt_dma_clean_range, v4wt_dma_unmap_area + /* * dma_map_area(start, size, dir) * - start - kernel virtual start address diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 250c83bf7158..abae7ff5defc 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -200,7 +200,7 @@ ENTRY(v6_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_inv_range: +ENTRY(v6_dma_inv_range) #ifdef CONFIG_DMA_CACHE_RWFO ldrb r2, [r0] @ read for ownership strb r2, [r0] @ write for ownership @@ -245,7 +245,7 @@ v6_dma_inv_range: * - start - virtual start address of region * - end - virtual end address of region */ -v6_dma_clean_range: +ENTRY(v6_dma_clean_range) bic r0, r0, #D_CACHE_LINE_SIZE - 1 1: #ifdef CONFIG_DMA_CACHE_RWFO diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index 127afe2096ba..b16a0d2a7cce 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -361,7 +361,7 @@ ENDPROC(v7_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_inv_range: +ENTRY(v7_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -391,7 +391,7 @@ 
ENDPROC(v7_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7_dma_clean_range: +ENTRY(v7_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 @@ -477,6 +477,8 @@ ENDPROC(v7_dma_unmap_area) globl_equ b15_dma_map_area, v7_dma_map_area globl_equ b15_dma_unmap_area, v7_dma_unmap_area + globl_equ b15_dma_clean_range, v7_dma_clean_range + globl_equ b15_dma_inv_range, v7_dma_inv_range globl_equ b15_dma_flush_range, v7_dma_flush_range define_cache_functions b15 diff --git a/arch/arm/mm/cache-v7m.S b/arch/arm/mm/cache-v7m.S index eb60b5e5e2ad..4fc6e0028e40 100644 --- a/arch/arm/mm/cache-v7m.S +++ b/arch/arm/mm/cache-v7m.S @@ -364,7 +364,7 @@ ENDPROC(v7m_flush_kern_dcache_area) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_inv_range: +ENTRY(v7m_dma_inv_range) dcache_line_size r2, r3 sub r3, r2, #1 tst r0, r3 @@ -390,7 +390,7 @@ ENDPROC(v7m_dma_inv_range) * - start - virtual start address of region * - end - virtual end address of region */ -v7m_dma_clean_range: +ENTRY(v7m_dma_clean_range) dcache_line_size r2, r3 sub r3, r2, #1 bic r0, r0, r3 diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S index 6837cf7a4812..0089e366f4e8 100644 --- a/arch/arm/mm/proc-arm1020.S +++ b/arch/arm/mm/proc-arm1020.S @@ -263,7 +263,7 @@ ENTRY(arm1020_flush_kern_dcache_area) * * (same as v4wb) */ -arm1020_dma_inv_range: +ENTRY(arm1020_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -293,7 +293,7 @@ arm1020_dma_inv_range: * * (same as v4wb) */ -arm1020_dma_clean_range: +ENTRY(arm1020_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S index df49b10250b8..c662e55a76fa 100644 --- a/arch/arm/mm/proc-arm1020e.S +++ b/arch/arm/mm/proc-arm1020e.S @@ -256,7 +256,7 @@ ENTRY(arm1020e_flush_kern_dcache_area) * * 
(same as v4wb) */ -arm1020e_dma_inv_range: +ENTRY(arm1020e_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1020e_dma_inv_range: * * (same as v4wb) */ -arm1020e_dma_clean_range: +ENTRY(arm1020e_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S index e89ce467f672..e77328906bc5 100644 --- a/arch/arm/mm/proc-arm1022.S +++ b/arch/arm/mm/proc-arm1022.S @@ -256,7 +256,7 @@ ENTRY(arm1022_flush_kern_dcache_area) * * (same as v4wb) */ -arm1022_dma_inv_range: +ENTRY(arm1022_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -282,7 +282,7 @@ arm1022_dma_inv_range: * * (same as v4wb) */ -arm1022_dma_clean_range: +ENTRY(arm1022_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S index 7fdd1a205e8e..a23f9fa28d07 100644 --- a/arch/arm/mm/proc-arm1026.S +++ b/arch/arm/mm/proc-arm1026.S @@ -250,7 +250,7 @@ ENTRY(arm1026_flush_kern_dcache_area) * * (same as v4wb) */ -arm1026_dma_inv_range: +ENTRY(arm1026_dma_inv_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE tst r0, #CACHE_DLINESIZE - 1 @@ -276,7 +276,7 @@ arm1026_dma_inv_range: * * (same as v4wb) */ -arm1026_dma_clean_range: +ENTRY(arm1026_dma_clean_range) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_DISABLE bic r0, r0, #CACHE_DLINESIZE - 1 diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S index a234cd8ba5e6..4c918ab106f3 100644 --- a/arch/arm/mm/proc-arm920.S +++ b/arch/arm/mm/proc-arm920.S @@ -232,7 +232,7 @@ ENTRY(arm920_flush_kern_dcache_area) * * (same as v4wb) */ -arm920_dma_inv_range: +ENTRY(arm920_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -255,7 +255,7 @@ arm920_dma_inv_range: * * (same as v4wb) */ 
-arm920_dma_clean_range: +ENTRY(arm920_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S index 53c029dcfd83..6ac7bb7d94a4 100644 --- a/arch/arm/mm/proc-arm922.S +++ b/arch/arm/mm/proc-arm922.S @@ -234,7 +234,7 @@ ENTRY(arm922_flush_kern_dcache_area) * * (same as v4wb) */ -arm922_dma_inv_range: +ENTRY(arm922_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -257,7 +257,7 @@ arm922_dma_inv_range: * * (same as v4wb) */ -arm922_dma_clean_range: +ENTRY(arm922_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S index 0bfad62ea858..860f0074ff81 100644 --- a/arch/arm/mm/proc-arm925.S +++ b/arch/arm/mm/proc-arm925.S @@ -280,7 +280,7 @@ ENTRY(arm925_flush_kern_dcache_area) * * (same as v4wb) */ -arm925_dma_inv_range: +ENTRY(arm925_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -305,7 +305,7 @@ arm925_dma_inv_range: * * (same as v4wb) */ -arm925_dma_clean_range: +ENTRY(arm925_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S index 0487a2c3439b..519f62e023c5 100644 --- a/arch/arm/mm/proc-arm926.S +++ b/arch/arm/mm/proc-arm926.S @@ -243,7 +243,7 @@ ENTRY(arm926_flush_kern_dcache_area) * * (same as v4wb) */ -arm926_dma_inv_range: +ENTRY(arm926_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -268,7 +268,7 @@ arm926_dma_inv_range: * * (same as v4wb) */ -arm926_dma_clean_range: +ENTRY(arm926_dma_clean_range) #ifndef 
CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S index cf9bfcc825ca..14dda5c5ee4a 100644 --- a/arch/arm/mm/proc-arm940.S +++ b/arch/arm/mm/proc-arm940.S @@ -177,7 +177,7 @@ ENTRY(arm940_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -arm940_dma_inv_range: +ENTRY(arm940_dma_inv_range) mov ip, #0 mov r1, #(CACHE_DSEGMENTS - 1) << 4 @ 4 segments 1: orr r3, r1, #(CACHE_DENTRIES - 1) << 26 @ 64 entries @@ -198,7 +198,7 @@ arm940_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -arm940_dma_clean_range: +ENTRY(arm940_dma_clean_range) ENTRY(cpu_arm940_dcache_clean_area) mov ip, #0 #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S index 6fb3898ad1cd..91f62a7d334b 100644 --- a/arch/arm/mm/proc-arm946.S +++ b/arch/arm/mm/proc-arm946.S @@ -222,7 +222,7 @@ ENTRY(arm946_flush_kern_dcache_area) * - end - virtual end address * (same as arm926) */ -arm946_dma_inv_range: +ENTRY(arm946_dma_inv_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -247,7 +247,7 @@ arm946_dma_inv_range: * * (same as arm926) */ -arm946_dma_clean_range: +ENTRY(arm946_dma_clean_range) #ifndef CONFIG_CPU_DCACHE_WRITETHROUGH bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S index 61ce82aca6f0..86122bad6d9b 100644 --- a/arch/arm/mm/proc-feroceon.S +++ b/arch/arm/mm/proc-feroceon.S @@ -271,7 +271,7 @@ ENTRY(feroceon_range_flush_kern_dcache_area) * (same as v4wb) */ .align 5 -feroceon_dma_inv_range: +ENTRY(feroceon_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 bic r0, r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -285,7 +285,7 @@ feroceon_dma_inv_range: ret lr 
.align 5 -feroceon_range_dma_inv_range: +ENTRY(feroceon_range_dma_inv_range) mrs r2, cpsr tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -311,7 +311,7 @@ feroceon_range_dma_inv_range: * (same as v4wb) */ .align 5 -feroceon_dma_clean_range: +ENTRY(feroceon_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE @@ -321,7 +321,7 @@ feroceon_dma_clean_range: ret lr .align 5 -feroceon_range_dma_clean_range: +ENTRY(feroceon_range_dma_clean_range) mrs r2, cpsr cmp r1, r0 subne r1, r1, #1 @ top address is inclusive diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S index e43f6d716b4b..c1328955fd2a 100644 --- a/arch/arm/mm/proc-macros.S +++ b/arch/arm/mm/proc-macros.S @@ -334,6 +334,8 @@ ENTRY(\name\()_cache_fns) .long \name\()_flush_kern_dcache_area .long \name\()_dma_map_area .long \name\()_dma_unmap_area + .long \name\()_dma_clean_range + .long \name\()_dma_inv_range .long \name\()_dma_flush_range .size \name\()_cache_fns, . 
- \name\()_cache_fns .endm diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S index 1645ccaffe96..db3a2f00372a 100644 --- a/arch/arm/mm/proc-mohawk.S +++ b/arch/arm/mm/proc-mohawk.S @@ -216,7 +216,7 @@ ENTRY(mohawk_flush_kern_dcache_area) * * (same as v4wb) */ -mohawk_dma_inv_range: +ENTRY(mohawk_dma_inv_range) tst r0, #CACHE_DLINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry tst r1, #CACHE_DLINESIZE - 1 @@ -239,7 +239,7 @@ mohawk_dma_inv_range: * * (same as v4wb) */ -mohawk_dma_clean_range: +ENTRY(mohawk_dma_clean_range) bic r0, r0, #CACHE_DLINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHE_DLINESIZE diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S index a17afe7e195a..6db611a945f3 100644 --- a/arch/arm/mm/proc-xsc3.S +++ b/arch/arm/mm/proc-xsc3.S @@ -263,7 +263,7 @@ ENTRY(xsc3_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -xsc3_dma_inv_range: +ENTRY(xsc3_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean L1 D line @@ -284,7 +284,7 @@ xsc3_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xsc3_dma_clean_range: +ENTRY(xsc3_dma_clean_range) bic r0, r0, #CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean L1 D line add r0, r0, #CACHELINESIZE diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S index d82590aa71c0..291dec830714 100644 --- a/arch/arm/mm/proc-xscale.S +++ b/arch/arm/mm/proc-xscale.S @@ -323,7 +323,7 @@ ENTRY(xscale_flush_kern_dcache_area) * - start - virtual start address * - end - virtual end address */ -xscale_dma_inv_range: +ENTRY(xscale_dma_inv_range) tst r0, #CACHELINESIZE - 1 bic r0, r0, #CACHELINESIZE - 1 mcrne p15, 0, r0, c7, c10, 1 @ clean D entry @@ -344,7 +344,7 @@ xscale_dma_inv_range: * - start - virtual start address * - end - virtual end address */ -xscale_dma_clean_range: +ENTRY(xscale_dma_clean_range) bic r0, r0, 
#CACHELINESIZE - 1 1: mcr p15, 0, r0, c7, c10, 1 @ clean D entry add r0, r0, #CACHELINESIZE @@ -445,6 +445,8 @@ ENDPROC(xscale_dma_unmap_area) a0_alias coherent_kern_range a0_alias coherent_user_range a0_alias flush_kern_dcache_area + a0_alias dma_clean_range + a0_alias dma_inv_range a0_alias dma_flush_range a0_alias dma_unmap_area -- 2.39.2 ^ permalink raw reply related [flat|nested] 456+ messages in thread
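The `dma_inv_range()` contract documented in cacheflush.h is subtle. Lines that lie fully inside the range are discarded without write-back. A partially covered line at an unaligned `start` or `end`, however, may also hold unrelated data that the CPU has dirtied, so that line must be written back first. Several of the routines above follow this pattern. For example, `mohawk_dma_inv_range` tests `r0` and `r1` against `CACHE_DLINESIZE - 1` and cleans those lines before invalidating. The sketch below models that logic in plain C. It assumes a hypothetical 32-byte line and records per-line actions in an array instead of issuing CP15 operations. All `toy_*`/`TOY_*` names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LINE 32u			/* assumed D-cache line size */

enum { TOY_CLEAN = 1, TOY_INV = 2 };	/* per-line action flags */

/*
 * Record what dma_inv_range(start, end) does to each cache line of a
 * region that starts at address 0 and spans nlines lines.
 */
static void toy_dma_inv_range(uintptr_t start, uintptr_t end,
			      unsigned char *ops, size_t nlines)
{
	uintptr_t mask = TOY_LINE - 1;
	uintptr_t addr;

	assert(start < end && end <= nlines * TOY_LINE);

	/* Partial first line: it may hold unrelated dirty data, write it back. */
	if (start & mask)
		ops[start / TOY_LINE] |= TOY_CLEAN;
	/* Partial last line: same reasoning. */
	if (end & mask)
		ops[end / TOY_LINE] |= TOY_CLEAN;
	/* Then discard every line overlapping [start, end). */
	for (addr = start & ~mask; addr < end; addr += TOY_LINE)
		ops[addr / TOY_LINE] |= TOY_INV;
}
```

For the range [8, 72), lines 0 and 2 are cleaned and then invalidated, line 1 is only invalidated, and line 3 is untouched. An aligned range such as [32, 64) only invalidates line 1.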
* Re: [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 13:10 ` Russell King (Oracle)
-1 siblings, 0 replies; 456+ messages in thread
From: Russell King (Oracle) @ 2023-03-27 13:10 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
On Mon, Mar 27, 2023 at 02:13:12PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> These were remove ages ago in commit 702b94bff3c5 ("ARM: dma-mapping:
> remove dmac_clean_range and dmac_inv_range") in an effort to sanitize
> the dma-mapping API.
Really no, please no. Let's not go back to this, let's keep the buffer
ownership model that came at around that time.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 456+ messages in thread
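Russell King's objection refers to the DMA API's buffer ownership model. `dma_map_*()` and `dma_sync_*_for_device()` hand a buffer to the device, while `dma_unmap_*()` and `dma_sync_*_for_cpu()` hand it back. At each hand-over, the cache maintenance is chosen from the transfer direction. A minimal sketch of that state machine follows. The direction-to-operation mapping is taken from the ARM `arch_sync_dma_for_device()`/`arch_sync_dma_for_cpu()` code in patch 17/21 and from the cover letter. All `toy_*` names are hypothetical. The `speculative` flag is an assumption standing in for the cover letter's distinction between CPUs that do and do not prefetch speculatively.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mirror of enum dma_data_direction; the names are hypothetical. */
enum toy_dir { TOY_TO_DEVICE, TOY_FROM_DEVICE, TOY_BIDIRECTIONAL };
enum toy_op { TOY_OP_NONE, TOY_OP_CLEAN, TOY_OP_INV };
enum toy_owner { TOY_OWNER_CPU, TOY_OWNER_DEVICE };

struct toy_buf {
	enum toy_owner owner;
	enum toy_dir dir;
	bool speculative;	/* can the CPU prefetch lines during the DMA? */
};

/* CPU -> device hand-over: dma_map_*() or dma_sync_*_for_device(). */
static enum toy_op toy_give_to_device(struct toy_buf *b)
{
	assert(b->owner == TOY_OWNER_CPU);	/* only the owner may hand over */
	b->owner = TOY_OWNER_DEVICE;
	/*
	 * The device is about to write: drop stale lines. Otherwise the
	 * device may read the buffer, so dirty lines must reach memory.
	 * Bidirectional is treated like to-device here, which is what the
	 * FIXME in arch_sync_dma_for_device() questions.
	 */
	return b->dir == TOY_FROM_DEVICE ? TOY_OP_INV : TOY_OP_CLEAN;
}

/* Device -> CPU hand-over: dma_unmap_*() or dma_sync_*_for_cpu(). */
static enum toy_op toy_give_to_cpu(struct toy_buf *b)
{
	assert(b->owner == TOY_OWNER_DEVICE);
	b->owner = TOY_OWNER_CPU;
	/*
	 * A buffer the device only read needs nothing, and a CPU that never
	 * prefetches cannot have pulled in stale lines while the device
	 * owned the buffer.
	 */
	if (b->dir == TOY_TO_DEVICE || !b->speculative)
		return TOY_OP_NONE;
	return TOY_OP_INV;
}
```

The assertions capture the point of the ownership model: cache maintenance is tied to well-defined hand-over points, rather than to ad-hoc clean/invalidate calls made by drivers.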
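Patch 17/21 switches the ARM IOMMU code from page+offset helpers to the `phys_addr_t` based ones. Both forms describe the same location. `page_to_phys(page) + off` normalises to `PFN_DOWN(paddr)` and `paddr % PAGE_SIZE`. The `PG_dcache_clean` loop in `arch_sync_dma_for_cpu()` marks only the pages that lie entirely inside the buffer, starting at `PFN_UP(paddr)`. The sketch below checks that arithmetic in user space, assuming 4 KiB pages. `toy_phys_t` and the `toy_*` helpers are hypothetical stand-ins for the kernel types and macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12			/* assumed 4 KiB pages */
#define TOY_PAGE_SIZE	(1UL << TOY_PAGE_SHIFT)

typedef uint64_t toy_phys_t;			/* stand-in for phys_addr_t */

/* page+offset -> physical address, i.e. page_to_phys(page) + off. */
static toy_phys_t toy_to_phys(unsigned long pfn, unsigned long off)
{
	return ((toy_phys_t)pfn << TOY_PAGE_SHIFT) + off;
}

/* Physical address -> normalised pfn and in-page offset (PFN_DOWN, %). */
static void toy_split_phys(toy_phys_t paddr, unsigned long *pfn,
			   unsigned long *off)
{
	*pfn = (unsigned long)(paddr >> TOY_PAGE_SHIFT);
	*off = (unsigned long)(paddr % TOY_PAGE_SIZE);
}

/*
 * Count the pages lying entirely inside [paddr, paddr + size), starting at
 * *first_pfn: the pages the PG_dcache_clean loop would mark.
 */
static size_t toy_full_pages(toy_phys_t paddr, size_t size,
			     unsigned long *first_pfn)
{
	unsigned long off = (unsigned long)(paddr & (TOY_PAGE_SIZE - 1));
	size_t left = size;
	size_t n = 0;

	/* PFN_UP: skip a partially covered first page. */
	*first_pfn = (unsigned long)((paddr + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT);
	/* The kernel loop only runs when size >= PAGE_SIZE. */
	if (size < TOY_PAGE_SIZE)
		return 0;
	if (off)
		left -= TOY_PAGE_SIZE - off;
	while (left >= TOY_PAGE_SIZE) {
		n++;
		left -= TOY_PAGE_SIZE;
	}
	return n;
}
```

For example, pfn 1 with offset 0x1800 and physical address 0x2800 both normalise to pfn 2, offset 0x800. An 8 KiB buffer at 0x1800 fully covers only one page (pfn 2).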
* [PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
The arm specific iommu code in dma-mapping.c uses the page+offset based
__dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the
phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu()
wrappers around them.
In order to be able to move the latter set of functions into common
code, change the iommu implementation to use them directly and remove
the internal helpers as a separate interface.
As page+offset and phys_addr_t are equivalent, but are used in different
parts of the code here, this allows removing some of the conversions
but adds them elsewhere.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/mm/dma-mapping.c | 93 ++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 60 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8bc01071474a..ce4b74f34a58 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -622,16 +622,14 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
 	kfree(buf);
 }
 
-static void dma_cache_maint_page(struct page *page, unsigned long offset,
+static void dma_cache_maint(phys_addr_t paddr,
 	size_t size, enum dma_data_direction dir,
 	void (*op)(const void *, size_t, int))
 {
-	unsigned long pfn;
+	unsigned long pfn = PFN_DOWN(paddr);
+	unsigned long offset = paddr % PAGE_SIZE;
 	size_t left = size;
 
-	pfn = page_to_pfn(page) + offset / PAGE_SIZE;
-	offset %= PAGE_SIZE;
-
 	/*
 	 * A single sg entry may refer to multiple physically contiguous
 	 * pages.  But we still need to process highmem pages individually.
@@ -641,8 +639,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	do {
 		size_t len = left;
 		void *vaddr;
-
-		page = pfn_to_page(pfn);
+		struct page *page = pfn_to_page(pfn);
 
 		if (PageHighMem(page)) {
 			if (len + offset > PAGE_SIZE)
@@ -674,14 +671,11 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
  * Note: Drivers should NOT use this function directly.
  * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
  */
-static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+			      enum dma_data_direction dir)
 {
-	phys_addr_t paddr;
+	dma_cache_maint(paddr, size, dir, dmac_map_area);
 
-	dma_cache_maint_page(page, off, size, dir, dmac_map_area);
-
-	paddr = page_to_phys(page) + off;
 	if (dir == DMA_FROM_DEVICE) {
 		outer_inv_range(paddr, paddr + size);
 	} else {
@@ -690,34 +684,30 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	/* FIXME: non-speculating: flush on bidirectional mappings? */
 }
 
-static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+			   enum dma_data_direction dir)
 {
-	phys_addr_t paddr = page_to_phys(page) + off;
-
 	/* FIXME: non-speculating: not required */
 	/* in any case, don't bother invalidating if DMA to device */
 	if (dir != DMA_TO_DEVICE) {
 		outer_inv_range(paddr, paddr + size);
 
-		dma_cache_maint_page(page, off, size, dir, dmac_unmap_area);
+		dma_cache_maint(paddr, size, dir, dmac_unmap_area);
 	}
 
 	/*
 	 * Mark the D-cache clean for these pages to avoid extra flushing.
 	 */
 	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
-		unsigned long pfn;
+		unsigned long pfn = PFN_UP(paddr);
+		unsigned long off = paddr & (PAGE_SIZE - 1);
 		size_t left = size;
 
-		pfn = page_to_pfn(page) + off / PAGE_SIZE;
-		off %= PAGE_SIZE;
-		if (off) {
-			pfn++;
+		if (off)
 			left -= PAGE_SIZE - off;
-		}
+
 		while (left >= PAGE_SIZE) {
-			page = pfn_to_page(pfn++);
+			struct page *page = pfn_to_page(pfn++);
 			set_bit(PG_dcache_clean, &page->flags);
 			left -= PAGE_SIZE;
 		}
@@ -1204,7 +1194,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
 
 		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+			arch_sync_dma_for_device(phys + s->offset, s->length, dir);
 
 		prot = __dma_info_to_prot(dir, attrs);
 
@@ -1306,8 +1296,7 @@ static void arm_iommu_unmap_sg(struct device *dev,
 			__iommu_remove_mapping(dev, sg_dma_address(s),
 					       sg_dma_len(s));
 		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-			__dma_page_dev_to_cpu(sg_page(s), s->offset,
-					      s->length, dir);
+			arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
 	}
 }
 
@@ -1329,7 +1318,7 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
 		return;
 
 	for_each_sg(sg, s, nents, i)
-		__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
+		arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
 
 }
 
@@ -1351,7 +1340,8 @@ static void arm_iommu_sync_sg_for_device(struct device *dev,
 		return;
 
 	for_each_sg(sg, s, nents, i)
-		__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+		arch_sync_dma_for_device(page_to_phys(sg_page(s)) + s->offset,
+					 s->length, dir);
 }
 
 /**
@@ -1373,7 +1363,8 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
 	int ret, prot, len = PAGE_ALIGN(size + offset);
 
 	if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		__dma_page_cpu_to_dev(page, offset, size, dir);
+		arch_sync_dma_for_device(page_to_phys(page) + offset,
+					 size, dir);
 
 	dma_addr = __alloc_iova(mapping, len);
 	if (dma_addr == DMA_MAPPING_ERROR)
@@ -1406,7 +1397,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
 {
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
 	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page;
+	phys_addr_t phys;
 	int offset = handle & ~PAGE_MASK;
 	int len = PAGE_ALIGN(size + offset);
 
@@ -1414,8 +1405,8 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
 		return;
 
 	if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
-		page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-		__dma_page_dev_to_cpu(page, offset, size, dir);
+		phys = iommu_iova_to_phys(mapping->domain, handle);
+		arch_sync_dma_for_cpu(phys, size, dir);
 	}
 
 	iommu_unmap(mapping->domain, iova, len);
@@ -1483,30 +1474,26 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page;
-	unsigned int offset = handle & ~PAGE_MASK;
+	phys_addr_t phys;
 
-	if (dev->dma_coherent || !iova)
+	if (dev->dma_coherent || !(handle & PAGE_MASK))
 		return;
 
-	page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	__dma_page_dev_to_cpu(page, offset, size, dir);
+	phys = iommu_iova_to_phys(mapping->domain, handle);
+	arch_sync_dma_for_cpu(phys, size, dir);
 }
 
 static void arm_iommu_sync_single_for_device(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page;
-	unsigned int offset = handle & ~PAGE_MASK;
+	phys_addr_t phys;
 
-	if (dev->dma_coherent || !iova)
+	if (dev->dma_coherent || !(handle & PAGE_MASK))
 		return;
 
-	page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	__dma_page_cpu_to_dev(page, offset, size, dir);
+	phys = iommu_iova_to_phys(mapping->domain, handle);
+	arch_sync_dma_for_device(phys, size, dir);
 }
 
 static const struct dma_map_ops iommu_ops = {
@@ -1789,20 +1776,6 @@ void arch_teardown_dma_ops(struct device *dev)
 	set_dma_ops(dev, NULL);
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
-{
-	__dma_page_cpu_to_dev(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
-			      size, dir);
-}
-
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
-{
-	__dma_page_dev_to_cpu(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
-			      size, dir);
-}
-
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		     gfp_t gfp, unsigned long attrs)
 {
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 456+ messages in thread
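The rewritten "mark the D-cache clean" loop in arch_sync_dma_for_cpu() replaces the manual "if (off) { pfn++; ... }" step with PFN_UP(paddr). The userspace sketch below, which is not kernel code, compares both versions. It assumes 4 KiB pages, replaces struct page with a plain pfn, and replaces set_bit(PG_dcache_clean, ...) with a counter; clean_pages_new() and clean_pages_old() are hypothetical names. Each returns how many whole pages would be marked clean and reports the first such pfn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed userspace stand-ins for the kernel macros (4 KiB pages). */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PFN_UP(x) ((unsigned long)(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT))

typedef uint64_t phys_addr_t;

/* New loop: start at the first page boundary at or above paddr. */
static size_t clean_pages_new(phys_addr_t paddr, size_t size,
			      unsigned long *first)
{
	unsigned long pfn = PFN_UP(paddr);
	unsigned long off = paddr & (PAGE_SIZE - 1);
	size_t left = size, count = 0;

	assert(size >= PAGE_SIZE);	/* the patch only loops in this case */
	if (off)
		left -= PAGE_SIZE - off;	/* skip the partial head page */
	*first = pfn;
	while (left >= PAGE_SIZE) {
		count++;		/* stands in for set_bit(PG_dcache_clean) */
		left -= PAGE_SIZE;
	}
	return count;
}

/* Old loop, with the page argument modelled as its pfn. */
static size_t clean_pages_old(unsigned long page_pfn, unsigned long off,
			      size_t size, unsigned long *first)
{
	unsigned long pfn = page_pfn + off / PAGE_SIZE;
	size_t left = size, count = 0;

	assert(size >= PAGE_SIZE);
	off %= PAGE_SIZE;
	if (off) {
		pfn++;			/* done by PFN_UP() in the new loop */
		left -= PAGE_SIZE - off;
	}
	*first = pfn;
	while (left >= PAGE_SIZE) {
		count++;
		left -= PAGE_SIZE;
	}
	return count;
}
```

PFN_UP() rounds up exactly when the in-page offset is non-zero, which is what the removed pfn++ did by hand, so the new code only has to adjust left. For example, a three-page buffer starting 0x100 bytes into pfn 0x10 marks two pages clean, starting at pfn 0x11, with either version.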
* [PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The arm specific iommu code in dma-mapping.c uses the page+offset based __dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu() wrappers around the. In order to be able to move the latter part set of functions into common code, change the iommu implementation to use them directly and remove the internal ones as a separate interface. As page+offset and phys_address are equivalent, but are used in different parts of the code here, this allows removing some of the conversion but adds them elsewhere. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/mm/dma-mapping.c | 93 ++++++++++++++------------------------- 1 file changed, 33 insertions(+), 60 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 8bc01071474a..ce4b74f34a58 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -622,16 +622,14 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr, kfree(buf); } -static void dma_cache_maint_page(struct page *page, unsigned long offset, +static void dma_cache_maint(phys_addr_t paddr, size_t size, enum dma_data_direction dir, void (*op)(const void *, size_t, int)) { - unsigned long pfn; + unsigned long pfn = PFN_DOWN(paddr); + unsigned long offset = paddr % PAGE_SIZE; size_t left = size; - pfn = page_to_pfn(page) + offset / PAGE_SIZE; - offset %= PAGE_SIZE; - /* * A single sg entry may refer to multiple physically contiguous * pages. But we still need to process highmem pages individually. @@ -641,8 +639,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset, do { size_t len = left; void *vaddr; - - page = pfn_to_page(pfn); + struct page *page = pfn_to_page(pfn); if (PageHighMem(page)) { if (len + offset > PAGE_SIZE) @@ -674,14 +671,11 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset, * Note: Drivers should NOT use this function directly. 
* Use the driver DMA support - see dma-mapping.h (dma_sync_*) */ -static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { - phys_addr_t paddr; + dma_cache_maint(paddr, size, dir, dmac_map_area); - dma_cache_maint_page(page, off, size, dir, dmac_map_area); - - paddr = page_to_phys(page) + off; if (dir == DMA_FROM_DEVICE) { outer_inv_range(paddr, paddr + size); } else { @@ -690,34 +684,30 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, /* FIXME: non-speculating: flush on bidirectional mappings? */ } -static void __dma_page_dev_to_cpu(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { - phys_addr_t paddr = page_to_phys(page) + off; - /* FIXME: non-speculating: not required */ /* in any case, don't bother invalidating if DMA to device */ if (dir != DMA_TO_DEVICE) { outer_inv_range(paddr, paddr + size); - dma_cache_maint_page(page, off, size, dir, dmac_unmap_area); + dma_cache_maint(paddr, size, dir, dmac_unmap_area); } /* * Mark the D-cache clean for these pages to avoid extra flushing. 
*/ if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) { - unsigned long pfn; + unsigned long pfn = PFN_UP(paddr); + unsigned long off = paddr & (PAGE_SIZE - 1); size_t left = size; - pfn = page_to_pfn(page) + off / PAGE_SIZE; - off %= PAGE_SIZE; - if (off) { - pfn++; + if (off) left -= PAGE_SIZE - off; - } + while (left >= PAGE_SIZE) { - page = pfn_to_page(pfn++); + struct page *page = pfn_to_page(pfn++); set_bit(PG_dcache_clean, &page->flags); left -= PAGE_SIZE; } @@ -1204,7 +1194,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, unsigned int len = PAGE_ALIGN(s->offset + s->length); if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); + arch_sync_dma_for_device(phys + s->offset, s->length, dir); prot = __dma_info_to_prot(dir, attrs); @@ -1306,8 +1296,7 @@ static void arm_iommu_unmap_sg(struct device *dev, __iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s)); if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - __dma_page_dev_to_cpu(sg_page(s), s->offset, - s->length, dir); + arch_sync_dma_for_cpu(sg_phys(s), s->length, dir); } } @@ -1329,7 +1318,7 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev, return; for_each_sg(sg, s, nents, i) - __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir); + arch_sync_dma_for_cpu(sg_phys(s), s->length, dir); } @@ -1351,7 +1340,8 @@ static void arm_iommu_sync_sg_for_device(struct device *dev, return; for_each_sg(sg, s, nents, i) - __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); + arch_sync_dma_for_device(page_to_phys(sg_page(s)) + s->offset, + s->length, dir); } /** @@ -1373,7 +1363,8 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page, int ret, prot, len = PAGE_ALIGN(size + offset); if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - __dma_page_cpu_to_dev(page, offset, size, dir); + arch_sync_dma_for_device(page_to_phys(page) + offset, + size, dir); 
dma_addr = __alloc_iova(mapping, len); if (dma_addr == DMA_MAPPING_ERROR) @@ -1406,7 +1397,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, { struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); dma_addr_t iova = handle & PAGE_MASK; - struct page *page; + phys_addr_t phys; int offset = handle & ~PAGE_MASK; int len = PAGE_ALIGN(size + offset); @@ -1414,8 +1405,8 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, return; if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { - page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); - __dma_page_dev_to_cpu(page, offset, size, dir); + phys = iommu_iova_to_phys(mapping->domain, handle); + arch_sync_dma_for_cpu(phys, size, dir); } iommu_unmap(mapping->domain, iova, len); @@ -1483,30 +1474,26 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir) { struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); - dma_addr_t iova = handle & PAGE_MASK; - struct page *page; - unsigned int offset = handle & ~PAGE_MASK; + phys_addr_t phys; - if (dev->dma_coherent || !iova) + if (dev->dma_coherent || !(handle & PAGE_MASK)) return; - page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); - __dma_page_dev_to_cpu(page, offset, size, dir); + phys = iommu_iova_to_phys(mapping->domain, handle); + arch_sync_dma_for_cpu(phys, size, dir); } static void arm_iommu_sync_single_for_device(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir) { struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); - dma_addr_t iova = handle & PAGE_MASK; - struct page *page; - unsigned int offset = handle & ~PAGE_MASK; + phys_addr_t phys; - if (dev->dma_coherent || !iova) + if (dev->dma_coherent || !(handle & PAGE_MASK)) return; - page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); - __dma_page_cpu_to_dev(page, offset, size, dir); + phys = 
iommu_iova_to_phys(mapping->domain, handle); + arch_sync_dma_for_device(phys, size, dir); } static const struct dma_map_ops iommu_ops = { @@ -1789,20 +1776,6 @@ void arch_teardown_dma_ops(struct device *dev) set_dma_ops(dev, NULL); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - __dma_page_cpu_to_dev(phys_to_page(paddr), paddr & (PAGE_SIZE - 1), - size, dir); -} - -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - __dma_page_dev_to_cpu(phys_to_page(paddr), paddr & (PAGE_SIZE - 1), - size, dir); -} - void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Sime k, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy, David S. Miller From: Arnd Bergmann <arnd@arndb.de> The arm specific iommu code in dma-mapping.c uses the page+offset based __dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu() wrappers around the. In order to be able to move the latter part set of functions into common code, change the iommu implementation to use them directly and remove the internal ones as a separate interface. As page+offset and phys_address are equivalent, but are used in different parts of the code here, this allows removing some of the conversion but adds them elsewhere. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/mm/dma-mapping.c | 93 ++++++++++++++------------------------- 1 file changed, 33 insertions(+), 60 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 8bc01071474a..ce4b74f34a58 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -622,16 +622,14 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr, kfree(buf); } -static void dma_cache_maint_page(struct page *page, unsigned long offset, +static void dma_cache_maint(phys_addr_t paddr, size_t size, enum dma_data_direction dir, void (*op)(const void *, size_t, int)) { - unsigned long pfn; + unsigned long pfn = PFN_DOWN(paddr); + unsigned long offset = paddr % PAGE_SIZE; size_t left = size; - pfn = page_to_pfn(page) + offset / PAGE_SIZE; - offset %= PAGE_SIZE; - /* * A single sg entry may refer to multiple physically contiguous * pages. But we still need to process highmem pages individually. @@ -641,8 +639,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset, do { size_t len = left; void *vaddr; - - page = pfn_to_page(pfn); + struct page *page = pfn_to_page(pfn); if (PageHighMem(page)) { if (len + offset > PAGE_SIZE) @@ -674,14 +671,11 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset, * Note: Drivers should NOT use this function directly. 
* Use the driver DMA support - see dma-mapping.h (dma_sync_*) */ -static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { - phys_addr_t paddr; + dma_cache_maint(paddr, size, dir, dmac_map_area); - dma_cache_maint_page(page, off, size, dir, dmac_map_area); - - paddr = page_to_phys(page) + off; if (dir == DMA_FROM_DEVICE) { outer_inv_range(paddr, paddr + size); } else { @@ -690,34 +684,30 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, /* FIXME: non-speculating: flush on bidirectional mappings? */ } -static void __dma_page_dev_to_cpu(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { - phys_addr_t paddr = page_to_phys(page) + off; - /* FIXME: non-speculating: not required */ /* in any case, don't bother invalidating if DMA to device */ if (dir != DMA_TO_DEVICE) { outer_inv_range(paddr, paddr + size); - dma_cache_maint_page(page, off, size, dir, dmac_unmap_area); + dma_cache_maint(paddr, size, dir, dmac_unmap_area); } /* * Mark the D-cache clean for these pages to avoid extra flushing. 
*/ if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) { - unsigned long pfn; + unsigned long pfn = PFN_UP(paddr); + unsigned long off = paddr & (PAGE_SIZE - 1); size_t left = size; - pfn = page_to_pfn(page) + off / PAGE_SIZE; - off %= PAGE_SIZE; - if (off) { - pfn++; + if (off) left -= PAGE_SIZE - off; - } + while (left >= PAGE_SIZE) { - page = pfn_to_page(pfn++); + struct page *page = pfn_to_page(pfn++); set_bit(PG_dcache_clean, &page->flags); left -= PAGE_SIZE; } @@ -1204,7 +1194,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, unsigned int len = PAGE_ALIGN(s->offset + s->length); if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); + arch_sync_dma_for_device(phys + s->offset, s->length, dir); prot = __dma_info_to_prot(dir, attrs); @@ -1306,8 +1296,7 @@ static void arm_iommu_unmap_sg(struct device *dev, __iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s)); if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - __dma_page_dev_to_cpu(sg_page(s), s->offset, - s->length, dir); + arch_sync_dma_for_cpu(sg_phys(s), s->length, dir); } } @@ -1329,7 +1318,7 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev, return; for_each_sg(sg, s, nents, i) - __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir); + arch_sync_dma_for_cpu(sg_phys(s), s->length, dir); } @@ -1351,7 +1340,8 @@ static void arm_iommu_sync_sg_for_device(struct device *dev, return; for_each_sg(sg, s, nents, i) - __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); + arch_sync_dma_for_device(page_to_phys(sg_page(s)) + s->offset, + s->length, dir); } /** @@ -1373,7 +1363,8 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page, int ret, prot, len = PAGE_ALIGN(size + offset); if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - __dma_page_cpu_to_dev(page, offset, size, dir); + arch_sync_dma_for_device(page_to_phys(page) + offset, + size, dir); 
dma_addr = __alloc_iova(mapping, len); if (dma_addr == DMA_MAPPING_ERROR) @@ -1406,7 +1397,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, { struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); dma_addr_t iova = handle & PAGE_MASK; - struct page *page; + phys_addr_t phys; int offset = handle & ~PAGE_MASK; int len = PAGE_ALIGN(size + offset); @@ -1414,8 +1405,8 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, return; if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { - page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); - __dma_page_dev_to_cpu(page, offset, size, dir); + phys = iommu_iova_to_phys(mapping->domain, handle); + arch_sync_dma_for_cpu(phys, size, dir); } iommu_unmap(mapping->domain, iova, len); @@ -1483,30 +1474,26 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir) { struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); - dma_addr_t iova = handle & PAGE_MASK; - struct page *page; - unsigned int offset = handle & ~PAGE_MASK; + phys_addr_t phys; - if (dev->dma_coherent || !iova) + if (dev->dma_coherent || !(handle & PAGE_MASK)) return; - page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); - __dma_page_dev_to_cpu(page, offset, size, dir); + phys = iommu_iova_to_phys(mapping->domain, handle); + arch_sync_dma_for_cpu(phys, size, dir); } static void arm_iommu_sync_single_for_device(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir) { struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); - dma_addr_t iova = handle & PAGE_MASK; - struct page *page; - unsigned int offset = handle & ~PAGE_MASK; + phys_addr_t phys; - if (dev->dma_coherent || !iova) + if (dev->dma_coherent || !(handle & PAGE_MASK)) return; - page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); - __dma_page_cpu_to_dev(page, offset, size, dir); + phys = 
iommu_iova_to_phys(mapping->domain, handle); + arch_sync_dma_for_device(phys, size, dir); } static const struct dma_map_ops iommu_ops = { @@ -1789,20 +1776,6 @@ void arch_teardown_dma_ops(struct device *dev) set_dma_ops(dev, NULL); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - __dma_page_cpu_to_dev(phys_to_page(paddr), paddr & (PAGE_SIZE - 1), - size, dir); -} - -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - __dma_page_dev_to_cpu(phys_to_page(paddr), paddr & (PAGE_SIZE - 1), - size, dir); -} - void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { -- 2.39.2 ^ permalink raw reply related [flat|nested] 456+ messages in thread
* [PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The arm specific iommu code in dma-mapping.c uses the page+offset based __dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu() wrappers around the. In order to be able to move the latter part set of functions into common code, change the iommu implementation to use them directly and remove the internal ones as a separate interface. As page+offset and phys_address are equivalent, but are used in different parts of the code here, this allows removing some of the conversion but adds them elsewhere. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/mm/dma-mapping.c | 93 ++++++++++++++------------------------- 1 file changed, 33 insertions(+), 60 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 8bc01071474a..ce4b74f34a58 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -622,16 +622,14 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr, kfree(buf); } -static void dma_cache_maint_page(struct page *page, unsigned long offset, +static void dma_cache_maint(phys_addr_t paddr, size_t size, enum dma_data_direction dir, void (*op)(const void *, size_t, int)) { - unsigned long pfn; + unsigned long pfn = PFN_DOWN(paddr); + unsigned long offset = paddr % PAGE_SIZE; size_t left = size; - pfn = page_to_pfn(page) + offset / PAGE_SIZE; - offset %= PAGE_SIZE; - /* * A single sg entry may refer to multiple physically contiguous * pages. But we still need to process highmem pages individually. @@ -641,8 +639,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset, do { size_t len = left; void *vaddr; - - page = pfn_to_page(pfn); + struct page *page = pfn_to_page(pfn); if (PageHighMem(page)) { if (len + offset > PAGE_SIZE) @@ -674,14 +671,11 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset, * Note: Drivers should NOT use this function directly. 
* Use the driver DMA support - see dma-mapping.h (dma_sync_*) */ -static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { - phys_addr_t paddr; + dma_cache_maint(paddr, size, dir, dmac_map_area); - dma_cache_maint_page(page, off, size, dir, dmac_map_area); - - paddr = page_to_phys(page) + off; if (dir == DMA_FROM_DEVICE) { outer_inv_range(paddr, paddr + size); } else { @@ -690,34 +684,30 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, /* FIXME: non-speculating: flush on bidirectional mappings? */ } -static void __dma_page_dev_to_cpu(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { - phys_addr_t paddr = page_to_phys(page) + off; - /* FIXME: non-speculating: not required */ /* in any case, don't bother invalidating if DMA to device */ if (dir != DMA_TO_DEVICE) { outer_inv_range(paddr, paddr + size); - dma_cache_maint_page(page, off, size, dir, dmac_unmap_area); + dma_cache_maint(paddr, size, dir, dmac_unmap_area); } /* * Mark the D-cache clean for these pages to avoid extra flushing. 
*/
if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
- unsigned long pfn;
+ unsigned long pfn = PFN_UP(paddr);
+ unsigned long off = paddr & (PAGE_SIZE - 1);
size_t left = size;
- pfn = page_to_pfn(page) + off / PAGE_SIZE;
- off %= PAGE_SIZE;
- if (off) {
- pfn++;
+ if (off)
left -= PAGE_SIZE - off;
- }
+
while (left >= PAGE_SIZE) {
- page = pfn_to_page(pfn++);
+ struct page *page = pfn_to_page(pfn++);
set_bit(PG_dcache_clean, &page->flags);
left -= PAGE_SIZE;
}
@@ -1204,7 +1194,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
unsigned int len = PAGE_ALIGN(s->offset + s->length);
if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+ arch_sync_dma_for_device(phys + s->offset, s->length, dir);
prot = __dma_info_to_prot(dir, attrs);
@@ -1306,8 +1296,7 @@ static void arm_iommu_unmap_sg(struct device *dev,
__iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s));
if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_page_dev_to_cpu(sg_page(s), s->offset,
- s->length, dir);
+ arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
}
}
@@ -1329,7 +1318,7 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
return;
for_each_sg(sg, s, nents, i)
- __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
+ arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
}
@@ -1351,7 +1340,8 @@ static void arm_iommu_sync_sg_for_device(struct device *dev,
return;
for_each_sg(sg, s, nents, i)
- __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+ arch_sync_dma_for_device(page_to_phys(sg_page(s)) + s->offset,
+ s->length, dir);
}
/**
@@ -1373,7 +1363,8 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
int ret, prot, len = PAGE_ALIGN(size + offset);
if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_page_cpu_to_dev(page, offset, size, dir);
+ arch_sync_dma_for_device(page_to_phys(page) + offset,
+ size, dir);
dma_addr = __alloc_iova(mapping, len);
if (dma_addr == DMA_MAPPING_ERROR)
@@ -1406,7 +1397,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
{
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
dma_addr_t iova = handle & PAGE_MASK;
- struct page *page;
+ phys_addr_t phys;
int offset = handle & ~PAGE_MASK;
int len = PAGE_ALIGN(size + offset);
@@ -1414,8 +1405,8 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
return;
if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
- page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
- __dma_page_dev_to_cpu(page, offset, size, dir);
+ phys = iommu_iova_to_phys(mapping->domain, handle);
+ arch_sync_dma_for_cpu(phys, size, dir);
}
iommu_unmap(mapping->domain, iova, len);
@@ -1483,30 +1474,26 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
{
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
- dma_addr_t iova = handle & PAGE_MASK;
- struct page *page;
- unsigned int offset = handle & ~PAGE_MASK;
+ phys_addr_t phys;
- if (dev->dma_coherent || !iova)
+ if (dev->dma_coherent || !(handle & PAGE_MASK))
return;
- page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
- __dma_page_dev_to_cpu(page, offset, size, dir);
+ phys = iommu_iova_to_phys(mapping->domain, handle);
+ arch_sync_dma_for_cpu(phys, size, dir);
}
static void arm_iommu_sync_single_for_device(struct device *dev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
{
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
- dma_addr_t iova = handle & PAGE_MASK;
- struct page *page;
- unsigned int offset = handle & ~PAGE_MASK;
+ phys_addr_t phys;
- if (dev->dma_coherent || !iova)
+ if (dev->dma_coherent || !(handle & PAGE_MASK))
return;
- page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
- __dma_page_cpu_to_dev(page, offset, size, dir);
+ phys = iommu_iova_to_phys(mapping->domain, handle);
+ arch_sync_dma_for_device(phys, size, dir);
}
static const struct dma_map_ops iommu_ops = {
@@ -1789,20 +1776,6 @@ void arch_teardown_dma_ops(struct device *dev)
set_dma_ops(dev, NULL);
}
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
- enum dma_data_direction dir)
-{
- __dma_page_cpu_to_dev(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
- size, dir);
-}
-
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
- enum dma_data_direction dir)
-{
- __dma_page_dev_to_cpu(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
- size, dir);
-}
-
void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs)
{
--
2.39.2
_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc
^ permalink raw reply related [flat|nested] 456+ messages in thread
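The iommu hunks in this patch no longer mask the handle before calling iommu_iova_to_phys() and adding the sub-page offset back. Instead, they pass the full handle. The two forms agree only if the translation carries the in-page offset through unchanged. The userspace sketch below models that assumption. It uses 4 KiB pages and a hypothetical four-entry page table: toy_table, toy_iova_to_phys(), old_style() and new_style() are invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages; names mirror the kernel macros, values are illustrative. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

/* Hypothetical IOMMU page table: IOVA page N maps to physical frame toy_table[N]. */
static const uint64_t toy_table[4] = { 0x80000, 0x80123, 0x9abcd, 0x10000 };

/* Hypothetical stand-in for iommu_iova_to_phys(): translate the page
 * number and carry the offset within the page through unchanged. */
static phys_addr_t toy_iova_to_phys(dma_addr_t iova)
{
	uint64_t index = iova >> PAGE_SHIFT;

	assert(index < 4); /* the toy table only covers four pages */
	return (toy_table[index] << PAGE_SHIFT) | (iova & ~PAGE_MASK);
}

/* Shape of the removed code: translate the page-aligned IOVA,
 * then add the sub-page offset back separately. */
static phys_addr_t old_style(dma_addr_t handle)
{
	dma_addr_t iova = handle & PAGE_MASK;
	uint64_t offset = handle & ~PAGE_MASK;

	return toy_iova_to_phys(iova) + offset;
}

/* Shape of the new code: translate the full handle in one step. */
static phys_addr_t new_style(dma_addr_t handle)
{
	return toy_iova_to_phys(handle);
}
```

For example, handle 0x1234 lies in IOVA page 1 at offset 0x234. Both helpers therefore yield 0x80123234. The `!iova` test also becomes `!(handle & PAGE_MASK)` in the patch, which is the same expression with the local variable inlined.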
* [PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
The arm specific iommu code in dma-mapping.c uses the page+offset based
__dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the
phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu()
wrappers around them.
In order to be able to move the latter set of functions into
common code, change the iommu implementation to use them directly
and remove the internal ones as a separate interface.
As page+offset and phys_address are equivalent, but are used in
different parts of the code here, this allows removing some of
the conversions but adds others elsewhere.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
arch/arm/mm/dma-mapping.c | 93 ++++++++++++++-------------------------
1 file changed, 33 insertions(+), 60 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8bc01071474a..ce4b74f34a58 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -622,16 +622,14 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
kfree(buf);
}
-static void dma_cache_maint_page(struct page *page, unsigned long offset,
+static void dma_cache_maint(phys_addr_t paddr,
size_t size, enum dma_data_direction dir,
void (*op)(const void *, size_t, int))
{
- unsigned long pfn;
+ unsigned long pfn = PFN_DOWN(paddr);
+ unsigned long offset = paddr % PAGE_SIZE;
size_t left = size;
- pfn = page_to_pfn(page) + offset / PAGE_SIZE;
- offset %= PAGE_SIZE;
-
/*
* A single sg entry may refer to multiple physically contiguous
* pages. But we still need to process highmem pages individually.
@@ -641,8 +639,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
do {
size_t len = left;
void *vaddr;
-
- page = pfn_to_page(pfn);
+ struct page *page = pfn_to_page(pfn);
if (PageHighMem(page)) {
if (len + offset > PAGE_SIZE)
@@ -674,14 +671,11 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
* Note: Drivers should NOT use this function directly.
* Use the driver DMA support - see dma-mapping.h (dma_sync_*)
*/
-static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
- size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir)
{
- phys_addr_t paddr;
+ dma_cache_maint(paddr, size, dir, dmac_map_area);
- dma_cache_maint_page(page, off, size, dir, dmac_map_area);
-
- paddr = page_to_phys(page) + off;
if (dir == DMA_FROM_DEVICE) {
outer_inv_range(paddr, paddr + size);
} else {
@@ -690,34 +684,30 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
/* FIXME: non-speculating: flush on bidirectional mappings? */
}
-static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
- size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir)
{
- phys_addr_t paddr = page_to_phys(page) + off;
-
/* FIXME: non-speculating: not required */
/* in any case, don't bother invalidating if DMA to device */
if (dir != DMA_TO_DEVICE) {
outer_inv_range(paddr, paddr + size);
- dma_cache_maint_page(page, off, size, dir, dmac_unmap_area);
+ dma_cache_maint(paddr, size, dir, dmac_unmap_area);
}
/*
* Mark the D-cache clean for these pages to avoid extra flushing.
*/
if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
- unsigned long pfn;
+ unsigned long pfn = PFN_UP(paddr);
+ unsigned long off = paddr & (PAGE_SIZE - 1);
size_t left = size;
- pfn = page_to_pfn(page) + off / PAGE_SIZE;
- off %= PAGE_SIZE;
- if (off) {
- pfn++;
+ if (off)
left -= PAGE_SIZE - off;
- }
+
while (left >= PAGE_SIZE) {
- page = pfn_to_page(pfn++);
+ struct page *page = pfn_to_page(pfn++);
set_bit(PG_dcache_clean, &page->flags);
left -= PAGE_SIZE;
}
@@ -1204,7 +1194,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
unsigned int len = PAGE_ALIGN(s->offset + s->length);
if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+ arch_sync_dma_for_device(phys + s->offset, s->length, dir);
prot = __dma_info_to_prot(dir, attrs);
@@ -1306,8 +1296,7 @@ static void arm_iommu_unmap_sg(struct device *dev,
__iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s));
if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_page_dev_to_cpu(sg_page(s), s->offset,
- s->length, dir);
+ arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
}
}
@@ -1329,7 +1318,7 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
return;
for_each_sg(sg, s, nents, i)
- __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
+ arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
}
@@ -1351,7 +1340,8 @@ static void arm_iommu_sync_sg_for_device(struct device *dev,
return;
for_each_sg(sg, s, nents, i)
- __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+ arch_sync_dma_for_device(page_to_phys(sg_page(s)) + s->offset,
+ s->length, dir);
}
/**
@@ -1373,7 +1363,8 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
int ret, prot, len = PAGE_ALIGN(size + offset);
if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- __dma_page_cpu_to_dev(page, offset, size, dir);
+ arch_sync_dma_for_device(page_to_phys(page) + offset,
+ size, dir);
dma_addr = __alloc_iova(mapping, len);
if (dma_addr == DMA_MAPPING_ERROR)
@@ -1406,7 +1397,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
{
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
dma_addr_t iova = handle & PAGE_MASK;
- struct page *page;
+ phys_addr_t phys;
int offset = handle & ~PAGE_MASK;
int len = PAGE_ALIGN(size + offset);
@@ -1414,8 +1405,8 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
return;
if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
- page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
- __dma_page_dev_to_cpu(page, offset, size, dir);
+ phys = iommu_iova_to_phys(mapping->domain, handle);
+ arch_sync_dma_for_cpu(phys, size, dir);
}
iommu_unmap(mapping->domain, iova, len);
@@ -1483,30 +1474,26 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
{
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
- dma_addr_t iova = handle & PAGE_MASK;
- struct page *page;
- unsigned int offset = handle & ~PAGE_MASK;
+ phys_addr_t phys;
- if (dev->dma_coherent || !iova)
+ if (dev->dma_coherent || !(handle & PAGE_MASK))
return;
- page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
- __dma_page_dev_to_cpu(page, offset, size, dir);
+ phys = iommu_iova_to_phys(mapping->domain, handle);
+ arch_sync_dma_for_cpu(phys, size, dir);
}
static void arm_iommu_sync_single_for_device(struct device *dev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
{
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
- dma_addr_t iova = handle & PAGE_MASK;
- struct page *page;
- unsigned int offset = handle & ~PAGE_MASK;
+ phys_addr_t phys;
- if (dev->dma_coherent || !iova)
+ if (dev->dma_coherent || !(handle & PAGE_MASK))
return;
- page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
- __dma_page_cpu_to_dev(page, offset, size, dir);
+ phys = iommu_iova_to_phys(mapping->domain, handle);
+ arch_sync_dma_for_device(phys, size, dir);
}
static const struct dma_map_ops iommu_ops = {
@@ -1789,20 +1776,6 @@ void arch_teardown_dma_ops(struct device *dev)
set_dma_ops(dev, NULL);
}
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
- enum dma_data_direction dir)
-{
- __dma_page_cpu_to_dev(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
- size, dir);
-}
-
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
- enum dma_data_direction dir)
-{
- __dma_page_dev_to_cpu(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
- size, dir);
-}
-
void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs)
{
--
2.39.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 456+ messages in thread
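The commit message states that a page+offset pair and a phys_addr_t are equivalent. The sketch below checks that claim and then walks a physical range one page at a time. This loosely mirrors how dma_cache_maint() now derives pfn and offset with PFN_DOWN() and `% PAGE_SIZE`. It is a userspace model, not the kernel code. It assumes 4 KiB pages and treats every page as highmem, so each step is clamped to the end of its page. The real function hands lowmem ranges to `op` in one call. The names page_off_to_phys(), split_per_page() and struct chunk are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages; macro names follow include/linux/pfn.h, values are illustrative. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_PHYS(x) ((phys_addr_t)(x) << PAGE_SHIFT)

typedef uint64_t phys_addr_t;

/* One per-page piece of a physically contiguous range. */
struct chunk {
	uint64_t pfn;
	uint64_t offset;
	uint64_t len;
};

/* Old calling convention: a page frame plus a byte offset that may be
 * larger than one page. The phys_addr_t form is just their sum. */
static phys_addr_t page_off_to_phys(uint64_t page_pfn, uint64_t off)
{
	return PFN_PHYS(page_pfn) + off;
}

/* Walk [paddr, paddr + size) one page at a time, clamping each step to
 * the end of the current page (as the highmem branch does).
 * Fills at most max entries and returns how many were written. */
static size_t split_per_page(phys_addr_t paddr, uint64_t size,
			     struct chunk *out, size_t max)
{
	uint64_t pfn = PFN_DOWN(paddr);
	uint64_t offset = paddr % PAGE_SIZE;
	uint64_t left = size;
	size_t n = 0;

	while (left && n < max) {
		uint64_t len = left;

		if (len + offset > PAGE_SIZE)
			len = PAGE_SIZE - offset;
		out[n++] = (struct chunk){ pfn, offset, len };
		offset = 0;	/* only the first page can start mid-page */
		pfn++;
		left -= len;
	}
	return n;
}
```

Consider frame 0x100 with offset 0x1800. This is physical address 0x101800, so PFN_DOWN() gives 0x101 and the in-page offset is 0x800. Those are exactly the values the removed `pfn = page_to_pfn(page) + offset / PAGE_SIZE; offset %= PAGE_SIZE;` lines computed. An 8 KiB range starting there splits into pieces of 0x800, 0x1000 and 0x800 bytes.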
* Re: [PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-31 9:10 ` Linus Walleij
-1 siblings, 0 replies; 456+ messages in thread
From: Linus Walleij @ 2023-03-31 9:10 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
On Mon, Mar 27, 2023 at 2:16 PM Arnd Bergmann <arnd@kernel.org> wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The arm specific iommu code in dma-mapping.c uses the page+offset based
> __dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the
> phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu()
> wrappers around the.
Broken sentence?
> In order to be able to move the latter part set of functions into
> common code, change the iommu implementation to use them directly
> and remove the internal ones as a separate interface.
>
> As page+offset and phys_address are equivalent, but are used in
> different parts of the code here, this allows removing some of
> the conversion but adds them elsewhere.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Looks good to me, took me some time to verify and understand
the open-coded version of PFN_UP() and this refactoring alone
makes the patch highly valuable.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
^ permalink raw reply [flat|nested] 456+ messages in thread
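The review mentions the "open-coded version of PFN_UP()". In the removed code, the first page to mark PG_dcache_clean was computed as `page_to_pfn(page) + off / PAGE_SIZE`, plus one whenever the offset was not page aligned. That is a ceiling division of the physical address by PAGE_SIZE. The userspace sketch below puts the old and new computations side by side so they can be compared over a range of offsets. It assumes 4 KiB pages and takes the PFN_UP()/PFN_PHYS() definitions in the style of include/linux/pfn.h. The function names are hypothetical, and the `while (left >= PAGE_SIZE)` loop is reduced to its iteration count.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages; macro names follow include/linux/pfn.h, values are illustrative. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_PHYS(x) ((phys_addr_t)(x) << PAGE_SHIFT)

typedef uint64_t phys_addr_t;

/* Removed variant: page frame + byte offset, with PFN_UP() open-coded.
 * Returns the first fully covered pfn and stores in *count how many
 * pages the loop would mark PG_dcache_clean. */
static uint64_t clean_range_old(uint64_t page_pfn, uint64_t off,
				uint64_t size, uint64_t *count)
{
	uint64_t pfn = page_pfn + off / PAGE_SIZE;
	uint64_t left = size;

	assert(size >= PAGE_SIZE);	/* same guard as the kernel code */
	off %= PAGE_SIZE;
	if (off) {
		pfn++;			/* skip the partially covered first page */
		left -= PAGE_SIZE - off;
	}
	*count = left / PAGE_SIZE;	/* iterations of while (left >= PAGE_SIZE) */
	return pfn;
}

/* New variant: the same computation starting from a physical address. */
static uint64_t clean_range_new(phys_addr_t paddr, uint64_t size,
				uint64_t *count)
{
	uint64_t pfn = PFN_UP(paddr);
	uint64_t off = paddr & (PAGE_SIZE - 1);
	uint64_t left = size;

	assert(size >= PAGE_SIZE);
	if (off)
		left -= PAGE_SIZE - off;
	*count = left / PAGE_SIZE;
	return pfn;
}
```

Take an 8 KiB range at physical address 0x101800. It fully covers only frame 0x102, so both variants return 0x102 with a count of 1. Only pages that are completely overwritten by the DMA are flagged, which is why the partially covered first and last pages are skipped.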
* Re: [PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
2023-03-31 9:10 ` Linus Walleij
` (3 preceding siblings ...)
(?)
@ 2023-03-31 12:48 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 12:48 UTC (permalink / raw)
To: Linus Walleij, Arnd Bergmann
Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong,
Catalin Marinas, Will Deacon, guoren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S . Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k,
linux-mips, linux-openrisc@vger.kernel.org, linux-parisc,
linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa
On Fri, Mar 31, 2023, at 11:10, Linus Walleij wrote:
> On Mon, Mar 27, 2023 at 2:16 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> The arm specific iommu code in dma-mapping.c uses the page+offset based
>> __dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the
>> phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu()
>> wrappers around the.
>
> Broken sentence?
I've changed s/the/them/ now, at least I think that's what I meant
to write in the first place.
>> In order to be able to move the latter part set of functions into
>> common code, change the iommu implementation to use them directly
>> and remove the internal ones as a separate interface.
>>
>> As page+offset and phys_address are equivalent, but are used in
>> different parts of the code here, this allows removing some of
>> the conversion but adds them elsewhere.
>> >> Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > Looks good to me, took me some time to verify and understand > the open-coded version of PFN_UP() and this refactoring alone > makes the patch highly valuable. > Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Thanks! ARnd ^ permalink raw reply [flat|nested] 456+ messages in thread
* [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Daniel Golle

From: Arnd Bergmann <arnd@arndb.de>

The cache management operations for noncoherent DMA on ARMv6 work
in two different ways:

* When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
DMA buffers lead to data corruption when the prefetched data is written
back on top of data from the device.

* When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
is not seen by the other core(s), leading to inconsistent contents
across the system.

As a consequence, neither configuration is actually safe to use in a
general-purpose kernel that is used on both MPCore systems and ARM1176
with prefetching enabled.

We could add further workarounds to make the behavior more dynamic based
on the system, but realistically, there are close to zero remaining
users on any ARM11MPCore anyway, and nobody seems too interested in it,
compared to the more popular ARM1176 used in BCM2835 and AST2500.

The Oxnas platform has some minimal support in OpenWRT, but most of the
drivers and dts files never made it into the mainline kernel, while the
Arm Versatile/Realview platform mainly serves as a reference system and
does not need to be kept working once all other ARM11MPCore systems
are gone.

Take the easy way out here and drop support for multiprocessing on
ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
management implementation for it. This also helps with other ARMv6
issues, but for the moment leaves the ability to build a kernel that
can run on both ARMv7 SMP and single-processor ARMv6, which we probably
want to stop supporting as well, but not as part of this series.

Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-oxnas@groups.io
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I could use some help clarifying the above changelog text to describe
the exact problem, and how CONFIG_DMA_CACHE_RWFO actually works on
MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching
into the instruction cache, not the data cache, but this can end up in
the outer cache as a result. The 1176 has some extra control bits to
control prefetching, but I found no reference that explains why an
MPCore does not run into the problem.
---
 arch/arm/mach-oxnas/Kconfig                |  4 -
 arch/arm/mach-oxnas/Makefile               |  1 -
 arch/arm/mach-oxnas/headsmp.S              | 23 ------
 arch/arm/mach-oxnas/platsmp.c              | 96 ----------------------
 arch/arm/mach-versatile/platsmp-realview.c |  4 -
 arch/arm/mm/Kconfig                        | 19 -----
 arch/arm/mm/cache-v6.S                     | 31 -------
 7 files changed, 178 deletions(-)
 delete mode 100644 arch/arm/mach-oxnas/headsmp.S
 delete mode 100644 arch/arm/mach-oxnas/platsmp.c

diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
index a9ded7079268..a054235c3d6c 100644
--- a/arch/arm/mach-oxnas/Kconfig
+++ b/arch/arm/mach-oxnas/Kconfig
@@ -28,10 +28,6 @@ config MACH_OX820
 	bool "Support OX820 Based Products"
 	depends on ARCH_MULTI_V6
 	select ARM_GIC
-	select DMA_CACHE_RWFO if SMP
-	select HAVE_SMP
-	select HAVE_ARM_SCU if SMP
-	select HAVE_ARM_TWD if SMP
 	help
 	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
 
diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
index 0e78ecfe6c49..a4e40e534e6a 100644
--- a/arch/arm/mach-oxnas/Makefile
+++ b/arch/arm/mach-oxnas/Makefile
@@ -1,2 +1 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
deleted file mode 100644
index 9c0f1479f33a..000000000000
--- a/arch/arm/mach-oxnas/headsmp.S
+++ /dev/null
@@ -1,23 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
- * Copyright (c) 2003 ARM Limited
- * All Rights Reserved
- */
-#include <linux/linkage.h>
-#include <linux/init.h>
-
-	__INIT
-
-/*
- * OX820 specific entry point for secondary CPUs.
- */
-ENTRY(ox820_secondary_startup)
-	mov r4, #0
-	/* invalidate both caches and branch target cache */
-	mcr p15, 0, r4, c7, c7, 0
-	/*
-	 * we've been released from the holding pen: secondary_stack
-	 * should now contain the SVC stack for this core
-	 */
-	b	secondary_startup
diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
deleted file mode 100644
index f0a50b9e61df..000000000000
--- a/arch/arm/mach-oxnas/platsmp.c
+++ /dev/null
@@ -1,96 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com>
- * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
- * Copyright (C) 2002 ARM Ltd.
- * All Rights Reserved
- */
-#include <linux/io.h>
-#include <linux/delay.h>
-#include <linux/of.h>
-#include <linux/of_address.h>
-
-#include <asm/cacheflush.h>
-#include <asm/cp15.h>
-#include <asm/smp_plat.h>
-#include <asm/smp_scu.h>
-
-extern void ox820_secondary_startup(void);
-
-static void __iomem *cpu_ctrl;
-static void __iomem *gic_cpu_ctrl;
-
-#define HOLDINGPEN_CPU_OFFSET		0xc8
-#define HOLDINGPEN_LOCATION_OFFSET	0xc4
-
-#define GIC_NCPU_OFFSET(cpu)		(0x100 + (cpu)*0x100)
-#define GIC_CPU_CTRL			0x00
-#define GIC_CPU_CTRL_ENABLE		1
-
-static int __init ox820_boot_secondary(unsigned int cpu,
-		struct task_struct *idle)
-{
-	/*
-	 * Write the address of secondary startup into the
-	 * system-wide flags register. The BootMonitor waits
-	 * until it receives a soft interrupt, and then the
-	 * secondary CPU branches to this address.
-	 */
-	writel(virt_to_phys(ox820_secondary_startup),
-			cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
-
-	writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
-
-	/*
-	 * Enable GIC cpu interface in CPU Interface Control Register
-	 */
-	writel(GIC_CPU_CTRL_ENABLE,
-		gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
-
-	/*
-	 * Send the secondary CPU a soft interrupt, thereby causing
-	 * the boot monitor to read the system wide flags register,
-	 * and branch to the address found there.
-	 */
-	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
-
-	return 0;
-}
-
-static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
-{
-	struct device_node *np;
-	void __iomem *scu_base;
-
-	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
-	scu_base = of_iomap(np, 0);
-	of_node_put(np);
-	if (!scu_base)
-		return;
-
-	/* Remap CPU Interrupt Interface Registers */
-	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
-	gic_cpu_ctrl = of_iomap(np, 1);
-	of_node_put(np);
-	if (!gic_cpu_ctrl)
-		goto unmap_scu;
-
-	np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
-	cpu_ctrl = of_iomap(np, 0);
-	of_node_put(np);
-	if (!cpu_ctrl)
-		goto unmap_scu;
-
-	scu_enable(scu_base);
-	flush_cache_all();
-
-unmap_scu:
-	iounmap(scu_base);
-}
-
-static const struct smp_operations ox820_smp_ops __initconst = {
-	.smp_prepare_cpus	= ox820_smp_prepare_cpus,
-	.smp_boot_secondary	= ox820_boot_secondary,
-};
-
-CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
index 5d363385c801..fa31fd2d211d 100644
--- a/arch/arm/mach-versatile/platsmp-realview.c
+++ b/arch/arm/mach-versatile/platsmp-realview.c
@@ -18,16 +18,12 @@
 #define REALVIEW_SYS_FLAGSSET_OFFSET	0x30
 
 static const struct of_device_id realview_scu_match[] = {
-	{ .compatible = "arm,arm11mp-scu", },
 	{ .compatible = "arm,cortex-a9-scu", },
 	{ .compatible = "arm,cortex-a5-scu", },
 	{ }
 };
 
 static const struct of_device_id realview_syscon_match[] = {
-	{ .compatible = "arm,core-module-integrator", },
-	{ .compatible = "arm,realview-eb-syscon", },
-	{ .compatible = "arm,realview-pb11mp-syscon", },
 	{ .compatible = "arm,realview-pbx-syscon", },
 	{ },
 };
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index c5bbae86f725..16b62bc0a970 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -937,25 +937,6 @@ config VDSO
 	  You must have glibc 2.22 or later for programs to
 	  seamlessly take advantage of this.
 
-config DMA_CACHE_RWFO
-	bool "Enable read/write for ownership DMA cache maintenance"
-	depends on CPU_V6K && SMP
-	default y
-	help
-	  The Snoop Control Unit on ARM11MPCore does not detect the
-	  cache maintenance operations and the dma_{map,unmap}_area()
-	  functions may leave stale cache entries on other CPUs. By
-	  enabling this option, Read or Write For Ownership in the ARMv6
-	  DMA cache maintenance functions is performed. These LDR/STR
-	  instructions change the cache line state to shared or modified
-	  so that the cache operation has the desired effect.
-
-	  Note that the workaround is only valid on processors that do
-	  not perform speculative loads into the D-cache. For such
-	  processors, if cache maintenance operations are not broadcast
-	  in hardware, other workarounds are needed (e.g. cache
-	  maintenance broadcasting in software via FIQ).
-
 config OUTER_CACHE
 	bool
 
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index abae7ff5defc..f6ee53c1de20 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -201,10 +201,6 @@ ENTRY(v6_flush_kern_dcache_area)
  *	- end     - virtual end address of region
  */
 ENTRY(v6_dma_inv_range)
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrb	r2, [r0]			@ read for ownership
-	strb	r2, [r0]			@ write for ownership
-#endif
 	tst	r0, #D_CACHE_LINE_SIZE - 1
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 #ifdef HARVARD_CACHE
@@ -213,10 +209,6 @@ ENTRY(v6_dma_inv_range)
 	mcrne	p15, 0, r0, c7, c11, 1		@ clean unified line
 #endif
 	tst	r1, #D_CACHE_LINE_SIZE - 1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrbne	r2, [r1, #-1]			@ read for ownership
-	strbne	r2, [r1, #-1]			@ write for ownership
-#endif
 	bic	r1, r1, #D_CACHE_LINE_SIZE - 1
 #ifdef HARVARD_CACHE
 	mcrne	p15, 0, r1, c7, c14, 1		@ clean & invalidate D line
@@ -231,10 +223,6 @@ ENTRY(v6_dma_inv_range)
 #endif
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrlo	r2, [r0]			@ read for ownership
-	strlo	r2, [r0]			@ write for ownership
-#endif
 	blo	1b
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
@@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range)
 ENTRY(v6_dma_clean_range)
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldr	r2, [r0]			@ read for ownership
-#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
 #else
@@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range)
  *	- end     - virtual end address of region
  */
 ENTRY(v6_dma_flush_range)
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrb	r2, [r0]		@ read for ownership
-	strb	r2, [r0]		@ write for ownership
-#endif
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
 #ifdef HARVARD_CACHE
@@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range)
 #endif
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrblo	r2, [r0]		@ read for ownership
-	strblo	r2, [r0]		@ write for ownership
-#endif
 	blo	1b
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
@@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area)
 	add	r1, r1, r0
 	teq	r2, #DMA_FROM_DEVICE
 	beq	v6_dma_inv_range
-#ifndef CONFIG_DMA_CACHE_RWFO
 	b	v6_dma_clean_range
-#else
-	teq	r2, #DMA_TO_DEVICE
-	beq	v6_dma_clean_range
-	b	v6_dma_flush_range
-#endif
 ENDPROC(v6_dma_map_area)
 
 /*
@@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area)
  *	- dir	- DMA direction
  */
 ENTRY(v6_dma_unmap_area)
-#ifndef CONFIG_DMA_CACHE_RWFO
 	add	r1, r1, r0
 	teq	r2, #DMA_TO_DEVICE
 	bne	v6_dma_inv_range
-#endif
 	ret	lr
 ENDPROC(v6_dma_unmap_area)
 
-- 
2.39.2

^ permalink raw reply related [flat|nested] 456+ messages in thread
p15, 0, r0, c7, c10, 4 @ drain write buffer @@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range) ENTRY(v6_dma_clean_range) bic r0, r0, #D_CACHE_LINE_SIZE - 1 1: -#ifdef CONFIG_DMA_CACHE_RWFO - ldr r2, [r0] @ read for ownership -#endif #ifdef HARVARD_CACHE mcr p15, 0, r0, c7, c10, 1 @ clean D line #else @@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range) * - end - virtual end address of region */ ENTRY(v6_dma_flush_range) -#ifdef CONFIG_DMA_CACHE_RWFO - ldrb r2, [r0] @ read for ownership - strb r2, [r0] @ write for ownership -#endif bic r0, r0, #D_CACHE_LINE_SIZE - 1 1: #ifdef HARVARD_CACHE @@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range) #endif add r0, r0, #D_CACHE_LINE_SIZE cmp r0, r1 -#ifdef CONFIG_DMA_CACHE_RWFO - ldrblo r2, [r0] @ read for ownership - strblo r2, [r0] @ write for ownership -#endif blo 1b mov r0, #0 mcr p15, 0, r0, c7, c10, 4 @ drain write buffer @@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area) add r1, r1, r0 teq r2, #DMA_FROM_DEVICE beq v6_dma_inv_range -#ifndef CONFIG_DMA_CACHE_RWFO b v6_dma_clean_range -#else - teq r2, #DMA_TO_DEVICE - beq v6_dma_clean_range - b v6_dma_flush_range -#endif ENDPROC(v6_dma_map_area) /* @@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area) * - dir - DMA direction */ ENTRY(v6_dma_unmap_area) -#ifndef CONFIG_DMA_CACHE_RWFO add r1, r1, r0 teq r2, #DMA_TO_DEVICE bne v6_dma_inv_range -#endif ret lr ENDPROC(v6_dma_unmap_area) -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 456+ messages in thread
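As a reading aid for the cache-v6.S hunks in this patch, here is a minimal userspace C model of which range helper v6_dma_map_area() and v6_dma_unmap_area() branch to for each DMA direction, with and without CONFIG_DMA_CACHE_RWFO. It is not kernel code. The names map_op(), unmap_op() and enum cache_op are hypothetical. The enum dma_dir values are assumed to mirror the kernel's enum dma_data_direction. The branch logic is transcribed from the assembly shown in the diff.

```c
#include <assert.h>

/* Values mirror the kernel's enum dma_data_direction (assumption). */
enum dma_dir {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
};

/* Hypothetical labels for the cache-v6.S range helpers. */
enum cache_op {
	OP_NONE,	/* plain "ret lr", no maintenance */
	OP_CLEAN,	/* v6_dma_clean_range */
	OP_INV,		/* v6_dma_inv_range */
	OP_FLUSH,	/* v6_dma_flush_range (clean + invalidate) */
};

static void check_dir(enum dma_dir dir)
{
	assert(dir == DMA_BIDIRECTIONAL || dir == DMA_TO_DEVICE ||
	       dir == DMA_FROM_DEVICE);
}

/* Helper v6_dma_map_area() branches to; rwfo != 0 models CONFIG_DMA_CACHE_RWFO=y. */
enum cache_op map_op(enum dma_dir dir, int rwfo)
{
	check_dir(dir);
	if (dir == DMA_FROM_DEVICE)
		return OP_INV;		/* teq r2, #DMA_FROM_DEVICE; beq v6_dma_inv_range */
	if (!rwfo)
		return OP_CLEAN;	/* b v6_dma_clean_range */
	if (dir == DMA_TO_DEVICE)
		return OP_CLEAN;	/* RWFO only: beq v6_dma_clean_range */
	return OP_FLUSH;		/* RWFO only: b v6_dma_flush_range */
}

/* Helper v6_dma_unmap_area() branches to, if any. */
enum cache_op unmap_op(enum dma_dir dir, int rwfo)
{
	check_dir(dir);
	if (rwfo)
		return OP_NONE;		/* RWFO build: the whole body was compiled out */
	if (dir != DMA_TO_DEVICE)
		return OP_INV;		/* teq r2, #DMA_TO_DEVICE; bne v6_dma_inv_range */
	return OP_NONE;			/* ret lr */
}
```

After this patch, only the rwfo == 0 behaviour remains. At map time a buffer is cleaned, or invalidated for DMA_FROM_DEVICE. At unmap time it is invalidated again for any direction in which the device may have written to it. The RWFO variant instead flushed bidirectional buffers up front and skipped the unmap step entirely.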
* [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij,
John Paul Adrian Glaubitz, Daniel Golle, Max Filippov, Conor Dooley,
Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon,
Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven,
Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain,
Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne,
linux-arm-kernel, Neil Armstrong <n
From: Arnd Bergmann <arnd@arndb.de>
The cache management operations for noncoherent DMA on ARMv6 work
in two different ways:
* When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
DMA buffers lead to data corruption when the prefetched data is written
back on top of data from the device.
* When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
is not seen by the other core(s), leading to inconsistent contents
across the system.
As a consequence, neither configuration is actually safe to use in a
general-purpose kernel that runs on both MPCore systems and ARM1176
with prefetching enabled.
We could add further workarounds to make the behavior more dynamic
based on the system, but realistically, there are close to zero
remaining users of any ARM11MPCore system anyway, and nobody seems
particularly interested in it, compared to the more popular ARM1176
used in BCM2835 and AST2500.
The Oxnas platform has some minimal support in OpenWRT, but most of
the drivers and dts files never made it into the mainline kernel,
while the Arm Versatile/Realview platform mainly serves as a reference
system and does not need to be kept working once all other
ARM11MPCore systems are gone.
Take the easy way out here and drop support for multiprocessing on
ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
management implementation for it.
This also helps with other ARMv6 issues, but for the moment it still
leaves the ability to build a kernel that can run on both ARMv7 SMP
and single-processor ARMv6, which we probably want to stop supporting
as well, but not as part of this series.
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-oxnas@groups.io
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I could use some help clarifying the above changelog text to describe
the exact problem, and how CONFIG_DMA_CACHE_RWFO actually works on
MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching
into the instruction cache, not the data cache, but this can end up in
the outer cache as a result. The 1176 has some extra bits to control
prefetching, but I found no reference that explains why an MPCore does
not run into the problem.
---
 arch/arm/mach-oxnas/Kconfig                |  4 -
 arch/arm/mach-oxnas/Makefile               |  1 -
 arch/arm/mach-oxnas/headsmp.S              | 23 ------
 arch/arm/mach-oxnas/platsmp.c              | 96 ----------------------
 arch/arm/mach-versatile/platsmp-realview.c |  4 -
 arch/arm/mm/Kconfig                        | 19 -----
 arch/arm/mm/cache-v6.S                     | 31 -------
 7 files changed, 178 deletions(-)
 delete mode 100644 arch/arm/mach-oxnas/headsmp.S
 delete mode 100644 arch/arm/mach-oxnas/platsmp.c
diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
index a9ded7079268..a054235c3d6c 100644
--- a/arch/arm/mach-oxnas/Kconfig
+++ b/arch/arm/mach-oxnas/Kconfig
@@ -28,10 +28,6 @@ config MACH_OX820
 	bool "Support OX820 Based Products"
 	depends on ARCH_MULTI_V6
 	select ARM_GIC
-	select DMA_CACHE_RWFO if SMP
-	select HAVE_SMP
-	select HAVE_ARM_SCU if SMP
-	select HAVE_ARM_TWD if SMP
 	help
 	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
 
diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
index 0e78ecfe6c49..a4e40e534e6a 100644
--- a/arch/arm/mach-oxnas/Makefile
+++ b/arch/arm/mach-oxnas/Makefile
@@ -1,2 +1 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
deleted file mode 100644
index 9c0f1479f33a..000000000000
--- a/arch/arm/mach-oxnas/headsmp.S
+++ /dev/null
@@ -1,23 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
- * Copyright (c) 2003 ARM Limited
- * All Rights Reserved
- */
-#include <linux/linkage.h>
-#include <linux/init.h>
-
-	__INIT
-
-/*
- * OX820 specific entry point for secondary CPUs.
- */
-ENTRY(ox820_secondary_startup)
-	mov	r4, #0
-	/* invalidate both caches and branch target cache */
-	mcr	p15, 0, r4, c7, c7, 0
-	/*
-	 * we've been released from the holding pen: secondary_stack
-	 * should now contain the SVC stack for this core
-	 */
-	b	secondary_startup
diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
deleted file mode 100644
index f0a50b9e61df..000000000000
--- a/arch/arm/mach-oxnas/platsmp.c
+++ /dev/null
@@ -1,96 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com>
- * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
- * Copyright (C) 2002 ARM Ltd.
- * All Rights Reserved
- */
-#include <linux/io.h>
-#include <linux/delay.h>
-#include <linux/of.h>
-#include <linux/of_address.h>
-
-#include <asm/cacheflush.h>
-#include <asm/cp15.h>
-#include <asm/smp_plat.h>
-#include <asm/smp_scu.h>
-
-extern void ox820_secondary_startup(void);
-
-static void __iomem *cpu_ctrl;
-static void __iomem *gic_cpu_ctrl;
-
-#define HOLDINGPEN_CPU_OFFSET		0xc8
-#define HOLDINGPEN_LOCATION_OFFSET	0xc4
-
-#define GIC_NCPU_OFFSET(cpu)		(0x100 + (cpu)*0x100)
-#define GIC_CPU_CTRL			0x00
-#define GIC_CPU_CTRL_ENABLE		1
-
-static int __init ox820_boot_secondary(unsigned int cpu,
-		struct task_struct *idle)
-{
-	/*
-	 * Write the address of secondary startup into the
-	 * system-wide flags register. The BootMonitor waits
-	 * until it receives a soft interrupt, and then the
-	 * secondary CPU branches to this address.
-	 */
-	writel(virt_to_phys(ox820_secondary_startup),
-			cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
-
-	writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
-
-	/*
-	 * Enable GIC cpu interface in CPU Interface Control Register
-	 */
-	writel(GIC_CPU_CTRL_ENABLE,
-		gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
-
-	/*
-	 * Send the secondary CPU a soft interrupt, thereby causing
-	 * the boot monitor to read the system wide flags register,
-	 * and branch to the address found there.
-	 */
-	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
-
-	return 0;
-}
-
-static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
-{
-	struct device_node *np;
-	void __iomem *scu_base;
-
-	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
-	scu_base = of_iomap(np, 0);
-	of_node_put(np);
-	if (!scu_base)
-		return;
-
-	/* Remap CPU Interrupt Interface Registers */
-	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
-	gic_cpu_ctrl = of_iomap(np, 1);
-	of_node_put(np);
-	if (!gic_cpu_ctrl)
-		goto unmap_scu;
-
-	np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
-	cpu_ctrl = of_iomap(np, 0);
-	of_node_put(np);
-	if (!cpu_ctrl)
-		goto unmap_scu;
-
-	scu_enable(scu_base);
-	flush_cache_all();
-
-unmap_scu:
-	iounmap(scu_base);
-}
-
-static const struct smp_operations ox820_smp_ops __initconst = {
-	.smp_prepare_cpus	= ox820_smp_prepare_cpus,
-	.smp_boot_secondary	= ox820_boot_secondary,
-};
-
-CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
index 5d363385c801..fa31fd2d211d 100644
--- a/arch/arm/mach-versatile/platsmp-realview.c
+++ b/arch/arm/mach-versatile/platsmp-realview.c
@@ -18,16 +18,12 @@
 #define REALVIEW_SYS_FLAGSSET_OFFSET	0x30
 
 static const struct of_device_id realview_scu_match[] = {
-	{ .compatible = "arm,arm11mp-scu", },
 	{ .compatible = "arm,cortex-a9-scu", },
 	{ .compatible = "arm,cortex-a5-scu", },
 	{ }
 };
 
 static const struct of_device_id realview_syscon_match[] = {
-	{ .compatible = "arm,core-module-integrator", },
-	{ .compatible = "arm,realview-eb-syscon", },
-	{ .compatible = "arm,realview-pb11mp-syscon", },
 	{ .compatible = "arm,realview-pbx-syscon", },
 	{ },
 };
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index c5bbae86f725..16b62bc0a970 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -937,25 +937,6 @@ config VDSO
 	  You must have glibc 2.22 or later for programs to
 	  seamlessly take advantage of this.
 
-config DMA_CACHE_RWFO
-	bool "Enable read/write for ownership DMA cache maintenance"
-	depends on CPU_V6K && SMP
-	default y
-	help
-	  The Snoop Control Unit on ARM11MPCore does not detect the
-	  cache maintenance operations and the dma_{map,unmap}_area()
-	  functions may leave stale cache entries on other CPUs. By
-	  enabling this option, Read or Write For Ownership in the ARMv6
-	  DMA cache maintenance functions is performed. These LDR/STR
-	  instructions change the cache line state to shared or modified
-	  so that the cache operation has the desired effect.
-
-	  Note that the workaround is only valid on processors that do
-	  not perform speculative loads into the D-cache. For such
-	  processors, if cache maintenance operations are not broadcast
-	  in hardware, other workarounds are needed (e.g. cache
-	  maintenance broadcasting in software via FIQ).
-
 config OUTER_CACHE
 	bool
 
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index abae7ff5defc..f6ee53c1de20 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -201,10 +201,6 @@ ENTRY(v6_flush_kern_dcache_area)
  *	- end	- virtual end address of region
  */
 ENTRY(v6_dma_inv_range)
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrb	r2, [r0]			@ read for ownership
-	strb	r2, [r0]			@ write for ownership
-#endif
 	tst	r0, #D_CACHE_LINE_SIZE - 1
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 #ifdef HARVARD_CACHE
@@ -213,10 +209,6 @@ ENTRY(v6_dma_inv_range)
 	mcrne	p15, 0, r0, c7, c11, 1		@ clean unified line
 #endif
 	tst	r1, #D_CACHE_LINE_SIZE - 1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrbne	r2, [r1, #-1]			@ read for ownership
-	strbne	r2, [r1, #-1]			@ write for ownership
-#endif
 	bic	r1, r1, #D_CACHE_LINE_SIZE - 1
 #ifdef HARVARD_CACHE
 	mcrne	p15, 0, r1, c7, c14, 1		@ clean & invalidate D line
@@ -231,10 +223,6 @@ ENTRY(v6_dma_inv_range)
 #endif
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrlo	r2, [r0]			@ read for ownership
-	strlo	r2, [r0]			@ write for ownership
-#endif
 	blo	1b
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
@@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range)
 ENTRY(v6_dma_clean_range)
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldr	r2, [r0]			@ read for ownership
-#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
 #else
@@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range)
  *	- end	- virtual end address of region
  */
 ENTRY(v6_dma_flush_range)
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrb	r2, [r0]		@ read for ownership
-	strb	r2, [r0]		@ write for ownership
-#endif
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
 #ifdef HARVARD_CACHE
@@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range)
 #endif
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrblo	r2, [r0]		@ read for ownership
-	strblo	r2, [r0]		@ write for ownership
-#endif
 	blo	1b
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
@@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area)
 	add	r1, r1, r0
 	teq	r2, #DMA_FROM_DEVICE
 	beq	v6_dma_inv_range
-#ifndef CONFIG_DMA_CACHE_RWFO
 	b	v6_dma_clean_range
-#else
-	teq	r2, #DMA_TO_DEVICE
-	beq	v6_dma_clean_range
-	b	v6_dma_flush_range
-#endif
 ENDPROC(v6_dma_map_area)
 
 /*
@@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area)
  *	- dir	- DMA direction
  */
 ENTRY(v6_dma_unmap_area)
-#ifndef CONFIG_DMA_CACHE_RWFO
 	add	r1, r1, r0
 	teq	r2, #DMA_TO_DEVICE
 	bne	v6_dma_inv_range
-#endif
 	ret	lr
 ENDPROC(v6_dma_unmap_area)
 
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 456+ messages in thread
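The v6_dma_inv_range hunks above show only fragments of that routine. Read together with their context lines, they suggest the following per-line behaviour:

- A partially covered first line is cleaned, so that unrelated data sharing that line is written back. It is then invalidated by the loop.
- A partially covered last line is cleaned and invalidated in one operation.
- The loop invalidates every line from the aligned start up to, but not including, the aligned end.

Below is a small userspace C sketch of that reading, not kernel code. The names plan_inv_range(), struct inv_plan, LINE_CLEAN, LINE_INVALIDATE and MAX_LINES are hypothetical. The sketch also assumes that the loop body hidden between the hunks is a single per-line invalidate, which the diff itself does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINES 64	/* model limit; lines beyond this are not recorded */

enum { LINE_CLEAN = 1u << 0, LINE_INVALIDATE = 1u << 1 };

/* Per-cache-line record of the maintenance the routine would issue. */
struct inv_plan {
	uintptr_t base;			/* first (aligned) line address */
	size_t nlines;			/* entries of ops[] in use */
	unsigned ops[MAX_LINES];	/* LINE_* bits, index = (addr - base) / line */
};

static void mark(struct inv_plan *p, uintptr_t addr, uintptr_t line, unsigned op)
{
	size_t i = (addr - p->base) / line;

	if (i >= MAX_LINES)
		return;
	p->ops[i] |= op;
	if (i + 1 > p->nlines)
		p->nlines = i + 1;
}

/* Model of v6_dma_inv_range(start, end); 'line' must be a power of two. */
void plan_inv_range(uintptr_t start, uintptr_t end, uintptr_t line,
		    struct inv_plan *p)
{
	uintptr_t mask = line - 1;
	uintptr_t addr = start & ~mask;		/* bic r0, r0, #D_CACHE_LINE_SIZE - 1 */
	uintptr_t stop = end & ~mask;		/* bic r1, r1, #D_CACHE_LINE_SIZE - 1 */

	assert(line != 0 && (line & mask) == 0);
	memset(p, 0, sizeof(*p));
	p->base = addr;

	if (start & mask)			/* tst r0; mcrne: clean line */
		mark(p, addr, line, LINE_CLEAN);
	if (end & mask)				/* tst r1; mcrne: clean & invalidate line */
		mark(p, stop, line, LINE_CLEAN | LINE_INVALIDATE);

	do {					/* 1: invalidate; add; cmp r0, r1; blo 1b */
		mark(p, addr, line, LINE_INVALIDATE);
		addr += line;
	} while (addr < stop);
}
```

For example, with 32-byte lines, plan_inv_range(0x1010, 0x1070, 32, &p) marks four lines: the first and last are cleaned and invalidated, and the two middle lines are only invalidated. In the RWFO variant, the removed ldrb/strb pairs touched the first byte, the last byte when the end is unaligned, and each following line in the loop, right before the corresponding maintenance operation.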
* [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Daniel Golle
From: Arnd Bergmann <arnd@arndb.de>
The cache management operations for noncoherent DMA on ARMv6 work
in two different ways:
* When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
DMA buffers lead to data corruption when the prefetched data is written
back on top of data from the device.
* When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
is not seen by the other core(s), leading to inconsistent contents
across the system.
As a consequence, neither configuration is actually safe to use in a
general-purpose kernel that runs on both MPCore systems and ARM1176
with prefetching enabled.
We could add further workarounds to make the behavior more dynamic
based on the system, but realistically, there are close to zero
remaining users of any ARM11MPCore system anyway, and nobody seems
particularly interested in it, compared to the more popular ARM1176
used in BCM2835 and AST2500.
The Oxnas platform has some minimal support in OpenWRT, but most of
the drivers and dts files never made it into the mainline kernel,
while the Arm Versatile/Realview platform mainly serves as a reference
system and does not need to be kept working once all other
ARM11MPCore systems are gone.
Take the easy way out here and drop support for multiprocessing on
ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
management implementation for it.
This also helps with other ARMv6 issues, but for the moment it still
leaves the ability to build a kernel that can run on both ARMv7 SMP
and single-processor ARMv6, which we probably want to stop supporting
as well, but not as part of this series.
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-oxnas@groups.io
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I could use some help clarifying the above changelog text to describe
the exact problem, and how CONFIG_DMA_CACHE_RWFO actually works on
MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching
into the instruction cache, not the data cache, but this can end up in
the outer cache as a result. The 1176 has some extra bits to control
prefetching, but I found no reference that explains why an MPCore does
not run into the problem.
---
 arch/arm/mach-oxnas/Kconfig                |  4 -
 arch/arm/mach-oxnas/Makefile               |  1 -
 arch/arm/mach-oxnas/headsmp.S              | 23 ------
 arch/arm/mach-oxnas/platsmp.c              | 96 ----------------------
 arch/arm/mach-versatile/platsmp-realview.c |  4 -
 arch/arm/mm/Kconfig                        | 19 -----
 arch/arm/mm/cache-v6.S                     | 31 -------
 7 files changed, 178 deletions(-)
 delete mode 100644 arch/arm/mach-oxnas/headsmp.S
 delete mode 100644 arch/arm/mach-oxnas/platsmp.c

diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
index a9ded7079268..a054235c3d6c 100644
--- a/arch/arm/mach-oxnas/Kconfig
+++ b/arch/arm/mach-oxnas/Kconfig
@@ -28,10 +28,6 @@ config MACH_OX820
 	bool "Support OX820 Based Products"
 	depends on ARCH_MULTI_V6
 	select ARM_GIC
-	select DMA_CACHE_RWFO if SMP
-	select HAVE_SMP
-	select HAVE_ARM_SCU if SMP
-	select HAVE_ARM_TWD if SMP
 	help
 	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
 
diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
index 0e78ecfe6c49..a4e40e534e6a 100644
--- a/arch/arm/mach-oxnas/Makefile
+++ b/arch/arm/mach-oxnas/Makefile
@@ -1,2 +1 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
deleted file mode 100644
index 9c0f1479f33a..000000000000
--- a/arch/arm/mach-oxnas/headsmp.S
+++ /dev/null
@@ -1,23 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
- * Copyright (c) 2003 ARM Limited
- * All Rights Reserved
- */
-#include <linux/linkage.h>
-#include <linux/init.h>
-
-	__INIT
-
-/*
- * OX820 specific entry point for secondary CPUs.
- */
-ENTRY(ox820_secondary_startup)
-	mov r4, #0
-	/* invalidate both caches and branch target cache */
-	mcr p15, 0, r4, c7, c7, 0
-	/*
-	 * we've been released from the holding pen: secondary_stack
-	 * should now contain the SVC stack for this core
-	 */
-	b	secondary_startup
diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
deleted file mode 100644
index f0a50b9e61df..000000000000
--- a/arch/arm/mach-oxnas/platsmp.c
+++ /dev/null
@@ -1,96 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com>
- * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
- * Copyright (C) 2002 ARM Ltd.
- * All Rights Reserved
- */
-#include <linux/io.h>
-#include <linux/delay.h>
-#include <linux/of.h>
-#include <linux/of_address.h>
-
-#include <asm/cacheflush.h>
-#include <asm/cp15.h>
-#include <asm/smp_plat.h>
-#include <asm/smp_scu.h>
-
-extern void ox820_secondary_startup(void);
-
-static void __iomem *cpu_ctrl;
-static void __iomem *gic_cpu_ctrl;
-
-#define HOLDINGPEN_CPU_OFFSET		0xc8
-#define HOLDINGPEN_LOCATION_OFFSET	0xc4
-
-#define GIC_NCPU_OFFSET(cpu)		(0x100 + (cpu)*0x100)
-#define GIC_CPU_CTRL			0x00
-#define GIC_CPU_CTRL_ENABLE		1
-
-static int __init ox820_boot_secondary(unsigned int cpu,
-		struct task_struct *idle)
-{
-	/*
-	 * Write the address of secondary startup into the
-	 * system-wide flags register. The BootMonitor waits
-	 * until it receives a soft interrupt, and then the
-	 * secondary CPU branches to this address.
-	 */
-	writel(virt_to_phys(ox820_secondary_startup),
-			cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
-
-	writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
-
-	/*
-	 * Enable GIC cpu interface in CPU Interface Control Register
-	 */
-	writel(GIC_CPU_CTRL_ENABLE,
-		gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
-
-	/*
-	 * Send the secondary CPU a soft interrupt, thereby causing
-	 * the boot monitor to read the system wide flags register,
-	 * and branch to the address found there.
-	 */
-	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
-
-	return 0;
-}
-
-static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
-{
-	struct device_node *np;
-	void __iomem *scu_base;
-
-	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
-	scu_base = of_iomap(np, 0);
-	of_node_put(np);
-	if (!scu_base)
-		return;
-
-	/* Remap CPU Interrupt Interface Registers */
-	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
-	gic_cpu_ctrl = of_iomap(np, 1);
-	of_node_put(np);
-	if (!gic_cpu_ctrl)
-		goto unmap_scu;
-
-	np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
-	cpu_ctrl = of_iomap(np, 0);
-	of_node_put(np);
-	if (!cpu_ctrl)
-		goto unmap_scu;
-
-	scu_enable(scu_base);
-	flush_cache_all();
-
-unmap_scu:
-	iounmap(scu_base);
-}
-
-static const struct smp_operations ox820_smp_ops __initconst = {
-	.smp_prepare_cpus	= ox820_smp_prepare_cpus,
-	.smp_boot_secondary	= ox820_boot_secondary,
-};
-
-CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
index 5d363385c801..fa31fd2d211d 100644
--- a/arch/arm/mach-versatile/platsmp-realview.c
+++ b/arch/arm/mach-versatile/platsmp-realview.c
@@ -18,16 +18,12 @@
 #define REALVIEW_SYS_FLAGSSET_OFFSET	0x30
 
 static const struct of_device_id realview_scu_match[] = {
-	{ .compatible = "arm,arm11mp-scu", },
 	{ .compatible = "arm,cortex-a9-scu", },
 	{ .compatible = "arm,cortex-a5-scu", },
 	{ }
 };
 
 static const struct of_device_id realview_syscon_match[] = {
-	{ .compatible = "arm,core-module-integrator", },
-	{ .compatible = "arm,realview-eb-syscon", },
-	{ .compatible = "arm,realview-pb11mp-syscon", },
 	{ .compatible = "arm,realview-pbx-syscon", },
 	{ },
 };
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index c5bbae86f725..16b62bc0a970 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -937,25 +937,6 @@ config VDSO
 	  You must have glibc 2.22 or later for programs to seamlessly
 	  take advantage of this.
 
-config DMA_CACHE_RWFO
-	bool "Enable read/write for ownership DMA cache maintenance"
-	depends on CPU_V6K && SMP
-	default y
-	help
-	  The Snoop Control Unit on ARM11MPCore does not detect the
-	  cache maintenance operations and the dma_{map,unmap}_area()
-	  functions may leave stale cache entries on other CPUs. By
-	  enabling this option, Read or Write For Ownership in the ARMv6
-	  DMA cache maintenance functions is performed. These LDR/STR
-	  instructions change the cache line state to shared or modified
-	  so that the cache operation has the desired effect.
-
-	  Note that the workaround is only valid on processors that do
-	  not perform speculative loads into the D-cache. For such
-	  processors, if cache maintenance operations are not broadcast
-	  in hardware, other workarounds are needed (e.g. cache
-	  maintenance broadcasting in software via FIQ).
-
 config OUTER_CACHE
 	bool
 
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index abae7ff5defc..f6ee53c1de20 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -201,10 +201,6 @@ ENTRY(v6_flush_kern_dcache_area)
  *	- end	- virtual end address of region
  */
 ENTRY(v6_dma_inv_range)
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrb	r2, [r0]			@ read for ownership
-	strb	r2, [r0]			@ write for ownership
-#endif
 	tst	r0, #D_CACHE_LINE_SIZE - 1
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 #ifdef HARVARD_CACHE
@@ -213,10 +209,6 @@ ENTRY(v6_dma_inv_range)
 	mcrne	p15, 0, r0, c7, c11, 1		@ clean unified line
 #endif
 	tst	r1, #D_CACHE_LINE_SIZE - 1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrbne	r2, [r1, #-1]			@ read for ownership
-	strbne	r2, [r1, #-1]			@ write for ownership
-#endif
 	bic	r1, r1, #D_CACHE_LINE_SIZE - 1
 #ifdef HARVARD_CACHE
 	mcrne	p15, 0, r1, c7, c14, 1		@ clean & invalidate D line
@@ -231,10 +223,6 @@ ENTRY(v6_dma_inv_range)
 #endif
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrlo	r2, [r0]			@ read for ownership
-	strlo	r2, [r0]			@ write for ownership
-#endif
 	blo	1b
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
@@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range)
 ENTRY(v6_dma_clean_range)
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldr	r2, [r0]			@ read for ownership
-#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
 #else
@@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range)
  *	- end	- virtual end address of region
  */
 ENTRY(v6_dma_flush_range)
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrb	r2, [r0]		@ read for ownership
-	strb	r2, [r0]		@ write for ownership
-#endif
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
 #ifdef HARVARD_CACHE
@@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range)
 #endif
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrblo	r2, [r0]			@ read for ownership
-	strblo	r2, [r0]			@ write for ownership
-#endif
 	blo	1b
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
@@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area)
 	add	r1, r1, r0
 	teq	r2, #DMA_FROM_DEVICE
 	beq	v6_dma_inv_range
-#ifndef CONFIG_DMA_CACHE_RWFO
 	b	v6_dma_clean_range
-#else
-	teq	r2, #DMA_TO_DEVICE
-	beq	v6_dma_clean_range
-	b	v6_dma_flush_range
-#endif
 ENDPROC(v6_dma_map_area)
 
 /*
@@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area)
  *	- dir	- DMA direction
  */
 ENTRY(v6_dma_unmap_area)
-#ifndef CONFIG_DMA_CACHE_RWFO
 	add	r1, r1, r0
 	teq	r2, #DMA_TO_DEVICE
 	bne	v6_dma_inv_range
-#endif
 	ret	lr
 ENDPROC(v6_dma_unmap_area)
 
-- 
2.39.2
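The cache-v6.S hunks above change which range helper v6_dma_map_area() and v6_dma_unmap_area() dispatch to for each DMA direction. The C sketch below restates that dispatch with and without CONFIG_DMA_CACHE_RWFO, as read from the assembly; the enums are local stand-ins (the direction order mirrors the kernel's enum dma_data_direction), and none of the names are kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; the order mirrors enum dma_data_direction. */
enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Which cache-v6.S range helper runs, if any. */
enum cache_op {
	OP_NONE,	/* no maintenance */
	OP_CLEAN,	/* v6_dma_clean_range */
	OP_INV,		/* v6_dma_inv_range */
	OP_FLUSH,	/* v6_dma_flush_range */
};

/* Dispatch performed by v6_dma_map_area(). */
static enum cache_op v6_map_op(enum dma_dir dir, bool rwfo)
{
	assert(dir <= DMA_FROM_DEVICE);
	if (dir == DMA_FROM_DEVICE)
		return OP_INV;		/* teq + beq v6_dma_inv_range */
	if (!rwfo || dir == DMA_TO_DEVICE)
		return OP_CLEAN;	/* b (or beq) v6_dma_clean_range */
	return OP_FLUSH;		/* RWFO only: b v6_dma_flush_range */
}

/* Dispatch performed by v6_dma_unmap_area(). */
static enum cache_op v6_unmap_op(enum dma_dir dir, bool rwfo)
{
	assert(dir <= DMA_FROM_DEVICE);
	if (rwfo)
		return OP_NONE;		/* whole body was compiled out */
	/* teq r2, #DMA_TO_DEVICE; bne v6_dma_inv_range */
	return dir == DMA_TO_DEVICE ? OP_NONE : OP_INV;
}
```

With `rwfo` false, which is the only behavior left after this patch, a DMA_FROM_DEVICE buffer is invalidated at both map and unmap time, and a bidirectional buffer is cleaned at map time and invalidated at unmap time. With RWFO, bidirectional buffers were flushed at map time and nothing happened at unmap time.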
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
@ 2023-03-30 7:48 ` Neil Armstrong
From: Neil Armstrong @ 2023-03-30 7:48 UTC
To: Arnd Bergmann, linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Linus Walleij,
Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Daniel Golle

On 27/03/2023 14:13, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
>
> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
> DMA buffers lead to data corruption when the prefetched data is written
> back on top of data from the device.
>
> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
> is not seen by the other core(s), leading to inconsistent contents
> across the system.
>
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.
>
> We could add further workarounds to make the behavior more dynamic based
> on the system, but realistically, there are close to zero remaining
> users on any ARM11MPCore anyway, and nobody seems too interested in it,
> compared to the more popular ARM1176 used in BCM2835 and AST2500.
>
> The Oxnas platform has some minimal support in OpenWRT, but most of the
> drivers and dts files never made it into the mainline kernel, while the
> Arm Versatile/Realview platform mainly serves as a reference system but
> is not necessary to be kept working once all other ARM11MPCore are gone.

Acked-by: Neil Armstrong <neil.armstrong@linaro.org>

It's sad, but it's the reality: there's no chance full OXNAS support will
ever come upstream, and no real work has been done on it for years.

I think OXNAS support can be scheduled for removal in the next release;
the current support would need significant rework to become acceptable
before anyone could try to upstream the missing bits anyway.

Thanks,
Neil

>
> Take the easy way out here and drop support for multiprocessing on
> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> management implementation for it. This also helps with other ARMv6
> issues, but for the moment leaves the ability to build a kernel that
> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> want to stop supporting as well, but not as part of this series.
>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Daniel Golle <daniel@makrotopia.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: linux-oxnas@groups.io
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> I could use some help clarifying the above changelog text to describe
> the exact problem, and how the CONFIG_DMA_CACHE_RWFO actually works on
> MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching
> into the instruction cache, not the data cache, but this can end up in
> the outercache as a result. The 1176 has some extra control bits to
> control prefetching, but I found no reference that explains why an
> MPCore does not run into the problem.
> ---
>  arch/arm/mach-oxnas/Kconfig                |  4 -
>  arch/arm/mach-oxnas/Makefile               |  1 -
>  arch/arm/mach-oxnas/headsmp.S              | 23 ------
>  arch/arm/mach-oxnas/platsmp.c              | 96 ----------------------
>  arch/arm/mach-versatile/platsmp-realview.c |  4 -
>  arch/arm/mm/Kconfig                        | 19 -----
>  arch/arm/mm/cache-v6.S                     | 31 -------
>  7 files changed, 178 deletions(-)
>  delete mode 100644 arch/arm/mach-oxnas/headsmp.S
>  delete mode 100644 arch/arm/mach-oxnas/platsmp.c
>
> diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
> index a9ded7079268..a054235c3d6c 100644
> --- a/arch/arm/mach-oxnas/Kconfig
> +++ b/arch/arm/mach-oxnas/Kconfig
> @@ -28,10 +28,6 @@ config MACH_OX820
>  	bool "Support OX820 Based Products"
>  	depends on ARCH_MULTI_V6
>  	select ARM_GIC
> -	select DMA_CACHE_RWFO if SMP
> -	select HAVE_SMP
> -	select HAVE_ARM_SCU if SMP
> -	select HAVE_ARM_TWD if SMP
>  	help
>  	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
>
> diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
> index 0e78ecfe6c49..a4e40e534e6a 100644
> --- a/arch/arm/mach-oxnas/Makefile
> +++ b/arch/arm/mach-oxnas/Makefile
> @@ -1,2 +1 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
> diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
> deleted file mode 100644
> index 9c0f1479f33a..000000000000
> --- a/arch/arm/mach-oxnas/headsmp.S
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
> - * Copyright (c) 2003 ARM Limited
> - * All Rights Reserved
> - */
> -#include <linux/linkage.h>
> -#include <linux/init.h>
> -
> -	__INIT
> -
> -/*
> - * OX820 specific entry point for secondary CPUs.
> - */
> -ENTRY(ox820_secondary_startup)
> -	mov r4, #0
> -	/* invalidate both caches and branch target cache */
> -	mcr p15, 0, r4, c7, c7, 0
> -	/*
> -	 * we've been released from the holding pen: secondary_stack
> -	 * should now contain the SVC stack for this core
> -	 */
> -	b	secondary_startup
> diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
> deleted file mode 100644
> index f0a50b9e61df..000000000000
> --- a/arch/arm/mach-oxnas/platsmp.c
> +++ /dev/null
> @@ -1,96 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0-only
> -/*
> - * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com>
> - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
> - * Copyright (C) 2002 ARM Ltd.
> - * All Rights Reserved
> - */
> -#include <linux/io.h>
> -#include <linux/delay.h>
> -#include <linux/of.h>
> -#include <linux/of_address.h>
> -
> -#include <asm/cacheflush.h>
> -#include <asm/cp15.h>
> -#include <asm/smp_plat.h>
> -#include <asm/smp_scu.h>
> -
> -extern void ox820_secondary_startup(void);
> -
> -static void __iomem *cpu_ctrl;
> -static void __iomem *gic_cpu_ctrl;
> -
> -#define HOLDINGPEN_CPU_OFFSET		0xc8
> -#define HOLDINGPEN_LOCATION_OFFSET	0xc4
> -
> -#define GIC_NCPU_OFFSET(cpu)		(0x100 + (cpu)*0x100)
> -#define GIC_CPU_CTRL			0x00
> -#define GIC_CPU_CTRL_ENABLE		1
> -
> -static int __init ox820_boot_secondary(unsigned int cpu,
> -		struct task_struct *idle)
> -{
> -	/*
> -	 * Write the address of secondary startup into the
> -	 * system-wide flags register. The BootMonitor waits
> -	 * until it receives a soft interrupt, and then the
> -	 * secondary CPU branches to this address.
> -	 */
> -	writel(virt_to_phys(ox820_secondary_startup),
> -			cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
> -
> -	writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
> -
> -	/*
> -	 * Enable GIC cpu interface in CPU Interface Control Register
> -	 */
> -	writel(GIC_CPU_CTRL_ENABLE,
> -		gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
> -
> -	/*
> -	 * Send the secondary CPU a soft interrupt, thereby causing
> -	 * the boot monitor to read the system wide flags register,
> -	 * and branch to the address found there.
> -	 */
> -	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> -
> -	return 0;
> -}
> -
> -static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
> -{
> -	struct device_node *np;
> -	void __iomem *scu_base;
> -
> -	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
> -	scu_base = of_iomap(np, 0);
> -	of_node_put(np);
> -	if (!scu_base)
> -		return;
> -
> -	/* Remap CPU Interrupt Interface Registers */
> -	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
> -	gic_cpu_ctrl = of_iomap(np, 1);
> -	of_node_put(np);
> -	if (!gic_cpu_ctrl)
> -		goto unmap_scu;
> -
> -	np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
> -	cpu_ctrl = of_iomap(np, 0);
> -	of_node_put(np);
> -	if (!cpu_ctrl)
> -		goto unmap_scu;
> -
> -	scu_enable(scu_base);
> -	flush_cache_all();
> -
> -unmap_scu:
> -	iounmap(scu_base);
> -}
> -
> -static const struct smp_operations ox820_smp_ops __initconst = {
> -	.smp_prepare_cpus	= ox820_smp_prepare_cpus,
> -	.smp_boot_secondary	= ox820_boot_secondary,
> -};
> -
> -CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
> diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
> index 5d363385c801..fa31fd2d211d 100644
> --- a/arch/arm/mach-versatile/platsmp-realview.c
> +++ b/arch/arm/mach-versatile/platsmp-realview.c
> @@ -18,16 +18,12 @@
>  #define REALVIEW_SYS_FLAGSSET_OFFSET	0x30
>
>  static const struct of_device_id realview_scu_match[] = {
> -	{ .compatible = "arm,arm11mp-scu", },
>  	{ .compatible = "arm,cortex-a9-scu", },
>  	{ .compatible = "arm,cortex-a5-scu", },
>  	{ }
>  };
>
>  static const struct of_device_id realview_syscon_match[] = {
> -	{ .compatible = "arm,core-module-integrator", },
> -	{ .compatible = "arm,realview-eb-syscon", },
> -	{ .compatible = "arm,realview-pb11mp-syscon", },
>  	{ .compatible = "arm,realview-pbx-syscon", },
>  	{ },
>  };
> diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
> index c5bbae86f725..16b62bc0a970 100644
> --- a/arch/arm/mm/Kconfig
> +++ b/arch/arm/mm/Kconfig
> @@ -937,25 +937,6 @@ config VDSO
>  	  You must have glibc 2.22 or later for programs to seamlessly
>  	  take advantage of this.
>
> -config DMA_CACHE_RWFO
> -	bool "Enable read/write for ownership DMA cache maintenance"
> -	depends on CPU_V6K && SMP
> -	default y
> -	help
> -	  The Snoop Control Unit on ARM11MPCore does not detect the
> -	  cache maintenance operations and the dma_{map,unmap}_area()
> -	  functions may leave stale cache entries on other CPUs. By
> -	  enabling this option, Read or Write For Ownership in the ARMv6
> -	  DMA cache maintenance functions is performed. These LDR/STR
> -	  instructions change the cache line state to shared or modified
> -	  so that the cache operation has the desired effect.
> -
> -	  Note that the workaround is only valid on processors that do
> -	  not perform speculative loads into the D-cache. For such
> -	  processors, if cache maintenance operations are not broadcast
> -	  in hardware, other workarounds are needed (e.g. cache
> -	  maintenance broadcasting in software via FIQ).
> -
>  config OUTER_CACHE
>  	bool
>
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index abae7ff5defc..f6ee53c1de20 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -201,10 +201,6 @@ ENTRY(v6_flush_kern_dcache_area)
>   *	- end	- virtual end address of region
>   */
>  ENTRY(v6_dma_inv_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrb	r2, [r0]			@ read for ownership
> -	strb	r2, [r0]			@ write for ownership
> -#endif
>  	tst	r0, #D_CACHE_LINE_SIZE - 1
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  #ifdef HARVARD_CACHE
> @@ -213,10 +209,6 @@ ENTRY(v6_dma_inv_range)
>  	mcrne	p15, 0, r0, c7, c11, 1		@ clean unified line
>  #endif
>  	tst	r1, #D_CACHE_LINE_SIZE - 1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrbne	r2, [r1, #-1]			@ read for ownership
> -	strbne	r2, [r1, #-1]			@ write for ownership
> -#endif
>  	bic	r1, r1, #D_CACHE_LINE_SIZE - 1
>  #ifdef HARVARD_CACHE
>  	mcrne	p15, 0, r1, c7, c14, 1		@ clean & invalidate D line
> @@ -231,10 +223,6 @@ ENTRY(v6_dma_inv_range)
>  #endif
>  	add	r0, r0, #D_CACHE_LINE_SIZE
>  	cmp	r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrlo	r2, [r0]			@ read for ownership
> -	strlo	r2, [r0]			@ write for ownership
> -#endif
>  	blo	1b
>  	mov	r0, #0
>  	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
> @@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range)
>  ENTRY(v6_dma_clean_range)
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  1:
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldr	r2, [r0]			@ read for ownership
> -#endif
>  #ifdef HARVARD_CACHE
>  	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
>  #else
> @@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range)
>   *	- end	- virtual end address of region
>   */
>  ENTRY(v6_dma_flush_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrb	r2, [r0]		@ read for ownership
> -	strb	r2, [r0]		@ write for ownership
> -#endif
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  1:
>  #ifdef HARVARD_CACHE
> @@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range)
>  #endif
>  	add	r0, r0, #D_CACHE_LINE_SIZE
>  	cmp	r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrblo	r2, [r0]			@ read for ownership
> -	strblo	r2, [r0]			@ write for ownership
> -#endif
>  	blo	1b
>  	mov	r0, #0
>  	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
> @@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area)
>  	add	r1, r1, r0
>  	teq	r2, #DMA_FROM_DEVICE
>  	beq	v6_dma_inv_range
> -#ifndef CONFIG_DMA_CACHE_RWFO
>  	b	v6_dma_clean_range
> -#else
> -	teq	r2, #DMA_TO_DEVICE
> -	beq	v6_dma_clean_range
> -	b	v6_dma_flush_range
> -#endif
>  ENDPROC(v6_dma_map_area)
>
>  /*
> @@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area)
>   *	- dir	- DMA direction
>   */
>  ENTRY(v6_dma_unmap_area)
> -#ifndef CONFIG_DMA_CACHE_RWFO
>  	add	r1, r1, r0
>  	teq	r2, #DMA_TO_DEVICE
>  	bne	v6_dma_inv_range
> -#endif
>  	ret	lr
>  ENDPROC(v6_dma_unmap_area)
>
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
@ 2023-03-30 7:48 ` Neil Armstrong
0 siblings, 0 replies; 456+ messages in thread
From: Neil Armstrong @ 2023-03-30 7:48 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Linus Walleij,
Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Daniel Golle

On 27/03/2023 14:13, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
>
> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
> DMA buffers lead to data corruption when the prefetched data is written
> back on top of data from the device.
>
> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
> is not seen by the other core(s), leading to inconsistent contents
> across the system.
>
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.
>
> We could add further workarounds to make the behavior more dynamic based
> on the system, but realistically, there are close to zero remaining
> users on any ARM11MPCore anyway, and nobody seems too interested in it,
> compared to the more popular ARM1176 used in BCM2835 and AST2500.
>
> The Oxnas platform has some minimal support in OpenWrt, but most of the
> drivers and dts files never made it into the mainline kernel, while the
> Arm Versatile/Realview platform mainly serves as a reference system and
> does not need to be kept working once all other ARM11MPCore systems are gone.

Acked-by: Neil Armstrong <neil.armstrong@linaro.org>

It's sad, but it's the reality: there's no chance that full OXNAS support
will ever come upstream, and no real work has been done on it for years.

I think OXNAS support can be scheduled for removal in the next release.
The current support would need significant rework to become acceptable
before anyone could try to upstream the missing bits anyway.

Thanks,
Neil

>
> Take the easy way out here and drop support for multiprocessing on
> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> management implementation for it. This also helps with other ARMv6
> issues, but for the moment leaves the ability to build a kernel that
> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> want to stop supporting as well, but not as part of this series.
>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Daniel Golle <daniel@makrotopia.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: linux-oxnas@groups.io
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> I could use some help clarifying the above changelog text to describe
> the exact problem, and how CONFIG_DMA_CACHE_RWFO actually works on
> MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching
> into the instruction cache, not the data cache, but this can end up in
> the outer cache as a result. The 1176 has some extra control bits to
> control prefetching, but I found no reference that explains why an
> MPCore does not run into the problem.
> ---
>  arch/arm/mach-oxnas/Kconfig                |  4 -
>  arch/arm/mach-oxnas/Makefile               |  1 -
>  arch/arm/mach-oxnas/headsmp.S              | 23 ------
>  arch/arm/mach-oxnas/platsmp.c              | 96 ----------------------
>  arch/arm/mach-versatile/platsmp-realview.c |  4 -
>  arch/arm/mm/Kconfig                        | 19 -----
>  arch/arm/mm/cache-v6.S                     | 31 -------
>  7 files changed, 178 deletions(-)
>  delete mode 100644 arch/arm/mach-oxnas/headsmp.S
>  delete mode 100644 arch/arm/mach-oxnas/platsmp.c
>
> diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
> index a9ded7079268..a054235c3d6c 100644
> --- a/arch/arm/mach-oxnas/Kconfig
> +++ b/arch/arm/mach-oxnas/Kconfig
> @@ -28,10 +28,6 @@ config MACH_OX820
>  	bool "Support OX820 Based Products"
>  	depends on ARCH_MULTI_V6
>  	select ARM_GIC
> -	select DMA_CACHE_RWFO if SMP
> -	select HAVE_SMP
> -	select HAVE_ARM_SCU if SMP
> -	select HAVE_ARM_TWD if SMP
>  	help
>  	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
>
> diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
> index 0e78ecfe6c49..a4e40e534e6a 100644
> --- a/arch/arm/mach-oxnas/Makefile
> +++ b/arch/arm/mach-oxnas/Makefile
> @@ -1,2 +1 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
> diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
> deleted file mode 100644
> index 9c0f1479f33a..000000000000
> --- a/arch/arm/mach-oxnas/headsmp.S
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
> - * Copyright (c) 2003 ARM Limited
> - * All Rights Reserved
> - */
> -#include <linux/linkage.h>
> -#include <linux/init.h>
> -
> -	__INIT
> -
> -/*
> - * OX820 specific entry point for secondary CPUs.
> - */
> -ENTRY(ox820_secondary_startup)
> -	mov r4, #0
> -	/* invalidate both caches and branch target cache */
> -	mcr p15, 0, r4, c7, c7, 0
> -	/*
> -	 * we've been released from the holding pen: secondary_stack
> -	 * should now contain the SVC stack for this core
> -	 */
> -	b	secondary_startup
> diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
> deleted file mode 100644
> index f0a50b9e61df..000000000000
> --- a/arch/arm/mach-oxnas/platsmp.c
> +++ /dev/null
> @@ -1,96 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0-only
> -/*
> - * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com>
> - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
> - * Copyright (C) 2002 ARM Ltd.
> - * All Rights Reserved
> - */
> -#include <linux/io.h>
> -#include <linux/delay.h>
> -#include <linux/of.h>
> -#include <linux/of_address.h>
> -
> -#include <asm/cacheflush.h>
> -#include <asm/cp15.h>
> -#include <asm/smp_plat.h>
> -#include <asm/smp_scu.h>
> -
> -extern void ox820_secondary_startup(void);
> -
> -static void __iomem *cpu_ctrl;
> -static void __iomem *gic_cpu_ctrl;
> -
> -#define HOLDINGPEN_CPU_OFFSET		0xc8
> -#define HOLDINGPEN_LOCATION_OFFSET	0xc4
> -
> -#define GIC_NCPU_OFFSET(cpu)		(0x100 + (cpu)*0x100)
> -#define GIC_CPU_CTRL			0x00
> -#define GIC_CPU_CTRL_ENABLE		1
> -
> -static int __init ox820_boot_secondary(unsigned int cpu,
> -		struct task_struct *idle)
> -{
> -	/*
> -	 * Write the address of secondary startup into the
> -	 * system-wide flags register. The BootMonitor waits
> -	 * until it receives a soft interrupt, and then the
> -	 * secondary CPU branches to this address.
> -	 */
> -	writel(virt_to_phys(ox820_secondary_startup),
> -			cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
> -
> -	writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
> -
> -	/*
> -	 * Enable GIC cpu interface in CPU Interface Control Register
> -	 */
> -	writel(GIC_CPU_CTRL_ENABLE,
> -		gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
> -
> -	/*
> -	 * Send the secondary CPU a soft interrupt, thereby causing
> -	 * the boot monitor to read the system wide flags register,
> -	 * and branch to the address found there.
> -	 */
> -	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> -
> -	return 0;
> -}
> -
> -static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
> -{
> -	struct device_node *np;
> -	void __iomem *scu_base;
> -
> -	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
> -	scu_base = of_iomap(np, 0);
> -	of_node_put(np);
> -	if (!scu_base)
> -		return;
> -
> -	/* Remap CPU Interrupt Interface Registers */
> -	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
> -	gic_cpu_ctrl = of_iomap(np, 1);
> -	of_node_put(np);
> -	if (!gic_cpu_ctrl)
> -		goto unmap_scu;
> -
> -	np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
> -	cpu_ctrl = of_iomap(np, 0);
> -	of_node_put(np);
> -	if (!cpu_ctrl)
> -		goto unmap_scu;
> -
> -	scu_enable(scu_base);
> -	flush_cache_all();
> -
> -unmap_scu:
> -	iounmap(scu_base);
> -}
> -
> -static const struct smp_operations ox820_smp_ops __initconst = {
> -	.smp_prepare_cpus	= ox820_smp_prepare_cpus,
> -	.smp_boot_secondary	= ox820_boot_secondary,
> -};
> -
> -CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
> diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
> index 5d363385c801..fa31fd2d211d 100644
> --- a/arch/arm/mach-versatile/platsmp-realview.c
> +++ b/arch/arm/mach-versatile/platsmp-realview.c
> @@ -18,16 +18,12 @@
>  #define REALVIEW_SYS_FLAGSSET_OFFSET	0x30
>
>  static const struct of_device_id realview_scu_match[] = {
> -	{ .compatible = "arm,arm11mp-scu", },
>  	{ .compatible = "arm,cortex-a9-scu", },
>  	{ .compatible = "arm,cortex-a5-scu", },
>  	{ }
>  };
>
>  static const struct of_device_id realview_syscon_match[] = {
> -	{ .compatible = "arm,core-module-integrator", },
> -	{ .compatible = "arm,realview-eb-syscon", },
> -	{ .compatible = "arm,realview-pb11mp-syscon", },
>  	{ .compatible = "arm,realview-pbx-syscon", },
>  	{ },
>  };
> diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
> index c5bbae86f725..16b62bc0a970 100644
> --- a/arch/arm/mm/Kconfig
> +++ b/arch/arm/mm/Kconfig
> @@ -937,25 +937,6 @@ config VDSO
>  	  You must have glibc 2.22 or later for programs to seamlessly
>  	  take advantage of this.
>
> -config DMA_CACHE_RWFO
> -	bool "Enable read/write for ownership DMA cache maintenance"
> -	depends on CPU_V6K && SMP
> -	default y
> -	help
> -	  The Snoop Control Unit on ARM11MPCore does not detect the
> -	  cache maintenance operations and the dma_{map,unmap}_area()
> -	  functions may leave stale cache entries on other CPUs. By
> -	  enabling this option, Read or Write For Ownership in the ARMv6
> -	  DMA cache maintenance functions is performed. These LDR/STR
> -	  instructions change the cache line state to shared or modified
> -	  so that the cache operation has the desired effect.
> -
> -	  Note that the workaround is only valid on processors that do
> -	  not perform speculative loads into the D-cache. For such
> -	  processors, if cache maintenance operations are not broadcast
> -	  in hardware, other workarounds are needed (e.g. cache
> -	  maintenance broadcasting in software via FIQ).
> -
>  config OUTER_CACHE
>  	bool
>
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index abae7ff5defc..f6ee53c1de20 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -201,10 +201,6 @@ ENTRY(v6_flush_kern_dcache_area)
>   *	- end	- virtual end address of region
>   */
>  ENTRY(v6_dma_inv_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrb	r2, [r0]		@ read for ownership
> -	strb	r2, [r0]		@ write for ownership
> -#endif
>  	tst	r0, #D_CACHE_LINE_SIZE - 1
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  #ifdef HARVARD_CACHE
> @@ -213,10 +209,6 @@ ENTRY(v6_dma_inv_range)
>  	mcrne	p15, 0, r0, c7, c11, 1		@ clean unified line
>  #endif
>  	tst	r1, #D_CACHE_LINE_SIZE - 1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrbne	r2, [r1, #-1]		@ read for ownership
> -	strbne	r2, [r1, #-1]		@ write for ownership
> -#endif
>  	bic	r1, r1, #D_CACHE_LINE_SIZE - 1
>  #ifdef HARVARD_CACHE
>  	mcrne	p15, 0, r1, c7, c14, 1		@ clean & invalidate D line
> @@ -231,10 +223,6 @@ ENTRY(v6_dma_inv_range)
>  #endif
>  	add	r0, r0, #D_CACHE_LINE_SIZE
>  	cmp	r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrlo	r2, [r0]		@ read for ownership
> -	strlo	r2, [r0]		@ write for ownership
> -#endif
>  	blo	1b
>  	mov	r0, #0
>  	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
> @@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range)
>  ENTRY(v6_dma_clean_range)
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  1:
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldr	r2, [r0]		@ read for ownership
> -#endif
>  #ifdef HARVARD_CACHE
>  	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
>  #else
> @@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range)
>   *	- end	- virtual end address of region
>   */
>  ENTRY(v6_dma_flush_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrb	r2, [r0]		@ read for ownership
> -	strb	r2, [r0]		@ write for ownership
> -#endif
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  1:
>  #ifdef HARVARD_CACHE
> @@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range)
>  #endif
>  	add	r0, r0, #D_CACHE_LINE_SIZE
>  	cmp	r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrblo	r2, [r0]		@ read for ownership
> -	strblo	r2, [r0]		@ write for ownership
> -#endif
>  	blo	1b
>  	mov	r0, #0
>  	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
> @@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area)
>  	add	r1, r1, r0
>  	teq	r2, #DMA_FROM_DEVICE
>  	beq	v6_dma_inv_range
> -#ifndef CONFIG_DMA_CACHE_RWFO
>  	b	v6_dma_clean_range
> -#else
> -	teq	r2, #DMA_TO_DEVICE
> -	beq	v6_dma_clean_range
> -	b	v6_dma_flush_range
> -#endif
>  ENDPROC(v6_dma_map_area)
>
>  /*
> @@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area)
>   *	- dir	- DMA direction
>   */
>  ENTRY(v6_dma_unmap_area)
> -#ifndef CONFIG_DMA_CACHE_RWFO
>  	add	r1, r1, r0
>  	teq	r2, #DMA_TO_DEVICE
>  	bne	v6_dma_inv_range
> -#endif
>  	ret	lr
>  ENDPROC(v6_dma_unmap_area)
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 456+ messages in thread
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore @ 2023-03-30 7:48 ` Neil Armstrong 0 siblings, 0 replies; 456+ messages in thread From: Neil Armstrong @ 2023-03-30 7:48 UTC (permalink / raw) To: Arnd Bergmann, linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa, Daniel Golle On 27/03/2023 14:13, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > The cache management operations for noncoherent DMA on ARMv6 work > in two different ways: > > * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight > DMA buffers lead to data corruption when the prefetched data is written > back on top of data from the device. > > * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU > is not seen by the other core(s), leading to inconsistent contents > accross the system. > > As a consequence, neither configuration is actually safe to use in a > general-purpose kernel that is used on both MPCore systems and ARM1176 > with prefetching enabled. > > We could add further workarounds to make the behavior more dynamic based > on the system, but realistically, there are close to zero remaining > users on any ARM11MPCore anyway, and nobody seems too interested in it, > compared to the more popular ARM1176 used in BMC2835 and AST2500. 
> > The Oxnas platform has some minimal support in OpenWRT, but most of the > drivers and dts files never made it into the mainline kernel, while the > Arm Versatile/Realview platform mainly serves as a reference system but > is not necessary to be kept working once all other ARM11MPCore are gone. Acked-by: Neil Armstrong <neil.armstrong@linaro.org> It's sad but it's the reality, there's no chance full OXNAS support will ever come upstream and no real work has been done for years. I think OXNAS support can be programmed for removal for next release, it would need significant work to rework current support to make it acceptable before trying to upstream missing bits anyway. Thanks, Neil > > Take the easy way out here and drop support for multiprocessing on > ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache > management implementation for it. This also helps with other ARMv6 > issues, but for the moment leaves the ability to build a kernel that > can run on both ARMv7 SMP and single-processor ARMv6, which we probably > want to stop supporting as well, but not as part of this series. > > Cc: Neil Armstrong <neil.armstrong@linaro.org> > Cc: Daniel Golle <daniel@makrotopia.org> > Cc: Linus Walleij <linus.walleij@linaro.org> > Cc: linux-oxnas@groups.io > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > I could use some help clarifying the above changelog text to describe > the exact problem, and how the CONFIG_DMA_CACHE_RWFO actually works on > MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching > into the instruction cache, not the data cache, but this can end up in > the outercache as a result. The 1176 has some extra control bits to > control prefetching, but I found no reference that explains why an > MPCore does not run into the problem. 
> ---
>  arch/arm/mach-oxnas/Kconfig                |  4 -
>  arch/arm/mach-oxnas/Makefile               |  1 -
>  arch/arm/mach-oxnas/headsmp.S              | 23 ------
>  arch/arm/mach-oxnas/platsmp.c              | 96 ----------------------
>  arch/arm/mach-versatile/platsmp-realview.c |  4 -
>  arch/arm/mm/Kconfig                        | 19 -----
>  arch/arm/mm/cache-v6.S                     | 31 -------
>  7 files changed, 178 deletions(-)
>  delete mode 100644 arch/arm/mach-oxnas/headsmp.S
>  delete mode 100644 arch/arm/mach-oxnas/platsmp.c
>
> diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
> index a9ded7079268..a054235c3d6c 100644
> --- a/arch/arm/mach-oxnas/Kconfig
> +++ b/arch/arm/mach-oxnas/Kconfig
> @@ -28,10 +28,6 @@ config MACH_OX820
>  	bool "Support OX820 Based Products"
>  	depends on ARCH_MULTI_V6
>  	select ARM_GIC
> -	select DMA_CACHE_RWFO if SMP
> -	select HAVE_SMP
> -	select HAVE_ARM_SCU if SMP
> -	select HAVE_ARM_TWD if SMP
>  	help
>  	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
>
> diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
> index 0e78ecfe6c49..a4e40e534e6a 100644
> --- a/arch/arm/mach-oxnas/Makefile
> +++ b/arch/arm/mach-oxnas/Makefile
> @@ -1,2 +1 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-$(CONFIG_SMP) += platsmp.o headsmp.o
> diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
> deleted file mode 100644
> index 9c0f1479f33a..000000000000
> --- a/arch/arm/mach-oxnas/headsmp.S
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
> - * Copyright (c) 2003 ARM Limited
> - * All Rights Reserved
> - */
> -#include <linux/linkage.h>
> -#include <linux/init.h>
> -
> -	__INIT
> -
> -/*
> - * OX820 specific entry point for secondary CPUs.
> - */
> -ENTRY(ox820_secondary_startup)
> -	mov	r4, #0
> -	/* invalidate both caches and branch target cache */
> -	mcr	p15, 0, r4, c7, c7, 0
> -	/*
> -	 * we've been released from the holding pen: secondary_stack
> -	 * should now contain the SVC stack for this core
> -	 */
> -	b	secondary_startup
> diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
> deleted file mode 100644
> index f0a50b9e61df..000000000000
> --- a/arch/arm/mach-oxnas/platsmp.c
> +++ /dev/null
> @@ -1,96 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0-only
> -/*
> - * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com>
> - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
> - * Copyright (C) 2002 ARM Ltd.
> - * All Rights Reserved
> - */
> -#include <linux/io.h>
> -#include <linux/delay.h>
> -#include <linux/of.h>
> -#include <linux/of_address.h>
> -
> -#include <asm/cacheflush.h>
> -#include <asm/cp15.h>
> -#include <asm/smp_plat.h>
> -#include <asm/smp_scu.h>
> -
> -extern void ox820_secondary_startup(void);
> -
> -static void __iomem *cpu_ctrl;
> -static void __iomem *gic_cpu_ctrl;
> -
> -#define HOLDINGPEN_CPU_OFFSET		0xc8
> -#define HOLDINGPEN_LOCATION_OFFSET	0xc4
> -
> -#define GIC_NCPU_OFFSET(cpu)		(0x100 + (cpu)*0x100)
> -#define GIC_CPU_CTRL			0x00
> -#define GIC_CPU_CTRL_ENABLE		1
> -
> -static int __init ox820_boot_secondary(unsigned int cpu,
> -		struct task_struct *idle)
> -{
> -	/*
> -	 * Write the address of secondary startup into the
> -	 * system-wide flags register. The BootMonitor waits
> -	 * until it receives a soft interrupt, and then the
> -	 * secondary CPU branches to this address.
> -	 */
> -	writel(virt_to_phys(ox820_secondary_startup),
> -			cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
> -
> -	writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
> -
> -	/*
> -	 * Enable GIC cpu interface in CPU Interface Control Register
> -	 */
> -	writel(GIC_CPU_CTRL_ENABLE,
> -		gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
> -
> -	/*
> -	 * Send the secondary CPU a soft interrupt, thereby causing
> -	 * the boot monitor to read the system wide flags register,
> -	 * and branch to the address found there.
> -	 */
> -	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> -
> -	return 0;
> -}
> -
> -static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
> -{
> -	struct device_node *np;
> -	void __iomem *scu_base;
> -
> -	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
> -	scu_base = of_iomap(np, 0);
> -	of_node_put(np);
> -	if (!scu_base)
> -		return;
> -
> -	/* Remap CPU Interrupt Interface Registers */
> -	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
> -	gic_cpu_ctrl = of_iomap(np, 1);
> -	of_node_put(np);
> -	if (!gic_cpu_ctrl)
> -		goto unmap_scu;
> -
> -	np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
> -	cpu_ctrl = of_iomap(np, 0);
> -	of_node_put(np);
> -	if (!cpu_ctrl)
> -		goto unmap_scu;
> -
> -	scu_enable(scu_base);
> -	flush_cache_all();
> -
> -unmap_scu:
> -	iounmap(scu_base);
> -}
> -
> -static const struct smp_operations ox820_smp_ops __initconst = {
> -	.smp_prepare_cpus	= ox820_smp_prepare_cpus,
> -	.smp_boot_secondary	= ox820_boot_secondary,
> -};
> -
> -CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
> diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
> index 5d363385c801..fa31fd2d211d 100644
> --- a/arch/arm/mach-versatile/platsmp-realview.c
> +++ b/arch/arm/mach-versatile/platsmp-realview.c
> @@ -18,16 +18,12 @@
>  #define REALVIEW_SYS_FLAGSSET_OFFSET	0x30
>
>  static const struct of_device_id realview_scu_match[] = {
> -	{ .compatible = "arm,arm11mp-scu", },
>  	{ .compatible = "arm,cortex-a9-scu", },
>  	{ .compatible = "arm,cortex-a5-scu", },
>  	{ }
>  };
>
>  static const struct of_device_id realview_syscon_match[] = {
> -	{ .compatible = "arm,core-module-integrator", },
> -	{ .compatible = "arm,realview-eb-syscon", },
> -	{ .compatible = "arm,realview-pb11mp-syscon", },
>  	{ .compatible = "arm,realview-pbx-syscon", },
>  	{ },
>  };
> diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
> index c5bbae86f725..16b62bc0a970 100644
> --- a/arch/arm/mm/Kconfig
> +++ b/arch/arm/mm/Kconfig
> @@ -937,25 +937,6 @@ config VDSO
>  	  You must have glibc 2.22 or later for programs to seamlessly
>  	  take advantage of this.
>
> -config DMA_CACHE_RWFO
> -	bool "Enable read/write for ownership DMA cache maintenance"
> -	depends on CPU_V6K && SMP
> -	default y
> -	help
> -	  The Snoop Control Unit on ARM11MPCore does not detect the
> -	  cache maintenance operations and the dma_{map,unmap}_area()
> -	  functions may leave stale cache entries on other CPUs. By
> -	  enabling this option, Read or Write For Ownership in the ARMv6
> -	  DMA cache maintenance functions is performed. These LDR/STR
> -	  instructions change the cache line state to shared or modified
> -	  so that the cache operation has the desired effect.
> -
> -	  Note that the workaround is only valid on processors that do
> -	  not perform speculative loads into the D-cache. For such
> -	  processors, if cache maintenance operations are not broadcast
> -	  in hardware, other workarounds are needed (e.g. cache
> -	  maintenance broadcasting in software via FIQ).
> -
>  config OUTER_CACHE
>  	bool
>
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index abae7ff5defc..f6ee53c1de20 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -201,10 +201,6 @@ ENTRY(v6_flush_kern_dcache_area)
>   *	- end     - virtual end address of region
>   */
>  ENTRY(v6_dma_inv_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrb	r2, [r0]			@ read for ownership
> -	strb	r2, [r0]			@ write for ownership
> -#endif
>  	tst	r0, #D_CACHE_LINE_SIZE - 1
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  #ifdef HARVARD_CACHE
> @@ -213,10 +209,6 @@ ENTRY(v6_dma_inv_range)
>  	mcrne	p15, 0, r0, c7, c11, 1		@ clean unified line
>  #endif
>  	tst	r1, #D_CACHE_LINE_SIZE - 1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrbne	r2, [r1, #-1]			@ read for ownership
> -	strbne	r2, [r1, #-1]			@ write for ownership
> -#endif
>  	bic	r1, r1, #D_CACHE_LINE_SIZE - 1
>  #ifdef HARVARD_CACHE
>  	mcrne	p15, 0, r1, c7, c14, 1		@ clean & invalidate D line
> @@ -231,10 +223,6 @@ ENTRY(v6_dma_inv_range)
>  #endif
>  	add	r0, r0, #D_CACHE_LINE_SIZE
>  	cmp	r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrlo	r2, [r0]			@ read for ownership
> -	strlo	r2, [r0]			@ write for ownership
> -#endif
>  	blo	1b
>  	mov	r0, #0
>  	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
> @@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range)
>  ENTRY(v6_dma_clean_range)
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  1:
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldr	r2, [r0]			@ read for ownership
> -#endif
>  #ifdef HARVARD_CACHE
>  	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
>  #else
> @@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range)
>   *	- end     - virtual end address of region
>   */
>  ENTRY(v6_dma_flush_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrb	r2, [r0]			@ read for ownership
> -	strb	r2, [r0]			@ write for ownership
> -#endif
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  1:
>  #ifdef HARVARD_CACHE
> @@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range)
>  #endif
>  	add	r0, r0, #D_CACHE_LINE_SIZE
>  	cmp	r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrblo	r2, [r0]			@ read for ownership
> -	strblo	r2, [r0]			@ write for ownership
> -#endif
>  	blo	1b
>  	mov	r0, #0
>  	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
> @@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area)
>  	add	r1, r1, r0
>  	teq	r2, #DMA_FROM_DEVICE
>  	beq	v6_dma_inv_range
> -#ifndef CONFIG_DMA_CACHE_RWFO
>  	b	v6_dma_clean_range
> -#else
> -	teq	r2, #DMA_TO_DEVICE
> -	beq	v6_dma_clean_range
> -	b	v6_dma_flush_range
> -#endif
>  ENDPROC(v6_dma_map_area)
>
>  /*
> @@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area)
>   *	- dir	- DMA direction
>   */
>  ENTRY(v6_dma_unmap_area)
> -#ifndef CONFIG_DMA_CACHE_RWFO
>  	add	r1, r1, r0
>  	teq	r2, #DMA_TO_DEVICE
>  	bne	v6_dma_inv_range
> -#endif
>  	ret	lr
>  ENDPROC(v6_dma_unmap_area)
>
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-30 7:48 ` Neil Armstrong
` (3 preceding siblings ...)
(?)
@ 2023-03-30 10:03 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-30 10:03 UTC (permalink / raw)
To: Neil Armstrong, Arnd Bergmann, linux-kernel
Cc: Vineet Gupta, Russell King, Linus Walleij, Catalin Marinas,
Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek,
Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller,
Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt,
Rich Felker, John Paul Adrian Glaubitz, David S . Miller,
Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar,
Conor.Dooley, linux-snps-arc, linux-arm-kernel,
linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon,
linux-m68k, linux-mips, linux-openrisc@vger.kernel.org,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Daniel Golle

On Thu, Mar 30, 2023, at 09:48, Neil Armstrong wrote:
> On 27/03/2023 14:13, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> The cache management operations for noncoherent DMA on ARMv6 work
>> in two different ways:
>>
>> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>>   DMA buffers lead to data corruption when the prefetched data is written
>>   back on top of data from the device.
>>
>> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>>   is not seen by the other core(s), leading to inconsistent contents
>>   across the system.
>>
>> As a consequence, neither configuration is actually safe to use in a
>> general-purpose kernel that is used on both MPCore systems and ARM1176
>> with prefetching enabled.
>>
>> We could add further workarounds to make the behavior more dynamic based
>> on the system, but realistically, there are close to zero remaining
>> users on any ARM11MPCore anyway, and nobody seems too interested in it,
>> compared to the more popular ARM1176 used in BCM2835 and AST2500.
>>
>> The Oxnas platform has some minimal support in OpenWRT, but most of the
>> drivers and dts files never made it into the mainline kernel, while the
>> Arm Versatile/Realview platform mainly serves as a reference system but
>> is not necessary to be kept working once all other ARM11MPCore are gone.
>
> Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
>
> It's sad but it's the reality, there's no chance full OXNAS support will
> ever come upstream and no real work has been done for years.
>
> I think OXNAS support can be scheduled for removal for next release,
> it would need significant work to rework current support to make it acceptable
> before trying to upstream missing bits anyway.

Ok, thanks for your reply!

To clarify, do you think we should plan for removal after the next
stable release (6.3, removed in 6.4), or after the next LTS
release (probably 6.6, removed in 6.7)? As far as I understand,
the next OpenWRT release (23.x) will be based on linux-5.15,
and the one after that (24.x) would likely still use 6.1, unless
they skip an LTS kernel.

Arnd

^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-30 10:03 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-30 16:40 ` Neil Armstrong
-1 siblings, 0 replies; 456+ messages in thread
From: Neil Armstrong @ 2023-03-30 16:40 UTC (permalink / raw)
To: Arnd Bergmann, Arnd Bergmann, linux-kernel
Cc: Vineet Gupta, Russell King, Linus Walleij, Catalin Marinas,
Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek,
Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller,
Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt,
Rich Felker, John Paul Adrian Glaubitz, David S . Miller,
Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar,
Conor.Dooley, linux-snps-arc, linux-arm-kernel,
linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon,
linux-m68k, linux-mips, linux-openrisc@vger.kernel.org,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Daniel Golle

On 30/03/2023 at 12:03, Arnd Bergmann wrote:
> On Thu, Mar 30, 2023, at 09:48, Neil Armstrong wrote:
>> On 27/03/2023 14:13, Arnd Bergmann wrote:
>>> From: Arnd Bergmann <arnd@arndb.de>
>>>
>>> The cache management operations for noncoherent DMA on ARMv6 work
>>> in two different ways:
>>>
>>> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>>>   DMA buffers lead to data corruption when the prefetched data is written
>>>   back on top of data from the device.
>>>
>>> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>>>   is not seen by the other core(s), leading to inconsistent contents
>>>   across the system.
>>>
>>> As a consequence, neither configuration is actually safe to use in a
>>> general-purpose kernel that is used on both MPCore systems and ARM1176
>>> with prefetching enabled.
>>>
>>> We could add further workarounds to make the behavior more dynamic based
>>> on the system, but realistically, there are close to zero remaining
>>> users on any ARM11MPCore anyway, and nobody seems too interested in it,
>>> compared to the more popular ARM1176 used in BCM2835 and AST2500.
>>>
>>> The Oxnas platform has some minimal support in OpenWRT, but most of the
>>> drivers and dts files never made it into the mainline kernel, while the
>>> Arm Versatile/Realview platform mainly serves as a reference system but
>>> is not necessary to be kept working once all other ARM11MPCore are gone.
>>
>> Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
>>
>> It's sad but it's the reality, there's no chance full OXNAS support will
>> ever come upstream and no real work has been done for years.
>>
>> I think OXNAS support can be scheduled for removal for next release,
>> it would need significant work to rework current support to make it acceptable
>> before trying to upstream missing bits anyway.
>
> Ok, thanks for your reply!
>
> To clarify, do you think we should plan for removal after the next
> stable release (6.3, removed in 6.4), or after the next LTS
> release (probably 6.6, removed in 6.7)? As far as I understand,
> the next OpenWRT release (23.x) will be based on linux-5.15,
> and the one after that (24.x) would likely still use 6.1, unless
> they skip an LTS kernel.

I think it's OK to remove it ASAP, or at least before the next LTS:
not having SMP makes the platform barely usable, so the earlier the better.

Neil

>
> Arnd

^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore @ 2023-03-30 16:40 ` Neil Armstrong 0 siblings, 0 replies; 456+ messages in thread From: Neil Armstrong @ 2023-03-30 16:40 UTC (permalink / raw) To: Arnd Bergmann, Arnd Bergmann, linux-kernel Cc: Vineet Gupta, Russell King, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa, Daniel Golle Le 30/03/2023 à 12:03, Arnd Bergmann a écrit : > On Thu, Mar 30, 2023, at 09:48, Neil Armstrong wrote: >> On 27/03/2023 14:13, Arnd Bergmann wrote: >>> From: Arnd Bergmann <arnd@arndb.de> >>> >>> The cache management operations for noncoherent DMA on ARMv6 work >>> in two different ways: >>> >>> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight >>> DMA buffers lead to data corruption when the prefetched data is written >>> back on top of data from the device. >>> >>> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU >>> is not seen by the other core(s), leading to inconsistent contents >>> accross the system. >>> >>> As a consequence, neither configuration is actually safe to use in a >>> general-purpose kernel that is used on both MPCore systems and ARM1176 >>> with prefetching enabled. 
>>> >>> We could add further workarounds to make the behavior more dynamic based >>> on the system, but realistically, there are close to zero remaining >>> users on any ARM11MPCore anyway, and nobody seems too interested in it, >>> compared to the more popular ARM1176 used in BMC2835 and AST2500. >>> >>> The Oxnas platform has some minimal support in OpenWRT, but most of the >>> drivers and dts files never made it into the mainline kernel, while the >>> Arm Versatile/Realview platform mainly serves as a reference system but >>> is not necessary to be kept working once all other ARM11MPCore are gone. >> >> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> >> >> It's sad but it's the reality, there's no chance full OXNAS support will >> ever come upstream and no real work has been done for years. >> >> I think OXNAS support can be programmed for removal for next release, >> it would need significant work to rework current support to make it acceptable >> before trying to upstream missing bits anyway. > > Ok, thanks for your reply! > > To clarify, do you think we should plan for removal after the next > stable release (6.3, removed in 6.4), or after the next LTS > release (probably 6.6, removed in 6.7)? As far as I understand, > the next OpenWRT release (23.x) will be based on linux-5.15, > and the one after that (24.x) would likely still use 6.1, unless > they skip an LTS kernel. I think it's ok to remove it ASAP, or at least before the next LTS, not having SMP makes the platform barely usable so the earliest is the best. Neil > > Arnd _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore @ 2023-03-30 16:40 ` Neil Armstrong 0 siblings, 0 replies; 456+ messages in thread From: Neil Armstrong @ 2023-03-30 16:40 UTC (permalink / raw) To: Arnd Bergmann, Arnd Bergmann, linux-kernel Cc: Vineet Gupta, Russell King, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, "David S . Miller" <dave> Le 30/03/2023 à 12:03, Arnd Bergmann a écrit : > On Thu, Mar 30, 2023, at 09:48, Neil Armstrong wrote: >> On 27/03/2023 14:13, Arnd Bergmann wrote: >>> From: Arnd Bergmann <arnd@arndb.de> >>> >>> The cache management operations for noncoherent DMA on ARMv6 work >>> in two different ways: >>> >>> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight >>> DMA buffers lead to data corruption when the prefetched data is written >>> back on top of data from the device. >>> >>> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU >>> is not seen by the other core(s), leading to inconsistent contents >>> accross the system. >>> >>> As a consequence, neither configuration is actually safe to use in a >>> general-purpose kernel that is used on both MPCore systems and ARM1176 >>> with prefetching enabled. >>> >>> We could add further workarounds to make the behavior more dynamic based >>> on the system, but realistically, there are close to zero remaining >>> users on any ARM11MPCore anyway, and nobody seems too interested in it, >>> compared to the more popular ARM1176 used in BMC2835 and AST2500. 
>>> >>> The Oxnas platform has some minimal support in OpenWRT, but most of the >>> drivers and dts files never made it into the mainline kernel, while the >>> Arm Versatile/Realview platform mainly serves as a reference system but >>> is not necessary to be kept working once all other ARM11MPCore are gone. >> >> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> >> >> It's sad but it's the reality, there's no chance full OXNAS support will >> ever come upstream and no real work has been done for years. >> >> I think OXNAS support can be programmed for removal for next release, >> it would need significant work to rework current support to make it acceptable >> before trying to upstream missing bits anyway. > > Ok, thanks for your reply! > > To clarify, do you think we should plan for removal after the next > stable release (6.3, removed in 6.4), or after the next LTS > release (probably 6.6, removed in 6.7)? As far as I understand, > the next OpenWRT release (23.x) will be based on linux-5.15, > and the one after that (24.x) would likely still use 6.1, unless > they skip an LTS kernel. I think it's ok to remove it ASAP, or at least before the next LTS, not having SMP makes the platform barely usable so the earliest is the best. Neil > > Arnd ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-30 8:12 ` Linus Walleij
-1 siblings, 0 replies; 456+ messages in thread
From: Linus Walleij @ 2023-03-30 8:12 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen,
Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy,
Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy,
Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel,
linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh,
sparclinux, linux-xtensa, Daniel Golle

On Mon, Mar 27, 2023 at 2:16 PM Arnd Bergmann <arnd@kernel.org> wrote:

> From: Arnd Bergmann <arnd@arndb.de>
>
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
>
> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>   DMA buffers lead to data corruption when the prefetched data is written
>   back on top of data from the device.
>
> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>   is not seen by the other core(s), leading to inconsistent contents
>   across the system.
>
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.
>
> We could add further workarounds to make the behavior more dynamic based
> on the system, but realistically, there are close to zero remaining
> users on any ARM11MPCore anyway, and nobody seems too interested in it,
> compared to the more popular ARM1176 used in BCM2835 and AST2500.
>
> The Oxnas platform has some minimal support in OpenWRT, but most of the
> drivers and dts files never made it into the mainline kernel, while the
> Arm Versatile/Realview platform mainly serves as a reference system but
> is not necessary to be kept working once all other ARM11MPCore are gone.
>
> Take the easy way out here and drop support for multiprocessing on
> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> management implementation for it. This also helps with other ARMv6
> issues, but for the moment leaves the ability to build a kernel that
> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> want to stop supporting as well, but not as part of this series.
>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Daniel Golle <daniel@makrotopia.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: linux-oxnas@groups.io
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Yeah, we discussed this earlier, let's just drop it.
Not worth the effort.
Acked-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 456+ messages in thread
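The two failure modes in the quoted changelog, and the read/write-for-ownership trick that the removed DMA_CACHE_RWFO help text describes, can be illustrated with a toy model. What follows is a hypothetical, heavily simplified sketch, not kernel code. It assumes two CPUs that each hold one write-back cache line for a single buffer word. Ordinary loads and stores are kept coherent by a MESI-style snoop unit, while clean and invalidate act only on the local cache, which is the ARM11MPCore behaviour the help text attributes to the SCU. Every name in it (toy_system, cpu_load, dma_to_device, ...) is invented for illustration. It does not reproduce the real SCU protocol, and it ignores speculative prefetching, so it says nothing about the ARM1176 half of the problem.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model: one DMA buffer word, two CPUs, one write-back
 * cache line per CPU. Loads and stores are coherent (MESI-style snooping),
 * but clean/invalidate only act on the local CPU's cache.
 */
enum line_state { INVALID, SHARED, MODIFIED };

struct cache_line {
	enum line_state state;
	int data;
};

struct toy_system {
	int memory;			/* the buffer word in RAM */
	struct cache_line cpu[2];	/* each CPU's cached copy of it */
};

/* Coherent load: a MODIFIED copy in the other CPU is written back first. */
static int cpu_load(struct toy_system *s, int c)
{
	int other = 1 - c;

	assert(c == 0 || c == 1);
	if (s->cpu[c].state != INVALID)
		return s->cpu[c].data;
	if (s->cpu[other].state == MODIFIED) {
		s->memory = s->cpu[other].data;
		s->cpu[other].state = SHARED;
	}
	s->cpu[c].data = s->memory;
	s->cpu[c].state = SHARED;
	return s->cpu[c].data;
}

/* Coherent store: the other CPU's copy is invalidated (the line is one
 * word, so the store overwrites all of it). */
static void cpu_store(struct toy_system *s, int c, int value)
{
	assert(c == 0 || c == 1);
	s->cpu[1 - c].state = INVALID;
	s->cpu[c].data = value;
	s->cpu[c].state = MODIFIED;
}

/* Local-only maintenance: nothing is broadcast to the other CPU. */
static void cache_clean(struct toy_system *s, int c)
{
	if (s->cpu[c].state == MODIFIED) {
		s->memory = s->cpu[c].data;
		s->cpu[c].state = SHARED;
	}
}

static void cache_inval(struct toy_system *s, int c)
{
	s->cpu[c].state = INVALID;
}

/* CPU1 fills the buffer, CPU0 cleans it for DMA to the device.
 * Returns the value the device reads from RAM. */
static int dma_to_device(bool rwfo)
{
	struct toy_system s = { .memory = 0 };

	cpu_store(&s, 1, 42);		/* dirty line sits in CPU1's cache */
	if (rwfo)
		(void)cpu_load(&s, 0);	/* "ldr" read for ownership on CPU0 */
	cache_clean(&s, 0);		/* clean on CPU0 only */
	return s.memory;
}

/* CPU1 holds a clean copy of the old data, CPU0 invalidates the buffer for
 * DMA from the device, the device writes RAM, then CPU1 reads the buffer.
 * Returns the value CPU1 sees. */
static int dma_from_device(bool rwfo)
{
	struct toy_system s = { .memory = 7 };

	(void)cpu_load(&s, 1);
	if (rwfo) {
		int v = cpu_load(&s, 0);	/* "ldrb" read for ownership */

		cpu_store(&s, 0, v);		/* "strb" write for ownership */
	}
	cache_inval(&s, 0);		/* invalidate on CPU0 only */
	s.memory = 99;			/* device writes the buffer */
	return cpu_load(&s, 1);
}
```

In this model, dma_to_device(false) returns 0 because the dirty line never leaves CPU1's cache, and dma_from_device(false) returns 7 because CPU1 keeps reading its stale copy. With rwfo set, the two calls return 42 and 99. The extra load/store only helps because a coherent access pulls the line over to the CPU doing the maintenance. As the removed help text notes, that reasoning is only valid on cores that do not speculatively reload the line into the D-cache.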
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-27 12:13 ` Arnd Bergmann
` (6 preceding siblings ...)
(?)
@ 2023-03-30 11:28 ` Joel Stanley
2023-03-31 12:54 ` Arnd Bergmann
-1 siblings, 1 reply; 456+ messages in thread
From: Joel Stanley @ 2023-03-30 11:28 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: linux-arm-kernel, Andrew Jeffery

On Mon, 27 Mar 2023 at 12:18, Arnd Bergmann <arnd@kernel.org> wrote:

> Take the easy way out here and drop support for multiprocessing on
> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> management implementation for it. This also helps with other ARMv6
> issues, but for the moment leaves the ability to build a kernel that
> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> want to stop supporting as well, but not as part of this series.

Why's that? I currently build a kernel for the ast2600 (dual core
cortex a7) and ast2500 (arm1176).

Cheers,

Joel

>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Daniel Golle <daniel@makrotopia.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: linux-oxnas@groups.io
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> I could use some help clarifying the above changelog text to describe
> the exact problem, and how the CONFIG_DMA_CACHE_RWFO actually works on
> MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching
> into the instruction cache, not the data cache, but this can end up in
> the outercache as a result. The 1176 has some extra control bits to
> control prefetching, but I found no reference that explains why an
> MPCore does not run into the problem.
> --- > arch/arm/mach-oxnas/Kconfig | 4 - > arch/arm/mach-oxnas/Makefile | 1 - > arch/arm/mach-oxnas/headsmp.S | 23 ------ > arch/arm/mach-oxnas/platsmp.c | 96 ---------------------- > arch/arm/mach-versatile/platsmp-realview.c | 4 - > arch/arm/mm/Kconfig | 19 ----- > arch/arm/mm/cache-v6.S | 31 ------- > 7 files changed, 178 deletions(-) > delete mode 100644 arch/arm/mach-oxnas/headsmp.S > delete mode 100644 arch/arm/mach-oxnas/platsmp.c > > diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig > index a9ded7079268..a054235c3d6c 100644 > --- a/arch/arm/mach-oxnas/Kconfig > +++ b/arch/arm/mach-oxnas/Kconfig > @@ -28,10 +28,6 @@ config MACH_OX820 > bool "Support OX820 Based Products" > depends on ARCH_MULTI_V6 > select ARM_GIC > - select DMA_CACHE_RWFO if SMP > - select HAVE_SMP > - select HAVE_ARM_SCU if SMP > - select HAVE_ARM_TWD if SMP > help > Include Support for the Oxford Semiconductor OX820 SoC Based Products. > > diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile > index 0e78ecfe6c49..a4e40e534e6a 100644 > --- a/arch/arm/mach-oxnas/Makefile > +++ b/arch/arm/mach-oxnas/Makefile > @@ -1,2 +1 @@ > # SPDX-License-Identifier: GPL-2.0-only > -obj-$(CONFIG_SMP) += platsmp.o headsmp.o > diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S > deleted file mode 100644 > index 9c0f1479f33a..000000000000 > --- a/arch/arm/mach-oxnas/headsmp.S > +++ /dev/null > @@ -1,23 +0,0 @@ > -/* SPDX-License-Identifier: GPL-2.0-only */ > -/* > - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com> > - * Copyright (c) 2003 ARM Limited > - * All Rights Reserved > - */ > -#include <linux/linkage.h> > -#include <linux/init.h> > - > - __INIT > - > -/* > - * OX820 specific entry point for secondary CPUs. 
> - */ > -ENTRY(ox820_secondary_startup) > - mov r4, #0 > - /* invalidate both caches and branch target cache */ > - mcr p15, 0, r4, c7, c7, 0 > - /* > - * we've been released from the holding pen: secondary_stack > - * should now contain the SVC stack for this core > - */ > - b secondary_startup > diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c > deleted file mode 100644 > index f0a50b9e61df..000000000000 > --- a/arch/arm/mach-oxnas/platsmp.c > +++ /dev/null > @@ -1,96 +0,0 @@ > -// SPDX-License-Identifier: GPL-2.0-only > -/* > - * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com> > - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com> > - * Copyright (C) 2002 ARM Ltd. > - * All Rights Reserved > - */ > -#include <linux/io.h> > -#include <linux/delay.h> > -#include <linux/of.h> > -#include <linux/of_address.h> > - > -#include <asm/cacheflush.h> > -#include <asm/cp15.h> > -#include <asm/smp_plat.h> > -#include <asm/smp_scu.h> > - > -extern void ox820_secondary_startup(void); > - > -static void __iomem *cpu_ctrl; > -static void __iomem *gic_cpu_ctrl; > - > -#define HOLDINGPEN_CPU_OFFSET 0xc8 > -#define HOLDINGPEN_LOCATION_OFFSET 0xc4 > - > -#define GIC_NCPU_OFFSET(cpu) (0x100 + (cpu)*0x100) > -#define GIC_CPU_CTRL 0x00 > -#define GIC_CPU_CTRL_ENABLE 1 > - > -static int __init ox820_boot_secondary(unsigned int cpu, > - struct task_struct *idle) > -{ > - /* > - * Write the address of secondary startup into the > - * system-wide flags register. The BootMonitor waits > - * until it receives a soft interrupt, and then the > - * secondary CPU branches to this address. 
> - */
> - writel(virt_to_phys(ox820_secondary_startup),
> - cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
> -
> - writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
> -
> - /*
> - * Enable GIC cpu interface in CPU Interface Control Register
> - */
> - writel(GIC_CPU_CTRL_ENABLE,
> - gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
> -
> - /*
> - * Send the secondary CPU a soft interrupt, thereby causing
> - * the boot monitor to read the system wide flags register,
> - * and branch to the address found there.
> - */
> - arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> -
> - return 0;
> -}
> -
> -static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
> -{
> - struct device_node *np;
> - void __iomem *scu_base;
> -
> - np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
> - scu_base = of_iomap(np, 0);
> - of_node_put(np);
> - if (!scu_base)
> - return;
> -
> - /* Remap CPU Interrupt Interface Registers */
> - np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
> - gic_cpu_ctrl = of_iomap(np, 1);
> - of_node_put(np);
> - if (!gic_cpu_ctrl)
> - goto unmap_scu;
> -
> - np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
> - cpu_ctrl = of_iomap(np, 0);
> - of_node_put(np);
> - if (!cpu_ctrl)
> - goto unmap_scu;
> -
> - scu_enable(scu_base);
> - flush_cache_all();
> -
> -unmap_scu:
> - iounmap(scu_base);
> -}
> -
> -static const struct smp_operations ox820_smp_ops __initconst = {
> - .smp_prepare_cpus = ox820_smp_prepare_cpus,
> - .smp_boot_secondary = ox820_boot_secondary,
> -};
> -
> -CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
> diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
> index 5d363385c801..fa31fd2d211d 100644
> --- a/arch/arm/mach-versatile/platsmp-realview.c
> +++ b/arch/arm/mach-versatile/platsmp-realview.c
> @@ -18,16 +18,12 @@
> #define REALVIEW_SYS_FLAGSSET_OFFSET 0x30
>
> static const struct of_device_id realview_scu_match[] = {
> - { .compatible = "arm,arm11mp-scu", },
> { .compatible = "arm,cortex-a9-scu", },
> { .compatible = "arm,cortex-a5-scu", },
> { }
> };
>
> static const struct of_device_id realview_syscon_match[] = {
> - { .compatible = "arm,core-module-integrator", },
> - { .compatible = "arm,realview-eb-syscon", },
> - { .compatible = "arm,realview-pb11mp-syscon", },
> { .compatible = "arm,realview-pbx-syscon", },
> { },
> };
> diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
> index c5bbae86f725..16b62bc0a970 100644
> --- a/arch/arm/mm/Kconfig
> +++ b/arch/arm/mm/Kconfig
> @@ -937,25 +937,6 @@ config VDSO
> You must have glibc 2.22 or later for programs to seamlessly
> take advantage of this.
>
> -config DMA_CACHE_RWFO
> - bool "Enable read/write for ownership DMA cache maintenance"
> - depends on CPU_V6K && SMP
> - default y
> - help
> - The Snoop Control Unit on ARM11MPCore does not detect the
> - cache maintenance operations and the dma_{map,unmap}_area()
> - functions may leave stale cache entries on other CPUs. By
> - enabling this option, Read or Write For Ownership in the ARMv6
> - DMA cache maintenance functions is performed. These LDR/STR
> - instructions change the cache line state to shared or modified
> - so that the cache operation has the desired effect.
> -
> - Note that the workaround is only valid on processors that do
> - not perform speculative loads into the D-cache. For such
> - processors, if cache maintenance operations are not broadcast
> - in hardware, other workarounds are needed (e.g. cache
> - maintenance broadcasting in software via FIQ).
> -
> config OUTER_CACHE
> bool
>
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index abae7ff5defc..f6ee53c1de20 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -201,10 +201,6 @@ ENTRY(v6_flush_kern_dcache_area)
> * - end - virtual end address of region
> */
> ENTRY(v6_dma_inv_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> - ldrb r2, [r0] @ read for ownership
> - strb r2, [r0] @ write for ownership
> -#endif
> tst r0, #D_CACHE_LINE_SIZE - 1
> bic r0, r0, #D_CACHE_LINE_SIZE - 1
> #ifdef HARVARD_CACHE
> @@ -213,10 +209,6 @@ ENTRY(v6_dma_inv_range)
> mcrne p15, 0, r0, c7, c11, 1 @ clean unified line
> #endif
> tst r1, #D_CACHE_LINE_SIZE - 1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> - ldrbne r2, [r1, #-1] @ read for ownership
> - strbne r2, [r1, #-1] @ write for ownership
> -#endif
> bic r1, r1, #D_CACHE_LINE_SIZE - 1
> #ifdef HARVARD_CACHE
> mcrne p15, 0, r1, c7, c14, 1 @ clean & invalidate D line
> @@ -231,10 +223,6 @@ ENTRY(v6_dma_inv_range)
> #endif
> add r0, r0, #D_CACHE_LINE_SIZE
> cmp r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> - ldrlo r2, [r0] @ read for ownership
> - strlo r2, [r0] @ write for ownership
> -#endif
> blo 1b
> mov r0, #0
> mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
> @@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range)
> ENTRY(v6_dma_clean_range)
> bic r0, r0, #D_CACHE_LINE_SIZE - 1
> 1:
> -#ifdef CONFIG_DMA_CACHE_RWFO
> - ldr r2, [r0] @ read for ownership
> -#endif
> #ifdef HARVARD_CACHE
> mcr p15, 0, r0, c7, c10, 1 @ clean D line
> #else
> @@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range)
> * - end - virtual end address of region
> */
> ENTRY(v6_dma_flush_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> - ldrb r2, [r0] @ read for ownership
> - strb r2, [r0] @ write for ownership
> -#endif
> bic r0, r0, #D_CACHE_LINE_SIZE - 1
> 1:
> #ifdef HARVARD_CACHE
> @@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range)
> #endif
> add r0, r0, #D_CACHE_LINE_SIZE
> cmp r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> - ldrblo r2, [r0] @ read for ownership
> - strblo r2, [r0] @ write for ownership
> -#endif
> blo 1b
> mov r0, #0
> mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
> @@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area)
> add r1, r1, r0
> teq r2, #DMA_FROM_DEVICE
> beq v6_dma_inv_range
> -#ifndef CONFIG_DMA_CACHE_RWFO
> b v6_dma_clean_range
> -#else
> - teq r2, #DMA_TO_DEVICE
> - beq v6_dma_clean_range
> - b v6_dma_flush_range
> -#endif
> ENDPROC(v6_dma_map_area)
>
> /*
> @@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area)
> * - dir - DMA direction
> */
> ENTRY(v6_dma_unmap_area)
> -#ifndef CONFIG_DMA_CACHE_RWFO
> add r1, r1, r0
> teq r2, #DMA_TO_DEVICE
> bne v6_dma_inv_range
> -#endif
> ret lr
> ENDPROC(v6_dma_unmap_area)
>
> --
> 2.39.2
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 456+ messages in thread
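The v6_dma_inv_range hunks above show how the routine treats the cache lines at the edges of a buffer: a partially covered first line is only cleaned (it may also hold unrelated data the CPU has dirtied, which must be written back rather than discarded), a partially covered last line is cleaned and invalidated, and the loop ending in "blo 1b" walks the aligned lines in between. The C model below is a sketch of that address arithmetic only, not kernel code. Assumptions: the names (dma_inv_range_model, struct op_trace, the OP_* values) are hypothetical, the line size is a parameter standing in for D_CACHE_LINE_SIZE, and the invalidate in the loop body is taken from the rest of cache-v6.S because it is not part of the quoted hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPS 64

/* Operations recorded by this hypothetical model. */
enum cache_op { OP_CLEAN, OP_CLEAN_INV, OP_INV };

struct op_record {
	enum cache_op op;
	uintptr_t line;		/* line-aligned address */
};

struct op_trace {
	struct op_record ops[MAX_OPS];
	size_t n;
};

static void record(struct op_trace *t, enum cache_op op, uintptr_t line)
{
	if (t->n < MAX_OPS) {
		t->ops[t->n].op = op;
		t->ops[t->n].line = line;
		t->n++;
	}
}

/* Model of the address arithmetic in v6_dma_inv_range(start, end). */
static void dma_inv_range_model(uintptr_t start, uintptr_t end,
				uintptr_t line_size, struct op_trace *t)
{
	uintptr_t mask = line_size - 1;

	/* The tst/bic masks only work for a power-of-two line size. */
	assert(line_size != 0 && (line_size & mask) == 0);

	/* tst r0 / bic r0 / mcrne: clean a partial first line so that
	 * unrelated dirty data in it reaches memory before the invalidate. */
	if (start & mask)
		record(t, OP_CLEAN, start & ~mask);
	start &= ~mask;

	/* tst r1 / bic r1 / mcrne: clean and invalidate a partial last
	 * line here; the loop below stops before it. */
	if (end & mask)
		record(t, OP_CLEAN_INV, end & ~mask);
	end &= ~mask;

	/* "1: ... add r0; cmp r0, r1; blo 1b" is a do-while loop that
	 * invalidates each line from the aligned start up to, but not
	 * including, the aligned end. */
	do {
		record(t, OP_INV, start);
		start += line_size;
	} while (start < end);
}
```

For a buffer from 0x1004 to 0x1088 with 32-byte lines, the model records a clean of 0x1000, a clean and invalidate of 0x1080, and invalidates of 0x1000, 0x1020, 0x1040 and 0x1060.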
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-30 11:28 ` Joel Stanley
@ 2023-03-31 12:54 ` Arnd Bergmann
2023-04-05 1:49 ` Joel Stanley
0 siblings, 1 reply; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 12:54 UTC (permalink / raw)
To: Joel Stanley; +Cc: linux-arm-kernel, Andrew Jeffery
On Thu, Mar 30, 2023, at 13:28, Joel Stanley wrote:
> On Mon, 27 Mar 2023 at 12:18, Arnd Bergmann <arnd@kernel.org> wrote:
>
>> Take the easy way out here and drop support for multiprocessing on
>> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
>> management implementation for it. This also helps with other ARMv6
>> issues, but for the moment leaves the ability to build a kernel that
>> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
>> want to stop supporting as well, but not as part of this series.
>
> Why's that? I currently build a kernel for the ast2600 (dual core
> cortex a7) and ast2500 (arm1176).
There are actually three generations of ARMv6:
- "armv6" arm1136r0, in practice only omap2 and i.mx31
- "armv6k" arm1136r1 and arm1176, common in ast2500, bcm2835,
  i.mx35, and s3c64xx
- "armv6k" arm11mpcore in oxnas and realview
The early armv6 ones are halfway between armv5 and armv6k, which
causes a number of problems:
- cannot use 8-bit and 16-bit sub-word atomics (strexh etc) and
  double-word atomics (strexd) that may be needed by some drivers
- need to use cp15 barriers instead of isb/dsb etc, which breaks
  compatibility with armv8 hardware in a multiplatform kernel
- missing TLS register requires horrid workarounds for
  CURRENT_POINTER_IN_TPIDRURO.
- Running an SMP-enabled kernel on armv6 requires SMP_ON_UP
  (this is also required for at least one ARMv7 SoC)
One idea that we have discussed as a long-term solution for all
of the above would be to change support for armv6 from being
compatible with armv7 to being compatible with armv5.
arm11mpcore is currently tied to armv7 because of SMP support,
but once we drop SMP support for it, we have more freedom and
can move armv6 and armv6k uniprocessor support together with
armv5. I already did an initial patch last year, but it needs
more work to ensure that it correctly addresses everything that
currently assumes that armv6 cannot coexist with older cores in
the same kernel.
If this works out, we'll be able to have a combined ast2400/
ast2500/omap1/omap2/imx2/imx3 kernel separate from ast2600/
omap3/imx5, but hopefully not more kernels that anyone needs
to test.
For Debian, this would allow armel to ship a combined kernel for
Raspberry Pi and the ARMv5 parts in place of the two separate
kernels it currently has, but it's not clear there will actually
be another Debian armel release after Bookworm.
Arnd
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-31 12:54 ` Arnd Bergmann
@ 2023-04-05 1:49 ` Joel Stanley
0 siblings, 0 replies; 456+ messages in thread
From: Joel Stanley @ 2023-04-05 1:49 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: linux-arm-kernel, Andrew Jeffery
On Fri, 31 Mar 2023 at 12:54, Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Thu, Mar 30, 2023, at 13:28, Joel Stanley wrote:
> > On Mon, 27 Mar 2023 at 12:18, Arnd Bergmann <arnd@kernel.org> wrote:
> >
> >> Take the easy way out here and drop support for multiprocessing on
> >> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> >> management implementation for it. This also helps with other ARMv6
> >> issues, but for the moment leaves the ability to build a kernel that
> >> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> >> want to stop supporting as well, but not as part of this series.
> >
> > Why's that? I currently build a kernel for the ast2600 (dual core
> > cortex a7) and ast2500 (arm1176).
> If this works out, we'll be able to have a combined ast2400/
> ast2500/omap1/omap2/imx2/imx3 kernel separate from ast2600/
> omap3/imx5, but hopefully not more kernels that anyone needs
> to test.
Thanks for the explanation. This all makes sense.
Cheers,
Joel
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-30 11:51 ` Ard Biesheuvel
-1 siblings, 0 replies; 456+ messages in thread
From: Ard Biesheuvel @ 2023-03-30 11:51 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren,
Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Daniel Golle
On Mon, 27 Mar 2023 at 14:18, Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
>
> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
> DMA buffers lead to data corruption when the prefetched data is written
> back on top of data from the device.
>
> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
> is not seen by the other core(s), leading to inconsistent contents
> across the system.
>
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.
>
> We could add further workarounds to make the behavior more dynamic based
> on the system, but realistically, there are close to zero remaining
> users on any ARM11MPCore anyway, and nobody seems too interested in it,
> compared to the more popular ARM1176 used in BCM2835 and AST2500.
>
> The Oxnas platform has some minimal support in OpenWRT, but most of the
> drivers and dts files never made it into the mainline kernel, while the
> Arm Versatile/Realview platform mainly serves as a reference system but
> is not necessary to be kept working once all other ARM11MPCore are gone.
>
> Take the easy way out here and drop support for multiprocessing on
> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> management implementation for it. This also helps with other ARMv6
> issues, but for the moment leaves the ability to build a kernel that
> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> want to stop supporting as well, but not as part of this series.
>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Daniel Golle <daniel@makrotopia.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: linux-oxnas@groups.io
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
^ permalink raw reply [flat|nested] 456+ messages in thread
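The two behaviors described in the quoted commit message come from the v6_dma_map_area and v6_dma_unmap_area hunks in cache-v6.S, which pick a range routine per DMA direction depending on CONFIG_DMA_CACHE_RWFO. The C sketch below restates that dispatch for illustration; it is not kernel code. Assumptions: the function names and enum v6_range_op are hypothetical, and only the direction names follow the kernel's enum dma_data_direction. With the option enabled, the unmap step does nothing, so no CPU-side invalidate removes lines that were pulled into a cache while the device was writing. With it disabled, the unmap invalidate runs, but only on the local CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Direction names as in the kernel's enum dma_data_direction. */
enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Which cache-v6.S range routine runs; OP_NONE means only "ret lr". */
enum v6_range_op { OP_NONE, OP_INV_RANGE, OP_CLEAN_RANGE, OP_FLUSH_RANGE };

/* v6_dma_map_area(): maintenance before the device accesses the buffer. */
static enum v6_range_op v6_map_op(enum dma_dir dir, bool rwfo)
{
	assert(dir == DMA_BIDIRECTIONAL || dir == DMA_TO_DEVICE ||
	       dir == DMA_FROM_DEVICE);

	if (dir == DMA_FROM_DEVICE)
		return OP_INV_RANGE;		/* teq/beq v6_dma_inv_range */
	if (!rwfo)
		return OP_CLEAN_RANGE;		/* b v6_dma_clean_range */
	/* RWFO only: TO_DEVICE cleans, BIDIRECTIONAL does a full flush. */
	return dir == DMA_TO_DEVICE ? OP_CLEAN_RANGE : OP_FLUSH_RANGE;
}

/* v6_dma_unmap_area(): maintenance after the device is done. */
static enum v6_range_op v6_unmap_op(enum dma_dir dir, bool rwfo)
{
	if (rwfo)
		return OP_NONE;			/* body was under #ifndef */
	return dir != DMA_TO_DEVICE ? OP_INV_RANGE : OP_NONE;
}
```

After this patch only the rwfo == false behavior remains: clean or invalidate before the transfer, and invalidate again after any transfer that the device may have written.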
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-31 17:09 ` Catalin Marinas
-1 siblings, 0 replies; 456+ messages in thread
From: Catalin Marinas @ 2023-03-31 17:09 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Linus Walleij, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa, Daniel Golle
On Mon, Mar 27, 2023 at 02:13:14PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
>
> * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
> DMA buffers lead to data corruption when the prefetched data is written
> back on top of data from the device.
>
> * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
> is not seen by the other core(s), leading to inconsistent contents
> across the system.
>
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.
As the author of this terrible hack (created under duress ;))
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
IIRC, RWFO is working in combination with the cache operations.
Because the cache maintenance broadcast did not happen, we forced the
cache lines to migrate to a CPU via a write (for ownership) and doing
the cache maintenance on that CPU (that was the FROM_DEVICE case). For
the TO_DEVICE case, reading on a CPU would cause dirty lines on another
CPU to be evicted (or migrated as dirty to the current CPU IIRC) then
the cache maintenance to clean them to PoC on the local CPU.
But there's always a small window between read/write for ownership and
the actual cache maintenance which can cause a cache line to migrate to
other CPUs if they do speculative prefetches. At the time ARM11MPCore
was deemed safe-ish but I haven't followed what later implementations
actually did (luckily we fixed the architecture in ARMv7).
--
Catalin
^ permalink raw reply [flat|nested] 456+ messages in thread
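Catalin's description of read/write for ownership can be illustrated with a toy two-CPU model of a single cache line. This is a deliberately simplified MSI-style sketch, not a description of the ARM11MPCore Snoop Control Unit, and all names in it are hypothetical. Assumptions: a coherent load pulls a dirty copy out of the other CPU by writing it back (one of the two behaviors Catalin mentions), a coherent store invalidates the other CPU's copy, and clean/invalidate by address only affect the local CPU because they are not broadcast. The model leaves out the speculative-prefetch window between the ownership access and the maintenance, which is what made the workaround unsafe.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy MSI-style state of one cache line in one CPU (hypothetical model). */
enum line_state { INVALID, SHARED, MODIFIED };

struct toy_system {
	enum line_state state[2];	/* the line's state in CPU0 and CPU1 */
	int cached[2];			/* value held in each CPU's copy */
	int memory;			/* value at the point of coherency (PoC) */
};

/* Coherent load: a dirty copy in the other CPU is written back first. */
static int cpu_read(struct toy_system *s, int cpu)
{
	int other = !cpu;

	assert(cpu == 0 || cpu == 1);
	if (s->state[cpu] == INVALID) {
		if (s->state[other] == MODIFIED) {
			s->memory = s->cached[other];
			s->state[other] = SHARED;
		}
		s->cached[cpu] = s->memory;
		s->state[cpu] = SHARED;
	}
	return s->cached[cpu];
}

/* Coherent store: gaining ownership removes the other CPU's copy. */
static void cpu_write(struct toy_system *s, int cpu, int value)
{
	int other = !cpu;

	if (s->state[other] == MODIFIED)
		s->memory = s->cached[other];
	s->state[other] = INVALID;
	s->cached[cpu] = value;
	s->state[cpu] = MODIFIED;
}

/* Maintenance by address is not broadcast: it only sees the local copy. */
static void local_clean(struct toy_system *s, int cpu)
{
	if (s->state[cpu] == MODIFIED) {
		s->memory = s->cached[cpu];
		s->state[cpu] = SHARED;
	}
}

static void local_invalidate(struct toy_system *s, int cpu)
{
	s->state[cpu] = INVALID;	/* a dirty local copy would be lost */
}

/* DMA_TO_DEVICE map on CPU0 while CPU1 holds the newest data (42). */
static int to_device(bool rwfo)
{
	struct toy_system s = { { INVALID, MODIFIED }, { 0, 42 }, 0 };

	if (rwfo)
		(void)cpu_read(&s, 0);	/* ldr: read for ownership */
	local_clean(&s, 0);		/* clean D line on CPU0 only */
	return s.memory;		/* what the device reads */
}

/* DMA_FROM_DEVICE map on CPU0 while CPU1 holds a clean copy (7). */
static int from_device(bool rwfo)
{
	struct toy_system s = { { INVALID, SHARED }, { 0, 7 }, 7 };

	if (rwfo) {
		int v = cpu_read(&s, 0);	/* ldrb: read for ownership */
		cpu_write(&s, 0, v);		/* strb: write for ownership */
	}
	local_invalidate(&s, 0);		/* invalidate on CPU0 only */
	s.memory = 99;				/* the device writes the buffer */
	return cpu_read(&s, 1);			/* what CPU1 then sees */
}
```

In this model, to_device(false) returns the stale 0 because CPU0's clean never touches CPU1's dirty line, while to_device(true) returns 42. Likewise, from_device(false) leaves CPU1 reading its stale 7, and from_device(true) lets it see the device's 99.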
* Re: [PATCH 18/21] ARM: drop SMP support for ARM11MPCore @ 2023-03-31 17:09 ` Catalin Marinas 0 siblings, 0 replies; 456+ messages in thread From: Catalin Marinas @ 2023-03-31 17:09 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa, Daniel Golle On Mon, Mar 27, 2023 at 02:13:14PM +0200, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > The cache management operations for noncoherent DMA on ARMv6 work > in two different ways: > > * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight > DMA buffers lead to data corruption when the prefetched data is written > back on top of data from the device. > > * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU > is not seen by the other core(s), leading to inconsistent contents > accross the system. > > As a consequence, neither configuration is actually safe to use in a > general-purpose kernel that is used on both MPCore systems and ARM1176 > with prefetching enabled. As the author of this terrible hack (created under duress ;)) Acked-by: Catalin Marinas <catalin.marinas@arm.com> IIRC, RWFO is working in combination with the cache operations. Because the cache maintenance broadcast did not happen, we forced the cache lines to migrate to a CPU via a write (for ownership) and doing the cache maintenance on that CPU (that was the FROM_DEVICE case). 
For the TO_DEVICE case, reading on a CPU would cause dirty lines on another CPU to be evicted (or migrated as dirty to the current CPU IIRC) then the cache maintenance to clean them to PoC on the local CPU. But there's always a small window between read/write for ownership and the actual cache maintenance which can cause a cache line to migrate to other CPUs if they do speculative prefetches. At the time ARM11MPCore was deemed safe-ish but I haven't followed what later implementations actually did (luckily we fixed the architecture in ARMv7). -- Catalin _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* [PATCH 19/21] ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

From: Arnd Bergmann <arnd@arndb.de>

As the final step of the conversion to generic arch_sync_dma_* helpers,
change the Arm implementation to look the same as the new generic
version, by calling the dmac_{clean,inv,flush}_range low-level functions
instead of the abstracted dmac_{map,unmap}_area version.

On ARMv6/v7, this invalidates the caches after a DMA transfer from a
device because of speculative prefetching, while on earlier versions it
only needs to do this before the transfer.

This should not change any of the current behavior.

FIXME: address CONFIG_DMA_CACHE_RWFO properly.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/mm/dma-mapping-nommu.c | 11 +++----
 arch/arm/mm/dma-mapping.c       | 53 +++++++++++++++++++++++----------
 2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index cfd9c933d2f0..12b5c6ae93fc 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -16,12 +16,13 @@
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	dmac_map_area(__va(paddr), size, dir);
-
-	if (dir == DMA_FROM_DEVICE)
+	if (dir == DMA_FROM_DEVICE) {
+		dmac_inv_range(__va(paddr), __va(paddr + size));
 		outer_inv_range(paddr, paddr + size);
-	else
+	} else {
+		dmac_clean_range(__va(paddr), __va(paddr + size));
 		outer_clean_range(paddr, paddr + size);
+	}
 }
 
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
@@ -29,7 +30,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 {
 	if (dir != DMA_TO_DEVICE) {
 		outer_inv_range(paddr, paddr + size);
-		dmac_unmap_area(__va(paddr), size, dir);
+		dmac_inv_range(__va(paddr), __va(paddr + size));
 	}
 }
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ce4b74f34a58..cc702cb27ae7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -623,8 +623,7 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
 }
 
 static void dma_cache_maint(phys_addr_t paddr,
-	size_t size, enum dma_data_direction dir,
-	void (*op)(const void *, size_t, int))
+	size_t size, void (*op)(const void *, const void *))
 {
 	unsigned long pfn = PFN_DOWN(paddr);
 	unsigned long offset = paddr % PAGE_SIZE;
@@ -647,18 +646,18 @@ static void dma_cache_maint(phys_addr_t paddr,
 
 			if (cache_is_vipt_nonaliasing()) {
 				vaddr = kmap_atomic(page);
-				op(vaddr + offset, len, dir);
+				op(vaddr + offset, vaddr + offset + len);
 				kunmap_atomic(vaddr);
 			} else {
 				vaddr = kmap_high_get(page);
 				if (vaddr) {
-					op(vaddr + offset, len, dir);
+					op(vaddr + offset, vaddr + offset + len);
 					kunmap_high(page);
 				}
 			}
 		} else {
 			vaddr = page_address(page) + offset;
-			op(vaddr, len, dir);
+			op(vaddr, vaddr + len);
 		}
 		offset = 0;
 		pfn++;
@@ -666,6 +665,18 @@ static void dma_cache_maint(phys_addr_t paddr,
 	} while (left);
 }
 
+static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	if (IS_ENABLED(CONFIG_CPU_V6) ||
+	    IS_ENABLED(CONFIG_CPU_V6K) ||
+	    IS_ENABLED(CONFIG_CPU_V7) ||
+	    IS_ENABLED(CONFIG_CPU_V7M))
+		return true;
+
+	/* FIXME: runtime detection */
+	return false;
+}
+
 /*
  * Make an area consistent for devices.
  * Note: Drivers should NOT use this function directly.
@@ -674,25 +685,35 @@ static void dma_cache_maint(phys_addr_t paddr,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	dma_cache_maint(paddr, size, dir, dmac_map_area);
-
-	if (dir == DMA_FROM_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
-	} else {
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		dma_cache_maint(paddr, size, dmac_clean_range);
 		outer_clean_range(paddr, paddr + size);
+		break;
+	case DMA_FROM_DEVICE:
+		dma_cache_maint(paddr, size, dmac_inv_range);
+		outer_inv_range(paddr, paddr + size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
+			dma_cache_maint(paddr, size, dmac_clean_range);
+			outer_clean_range(paddr, paddr + size);
+		} else {
+			dma_cache_maint(paddr, size, dmac_flush_range);
+			outer_flush_range(paddr, paddr + size);
+		}
+		break;
+	default:
+		break;
 	}
-	/* FIXME: non-speculating: flush on bidirectional mappings? */
 }
 
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	/* FIXME: non-speculating: not required */
-	/* in any case, don't bother invalidating if DMA to device */
-	if (dir != DMA_TO_DEVICE) {
+	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
 		outer_inv_range(paddr, paddr + size);
-
-		dma_cache_maint(paddr, size, dir, dmac_unmap_area);
+		dma_cache_maint(paddr, size, dmac_inv_range);
 	}
 
 	/*
-- 
2.39.2

^ permalink raw reply related [flat|nested] 456+ messages in thread
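The direction handling that the patch introduces in arch/arm/mm/dma-mapping.c reduces to a small decision table. The standalone sketch below restates it. `op_for_device()`, `op_for_cpu()` and `enum cache_op` are hypothetical names, the `post_dma_flush` flag stands in for arch_sync_dma_cpu_needs_post_dma_flush(), and the outer-cache calls are left out because the patch pairs each one with the matching inner operation. The enum values match the kernel's `enum dma_data_direction`.

```c
#include <stdbool.h>

/* Same values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical labels for the dmac_*_range()/outer_*_range() pairs. */
enum cache_op { OP_NONE, OP_CLEAN, OP_INV, OP_FLUSH };

/* Maintenance done before the device accesses the buffer. */
static enum cache_op op_for_device(enum dma_data_direction dir,
				   bool post_dma_flush)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		/* Write dirty lines back so the device sees CPU data. */
		return OP_CLEAN;
	case DMA_FROM_DEVICE:
		/* Drop lines that could later be written over DMA data. */
		return OP_INV;
	case DMA_BIDIRECTIONAL:
		/* A CPU that invalidates again after the DMA only needs a
		 * clean now; otherwise clean and invalidate together. */
		return post_dma_flush ? OP_CLEAN : OP_FLUSH;
	default:
		return OP_NONE;
	}
}

/* Maintenance done before the CPU reads the buffer after the DMA. */
static enum cache_op op_for_cpu(enum dma_data_direction dir,
				bool post_dma_flush)
{
	/* Only CPUs that may have speculatively refetched lines during
	 * the transfer need a second invalidation. */
	if (dir != DMA_TO_DEVICE && post_dma_flush)
		return OP_INV;
	return OP_NONE;
}
```

Written this way, before the transfer only DMA_BIDIRECTIONAL depends on whether the CPU speculates, while the post-transfer invalidation is skipped entirely on CPUs that do not.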
* [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 12:13 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
The arm version of the arch_sync_dma_for_cpu() function annotates pages as
PG_dcache_clean after a DMA, but no other architecture does this here. On
ia64, the same thing is done in arch_dma_mark_clean(), so it makes sense to
use the same hook in order to have identical arch_sync_dma_for_cpu()
semantics as all other architectures.
Splitting this out has multiple effects:
- For dma-direct, this now gets called after arch_sync_dma_for_cpu()
for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While
it would not be harmful to keep doing it for bidirectional mappings,
those are apparently not used in any callers that care about the flag.
- Since arm has its own dma-iommu abstraction, this now also needs to
call the same function, so the calls are added there to mirror the
dma-direct version.
- Like dma-direct, the dma-iommu version now marks the dcache clean for
both coherent and noncoherent devices after a DMA, but it only does
this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL.
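The rule in the last two points (cache maintenance only for non-coherent devices, page marking only for DMA_FROM_DEVICE regardless of coherency) can be modeled in a few lines of userspace C. This is an illustrative sketch, not the patch itself: struct sync_actions and sync_for_cpu_actions() are hypothetical names that only record which of arch_sync_dma_for_cpu() and arch_dma_mark_clean() the new arm_iommu_sync_dma_for_cpu() helper would call. The enumerator values are assumed to match the kernel's enum dma_data_direction.

```c
#include <assert.h>
#include <stdbool.h>

/* Same names and values as the kernel's enum dma_data_direction (assumed). */
enum dma_dir {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Which calls the helper would make; hypothetical bookkeeping only. */
struct sync_actions {
	bool cache_sync;	/* arch_sync_dma_for_cpu() is called */
	bool mark_clean;	/* arch_dma_mark_clean() is called */
};

struct sync_actions sync_for_cpu_actions(enum dma_dir dir, bool dma_coherent)
{
	struct sync_actions a;

	assert(dir <= DMA_NONE);
	/* Coherent devices need no cache maintenance... */
	a.cache_sync = !dma_coherent;
	/* ...but pages are marked clean for both kinds of device. */
	a.mark_clean = (dir == DMA_FROM_DEVICE);
	return a;
}
```

In this model a DMA_BIDIRECTIONAL sync on a non-coherent device still performs cache maintenance but no longer marks any page clean, which is the behavior change the message describes.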
[ HELP NEEDED: can anyone confirm that it is a correct assumption
on arm that a cache-coherent device writing to a page always results
in it being in a PG_dcache_clean state like on ia64, or can a device
write directly into the dcache?]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/Kconfig          |  1 +
 arch/arm/mm/dma-mapping.c | 71 +++++++++++++++++++++++----------------
 2 files changed, 43 insertions(+), 29 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12f..125d58c54ab1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM
 	select ARCH_HAS_BINFMT_FLAT
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_DEBUG_VIRTUAL if MMU
+	select ARCH_HAS_DMA_MARK_CLEAN if MMU
 	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cc702cb27ae7..b703cb83d27e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -665,6 +665,28 @@ static void dma_cache_maint(phys_addr_t paddr,
 	} while (left);
 }
 
+/*
+ * Mark the D-cache clean for these pages to avoid extra flushing.
+ */
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
+{
+	unsigned long pfn = PFN_UP(paddr);
+	unsigned long off = paddr & (PAGE_SIZE - 1);
+	size_t left = size;
+
+	if (size < PAGE_SIZE)
+		return;
+
+	if (off)
+		left -= PAGE_SIZE - off;
+
+	while (left >= PAGE_SIZE) {
+		struct page *page = pfn_to_page(pfn++);
+		set_bit(PG_dcache_clean, &page->flags);
+		left -= PAGE_SIZE;
+	}
+}
+
 static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
 {
 	if (IS_ENABLED(CONFIG_CPU_V6) ||
@@ -715,24 +737,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		outer_inv_range(paddr, paddr + size);
 		dma_cache_maint(paddr, size, dmac_inv_range);
 	}
-
-	/*
-	 * Mark the D-cache clean for these pages to avoid extra flushing.
-	 */
-	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
-		unsigned long pfn = PFN_UP(paddr);
-		unsigned long off = paddr & (PAGE_SIZE - 1);
-		size_t left = size;
-
-		if (off)
-			left -= PAGE_SIZE - off;
-
-		while (left >= PAGE_SIZE) {
-			struct page *page = pfn_to_page(pfn++);
-			set_bit(PG_dcache_clean, &page->flags);
-			left -= PAGE_SIZE;
-		}
-	}
 }
 
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
@@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
 	return -EINVAL;
 }
 
+static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
+				       enum dma_data_direction dir,
+				       bool dma_coherent)
+{
+	if (!dma_coherent)
+		arch_sync_dma_for_cpu(phys, len, dir);
+
+	if (dir == DMA_FROM_DEVICE)
+		arch_dma_mark_clean(phys, len);
+}
+
 /**
  * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
  * @dev: valid struct device pointer
@@ -1316,8 +1331,9 @@ static void arm_iommu_unmap_sg(struct device *dev,
 		if (sg_dma_len(s))
 			__iommu_remove_mapping(dev, sg_dma_address(s),
 					       sg_dma_len(s));
-		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-			arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
+		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+			arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
+						   dev->dma_coherent);
 	}
 }
 
@@ -1335,12 +1351,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *s;
 	int i;
 
-	if (dev->dma_coherent)
-		return;
-
 	for_each_sg(sg, s, nents, i)
-		arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
-
+		arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
+					   dev->dma_coherent);
 }
 
 /**
@@ -1425,9 +1438,9 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
 	if (!iova)
 		return;
 
-	if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
 		phys = iommu_iova_to_phys(mapping->domain, handle);
-		arch_sync_dma_for_cpu(phys, size, dir);
+		arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
 	}
 
 	iommu_unmap(mapping->domain, iova, len);
@@ -1497,11 +1510,11 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
 	phys_addr_t phys;
 
-	if (dev->dma_coherent || !(handle & PAGE_MASK))
+	if (!(handle & PAGE_MASK))
 		return;
 
 	phys = iommu_iova_to_phys(mapping->domain, handle);
-	arch_sync_dma_for_cpu(phys, size, dir);
+	arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
 }
 
 static void arm_iommu_sync_single_for_device(struct device *dev,
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 456+ messages in thread
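The page arithmetic in the new arch_dma_mark_clean() only touches pages that lie entirely inside the buffer. The following userspace sketch repeats that arithmetic so it can be checked in isolation; fully_covered_pages() and MODEL_PAGE_SIZE are hypothetical names, a 4 KiB page is assumed, and counting pages stands in for setting PG_dcache_clean on each struct page.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB page; the kernel's PAGE_SIZE depends on the configuration. */
#define MODEL_PAGE_SIZE 4096UL

/*
 * Same arithmetic as arch_dma_mark_clean(): return how many pages lie
 * entirely inside [paddr, paddr + size) and store the first candidate
 * PFN in *first_pfn.
 */
size_t fully_covered_pages(unsigned long paddr, size_t size,
			   unsigned long *first_pfn)
{
	unsigned long off = paddr & (MODEL_PAGE_SIZE - 1);
	size_t left = size;
	size_t count = 0;

	assert(first_pfn != NULL);
	/* PFN_UP(): round the start up to the next page boundary */
	*first_pfn = (paddr + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

	if (size < MODEL_PAGE_SIZE)
		return 0;

	/* Skip the partial head page; size >= page size, so no underflow. */
	if (off)
		left -= MODEL_PAGE_SIZE - off;

	/* The kernel sets PG_dcache_clean on each of these pages. */
	while (left >= MODEL_PAGE_SIZE) {
		count++;
		left -= MODEL_PAGE_SIZE;
	}
	return count;
}
```

With paddr = 0x1800 and size = 0x3000, the buffer spans [0x1800, 0x4800): the first PFN is 2 and two pages (PFNs 2 and 3) are fully covered, while the partial pages 1 and 4 are skipped, presumably because bytes outside the buffer may still have dirty cache lines that the DMA sync never touched.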
* [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper @ 2023-03-27 12:13 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw) To: linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa From: Arnd Bergmann <arnd@arndb.de> The arm version of the arch_sync_dma_for_cpu() function annotates pages as PG_dcache_clean after a DMA, but no other architecture does this here. On ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense to use the same hook in order to have identical arch_sync_dma_for_cpu() semantics as all other architectures. Splitting this out has multiple effects: - for dma-direct, this now gets called after arch_sync_dma_for_cpu() for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While it would not be harmful to keep doing it for bidirectional mappings, those are apparently not used in any callers that care about the flag. - Since arm has its own dma-iommu abstraction, this now also needs to call the same function, so the calls are added there to mirror the dma-direct version. - Like dma-direct, the dma-iommu version now marks the dcache clean for both coherent and noncoherent devices after a DMA, but it only does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL. 
[ HELP NEEDED: can anyone confirm that it is a correct assumption
on arm that a cache-coherent device writing to a page always results
in it being in a PG_dcache_clean state like on ia64, or can a device
write directly into the dcache?]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/Kconfig          |  1 +
 arch/arm/mm/dma-mapping.c | 71 +++++++++++++++++++++++----------------
 2 files changed, 43 insertions(+), 29 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12f..125d58c54ab1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM
 	select ARCH_HAS_BINFMT_FLAT
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_DEBUG_VIRTUAL if MMU
+	select ARCH_HAS_DMA_MARK_CLEAN if MMU
 	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cc702cb27ae7..b703cb83d27e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -665,6 +665,28 @@ static void dma_cache_maint(phys_addr_t paddr,
 	} while (left);
 }
 
+/*
+ * Mark the D-cache clean for these pages to avoid extra flushing.
+ */
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
+{
+	unsigned long pfn = PFN_UP(paddr);
+	unsigned long off = paddr & (PAGE_SIZE - 1);
+	size_t left = size;
+
+	if (size < PAGE_SIZE)
+		return;
+
+	if (off)
+		left -= PAGE_SIZE - off;
+
+	while (left >= PAGE_SIZE) {
+		struct page *page = pfn_to_page(pfn++);
+		set_bit(PG_dcache_clean, &page->flags);
+		left -= PAGE_SIZE;
+	}
+}
+
 static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
 {
 	if (IS_ENABLED(CONFIG_CPU_V6) ||
@@ -715,24 +737,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		outer_inv_range(paddr, paddr + size);
 		dma_cache_maint(paddr, size, dmac_inv_range);
 	}
-
-	/*
-	 * Mark the D-cache clean for these pages to avoid extra flushing.
-	 */
-	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
-		unsigned long pfn = PFN_UP(paddr);
-		unsigned long off = paddr & (PAGE_SIZE - 1);
-		size_t left = size;
-
-		if (off)
-			left -= PAGE_SIZE - off;
-
-		while (left >= PAGE_SIZE) {
-			struct page *page = pfn_to_page(pfn++);
-			set_bit(PG_dcache_clean, &page->flags);
-			left -= PAGE_SIZE;
-		}
-	}
 }
 
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
@@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
 	return -EINVAL;
 }
 
+static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
+				       enum dma_data_direction dir,
+				       bool dma_coherent)
+{
+	if (!dma_coherent)
+		arch_sync_dma_for_cpu(phys, len, dir);
+
+	if (dir == DMA_FROM_DEVICE)
+		arch_dma_mark_clean(phys, len);
+}
+
 /**
  * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
  * @dev: valid struct device pointer
@@ -1316,8 +1331,9 @@ static void arm_iommu_unmap_sg(struct device *dev,
 		if (sg_dma_len(s))
 			__iommu_remove_mapping(dev, sg_dma_address(s),
 					       sg_dma_len(s));
-		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-			arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
+		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+			arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
+						   dev->dma_coherent);
 	}
 }
 
@@ -1335,12 +1351,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *s;
 	int i;
 
-	if (dev->dma_coherent)
-		return;
-
 	for_each_sg(sg, s, nents, i)
-		arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
-
+		arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
+					   dev->dma_coherent);
 }
 
 /**
@@ -1425,9 +1438,9 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
 	if (!iova)
 		return;
 
-	if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
 		phys = iommu_iova_to_phys(mapping->domain, handle);
-		arch_sync_dma_for_cpu(phys, size, dir);
+		arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
 	}
 
 	iommu_unmap(mapping->domain, iova, len);
@@ -1497,11 +1510,11 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
 	phys_addr_t phys;
 
-	if (dev->dma_coherent || !(handle & PAGE_MASK))
+	if (!(handle & PAGE_MASK))
 		return;
 
 	phys = iommu_iova_to_phys(mapping->domain, handle);
-	arch_sync_dma_for_cpu(phys, size, dir);
+	arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
 }
 
 static void arm_iommu_sync_single_for_device(struct device *dev,
-- 
2.39.2
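The page-range walk in arch_dma_mark_clean() above can be sanity-checked outside the kernel. The following is a minimal userspace sketch, not kernel code: MODEL_PAGE_SIZE, MODEL_PFN_UP() and model_mark_clean() are hypothetical stand-ins for PAGE_SIZE, PFN_UP() and the new helper (assuming 4 KiB pages), and the loop only counts pages instead of calling set_bit(PG_dcache_clean, ...). It shows that only pages lying entirely inside [paddr, paddr + size) are counted, so a partially covered first or last page is never marked clean.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model constants; assumes 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PFN_UP(x) (((x) + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE)

/*
 * Userspace model of the loop in arch_dma_mark_clean(): returns the number
 * of pages that would get PG_dcache_clean and stores the first candidate
 * PFN in *first. Only pages fully inside [paddr, paddr + size) are counted.
 */
static size_t model_mark_clean(uint64_t paddr, size_t size, uint64_t *first)
{
	uint64_t pfn = MODEL_PFN_UP(paddr);	/* first page starting at or after paddr */
	uint64_t off = paddr & (MODEL_PAGE_SIZE - 1);
	size_t left = size;
	size_t marked = 0;

	*first = pfn;
	if (size < MODEL_PAGE_SIZE)
		return 0;

	/* Skip the partially covered head page, if any. */
	if (off)
		left -= MODEL_PAGE_SIZE - off;

	while (left >= MODEL_PAGE_SIZE) {
		pfn++;		/* kernel: set_bit() on pfn_to_page(pfn++) */
		marked++;
		left -= MODEL_PAGE_SIZE;
	}
	return marked;
}
```

For example, an 8 KiB buffer starting at 0x800 touches three 4 KiB pages but fully covers only PFN 1, so model_mark_clean(0x800, 8192, &first) returns 1 with first == 1.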
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
@ 2023-03-27 12:48 ` Robin Murphy
From: Robin Murphy @ 2023-03-27 12:48 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
On 2023-03-27 13:13, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The arm version of the arch_sync_dma_for_cpu() function annotates pages as
> PG_dcache_clean after a DMA, but no other architecture does this here. On
> ia64, the same thing is done in arch_dma_mark_clean(), so it makes sense
> to use the same hook in order to give arch_sync_dma_for_cpu() the same
> semantics as on all other architectures.
>
> Splitting this out has multiple effects:
>
> - for dma-direct, this now gets called after arch_sync_dma_for_cpu()
> for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While
> it would not be harmful to keep doing it for bidirectional mappings,
> those are apparently not used by any callers that care about the flag.
>
> - Since arm has its own dma-iommu abstraction, this now also needs to
> call the same function, so the calls are added there to mirror the
> dma-direct version.
>
> - Like dma-direct, the dma-iommu version now marks the dcache clean
> for both coherent and noncoherent devices after a DMA, but it only
> does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL.
>
> [ HELP NEEDED: can anyone confirm that it is a correct assumption
> on arm that a cache-coherent device writing to a page always results
> in it being in a PG_dcache_clean state like on ia64, or can a device
> write directly into the dcache?]
In AMBA at least, if a snooping write hits in a cache then the data is
most likely going to get routed directly into that cache. If it has
write-back write-allocate attributes it could also land in any cache
along its normal path to RAM; it wouldn't have to go all the way.
Hence all the fun we have where treating a coherent device as
non-coherent can still be almost as broken as the other way round :)
Cheers,
Robin.
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/Kconfig          |  1 +
>  arch/arm/mm/dma-mapping.c | 71 +++++++++++++++++++++++----------------
>  2 files changed, 43 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index e24a9820e12f..125d58c54ab1 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -7,6 +7,7 @@ config ARM
>  	select ARCH_HAS_BINFMT_FLAT
>  	select ARCH_HAS_CURRENT_STACK_POINTER
>  	select ARCH_HAS_DEBUG_VIRTUAL if MMU
> +	select ARCH_HAS_DMA_MARK_CLEAN if MMU
>  	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
>  	select ARCH_HAS_ELF_RANDOMIZE
>  	select ARCH_HAS_FORTIFY_SOURCE
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index cc702cb27ae7..b703cb83d27e 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -665,6 +665,28 @@ static void dma_cache_maint(phys_addr_t paddr,
>  	} while (left);
>  }
>
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long pfn = PFN_UP(paddr);
> +	unsigned long off = paddr & (PAGE_SIZE - 1);
> +	size_t left = size;
> +
> +	if (size < PAGE_SIZE)
> +		return;
> +
> +	if (off)
> +		left -= PAGE_SIZE - off;
> +
> +	while (left >= PAGE_SIZE) {
> +		struct page *page = pfn_to_page(pfn++);
> +		set_bit(PG_dcache_clean, &page->flags);
> +		left -= PAGE_SIZE;
> +	}
> +}
> +
>  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  {
>  	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -715,24 +737,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>  		outer_inv_range(paddr, paddr + size);
>  		dma_cache_maint(paddr, size, dmac_inv_range);
>  	}
> -
> -	/*
> -	 * Mark the D-cache clean for these pages to avoid extra flushing.
> -	 */
> -	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
> -		unsigned long pfn = PFN_UP(paddr);
> -		unsigned long off = paddr & (PAGE_SIZE - 1);
> -		size_t left = size;
> -
> -		if (off)
> -			left -= PAGE_SIZE - off;
> -
> -		while (left >= PAGE_SIZE) {
> -			struct page *page = pfn_to_page(pfn++);
> -			set_bit(PG_dcache_clean, &page->flags);
> -			left -= PAGE_SIZE;
> -		}
> -	}
>  }
>
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
> @@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
>  	return -EINVAL;
>  }
>
> +static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
> +				       enum dma_data_direction dir,
> +				       bool dma_coherent)
> +{
> +	if (!dma_coherent)
> +		arch_sync_dma_for_cpu(phys, len, dir);
> +
> +	if (dir == DMA_FROM_DEVICE)
> +		arch_dma_mark_clean(phys, len);
> +}
> +
>  /**
>   * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
>   * @dev: valid struct device pointer
> @@ -1316,8 +1331,9 @@ static void arm_iommu_unmap_sg(struct device *dev,
>  		if (sg_dma_len(s))
>  			__iommu_remove_mapping(dev, sg_dma_address(s),
>  					       sg_dma_len(s));
> -		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -			arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
> +		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +			arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
> +						   dev->dma_coherent);
>  	}
>  }
>
> @@ -1335,12 +1351,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
>  	struct scatterlist *s;
>  	int i;
>
> -	if (dev->dma_coherent)
> -		return;
> -
>  	for_each_sg(sg, s, nents, i)
> -		arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
> -
> +		arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
> +					   dev->dma_coherent);
>  }
>
>  /**
> @@ -1425,9 +1438,9 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
>  	if (!iova)
>  		return;
>
> -	if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
>  		phys = iommu_iova_to_phys(mapping->domain, handle);
> -		arch_sync_dma_for_cpu(phys, size, dir);
> +		arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
>  	}
>
>  	iommu_unmap(mapping->domain, iova, len);
> @@ -1497,11 +1510,11 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
>  	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
>  	phys_addr_t phys;
>
> -	if (dev->dma_coherent || !(handle & PAGE_MASK))
> +	if (!(handle & PAGE_MASK))
>  		return;
>
>  	phys = iommu_iova_to_phys(mapping->domain, handle);
> -	arch_sync_dma_for_cpu(phys, size, dir);
> +	arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
>  }
>
>  static void arm_iommu_sync_single_for_device(struct device *dev,
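The per-buffer decision made by the new arm_iommu_sync_dma_for_cpu() helper in this patch can be summarized as a small truth table. The following is a hedged userspace model, not kernel code: the model_* enum, struct and function names are hypothetical, and the enum only mirrors the values of the kernel's enum dma_data_direction. It makes the case discussed in the reply above explicit: for a coherent device and DMA_FROM_DEVICE, cache maintenance is skipped but the pages are still marked clean, which is only safe if a snooping device write can never leave dirty lines in a CPU data cache.

```c
#include <stdbool.h>

/* Hypothetical local mirror of the kernel's enum dma_data_direction. */
enum model_dma_dir {
	MODEL_DMA_BIDIRECTIONAL = 0,
	MODEL_DMA_TO_DEVICE = 1,
	MODEL_DMA_FROM_DEVICE = 2,
};

/* Which of the two calls the patched helper would make for one buffer. */
struct model_sync_actions {
	bool cache_maint;	/* arch_sync_dma_for_cpu() would be called */
	bool mark_clean;	/* arch_dma_mark_clean() would be called */
};

static struct model_sync_actions
model_sync_for_cpu(enum model_dma_dir dir, bool dma_coherent)
{
	struct model_sync_actions a;

	/* Cache maintenance depends only on device coherency... */
	a.cache_maint = !dma_coherent;
	/* ...while marking pages clean depends only on the direction. */
	a.mark_clean = (dir == MODEL_DMA_FROM_DEVICE);
	return a;
}
```

Keeping the two decisions independent mirrors how the helper is written: the coherent check moved out of the callers and into the helper, so marking now also happens for coherent devices.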
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper @ 2023-03-27 12:48 ` Robin Murphy 0 siblings, 0 replies; 456+ messages in thread From: Robin Murphy @ 2023-03-27 12:48 UTC (permalink / raw) To: Arnd Bergmann, linux-kernel Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On 2023-03-27 13:13, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > The arm version of the arch_sync_dma_for_cpu() function annotates pages as > PG_dcache_clean after a DMA, but no other architecture does this here. On > ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense > to use the same hook in order to have identical arch_sync_dma_for_cpu() > semantics as all other architectures. > > Splitting this out has multiple effects: > > - for dma-direct, this now gets called after arch_sync_dma_for_cpu() > for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While > it would not be harmful to keep doing it for bidirectional mappings, > those are apparently not used in any callers that care about the flag. > > - Since arm has its own dma-iommu abstraction, this now also needs to > call the same function, so the calls are added there to mirror the > dma-direct version. > > - Like dma-direct, the dma-iommu version now marks the dcache clean > for both coherent and noncoherent devices after a DMA, but it only > does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL. 
> > [ HELP NEEDED: can anyone confirm that it is a correct assumption > on arm that a cache-coherent device writing to a page always results > in it being in a PG_dcache_clean state like on ia64, or can a device > write directly into the dcache?] In AMBA at least, if a snooping write hits in a cache then the data is most likely going to get routed directly into that cache. If it has write-back write-allocate attributes it could also land in any cache along its normal path to RAM; it wouldn't have to go all the way. Hence all the fun we have where treating a coherent device as non-coherent can still be almost as broken as the other way round :) Cheers, Robin. > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/arm/Kconfig | 1 + > arch/arm/mm/dma-mapping.c | 71 +++++++++++++++++++++++---------------- > 2 files changed, 43 insertions(+), 29 deletions(-) > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index e24a9820e12f..125d58c54ab1 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -7,6 +7,7 @@ config ARM > select ARCH_HAS_BINFMT_FLAT > select ARCH_HAS_CURRENT_STACK_POINTER > select ARCH_HAS_DEBUG_VIRTUAL if MMU > + select ARCH_HAS_DMA_MARK_CLEAN if MMU > select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE > select ARCH_HAS_ELF_RANDOMIZE > select ARCH_HAS_FORTIFY_SOURCE > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index cc702cb27ae7..b703cb83d27e 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -665,6 +665,28 @@ static void dma_cache_maint(phys_addr_t paddr, > } while (left); > } > > +/* > + * Mark the D-cache clean for these pages to avoid extra flushing. 
> + */ > +void arch_dma_mark_clean(phys_addr_t paddr, size_t size) > +{ > + unsigned long pfn = PFN_UP(paddr); > + unsigned long off = paddr & (PAGE_SIZE - 1); > + size_t left = size; > + > + if (size < PAGE_SIZE) > + return; > + > + if (off) > + left -= PAGE_SIZE - off; > + > + while (left >= PAGE_SIZE) { > + struct page *page = pfn_to_page(pfn++); > + set_bit(PG_dcache_clean, &page->flags); > + left -= PAGE_SIZE; > + } > +} > + > static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > { > if (IS_ENABLED(CONFIG_CPU_V6) || > @@ -715,24 +737,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > outer_inv_range(paddr, paddr + size); > dma_cache_maint(paddr, size, dmac_inv_range); > } > - > - /* > - * Mark the D-cache clean for these pages to avoid extra flushing. > - */ > - if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) { > - unsigned long pfn = PFN_UP(paddr); > - unsigned long off = paddr & (PAGE_SIZE - 1); > - size_t left = size; > - > - if (off) > - left -= PAGE_SIZE - off; > - > - while (left >= PAGE_SIZE) { > - struct page *page = pfn_to_page(pfn++); > - set_bit(PG_dcache_clean, &page->flags); > - left -= PAGE_SIZE; > - } > - } > } > > #ifdef CONFIG_ARM_DMA_USE_IOMMU > @@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, > return -EINVAL; > } > > +static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len, > + enum dma_data_direction dir, > + bool dma_coherent) > +{ > + if (!dma_coherent) > + arch_sync_dma_for_cpu(phys, s->length, dir); > + > + if (dir == DMA_FROM_DEVICE) > + arch_dma_mark_clean(phys, s->length); > +} > + > /** > * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg > * @dev: valid struct device pointer > @@ -1316,8 +1331,9 @@ static void arm_iommu_unmap_sg(struct device *dev, > if (sg_dma_len(s)) > __iommu_remove_mapping(dev, sg_dma_address(s), > sg_dma_len(s)); > - if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) > - arch_sync_dma_for_cpu(sg_phys(s), 
s->length, dir); > + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) > + arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir, > + dev->dma_coherent); > } > } > > @@ -1335,12 +1351,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev, > struct scatterlist *s; > int i; > > - if (dev->dma_coherent) > - return; > - > for_each_sg(sg, s, nents, i) > - arch_sync_dma_for_cpu(sg_phys(s), s->length, dir); > - > + arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir, > + dev->dma_coherent); > } > > /** > @@ -1425,9 +1438,9 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, > if (!iova) > return; > > - if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { > + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) > phys = iommu_iova_to_phys(mapping->domain, handle); > - arch_sync_dma_for_cpu(phys, size, dir); > + arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent); > } > > iommu_unmap(mapping->domain, iova, len); > @@ -1497,11 +1510,11 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev, > struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); > phys_addr_t phys; > > - if (dev->dma_coherent || !(handle & PAGE_MASK)) > + if (!(handle & PAGE_MASK)) > return; > > phys = iommu_iova_to_phys(mapping->domain, handle); > - arch_sync_dma_for_cpu(phys, size, dir); > + arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent); > } > > static void arm_iommu_sync_single_for_device(struct device *dev, _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper @ 2023-03-27 12:48 ` Robin Murphy 0 siblings, 0 replies; 456+ messages in thread From: Robin Murphy @ 2023-03-27 12:48 UTC (permalink / raw) To: Arnd Bergmann, linux-kernel Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Sime k, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linux-mips, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, linuxppc-dev, David S. Miller On 2023-03-27 13:13, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > The arm version of the arch_sync_dma_for_cpu() function annotates pages as > PG_dcache_clean after a DMA, but no other architecture does this here. On > ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense > to use the same hook in order to have identical arch_sync_dma_for_cpu() > semantics as all other architectures. > > Splitting this out has multiple effects: > > - for dma-direct, this now gets called after arch_sync_dma_for_cpu() > for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While > it would not be harmful to keep doing it for bidirectional mappings, > those are apparently not used in any callers that care about the flag. > > - Since arm has its own dma-iommu abstraction, this now also needs to > call the same function, so the calls are added there to mirror the > dma-direct version. > > - Like dma-direct, the dma-iommu version now marks the dcache clean > for both coherent and noncoherent devices after a DMA, but it only > does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL. 
>
> [ HELP NEEDED: can anyone confirm that it is a correct assumption
>   on arm that a cache-coherent device writing to a page always results
>   in it being in a PG_dcache_clean state like on ia64, or can a device
>   write directly into the dcache?]

In AMBA at least, if a snooping write hits in a cache then the data is
most likely going to get routed directly into that cache. If it has
write-back write-allocate attributes it could also land in any cache
along its normal path to RAM; it wouldn't have to go all the way.

Hence all the fun we have where treating a coherent device as
non-coherent can still be almost as broken as the other way round :)

Cheers,
Robin.

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/Kconfig          |  1 +
>  arch/arm/mm/dma-mapping.c | 71 +++++++++++++++++++++++----------------
>  2 files changed, 43 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index e24a9820e12f..125d58c54ab1 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -7,6 +7,7 @@ config ARM
>  	select ARCH_HAS_BINFMT_FLAT
>  	select ARCH_HAS_CURRENT_STACK_POINTER
>  	select ARCH_HAS_DEBUG_VIRTUAL if MMU
> +	select ARCH_HAS_DMA_MARK_CLEAN if MMU
>  	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
>  	select ARCH_HAS_ELF_RANDOMIZE
>  	select ARCH_HAS_FORTIFY_SOURCE
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index cc702cb27ae7..b703cb83d27e 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -665,6 +665,28 @@ static void dma_cache_maint(phys_addr_t paddr,
>  	} while (left);
>  }
>
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long pfn = PFN_UP(paddr);
> +	unsigned long off = paddr & (PAGE_SIZE - 1);
> +	size_t left = size;
> +
> +	if (size < PAGE_SIZE)
> +		return;
> +
> +	if (off)
> +		left -= PAGE_SIZE - off;
> +
> +	while (left >= PAGE_SIZE) {
> +		struct page *page = pfn_to_page(pfn++);
> +		set_bit(PG_dcache_clean, &page->flags);
> +		left -= PAGE_SIZE;
> +	}
> +}
> +
>  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  {
>  	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -715,24 +737,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>  		outer_inv_range(paddr, paddr + size);
>  		dma_cache_maint(paddr, size, dmac_inv_range);
>  	}
> -
> -	/*
> -	 * Mark the D-cache clean for these pages to avoid extra flushing.
> -	 */
> -	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
> -		unsigned long pfn = PFN_UP(paddr);
> -		unsigned long off = paddr & (PAGE_SIZE - 1);
> -		size_t left = size;
> -
> -		if (off)
> -			left -= PAGE_SIZE - off;
> -
> -		while (left >= PAGE_SIZE) {
> -			struct page *page = pfn_to_page(pfn++);
> -			set_bit(PG_dcache_clean, &page->flags);
> -			left -= PAGE_SIZE;
> -		}
> -	}
>  }
>
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
> @@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
>  	return -EINVAL;
>  }
>
> +static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
> +				       enum dma_data_direction dir,
> +				       bool dma_coherent)
> +{
> +	if (!dma_coherent)
> +		arch_sync_dma_for_cpu(phys, s->length, dir);
> +
> +	if (dir == DMA_FROM_DEVICE)
> +		arch_dma_mark_clean(phys, s->length);
> +}
> +
>  /**
>   * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
>   * @dev: valid struct device pointer
> @@ -1316,8 +1331,9 @@ static void arm_iommu_unmap_sg(struct device *dev,
>  		if (sg_dma_len(s))
>  			__iommu_remove_mapping(dev, sg_dma_address(s),
>  					       sg_dma_len(s));
> -		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -			arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
> +		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +			arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
> +						   dev->dma_coherent);
>  	}
>  }
>
> @@ -1335,12 +1351,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
>  	struct scatterlist *s;
>  	int i;
>
> -	if (dev->dma_coherent)
> -		return;
> -
>  	for_each_sg(sg, s, nents, i)
> -		arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
> -
> +		arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
> +					   dev->dma_coherent);
>  }
>
>  /**
> @@ -1425,9 +1438,9 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
>  	if (!iova)
>  		return;
>
> -	if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>  		phys = iommu_iova_to_phys(mapping->domain, handle);
> -		arch_sync_dma_for_cpu(phys, size, dir);
> +		arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
>  	}
>
>  	iommu_unmap(mapping->domain, iova, len);
> @@ -1497,11 +1510,11 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
>  	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
>  	phys_addr_t phys;
>
> -	if (dev->dma_coherent || !(handle & PAGE_MASK))
> +	if (!(handle & PAGE_MASK))
>  		return;
>
>  	phys = iommu_iova_to_phys(mapping->domain, handle);
> -	arch_sync_dma_for_cpu(phys, size, dir);
> +	arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
>  }
>
>  static void arm_iommu_sync_single_for_device(struct device *dev,

^ permalink raw reply	[flat|nested] 456+ messages in thread
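arch_dma_mark_clean() only flags pages that lie entirely inside [paddr, paddr + size): PFN_UP() rounds the start up past a partial first page, the `off` adjustment drops that partial page's bytes from the count, and the loop stops once fewer than PAGE_SIZE bytes remain, so a partial last page is skipped too. The userspace model below reproduces that arithmetic under the assumption of 4 KiB pages and returns the page frame numbers instead of setting PG_dcache_clean; model_mark_clean_pfns() and the MODEL_* macros are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  ((uint64_t)1 << MODEL_PAGE_SHIFT)
/* Same rounding as the kernel's PFN_UP(): first frame starting at or after x. */
#define MODEL_PFN_UP(x)  (((x) + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT)

/*
 * Userspace model of the loop in arch_dma_mark_clean(): stores the PFNs that
 * would be flagged PG_dcache_clean in out[] and returns how many there are.
 */
size_t model_mark_clean_pfns(uint64_t paddr, uint64_t size,
			     uint64_t *out, size_t max_out)
{
	uint64_t pfn = MODEL_PFN_UP(paddr);
	uint64_t off = paddr & (MODEL_PAGE_SIZE - 1);
	uint64_t left = size;
	size_t n = 0;

	/* A buffer smaller than one page cannot cover a whole page. */
	if (size < MODEL_PAGE_SIZE)
		return 0;

	/* Skip the partial head page; size >= page size rules out underflow. */
	if (off)
		left -= MODEL_PAGE_SIZE - off;

	/* Whole pages only: a partial tail leaves left below one page. */
	while (left >= MODEL_PAGE_SIZE) {
		assert(n < max_out);
		out[n++] = pfn++;
		left -= MODEL_PAGE_SIZE;
	}
	return n;
}
```

For example, an 8 KiB buffer starting at 0x1800 touches PFNs 1, 2 and 3 but fully covers only PFN 2, so only that page would be marked; the partially covered pages keep whatever PG_dcache_clean state they had.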
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
2023-03-27 12:48 ` Robin Murphy
` (3 preceding siblings ...)
(?)
@ 2023-03-31 14:00 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 14:00 UTC (permalink / raw)
To: Robin Murphy, Arnd Bergmann, linux-kernel
Cc: Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij,
Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S . Miller, Max Filippov, Christoph Hellwig, Lad, Prabhakar,
Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv,
linux-sh, sparclinux, linux-xtensa

On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote:
> On 2023-03-27 13:13, Arnd Bergmann wrote:
>>
>> [ HELP NEEDED: can anyone confirm that it is a correct assumption
>>   on arm that a cache-coherent device writing to a page always results
>>   in it being in a PG_dcache_clean state like on ia64, or can a device
>>   write directly into the dcache?]
>
> In AMBA at least, if a snooping write hits in a cache then the data is
> most likely going to get routed directly into that cache. If it has
> write-back write-allocate attributes it could also land in any cache
> along its normal path to RAM; it wouldn't have to go all the way.
>
> Hence all the fun we have where treating a coherent device as
> non-coherent can still be almost as broken as the other way round :)

Ok, thanks for the information. I'm still not sure whether this can
result in the situation where PG_dcache_clean is wrong though.
Specifically, the question is whether a DMA to a coherent buffer can
end up in a dirty L1 dcache of one core and require to write back the
dcache before invalidating the icache for that page.

On ia64, this is not the case, the optimization here is to only
flush the icache after a coherent DMA into an executable user
page, while Arm only does this for noncoherent DMA but not coherent
DMA.

From your explanation it sounds like this might happen,
even though that would mean that "coherent" DMA is slightly
less coherent than it is elsewhere.

To be on the safe side, I'd have to pass a flag into
arch_dma_mark_clean() about coherency, to let the arm
implementation still require the extra dcache flush
for coherent DMA, while ia64 can ignore that flag.

     Arnd

^ permalink raw reply	[flat|nested] 456+ messages in thread
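The coherency flag is only described in prose in this message, so what follows is a hypothetical sketch of the policy, not the interface that was eventually merged. Per Robin's description of AMBA, a snooping write from a coherent device may leave dirty lines in a CPU cache, so on arm such a page would not be marked PG_dcache_clean and the D-cache flush would still happen before the page is made executable; after noncoherent DMA the lines were already invalidated, so marking the page clean remains safe. ia64, as Arnd notes, can ignore the flag. The enum and both function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-architecture policies for the proposed coherency flag. */
enum model_arch {
	MODEL_ARCH_IA64, /* coherent DMA is assumed to leave the D-cache clean */
	MODEL_ARCH_ARM,  /* a snooping write may leave dirty lines in a CPU cache */
};

/*
 * Sketch of a coherency-aware arch_dma_mark_clean() decision: returns true
 * if a page may be flagged PG_dcache_clean after a DMA_FROM_DEVICE transfer.
 */
bool model_may_mark_clean(enum model_arch arch, bool dma_coherent)
{
	switch (arch) {
	case MODEL_ARCH_IA64:
		/* ia64 ignores the flag and always marks the page clean. */
		return true;
	case MODEL_ARCH_ARM:
		/* Only noncoherent DMA, whose lines were invalidated, is clean. */
		return !dma_coherent;
	}
	assert(!"unknown architecture");
	return false;
}

/* Whether a later executable mapping still needs a D-cache flush (model). */
bool model_needs_dcache_flush_before_exec(enum model_arch arch, bool dma_coherent)
{
	return !model_may_mark_clean(arch, dma_coherent);
}
```

The design keeps the common helper signature identical across architectures and pushes the coherency decision into each implementation, which matches the split Arnd proposes between arm and ia64.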
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
2023-03-31 14:00 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-31 15:12 ` Robin Murphy
-1 siblings, 0 replies; 456+ messages in thread
From: Robin Murphy @ 2023-03-31 15:12 UTC (permalink / raw)
To: Arnd Bergmann, Arnd Bergmann, linux-kernel
Cc: Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij,
Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S . Miller, Max Filippov, Christoph Hellwig, Lad, Prabhakar,
Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa

On 31/03/2023 3:00 pm, Arnd Bergmann wrote:
> On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote:
>> On 2023-03-27 13:13, Arnd Bergmann wrote:
>>>
>>> [ HELP NEEDED: can anyone confirm that it is a correct assumption
>>> on arm that a cache-coherent device writing to a page always results
>>> in it being in a PG_dcache_clean state like on ia64, or can a device
>>> write directly into the dcache?]
>>
>> In AMBA at least, if a snooping write hits in a cache then the data is
>> most likely going to get routed directly into that cache. If it has
>> write-back write-allocate attributes it could also land in any cache
>> along its normal path to RAM; it wouldn't have to go all the way.
>>
>> Hence all the fun we have where treating a coherent device as
>> non-coherent can still be almost as broken as the other way round :)
>
> Ok, thanks for the information. I'm still not sure whether this can
> result in the situation where PG_dcache_clean is wrong though.
>
> Specifically, the question is whether a DMA to a coherent buffer
> can end up in a dirty L1 dcache of one core and require to write
> back the dcache before invalidating the icache for that page.
>
> On ia64, this is not the case, the optimization here is to
> only flush the icache after a coherent DMA into an executable
> user page, while Arm only does this for noncoherent DMA but not
> coherent DMA.
>
> From your explanation it sounds like this might happen,
> even though that would mean that "coherent" DMA is slightly
> less coherent than it is elsewhere.
>
> To be on the safe side, I'd have to pass a flag into
> arch_dma_mark_clean() about coherency, to let the arm
> implementation still require the extra dcache flush
> for coherent DMA, while ia64 can ignore that flag.

Coherent DMA on Arm is assumed to be inner-shareable, so a coherent DMA
write should be pretty much equivalent to a coherent write by another
CPU (or indeed the local CPU itself) - nothing says that it *couldn't*
dirty a line in a data cache above the level of unification, so in
general the assumption must be that, yes, if coherent DMA is writing
data intended to be executable, then it's going to want a Dcache clean
to PoU and an Icache invalidate to PoU before trying to execute it.

By comparison, a non-coherent DMA transfer will inherently have to
invalidate the Dcache all the way to PoC in its dma_unmap, thus cannot
leave dirty data above the PoU, so only the Icache maintenance is
required in the executable case.

(FWIW I believe the Armv8 IDC/DIC features can safely be considered
irrelevant to 32-bit kernels)

I don't know a great deal about IA-64, but it appears to be using its
PG_arch_1 flag in a subtly different manner to Arm, namely to optimise
out the *Icache* maintenance. So if anything, it seems IA-64 is the
weirdo here (who'd have guessed?) where DMA manages to be *more*
coherent than the CPUs themselves :)

This is all now making me think we need some careful consideration of
whether the benefits of consolidating code outweigh the confusion of
conflating multiple different meanings of "clean" together...

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 456+ messages in thread
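Robin's description of what the CPU still has to do before executing data that a device has written can be summarised as a small decision function. This is a sketch under stated assumptions: it models only the "page will be executed after the DMA" question, assumes a 32-bit Arm kernel where the Armv8 IDC/DIC features are treated as absent, and assumes the non-coherent dma_unmap has already invalidated the Dcache to PoC. The names `exec_maintenance_after_dma()` and the `OP_*` constants are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Cache maintenance operations as bit flags (illustrative names only). */
enum cache_op {
	OP_NONE         = 0,
	OP_DC_CLEAN_POU = 1 << 0, /* clean Dcache to Point of Unification */
	OP_IC_INV_POU   = 1 << 1, /* invalidate Icache to Point of Unification */
};

/*
 * Maintenance still required after the DMA has completed (including
 * dma_unmap for a non-coherent device) before the CPU may execute the
 * data the device wrote into the page.
 */
static unsigned int exec_maintenance_after_dma(bool coherent, bool executable)
{
	if (!executable)
		return OP_NONE; /* data-only pages need nothing extra here */

	if (coherent)
		/*
		 * A snooping write behaves like a write from another CPU and
		 * may have dirtied a line above the PoU, so clean the Dcache
		 * before invalidating the Icache.
		 */
		return OP_DC_CLEAN_POU | OP_IC_INV_POU;

	/*
	 * Non-coherent: dma_unmap already invalidated the Dcache to PoC,
	 * so nothing dirty remains above the PoU; only the Icache needs
	 * maintenance.
	 */
	return OP_IC_INV_POU;
}
```

The asymmetry is the point of the reply: on Arm the coherent case needs strictly more maintenance than the non-coherent one, which is why a single meaning of "clean" shared with ia64 is hard to reconcile.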
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
2023-03-31 15:12 ` Robin Murphy
` (3 preceding siblings ...)
(?)
@ 2023-03-31 17:20 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 17:20 UTC (permalink / raw)
To: Robin Murphy, Arnd Bergmann, linux-kernel
Cc: Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij,
Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S . Miller, Max Filippov, Christoph Hellwig, Lad, Prabhakar,
Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv,
linux-sh, sparclinux, linux-xtensa

On Fri, Mar 31, 2023, at 17:12, Robin Murphy wrote:
> On 31/03/2023 3:00 pm, Arnd Bergmann wrote:
>> On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote:
>>
>> To be on the safe side, I'd have to pass a flag into
>> arch_dma_mark_clean() about coherency, to let the arm
>> implementation still require the extra dcache flush
>> for coherent DMA, while ia64 can ignore that flag.
>
> Coherent DMA on Arm is assumed to be inner-shareable, so a coherent DMA
> write should be pretty much equivalent to a coherent write by another
> CPU (or indeed the local CPU itself) - nothing says that it *couldn't*
> dirty a line in a data cache above the level of unification, so in
> general the assumption must be that, yes, if coherent DMA is writing
> data intended to be executable, then it's going to want a Dcache clean
> to PoU and an Icache invalidate to PoU before trying to execute it. By
> comparison, a non-coherent DMA transfer will inherently have to
> invalidate the Dcache all the way to PoC in its dma_unmap, thus cannot
> leave dirty data above the PoU, so only the Icache maintenance is
> required in the executable case.

Ok, makes sense. I've already started reworking my patch for it.

> (FWIW I believe the Armv8 IDC/DIC features can safely be considered
> irrelevant to 32-bit kernels)
>
> I don't know a great deal about IA-64, but it appears to be using its
> PG_arch_1 flag in a subtly different manner to Arm, namely to optimise
> out the *Icache* maintenance. So if anything, it seems IA-64 is the
> weirdo here (who'd have guessed?) where DMA manages to be *more*
> coherent than the CPUs themselves :)

I checked this in the ia64 manual, and as far as I can tell, it
originally only had one cacheflush instruction that flushes the dcache
and invalidates the icache at the same time. So flush_icache_range()
actually does both, and flush_dcache_page() instead just marks the page
as dirty to ensure flush_icache_range() does not get skipped after
writing a page from the kernel.

On later Itaniums, there is apparently a separate icache flush
instruction that gets used in flush_icache_range(), but that still
works for the DMA case that is allowed to skip the flush.

> This is all now making me think we need some careful consideration of
> whether the benefits of consolidating code outweigh the confusion of
> conflating multiple different meanings of "clean" together...

The difference in usage of PG_dcache_clean/PG_dcache_dirty/PG_arch_1
across architectures is certainly big enough that we can't just define
a common arch_dma_mark_clean() across architectures, but I think the
idea of having a common entry point for arch_dma_mark_clean() to be
called from the dma-mapping code to do something architecture specific
after a DMA is clean still makes sense,

     Arnd
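The rules Robin describes, and the coherency flag Arnd proposes passing to arch_dma_mark_clean(), can be sketched as a small user-space C model. This is only an illustration under stated assumptions: a 32-bit Arm core without the Armv8 IDC/DIC features, and a simplified view of the per-page flag. Every name below (`enum exec_maint`, `struct dma_page_model`, `arm32_exec_maint()`, `arm32_mark_clean()`, `ia64_mark_clean()`) is hypothetical and is not kernel API; the real hook signature is whatever the reworked patch ends up with.

```c
#include <assert.h>
#include <stdbool.h>

/* Cache maintenance still owed before executing code a DMA wrote,
 * beyond whatever dma_unmap already did. (Hypothetical model.) */
enum exec_maint {
	EXEC_MAINT_NONE       = 0,
	EXEC_MAINT_DCLEAN_POU = 1 << 0, /* clean D-cache to the PoU */
	EXEC_MAINT_IINVAL_POU = 1 << 1, /* invalidate I-cache to the PoU */
};

/* 32-bit Arm without IDC/DIC: a coherent DMA write acts like a write
 * from another CPU and may leave dirty D-cache lines above the PoU. A
 * non-coherent transfer already invalidated the D-cache to the PoC in
 * dma_unmap, so only the I-cache maintenance remains. */
static unsigned int arm32_exec_maint(bool coherent, bool executable)
{
	if (!executable)
		return EXEC_MAINT_NONE;
	if (coherent)
		return EXEC_MAINT_DCLEAN_POU | EXEC_MAINT_IINVAL_POU;
	return EXEC_MAINT_IINVAL_POU;
}

/* Hypothetical per-page state; arch_1 stands in for PG_arch_1. */
struct dma_page_model {
	bool arch_1;
};

/* Arm meaning of the bit: "D-cache is clean". Only a non-coherent DMA
 * guarantees that, so after a coherent one the bit is left alone and
 * the D-cache clean is left to the path that maps the page. */
static void arm32_mark_clean(struct dma_page_model *p, bool coherent)
{
	assert(p);
	if (!coherent)
		p->arch_1 = true;
}

/* ia64 meaning of the bit: "the I-cache flush may be skipped", which
 * holds after any DMA, so the coherency argument is ignored. */
static void ia64_mark_clean(struct dma_page_model *p, bool coherent)
{
	assert(p);
	(void)coherent;
	p->arch_1 = true;
}
```

For example, `arm32_exec_maint(true, true)` asks for both the D-cache clean and the I-cache invalidate to the PoU, while `arm32_exec_maint(false, true)` asks only for the I-cache invalidate. Keeping the two `*_mark_clean()` variants side by side shows why one common entry point can still make sense even though the bit means something different on each architecture.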
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-03-27 15:01 ` Russell King (Oracle)
-1 siblings, 0 replies; 456+ messages in thread
From: Russell King (Oracle) @ 2023-03-27 15:01 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen,
Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy,
Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy,
Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel,
linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh,
sparclinux, linux-xtensa

On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The arm version of the arch_sync_dma_for_cpu() function annotates pages as
> PG_dcache_clean after a DMA, but no other architecture does this here.

... because this is an arm32-specific feature. Generically, it's
PG_arch_1, which is a page flag free for architecture use. On arm32
we decided to use this to mark whether we can skip dcache writebacks
when establishing a PTE - and thus it was decided to call it
PG_dcache_clean to reflect how arm32 decided to use that bit.

This isn't just a DMA thing; there are other places where we update
the bit, such as flush_dcache_page() and copy_user_highpage().

So thinking that the arm32 PG_dcache_clean is something for DMA is
actually wrong.

Other architectures are free to do their own optimisations using
that bit, and their implementations may be DMA-centric.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
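To make the arm32 meaning of the bit concrete, here is a simplified user-space model of its lifecycle as Russell describes it: the bit records that a D-cache writeback can be skipped when a user PTE is established. It is a sketch, not kernel code. `struct arm32_page_model`, `MODEL_PG_DCACHE_CLEAN` and the `model_*()` functions are hypothetical names; `model_kernel_write()` is a rough stand-in for the case where flush_dcache_page() only clears the bit instead of flushing immediately, and copy_user_highpage() is not modelled at all.

```c
#include <assert.h>

/* Stand-in for PG_arch_1, which arm32 names PG_dcache_clean. */
#define MODEL_PG_DCACHE_CLEAN (1UL << 0)

/* Hypothetical per-page state for the model; not struct page. */
struct arm32_page_model {
	unsigned long flags;
	unsigned int writebacks; /* D-cache writebacks performed so far */
};

/* The kernel wrote through its own mapping and deferred the flush,
 * so dirty lines may exist: forget that the page is clean. */
static void model_kernel_write(struct arm32_page_model *p)
{
	assert(p);
	p->flags &= ~MODEL_PG_DCACHE_CLEAN;
}

/* A non-coherent DMA from the device has completed and its unmap left
 * no dirty lines for this page: record that it is clean. */
static void model_dma_done(struct arm32_page_model *p)
{
	assert(p);
	p->flags |= MODEL_PG_DCACHE_CLEAN;
}

/* Establishing a user PTE: write back the D-cache only when the page
 * is not already known to be clean, then remember that it now is. */
static void model_map_user(struct arm32_page_model *p)
{
	assert(p);
	if (!(p->flags & MODEL_PG_DCACHE_CLEAN)) {
		p->writebacks++;
		p->flags |= MODEL_PG_DCACHE_CLEAN;
	}
}
```

Walking through a page that is mapped twice, then written by the kernel, then filled by DMA shows the point of the optimisation: only the mappings that follow a kernel write (or the very first mapping) pay for a writeback, while a mapping that follows a completed DMA does not.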
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper @ 2023-03-27 15:01 ` Russell King (Oracle) 0 siblings, 0 replies; 456+ messages in thread From: Russell King (Oracle) @ 2023-03-27 15:01 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > The arm version of the arch_sync_dma_for_cpu() function annotates pages as > PG_dcache_clean after a DMA, but no other architecture does this here. ... because this is an arm32 specific feature. Generically, it's PG_arch_1, which is a page flag free for architecture use. On arm32 we decided to use this to mark whether we can skip dcache writebacks when establishing a PTE - and thus it was decided to call it PG_dcache_clean to reflect how arm32 decided to use that bit. This isn't just a DMA thing, there are other places that we update the bit, such as flush_dcache_page() and copy_user_highpage(). So thinking that the arm32 PG_dcache_clean is something for DMA is actually wrong. Other architectures are free to do their own other optimisations using that bit, and their implementations may be DMA-centric. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! 
_______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper @ 2023-03-27 15:01 ` Russell King (Oracle) 0 siblings, 0 replies; 456+ messages in thread From: Russell King (Oracle) @ 2023-03-27 15:01 UTC (permalink / raw) To: Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong, Michal Simek, Thomas Bogendoerfer, linux-parisc, linux-openrisc, linuxppc-dev, linux-kernel, Dinh Nguyen, Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy, David S. Miller On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > The arm version of the arch_sync_dma_for_cpu() function annotates pages as > PG_dcache_clean after a DMA, but no other architecture does this here. ... because this is an arm32 specific feature. Generically, it's PG_arch_1, which is a page flag free for architecture use. On arm32 we decided to use this to mark whether we can skip dcache writebacks when establishing a PTE - and thus it was decided to call it PG_dcache_clean to reflect how arm32 decided to use that bit. This isn't just a DMA thing, there are other places that we update the bit, such as flush_dcache_page() and copy_user_highpage(). So thinking that the arm32 PG_dcache_clean is something for DMA is actually wrong. Other architectures are free to do their own other optimisations using that bit, and their implementations may be DMA-centric. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
@ 2023-03-31 14:06 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 14:06 UTC (permalink / raw)
To: Russell King, Arnd Bergmann
Cc: linux-kernel, Vineet Gupta, Neil Armstrong, Linus Walleij,
Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy,
Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel,
linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon,
linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc,
linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa

On Mon, Mar 27, 2023, at 17:01, Russell King (Oracle) wrote:
> On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> The arm version of the arch_sync_dma_for_cpu() function annotates pages as
>> PG_dcache_clean after a DMA, but no other architecture does this here.
>
> ... because this is an arm32 specific feature. Generically, it's
> PG_arch_1, which is a page flag free for architecture use. On arm32
> we decided to use this to mark whether we can skip dcache writebacks
> when establishing a PTE - and thus it was decided to call it
> PG_dcache_clean to reflect how arm32 decided to use that bit.
>
> This isn't just a DMA thing, there are other places that we update
> the bit, such as flush_dcache_page() and copy_user_highpage().
>
> So thinking that the arm32 PG_dcache_clean is something for DMA is
> actually wrong.
>
> Other architectures are free to do their own other optimisations
> using that bit, and their implementations may be DMA-centric.

The flag is used the same way on most architectures, though some
use the opposite polarity and call it PG_dcache_dirty. The only
other architecture that uses it for DMA is ia64, with the difference
being that this also marks the page as clean even for coherent
DMA, not just when doing a flush as part of noncoherent DMA.

Based on Robin's reply, it sounds like this is not a valid assumption
on Arm, if a coherent DMA can target a dirty dcache line without
cleaning it.

    Arnd
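The two policies contrasted above can be sketched as a small model. It is hypothetical and is not the arm32 or ia64 implementation: `flag_clean` plays the role of PG_dcache_clean (an architecture using PG_dcache_dirty would store the inverse), `dcache_dirty` is the real cache state the flag is meant to summarise, and the assumption, taken from the question raised about Robin's reply, is that a coherent DMA write can leave a dirty D-cache line dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page; not kernel code. */
struct model_page {
	bool flag_clean;   /* PG_dcache_clean-style bit (arm32 polarity) */
	bool dcache_dirty; /* whether the D-cache really holds dirty lines */
};

/* Architectures using PG_dcache_dirty track the inverse of the same fact. */
static bool model_flag_as_dcache_dirty(const struct model_page *p)
{
	return !p->flag_clean;
}

/*
 * Assumption for this sketch: coherent DMA may hit a dirty line and leave
 * it dirty; the noncoherent sync for the CPU invalidates the lines.
 */
static void model_dma_to_cpu(struct model_page *p, bool coherent)
{
	assert(p);
	if (!coherent)
		p->dcache_dirty = false;
}

/* arm32-style: only the noncoherent (cache-maintaining) path marks clean. */
static void model_mark_clean_noncoherent_only(struct model_page *p, bool coherent)
{
	model_dma_to_cpu(p, coherent);
	if (!coherent)
		p->flag_clean = true;
}

/* ia64-style as described above: mark clean after coherent DMA as well. */
static void model_mark_clean_always(struct model_page *p, bool coherent)
{
	model_dma_to_cpu(p, coherent);
	p->flag_clean = true;
}

/* The flag can only be relied on if "clean" never hides dirty lines. */
static bool model_flag_is_safe(const struct model_page *p)
{
	return !(p->flag_clean && p->dcache_dirty);
}
```

Under that assumption, only the policy that marks pages clean after a noncoherent flush keeps the flag honest; marking after coherent DMA can report "clean" while dirty lines remain.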
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
@ 2023-03-31 15:54 ` Russell King (Oracle)
0 siblings, 0 replies; 456+ messages in thread
From: Russell King (Oracle) @ 2023-03-31 15:54 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Arnd Bergmann, linux-kernel, Vineet Gupta, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen,
Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy,
Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy,
Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel,
linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon,
linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc,
linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa

On Fri, Mar 31, 2023 at 04:06:37PM +0200, Arnd Bergmann wrote:
> On Mon, Mar 27, 2023, at 17:01, Russell King (Oracle) wrote:
> > On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote:
> >> From: Arnd Bergmann <arnd@arndb.de>
> >>
> >> The arm version of the arch_sync_dma_for_cpu() function annotates pages as
> >> PG_dcache_clean after a DMA, but no other architecture does this here.
> >
> > ... because this is an arm32 specific feature. Generically, it's
> > PG_arch_1, which is a page flag free for architecture use. On arm32
> > we decided to use this to mark whether we can skip dcache writebacks
> > when establishing a PTE - and thus it was decided to call it
> > PG_dcache_clean to reflect how arm32 decided to use that bit.
> >
> > This isn't just a DMA thing, there are other places that we update
> > the bit, such as flush_dcache_page() and copy_user_highpage().
> >
> > So thinking that the arm32 PG_dcache_clean is something for DMA is
> > actually wrong.
> >
> > Other architectures are free to do their own other optimisations
> > using that bit, and their implementations may be DMA-centric.
>
> The flag is used the same way on most architectures, though some
> use the opposite polarity and call it PG_dcache_dirty. The only
> other architecture that uses it for DMA is ia64, with the difference
> being that this also marks the page as clean even for coherent
> DMA, not just when doing a flush as part of noncoherent DMA.
>
> Based on Robin's reply, it sounds like this is not a valid assumption
> on Arm, if a coherent DMA can target a dirty dcache line without
> cleaning it.

The other thing to note here is that PG_dcache_clean doesn't have
much meaning on modern CPUs with PIPT caches. For these,
cache_is_vipt_nonaliasing() will be true, and
cache_ops_need_broadcast() will be false.

Firstly, if we're using coherent DMA, then PG_dcache_clean is
intentionally not touched, because the data cache isn't cleaned in
any way by DMA operations.

flush_dcache_page() turns into a no-op apart from clearing
PG_dcache_clean if it was set.

__sync_icache_dcache() will do nothing for non-executable pages, but
will write back a page that isn't marked PG_dcache_clean to ensure
that it is visible to the instruction stream. This is only used to
ensure that the instructions are visible to a newly established
executable mapping when e.g. the page has been DMA'd in. The default
state of PG_dcache_clean is zero on any new allocation, so this has
the effect of causing any executable page to be flushed such that the
instruction stream can see the instructions, but only for the first
establishment of the mapping. That means that e.g. libc text pages
don't keep getting flushed on the start of every program.

update_mmu_cache() isn't compiled, so its use of PG_dcache_clean is
irrelevant.

v6_copy_user_highpage_aliasing() won't be called because we're not
using an aliasing cache.

So, for modern ARM systems with coherent DMA, PG_dcache_clean only
serves the __sync_icache_dcache() optimisation. ARM's use of this
remains valid in this circumstance.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
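The PIPT behaviour described in this message can be modelled in a few lines. This is a hypothetical userspace sketch with counters in place of real cache maintenance. It follows the shape described above (skip non-executable mappings, write back the D-cache only while PG_dcache_clean is still clear). The final I-cache invalidation step for executable mappings is an assumption about the surrounding arm32 code (where it is done by __flush_icache_all()), included only to keep the counters realistic; none of this is the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page on a PIPT (VIPT non-aliasing) CPU. */
struct model_page {
	bool dcache_clean;         /* PG_dcache_clean */
	unsigned int dcache_wbacks; /* D-cache writebacks of this page */
	unsigned int icache_invals; /* I-cache invalidations for exec maps */
};

/* A new allocation starts with the flag clear, as described above. */
static struct model_page model_alloc_page(void)
{
	struct model_page p = { false, 0, 0 };
	return p;
}

/* flush_dcache_page() on PIPT: no maintenance, just forget "clean". */
static void model_flush_dcache_page(struct model_page *p)
{
	assert(p);
	p->dcache_clean = false;
}

/* __sync_icache_dcache()-like step when a PTE for the page is installed. */
static void model_sync_icache_dcache(struct model_page *p, bool exec)
{
	assert(p);
	if (!exec)
		return; /* non-executable mapping: nothing to do */
	if (!p->dcache_clean) { /* test_and_set_bit(PG_dcache_clean) */
		p->dcache_clean = true;
		p->dcache_wbacks++; /* make the data visible to instruction fetch */
	}
	p->icache_invals++; /* assumed I-cache invalidation for exec mappings */
}
```

With a freshly allocated page the flag is clear, so the first executable mapping pays for one writeback and later mappings of the same text page do not; a flush_dcache_page() after a CPU write re-arms it.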
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper @ 2023-03-31 15:54 ` Russell King (Oracle) 0 siblings, 0 replies; 456+ messages in thread From: Russell King (Oracle) @ 2023-03-31 15:54 UTC (permalink / raw) To: Arnd Bergmann Cc: Arnd Bergmann, linux-kernel, Vineet Gupta, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Fri, Mar 31, 2023 at 04:06:37PM +0200, Arnd Bergmann wrote: > On Mon, Mar 27, 2023, at 17:01, Russell King (Oracle) wrote: > > On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote: > >> From: Arnd Bergmann <arnd@arndb.de> > >> > >> The arm version of the arch_sync_dma_for_cpu() function annotates pages as > >> PG_dcache_clean after a DMA, but no other architecture does this here. > > > > ... because this is an arm32 specific feature. Generically, it's > > PG_arch_1, which is a page flag free for architecture use. On arm32 > > we decided to use this to mark whether we can skip dcache writebacks > > when establishing a PTE - and thus it was decided to call it > > PG_dcache_clean to reflect how arm32 decided to use that bit. > > > > This isn't just a DMA thing, there are other places that we update > > the bit, such as flush_dcache_page() and copy_user_highpage(). > > > > So thinking that the arm32 PG_dcache_clean is something for DMA is > > actually wrong. 
> > > > Other architectures are free to do their own other optimisations > > using that bit, and their implementations may be DMA-centric. > > The flag is used the same way on most architectures, though some > use the opposite polarity and call it PG_dcache_dirty. The only > other architecture that uses it for DMA is ia64, with the difference > being that this also marks the page as clean even for coherent > DMA, not just when doing a flush as part of noncoherent DMA. > > Based on Robin's reply it sounds that this is not a valid assumption > on Arm, if a coherent DMA can target a dirty dcache line without > cleaning it. The other thing to note here is that PG_dcache_clean doesn't have much meaning on modern CPUs with PIPT caches. For these, cache_is_vipt_nonaliasing() will be true, and cache_ops_need_broadcast() will be false. Firstly, if we're using coherent DMA, then PG_dcache_clean is intentionally not touched, because the data cache isn't cleaned in any way by DMA operations. flush_dcache_page() turns into a no-op apart from clearing PG_dcache_clean if it was set. __sync_icache_dcache() will do nothing for non-executable pages, but will write-back a page that isn't marked PG_dcache_clean to ensure that it is visible to the instruction stream. This is only used to ensure that a the instructions are visible to a newly established executable mapping when e.g. the page has been DMA'd in. The default state of PG_dcache_clean is zero on any new allocation, so this has the effect of causing any executable page to be flushed such that the instruction stream can see the instructions, but only for the first establishment of the mapping. That means that e.g. libc text pages don't keep getting flushed on the start of every program. update_mmu_cache() isn't compiled, so it's use of PG_dcache_clean is irrelevant. v6_copy_user_highpage_aliasing() won't be called because we're not using an aliasing cache. 
So, for modern ARM systems with DMA-coherent PG_dcache_clean only serves for the __sync_icache_dcache() optimisation. ARMs use of this remains valid in this circumstance. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper @ 2023-03-31 15:54 ` Russell King (Oracle) 0 siblings, 0 replies; 456+ messages in thread From: Russell King (Oracle) @ 2023-03-31 15:54 UTC (permalink / raw) To: Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor.Dooley, guoren, linux-csky@vger.kernel.org, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Neil Armstrong, Lad, Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Brian Cain, Arnd Bergmann <ar On Fri, Mar 31, 2023 at 04:06:37PM +0200, Arnd Bergmann wrote: > On Mon, Mar 27, 2023, at 17:01, Russell King (Oracle) wrote: > > On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote: > >> From: Arnd Bergmann <arnd@arndb.de> > >> > >> The arm version of the arch_sync_dma_for_cpu() function annotates pages as > >> PG_dcache_clean after a DMA, but no other architecture does this here. > > > > ... because this is an arm32 specific feature. Generically, it's > > PG_arch_1, which is a page flag free for architecture use. On arm32 > > we decided to use this to mark whether we can skip dcache writebacks > > when establishing a PTE - and thus it was decided to call it > > PG_dcache_clean to reflect how arm32 decided to use that bit. > > > > This isn't just a DMA thing, there are other places that we update > > the bit, such as flush_dcache_page() and copy_user_highpage(). > > > > So thinking that the arm32 PG_dcache_clean is something for DMA is > > actually wrong. > > > > Other architectures are free to do their own other optimisations > > using that bit, and their implementations may be DMA-centric. > > The flag is used the same way on most architectures, though some > use the opposite polarity and call it PG_dcache_dirty. 
The only > other architecture that uses it for DMA is ia64, with the difference > being that this also marks the page as clean even for coherent > DMA, not just when doing a flush as part of noncoherent DMA. > > Based on Robin's reply it sounds that this is not a valid assumption > on Arm, if a coherent DMA can target a dirty dcache line without > cleaning it. The other thing to note here is that PG_dcache_clean doesn't have much meaning on modern CPUs with PIPT caches. For these, cache_is_vipt_nonaliasing() will be true, and cache_ops_need_broadcast() will be false. Firstly, if we're using coherent DMA, then PG_dcache_clean is intentionally not touched, because the data cache isn't cleaned in any way by DMA operations. flush_dcache_page() turns into a no-op apart from clearing PG_dcache_clean if it was set. __sync_icache_dcache() will do nothing for non-executable pages, but will write-back a page that isn't marked PG_dcache_clean to ensure that it is visible to the instruction stream. This is only used to ensure that a the instructions are visible to a newly established executable mapping when e.g. the page has been DMA'd in. The default state of PG_dcache_clean is zero on any new allocation, so this has the effect of causing any executable page to be flushed such that the instruction stream can see the instructions, but only for the first establishment of the mapping. That means that e.g. libc text pages don't keep getting flushed on the start of every program. update_mmu_cache() isn't compiled, so it's use of PG_dcache_clean is irrelevant. v6_copy_user_highpage_aliasing() won't be called because we're not using an aliasing cache. So, for modern ARM systems with DMA-coherent PG_dcache_clean only serves for the __sync_icache_dcache() optimisation. ARMs use of this remains valid in this circumstance. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! 
^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
  2023-03-27 12:13 ` Arnd Bergmann
                   ` (6 preceding siblings ...)
  (?)
@ 2023-03-27 18:42 ` kernel test robot
  -1 siblings, 0 replies; 456+ messages in thread
From: kernel test robot @ 2023-03-27 18:42 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: oe-kbuild-all

Hi Arnd,

I love your patch! Yet something to improve:

[auto build test ERROR on soc/for-next]
[also build test ERROR on jcmvbkbc-xtensa/xtensa-for-next powerpc/next powerpc/fixes linus/master v6.3-rc4 next-20230323]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Arnd-Bergmann/openrisc-dma-mapping-flush-bidirectional-mappings/20230327-202133
base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
patch link:    https://lore.kernel.org/r/20230327121317.4081816-21-arnd%40kernel.org
patch subject: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
config: arm-randconfig-r046-20230327 (https://download.01.org/0day-ci/archive/20230328/202303280208.wYfwjY3I-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/6455ebecc75d3dcbfcaf31db6e97534d0c564ca3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Arnd-Bergmann/openrisc-dma-mapping-flush-bidirectional-mappings/20230327-202133
        git checkout 6455ebecc75d3dcbfcaf31db6e97534d0c564ca3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash arch/arm/mm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303280208.wYfwjY3I-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   arch/arm/mm/dma-mapping.c: In function 'arm_iommu_sync_dma_for_cpu':
>> arch/arm/mm/dma-mapping.c:1306:45: error: 's' undeclared (first use in this function); did you mean 's8'?
    1306 |                 arch_sync_dma_for_cpu(phys, s->length, dir);
         |                                             ^
         |                                             s8
   arch/arm/mm/dma-mapping.c:1306:45: note: each undeclared identifier is reported only once for each function it appears in
   arch/arm/mm/dma-mapping.c: In function 'arm_iommu_unmap_page':
>> arch/arm/mm/dma-mapping.c:1441:9: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
    1441 |         if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
         |         ^~
   arch/arm/mm/dma-mapping.c:1443:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
    1443 |                 arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
         |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/arm/mm/dma-mapping.c:1436:13: warning: unused variable 'len' [-Wunused-variable]
    1436 |         int len = PAGE_ALIGN(size + offset);
         |             ^~~
   arch/arm/mm/dma-mapping.c: At top level:
>> arch/arm/mm/dma-mapping.c:1446:28: error: expected ')' before '->' token
    1446 |         iommu_unmap(mapping->domain, iova, len);
         |                            ^~
         |                            )
>> arch/arm/mm/dma-mapping.c:1447:9: warning: data definition has no type or storage class
    1447 |         __free_iova(mapping, iova, len);
         |         ^~~~~~~~~~~
>> arch/arm/mm/dma-mapping.c:1447:9: error: type defaults to 'int' in declaration of '__free_iova' [-Werror=implicit-int]
>> arch/arm/mm/dma-mapping.c:1447:9: warning: parameter names (without types) in function declaration
>> arch/arm/mm/dma-mapping.c:1447:9: error: conflicting types for '__free_iova'; have 'int()'
   arch/arm/mm/dma-mapping.c:825:20: note: previous definition of '__free_iova' with type 'void(struct dma_iommu_mapping *, dma_addr_t, size_t)' {aka 'void(struct dma_iommu_mapping *, unsigned int, unsigned int)'}
     825 | static inline void __free_iova(struct dma_iommu_mapping *mapping,
         |                    ^~~~~~~~~~~
>> arch/arm/mm/dma-mapping.c:1448:1: error: expected identifier or '(' before '}' token
    1448 | }
         | ^
   cc1: some warnings being treated as errors


vim +1306 arch/arm/mm/dma-mapping.c

  1300
  1301	static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
  1302					       enum dma_data_direction dir,
  1303					       bool dma_coherent)
  1304	{
  1305		if (!dma_coherent)
> 1306			arch_sync_dma_for_cpu(phys, s->length, dir);
  1307
  1308		if (dir == DMA_FROM_DEVICE)
  1309			arch_dma_mark_clean(phys, s->length);
  1310	}
  1311
  1312	/**
  1313	 * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
  1314	 * @dev: valid struct device pointer
  1315	 * @sg: list of buffers
  1316	 * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
  1317	 * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  1318	 *
  1319	 * Unmap a set of streaming mode DMA translations. Again, CPU access
  1320	 * rules concerning calls here are the same as for dma_unmap_single().
  1321	 */
  1322	static void arm_iommu_unmap_sg(struct device *dev,
  1323				       struct scatterlist *sg, int nents,
  1324				       enum dma_data_direction dir,
  1325				       unsigned long attrs)
  1326	{
  1327		struct scatterlist *s;
  1328		int i;
  1329
  1330		for_each_sg(sg, s, nents, i) {
  1331			if (sg_dma_len(s))
  1332				__iommu_remove_mapping(dev, sg_dma_address(s),
  1333						       sg_dma_len(s));
  1334			if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
  1335				arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
  1336							   dev->dma_coherent);
  1337		}
  1338	}
  1339
  1340	/**
  1341	 * arm_iommu_sync_sg_for_cpu
  1342	 * @dev: valid struct device pointer
  1343	 * @sg: list of buffers
  1344	 * @nents: number of buffers to map (returned from dma_map_sg)
  1345	 * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  1346	 */
  1347	static void arm_iommu_sync_sg_for_cpu(struct device *dev,
  1348				struct scatterlist *sg,
  1349				int nents, enum dma_data_direction dir)
  1350	{
  1351		struct scatterlist *s;
  1352		int i;
  1353
  1354		for_each_sg(sg, s, nents, i)
  1355			arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
  1356						   dev->dma_coherent);
  1357	}
  1358
  1359	/**
  1360	 * arm_iommu_sync_sg_for_device
  1361	 * @dev: valid struct device pointer
  1362	 * @sg: list of buffers
  1363	 * @nents: number of buffers to map (returned from dma_map_sg)
  1364	 * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  1365	 */
  1366	static void arm_iommu_sync_sg_for_device(struct device *dev,
  1367				struct scatterlist *sg,
  1368				int nents, enum dma_data_direction dir)
  1369	{
  1370		struct scatterlist *s;
  1371		int i;
  1372
  1373		if (dev->dma_coherent)
  1374			return;
  1375
  1376		for_each_sg(sg, s, nents, i)
  1377			arch_sync_dma_for_device(page_to_phys(sg_page(s)) + s->offset,
  1378						 s->length, dir);
  1379	}
  1380
  1381	/**
  1382	 * arm_iommu_map_page
  1383	 * @dev: valid struct device pointer
  1384	 * @page: page that buffer resides in
  1385	 * @offset: offset into page for start of buffer
  1386	 * @size: size of buffer to map
  1387	 * @dir: DMA transfer direction
  1388	 *
  1389	 * IOMMU aware version of arm_dma_map_page()
  1390	 */
  1391	static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
  1392		     unsigned long offset, size_t size, enum dma_data_direction dir,
  1393		     unsigned long attrs)
  1394	{
  1395		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
  1396		dma_addr_t dma_addr;
  1397		int ret, prot, len = PAGE_ALIGN(size + offset);
  1398
  1399		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
  1400			arch_sync_dma_for_device(page_to_phys(page) + offset,
  1401						 size, dir);
  1402
  1403		dma_addr = __alloc_iova(mapping, len);
  1404		if (dma_addr == DMA_MAPPING_ERROR)
  1405			return dma_addr;
  1406
  1407		prot = __dma_info_to_prot(dir, attrs);
  1408
  1409		ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len,
  1410				prot, GFP_KERNEL);
  1411		if (ret < 0)
  1412			goto fail;
  1413
  1414		return dma_addr + offset;
  1415	fail:
  1416		__free_iova(mapping, dma_addr, len);
  1417		return DMA_MAPPING_ERROR;
  1418	}
  1419
  1420	/**
  1421	 * arm_iommu_unmap_page
  1422	 * @dev: valid struct device pointer
  1423	 * @handle: DMA address of buffer
  1424	 * @size: size of buffer (same as passed to dma_map_page)
  1425	 * @dir: DMA transfer direction (same as passed to dma_map_page)
  1426	 *
  1427	 * IOMMU aware version of arm_dma_unmap_page()
  1428	 */
  1429	static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  1430			size_t size, enum dma_data_direction dir, unsigned long attrs)
  1431	{
  1432		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
  1433		dma_addr_t iova = handle & PAGE_MASK;
  1434		phys_addr_t phys;
  1435		int offset = handle & ~PAGE_MASK;
  1436		int len = PAGE_ALIGN(size + offset);
  1437
  1438		if (!iova)
  1439			return;
  1440
> 1441		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
  1442			phys = iommu_iova_to_phys(mapping->domain, handle);
  1443			arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
  1444		}
  1445
> 1446		iommu_unmap(mapping->domain, iova, len);
> 1447		__free_iova(mapping, iova, len);
> 1448	}
  1449

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
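The first error comes from the new arm_iommu_sync_dma_for_cpu() helper referring to `s->length`, apparently left over from the scatterlist loops it was factored out of; inside the helper only its own `len` parameter is in scope. Below is a minimal userspace sketch of one plausible correction, assuming the intent was simply to use `len`. The kernel types are replaced by stand-ins, and arch_sync_dma_for_cpu()/arch_dma_mark_clean() by hypothetical stubs that only count bytes. Otherwise the patch's logic is kept unchanged, including marking DMA_FROM_DEVICE pages clean for coherent devices, which the review thread questions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; enum values mirror the kernel's. */
typedef unsigned long phys_addr_t;

enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical stubs: they only record how many bytes they were given. */
static size_t synced_bytes;
static size_t marked_bytes;

static void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
				  enum dma_data_direction dir)
{
	(void)paddr;
	assert(dir != DMA_NONE);    /* stub sanity check */
	synced_bytes += size;
}

static void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
{
	(void)paddr;
	marked_bytes += size;
}

/* The helper from the listing, using its own 'len' instead of 's->length'. */
static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
				       enum dma_data_direction dir,
				       bool dma_coherent)
{
	if (!dma_coherent)
		arch_sync_dma_for_cpu(phys, len, dir);

	if (dir == DMA_FROM_DEVICE)
		arch_dma_mark_clean(phys, len);
}
```

Because every caller in the listing already passes the length explicitly (`s->length` from the sg loops, `size` from arm_iommu_unmap_page()), the helper itself does not need any scatterlist knowledge.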
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
  2023-03-27 12:13 ` Arnd Bergmann
                   ` (7 preceding siblings ...)
  (?)
@ 2023-03-27 19:03 ` kernel test robot
  -1 siblings, 0 replies; 456+ messages in thread
From: kernel test robot @ 2023-03-27 19:03 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: oe-kbuild-all

Hi Arnd,

I love your patch! Perhaps something to improve:

[auto build test WARNING on soc/for-next]
[also build test WARNING on jcmvbkbc-xtensa/xtensa-for-next powerpc/next powerpc/fixes linus/master v6.3-rc4 next-20230327]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Arnd-Bergmann/openrisc-dma-mapping-flush-bidirectional-mappings/20230327-202133
base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
patch link:    https://lore.kernel.org/r/20230327121317.4081816-21-arnd%40kernel.org
patch subject: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
config: arm-allyesconfig (https://download.01.org/0day-ci/archive/20230328/202303280250.Q1Mbb3gM-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/6455ebecc75d3dcbfcaf31db6e97534d0c564ca3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Arnd-Bergmann/openrisc-dma-mapping-flush-bidirectional-mappings/20230327-202133
        git checkout 6455ebecc75d3dcbfcaf31db6e97534d0c564ca3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash arch/arm/mm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303280250.Q1Mbb3gM-lkp@intel.com/

All warnings (new ones prefixed by >>):

   arch/arm/mm/dma-mapping.c: In function 'arm_iommu_sync_dma_for_cpu':
   arch/arm/mm/dma-mapping.c:1306:45: error: 's' undeclared (first use in this function); did you mean 's8'?
    1306 |                 arch_sync_dma_for_cpu(phys, s->length, dir);
         |                                             ^
         |                                             s8
   arch/arm/mm/dma-mapping.c:1306:45: note: each undeclared identifier is reported only once for each function it appears in
   arch/arm/mm/dma-mapping.c: In function 'arm_iommu_unmap_page':
>> arch/arm/mm/dma-mapping.c:1441:9: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
    1441 |         if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
         |         ^~
   arch/arm/mm/dma-mapping.c:1443:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
    1443 |                 arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
         |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/arm/mm/dma-mapping.c:1436:13: warning: unused variable 'len' [-Wunused-variable]
    1436 |         int len = PAGE_ALIGN(size + offset);
         |             ^~~
   arch/arm/mm/dma-mapping.c: At top level:
   arch/arm/mm/dma-mapping.c:1446:28: error: expected ')' before '->' token
    1446 |         iommu_unmap(mapping->domain, iova, len);
         |                            ^~
         |                            )
>> arch/arm/mm/dma-mapping.c:1447:9: warning: data definition has no type or storage class
    1447 |         __free_iova(mapping, iova, len);
         |         ^~~~~~~~~~~
   arch/arm/mm/dma-mapping.c:1447:9: error: type defaults to 'int' in declaration of '__free_iova' [-Werror=implicit-int]
>> arch/arm/mm/dma-mapping.c:1447:9: warning: parameter names (without types) in function declaration
   arch/arm/mm/dma-mapping.c:1447:9: error: conflicting types for '__free_iova'; have 'int()'
   arch/arm/mm/dma-mapping.c:825:20: note: previous definition of '__free_iova' with type 'void(struct dma_iommu_mapping *, dma_addr_t, size_t)' {aka 'void(struct dma_iommu_mapping *, unsigned int, unsigned int)'}
     825 | static inline void __free_iova(struct dma_iommu_mapping *mapping,
         |                    ^~~~~~~~~~~
   arch/arm/mm/dma-mapping.c:1448:1: error: expected identifier or '(' before '}' token
    1448 | }
         | ^
   cc1: some warnings being treated as errors


vim +/if +1441 arch/arm/mm/dma-mapping.c

  1300
  1301	static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
  1302					       enum dma_data_direction dir,
  1303					       bool dma_coherent)
  1304	{
  1305		if (!dma_coherent)
> 1306			arch_sync_dma_for_cpu(phys, s->length, dir);
  1307
  1308		if (dir == DMA_FROM_DEVICE)
  1309			arch_dma_mark_clean(phys, s->length);
  1310	}
  1311
  1312	/**
  1313	 * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
  1314	 * @dev: valid struct device pointer
  1315	 * @sg: list of buffers
  1316	 * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
  1317	 * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  1318	 *
  1319	 * Unmap a set of streaming mode DMA translations. Again, CPU access
  1320	 * rules concerning calls here are the same as for dma_unmap_single().
  1321	 */
  1322	static void arm_iommu_unmap_sg(struct device *dev,
  1323				       struct scatterlist *sg, int nents,
  1324				       enum dma_data_direction dir,
  1325				       unsigned long attrs)
  1326	{
  1327		struct scatterlist *s;
  1328		int i;
  1329
  1330		for_each_sg(sg, s, nents, i) {
  1331			if (sg_dma_len(s))
  1332				__iommu_remove_mapping(dev, sg_dma_address(s),
  1333						       sg_dma_len(s));
  1334			if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
  1335				arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
  1336							   dev->dma_coherent);
  1337		}
  1338	}
  1339
  1340	/**
  1341	 * arm_iommu_sync_sg_for_cpu
  1342	 * @dev: valid struct device pointer
  1343	 * @sg: list of buffers
  1344	 * @nents: number of buffers to map (returned from dma_map_sg)
  1345	 * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  1346	 */
  1347	static void arm_iommu_sync_sg_for_cpu(struct device *dev,
  1348				struct scatterlist *sg,
  1349				int nents, enum dma_data_direction dir)
  1350	{
  1351		struct scatterlist *s;
  1352		int i;
  1353
  1354		for_each_sg(sg, s, nents, i)
  1355			arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
  1356						   dev->dma_coherent);
  1357	}
  1358
  1359	/**
  1360	 * arm_iommu_sync_sg_for_device
  1361	 * @dev: valid struct device pointer
  1362	 * @sg: list of buffers
  1363	 * @nents: number of buffers to map (returned from dma_map_sg)
  1364	 * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  1365	 */
  1366	static void arm_iommu_sync_sg_for_device(struct device *dev,
  1367				struct scatterlist *sg,
  1368				int nents, enum dma_data_direction dir)
  1369	{
  1370		struct scatterlist *s;
  1371		int i;
  1372
  1373		if (dev->dma_coherent)
  1374			return;
  1375
  1376		for_each_sg(sg, s, nents, i)
  1377			arch_sync_dma_for_device(page_to_phys(sg_page(s)) + s->offset,
  1378						 s->length, dir);
  1379	}
  1380
  1381	/**
  1382	 * arm_iommu_map_page
  1383	 * @dev: valid struct device pointer
  1384	 * @page: page that buffer resides in
  1385	 * @offset: offset into page for start of buffer
  1386	 * @size: size of buffer to map
  1387	 * @dir: DMA transfer direction
  1388	 *
  1389	 * IOMMU aware version of arm_dma_map_page()
  1390	 */
  1391	static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
  1392		     unsigned long offset, size_t size, enum dma_data_direction dir,
  1393		     unsigned long attrs)
  1394	{
  1395		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
  1396		dma_addr_t dma_addr;
  1397		int ret, prot, len = PAGE_ALIGN(size + offset);
  1398
  1399		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
  1400			arch_sync_dma_for_device(page_to_phys(page) + offset,
  1401						 size, dir);
  1402
  1403		dma_addr = __alloc_iova(mapping, len);
  1404		if (dma_addr == DMA_MAPPING_ERROR)
  1405			return dma_addr;
  1406
  1407		prot = __dma_info_to_prot(dir, attrs);
  1408
  1409		ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len,
  1410				prot, GFP_KERNEL);
  1411		if (ret < 0)
  1412			goto fail;
  1413
  1414		return dma_addr + offset;
  1415	fail:
  1416		__free_iova(mapping, dma_addr, len);
  1417		return DMA_MAPPING_ERROR;
  1418	}
  1419
  1420	/**
  1421	 * arm_iommu_unmap_page
  1422	 * @dev: valid struct device pointer
  1423	 * @handle: DMA address of buffer
  1424	 * @size: size of buffer (same as passed to dma_map_page)
  1425	 * @dir: DMA transfer direction (same as passed to dma_map_page)
  1426	 *
  1427	 * IOMMU aware version of arm_dma_unmap_page()
  1428	 */
  1429	static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  1430			size_t size, enum dma_data_direction dir, unsigned long attrs)
  1431	{
  1432		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
  1433		dma_addr_t iova = handle & PAGE_MASK;
  1434		phys_addr_t phys;
  1435		int offset = handle & ~PAGE_MASK;
> 1436		int len = PAGE_ALIGN(size + offset);
  1437
  1438		if (!iova)
  1439			return;
  1440
> 1441		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
  1442			phys = iommu_iova_to_phys(mapping->domain, handle);
  1443			arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
  1444		}
  1445
> 1446		iommu_unmap(mapping->domain, iova, len);
> 1447		__free_iova(mapping, iova, len);
  1448	}
  1449

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
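The warnings at lines 1436 and 1441 and the errors from line 1446 onward appear to share one cause: the `if` on line 1441 has no opening brace, so the `}` on line 1444 closes the function body early. That leaves `len` unused and puts the iommu_unmap()/__free_iova() calls at file scope, where GCC parses them as declarations. The sketch below is a reduced, hypothetical userspace version of arm_iommu_unmap_page() with the brace restored, under stated assumptions: the `stub_*` helpers, the 4 KiB `PAGE_SIZE` and the local `DMA_ATTR_SKIP_CPU_SYNC` definition are stand-ins for illustration, and the `dev`/`mapping` arguments are dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; a 4 KiB page is assumed for illustration. */
#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)
#define DMA_ATTR_SKIP_CPU_SYNC (1UL << 5)

typedef unsigned long dma_addr_t;
typedef unsigned long phys_addr_t;

/* Hypothetical stubs that count the work the real helpers would do. */
static unsigned int cpu_syncs, unmaps;
static size_t freed_bytes;

static phys_addr_t stub_iova_to_phys(dma_addr_t iova)
{
	return iova + 0x80000000UL;   /* arbitrary fixed translation */
}

static void stub_sync_for_cpu(phys_addr_t phys, size_t size)
{
	(void)phys;
	(void)size;
	cpu_syncs++;
}

static void stub_unmap(dma_addr_t iova, size_t len)
{
	assert((iova & ~PAGE_MASK) == 0);   /* whole pages only */
	(void)len;
	unmaps++;
}

static void stub_free_iova(dma_addr_t iova, size_t len)
{
	(void)iova;
	freed_bytes += len;
}

/* Reduced arm_iommu_unmap_page(): the 'if' body is braced again. */
static void unmap_page_sketch(dma_addr_t handle, size_t size,
			      unsigned long attrs)
{
	dma_addr_t iova = handle & PAGE_MASK;
	size_t offset = handle & ~PAGE_MASK;
	size_t len = PAGE_ALIGN(size + offset);

	if (!iova)
		return;

	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
		phys_addr_t phys = stub_iova_to_phys(handle);

		stub_sync_for_cpu(phys, size);
	}

	stub_unmap(iova, len);
	stub_free_iova(iova, len);
}
```

Declaring `phys` inside the braced block also avoids using it uninitialized when DMA_ATTR_SKIP_CPU_SYNC is set, which the unbraced version would have done.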
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
  2023-03-27 12:13 ` Arnd Bergmann
                   ` (8 preceding siblings ...)
  (?)
@ 2023-03-28 13:17 ` kernel test robot
  -1 siblings, 0 replies; 456+ messages in thread
From: kernel test robot @ 2023-03-28 13:17 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: oe-kbuild-all

Hi Arnd,

I love your patch! Yet something to improve:

[auto build test ERROR on soc/for-next]
[also build test ERROR on jcmvbkbc-xtensa/xtensa-for-next powerpc/next powerpc/fixes linus/master v6.3-rc4 next-20230328]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Arnd-Bergmann/openrisc-dma-mapping-flush-bidirectional-mappings/20230327-202133
base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
patch link:    https://lore.kernel.org/r/20230327121317.4081816-21-arnd%40kernel.org
patch subject: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
config: arm-allyesconfig (https://download.01.org/0day-ci/archive/20230328/202303282114.do3TMGYj-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/6455ebecc75d3dcbfcaf31db6e97534d0c564ca3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Arnd-Bergmann/openrisc-dma-mapping-flush-bidirectional-mappings/20230327-202133
        git checkout 6455ebecc75d3dcbfcaf31db6e97534d0c564ca3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303282114.do3TMGYj-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/arm/mm/dma-mapping.c: In function 'arm_iommu_sync_dma_for_cpu':
>> arch/arm/mm/dma-mapping.c:1306:45: error: 's' undeclared (first use in this function); did you mean 's8'?
    1306 |                 arch_sync_dma_for_cpu(phys, s->length, dir);
         |                                             ^
         |                                             s8
   arch/arm/mm/dma-mapping.c:1306:45: note: each undeclared identifier is reported only once for each function it appears in
   arch/arm/mm/dma-mapping.c: In function 'arm_iommu_unmap_page':
   arch/arm/mm/dma-mapping.c:1441:9: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
    1441 |         if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
         |         ^~
   arch/arm/mm/dma-mapping.c:1443:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
    1443 |                 arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
         |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/arm/mm/dma-mapping.c:1436:13: warning: unused variable 'len' [-Wunused-variable]
    1436 |         int len = PAGE_ALIGN(size + offset);
         |             ^~~
   arch/arm/mm/dma-mapping.c: At top level:
>> arch/arm/mm/dma-mapping.c:1446:28: error: expected ')' before '->' token
    1446 |         iommu_unmap(mapping->domain, iova, len);
         |                            ^~
         |                            )
   arch/arm/mm/dma-mapping.c:1447:9: warning: data definition has no type or storage class
    1447 |         __free_iova(mapping, iova, len);
         |         ^~~~~~~~~~~
>> arch/arm/mm/dma-mapping.c:1447:9: error: type defaults to 'int' in declaration of '__free_iova' [-Werror=implicit-int]
   arch/arm/mm/dma-mapping.c:1447:9: warning: parameter names (without types) in function declaration
>> arch/arm/mm/dma-mapping.c:1447:9: error: conflicting types for '__free_iova'; have 'int()'
   arch/arm/mm/dma-mapping.c:825:20: note: previous definition of '__free_iova' with type 'void(struct dma_iommu_mapping *, dma_addr_t, size_t)' {aka 'void(struct dma_iommu_mapping *, unsigned int, unsigned int)'}
     825 | static inline void __free_iova(struct dma_iommu_mapping *mapping,
         |                    ^~~~~~~~~~~
>> arch/arm/mm/dma-mapping.c:1448:1: error: expected identifier or '(' before '}' token
    1448 | }
         | ^
   cc1: some warnings being treated as errors


vim +1306 arch/arm/mm/dma-mapping.c

  1300
  1301	static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
  1302					       enum dma_data_direction dir,
  1303					       bool dma_coherent)
  1304	{
  1305		if (!dma_coherent)
> 1306			arch_sync_dma_for_cpu(phys, s->length, dir);
  1307
  1308		if (dir == DMA_FROM_DEVICE)
  1309			arch_dma_mark_clean(phys, s->length);
  1310	}
  1311
  1312	/**
  1313	 * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
  1314	 * @dev: valid struct device pointer
  1315	 * @sg: list of buffers
  1316	 * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
  1317	 * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  1318	 *
  1319	 * Unmap a set of streaming mode DMA translations. Again, CPU access
  1320	 * rules concerning calls here are the same as for dma_unmap_single().
1321 */ 1322 static void arm_iommu_unmap_sg(struct device *dev, 1323 struct scatterlist *sg, int nents, 1324 enum dma_data_direction dir, 1325 unsigned long attrs) 1326 { 1327 struct scatterlist *s; 1328 int i; 1329 1330 for_each_sg(sg, s, nents, i) { 1331 if (sg_dma_len(s)) 1332 __iommu_remove_mapping(dev, sg_dma_address(s), 1333 sg_dma_len(s)); 1334 if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) 1335 arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir, 1336 dev->dma_coherent); 1337 } 1338 } 1339 1340 /** 1341 * arm_iommu_sync_sg_for_cpu 1342 * @dev: valid struct device pointer 1343 * @sg: list of buffers 1344 * @nents: number of buffers to map (returned from dma_map_sg) 1345 * @dir: DMA transfer direction (same as was passed to dma_map_sg) 1346 */ 1347 static void arm_iommu_sync_sg_for_cpu(struct device *dev, 1348 struct scatterlist *sg, 1349 int nents, enum dma_data_direction dir) 1350 { 1351 struct scatterlist *s; 1352 int i; 1353 1354 for_each_sg(sg, s, nents, i) 1355 arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir, 1356 dev->dma_coherent); 1357 } 1358 1359 /** 1360 * arm_iommu_sync_sg_for_device 1361 * @dev: valid struct device pointer 1362 * @sg: list of buffers 1363 * @nents: number of buffers to map (returned from dma_map_sg) 1364 * @dir: DMA transfer direction (same as was passed to dma_map_sg) 1365 */ 1366 static void arm_iommu_sync_sg_for_device(struct device *dev, 1367 struct scatterlist *sg, 1368 int nents, enum dma_data_direction dir) 1369 { 1370 struct scatterlist *s; 1371 int i; 1372 1373 if (dev->dma_coherent) 1374 return; 1375 1376 for_each_sg(sg, s, nents, i) 1377 arch_sync_dma_for_device(page_to_phys(sg_page(s)) + s->offset, 1378 s->length, dir); 1379 } 1380 1381 /** 1382 * arm_iommu_map_page 1383 * @dev: valid struct device pointer 1384 * @page: page that buffer resides in 1385 * @offset: offset into page for start of buffer 1386 * @size: size of buffer to map 1387 * @dir: DMA transfer direction 1388 * 1389 * IOMMU aware version of 
arm_dma_map_page() 1390 */ 1391 static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page, 1392 unsigned long offset, size_t size, enum dma_data_direction dir, 1393 unsigned long attrs) 1394 { 1395 struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); 1396 dma_addr_t dma_addr; 1397 int ret, prot, len = PAGE_ALIGN(size + offset); 1398 1399 if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) 1400 arch_sync_dma_for_device(page_to_phys(page) + offset, 1401 size, dir); 1402 1403 dma_addr = __alloc_iova(mapping, len); 1404 if (dma_addr == DMA_MAPPING_ERROR) 1405 return dma_addr; 1406 1407 prot = __dma_info_to_prot(dir, attrs); 1408 1409 ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 1410 prot, GFP_KERNEL); 1411 if (ret < 0) 1412 goto fail; 1413 1414 return dma_addr + offset; 1415 fail: 1416 __free_iova(mapping, dma_addr, len); 1417 return DMA_MAPPING_ERROR; 1418 } 1419 1420 /** 1421 * arm_iommu_unmap_page 1422 * @dev: valid struct device pointer 1423 * @handle: DMA address of buffer 1424 * @size: size of buffer (same as passed to dma_map_page) 1425 * @dir: DMA transfer direction (same as passed to dma_map_page) 1426 * 1427 * IOMMU aware version of arm_dma_unmap_page() 1428 */ 1429 static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, 1430 size_t size, enum dma_data_direction dir, unsigned long attrs) 1431 { 1432 struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); 1433 dma_addr_t iova = handle & PAGE_MASK; 1434 phys_addr_t phys; 1435 int offset = handle & ~PAGE_MASK; 1436 int len = PAGE_ALIGN(size + offset); 1437 1438 if (!iova) 1439 return; 1440 1441 if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) 1442 phys = iommu_iova_to_phys(mapping->domain, handle); > 1443 arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent); 1444 } 1445 > 1446 iommu_unmap(mapping->domain, iova, len); > 1447 __free_iova(mapping, iova, len); > 1448 } 1449 -- 0-DAY CI Kernel Test Service 
https://github.com/intel/lkp-tests ^ permalink raw reply [flat|nested] 456+ messages in thread
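The robot's errors at lines 1446 to 1448 are a cascade from the missing `{` on line 1441: without it the `if` guards only line 1442, the `}` on line 1444 closes the function body, and the remaining calls are parsed at file scope (which is also why `len` is reported as unused). Below is a minimal userspace sketch of the control flow the hunk appears to intend, with the brace restored. It is not kernel code: `dma_addr_t` and `phys_addr_t` are redefined locally, 4 KiB pages are assumed, the value of `SKETCH_ATTR_SKIP_CPU_SYNC` is made up, and the `fake_*` functions are hypothetical stubs that stand in for `iommu_iova_to_phys()`, the new `arm_iommu_sync_dma_for_cpu()` helper, `iommu_unmap()` and `__free_iova()` and only count calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for kernel types; this is a userspace sketch. */
typedef uint32_t dma_addr_t;
typedef uint32_t phys_addr_t;

#define SKETCH_PAGE_SIZE ((size_t)4096)              /* assumes 4 KiB pages */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PAGE_ALIGN(x) (((x) + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK)
#define SKETCH_ATTR_SKIP_CPU_SYNC (1ul << 5)          /* made-up bit value */

/* Call counters so the control flow can be checked afterwards. */
int sync_calls, unmap_calls, free_calls;
size_t last_sync_len, last_unmap_len;

/* Hypothetical stubs: they only record that they were called. */
static phys_addr_t fake_iova_to_phys(dma_addr_t handle)
{
	return handle + 0x10000000u;	/* arbitrary fake translation */
}

static void fake_sync_dma_for_cpu(phys_addr_t phys, size_t len, bool coherent)
{
	(void)phys;
	(void)coherent;
	sync_calls++;
	last_sync_len = len;
}

static void fake_iommu_unmap(dma_addr_t iova, size_t len)
{
	(void)iova;
	unmap_calls++;
	last_unmap_len = len;
}

static void fake_free_iova(dma_addr_t iova, size_t len)
{
	(void)iova;
	(void)len;
	free_calls++;
}

/* Same shape as arm_iommu_unmap_page(), with the opening brace restored. */
void sketch_unmap_page(dma_addr_t handle, size_t size, unsigned long attrs,
		       bool coherent)
{
	dma_addr_t iova = handle & SKETCH_PAGE_MASK;
	size_t offset = handle & ~SKETCH_PAGE_MASK;
	size_t len = SKETCH_PAGE_ALIGN(size + offset);
	phys_addr_t phys;

	if (!iova)
		return;

	if (!(attrs & SKETCH_ATTR_SKIP_CPU_SYNC)) {	/* the missing '{' */
		phys = fake_iova_to_phys(handle);
		fake_sync_dma_for_cpu(phys, size, coherent);
	}

	/* Reached for every non-zero iova, whatever the attributes. */
	fake_iommu_unmap(iova, len);
	fake_free_iova(iova, len);
}
```

With the brace in place, the skip-sync attribute suppresses only the physical-address lookup and the CPU sync; the IOMMU unmap and the IOVA release still run for every non-zero `iova`, which is what the indentation in the patch suggests was meant.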
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
2023-03-27 12:13 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-07-03 7:54 ` Geert Uytterhoeven
-1 siblings, 0 replies; 456+ messages in thread
From: Geert Uytterhoeven @ 2023-07-03 7:54 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller,
Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar,
Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc,
linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa,
Emil Renner Berthing

Hi Arnd,

On Mon, Mar 27, 2023 at 2:16 PM Arnd Bergmann <arnd@kernel.org> wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The arm version of the arch_sync_dma_for_cpu() function annotates pages as
> PG_dcache_clean after a DMA, but no other architecture does this here. On
> ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense
> to use the same hook in order to have identical arch_sync_dma_for_cpu()
> semantics as all other architectures.
>
> Splitting this out has multiple effects:
>
> - for dma-direct, this now gets called after arch_sync_dma_for_cpu()
>   for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While
>   it would not be harmful to keep doing it for bidirectional mappings,
>   those are apparently not used in any callers that care about the flag.
>
> - Since arm has its own dma-iommu abstraction, this now also needs to
>   call the same function, so the calls are added there to mirror the
>   dma-direct version.
>
> - Like dma-direct, the dma-iommu version now marks the dcache clean
>   for both coherent and noncoherent devices after a DMA, but it only
>   does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL.
>
> [ HELP NEEDED: can anyone confirm that it is a correct assumption
>   on arm that a cache-coherent device writing to a page always results
>   in it being in a PG_dcache_clean state like on ia64, or can a device
>   write directly into the dcache?]
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Thanks for your patch, which is now commit 322dbe898f82fd8a
("ARM: dma-mapping: split out arch_dma_mark_clean() helper") in
esmil/jh7100-dmapool.

If CONFIG_ARM_DMA_USE_IOMMU=y, the build fails.

> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
>          return -EINVAL;
>  }
>
> +static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
> +                                       enum dma_data_direction dir,
> +                                       bool dma_coherent)
> +{
> +        if (!dma_coherent)
> +                arch_sync_dma_for_cpu(phys, s->length, dir);

s/s->length/len/

> +
> +        if (dir == DMA_FROM_DEVICE)
> +                arch_dma_mark_clean(phys, s->length);

Likewise.

> +}
> +
>  /**
>   * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
>   * @dev: valid struct device pointer
> @@ -1425,9 +1438,9 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
>          if (!iova)
>                  return;
>
> -        if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
> +        if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))

Missing opening curly brace.

>                  phys = iommu_iova_to_phys(mapping->domain, handle);
> -                arch_sync_dma_for_cpu(phys, size, dir);
> +                arm_iommu_sync_dma_for_cpu(phys, size, dir, dev->dma_coherent);
>          }
>
>          iommu_unmap(mapping->domain, iova, len);

With the above fixed, it builds and boots fine (on R-Car M2-W).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply [flat|nested] 456+ messages in thread
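Geert's `s/s->length/len/` fix makes the helper use its own length parameter. With that change, and going by the quoted commit message, the helper encodes two independent decisions: CPU cache maintenance runs only for non-coherent devices, and pages are marked clean only for `DMA_FROM_DEVICE`, for coherent and non-coherent devices alike but not for `DMA_BIDIRECTIONAL`. The userspace sketch below illustrates that rule under stated assumptions: the local `enum dma_data_direction` only mirrors the kernel names, `phys_addr_t` is redefined locally, and the `fake_*` stubs are hypothetical counters standing in for `arch_sync_dma_for_cpu()` and `arch_dma_mark_clean()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t phys_addr_t;	/* local stand-in for the kernel type */

/* Local re-declaration for the sketch; only the names mirror the kernel. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL,
	DMA_TO_DEVICE,
	DMA_FROM_DEVICE,
	DMA_NONE,
};

/* Hypothetical stubs that only record what would have happened. */
int cache_syncs, pages_marked_clean;
size_t last_len;

static void fake_arch_sync_dma_for_cpu(phys_addr_t phys, size_t len,
				       enum dma_data_direction dir)
{
	(void)phys;
	(void)dir;
	cache_syncs++;
	last_len = len;
}

static void fake_arch_dma_mark_clean(phys_addr_t phys, size_t len)
{
	(void)phys;
	pages_marked_clean++;
	last_len = len;
}

/* The helper with s->length replaced by its own len parameter. */
void sketch_sync_dma_for_cpu(phys_addr_t phys, size_t len,
			     enum dma_data_direction dir, bool dma_coherent)
{
	if (!dma_coherent)
		fake_arch_sync_dma_for_cpu(phys, len, dir);

	if (dir == DMA_FROM_DEVICE)
		fake_arch_dma_mark_clean(phys, len);
}
```

The coherent `DMA_FROM_DEVICE` case still marks the page clean here; whether that is correct on arm is exactly the open question in the patch's HELP NEEDED note, so the sketch reflects the patch's assumption rather than a confirmed rule.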
* Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
2023-07-03 7:54 ` Geert Uytterhoeven
` (3 preceding siblings ...)
(?)
@ 2023-07-06 14:11 ` Christoph Hellwig
-1 siblings, 0 replies; 456+ messages in thread
From: Christoph Hellwig @ 2023-07-06 14:11 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Arnd Bergmann, linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren,
Brian Cain, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen,
Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy,
Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz,
David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy,
Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel,
linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh,
sparclinux, linux-xtensa, Emil Renner Berthing

> Thanks for your patch, which is now commit 322dbe898f82fd8a
> ("ARM: dma-mapping: split out arch_dma_mark_clean() helper") in
> esmil/jh7100-dmapool.

Well, something is wrong with that branch then, and this series still
needs more work, and should eventually be merged through the dma-mapping
tree.

^ permalink raw reply [flat|nested] 456+ messages in thread
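The shared dispatch that PATCH 21/21 (next in this thread) moves into include/linux/dma-sync.h can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the header itself, whose body is not shown here: `struct dma_sync_policy`, `model_sync_for_device()` and `model_sync_for_cpu()` are hypothetical names, the stubs only record which primitive ran, and the switch logic is inferred from the commit message and the removed ARM and MIPS code (a bidirectional buffer only needs a clean beforehand when the CPU invalidates again afterwards, otherwise clean-and-invalidate). The Kconfig-controlled PG_dcache_clean tracking and the CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU guard are left out.

```c
/* Hypothetical user-space model: kernel cache helpers are stubs that
 * only record which operation ran. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long phys_addr_t;	/* stand-in for the kernel type */

/* Same values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* The two per-architecture policy answers, modeled as plain fields. */
struct dma_sync_policy {
	bool clean_before_fromdevice;	/* clean, not invalidate, before FROM_DEVICE */
	bool cpu_needs_post_dma_flush;	/* CPU may prefetch speculatively */
};

/* Comma-separated record of the cache primitives that were called. */
static char dma_oplog[128];

static void reset_log(void)
{
	dma_oplog[0] = '\0';
}

static void record(const char *op)
{
	if (dma_oplog[0] != '\0')
		strcat(dma_oplog, ",");
	strcat(dma_oplog, op);
}

/* Stubs for the three per-architecture primitives named in the patch. */
static void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
{
	(void)paddr;
	(void)size;
	record("wback");
}

static void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
{
	(void)paddr;
	(void)size;
	record("inv");
}

static void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
{
	(void)paddr;
	(void)size;
	record("wback_inv");
}

/* Cache maintenance before the device accesses the buffer. */
static void model_sync_for_device(const struct dma_sync_policy *p,
				  phys_addr_t paddr, size_t size,
				  enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		arch_dma_cache_wback(paddr, size);
		break;
	case DMA_FROM_DEVICE:
		if (!p->clean_before_fromdevice) {
			arch_dma_cache_inv(paddr, size);
			break;
		}
		/* fall through */
	case DMA_BIDIRECTIONAL:
		/* A clean is enough if the CPU side invalidates afterwards. */
		if (p->cpu_needs_post_dma_flush)
			arch_dma_cache_wback(paddr, size);
		else
			arch_dma_cache_wback_inv(paddr, size);
		break;
	default:
		break;
	}
}

/* Cache maintenance after the device is done, before the CPU reads. */
static void model_sync_for_cpu(const struct dma_sync_policy *p,
			       phys_addr_t paddr, size_t size,
			       enum dma_data_direction dir)
{
	if (dir != DMA_FROM_DEVICE && dir != DMA_BIDIRECTIONAL)
		return;
	/* Drop lines that speculative prefetch may have pulled in meanwhile. */
	if (p->cpu_needs_post_dma_flush)
		arch_dma_cache_inv(paddr, size);
}
```

The flag pairs returned in the diff map directly onto the policy fields: m68k and sh return false/false (all maintenance before the transfer), arc returns false/true (invalidate before and after a FROM_DEVICE transfer), and arm64 and riscv return true/true (clean before, invalidate after).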
* [PATCH 21/21] dma-mapping: replace custom code with generic implementation
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
@ 2023-03-27 12:13 ` Arnd Bergmann
0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

From: Arnd Bergmann <arnd@arndb.de>

Now that all of these have consistent behavior, replace them with a
single shared implementation of arch_sync_dma_for_device() and
arch_sync_dma_for_cpu(), with three parameters that select how they
operate:

- If the CPU has speculative prefetching, then the cache has to be
  invalidated after a transfer from the device. On the rarer CPUs
  without prefetching, this can be skipped, with all cache management
  happening before the transfer. This flag can be detected at runtime,
  but is usually fixed per architecture.

- Some architectures currently clean the caches before DMA from a
  device, while others invalidate them. There has not been a conclusion
  regarding whether we should change all architectures to use clean
  instead, so this adds an architecture-specific flag that we can
  change later on.

- On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps track of
  pages that are marked clean in the page cache, to avoid flushing them
  again. The implementation for this is generic enough to work on all
  architectures that use the PG_dcache_clean page flag, but a Kconfig
  symbol is used to only enable it on Arm to preserve the existing
  behavior.

For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
over 'flush', to avoid any ambiguity about what the helper functions are
supposed to do.

Moving the global functions into a header file is usually a bad idea
as it prevents the header from being included more than once, but it
helps keep the behavior as close as possible to the previous state,
including the possibility of inlining most of it into these functions
where that was done before. This also helps keep the global namespace
clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
device drivers that might use them incorrectly.

It would be possible to do this one architecture at a time, but as the
change is the same everywhere, the combined patch helps explain it
better all at once.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arc/mm/dma.c                 |  66 +++++-------------
 arch/arm/Kconfig                  |   3 +
 arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
 arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
 arch/arm64/mm/dma-mapping.c       |  28 +++++---
 arch/csky/mm/dma-mapping.c        |  44 ++++++------
 arch/hexagon/kernel/dma.c         |  44 ++++++------
 arch/m68k/kernel/dma.c            |  43 +++++++-----
 arch/microblaze/kernel/dma.c      |  48 +++++++-------
 arch/mips/mm/dma-noncoherent.c    |  60 +++++++----------
 arch/nios2/mm/dma-mapping.c       |  57 +++++++---------
 arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
 arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
 arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
 arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
 arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
 arch/sparc/kernel/ioport.c        |  38 ++++++++---
 arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
 include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
 19 files changed, 527 insertions(+), 391 deletions(-)
 create mode 100644
include/linux/dma-sync.h diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c index ddb96786f765..61cd01646222 100644 --- a/arch/arc/mm/dma.c +++ b/arch/arc/mm/dma.c @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size) dma_cache_wback_inv(page_to_phys(page), size); } -/* - * Cache operations depending on function and direction argument, inspired by - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] - * dma-mapping: provide a generic dma-noncoherent implementation)" - * - * | map == for_device | unmap == for_cpu - * |---------------------------------------------------------------- - * TO_DEV | writeback writeback | none none - * FROM_DEV | invalidate invalidate | invalidate* invalidate* - * BIDIR | writeback writeback | invalidate invalidate - * - * [*] needed for CPU speculative prefetches - * - * NOTE: we don't check the validity of direction argument as it is done in - * upper layer functions (in include/linux/dma-mapping.h) - */ - -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback(paddr, size); - break; - - case DMA_FROM_DEVICE: - dma_cache_inv(paddr, size); - break; - - case DMA_BIDIRECTIONAL: - dma_cache_wback(paddr, size); - break; + dma_cache_wback(paddr, size); +} - default: - break; - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_inv(paddr, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; + dma_cache_wback_inv(paddr, size); +} - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - 
dma_cache_inv(paddr, size); - break; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - default: - break; - } +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + /* * Plug in direct dma map ops. */ diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 125d58c54ab1..0de84e861027 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT bool default y +config ARCH_DMA_MARK_DCACHE_CLEAN + def_bool y + config ARCH_HAS_ILOG2_U32 bool diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c index 12b5c6ae93fc..0817274aed15 100644 --- a/arch/arm/mm/dma-mapping-nommu.c +++ b/arch/arm/mm/dma-mapping-nommu.c @@ -13,27 +13,36 @@ #include "dma.h" -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir == DMA_FROM_DEVICE) { - dmac_inv_range(__va(paddr), __va(paddr + size)); - outer_inv_range(paddr, paddr + size); - } else { - dmac_clean_range(__va(paddr), __va(paddr + size)); - outer_clean_range(paddr, paddr + size); - } + dmac_clean_range(__va(paddr), __va(paddr + size)); + outer_clean_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE) { - outer_inv_range(paddr, paddr + size); - dmac_inv_range(__va(paddr), __va(paddr)); - } + dmac_inv_range(__va(paddr), __va(paddr + size)); + outer_inv_range(paddr, paddr + size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dmac_flush_range(__va(paddr), __va(paddr + size)); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool 
arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, const struct iommu_ops *iommu, bool coherent) { diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index b703cb83d27e..aa6ee820a0ab 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size) } } +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_clean_range); + outer_clean_range(paddr, paddr + size); +} + + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_inv_range); + outer_inv_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_flush_range); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + static bool arch_sync_dma_cpu_needs_post_dma_flush(void) { if (IS_ENABLED(CONFIG_CPU_V6) || @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void) return false; } -/* - * Make an area consistent for devices. - * Note: Drivers should NOT use this function directly. 
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*) - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - dma_cache_maint(paddr, size, dmac_inv_range); - outer_inv_range(paddr, paddr + size); - break; - case DMA_BIDIRECTIONAL: - if (arch_sync_dma_cpu_needs_post_dma_flush()) { - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - } else { - dma_cache_maint(paddr, size, dmac_flush_range); - outer_flush_range(paddr, paddr + size); - } - break; - default: - break; - } -} - -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) { - outer_inv_range(paddr, paddr + size); - dma_cache_maint(paddr, size, dmac_inv_range); - } -} +#include <linux/dma-sync.h> #ifdef CONFIG_ARM_DMA_USE_IOMMU diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 5240f6acad64..bae741aa65e9 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -13,25 +13,33 @@ #include <asm/cacheflush.h> #include <asm/xen/xen-ops.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + dcache_clean_poc(paddr, paddr + size); +} - dcache_clean_poc(start, start + size); +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dcache_inval_poc(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + 
dcache_clean_inval_poc(paddr, paddr + size); +} - if (dir == DMA_TO_DEVICE) - return; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} - dcache_inval_poc(start, start + size); +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c index c90f912e2822..9402e101b363 100644 --- a/arch/csky/mm/dma-mapping.c +++ b/arch/csky/mm/dma-mapping.c @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size) cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_wb_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_wb_range); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - return; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_inv_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_inv_range); } + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_op(paddr, size, dma_wbinv_range); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index 882680e81a30..e6538128a75b 100644 --- a/arch/hexagon/kernel/dma.c +++ 
b/arch/hexagon/kernel/dma.c @@ -9,29 +9,33 @@ #include <linux/memblock.h> #include <asm/page.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - void *addr = phys_to_virt(paddr); - - switch (dir) { - case DMA_TO_DEVICE: - hexagon_clean_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_FROM_DEVICE: - hexagon_inv_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_BIDIRECTIONAL: - flush_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - default: - BUG(); - } + hexagon_clean_dcache_range(paddr, paddr + size); } +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) +{ + hexagon_inv_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size) +{ + hexagon_flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + /* * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create * DMA coherent space. Use that for the pool. 
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index 2e192a5df949..aa9b434e6df8 100644 --- a/arch/m68k/kernel/dma.c +++ b/arch/m68k/kernel/dma.c @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */ -void arch_sync_dma_for_device(phys_addr_t handle, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_BIDIRECTIONAL: - case DMA_TO_DEVICE: - cache_push(handle, size); - break; - case DMA_FROM_DEVICE: - cache_clear(handle, size); - break; - default: - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n", - dir); - break; - } + /* + * cache_push() always invalidates in addition to cleaning + * write-back caches. + */ + cache_push(paddr, size); +} + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + cache_clear(paddr, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_push(paddr, size); } + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c index b4c4e45fd45e..01110d4aa5b0 100644 --- a/arch/microblaze/kernel/dma.c +++ b/arch/microblaze/kernel/dma.c @@ -14,32 +14,30 @@ #include <linux/bug.h> #include <asm/cacheflush.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_TO_DEVICE: - case DMA_BIDIRECTIONAL: - flush_dcache_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - invalidate_dcache_range(paddr, paddr + size); - break; - default: - BUG(); - } + /* writeback plus invalidate, could be a nop on WT caches */ + 
flush_dcache_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_TO_DEVICE: - break; - case DMA_BIDIRECTIONAL: - case DMA_FROM_DEVICE: - invalidate_dcache_range(paddr, paddr + size); - break; - default: - BUG(); - }} + invalidate_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index b9d68bcc5d53..902d4b7c1f85 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, } while (left); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_sync_phys(paddr, size, _dma_cache_wback); - break; - case DMA_FROM_DEVICE: - dma_sync_phys(paddr, size, _dma_cache_inv); - break; - case DMA_BIDIRECTIONAL: - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && - cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, _dma_cache_wback); - else - dma_sync_phys(paddr, size, _dma_cache_wback_inv); - break; - default: - break; - } + dma_sync_phys(paddr, size, _dma_cache_wback); } -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - 
case DMA_BIDIRECTIONAL: - if (cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, _dma_cache_inv); - break; - default: - break; - } + dma_sync_phys(paddr, size, _dma_cache_inv); } -#endif + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_sync_phys(paddr, size, _dma_cache_wback_inv); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + cpu_needs_post_dma_flush(); +} + +#include <linux/dma-sync.h> #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, - const struct iommu_ops *iommu, bool coherent) + const struct iommu_ops *iommu, bool coherent) { - dev->dma_coherent = coherent; + dev->dma_coherent = coherent; } #endif diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index fd887d5f3f9a..29978970955e 100644 --- a/arch/nios2/mm/dma-mapping.c +++ b/arch/nios2/mm/dma-mapping.c @@ -13,53 +13,46 @@ #include <linux/types.h> #include <linux/mm.h> #include <linux/string.h> +#include <linux/dma-map-ops.h> #include <linux/dma-mapping.h> #include <linux/io.h> #include <linux/cache.h> #include <asm/cacheflush.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { + /* + * We just need to write back the caches here, but Nios2 flush + * instruction will do both writeback and invalidate. + */ void *vaddr = phys_to_virt(paddr); + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); +} - switch (dir) { - case DMA_FROM_DEVICE: - invalidate_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - case DMA_TO_DEVICE: - /* - * We just need to flush the caches here , but Nios2 flush - * instruction will do both writeback and invalidate. 
- */ - case DMA_BIDIRECTIONAL: /* flush and invalidate */ - flush_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - default: - BUG(); - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + unsigned long vaddr = (unsigned long)phys_to_virt(paddr); + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size)); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - switch (dir) { - case DMA_BIDIRECTIONAL: - case DMA_FROM_DEVICE: - invalidate_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - case DMA_TO_DEVICE: - break; - default: - BUG(); - } +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index 91a00d09ffad..aba2258e62eb 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size) mmap_write_unlock(&init_mm); } -void arch_sync_dma_for_device(phys_addr_t addr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { unsigned long cl; struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; - switch (dir) { - case DMA_TO_DEVICE: - /* Write back the dcache for the requested range */ - for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBWR, cl); - break; - case DMA_FROM_DEVICE: - /* Invalidate the dcache for the requested range */ 
- for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBIR, cl); - break; - case DMA_BIDIRECTIONAL: - /* Flush the dcache for the requested range */ - for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBFR, cl); - break; - default: - break; - } + /* Write back the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBWR, cl); } + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + unsigned long cl; + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + + /* Invalidate the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBIR, cl); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + unsigned long cl; + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + + /* Flush the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBFR, cl); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 6d3d3cffb316..a7955aab8ce2 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, free_pages((unsigned long)__va(dma_handle), order); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { unsigned long virt = (unsigned long)phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - clean_kernel_dcache_range(virt, size); - break; - case DMA_FROM_DEVICE: - 
clean_kernel_dcache_range(virt, size); - break; - case DMA_BIDIRECTIONAL: - flush_kernel_dcache_range(virt, size); - break; - } + clean_kernel_dcache_range(virt, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { unsigned long virt = (unsigned long)phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - purge_kernel_dcache_range(virt, size); - break; - } + purge_kernel_dcache_range(virt, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + unsigned long virt = (unsigned long)phys_to_virt(paddr); + + flush_kernel_dcache_range(virt, size); } + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 00e59a4faa2b..268510c71156 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) #endif } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { __dma_phys_op(start, end, DMA_CACHE_CLEAN); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_NONE: - BUG(); - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - __dma_phys_op(start, end, DMA_CACHE_INVAL); - break; - } + __dma_phys_op(start, end, DMA_CACHE_INVAL); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + 
__dma_phys_op(start, end, DMA_CACHE_FLUSH); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long kaddr = (unsigned long)page_address(page); diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index 69c80b2155a1..b9a9f57e02be 100644 --- a/arch/riscv/mm/dma-noncoherent.c +++ b/arch/riscv/mm/dma-noncoherent.c @@ -12,43 +12,40 @@ static bool noncoherent_supported; -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - case DMA_FROM_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - case DMA_BIDIRECTIONAL: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - default: - break; - } + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); - break; - default: - break; - } + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + void *vaddr = phys_to_virt(paddr); + + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + 
+#include <linux/dma-sync.h> + + void arch_dma_prep_coherent(struct page *page, size_t size) { void *flush_addr = page_address(page); diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index 6a44c0e7ba40..41f031ae7609 100644 --- a/arch/sh/kernel/dma-coherent.c +++ b/arch/sh/kernel/dma-coherent.c @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size) __flush_purge_region(page_address(page), size); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); - switch (dir) { - case DMA_FROM_DEVICE: /* invalidate only */ - __flush_invalidate_region(addr, size); - break; - case DMA_TO_DEVICE: /* writeback only */ - __flush_wback_region(addr, size); - break; - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ - __flush_purge_region(addr, size); - break; - default: - BUG(); - } + __flush_wback_region(addr, size); } + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); + + __flush_invalidate_region(addr, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); + + __flush_purge_region(addr, size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index 4f3d26066ec2..6926ead2f208 100644 --- a/arch/sparc/kernel/ioport.c +++ b/arch/sparc/kernel/ioport.c @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport); #endif /* CONFIG_SBUS */ -/* - * IIep is write-through, not flushing on cpu to device transfer. 
- * - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to - * make DMA to cacheable memory coherent. - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE && - sparc_cpu_model == sparc_leon && + /* IIep is write-through, not flushing on cpu to device transfer. */ +} + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + /* + * On LEON systems without cache snooping, the entire D-CACHE must be + * flushed to make DMA to cacheable memory coherent. + */ + if (sparc_cpu_model == sparc_leon && !sparc_leon3_snooping_enabled()) leon_flush_dcache_all(); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + arch_dma_cache_inv(paddr, size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + #ifdef CONFIG_PROC_FS static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index ff3bf015eca4..d4ff96585545 100644 --- a/arch/xtensa/kernel/pci-dma.c +++ b/arch/xtensa/kernel/pci-dma.c @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, } } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - do_cache_op(paddr, size, __flush_dcache_range); - break; - case DMA_FROM_DEVICE: - do_cache_op(paddr, size, __invalidate_dcache_range); - break; - case DMA_BIDIRECTIONAL: - do_cache_op(paddr, size, __flush_invalidate_dcache_range); - break; - default: - break; - } + do_cache_op(paddr, size, __flush_dcache_range); } +static inline void 
arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	do_cache_op(paddr, size, __invalidate_dcache_range);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
+
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	__invalidate_dcache_range((unsigned long)page_address(page), size);
diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
new file mode 100644
index 000000000000..18e33d5e8eaf
--- /dev/null
+++ b/include/linux/dma-sync.h
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Cache operations depending on function and direction argument, inspired by
+ * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
+ * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
+ * dma-mapping: provide a generic dma-noncoherent implementation)"
+ *
+ *          |   map          ==  for_device     |   unmap     ==  for_cpu
+ *          |----------------------------------------------------------------
+ * TO_DEV   |   writeback        writeback      |   none          none
+ * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
+ * BIDIR    |   writeback        writeback      |   invalidate    invalidate
+ *
+ * [*] needed for CPU speculative prefetches
+ *
+ * NOTE: we don't check the validity of direction argument as it is done in
+ * upper layer functions (in include/linux/dma-mapping.h)
+ *
+ * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
+ * the respective high-level operations without having to expose the
+ * cache management ops to drivers.
+ */
+
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		/*
+		 * This may be an empty function on write-through caches,
+		 * and it might invalidate the cache if an architecture has
+		 * a write-back cache but no way to write it back without
+		 * invalidating
+		 */
+		arch_dma_cache_wback(paddr, size);
+		break;
+
+	case DMA_FROM_DEVICE:
+		/*
+		 * FIXME: this should be handled the same across all
+		 * architectures, see
+		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
+		 */
+		if (!arch_sync_dma_clean_before_fromdevice()) {
+			arch_dma_cache_inv(paddr, size);
+			break;
+		}
+		fallthrough;
+
+	case DMA_BIDIRECTIONAL:
+		/* Skip the invalidate here if it's done later */
+		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		    arch_sync_dma_cpu_needs_post_dma_flush())
+			arch_dma_cache_wback(paddr, size);
+		else
+			arch_dma_cache_wback_inv(paddr, size);
+		break;
+
+	default:
+		break;
+	}
+}
+
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
+/*
+ * Mark the D-cache clean for these pages to avoid extra flushing.
+ */
+static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
+{
+#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
+	unsigned long pfn = PFN_UP(paddr);
+	unsigned long off = paddr & (PAGE_SIZE - 1);
+	size_t left = size;
+
+	if (off)
+		left -= PAGE_SIZE - off;
+
+	while (left >= PAGE_SIZE) {
+		struct page *page = pfn_to_page(pfn++);
+		set_bit(PG_dcache_clean, &page->flags);
+		left -= PAGE_SIZE;
+	}
+#endif
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
+		if (arch_sync_dma_cpu_needs_post_dma_flush())
+			arch_dma_cache_inv(paddr, size);
+
+		if (size > PAGE_SIZE)
+			arch_dma_mark_dcache_clean(paddr, size);
+		break;
+
+	default:
+		break;
+	}
+}
+#endif
-- 
2.39.2
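As a rough illustration (not kernel code), the direction-dependent dispatch in the generic arch_sync_dma_for_device() and arch_sync_dma_for_cpu() from include/linux/dma-sync.h can be modelled in user space. In this sketch the two per-architecture hooks and IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) become plain booleans in a hypothetical `struct arch_model`, and the chosen cache operation is returned as an enum value instead of being executed. The hypothetical `pages_marked_clean()` replays the page arithmetic of arch_dma_mark_dcache_clean(), assuming 4 KiB pages. The DMA_* values mirror the kernel's enum dma_data_direction; every other name is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical: which arch_dma_cache_*() helper the generic code would call. */
enum cache_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* Hypothetical stand-ins for the per-architecture switches. */
struct arch_model {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;         /* IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) */
};

/* Mirrors the switch in arch_sync_dma_for_device(). */
static enum cache_op model_sync_for_device(const struct arch_model *a,
					   enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		return OP_WBACK;
	case DMA_FROM_DEVICE:
		if (!a->clean_before_fromdevice)
			return OP_INV;
		/* otherwise handled exactly like DMA_BIDIRECTIONAL */
		/* fall through */
	case DMA_BIDIRECTIONAL:
		/* Skip the invalidate if the for_cpu side does it later. */
		if (a->has_sync_for_cpu && a->cpu_needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	default:
		return OP_NONE;
	}
}

/* Mirrors arch_sync_dma_for_cpu(), which only exists with the Kconfig symbol. */
static enum cache_op model_sync_for_cpu(const struct arch_model *a,
					enum dma_data_direction dir)
{
	if (!a->has_sync_for_cpu)
		return OP_NONE;
	switch (dir) {
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		return a->cpu_needs_post_dma_flush ? OP_INV : OP_NONE;
	default:
		return OP_NONE;
	}
}

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Counts the pages arch_dma_mark_dcache_clean() would flag: a partial
 * first page is skipped, then whole pages are counted while at least one
 * full page remains. The caller only invokes it for size > PAGE_SIZE,
 * which keeps 'left' from underflowing.
 */
static unsigned long pages_marked_clean(unsigned long paddr, unsigned long size)
{
	unsigned long off = paddr & (MODEL_PAGE_SIZE - 1);
	unsigned long left = size;
	unsigned long n = 0;

	assert(size > MODEL_PAGE_SIZE);
	if (off)
		left -= MODEL_PAGE_SIZE - off;
	while (left >= MODEL_PAGE_SIZE) {
		n++;
		left -= MODEL_PAGE_SIZE;
	}
	return n;
}
```

With both hooks returning true (as in the riscv hunk) and the for_cpu side assumed to be built, a DMA_FROM_DEVICE buffer is only written back before the transfer and invalidated afterwards. With both hooks returning false (as in the sh hunk) and no for_cpu side assumed, all maintenance happens before the transfer: an invalidate for DMA_FROM_DEVICE and a writeback plus invalidate for DMA_BIDIRECTIONAL. A 3-page buffer starting half-way into a page gets only 2 pages marked clean, because both partial pages are skipped.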
* [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-27 12:13 ` Arnd Bergmann
From: Arnd Bergmann @ 2023-03-27 12:13 UTC
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
Now that all of these have consistent behavior, replace them with a
single shared implementation of arch_sync_dma_for_device() and
arch_sync_dma_for_cpu() and three parameters to pick how they should
operate:
- If the CPU has speculative prefetching, then the cache has to be
invalidated after a transfer from the device. On the rarer CPUs without
prefetching, this can be skipped, with all cache management happening
before the transfer. This flag can be detected at runtime, but is
usually fixed per architecture.
- Some architectures currently clean the caches before DMA from a
device, while others invalidate them. There has not been a conclusion
regarding whether we should change all architectures to use clean
instead, so this adds an architecture-specific flag that we can change
later on.
- On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps track of
pages that are marked clean in the page cache, to avoid flushing them
again.
The implementation for this is generic enough to work on all
architectures that use the PG_dcache_clean page flag, but a Kconfig
symbol is used to only enable it on Arm to preserve the existing
behavior.
For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
over 'flush', to avoid any ambiguity about what the helper functions are
supposed to do.
Moving the global functions into a header file is usually a bad idea,
as it prevents the header from being included more than once, but it
helps keep the behavior as close as possible to the previous state,
including the possibility of inlining most of it into these functions
where that was done before. This also helps keep the global namespace
clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
device drivers that might use them incorrectly.
It would be possible to do this one architecture at a time, but as the
change is the same everywhere, a single combined patch explains it
better.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
arch/arc/mm/dma.c | 66 +++++-------------
arch/arm/Kconfig | 3 +
arch/arm/mm/dma-mapping-nommu.c | 39 ++++++-----
arch/arm/mm/dma-mapping.c | 64 +++++++-----------
arch/arm64/mm/dma-mapping.c | 28 +++++---
arch/csky/mm/dma-mapping.c | 44 ++++++------
arch/hexagon/kernel/dma.c | 44 ++++++------
arch/m68k/kernel/dma.c | 43 +++++++-----
arch/microblaze/kernel/dma.c | 48 +++++++-------
arch/mips/mm/dma-noncoherent.c | 60 +++++++----------
arch/nios2/mm/dma-mapping.c | 57 +++++++---------
arch/openrisc/kernel/dma.c | 63 +++++++++++-------
arch/parisc/kernel/pci-dma.c | 46 ++++++-------
arch/powerpc/mm/dma-noncoherent.c | 34 ++++++----
arch/riscv/mm/dma-noncoherent.c | 51 +++++++-------
arch/sh/kernel/dma-coherent.c | 43 +++++++-----
arch/sparc/kernel/ioport.c | 38 ++++++++---
arch/xtensa/kernel/pci-dma.c | 40 ++++++-----
include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++
19 files changed, 527 insertions(+), 391 deletions(-)
create mode 100644
include/linux/dma-sync.h diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c index ddb96786f765..61cd01646222 100644 --- a/arch/arc/mm/dma.c +++ b/arch/arc/mm/dma.c @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size) dma_cache_wback_inv(page_to_phys(page), size); } -/* - * Cache operations depending on function and direction argument, inspired by - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] - * dma-mapping: provide a generic dma-noncoherent implementation)" - * - * | map == for_device | unmap == for_cpu - * |---------------------------------------------------------------- - * TO_DEV | writeback writeback | none none - * FROM_DEV | invalidate invalidate | invalidate* invalidate* - * BIDIR | writeback writeback | invalidate invalidate - * - * [*] needed for CPU speculative prefetches - * - * NOTE: we don't check the validity of direction argument as it is done in - * upper layer functions (in include/linux/dma-mapping.h) - */ - -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback(paddr, size); - break; - - case DMA_FROM_DEVICE: - dma_cache_inv(paddr, size); - break; - - case DMA_BIDIRECTIONAL: - dma_cache_wback(paddr, size); - break; + dma_cache_wback(paddr, size); +} - default: - break; - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_inv(paddr, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; + dma_cache_wback_inv(paddr, size); +} - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - 
dma_cache_inv(paddr, size); - break; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - default: - break; - } +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + /* * Plug in direct dma map ops. */ diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 125d58c54ab1..0de84e861027 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT bool default y +config ARCH_DMA_MARK_DCACHE_CLEAN + def_bool y + config ARCH_HAS_ILOG2_U32 bool diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c index 12b5c6ae93fc..0817274aed15 100644 --- a/arch/arm/mm/dma-mapping-nommu.c +++ b/arch/arm/mm/dma-mapping-nommu.c @@ -13,27 +13,36 @@ #include "dma.h" -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir == DMA_FROM_DEVICE) { - dmac_inv_range(__va(paddr), __va(paddr + size)); - outer_inv_range(paddr, paddr + size); - } else { - dmac_clean_range(__va(paddr), __va(paddr + size)); - outer_clean_range(paddr, paddr + size); - } + dmac_clean_range(__va(paddr), __va(paddr + size)); + outer_clean_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE) { - outer_inv_range(paddr, paddr + size); - dmac_inv_range(__va(paddr), __va(paddr)); - } + dmac_inv_range(__va(paddr), __va(paddr + size)); + outer_inv_range(paddr, paddr + size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dmac_flush_range(__va(paddr), __va(paddr + size)); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool 
arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, const struct iommu_ops *iommu, bool coherent) { diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index b703cb83d27e..aa6ee820a0ab 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size) } } +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_clean_range); + outer_clean_range(paddr, paddr + size); +} + + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_inv_range); + outer_inv_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_flush_range); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + static bool arch_sync_dma_cpu_needs_post_dma_flush(void) { if (IS_ENABLED(CONFIG_CPU_V6) || @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void) return false; } -/* - * Make an area consistent for devices. - * Note: Drivers should NOT use this function directly. 
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*) - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - dma_cache_maint(paddr, size, dmac_inv_range); - outer_inv_range(paddr, paddr + size); - break; - case DMA_BIDIRECTIONAL: - if (arch_sync_dma_cpu_needs_post_dma_flush()) { - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - } else { - dma_cache_maint(paddr, size, dmac_flush_range); - outer_flush_range(paddr, paddr + size); - } - break; - default: - break; - } -} - -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) { - outer_inv_range(paddr, paddr + size); - dma_cache_maint(paddr, size, dmac_inv_range); - } -} +#include <linux/dma-sync.h> #ifdef CONFIG_ARM_DMA_USE_IOMMU diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 5240f6acad64..bae741aa65e9 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -13,25 +13,33 @@ #include <asm/cacheflush.h> #include <asm/xen/xen-ops.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + dcache_clean_poc(paddr, paddr + size); +} - dcache_clean_poc(start, start + size); +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dcache_inval_poc(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + 
dcache_clean_inval_poc(paddr, paddr + size); +} - if (dir == DMA_TO_DEVICE) - return; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} - dcache_inval_poc(start, start + size); +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c index c90f912e2822..9402e101b363 100644 --- a/arch/csky/mm/dma-mapping.c +++ b/arch/csky/mm/dma-mapping.c @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size) cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_wb_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_wb_range); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - return; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_inv_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_inv_range); } + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_op(paddr, size, dma_wbinv_range); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index 882680e81a30..e6538128a75b 100644 --- a/arch/hexagon/kernel/dma.c +++ 
b/arch/hexagon/kernel/dma.c @@ -9,29 +9,33 @@ #include <linux/memblock.h> #include <asm/page.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - void *addr = phys_to_virt(paddr); - - switch (dir) { - case DMA_TO_DEVICE: - hexagon_clean_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_FROM_DEVICE: - hexagon_inv_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_BIDIRECTIONAL: - flush_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - default: - BUG(); - } + hexagon_clean_dcache_range(paddr, paddr + size); } +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) +{ + hexagon_inv_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size) +{ + hexagon_flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + /* * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create * DMA coherent space. Use that for the pool. 
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index 2e192a5df949..aa9b434e6df8 100644 --- a/arch/m68k/kernel/dma.c +++ b/arch/m68k/kernel/dma.c @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */ -void arch_sync_dma_for_device(phys_addr_t handle, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_BIDIRECTIONAL: - case DMA_TO_DEVICE: - cache_push(handle, size); - break; - case DMA_FROM_DEVICE: - cache_clear(handle, size); - break; - default: - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n", - dir); - break; - } + /* + * cache_push() always invalidates in addition to cleaning + * write-back caches. + */ + cache_push(paddr, size); +} + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + cache_clear(paddr, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_push(paddr, size); } + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c index b4c4e45fd45e..01110d4aa5b0 100644 --- a/arch/microblaze/kernel/dma.c +++ b/arch/microblaze/kernel/dma.c @@ -14,32 +14,30 @@ #include <linux/bug.h> #include <asm/cacheflush.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_TO_DEVICE: - case DMA_BIDIRECTIONAL: - flush_dcache_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - invalidate_dcache_range(paddr, paddr + size); - break; - default: - BUG(); - } + /* writeback plus invalidate, could be a nop on WT caches */ + 
flush_dcache_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_TO_DEVICE: - break; - case DMA_BIDIRECTIONAL: - case DMA_FROM_DEVICE: - invalidate_dcache_range(paddr, paddr + size); - break; - default: - BUG(); - }} + invalidate_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index b9d68bcc5d53..902d4b7c1f85 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, } while (left); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_sync_phys(paddr, size, _dma_cache_wback); - break; - case DMA_FROM_DEVICE: - dma_sync_phys(paddr, size, _dma_cache_inv); - break; - case DMA_BIDIRECTIONAL: - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && - cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, _dma_cache_wback); - else - dma_sync_phys(paddr, size, _dma_cache_wback_inv); - break; - default: - break; - } + dma_sync_phys(paddr, size, _dma_cache_wback); } -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - 
case DMA_BIDIRECTIONAL: - if (cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, _dma_cache_inv); - break; - default: - break; - } + dma_sync_phys(paddr, size, _dma_cache_inv); } -#endif + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_sync_phys(paddr, size, _dma_cache_wback_inv); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + cpu_needs_post_dma_flush(); +} + +#include <linux/dma-sync.h> #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, - const struct iommu_ops *iommu, bool coherent) + const struct iommu_ops *iommu, bool coherent) { - dev->dma_coherent = coherent; + dev->dma_coherent = coherent; } #endif diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index fd887d5f3f9a..29978970955e 100644 --- a/arch/nios2/mm/dma-mapping.c +++ b/arch/nios2/mm/dma-mapping.c @@ -13,53 +13,46 @@ #include <linux/types.h> #include <linux/mm.h> #include <linux/string.h> +#include <linux/dma-map-ops.h> #include <linux/dma-mapping.h> #include <linux/io.h> #include <linux/cache.h> #include <asm/cacheflush.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { + /* + * We just need to write back the caches here, but Nios2 flush + * instruction will do both writeback and invalidate. + */ void *vaddr = phys_to_virt(paddr); + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); +} - switch (dir) { - case DMA_FROM_DEVICE: - invalidate_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - case DMA_TO_DEVICE: - /* - * We just need to flush the caches here , but Nios2 flush - * instruction will do both writeback and invalidate. 
- */ - case DMA_BIDIRECTIONAL: /* flush and invalidate */ - flush_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - default: - BUG(); - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + unsigned long vaddr = (unsigned long)phys_to_virt(paddr); + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size)); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - switch (dir) { - case DMA_BIDIRECTIONAL: - case DMA_FROM_DEVICE: - invalidate_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - case DMA_TO_DEVICE: - break; - default: - BUG(); - } +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index 91a00d09ffad..aba2258e62eb 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size) mmap_write_unlock(&init_mm); } -void arch_sync_dma_for_device(phys_addr_t addr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { unsigned long cl; struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; - switch (dir) { - case DMA_TO_DEVICE: - /* Write back the dcache for the requested range */ - for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBWR, cl); - break; - case DMA_FROM_DEVICE: - /* Invalidate the dcache for the requested range */ 
- for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBIR, cl); - break; - case DMA_BIDIRECTIONAL: - /* Flush the dcache for the requested range */ - for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBFR, cl); - break; - default: - break; - } + /* Write back the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBWR, cl); } + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + unsigned long cl; + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + + /* Invalidate the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBIR, cl); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + unsigned long cl; + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + + /* Flush the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBFR, cl); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 6d3d3cffb316..a7955aab8ce2 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, free_pages((unsigned long)__va(dma_handle), order); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { unsigned long virt = (unsigned long)phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - clean_kernel_dcache_range(virt, size); - break; - case DMA_FROM_DEVICE: - 
clean_kernel_dcache_range(virt, size); - break; - case DMA_BIDIRECTIONAL: - flush_kernel_dcache_range(virt, size); - break; - } + clean_kernel_dcache_range(virt, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { unsigned long virt = (unsigned long)phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - purge_kernel_dcache_range(virt, size); - break; - } + purge_kernel_dcache_range(virt, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + unsigned long virt = (unsigned long)phys_to_virt(paddr); + + flush_kernel_dcache_range(virt, size); } + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 00e59a4faa2b..268510c71156 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) #endif } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { __dma_phys_op(start, end, DMA_CACHE_CLEAN); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_NONE: - BUG(); - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - __dma_phys_op(start, end, DMA_CACHE_INVAL); - break; - } + __dma_phys_op(start, end, DMA_CACHE_INVAL); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + 
__dma_phys_op(start, end, DMA_CACHE_FLUSH); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long kaddr = (unsigned long)page_address(page); diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index 69c80b2155a1..b9a9f57e02be 100644 --- a/arch/riscv/mm/dma-noncoherent.c +++ b/arch/riscv/mm/dma-noncoherent.c @@ -12,43 +12,40 @@ static bool noncoherent_supported; -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - case DMA_FROM_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - case DMA_BIDIRECTIONAL: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - default: - break; - } + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); - break; - default: - break; - } + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + void *vaddr = phys_to_virt(paddr); + + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + 
+#include <linux/dma-sync.h> + + void arch_dma_prep_coherent(struct page *page, size_t size) { void *flush_addr = page_address(page); diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index 6a44c0e7ba40..41f031ae7609 100644 --- a/arch/sh/kernel/dma-coherent.c +++ b/arch/sh/kernel/dma-coherent.c @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size) __flush_purge_region(page_address(page), size); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); - switch (dir) { - case DMA_FROM_DEVICE: /* invalidate only */ - __flush_invalidate_region(addr, size); - break; - case DMA_TO_DEVICE: /* writeback only */ - __flush_wback_region(addr, size); - break; - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ - __flush_purge_region(addr, size); - break; - default: - BUG(); - } + __flush_wback_region(addr, size); } + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); + + __flush_invalidate_region(addr, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); + + __flush_purge_region(addr, size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index 4f3d26066ec2..6926ead2f208 100644 --- a/arch/sparc/kernel/ioport.c +++ b/arch/sparc/kernel/ioport.c @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport); #endif /* CONFIG_SBUS */ -/* - * IIep is write-through, not flushing on cpu to device transfer. 
- * - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to - * make DMA to cacheable memory coherent. - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE && - sparc_cpu_model == sparc_leon && + /* IIep is write-through, not flushing on cpu to device transfer. */ +} + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + /* + * On LEON systems without cache snooping, the entire D-CACHE must be + * flushed to make DMA to cacheable memory coherent. + */ + if (sparc_cpu_model == sparc_leon && !sparc_leon3_snooping_enabled()) leon_flush_dcache_all(); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + arch_dma_cache_inv(paddr, size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + #ifdef CONFIG_PROC_FS static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index ff3bf015eca4..d4ff96585545 100644 --- a/arch/xtensa/kernel/pci-dma.c +++ b/arch/xtensa/kernel/pci-dma.c @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, } } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - do_cache_op(paddr, size, __flush_dcache_range); - break; - case DMA_FROM_DEVICE: - do_cache_op(paddr, size, __invalidate_dcache_range); - break; - case DMA_BIDIRECTIONAL: - do_cache_op(paddr, size, __flush_invalidate_dcache_range); - break; - default: - break; - } + do_cache_op(paddr, size, __flush_dcache_range); } +static inline void 
arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + do_cache_op(paddr, size, __invalidate_dcache_range); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + do_cache_op(paddr, size, __flush_invalidate_dcache_range); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + + void arch_dma_prep_coherent(struct page *page, size_t size) { __invalidate_dcache_range((unsigned long)page_address(page), size); diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h new file mode 100644 index 000000000000..18e33d5e8eaf --- /dev/null +++ b/include/linux/dma-sync.h @@ -0,0 +1,107 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Cache operations depending on function and direction argument, inspired by + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] + * dma-mapping: provide a generic dma-noncoherent implementation)" + * + * | map == for_device | unmap == for_cpu + * |---------------------------------------------------------------- + * TO_DEV | writeback writeback | none none + * FROM_DEV | invalidate invalidate | invalidate* invalidate* + * BIDIR | writeback writeback | invalidate invalidate + * + * [*] needed for CPU speculative prefetches + * + * NOTE: we don't check the validity of direction argument as it is done in + * upper layer functions (in include/linux/dma-mapping.h) + * + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide + * the respective high-level operations without having to expose the + * cache management ops to drivers. 
+ */ + +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) +{ + switch (dir) { + case DMA_TO_DEVICE: + /* + * This may be an empty function on write-through caches, + * and it might invalidate the cache if an architecture has + * a write-back cache but no way to write it back without + * invalidating + */ + arch_dma_cache_wback(paddr, size); + break; + + case DMA_FROM_DEVICE: + /* + * FIXME: this should be handled the same across all + * architectures, see + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ + */ + if (!arch_sync_dma_clean_before_fromdevice()) { + arch_dma_cache_inv(paddr, size); + break; + } + fallthrough; + + case DMA_BIDIRECTIONAL: + /* Skip the invalidate here if it's done later */ + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + arch_sync_dma_cpu_needs_post_dma_flush()) + arch_dma_cache_wback(paddr, size); + else + arch_dma_cache_wback_inv(paddr, size); + break; + + default: + break; + } +} + +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU +/* + * Mark the D-cache clean for these pages to avoid extra flushing. 
+ */ +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size) +{ +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN + unsigned long pfn = PFN_UP(paddr); + unsigned long off = paddr & (PAGE_SIZE - 1); + size_t left = size; + + if (off) + left -= PAGE_SIZE - off; + + while (left >= PAGE_SIZE) { + struct page *page = pfn_to_page(pfn++); + set_bit(PG_dcache_clean, &page->flags); + left -= PAGE_SIZE; + } +#endif +} + +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) +{ + switch (dir) { + case DMA_TO_DEVICE: + break; + + case DMA_FROM_DEVICE: + case DMA_BIDIRECTIONAL: + /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ + if (arch_sync_dma_cpu_needs_post_dma_flush()) + arch_dma_cache_inv(paddr, size); + + if (size > PAGE_SIZE) + arch_dma_mark_dcache_clean(paddr, size); + break; + + default: + break; + } +} +#endif -- 2.39.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
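The direction/operation table in the dma-sync.h header comment, together with the two per-architecture hooks, determines which cache operation each sync call performs. As an illustration only, here is a userspace C model of that decision logic. The enum, struct and function names (`enum model_dir`, `struct arch_model`, `model_sync_for_device()`, `model_sync_for_cpu()`) are hypothetical stand-ins, and the `has_sync_for_cpu` field is an assumption that models whether `CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU` is set. The model only records operation names; it does not touch any cache and leaves out the PG_dcache_clean tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for enum dma_data_direction. */
enum model_dir { MODEL_BIDIRECTIONAL, MODEL_TO_DEVICE, MODEL_FROM_DEVICE };

/* Per-architecture knobs, modelled on the hooks each arch file defines. */
struct arch_model {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;         /* assumed: CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
};

/* Space-separated list of the cache operations a sync call would perform. */
struct op_log {
	char buf[64];
};

static void log_op(struct op_log *log, const char *op)
{
	size_t used = strlen(log->buf);

	/* Room for an optional separator, the name and the NUL byte. */
	assert(used + 1 + strlen(op) + 1 <= sizeof(log->buf));
	if (used)
		strcat(log->buf, " ");
	strcat(log->buf, op);
}

/* Mirrors the switch in the generic arch_sync_dma_for_device(). */
static void model_sync_for_device(const struct arch_model *a,
				  enum model_dir dir, struct op_log *log)
{
	switch (dir) {
	case MODEL_TO_DEVICE:
		log_op(log, "wback");
		break;
	case MODEL_FROM_DEVICE:
		if (!a->clean_before_fromdevice) {
			log_op(log, "inv");
			break;
		}
		/* fall through */
	case MODEL_BIDIRECTIONAL:
		/* Skip the invalidate here if for_cpu will do it later. */
		if (a->has_sync_for_cpu && a->cpu_needs_post_dma_flush)
			log_op(log, "wback");
		else
			log_op(log, "wback_inv");
		break;
	}
}

/* Mirrors the generic arch_sync_dma_for_cpu(), minus PG_dcache_clean. */
static void model_sync_for_cpu(const struct arch_model *a,
			       enum model_dir dir, struct op_log *log)
{
	/* Without the Kconfig symbol the generic helper is not built at all. */
	if (!a->has_sync_for_cpu || dir == MODEL_TO_DEVICE)
		return;
	if (a->cpu_needs_post_dma_flush)
		log_op(log, "inv");
}
```

For example, a model with `{ false, true, true }` (invalidate before DMA from the device, speculating CPU) logs `inv inv` for a DMA_FROM_DEVICE map/unmap pair, matching the FROM_DEV row of the table, while `{ false, false, false }` performs a single `wback_inv` before a bidirectional transfer and nothing afterwards.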
* [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC
To: linux-kernel
Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij,
John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren,
linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig,
Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta,
linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain,
Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne,
linux-arm-kernel, Neil Armstrong, Michal Simek, Thomas Bogendoerfer,
linux-parisc, linux-openrisc, linuxppc-dev, linux-mips, Dinh Nguyen,
Palmer Dabbelt, linux-hexagon, linux-oxnas, Robin Murphy,
David S. Miller
From: Arnd Bergmann <arnd@arndb.de>
Now that all of these have consistent behavior, replace them with
a single shared implementation of arch_sync_dma_for_device() and
arch_sync_dma_for_cpu(), with three parameters to pick how they should
operate:
- If the CPU has speculative prefetching, then the cache has to be
invalidated after a transfer from the device. On the rarer CPUs
without prefetching, this can be skipped, with all cache management
happening before the transfer. This flag can be detected at runtime,
but is usually fixed per architecture.
- Some architectures currently clean the caches before DMA from a
device, while others invalidate them. There is no conclusion yet on
whether all architectures should be changed to clean instead, so this
adds an architecture-specific flag that can be changed later on.
- On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps track of
pages that are marked clean in the page cache, to avoid flushing them
again. The implementation for this is generic enough to work on all
architectures that use the PG_dcache_clean page flag, but a Kconfig
symbol is used to only enable it on Arm to preserve the existing
behavior.
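The page tracking in the third point only marks pages that the buffer covers completely: a partially covered first or last page is never flagged PG_dcache_clean. A userspace sketch of just that arithmetic, assuming 4 KiB pages (the real page size is architecture specific); `SKETCH_PAGE_SHIFT`, `SKETCH_PFN_UP()` and `pages_marked_clean()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages for this sketch only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
/* Round an address up to the next page frame number, like PFN_UP(). */
#define SKETCH_PFN_UP(x)  (((x) + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)

/*
 * Return how many page frames, starting at *first, the generic
 * arch_dma_mark_dcache_clean() loop would flag: only pages fully
 * inside [paddr, paddr + size).
 */
static size_t pages_marked_clean(uint64_t paddr, size_t size, uint64_t *first)
{
	uint64_t off = paddr & (SKETCH_PAGE_SIZE - 1);
	uint64_t left = size;
	size_t count = 0;

	/* The generic code only calls the helper for size > PAGE_SIZE. */
	assert(size > SKETCH_PAGE_SIZE);

	*first = SKETCH_PFN_UP(paddr);

	/* Skip the partially covered head page, if any. */
	if (off)
		left -= SKETCH_PAGE_SIZE - off;

	/* Count whole pages; a partial tail page is never marked. */
	while (left >= SKETCH_PAGE_SIZE) {
		count++;
		left -= SKETCH_PAGE_SIZE;
	}
	return count;
}
```

With these assumptions, a 0x3000-byte buffer at 0x1800 spans parts of four pages but only frames 2 and 3 are marked, because the head page (frame 1) and the tail page (frame 4) are only partially covered.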
For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
over 'flush', to avoid any ambiguity about what the helper functions are
supposed to do.
Moving the global functions into a header file is usually a bad idea
as it prevents the header from being included more than once, but it
helps keep the behavior as close as possible to the previous state,
including the possibility of inlining most of it into these functions
where that was done before. This also helps keep the global namespace
clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
device drivers that might use them incorrectly.
It would be possible to do this one architecture at a time, but as the
change is the same everywhere, a single combined patch makes it easier
to explain it once.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
arch/arc/mm/dma.c | 66 +++++------------
arch/arm/Kconfig | 3 +
arch/arm/mm/dma-mapping-nommu.c | 39 ++++++-----
arch/arm/mm/dma-mapping.c | 64 +++++++-----------
arch/arm64/mm/dma-mapping.c | 28 +++++---
arch/csky/mm/dma-mapping.c | 44 ++++++------
arch/hexagon/kernel/dma.c | 44 ++++++------
arch/m68k/kernel/dma.c | 43 +++++++-----
arch/microblaze/kernel/dma.c | 48 +++++++-------
arch/mips/mm/dma-noncoherent.c | 60 +++++++----------
arch/nios2/mm/dma-mapping.c | 57 +++++++---------
arch/openrisc/kernel/dma.c | 63 +++++++++++-------
arch/parisc/kernel/pci-dma.c | 46 ++++++-------
arch/powerpc/mm/dma-noncoherent.c | 34 ++++++----
arch/riscv/mm/dma-noncoherent.c | 51 +++++++-------
arch/sh/kernel/dma-coherent.c | 43 +++++++-----
arch/sparc/kernel/ioport.c | 38 ++++++++---
arch/xtensa/kernel/pci-dma.c | 40 ++++++-----
include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++
19 files changed, 527 insertions(+), 391 deletions(-)
create mode 100644 include/linux/dma-sync.h
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index ddb96786f765..61cd01646222 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -30,63 +30,33 @@ void
arch_dma_prep_coherent(struct page *page, size_t size) dma_cache_wback_inv(page_to_phys(page), size); } -/* - * Cache operations depending on function and direction argument, inspired by - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] - * dma-mapping: provide a generic dma-noncoherent implementation)" - * - * | map == for_device | unmap == for_cpu - * |---------------------------------------------------------------- - * TO_DEV | writeback writeback | none none - * FROM_DEV | invalidate invalidate | invalidate* invalidate* - * BIDIR | writeback writeback | invalidate invalidate - * - * [*] needed for CPU speculative prefetches - * - * NOTE: we don't check the validity of direction argument as it is done in - * upper layer functions (in include/linux/dma-mapping.h) - */ - -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback(paddr, size); - break; - - case DMA_FROM_DEVICE: - dma_cache_inv(paddr, size); - break; - - case DMA_BIDIRECTIONAL: - dma_cache_wback(paddr, size); - break; + dma_cache_wback(paddr, size); +} - default: - break; - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_inv(paddr, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; + dma_cache_wback_inv(paddr, size); +} - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - dma_cache_inv(paddr, size); - break; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - default: - break; - } +static inline bool 
arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + /* * Plug in direct dma map ops. */ diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 125d58c54ab1..0de84e861027 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT bool default y +config ARCH_DMA_MARK_DCACHE_CLEAN + def_bool y + config ARCH_HAS_ILOG2_U32 bool diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c index 12b5c6ae93fc..0817274aed15 100644 --- a/arch/arm/mm/dma-mapping-nommu.c +++ b/arch/arm/mm/dma-mapping-nommu.c @@ -13,27 +13,36 @@ #include "dma.h" -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir == DMA_FROM_DEVICE) { - dmac_inv_range(__va(paddr), __va(paddr + size)); - outer_inv_range(paddr, paddr + size); - } else { - dmac_clean_range(__va(paddr), __va(paddr + size)); - outer_clean_range(paddr, paddr + size); - } + dmac_clean_range(__va(paddr), __va(paddr + size)); + outer_clean_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE) { - outer_inv_range(paddr, paddr + size); - dmac_inv_range(__va(paddr), __va(paddr)); - } + dmac_inv_range(__va(paddr), __va(paddr + size)); + outer_inv_range(paddr, paddr + size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dmac_flush_range(__va(paddr), __va(paddr + size)); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, const struct iommu_ops *iommu, 
bool coherent) { diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index b703cb83d27e..aa6ee820a0ab 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size) } } +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_clean_range); + outer_clean_range(paddr, paddr + size); +} + + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_inv_range); + outer_inv_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_flush_range); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + static bool arch_sync_dma_cpu_needs_post_dma_flush(void) { if (IS_ENABLED(CONFIG_CPU_V6) || @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void) return false; } -/* - * Make an area consistent for devices. - * Note: Drivers should NOT use this function directly. 
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*) - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - dma_cache_maint(paddr, size, dmac_inv_range); - outer_inv_range(paddr, paddr + size); - break; - case DMA_BIDIRECTIONAL: - if (arch_sync_dma_cpu_needs_post_dma_flush()) { - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - } else { - dma_cache_maint(paddr, size, dmac_flush_range); - outer_flush_range(paddr, paddr + size); - } - break; - default: - break; - } -} - -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) { - outer_inv_range(paddr, paddr + size); - dma_cache_maint(paddr, size, dmac_inv_range); - } -} +#include <linux/dma-sync.h> #ifdef CONFIG_ARM_DMA_USE_IOMMU diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 5240f6acad64..bae741aa65e9 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -13,25 +13,33 @@ #include <asm/cacheflush.h> #include <asm/xen/xen-ops.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + dcache_clean_poc(paddr, paddr + size); +} - dcache_clean_poc(start, start + size); +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dcache_inval_poc(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + 
dcache_clean_inval_poc(paddr, paddr + size); +} - if (dir == DMA_TO_DEVICE) - return; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} - dcache_inval_poc(start, start + size); +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c index c90f912e2822..9402e101b363 100644 --- a/arch/csky/mm/dma-mapping.c +++ b/arch/csky/mm/dma-mapping.c @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size) cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_wb_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_wb_range); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - return; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_inv_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_inv_range); } + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_op(paddr, size, dma_wbinv_range); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index 882680e81a30..e6538128a75b 100644 --- a/arch/hexagon/kernel/dma.c +++ 
b/arch/hexagon/kernel/dma.c @@ -9,29 +9,33 @@ #include <linux/memblock.h> #include <asm/page.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - void *addr = phys_to_virt(paddr); - - switch (dir) { - case DMA_TO_DEVICE: - hexagon_clean_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_FROM_DEVICE: - hexagon_inv_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_BIDIRECTIONAL: - flush_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - default: - BUG(); - } + hexagon_clean_dcache_range(paddr, paddr + size); } +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) +{ + hexagon_inv_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size) +{ + hexagon_flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + /* * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create * DMA coherent space. Use that for the pool. 
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index 2e192a5df949..aa9b434e6df8 100644 --- a/arch/m68k/kernel/dma.c +++ b/arch/m68k/kernel/dma.c @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */ -void arch_sync_dma_for_device(phys_addr_t handle, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_BIDIRECTIONAL: - case DMA_TO_DEVICE: - cache_push(handle, size); - break; - case DMA_FROM_DEVICE: - cache_clear(handle, size); - break; - default: - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n", - dir); - break; - } + /* + * cache_push() always invalidates in addition to cleaning + * write-back caches. + */ + cache_push(paddr, size); +} + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + cache_clear(paddr, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_push(paddr, size); } + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c index b4c4e45fd45e..01110d4aa5b0 100644 --- a/arch/microblaze/kernel/dma.c +++ b/arch/microblaze/kernel/dma.c @@ -14,32 +14,30 @@ #include <linux/bug.h> #include <asm/cacheflush.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_TO_DEVICE: - case DMA_BIDIRECTIONAL: - flush_dcache_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - invalidate_dcache_range(paddr, paddr + size); - break; - default: - BUG(); - } + /* writeback plus invalidate, could be a nop on WT caches */ + 
flush_dcache_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_TO_DEVICE: - break; - case DMA_BIDIRECTIONAL: - case DMA_FROM_DEVICE: - invalidate_dcache_range(paddr, paddr + size); - break; - default: - BUG(); - }} + invalidate_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index b9d68bcc5d53..902d4b7c1f85 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, } while (left); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_sync_phys(paddr, size, _dma_cache_wback); - break; - case DMA_FROM_DEVICE: - dma_sync_phys(paddr, size, _dma_cache_inv); - break; - case DMA_BIDIRECTIONAL: - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && - cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, _dma_cache_wback); - else - dma_sync_phys(paddr, size, _dma_cache_wback_inv); - break; - default: - break; - } + dma_sync_phys(paddr, size, _dma_cache_wback); } -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - 
case DMA_BIDIRECTIONAL: - if (cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, _dma_cache_inv); - break; - default: - break; - } + dma_sync_phys(paddr, size, _dma_cache_inv); } -#endif + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_sync_phys(paddr, size, _dma_cache_wback_inv); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + cpu_needs_post_dma_flush(); +} + +#include <linux/dma-sync.h> #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, - const struct iommu_ops *iommu, bool coherent) + const struct iommu_ops *iommu, bool coherent) { - dev->dma_coherent = coherent; + dev->dma_coherent = coherent; } #endif diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index fd887d5f3f9a..29978970955e 100644 --- a/arch/nios2/mm/dma-mapping.c +++ b/arch/nios2/mm/dma-mapping.c @@ -13,53 +13,46 @@ #include <linux/types.h> #include <linux/mm.h> #include <linux/string.h> +#include <linux/dma-map-ops.h> #include <linux/dma-mapping.h> #include <linux/io.h> #include <linux/cache.h> #include <asm/cacheflush.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { + /* + * We just need to write back the caches here, but Nios2 flush + * instruction will do both writeback and invalidate. + */ void *vaddr = phys_to_virt(paddr); + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); +} - switch (dir) { - case DMA_FROM_DEVICE: - invalidate_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - case DMA_TO_DEVICE: - /* - * We just need to flush the caches here , but Nios2 flush - * instruction will do both writeback and invalidate. 
- */ - case DMA_BIDIRECTIONAL: /* flush and invalidate */ - flush_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - default: - BUG(); - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + unsigned long vaddr = (unsigned long)phys_to_virt(paddr); + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size)); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - switch (dir) { - case DMA_BIDIRECTIONAL: - case DMA_FROM_DEVICE: - invalidate_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - case DMA_TO_DEVICE: - break; - default: - BUG(); - } +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index 91a00d09ffad..aba2258e62eb 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size) mmap_write_unlock(&init_mm); } -void arch_sync_dma_for_device(phys_addr_t addr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { unsigned long cl; struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; - switch (dir) { - case DMA_TO_DEVICE: - /* Write back the dcache for the requested range */ - for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBWR, cl); - break; - case DMA_FROM_DEVICE: - /* Invalidate the dcache for the requested range */ 
- for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBIR, cl); - break; - case DMA_BIDIRECTIONAL: - /* Flush the dcache for the requested range */ - for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBFR, cl); - break; - default: - break; - } + /* Write back the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBWR, cl); } + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + unsigned long cl; + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + + /* Invalidate the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBIR, cl); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + unsigned long cl; + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + + /* Flush the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBFR, cl); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 6d3d3cffb316..a7955aab8ce2 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, free_pages((unsigned long)__va(dma_handle), order); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { unsigned long virt = (unsigned long)phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - clean_kernel_dcache_range(virt, size); - break; - case DMA_FROM_DEVICE: - 
clean_kernel_dcache_range(virt, size); - break; - case DMA_BIDIRECTIONAL: - flush_kernel_dcache_range(virt, size); - break; - } + clean_kernel_dcache_range(virt, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { unsigned long virt = (unsigned long)phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - purge_kernel_dcache_range(virt, size); - break; - } + purge_kernel_dcache_range(virt, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + unsigned long virt = (unsigned long)phys_to_virt(paddr); + + flush_kernel_dcache_range(virt, size); } + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 00e59a4faa2b..268510c71156 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) #endif } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { __dma_phys_op(start, end, DMA_CACHE_CLEAN); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_NONE: - BUG(); - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - __dma_phys_op(start, end, DMA_CACHE_INVAL); - break; - } + __dma_phys_op(start, end, DMA_CACHE_INVAL); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + 
__dma_phys_op(start, end, DMA_CACHE_FLUSH); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long kaddr = (unsigned long)page_address(page); diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index 69c80b2155a1..b9a9f57e02be 100644 --- a/arch/riscv/mm/dma-noncoherent.c +++ b/arch/riscv/mm/dma-noncoherent.c @@ -12,43 +12,40 @@ static bool noncoherent_supported; -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - case DMA_FROM_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - case DMA_BIDIRECTIONAL: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - default: - break; - } + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); - break; - default: - break; - } + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + void *vaddr = phys_to_virt(paddr); + + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + 
+#include <linux/dma-sync.h> + + void arch_dma_prep_coherent(struct page *page, size_t size) { void *flush_addr = page_address(page); diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index 6a44c0e7ba40..41f031ae7609 100644 --- a/arch/sh/kernel/dma-coherent.c +++ b/arch/sh/kernel/dma-coherent.c @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size) __flush_purge_region(page_address(page), size); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); - switch (dir) { - case DMA_FROM_DEVICE: /* invalidate only */ - __flush_invalidate_region(addr, size); - break; - case DMA_TO_DEVICE: /* writeback only */ - __flush_wback_region(addr, size); - break; - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ - __flush_purge_region(addr, size); - break; - default: - BUG(); - } + __flush_wback_region(addr, size); } + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); + + __flush_invalidate_region(addr, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); + + __flush_purge_region(addr, size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index 4f3d26066ec2..6926ead2f208 100644 --- a/arch/sparc/kernel/ioport.c +++ b/arch/sparc/kernel/ioport.c @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport); #endif /* CONFIG_SBUS */ -/* - * IIep is write-through, not flushing on cpu to device transfer. 
- * - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to - * make DMA to cacheable memory coherent. - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE && - sparc_cpu_model == sparc_leon && + /* IIep is write-through, not flushing on cpu to device transfer. */ +} + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + /* + * On LEON systems without cache snooping, the entire D-CACHE must be + * flushed to make DMA to cacheable memory coherent. + */ + if (sparc_cpu_model == sparc_leon && !sparc_leon3_snooping_enabled()) leon_flush_dcache_all(); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + arch_dma_cache_inv(paddr, size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + #ifdef CONFIG_PROC_FS static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index ff3bf015eca4..d4ff96585545 100644 --- a/arch/xtensa/kernel/pci-dma.c +++ b/arch/xtensa/kernel/pci-dma.c @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, } } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - do_cache_op(paddr, size, __flush_dcache_range); - break; - case DMA_FROM_DEVICE: - do_cache_op(paddr, size, __invalidate_dcache_range); - break; - case DMA_BIDIRECTIONAL: - do_cache_op(paddr, size, __flush_invalidate_dcache_range); - break; - default: - break; - } + do_cache_op(paddr, size, __flush_dcache_range); } +static inline void 
arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	do_cache_op(paddr, size, __invalidate_dcache_range);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
+
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	__invalidate_dcache_range((unsigned long)page_address(page), size);
diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
new file mode 100644
index 000000000000..18e33d5e8eaf
--- /dev/null
+++ b/include/linux/dma-sync.h
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Cache operations depending on function and direction argument, inspired by
+ * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
+ * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
+ * dma-mapping: provide a generic dma-noncoherent implementation)"
+ *
+ *          |   map          ==  for_device     |   unmap     ==  for_cpu
+ *          |----------------------------------------------------------------
+ * TO_DEV   |   writeback        writeback      |   none          none
+ * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
+ * BIDIR    |   writeback        writeback      |   invalidate    invalidate
+ *
+ *     [*] needed for CPU speculative prefetches
+ *
+ * NOTE: we don't check the validity of direction argument as it is done in
+ * upper layer functions (in include/linux/dma-mapping.h)
+ *
+ * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
+ * the respective high-level operations without having to expose the
+ * cache management ops to drivers.
+ */
+
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+			      enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		/*
+		 * This may be an empty function on write-through caches,
+		 * and it might invalidate the cache if an architecture has
+		 * a write-back cache but no way to write it back without
+		 * invalidating
+		 */
+		arch_dma_cache_wback(paddr, size);
+		break;
+
+	case DMA_FROM_DEVICE:
+		/*
+		 * FIXME: this should be handled the same across all
+		 * architectures, see
+		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
+		 */
+		if (!arch_sync_dma_clean_before_fromdevice()) {
+			arch_dma_cache_inv(paddr, size);
+			break;
+		}
+		fallthrough;
+
+	case DMA_BIDIRECTIONAL:
+		/* Skip the invalidate here if it's done later */
+		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		    arch_sync_dma_cpu_needs_post_dma_flush())
+			arch_dma_cache_wback(paddr, size);
+		else
+			arch_dma_cache_wback_inv(paddr, size);
+		break;
+
+	default:
+		break;
+	}
+}
+
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
+/*
+ * Mark the D-cache clean for these pages to avoid extra flushing.
+ */
+static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
+{
+#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
+	unsigned long pfn = PFN_UP(paddr);
+	unsigned long off = paddr & (PAGE_SIZE - 1);
+	size_t left = size;
+
+	if (off)
+		left -= PAGE_SIZE - off;
+
+	while (left >= PAGE_SIZE) {
+		struct page *page = pfn_to_page(pfn++);
+		set_bit(PG_dcache_clean, &page->flags);
+		left -= PAGE_SIZE;
+	}
+#endif
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+			   enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
+		if (arch_sync_dma_cpu_needs_post_dma_flush())
+			arch_dma_cache_inv(paddr, size);
+
+		if (size > PAGE_SIZE)
+			arch_dma_mark_dcache_clean(paddr, size);
+		break;
+
+	default:
+		break;
+	}
+}
+#endif
-- 
2.39.2
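The fallthrough from DMA_FROM_DEVICE into DMA_BIDIRECTIONAL in the generic arch_sync_dma_for_device() is easy to misread, so here is a small user-space model of how the two functions in include/linux/dma-sync.h choose a cache operation. It is a sketch for illustration only, not kernel code: struct dma_sync_policy, enum cache_op, model_sync_for_device(), model_sync_for_cpu() and cache_op_name() are hypothetical names. The three booleans stand in for arch_sync_dma_clean_before_fromdevice(), arch_sync_dma_cpu_needs_post_dma_flush() and CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU, and the PG_dcache_clean tracking done by arch_dma_mark_dcache_clean() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model (not kernel code) of the direction handling in the
 * generic arch_sync_dma_for_device()/arch_sync_dma_for_cpu(). Instead of
 * touching caches, each call returns which arch_dma_cache_*() helper the
 * kernel version would invoke.
 */

/* Same numeric values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Which helper gets called; OP_NONE means no cache maintenance. */
enum cache_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* Hypothetical bundle of the per-architecture answers. */
struct dma_sync_policy {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;         /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
};

/* Models arch_sync_dma_for_device(): runs before the device touches memory. */
static enum cache_op model_sync_for_device(const struct dma_sync_policy *p,
					   enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		return OP_WBACK;
	case DMA_FROM_DEVICE:
		if (!p->clean_before_fromdevice)
			return OP_INV;
		/* fall through - handled like DMA_BIDIRECTIONAL */
	case DMA_BIDIRECTIONAL:
		/* Skip the invalidate if the for_cpu hook will do it later. */
		if (p->has_sync_for_cpu && p->cpu_needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	default:
		return OP_NONE;
	}
}

/* Models arch_sync_dma_for_cpu(): runs after the device is done. */
static enum cache_op model_sync_for_cpu(const struct dma_sync_policy *p,
					enum dma_data_direction dir)
{
	/* Without CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU the hook is not built. */
	if (!p->has_sync_for_cpu)
		return OP_NONE;

	switch (dir) {
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		/* Only needed to drop lines the CPU may have prefetched. */
		return p->cpu_needs_post_dma_flush ? OP_INV : OP_NONE;
	default:
		return OP_NONE;
	}
}

/* Name of the kernel helper that corresponds to an op, for printing. */
static const char *cache_op_name(enum cache_op op)
{
	static const char *const names[] = {
		"none", "arch_dma_cache_wback", "arch_dma_cache_inv",
		"arch_dma_cache_wback_inv",
	};

	assert((unsigned int)op < sizeof(names) / sizeof(names[0]));
	return names[op];
}
```

With the flags the ARC hooks return in this patch (clean before DMA_FROM_DEVICE: false, post-DMA flush: true), and assuming CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is enabled, model_sync_for_device() picks arch_dma_cache_inv for DMA_FROM_DEVICE and only arch_dma_cache_wback for DMA_BIDIRECTIONAL, leaving the invalidate to model_sync_for_cpu(). With the xtensa flags (both false), DMA_BIDIRECTIONAL gets arch_dma_cache_wback_inv up front and nothing after the transfer, while an arm64-style policy (both true) turns DMA_FROM_DEVICE into a plain writeback before the transfer.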
* [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
Now that all of these have consistent behavior, replace them with a
single shared implementation of arch_sync_dma_for_device() and
arch_sync_dma_for_cpu() and three parameters to pick how they should
operate:
- If the CPU has speculative prefetching, then the cache has to be
invalidated after a transfer from the device. On the rarer CPUs without
prefetching, this can be skipped, with all cache management happening
before the transfer. This flag can be detected at runtime, but is
usually fixed per architecture.
- Some architectures currently clean the caches before DMA from a
device, while others invalidate them. There has not been a conclusion
regarding whether we should change all architectures to use clean
instead, so this adds an architecture-specific flag that we can change
later on.
- On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps track of
pages that are marked clean in the page cache, to avoid flushing them
again. The implementation for this is generic enough to work on all
architectures that use the PG_dcache_clean page flag, but a Kconfig
symbol is used to enable it only on Arm, preserving the existing
behavior.
For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
over 'flush', to avoid any ambiguity about what the helper functions are
supposed to do.
Moving the global functions into a header file is usually a bad idea,
as it prevents the header from being included more than once, but it
helps keep the behavior as close as possible to the previous state,
including the possibility of inlining most of it into these functions
where that was done before. This also helps keep the global namespace
clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
device drivers that might use them incorrectly.
It would be possible to do this one architecture at a time, but as the
change is the same everywhere, a single combined patch explains it
better.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arc/mm/dma.c | 66 +++++-------------
 arch/arm/Kconfig | 3 +
 arch/arm/mm/dma-mapping-nommu.c | 39 ++++++-----
 arch/arm/mm/dma-mapping.c | 64 +++++++-----------
 arch/arm64/mm/dma-mapping.c | 28 +++++---
 arch/csky/mm/dma-mapping.c | 44 ++++++------
 arch/hexagon/kernel/dma.c | 44 ++++++------
 arch/m68k/kernel/dma.c | 43 +++++++-----
 arch/microblaze/kernel/dma.c | 48 +++++++-------
 arch/mips/mm/dma-noncoherent.c | 60 +++++++----------
 arch/nios2/mm/dma-mapping.c | 57 +++++++---------
 arch/openrisc/kernel/dma.c | 63 +++++++++++-------
 arch/parisc/kernel/pci-dma.c | 46 ++++++-------
 arch/powerpc/mm/dma-noncoherent.c | 34 ++++++----
 arch/riscv/mm/dma-noncoherent.c | 51 +++++++-------
 arch/sh/kernel/dma-coherent.c | 43 +++++++-----
 arch/sparc/kernel/ioport.c | 38 ++++++++---
 arch/xtensa/kernel/pci-dma.c | 40 ++++++-----
 include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++
 19 files changed, 527 insertions(+), 391 deletions(-)
 create mode 100644
include/linux/dma-sync.h diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c index ddb96786f765..61cd01646222 100644 --- a/arch/arc/mm/dma.c +++ b/arch/arc/mm/dma.c @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size) dma_cache_wback_inv(page_to_phys(page), size); } -/* - * Cache operations depending on function and direction argument, inspired by - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] - * dma-mapping: provide a generic dma-noncoherent implementation)" - * - * | map == for_device | unmap == for_cpu - * |---------------------------------------------------------------- - * TO_DEV | writeback writeback | none none - * FROM_DEV | invalidate invalidate | invalidate* invalidate* - * BIDIR | writeback writeback | invalidate invalidate - * - * [*] needed for CPU speculative prefetches - * - * NOTE: we don't check the validity of direction argument as it is done in - * upper layer functions (in include/linux/dma-mapping.h) - */ - -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback(paddr, size); - break; - - case DMA_FROM_DEVICE: - dma_cache_inv(paddr, size); - break; - - case DMA_BIDIRECTIONAL: - dma_cache_wback(paddr, size); - break; + dma_cache_wback(paddr, size); +} - default: - break; - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_inv(paddr, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; + dma_cache_wback_inv(paddr, size); +} - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - 
dma_cache_inv(paddr, size); - break; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - default: - break; - } +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + /* * Plug in direct dma map ops. */ diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 125d58c54ab1..0de84e861027 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT bool default y +config ARCH_DMA_MARK_DCACHE_CLEAN + def_bool y + config ARCH_HAS_ILOG2_U32 bool diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c index 12b5c6ae93fc..0817274aed15 100644 --- a/arch/arm/mm/dma-mapping-nommu.c +++ b/arch/arm/mm/dma-mapping-nommu.c @@ -13,27 +13,36 @@ #include "dma.h" -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir == DMA_FROM_DEVICE) { - dmac_inv_range(__va(paddr), __va(paddr + size)); - outer_inv_range(paddr, paddr + size); - } else { - dmac_clean_range(__va(paddr), __va(paddr + size)); - outer_clean_range(paddr, paddr + size); - } + dmac_clean_range(__va(paddr), __va(paddr + size)); + outer_clean_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE) { - outer_inv_range(paddr, paddr + size); - dmac_inv_range(__va(paddr), __va(paddr)); - } + dmac_inv_range(__va(paddr), __va(paddr + size)); + outer_inv_range(paddr, paddr + size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dmac_flush_range(__va(paddr), __va(paddr + size)); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool 
arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, const struct iommu_ops *iommu, bool coherent) { diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index b703cb83d27e..aa6ee820a0ab 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size) } } +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_clean_range); + outer_clean_range(paddr, paddr + size); +} + + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_inv_range); + outer_inv_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_flush_range); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + static bool arch_sync_dma_cpu_needs_post_dma_flush(void) { if (IS_ENABLED(CONFIG_CPU_V6) || @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void) return false; } -/* - * Make an area consistent for devices. - * Note: Drivers should NOT use this function directly. 
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*) - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - dma_cache_maint(paddr, size, dmac_inv_range); - outer_inv_range(paddr, paddr + size); - break; - case DMA_BIDIRECTIONAL: - if (arch_sync_dma_cpu_needs_post_dma_flush()) { - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - } else { - dma_cache_maint(paddr, size, dmac_flush_range); - outer_flush_range(paddr, paddr + size); - } - break; - default: - break; - } -} - -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) { - outer_inv_range(paddr, paddr + size); - dma_cache_maint(paddr, size, dmac_inv_range); - } -} +#include <linux/dma-sync.h> #ifdef CONFIG_ARM_DMA_USE_IOMMU diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 5240f6acad64..bae741aa65e9 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -13,25 +13,33 @@ #include <asm/cacheflush.h> #include <asm/xen/xen-ops.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + dcache_clean_poc(paddr, paddr + size); +} - dcache_clean_poc(start, start + size); +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dcache_inval_poc(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + 
dcache_clean_inval_poc(paddr, paddr + size); +} - if (dir == DMA_TO_DEVICE) - return; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} - dcache_inval_poc(start, start + size); +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c index c90f912e2822..9402e101b363 100644 --- a/arch/csky/mm/dma-mapping.c +++ b/arch/csky/mm/dma-mapping.c @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size) cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_wb_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_wb_range); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - return; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_inv_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_inv_range); } + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_op(paddr, size, dma_wbinv_range); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index 882680e81a30..e6538128a75b 100644 --- a/arch/hexagon/kernel/dma.c +++ 
b/arch/hexagon/kernel/dma.c @@ -9,29 +9,33 @@ #include <linux/memblock.h> #include <asm/page.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - void *addr = phys_to_virt(paddr); - - switch (dir) { - case DMA_TO_DEVICE: - hexagon_clean_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_FROM_DEVICE: - hexagon_inv_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_BIDIRECTIONAL: - flush_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - default: - BUG(); - } + hexagon_clean_dcache_range(paddr, paddr + size); } +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) +{ + hexagon_inv_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size) +{ + hexagon_flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + /* * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create * DMA coherent space. Use that for the pool. 
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index 2e192a5df949..aa9b434e6df8 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
 
 #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
 
-void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_BIDIRECTIONAL:
-	case DMA_TO_DEVICE:
-		cache_push(handle, size);
-		break;
-	case DMA_FROM_DEVICE:
-		cache_clear(handle, size);
-		break;
-	default:
-		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
-				   dir);
-		break;
-	}
+	/*
+	 * cache_push() always invalidates in addition to cleaning
+	 * write-back caches.
+	 */
+	cache_push(paddr, size);
+}
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	cache_clear(paddr, size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	cache_push(paddr, size);
 }
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index b4c4e45fd45e..01110d4aa5b0 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -14,32 +14,30 @@
 #include <linux/bug.h>
 #include <asm/cacheflush.h>
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (direction) {
-	case DMA_TO_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		flush_dcache_range(paddr, paddr + size);
-		break;
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range(paddr, paddr + size);
-		break;
-	default:
-		BUG();
-	}
+	/* writeback plus invalidate, could be a nop on WT caches */
+	flush_dcache_range(paddr, paddr + size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	switch (direction) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_BIDIRECTIONAL:
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range(paddr, paddr + size);
-		break;
-	default:
-		BUG();
-	}}
+	invalidate_dcache_range(paddr, paddr + size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	flush_dcache_range(paddr, paddr + size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index b9d68bcc5d53..902d4b7c1f85 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
 	} while (left);
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		dma_sync_phys(paddr, size, _dma_cache_wback);
-		break;
-	case DMA_FROM_DEVICE:
-		dma_sync_phys(paddr, size, _dma_cache_inv);
-		break;
-	case DMA_BIDIRECTIONAL:
-		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
-		    cpu_needs_post_dma_flush())
-			dma_sync_phys(paddr, size, _dma_cache_wback);
-		else
-			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
-		break;
-	default:
-		break;
-	}
+	dma_sync_phys(paddr, size, _dma_cache_wback);
 }
 
-#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		if (cpu_needs_post_dma_flush())
-			dma_sync_phys(paddr, size, _dma_cache_inv);
-		break;
-	default:
-		break;
-	}
+	dma_sync_phys(paddr, size, _dma_cache_inv);
 }
-#endif
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+	       cpu_needs_post_dma_flush();
+}
+
+#include <linux/dma-sync.h>
 
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-                const struct iommu_ops *iommu, bool coherent)
+		const struct iommu_ops *iommu, bool coherent)
 {
-       dev->dma_coherent = coherent;
+	dev->dma_coherent = coherent;
 }
 #endif
diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
index fd887d5f3f9a..29978970955e 100644
--- a/arch/nios2/mm/dma-mapping.c
+++ b/arch/nios2/mm/dma-mapping.c
@@ -13,53 +13,46 @@
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/string.h>
+#include <linux/dma-map-ops.h>
 #include <linux/dma-mapping.h>
 #include <linux/io.h>
 #include <linux/cache.h>
 #include <asm/cacheflush.h>
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
+	/*
+	 * We just need to write back the caches here, but Nios2 flush
+	 * instruction will do both writeback and invalidate.
+	 */
 	void *vaddr = phys_to_virt(paddr);
+	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
+}
 
-	switch (dir) {
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range((unsigned long)vaddr,
-			(unsigned long)(vaddr + size));
-		break;
-	case DMA_TO_DEVICE:
-		/*
-		 * We just need to flush the caches here , but Nios2 flush
-		 * instruction will do both writeback and invalidate.
-		 */
-	case DMA_BIDIRECTIONAL: /* flush and invalidate */
-		flush_dcache_range((unsigned long)vaddr,
-			(unsigned long)(vaddr + size));
-		break;
-	default:
-		BUG();
-	}
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
+	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
+	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
 
-	switch (dir) {
-	case DMA_BIDIRECTIONAL:
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range((unsigned long)vaddr,
-			(unsigned long)(vaddr + size));
-		break;
-	case DMA_TO_DEVICE:
-		break;
-	default:
-		BUG();
-	}
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
 }
 
+#include <linux/dma-sync.h>
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	unsigned long start = (unsigned long)page_address(page);
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index 91a00d09ffad..aba2258e62eb 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
 	mmap_write_unlock(&init_mm);
 }
 
-void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	unsigned long cl;
 	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		/* Write back the dcache for the requested range */
-		for (cl = addr; cl < addr + size;
-		     cl += cpuinfo->dcache_block_size)
-			mtspr(SPR_DCBWR, cl);
-		break;
-	case DMA_FROM_DEVICE:
-		/* Invalidate the dcache for the requested range */
-		for (cl = addr; cl < addr + size;
-		     cl += cpuinfo->dcache_block_size)
-			mtspr(SPR_DCBIR, cl);
-		break;
-	case DMA_BIDIRECTIONAL:
-		/* Flush the dcache for the requested range */
-		for (cl = addr; cl < addr + size;
-		     cl += cpuinfo->dcache_block_size)
-			mtspr(SPR_DCBFR, cl);
-		break;
-	default:
-		break;
-	}
+	/* Write back the dcache for the requested range */
+	for (cl = paddr; cl < paddr + size;
+	     cl += cpuinfo->dcache_block_size)
+		mtspr(SPR_DCBWR, cl);
 }
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	unsigned long cl;
+	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
+
+	/* Invalidate the dcache for the requested range */
+	for (cl = paddr; cl < paddr + size;
+	     cl += cpuinfo->dcache_block_size)
+		mtspr(SPR_DCBIR, cl);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	unsigned long cl;
+	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
+
+	/* Flush the dcache for the requested range */
+	for (cl = paddr; cl < paddr + size;
+	     cl += cpuinfo->dcache_block_size)
+		mtspr(SPR_DCBFR, cl);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 6d3d3cffb316..a7955aab8ce2 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
 	free_pages((unsigned long)__va(dma_handle), order);
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	unsigned long virt = (unsigned long)phys_to_virt(paddr);
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		clean_kernel_dcache_range(virt, size);
-		break;
-	case DMA_FROM_DEVICE:
-		clean_kernel_dcache_range(virt, size);
-		break;
-	case DMA_BIDIRECTIONAL:
-		flush_kernel_dcache_range(virt, size);
-		break;
-	}
+	clean_kernel_dcache_range(virt, size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
 	unsigned long virt = (unsigned long)phys_to_virt(paddr);
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		purge_kernel_dcache_range(virt, size);
-		break;
-	}
+	purge_kernel_dcache_range(virt, size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	unsigned long virt = (unsigned long)phys_to_virt(paddr);
+
+	flush_kernel_dcache_range(virt, size);
 }
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 00e59a4faa2b..268510c71156 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
 #endif
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	switch (direction) {
-	case DMA_NONE:
-		BUG();
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		__dma_phys_op(start, end, DMA_CACHE_INVAL);
-		break;
-	}
+	__dma_phys_op(start, end, DMA_CACHE_INVAL);
 }
 
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	unsigned long kaddr = (unsigned long)page_address(page);
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index 69c80b2155a1..b9a9f57e02be 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -12,43 +12,40 @@
 
 static bool noncoherent_supported;
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
-		break;
-	case DMA_FROM_DEVICE:
-		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
-		break;
-	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
-		break;
-	default:
-		break;
-	}
+	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
-		break;
-	default:
-		break;
-	}
+	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
 }
 
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	void *vaddr = phys_to_virt(paddr);
+
+	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
+
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	void *flush_addr = page_address(page);
diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
index 6a44c0e7ba40..41f031ae7609 100644
--- a/arch/sh/kernel/dma-coherent.c
+++ b/arch/sh/kernel/dma-coherent.c
@@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 	__flush_purge_region(page_address(page), size);
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
 
-	switch (dir) {
-	case DMA_FROM_DEVICE:		/* invalidate only */
-		__flush_invalidate_region(addr, size);
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
-		__flush_wback_region(addr, size);
-		break;
-	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
-		__flush_purge_region(addr, size);
-		break;
-	default:
-		BUG();
-	}
+	__flush_wback_region(addr, size);
 }
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
+
+	__flush_invalidate_region(addr, size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
+
+	__flush_purge_region(addr, size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 4f3d26066ec2..6926ead2f208 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
 
 #endif /* CONFIG_SBUS */
 
-/*
- * IIep is write-through, not flushing on cpu to device transfer.
- *
- * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
- * make DMA to cacheable memory coherent.
- */
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	if (dir != DMA_TO_DEVICE &&
-	    sparc_cpu_model == sparc_leon &&
+	/* IIep is write-through, not flushing on cpu to device transfer. */
+}
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	/*
+	 * On LEON systems without cache snooping, the entire D-CACHE must be
+	 * flushed to make DMA to cacheable memory coherent.
+	 */
+	if (sparc_cpu_model == sparc_leon &&
 	    !sparc_leon3_snooping_enabled())
 		leon_flush_dcache_all();
 }
 
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	arch_dma_cache_inv(paddr, size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
+
 #ifdef CONFIG_PROC_FS
 
 static int sparc_io_proc_show(struct seq_file *m, void *v)
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index ff3bf015eca4..d4ff96585545 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
 	}
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		do_cache_op(paddr, size, __flush_dcache_range);
-		break;
-	case DMA_FROM_DEVICE:
-		do_cache_op(paddr, size, __invalidate_dcache_range);
-		break;
-	case DMA_BIDIRECTIONAL:
-		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
-		break;
-	default:
-		break;
-	}
+	do_cache_op(paddr, size, __flush_dcache_range);
 }
 
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	do_cache_op(paddr, size, __invalidate_dcache_range);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
+
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	__invalidate_dcache_range((unsigned long)page_address(page), size);
diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
new file mode 100644
index 000000000000..18e33d5e8eaf
--- /dev/null
+++ b/include/linux/dma-sync.h
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Cache operations depending on function and direction argument, inspired by
+ * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
+ * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
+ * dma-mapping: provide a generic dma-noncoherent implementation)"
+ *
+ *          |   map          ==  for_device     |   unmap     ==  for_cpu
+ *          |----------------------------------------------------------------
+ * TO_DEV   |   writeback        writeback      |   none          none
+ * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
+ * BIDIR    |   writeback        writeback      |   invalidate    invalidate
+ *
+ *     [*] needed for CPU speculative prefetches
+ *
+ * NOTE: we don't check the validity of direction argument as it is done in
+ * upper layer functions (in include/linux/dma-mapping.h)
+ *
+ * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
+ * the respective high-level operations without having to expose the
+ * cache management ops to drivers.
+ */
+
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		/*
+		 * This may be an empty function on write-through caches,
+		 * and it might invalidate the cache if an architecture has
+		 * a write-back cache but no way to write it back without
+		 * invalidating
+		 */
+		arch_dma_cache_wback(paddr, size);
+		break;
+
+	case DMA_FROM_DEVICE:
+		/*
+		 * FIXME: this should be handled the same across all
+		 * architectures, see
+		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
+		 */
+		if (!arch_sync_dma_clean_before_fromdevice()) {
+			arch_dma_cache_inv(paddr, size);
+			break;
+		}
+		fallthrough;
+
+	case DMA_BIDIRECTIONAL:
+		/* Skip the invalidate here if it's done later */
+		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		    arch_sync_dma_cpu_needs_post_dma_flush())
+			arch_dma_cache_wback(paddr, size);
+		else
+			arch_dma_cache_wback_inv(paddr, size);
+		break;
+
+	default:
+		break;
+	}
+}
+
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
+/*
+ * Mark the D-cache clean for these pages to avoid extra flushing.
+ */
+static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
+{
+#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
+	unsigned long pfn = PFN_UP(paddr);
+	unsigned long off = paddr & (PAGE_SIZE - 1);
+	size_t left = size;
+
+	if (off)
+		left -= PAGE_SIZE - off;
+
+	while (left >= PAGE_SIZE) {
+		struct page *page = pfn_to_page(pfn++);
+		set_bit(PG_dcache_clean, &page->flags);
+		left -= PAGE_SIZE;
+	}
+#endif
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
+		if (arch_sync_dma_cpu_needs_post_dma_flush())
+			arch_dma_cache_inv(paddr, size);
+
+		if (size > PAGE_SIZE)
+			arch_dma_mark_dcache_clean(paddr, size);
+		break;
+
+	default:
+		break;
+	}
+}
+#endif
-- 
2.39.2
_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc
^ permalink raw reply related	[flat|nested] 456+ messages in thread
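The dispatch implemented by the new include/linux/dma-sync.h above can be exercised outside the kernel. What follows is a minimal userspace sketch, not kernel code: the model_* functions, the OP_* result codes, MODEL_PAGE_SIZE (assumed to be 4096) and the per-architecture presets are hypothetical stand-ins, with the preset values copied from what the arc, arm64, m68k and sparc hooks return in this patch. It assumes CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is enabled, reports only which arch_dma_cache_*() helper would be selected, and repeats the page arithmetic of arch_dma_mark_dcache_clean() without touching any page flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the dispatch in include/linux/dma-sync.h.
 * Every name below is a hypothetical stand-in, not a kernel API, and
 * CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is assumed to be enabled.
 */
enum model_dir { MODEL_TO_DEVICE, MODEL_FROM_DEVICE, MODEL_BIDIRECTIONAL };

/* Which arch_dma_cache_*() helper the real code would call. */
enum model_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

struct model_arch {
	bool clean_before_fromdevice;	/* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush;	/* arch_sync_dma_cpu_needs_post_dma_flush() */
};

/* Hook values copied from this patch for a few architectures. */
static const struct model_arch arc_flags   = { false, true };
static const struct model_arch arm64_flags = { true,  true };
static const struct model_arch m68k_flags  = { false, false };
static const struct model_arch sparc_flags = { true,  false };

/* Mirrors arch_sync_dma_for_device(). */
static enum model_op model_for_device(enum model_dir dir, struct model_arch a)
{
	switch (dir) {
	case MODEL_TO_DEVICE:
		return OP_WBACK;
	case MODEL_FROM_DEVICE:
		/* clean-before architectures treat this like BIDIRECTIONAL */
		if (!a.clean_before_fromdevice)
			return OP_INV;
		/* fall through */
	case MODEL_BIDIRECTIONAL:
		/* skip the invalidate here if arch_sync_dma_for_cpu() does it */
		return a.cpu_needs_post_dma_flush ? OP_WBACK : OP_WBACK_INV;
	}
	return OP_NONE;
}

/* Mirrors arch_sync_dma_for_cpu(), ignoring the page-flag bookkeeping. */
static enum model_op model_for_cpu(enum model_dir dir, struct model_arch a)
{
	if (dir == MODEL_TO_DEVICE || !a.cpu_needs_post_dma_flush)
		return OP_NONE;
	return OP_INV;
}

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size */

/*
 * Same arithmetic as arch_dma_mark_dcache_clean(): count the pages that lie
 * entirely inside [paddr, paddr + size). The kernel only calls it when
 * size > PAGE_SIZE, which keeps 'left' from underflowing.
 */
static size_t model_pages_marked_clean(unsigned long paddr, size_t size)
{
	unsigned long off = paddr & (MODEL_PAGE_SIZE - 1);
	size_t left = size;
	size_t pages = 0;

	assert(size > MODEL_PAGE_SIZE);
	if (off)
		left -= MODEL_PAGE_SIZE - off;	/* partial first page: PFN_UP skips it */
	while (left >= MODEL_PAGE_SIZE) {
		pages++;
		left -= MODEL_PAGE_SIZE;
	}
	return pages;	/* a partial last page is never marked */
}
```

With the arc preset, for example, a DMA_FROM_DEVICE buffer gets an invalidate both before and after the transfer, matching the FROM_DEV row of the table in the header comment, whereas the sparc preset (clean before DMA from the device, no post-DMA flush) selects a single writeback-and-invalidate up front. A buffer at physical address 0x1800 of length 0x3000 fully covers exactly two pages (those at 0x2000 and 0x3000), which is what the page-marking loop counts.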
* [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa
From: Arnd Bergmann <arnd@arndb.de>
Now that all of these have consistent behavior, replace them with
a single shared implementation of arch_sync_dma_for_device() and
arch_sync_dma_for_cpu() and three parameters to pick how they should
operate:
- If the CPU has speculative prefetching, then the cache
  has to be invalidated after a transfer from the device.
  On the rarer CPUs without prefetching, this can be skipped,
  with all cache management happening before the transfer.
  This flag can be detected at runtime, but is usually fixed
  per architecture.
- Some architectures currently clean the caches before DMA
  from a device, while others invalidate them. There has not
  been a conclusion regarding whether we should change all
  architectures to use clean instead, so this adds an
  architecture-specific flag that we can change later on.
- On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps track
  of pages that are marked clean in the page cache, to avoid
  flushing them again. The implementation for this is generic
  enough to work on all architectures that use the PG_dcache_clean
  page flag, but a Kconfig symbol is used to enable it only on Arm
  to preserve the existing behavior.
For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
over 'flush', to avoid any ambiguity about what the helper functions are
supposed to do.
Moving the global functions into a header file is usually a bad idea
as it prevents the header from being included more than once, but it
helps keep the behavior as close as possible to the previous state,
including the possibility of inlining most of it into these functions
where that was done before. This also helps keep the global namespace
clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
device drivers that might use them incorrectly.
It would be possible to do this one architecture at a time, but as
the change is the same everywhere, a single combined patch explains
it better.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arc/mm/dma.c                 |  66 +++++-------------
 arch/arm/Kconfig                  |   3 +
 arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
 arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
 arch/arm64/mm/dma-mapping.c       |  28 +++++---
 arch/csky/mm/dma-mapping.c        |  44 ++++++------
 arch/hexagon/kernel/dma.c         |  44 ++++++------
 arch/m68k/kernel/dma.c            |  43 +++++++-----
 arch/microblaze/kernel/dma.c      |  48 +++++++-------
 arch/mips/mm/dma-noncoherent.c    |  60 +++++++----------
 arch/nios2/mm/dma-mapping.c       |  57 +++++++---------
 arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
 arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
 arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
 arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
 arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
 arch/sparc/kernel/ioport.c        |  38 ++++++++---
 arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
 include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
 19 files changed, 527 insertions(+), 391 deletions(-)
 create mode 100644 include/linux/dma-sync.h
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index ddb96786f765..61cd01646222 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 	dma_cache_wback_inv(page_to_phys(page), size);
 }
 
-/*
- * Cache operations depending on function and direction argument, inspired by
- * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
- * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
- * dma-mapping: provide a generic dma-noncoherent implementation)"
- *
- *          |   map          ==  for_device     |   unmap     ==  for_cpu
- *          |----------------------------------------------------------------
- * TO_DEV   |   writeback        writeback      |   none          none
- * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
- * BIDIR    |   writeback        writeback      |   invalidate    invalidate
- *
- *     [*] needed for CPU speculative prefetches
- *
- * NOTE: we don't check the validity of direction argument as it is done in
- * upper layer functions (in include/linux/dma-mapping.h)
- */
-
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		dma_cache_wback(paddr, size);
-		break;
-
-	case DMA_FROM_DEVICE:
-		dma_cache_inv(paddr, size);
-		break;
-
-	case DMA_BIDIRECTIONAL:
-		dma_cache_wback(paddr, size);
-		break;
+	dma_cache_wback(paddr, size);
+}
 
-	default:
-		break;
-	}
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	dma_cache_inv(paddr, size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
+	dma_cache_wback_inv(paddr, size);
+}
 
-	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		dma_cache_inv(paddr, size);
-		break;
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
 
-	default:
-		break;
-	}
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
 }
 
+#include <linux/dma-sync.h>
+
 /*
  * Plug in direct dma map ops.
  */
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 125d58c54ab1..0de84e861027 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
 	bool
 	default y
 
+config ARCH_DMA_MARK_DCACHE_CLEAN
+	def_bool y
+
 config ARCH_HAS_ILOG2_U32
 	bool
 
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 12b5c6ae93fc..0817274aed15 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -13,27 +13,36 @@
 
 #include "dma.h"
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	if (dir == DMA_FROM_DEVICE) {
-		dmac_inv_range(__va(paddr), __va(paddr + size));
-		outer_inv_range(paddr, paddr + size);
-	} else {
-		dmac_clean_range(__va(paddr), __va(paddr + size));
-		outer_clean_range(paddr, paddr + size);
-	}
+	dmac_clean_range(__va(paddr), __va(paddr + size));
+	outer_clean_range(paddr, paddr + size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	if (dir != DMA_TO_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
-		dmac_inv_range(__va(paddr), __va(paddr));
-	}
+	dmac_inv_range(__va(paddr), __va(paddr + size));
+	outer_inv_range(paddr, paddr + size);
 }
 
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	dmac_flush_range(__va(paddr), __va(paddr + size));
+	outer_flush_range(paddr, paddr + size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
+
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent)
 {
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b703cb83d27e..aa6ee820a0ab 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 	}
 }
 
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
+{
+	dma_cache_maint(paddr, size, dmac_clean_range);
+	outer_clean_range(paddr, paddr + size);
+}
+
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	dma_cache_maint(paddr, size, dmac_inv_range);
+	outer_inv_range(paddr, paddr + size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	dma_cache_maint(paddr, size, dmac_flush_range);
+	outer_flush_range(paddr, paddr + size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
 static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
 {
 	if (IS_ENABLED(CONFIG_CPU_V6) ||
@@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
 	return false;
 }
 
-/*
- * Make an area consistent for devices.
- * Note: Drivers should NOT use this function directly.
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*) - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - dma_cache_maint(paddr, size, dmac_inv_range); - outer_inv_range(paddr, paddr + size); - break; - case DMA_BIDIRECTIONAL: - if (arch_sync_dma_cpu_needs_post_dma_flush()) { - dma_cache_maint(paddr, size, dmac_clean_range); - outer_clean_range(paddr, paddr + size); - } else { - dma_cache_maint(paddr, size, dmac_flush_range); - outer_flush_range(paddr, paddr + size); - } - break; - default: - break; - } -} - -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) -{ - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) { - outer_inv_range(paddr, paddr + size); - dma_cache_maint(paddr, size, dmac_inv_range); - } -} +#include <linux/dma-sync.h> #ifdef CONFIG_ARM_DMA_USE_IOMMU diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 5240f6acad64..bae741aa65e9 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -13,25 +13,33 @@ #include <asm/cacheflush.h> #include <asm/xen/xen-ops.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + dcache_clean_poc(paddr, paddr + size); +} - dcache_clean_poc(start, start + size); +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dcache_inval_poc(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - unsigned long start = (unsigned long)phys_to_virt(paddr); + 
dcache_clean_inval_poc(paddr, paddr + size); +} - if (dir == DMA_TO_DEVICE) - return; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} - dcache_inval_poc(start, start + size); +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c index c90f912e2822..9402e101b363 100644 --- a/arch/csky/mm/dma-mapping.c +++ b/arch/csky/mm/dma-mapping.c @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size) cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_wb_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_wb_range); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - return; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - cache_op(paddr, size, dma_inv_range); - break; - default: - BUG(); - } + cache_op(paddr, size, dma_inv_range); } + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_op(paddr, size, dma_wbinv_range); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index 882680e81a30..e6538128a75b 100644 --- a/arch/hexagon/kernel/dma.c +++ 
b/arch/hexagon/kernel/dma.c @@ -9,29 +9,33 @@ #include <linux/memblock.h> #include <asm/page.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - void *addr = phys_to_virt(paddr); - - switch (dir) { - case DMA_TO_DEVICE: - hexagon_clean_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_FROM_DEVICE: - hexagon_inv_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - case DMA_BIDIRECTIONAL: - flush_dcache_range((unsigned long) addr, - (unsigned long) addr + size); - break; - default: - BUG(); - } + hexagon_clean_dcache_range(paddr, paddr + size); } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + hexagon_inv_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + hexagon_flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + /* * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create * DMA coherent space. Use that for the pool.
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index 2e192a5df949..aa9b434e6df8 100644 --- a/arch/m68k/kernel/dma.c +++ b/arch/m68k/kernel/dma.c @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */ -void arch_sync_dma_for_device(phys_addr_t handle, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_BIDIRECTIONAL: - case DMA_TO_DEVICE: - cache_push(handle, size); - break; - case DMA_FROM_DEVICE: - cache_clear(handle, size); - break; - default: - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n", - dir); - break; - } + /* + * cache_push() always invalidates in addition to cleaning + * write-back caches. + */ + cache_push(paddr, size); +} + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + cache_clear(paddr, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + cache_push(paddr, size); } + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c index b4c4e45fd45e..01110d4aa5b0 100644 --- a/arch/microblaze/kernel/dma.c +++ b/arch/microblaze/kernel/dma.c @@ -14,32 +14,30 @@ #include <linux/bug.h> #include <asm/cacheflush.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_TO_DEVICE: - case DMA_BIDIRECTIONAL: - flush_dcache_range(paddr, paddr + size); - break; - case DMA_FROM_DEVICE: - invalidate_dcache_range(paddr, paddr + size); - break; - default: - BUG(); - } + /* writeback plus invalidate, could be a nop on WT caches */ + 
flush_dcache_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_TO_DEVICE: - break; - case DMA_BIDIRECTIONAL: - case DMA_FROM_DEVICE: - invalidate_dcache_range(paddr, paddr + size); - break; - default: - BUG(); - }} + invalidate_dcache_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + flush_dcache_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index b9d68bcc5d53..902d4b7c1f85 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, } while (left); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_sync_phys(paddr, size, _dma_cache_wback); - break; - case DMA_FROM_DEVICE: - dma_sync_phys(paddr, size, _dma_cache_inv); - break; - case DMA_BIDIRECTIONAL: - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && - cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, _dma_cache_wback); - else - dma_sync_phys(paddr, size, _dma_cache_wback_inv); - break; - default: - break; - } + dma_sync_phys(paddr, size, _dma_cache_wback); } -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - 
case DMA_BIDIRECTIONAL: - if (cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, _dma_cache_inv); - break; - default: - break; - } + dma_sync_phys(paddr, size, _dma_cache_inv); } -#endif + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_sync_phys(paddr, size, _dma_cache_wback_inv); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + cpu_needs_post_dma_flush(); +} + +#include <linux/dma-sync.h> #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, - const struct iommu_ops *iommu, bool coherent) + const struct iommu_ops *iommu, bool coherent) { - dev->dma_coherent = coherent; + dev->dma_coherent = coherent; } #endif diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index fd887d5f3f9a..29978970955e 100644 --- a/arch/nios2/mm/dma-mapping.c +++ b/arch/nios2/mm/dma-mapping.c @@ -13,53 +13,46 @@ #include <linux/types.h> #include <linux/mm.h> #include <linux/string.h> +#include <linux/dma-map-ops.h> #include <linux/dma-mapping.h> #include <linux/io.h> #include <linux/cache.h> #include <asm/cacheflush.h> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { + /* + * We just need to write back the caches here, but Nios2 flush + * instruction will do both writeback and invalidate. + */ void *vaddr = phys_to_virt(paddr); + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); +} - switch (dir) { - case DMA_FROM_DEVICE: - invalidate_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - case DMA_TO_DEVICE: - /* - * We just need to flush the caches here , but Nios2 flush - * instruction will do both writeback and invalidate. 
- */ - case DMA_BIDIRECTIONAL: /* flush and invalidate */ - flush_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - default: - BUG(); - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + unsigned long vaddr = (unsigned long)phys_to_virt(paddr); + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size)); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - switch (dir) { - case DMA_BIDIRECTIONAL: - case DMA_FROM_DEVICE: - invalidate_dcache_range((unsigned long)vaddr, - (unsigned long)(vaddr + size)); - break; - case DMA_TO_DEVICE: - break; - default: - BUG(); - } +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index 91a00d09ffad..aba2258e62eb 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size) mmap_write_unlock(&init_mm); } -void arch_sync_dma_for_device(phys_addr_t addr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { unsigned long cl; struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; - switch (dir) { - case DMA_TO_DEVICE: - /* Write back the dcache for the requested range */ - for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBWR, cl); - break; - case DMA_FROM_DEVICE: - /* Invalidate the dcache for the requested range */ 
- for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBIR, cl); - break; - case DMA_BIDIRECTIONAL: - /* Flush the dcache for the requested range */ - for (cl = addr; cl < addr + size; - cl += cpuinfo->dcache_block_size) - mtspr(SPR_DCBFR, cl); - break; - default: - break; - } + /* Write back the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBWR, cl); } + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + unsigned long cl; + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + + /* Invalidate the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBIR, cl); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + unsigned long cl; + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + + /* Flush the dcache for the requested range */ + for (cl = paddr; cl < paddr + size; + cl += cpuinfo->dcache_block_size) + mtspr(SPR_DCBFR, cl); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 6d3d3cffb316..a7955aab8ce2 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, free_pages((unsigned long)__va(dma_handle), order); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { unsigned long virt = (unsigned long)phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - clean_kernel_dcache_range(virt, size); - break; - case DMA_FROM_DEVICE: - 
clean_kernel_dcache_range(virt, size); - break; - case DMA_BIDIRECTIONAL: - flush_kernel_dcache_range(virt, size); - break; - } + clean_kernel_dcache_range(virt, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { unsigned long virt = (unsigned long)phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - purge_kernel_dcache_range(virt, size); - break; - } + purge_kernel_dcache_range(virt, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + unsigned long virt = (unsigned long)phys_to_virt(paddr); + + flush_kernel_dcache_range(virt, size); } + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 00e59a4faa2b..268510c71156 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) #endif } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - __dma_phys_op(start, end, DMA_CACHE_CLEAN); + __dma_phys_op(paddr, size, DMA_CACHE_CLEAN); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - switch (direction) { - case DMA_NONE: - BUG(); - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - __dma_phys_op(start, end, DMA_CACHE_INVAL); - break; - } + __dma_phys_op(paddr, size, DMA_CACHE_INVAL); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ +
__dma_phys_op(paddr, size, DMA_CACHE_FLUSH); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long kaddr = (unsigned long)page_address(page); diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index 69c80b2155a1..b9a9f57e02be 100644 --- a/arch/riscv/mm/dma-noncoherent.c +++ b/arch/riscv/mm/dma-noncoherent.c @@ -12,43 +12,40 @@ static bool noncoherent_supported; -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - case DMA_FROM_DEVICE: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - case DMA_BIDIRECTIONAL: - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); - break; - default: - break; - } + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { void *vaddr = phys_to_virt(paddr); - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); - break; - default: - break; - } + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + void *vaddr = phys_to_virt(paddr); + + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + 
+#include <linux/dma-sync.h> + + void arch_dma_prep_coherent(struct page *page, size_t size) { void *flush_addr = page_address(page); diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index 6a44c0e7ba40..41f031ae7609 100644 --- a/arch/sh/kernel/dma-coherent.c +++ b/arch/sh/kernel/dma-coherent.c @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size) __flush_purge_region(page_address(page), size); } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); - switch (dir) { - case DMA_FROM_DEVICE: /* invalidate only */ - __flush_invalidate_region(addr, size); - break; - case DMA_TO_DEVICE: /* writeback only */ - __flush_wback_region(addr, size); - break; - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ - __flush_purge_region(addr, size); - break; - default: - BUG(); - } + __flush_wback_region(addr, size); } + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); + + __flush_invalidate_region(addr, size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); + + __flush_purge_region(addr, size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index 4f3d26066ec2..6926ead2f208 100644 --- a/arch/sparc/kernel/ioport.c +++ b/arch/sparc/kernel/ioport.c @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport); #endif /* CONFIG_SBUS */ -/* - * IIep is write-through, not flushing on cpu to device transfer. 
- * - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to - * make DMA to cacheable memory coherent. - */ -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE && - sparc_cpu_model == sparc_leon && + /* IIep is write-through, not flushing on cpu to device transfer. */ +} + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + /* + * On LEON systems without cache snooping, the entire D-CACHE must be + * flushed to make DMA to cacheable memory coherent. + */ + if (sparc_cpu_model == sparc_leon && !sparc_leon3_snooping_enabled()) leon_flush_dcache_all(); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + arch_dma_cache_inv(paddr, size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return true; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + #ifdef CONFIG_PROC_FS static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index ff3bf015eca4..d4ff96585545 100644 --- a/arch/xtensa/kernel/pci-dma.c +++ b/arch/xtensa/kernel/pci-dma.c @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, } } -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - do_cache_op(paddr, size, __flush_dcache_range); - break; - case DMA_FROM_DEVICE: - do_cache_op(paddr, size, __invalidate_dcache_range); - break; - case DMA_BIDIRECTIONAL: - do_cache_op(paddr, size, __flush_invalidate_dcache_range); - break; - default: - break; - } + do_cache_op(paddr, size, __flush_dcache_range); } +static inline void 
arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + do_cache_op(paddr, size, __invalidate_dcache_range); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + do_cache_op(paddr, size, __flush_invalidate_dcache_range); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return false; +} + +#include <linux/dma-sync.h> + + void arch_dma_prep_coherent(struct page *page, size_t size) { __invalidate_dcache_range((unsigned long)page_address(page), size); diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h new file mode 100644 index 000000000000..18e33d5e8eaf --- /dev/null +++ b/include/linux/dma-sync.h @@ -0,0 +1,107 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Cache operations depending on function and direction argument, inspired by + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] + * dma-mapping: provide a generic dma-noncoherent implementation)" + * + * | map == for_device | unmap == for_cpu + * |---------------------------------------------------------------- + * TO_DEV | writeback writeback | none none + * FROM_DEV | invalidate invalidate | invalidate* invalidate* + * BIDIR | writeback writeback | invalidate invalidate + * + * [*] needed for CPU speculative prefetches + * + * NOTE: we don't check the validity of direction argument as it is done in + * upper layer functions (in include/linux/dma-mapping.h) + * + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide + * the respective high-level operations without having to expose the + * cache management ops to drivers. 
+ */ + +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) +{ + switch (dir) { + case DMA_TO_DEVICE: + /* + * This may be an empty function on write-through caches, + * and it might invalidate the cache if an architecture has + * a write-back cache but no way to write it back without + * invalidating + */ + arch_dma_cache_wback(paddr, size); + break; + + case DMA_FROM_DEVICE: + /* + * FIXME: this should be handled the same across all + * architectures, see + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ + */ + if (!arch_sync_dma_clean_before_fromdevice()) { + arch_dma_cache_inv(paddr, size); + break; + } + fallthrough; + + case DMA_BIDIRECTIONAL: + /* Skip the invalidate here if it's done later */ + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + arch_sync_dma_cpu_needs_post_dma_flush()) + arch_dma_cache_wback(paddr, size); + else + arch_dma_cache_wback_inv(paddr, size); + break; + + default: + break; + } +} + +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU +/* + * Mark the D-cache clean for these pages to avoid extra flushing. 
+ */ +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size) +{ +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN + unsigned long pfn = PFN_UP(paddr); + unsigned long off = paddr & (PAGE_SIZE - 1); + size_t left = size; + + if (off) + left -= PAGE_SIZE - off; + + while (left >= PAGE_SIZE) { + struct page *page = pfn_to_page(pfn++); + set_bit(PG_dcache_clean, &page->flags); + left -= PAGE_SIZE; + } +#endif +} + +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) +{ + switch (dir) { + case DMA_TO_DEVICE: + break; + + case DMA_FROM_DEVICE: + case DMA_BIDIRECTIONAL: + /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ + if (arch_sync_dma_cpu_needs_post_dma_flush()) + arch_dma_cache_inv(paddr, size); + + if (size > PAGE_SIZE) + arch_dma_mark_dcache_clean(paddr, size); + break; + + default: + break; + } +} +#endif -- 2.39.2
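As a worked example of the page-range arithmetic in arch_dma_mark_dcache_clean() above, the following user-space sketch repeats the same loop but only counts pages instead of calling pfn_to_page() and set_bit(). The sk_full_pages() helper and the SK_* macros are hypothetical names made up for this illustration, and 4 KiB pages are an assumption (PAGE_SIZE is per-architecture in the kernel). Only page frames that lie entirely inside the buffer are counted. arch_sync_dma_for_cpu() calls the real function only when size > PAGE_SIZE, which also guarantees that 'left' cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE is per-architecture. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE ((uint64_t)1 << SK_PAGE_SHIFT)
/* Same rounding as the kernel's PFN_UP(): first page frame at or above x. */
#define SK_PFN_UP(x) (((x) + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT)

/*
 * Hypothetical helper repeating the arithmetic of
 * arch_dma_mark_dcache_clean(): returns how many page frames, starting
 * at *first_pfn, are fully covered by [paddr, paddr + size) and would
 * therefore get PG_dcache_clean set.
 */
size_t sk_full_pages(uint64_t paddr, size_t size, uint64_t *first_pfn)
{
	uint64_t off = paddr & (SK_PAGE_SIZE - 1);
	uint64_t left = size;
	size_t count = 0;

	/* Same precondition as the caller in arch_sync_dma_for_cpu(). */
	assert(size > SK_PAGE_SIZE);
	*first_pfn = SK_PFN_UP(paddr);

	/* Discount the bytes that fall into a partially covered first page. */
	if (off)
		left -= SK_PAGE_SIZE - off;

	/* Count whole pages; a partially covered last page is never marked. */
	while (left >= SK_PAGE_SIZE) {
		count++; /* kernel: set_bit(PG_dcache_clean, &page->flags) */
		left -= SK_PAGE_SIZE;
	}
	return count;
}
```

For a buffer of 0x3000 bytes at physical address 0x1800 (ending at 0x4800), this reports two pages starting at frame 2: frames 2 and 3 are fully covered, while the partially covered frames 1 and 4 are left alone.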
* [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-27 12:13 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-27 12:13 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov
From: Arnd Bergmann <arnd@arndb.de>
Now that all of these have consistent behavior, replace them with a
single shared implementation of arch_sync_dma_for_device() and
arch_sync_dma_for_cpu(), with three parameters to pick how they should
operate:
- If the CPU has speculative prefetching, then the cache has to be
invalidated after a transfer from the device. On the rarer CPUs without
prefetching, this can be skipped, with all cache management happening
before the transfer. This flag can be detected at runtime, but is
usually fixed per architecture.
- Some architectures currently clean the caches before DMA from a
device, while others invalidate them. There has not been a conclusion
regarding whether we should change all architectures to use clean
instead, so this adds an architecture-specific flag that we can change
later on.
- On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps track of
pages that are marked clean in the page cache, to avoid flushing them
again. The implementation for this is generic enough to work on all
architectures that use the PG_dcache_clean page flag, but a Kconfig
symbol is used to enable it only on Arm to preserve the existing
behavior.
For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
over 'flush', to avoid any ambiguity about what the helper functions are
supposed to do.
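The interplay of the two boolean hooks described above can be illustrated with a small user-space model. This is not kernel code. The sk_* names, the enum and the string log are hypothetical stand-ins, the model assumes CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU=y, and it ignores the PG_dcache_clean tracking. It only mirrors the branch structure of arch_sync_dma_for_device() and arch_sync_dma_for_cpu() from include/linux/dma-sync.h as posted, recording which cache helper each branch would call. The hexagon and arm64 flag values are the ones this patch sets. The 32-bit Arm entry assumes a CPU for which arch_sync_dma_cpu_needs_post_dma_flush() returns true at runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for enum dma_data_direction. */
enum sk_dir { SK_TO_DEVICE, SK_FROM_DEVICE, SK_BIDIRECTIONAL };

/* The two per-architecture hooks, reduced to constant flags. */
struct sk_policy {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
};

/* Flag values as set by this patch (Arm: assumed runtime result). */
const struct sk_policy sk_hexagon = { false, false };
const struct sk_policy sk_arm = { false, true };
const struct sk_policy sk_arm64 = { true, true };

/* Log of cache operations for one map/unmap cycle, e.g. "wback,inv". */
static char sk_ops[64];

static void sk_record(const char *op)
{
	assert(strlen(sk_ops) + strlen(op) + 2 <= sizeof(sk_ops));
	if (sk_ops[0] != '\0')
		strcat(sk_ops, ",");
	strcat(sk_ops, op);
}

/* Same branches as arch_sync_dma_for_device(). */
static void sk_for_device(const struct sk_policy *p, enum sk_dir dir)
{
	switch (dir) {
	case SK_TO_DEVICE:
		sk_record("wback");
		break;
	case SK_FROM_DEVICE:
		if (!p->clean_before_fromdevice) {
			sk_record("inv");
			break;
		}
		/* fall through */
	case SK_BIDIRECTIONAL:
		/* Skip the invalidate here if it is done after the DMA. */
		sk_record(p->cpu_needs_post_dma_flush ? "wback" : "wback_inv");
		break;
	}
}

/* Same branches as arch_sync_dma_for_cpu(), without PG_dcache_clean. */
static void sk_for_cpu(const struct sk_policy *p, enum sk_dir dir)
{
	if (dir != SK_TO_DEVICE && p->cpu_needs_post_dma_flush)
		sk_record("inv");
}

/* One map (for_device) plus unmap (for_cpu) cycle; returns the log. */
const char *sk_cycle(const struct sk_policy *p, enum sk_dir dir)
{
	sk_ops[0] = '\0';
	sk_for_device(p, dir);
	sk_for_cpu(p, dir);
	return sk_ops;
}
```

With the arm64 flags, a DMA_FROM_DEVICE cycle records "wback,inv" (clean before the transfer, invalidate afterwards). The Arm entry records "inv,inv", and hexagon, which has no speculative prefetching, records a single "inv" before the transfer.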
Moving the global functions into a header file is usually a bad idea as it prevents the header from being included more than once, but it helps keep the behavior as close as possible to the previous state, including the possibility of inlining most of it into these functions where that was done before. This also helps keep the global namespace clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from device drivers that might use them incorrectly. It would be possible to do this one architecture at a time, but as the change is the same everywhere, the combined patch helps explain it better once. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arc/mm/dma.c | 66 +++++------------- arch/arm/Kconfig | 3 + arch/arm/mm/dma-mapping-nommu.c | 39 ++++++----- arch/arm/mm/dma-mapping.c | 64 +++++++----------- arch/arm64/mm/dma-mapping.c | 28 +++++--- arch/csky/mm/dma-mapping.c | 44 ++++++------ arch/hexagon/kernel/dma.c | 44 ++++++------ arch/m68k/kernel/dma.c | 43 +++++++----- arch/microblaze/kernel/dma.c | 48 +++++++------- arch/mips/mm/dma-noncoherent.c | 60 +++++++---------- arch/nios2/mm/dma-mapping.c | 57 +++++++--------- arch/openrisc/kernel/dma.c | 63 +++++++++++------- arch/parisc/kernel/pci-dma.c | 46 ++++++------- arch/powerpc/mm/dma-noncoherent.c | 34 ++++++---- arch/riscv/mm/dma-noncoherent.c | 51 +++++++------- arch/sh/kernel/dma-coherent.c | 43 +++++++----- arch/sparc/kernel/ioport.c | 38 ++++++++--- arch/xtensa/kernel/pci-dma.c | 40 ++++++----- include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++ 19 files changed, 527 insertions(+), 391 deletions(-) create mode 100644 include/linux/dma-sync.h diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c index ddb96786f765..61cd01646222 100644 --- a/arch/arc/mm/dma.c +++ b/arch/arc/mm/dma.c @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size) dma_cache_wback_inv(page_to_phys(page), size); } -/* - * Cache operations depending on function and direction argument, inspired 
by - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] - * dma-mapping: provide a generic dma-noncoherent implementation)" - * - * | map == for_device | unmap == for_cpu - * |---------------------------------------------------------------- - * TO_DEV | writeback writeback | none none - * FROM_DEV | invalidate invalidate | invalidate* invalidate* - * BIDIR | writeback writeback | invalidate invalidate - * - * [*] needed for CPU speculative prefetches - * - * NOTE: we don't check the validity of direction argument as it is done in - * upper layer functions (in include/linux/dma-mapping.h) - */ - -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback(paddr, size); - break; - - case DMA_FROM_DEVICE: - dma_cache_inv(paddr, size); - break; - - case DMA_BIDIRECTIONAL: - dma_cache_wback(paddr, size); - break; + dma_cache_wback(paddr, size); +} - default: - break; - } +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_inv(paddr, size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - switch (dir) { - case DMA_TO_DEVICE: - break; + dma_cache_wback_inv(paddr, size); +} - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - dma_cache_inv(paddr, size); - break; +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} - default: - break; - } +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; } +#include <linux/dma-sync.h> + /* * Plug in direct dma map ops. 
*/ diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 125d58c54ab1..0de84e861027 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT bool default y +config ARCH_DMA_MARK_DCACHE_CLEAN + def_bool y + config ARCH_HAS_ILOG2_U32 bool diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c index 12b5c6ae93fc..0817274aed15 100644 --- a/arch/arm/mm/dma-mapping-nommu.c +++ b/arch/arm/mm/dma-mapping-nommu.c @@ -13,27 +13,36 @@ #include "dma.h" -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - if (dir == DMA_FROM_DEVICE) { - dmac_inv_range(__va(paddr), __va(paddr + size)); - outer_inv_range(paddr, paddr + size); - } else { - dmac_clean_range(__va(paddr), __va(paddr + size)); - outer_clean_range(paddr, paddr + size); - } + dmac_clean_range(__va(paddr), __va(paddr + size)); + outer_clean_range(paddr, paddr + size); } -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, - enum dma_data_direction dir) +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - if (dir != DMA_TO_DEVICE) { - outer_inv_range(paddr, paddr + size); - dmac_inv_range(__va(paddr), __va(paddr)); - } + dmac_inv_range(__va(paddr), __va(paddr + size)); + outer_inv_range(paddr, paddr + size); } +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dmac_flush_range(__va(paddr), __va(paddr + size)); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) +{ + return true; +} + +#include <linux/dma-sync.h> + void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, const struct iommu_ops *iommu, bool coherent) { diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index b703cb83d27e..aa6ee820a0ab 100644 --- 
a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size) } } +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_clean_range); + outer_clean_range(paddr, paddr + size); +} + + +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_inv_range); + outer_inv_range(paddr, paddr + size); +} + +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) +{ + dma_cache_maint(paddr, size, dmac_flush_range); + outer_flush_range(paddr, paddr + size); +} + +static inline bool arch_sync_dma_clean_before_fromdevice(void) +{ + return false; +} + static bool arch_sync_dma_cpu_needs_post_dma_flush(void) { if (IS_ENABLED(CONFIG_CPU_V6) || @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void) return false; } -/* - * Make an area consistent for devices. - * Note: Drivers should NOT use this function directly. 
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
- */
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
-{
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		dma_cache_maint(paddr, size, dmac_clean_range);
-		outer_clean_range(paddr, paddr + size);
-		break;
-	case DMA_FROM_DEVICE:
-		dma_cache_maint(paddr, size, dmac_inv_range);
-		outer_inv_range(paddr, paddr + size);
-		break;
-	case DMA_BIDIRECTIONAL:
-		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
-			dma_cache_maint(paddr, size, dmac_clean_range);
-			outer_clean_range(paddr, paddr + size);
-		} else {
-			dma_cache_maint(paddr, size, dmac_flush_range);
-			outer_flush_range(paddr, paddr + size);
-		}
-		break;
-	default:
-		break;
-	}
-}
-
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
-{
-	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
-		outer_inv_range(paddr, paddr + size);
-		dma_cache_maint(paddr, size, dmac_inv_range);
-	}
-}
+#include <linux/dma-sync.h>
 
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 5240f6acad64..bae741aa65e9 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -13,25 +13,33 @@
 #include <asm/cacheflush.h>
 #include <asm/xen/xen-ops.h>
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	unsigned long start = (unsigned long)phys_to_virt(paddr);
+	dcache_clean_poc(paddr, paddr + size);
+}
 
-	dcache_clean_poc(start, start + size);
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	dcache_inval_poc(paddr, paddr + size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
 {
-	unsigned long start = (unsigned long)phys_to_virt(paddr);
+	dcache_clean_inval_poc(paddr, paddr + size);
+}
 
-	if (dir == DMA_TO_DEVICE)
-		return;
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
 
-	dcache_inval_poc(start, start + size);
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
 }
 
+#include <linux/dma-sync.h>
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	unsigned long start = (unsigned long)page_address(page);
diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
index c90f912e2822..9402e101b363 100644
--- a/arch/csky/mm/dma-mapping.c
+++ b/arch/csky/mm/dma-mapping.c
@@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		cache_op(paddr, size, dma_wb_range);
-		break;
-	default:
-		BUG();
-	}
+	cache_op(paddr, size, dma_wb_range);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		return;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		cache_op(paddr, size, dma_inv_range);
-		break;
-	default:
-		BUG();
-	}
+	cache_op(paddr, size, dma_inv_range);
 }
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	cache_op(paddr, size, dma_wbinv_range);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index 882680e81a30..e6538128a75b 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -9,29 +9,33 @@
 #include <linux/memblock.h>
 #include <asm/page.h>
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	void *addr = phys_to_virt(paddr);
-
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		hexagon_clean_dcache_range((unsigned long) addr,
-		(unsigned long) addr + size);
-		break;
-	case DMA_FROM_DEVICE:
-		hexagon_inv_dcache_range((unsigned long) addr,
-		(unsigned long) addr + size);
-		break;
-	case DMA_BIDIRECTIONAL:
-		flush_dcache_range((unsigned long) addr,
-		(unsigned long) addr + size);
-		break;
-	default:
-		BUG();
-	}
+	hexagon_clean_dcache_range(paddr, paddr + size);
 }
 
+static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
+{
+	hexagon_inv_dcache_range(paddr, paddr + size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
+{
+	hexagon_flush_dcache_range(paddr, paddr + size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
+
 /*
  * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
  * DMA coherent space.  Use that for the pool.
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index 2e192a5df949..aa9b434e6df8 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
 
 #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
 
-void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_BIDIRECTIONAL:
-	case DMA_TO_DEVICE:
-		cache_push(handle, size);
-		break;
-	case DMA_FROM_DEVICE:
-		cache_clear(handle, size);
-		break;
-	default:
-		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
-				   dir);
-		break;
-	}
+	/*
+	 * cache_push() always invalidates in addition to cleaning
+	 * write-back caches.
+	 */
+	cache_push(paddr, size);
+}
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	cache_clear(paddr, size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	cache_push(paddr, size);
 }
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index b4c4e45fd45e..01110d4aa5b0 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -14,32 +14,30 @@
 #include <linux/bug.h>
 #include <asm/cacheflush.h>
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (direction) {
-	case DMA_TO_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		flush_dcache_range(paddr, paddr + size);
-		break;
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range(paddr, paddr + size);
-		break;
-	default:
-		BUG();
-	}
+	/* writeback plus invalidate, could be a nop on WT caches */
+	flush_dcache_range(paddr, paddr + size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	switch (direction) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_BIDIRECTIONAL:
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range(paddr, paddr + size);
-		break;
-	default:
-		BUG();
-	}}
+	invalidate_dcache_range(paddr, paddr + size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	flush_dcache_range(paddr, paddr + size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index b9d68bcc5d53..902d4b7c1f85 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
 	} while (left);
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		dma_sync_phys(paddr, size, _dma_cache_wback);
-		break;
-	case DMA_FROM_DEVICE:
-		dma_sync_phys(paddr, size, _dma_cache_inv);
-		break;
-	case DMA_BIDIRECTIONAL:
-		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
-		    cpu_needs_post_dma_flush())
-			dma_sync_phys(paddr, size, _dma_cache_wback);
-		else
-			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
-		break;
-	default:
-		break;
-	}
+	dma_sync_phys(paddr, size, _dma_cache_wback);
 }
 
-#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		if (cpu_needs_post_dma_flush())
-			dma_sync_phys(paddr, size, _dma_cache_inv);
-		break;
-	default:
-		break;
-	}
+	dma_sync_phys(paddr, size, _dma_cache_inv);
 }
-#endif
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		cpu_needs_post_dma_flush();
+}
+
+#include <linux/dma-sync.h>
 
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-		const struct iommu_ops *iommu, bool coherent)
+		const struct iommu_ops *iommu, bool coherent)
 {
-	dev->dma_coherent = coherent;
+	dev->dma_coherent = coherent;
 }
 #endif
diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
index fd887d5f3f9a..29978970955e 100644
--- a/arch/nios2/mm/dma-mapping.c
+++ b/arch/nios2/mm/dma-mapping.c
@@ -13,53 +13,46 @@
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/string.h>
+#include <linux/dma-map-ops.h>
 #include <linux/dma-mapping.h>
 #include <linux/io.h>
 #include <linux/cache.h>
 #include <asm/cacheflush.h>
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
+	/*
+	 * We just need to write back the caches here, but Nios2 flush
+	 * instruction will do both writeback and invalidate.
+	 */
 	void *vaddr = phys_to_virt(paddr);
+	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
+}
 
-	switch (dir) {
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range((unsigned long)vaddr,
-			(unsigned long)(vaddr + size));
-		break;
-	case DMA_TO_DEVICE:
-		/*
-		 * We just need to flush the caches here , but Nios2 flush
-		 * instruction will do both writeback and invalidate.
-		 */
-	case DMA_BIDIRECTIONAL: /* flush and invalidate */
-		flush_dcache_range((unsigned long)vaddr,
-			(unsigned long)(vaddr + size));
-		break;
-	default:
-		BUG();
-	}
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
+	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
+	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
 
-	switch (dir) {
-	case DMA_BIDIRECTIONAL:
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range((unsigned long)vaddr,
-			(unsigned long)(vaddr + size));
-		break;
-	case DMA_TO_DEVICE:
-		break;
-	default:
-		BUG();
-	}
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
 }
 
+#include <linux/dma-sync.h>
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	unsigned long start = (unsigned long)page_address(page);
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index 91a00d09ffad..aba2258e62eb 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
 	mmap_write_unlock(&init_mm);
 }
 
-void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	unsigned long cl;
 	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		/* Write back the dcache for the requested range */
-		for (cl = addr; cl < addr + size;
-		     cl += cpuinfo->dcache_block_size)
-			mtspr(SPR_DCBWR, cl);
-		break;
-	case DMA_FROM_DEVICE:
-		/* Invalidate the dcache for the requested range */
-		for (cl = addr; cl < addr + size;
-		     cl += cpuinfo->dcache_block_size)
-			mtspr(SPR_DCBIR, cl);
-		break;
-	case DMA_BIDIRECTIONAL:
-		/* Flush the dcache for the requested range */
-		for (cl = addr; cl < addr + size;
-		     cl += cpuinfo->dcache_block_size)
-			mtspr(SPR_DCBFR, cl);
-		break;
-	default:
-		break;
-	}
+	/* Write back the dcache for the requested range */
+	for (cl = paddr; cl < paddr + size;
+	     cl += cpuinfo->dcache_block_size)
+		mtspr(SPR_DCBWR, cl);
 }
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	unsigned long cl;
+	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
+
+	/* Invalidate the dcache for the requested range */
+	for (cl = paddr; cl < paddr + size;
+	     cl += cpuinfo->dcache_block_size)
+		mtspr(SPR_DCBIR, cl);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	unsigned long cl;
+	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
+
+	/* Flush the dcache for the requested range */
+	for (cl = paddr; cl < paddr + size;
+	     cl += cpuinfo->dcache_block_size)
+		mtspr(SPR_DCBFR, cl);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 6d3d3cffb316..a7955aab8ce2 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
 	free_pages((unsigned long)__va(dma_handle), order);
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	unsigned long virt = (unsigned long)phys_to_virt(paddr);
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		clean_kernel_dcache_range(virt, size);
-		break;
-	case DMA_FROM_DEVICE:
-		clean_kernel_dcache_range(virt, size);
-		break;
-	case DMA_BIDIRECTIONAL:
-		flush_kernel_dcache_range(virt, size);
-		break;
-	}
+	clean_kernel_dcache_range(virt, size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
 	unsigned long virt = (unsigned long)phys_to_virt(paddr);
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		purge_kernel_dcache_range(virt, size);
-		break;
-	}
+	purge_kernel_dcache_range(virt, size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	unsigned long virt = (unsigned long)phys_to_virt(paddr);
+
+	flush_kernel_dcache_range(virt, size);
 }
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 00e59a4faa2b..268510c71156 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
 #endif
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	switch (direction) {
-	case DMA_NONE:
-		BUG();
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		__dma_phys_op(start, end, DMA_CACHE_INVAL);
-		break;
-	}
+	__dma_phys_op(start, end, DMA_CACHE_INVAL);
 }
 
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	unsigned long kaddr = (unsigned long)page_address(page);
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index 69c80b2155a1..b9a9f57e02be 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -12,43 +12,40 @@
 
 static bool noncoherent_supported;
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
-		break;
-	case DMA_FROM_DEVICE:
-		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
-		break;
-	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
-		break;
-	default:
-		break;
-	}
+	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
 
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		break;
-	case DMA_FROM_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
-		break;
-	default:
-		break;
-	}
+	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
 }
 
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	void *vaddr = phys_to_virt(paddr);
+
+	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return true;
+}
+
+#include <linux/dma-sync.h>
+
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	void *flush_addr = page_address(page);
diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
index 6a44c0e7ba40..41f031ae7609 100644
--- a/arch/sh/kernel/dma-coherent.c
+++ b/arch/sh/kernel/dma-coherent.c
@@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 	__flush_purge_region(page_address(page), size);
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
 
-	switch (dir) {
-	case DMA_FROM_DEVICE:		/* invalidate only */
-		__flush_invalidate_region(addr, size);
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
-		__flush_wback_region(addr, size);
-		break;
-	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
-		__flush_purge_region(addr, size);
-		break;
-	default:
-		BUG();
-	}
+	__flush_wback_region(addr, size);
 }
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
+
+	__flush_invalidate_region(addr, size);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
+
+	__flush_purge_region(addr, size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 4f3d26066ec2..6926ead2f208 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
 
 #endif /* CONFIG_SBUS */
 
-/*
- * IIep is write-through, not flushing on cpu to device transfer.
- *
- * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
- * make DMA to cacheable memory coherent.
- */
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	if (dir != DMA_TO_DEVICE &&
-	    sparc_cpu_model == sparc_leon &&
+	/* IIep is write-through, not flushing on cpu to device transfer. */
+}
+
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	/*
+	 * On LEON systems without cache snooping, the entire D-CACHE must be
+	 * flushed to make DMA to cacheable memory coherent.
+	 */
+	if (sparc_cpu_model == sparc_leon &&
 	    !sparc_leon3_snooping_enabled())
 		leon_flush_dcache_all();
 }
 
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	arch_dma_cache_inv(paddr, size);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return true;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
+
 #ifdef CONFIG_PROC_FS
 
 static int sparc_io_proc_show(struct seq_file *m, void *v)
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index ff3bf015eca4..d4ff96585545 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
 	}
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		do_cache_op(paddr, size, __flush_dcache_range);
-		break;
-	case DMA_FROM_DEVICE:
-		do_cache_op(paddr, size, __invalidate_dcache_range);
-		break;
-	case DMA_BIDIRECTIONAL:
-		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
-		break;
-	default:
-		break;
-	}
+	do_cache_op(paddr, size, __flush_dcache_range);
 }
 
+static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
+{
+	do_cache_op(paddr, size, __invalidate_dcache_range);
+}
+
+static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
+}
+
+static inline bool arch_sync_dma_clean_before_fromdevice(void)
+{
+	return false;
+}
+
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return false;
+}
+
+#include <linux/dma-sync.h>
+
+
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	__invalidate_dcache_range((unsigned long)page_address(page), size);
diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
new file mode 100644
index 000000000000..18e33d5e8eaf
--- /dev/null
+++ b/include/linux/dma-sync.h
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Cache operations depending on function and direction argument, inspired by
+ * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
+ * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
+ * dma-mapping: provide a generic dma-noncoherent implementation)"
+ *
+ *          |   map          ==  for_device     |   unmap     ==  for_cpu
+ *          |----------------------------------------------------------------
+ * TO_DEV   |   writeback        writeback      |   none          none
+ * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
+ * BIDIR    |   writeback        writeback      |   invalidate    invalidate
+ *
+ *     [*] needed for CPU speculative prefetches
+ *
+ * NOTE: we don't check the validity of direction argument as it is done in
+ * upper layer functions (in include/linux/dma-mapping.h)
+ *
+ * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
+ * the respective high-level operations without having to expose the
+ * cache management ops to drivers.
+ */
+
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		/*
+		 * This may be an empty function on write-through caches,
+		 * and it might invalidate the cache if an architecture has
+		 * a write-back cache but no way to write it back without
+		 * invalidating
+		 */
+		arch_dma_cache_wback(paddr, size);
+		break;
+
+	case DMA_FROM_DEVICE:
+		/*
+		 * FIXME: this should be handled the same across all
+		 * architectures, see
+		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
+		 */
+		if (!arch_sync_dma_clean_before_fromdevice()) {
+			arch_dma_cache_inv(paddr, size);
+			break;
+		}
+		fallthrough;
+
+	case DMA_BIDIRECTIONAL:
+		/* Skip the invalidate here if it's done later */
+		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		    arch_sync_dma_cpu_needs_post_dma_flush())
+			arch_dma_cache_wback(paddr, size);
+		else
+			arch_dma_cache_wback_inv(paddr, size);
+		break;
+
+	default:
+		break;
+	}
+}
+
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
+/*
+ * Mark the D-cache clean for these pages to avoid extra flushing.
+ */
+static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
+{
+#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
+	unsigned long pfn = PFN_UP(paddr);
+	unsigned long off = paddr & (PAGE_SIZE - 1);
+	size_t left = size;
+
+	if (off)
+		left -= PAGE_SIZE - off;
+
+	while (left >= PAGE_SIZE) {
+		struct page *page = pfn_to_page(pfn++);
+		set_bit(PG_dcache_clean, &page->flags);
+		left -= PAGE_SIZE;
+	}
+#endif
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
+		if (arch_sync_dma_cpu_needs_post_dma_flush())
+			arch_dma_cache_inv(paddr, size);
+
+		if (size > PAGE_SIZE)
+			arch_dma_mark_dcache_clean(paddr, size);
+		break;
+
+	default:
+		break;
+	}
+}
+#endif
-- 
2.39.2
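The direction table and the two per-architecture hooks in the new generic header can be exercised outside the kernel. The following is a minimal user-space sketch, not kernel code. It reduces arch_sync_dma_for_device() and arch_sync_dma_for_cpu() to functions that return the name of the cache operation they would select. The `struct arch_policy` type, the `describe_cycle()` helper and the `HAS_SYNC_DMA_FOR_CPU` constant (standing in for CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU=y) are hypothetical. The per-architecture boolean values are copied from the hunks above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for enum dma_data_direction; DMA_NONE is rejected upstream. */
enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Hypothetical: models CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU=y. */
#define HAS_SYNC_DMA_FOR_CPU true

/* Hypothetical: one architecture's answers to the two hooks. */
struct arch_policy {
	const char *name;
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
};

/* Values taken from the per-architecture hunks in the patch. */
static const struct arch_policy policies[] = {
	{ "arm64",      true,  true  },
	{ "microblaze", false, true  },
	{ "openrisc",   false, false },
	{ "sparc",      true,  false },
};

static const struct arch_policy *find_policy(const char *name)
{
	for (size_t i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
		if (strcmp(policies[i].name, name) == 0)
			return &policies[i];
	return NULL;
}

/* Same decisions as arch_sync_dma_for_device() in dma-sync.h. */
static const char *sync_for_device(const struct arch_policy *a, enum dma_dir dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		return "wback";
	case DMA_FROM_DEVICE:
		if (!a->clean_before_fromdevice)
			return "inv";
		/* fall through */
	case DMA_BIDIRECTIONAL:
		/* skip the invalidate here if sync_for_cpu() does it later */
		if (HAS_SYNC_DMA_FOR_CPU && a->cpu_needs_post_dma_flush)
			return "wback";
		return "wback_inv";
	}
	return "none";
}

/* Same decisions as arch_sync_dma_for_cpu(), without the page marking. */
static const char *sync_for_cpu(const struct arch_policy *a, enum dma_dir dir)
{
	if (dir != DMA_TO_DEVICE && a->cpu_needs_post_dma_flush)
		return "inv";
	return "none";
}

/* "<for_device> / <for_cpu>" for one map/unmap cycle, or NULL if unknown. */
static const char *describe_cycle(const char *arch, enum dma_dir dir)
{
	static char buf[32];
	const struct arch_policy *a = find_policy(arch);

	/* the kernel relies on dma-mapping.h to have validated dir */
	assert(dir == DMA_BIDIRECTIONAL || dir == DMA_TO_DEVICE ||
	       dir == DMA_FROM_DEVICE);
	if (a == NULL)
		return NULL;
	snprintf(buf, sizeof(buf), "%s / %s",
		 sync_for_device(a, dir), sync_for_cpu(a, dir));
	return buf;
}
```

With these values, an arm64-style FROM_DEVICE mapping comes out as "wback / inv": clean before the transfer, invalidate after it. openrisc gets "inv / none" because it never needs the post-DMA invalidate.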
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
2023-03-27 12:13 ` Arnd Bergmann
@ 2023-03-27 22:25 ` Christoph Hellwig
-1 siblings, 0 replies; 456+ messages in thread
From: Christoph Hellwig @ 2023-03-27 22:25 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	dma_cache_wback(paddr, size);
> +}
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> +	dma_cache_wback_inv(paddr, size);
> +}

These are the only calls to each of the three involved functions.  So
I'd rather rename the low-level symbols (and drop the pointless exports
for two of them) than add these wrappers.

The same is probably true for many other architectures.

> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }

Is there a way to cut down on this boilerplate code by just having sane
defaults, and Kconfig options to override them if they are not runtime
decisions?
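One possible reading of the "sane defaults" suggestion, sketched under assumptions: the generic header supplies each hook only when the architecture has not already provided it, using the common `#define name name` override idiom. The block below is a single self-contained file. Its "arch" part overrides one hook and its "generic" part fills in the other. Nothing in this thread settles on this layout. The default values shown (false for clean-before-FROM_DEVICE, true for the post-DMA invalidate) are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical arch part: only the hook that differs from the default is
 * written out, and a same-named macro announces that it exists.
 */
static inline bool arch_sync_dma_clean_before_fromdevice(void)
{
	return true;
}
#define arch_sync_dma_clean_before_fromdevice arch_sync_dma_clean_before_fromdevice

/*
 * Hypothetical generic part: fallbacks for any hook the architecture did
 * not define before this point.
 */
#ifndef arch_sync_dma_clean_before_fromdevice
static inline bool arch_sync_dma_clean_before_fromdevice(void)
{
	return false;
}
#endif

#ifndef arch_sync_dma_cpu_needs_post_dma_flush
static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
{
	return true;
}
#endif
```

Because the arch part defines the macro, the generic `#ifndef` skips its fallback, so arch_sync_dma_clean_before_fromdevice() returns true here. arch_sync_dma_cpu_needs_post_dma_flush() comes from the generic fallback. Answers that are fixed at build time could instead be derived from Kconfig symbols via IS_ENABLED(). The thread does not name any such symbols.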
> +#include <linux/dma-sync.h>

I can't really say I like the #include version here despite your
rationale in the commit log.  I can probably live with it if you think
it is absolutely worth it, but I'm really not in favor of it.

> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y

What do we need this symbol for?  Unless I'm missing something it is
always enabled for arm32, and only used in arm32 code.
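This is background for the ARCH_DMA_MARK_DCACHE_CLEAN question. In the patch above, the only code the symbol guards is the page loop in arch_dma_mark_dcache_clean(). That loop sets PG_dcache_clean only on pages the buffer covers completely. The following user-space sketch reproduces that arithmetic so it can be checked in isolation. The 4 KiB page size, `struct clean_range` and `mark_clean_range()` are assumptions for illustration, and the sketch touches no real page flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel's PAGE_SHIFT depends on the configuration. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical result type: first PFN and number of pages marked clean. */
struct clean_range {
	uint64_t first_pfn;
	size_t count;
};

/*
 * Same arithmetic as arch_dma_mark_dcache_clean(), but it reports the
 * pages instead of setting PG_dcache_clean on them.
 */
static struct clean_range mark_clean_range(uint64_t paddr, uint64_t size)
{
	struct clean_range r = { PFN_UP(paddr), 0 };
	uint64_t off = paddr & (PAGE_SIZE - 1);
	uint64_t left = size;

	/* arch_sync_dma_for_cpu() only calls this for size > PAGE_SIZE */
	assert(size > PAGE_SIZE);

	if (off)
		left -= PAGE_SIZE - off;	/* skip the partial first page */

	while (left >= PAGE_SIZE) {		/* a partial last page is skipped too */
		r.count++;
		left -= PAGE_SIZE;
	}
	return r;
}
```

Consider a 12 KiB buffer at physical address 0x1800. The sketch reports two pages starting at PFN 2, and the partially covered pages 1 and 4 stay unmarked. The caller's size > PAGE_SIZE check also guarantees that `left -= PAGE_SIZE - off` cannot wrap around.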
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation @ 2023-03-27 22:25 ` Christoph Hellwig 0 siblings, 0 replies; 456+ messages in thread From: Christoph Hellwig @ 2023-03-27 22:25 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > + dma_cache_wback(paddr, size); > +} > > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + dma_cache_inv(paddr, size); > } > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > { > + dma_cache_wback_inv(paddr, size); > +} There are the only calls for the three functions for each of the involved functions. So I'd rather rename the low-level symbols (and drop the pointless exports for two of them) rather than adding these wrapppers. The same is probably true for many other architectures. > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } Is there a way to cut down on this boilerplate code by just having sane default, and Kconfig options to override them if they are not runtime decisions? > +#include <linux/dma-sync.h> I can't really say I like the #include version here despite your rationale in the commit log. 
I can probably live with it if you think it is absolutely worth it, but I'm really not in favor of it. > +config ARCH_DMA_MARK_DCACHE_CLEAN > + def_bool y What do we need this symbol for? Unless I'm missing something it is always enable for arm32, and only used in arm32 code. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-27 22:25 ` Christoph Hellwig
0 siblings, 0 replies; 456+ messages in thread
From: Christoph Hellwig @ 2023-03-27 22:25 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij,
John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor Dooley,
Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon,
Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven,
Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann,
Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley,
Stafford Horne, linux-arm-kernel, Neil Armstrong <neil.armstr

> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	dma_cache_wback(paddr, size);
> +}
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }

> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> +	dma_cache_wback_inv(paddr, size);
> +}

These are the only calls for the three functions for each of the
involved functions. So I'd rather rename the low-level symbols
(and drop the pointless exports for two of them) rather than adding
these wrappers.

The same is probably true for many other architectures.

> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }

Is there a way to cut down on this boilerplate code by just having
sane defaults, and Kconfig options to override them if they are not
runtime decisions?

> +#include <linux/dma-sync.h>

I can't really say I like the #include version here despite your
rationale in the commit log. I can probably live with it if you
think it is absolutely worth it, but I'm really not in favor of it.

> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y

What do we need this symbol for? Unless I'm missing something it is
always enabled for arm32, and only used in arm32 code.

^ permalink raw reply [flat|nested] 456+ messages in thread
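The two boolean hooks quoted in this review are the per-architecture parameters the patch uses to choose a cache operation for each DMA direction. The three cache primitives are what those choices call. Below is a minimal userspace model of that decision. It is written from the patch's commit message and from the direction table removed from arch/arc/mm/dma.c, not copied from include/linux/dma-sync.h. The enum and function names are hypothetical, and each function returns the operation it would issue instead of touching a cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's enum dma_data_direction values. */
enum demo_dir { DEMO_TO_DEVICE, DEMO_FROM_DEVICE, DEMO_BIDIRECTIONAL };

/* Cache maintenance the model can request (wback = clean, wback_inv = flush). */
enum demo_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* Per-architecture parameters, mirroring the two boolean hooks. */
struct demo_arch {
	bool clean_before_fromdevice;  /* like arch_sync_dma_clean_before_fromdevice() */
	bool needs_post_dma_flush;     /* like arch_sync_dma_cpu_needs_post_dma_flush() */
};

/* Operation issued before the device touches the buffer (map / for_device). */
static enum demo_op demo_for_device(const struct demo_arch *a, enum demo_dir dir)
{
	switch (dir) {
	case DEMO_TO_DEVICE:
		return OP_WBACK;               /* the device must see the CPU's data */
	case DEMO_FROM_DEVICE:
		if (!a->clean_before_fromdevice)
			return OP_INV;         /* drop lines the device will overwrite */
		/* fall through */
	case DEMO_BIDIRECTIONAL:
		/* if the CPU invalidates again after the DMA, a writeback suffices now */
		return a->needs_post_dma_flush ? OP_WBACK : OP_WBACK_INV;
	}
	assert(0 && "direction is validated by the caller");
	return OP_NONE;
}

/* Operation issued after the DMA completes (unmap / for_cpu). */
static enum demo_op demo_for_cpu(const struct demo_arch *a, enum demo_dir dir)
{
	/* discard lines a prefetching CPU may have pulled in during the DMA */
	if (dir != DEMO_TO_DEVICE && a->needs_post_dma_flush)
		return OP_INV;
	return OP_NONE;
}
```

With the arc values (`clean_before_fromdevice` false, `needs_post_dma_flush` true), the model reproduces the old arc table:

- TO_DEV: writeback, then nothing.
- FROM_DEV: invalidate, then invalidate.
- BIDIR: writeback, then invalidate.

A CPU without speculative prefetch does all of its work up front. For example, it issues a single writeback-and-invalidate before a bidirectional transfer.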
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
2023-03-27 22:25 ` Christoph Hellwig
` (3 preceding siblings ...)
(?)
@ 2023-03-31 13:04 ` Arnd Bergmann
-1 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 13:04 UTC (permalink / raw)
To: Christoph Hellwig, Arnd Bergmann
Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S . Miller, Max Filippov,
Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc,
linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Tue, Mar 28, 2023, at 00:25, Christoph Hellwig wrote:
>> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>>  {
>> +	dma_cache_wback(paddr, size);
>> +}
>>
>> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>> +{
>> +	dma_cache_inv(paddr, size);
>>  }
>
>> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>>  {
>> +	dma_cache_wback_inv(paddr, size);
>> +}
>
> These are the only calls for the three functions for each of the
> involved functions. So I'd rather rename the low-level symbols
> (and drop the pointless exports for two of them) rather than adding
> these wrappers.
>
> The same is probably true for many other architectures.

Ok, done that now.

>> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
>> +{
>> +	return false;
>> +}
>>
>> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>> +{
>> +	return true;
>>  }
>
> Is there a way to cut down on this boilerplate code by just having
> sane defaults, and Kconfig options to override them if they are not
> runtime decisions?

I've changed arch_sync_dma_clean_before_fromdevice() to a Kconfig
symbol now, as this is never a runtime decision.

For arch_sync_dma_cpu_needs_post_dma_flush(), I have this version
now in common code, which lets mips and arm have their own logic
and has the same effect elsewhere:

+#ifndef arch_sync_dma_cpu_needs_post_dma_flush
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU);
+}
+#endif

>> +#include <linux/dma-sync.h>
>
> I can't really say I like the #include version here despite your
> rationale in the commit log. I can probably live with it if you
> think it is absolutely worth it, but I'm really not in favor of it.
>
>> +config ARCH_DMA_MARK_DCACHE_CLEAN
>> +	def_bool y
>
> What do we need this symbol for? Unless I'm missing something it is
> always enabled for arm32, and only used in arm32 code.

This was left over from an earlier draft and accidentally duplicates
the thing that I have in the Arm version for the existing
ARCH_HAS_DMA_MARK_CLEAN. I dropped this one and the generic copy of
the arch_dma_mark_dcache_clean() function now, but still need to
revisit the arm version, as it sounds like it has slightly different
semantics from the ia64 version.

Arnd

^ permalink raw reply [flat|nested] 456+ messages in thread
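Arnd's common-code fallback relies on a familiar kernel idiom. The generic header supplies a default only when the architecture has not already defined a macro carrying the function's own name. Below is a userspace sketch of that pattern. It assumes a plain 0/1 macro in place of `IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU)`. It also assumes a hypothetical probe variable standing in for whatever runtime logic mips or arm would use. `DEMO_ARCH_OVERRIDE` is a made-up switch for the sketch; build with `-DDEMO_ARCH_OVERRIDE=0` to get the generic default instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU). */
#define DEMO_HAS_SYNC_DMA_FOR_CPU 1

#ifndef DEMO_ARCH_OVERRIDE
#define DEMO_ARCH_OVERRIDE 1  /* 0: no architecture override, use the default */
#endif

/* ---- what an architecture header (e.g. mips or arm) might provide ---- */
#if DEMO_ARCH_OVERRIDE
static bool demo_cpu_prefetches = true;  /* hypothetical boot-time probe result */

static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
{
	return demo_cpu_prefetches;          /* a runtime decision */
}
/* Defining the name to itself tells the generic header an override exists. */
#define arch_sync_dma_cpu_needs_post_dma_flush arch_sync_dma_cpu_needs_post_dma_flush
#endif

/* ---- generic header: compiled only if no architecture override was seen ---- */
#ifndef arch_sync_dma_cpu_needs_post_dma_flush
static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
{
	return DEMO_HAS_SYNC_DMA_FOR_CPU;
}
#endif
```

Because a macro is not re-expanded inside its own expansion, the self-referential `#define` leaves call sites unchanged. Its only effect is to make the generic `#ifndef` skip the default. As a result, architectures that do not customize the hook need no boilerplate for it, which is the reduction requested in the review.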
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation 2023-03-27 12:13 ` Arnd Bergmann ` (3 preceding siblings ...) (?) @ 2023-03-30 14:06 ` Lad, Prabhakar -1 siblings, 0 replies; 456+ messages in thread From: Lad, Prabhakar @ 2023-03-30 14:06 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon, linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Mon, Mar 27, 2023 at 1:20 PM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > Now that all of these have consistent behavior, replace them with > a single shared implementation of arch_sync_dma_for_device() and > arch_sync_dma_for_cpu() and three parameters to pick how they should > operate: > > - If the CPU has speculative prefetching, then the cache > has to be invalidated after a transfer from the device. > On the rarer CPUs without prefetching, this can be skipped, > with all cache management happening before the transfer. > This flag can be runtime detected, but is usually fixed > per architecture. > > - Some architectures currently clean the caches before DMA > from a device, while others invalidate it. There has not > been a conclusion regarding whether we should change all > architectures to use clean instead, so this adds an > architecture specific flag that we can change later on. 
> > - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps > track pages that are marked clean in the page cache, to > avoid flushing them again. The implementation for this is > generic enough to work on all architectures that use the > PG_dcache_clean page flag, but a Kconfig symbol is used > to only enable it on Arm to preserve the existing behavior. > > For the function naming, I picked 'wback' over 'clean', and 'wback_inv' > over 'flush', to avoid any ambiguity of what the helper functions are > supposed to do. > > Moving the global functions into a header file is usually a bad idea > as it prevents the header from being included more than once, but it > helps keep the behavior as close as possible to the previous state, > including the possibility of inlining most of it into these functions > where that was done before. This also helps keep the global namespace > clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from > device drivers that might use them incorrectly. > > It would be possible to do this one architecture at a time, but > as the change is the same everywhere, the combined patch helps > explain it better once. 
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arc/mm/dma.c                 |  66 +++++-------------
>  arch/arm/Kconfig                  |   3 +
>  arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
>  arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
>  arch/arm64/mm/dma-mapping.c       |  28 +++++---
>  arch/csky/mm/dma-mapping.c        |  44 ++++++------
>  arch/hexagon/kernel/dma.c         |  44 ++++++------
>  arch/m68k/kernel/dma.c            |  43 +++++++-----
>  arch/microblaze/kernel/dma.c      |  48 +++++++-------
>  arch/mips/mm/dma-noncoherent.c    |  60 +++++----------
>  arch/nios2/mm/dma-mapping.c       |  57 +++++---------
>  arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
>  arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
>  arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
>  arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
>  arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
>  arch/sparc/kernel/ioport.c        |  38 ++++++++---
>  arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
>  include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
>  19 files changed, 527 insertions(+), 391 deletions(-)
>  create mode 100644 include/linux/dma-sync.h
>
I tested this on RZ/Five (with my v6 [0] + additional changes) so for RISC-V,

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

[0] https://patchwork.kernel.org/project/linux-renesas-soc/cover/20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com/

Cheers,
Prabhakar

> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	dma_cache_wback_inv(page_to_phys(page), size);
>  }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device     |   unmap     ==  for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback      |   none          none
> - * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> - *
> - * [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_wback(paddr, size);
> -		break;
> -
> -	case DMA_FROM_DEVICE:
> -		dma_cache_inv(paddr, size);
> -		break;
> -
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_wback(paddr, size);
> -		break;
> +	dma_cache_wback(paddr, size);
> +}
>
> -	default:
> -		break;
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> +	dma_cache_wback_inv(paddr, size);
> +}
>
> -	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_inv(paddr, size);
> -		break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	default:
> -		break;
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Plug in direct dma map ops.
>   */
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 125d58c54ab1..0de84e861027 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
>  	bool
>  	default y
>
> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y
> +
>  config ARCH_HAS_ILOG2_U32
>  	bool
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 12b5c6ae93fc..0817274aed15 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -13,27 +13,36 @@
>
>  #include "dma.h"
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir == DMA_FROM_DEVICE) {
> -		dmac_inv_range(__va(paddr), __va(paddr + size));
> -		outer_inv_range(paddr, paddr + size);
> -	} else {
> -		dmac_clean_range(__va(paddr), __va(paddr + size));
> -		outer_clean_range(paddr, paddr + size);
> -	}
> +	dmac_clean_range(__va(paddr), __va(paddr + size));
> +	outer_clean_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE) {
> -		outer_inv_range(paddr, paddr + size);
> -		dmac_inv_range(__va(paddr), __va(paddr));
> -	}
> +	dmac_inv_range(__va(paddr), __va(paddr + size));
> +	outer_inv_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dmac_flush_range(__va(paddr), __va(paddr + size));
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index b703cb83d27e..aa6ee820a0ab 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  	}
>  }
>
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_clean_range);
> +	outer_clean_range(paddr, paddr + size);
> +}
> +
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_inv_range);
> +	outer_inv_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_flush_range);
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
>  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  {
>  	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  	return false;
>  }
>
> -/*
> - * Make an area consistent for devices.
> - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_clean_range);
> -		outer_clean_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -		outer_inv_range(paddr, paddr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> -			dma_cache_maint(paddr, size, dmac_clean_range);
> -			outer_clean_range(paddr, paddr + size);
> -		} else {
> -			dma_cache_maint(paddr, size, dmac_flush_range);
> -			outer_flush_range(paddr, paddr + size);
> -		}
> -		break;
> -	default:
> -		break;
> -	}
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> -		outer_inv_range(paddr, paddr + size);
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -	}
> -}
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
>  #include <asm/cacheflush.h>
>  #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_poc(paddr, paddr + size);
> +}
>
> -	dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dcache_inval_poc(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_inval_poc(paddr, paddr + size);
> +}
>
> -	if (dir == DMA_TO_DEVICE)
> -		return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
>
> -	dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_wb_range);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		return;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_inv_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_inv_range);
>  }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_op(paddr, size, dma_wbinv_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
>  #include <linux/memblock.h>
>  #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	void *addr = phys_to_virt(paddr);
> -
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		hexagon_clean_dcache_range((unsigned long) addr,
> -		  (unsigned long) addr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		hexagon_inv_dcache_range((unsigned long) addr,
> -		  (unsigned long) addr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range((unsigned long) addr,
> -		  (unsigned long) addr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	hexagon_clean_dcache_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_inv_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
>   * DMA coherent space. Use that for the pool.
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
>  #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_TO_DEVICE:
> -		cache_push(handle, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		cache_clear(handle, size);
> -		break;
> -	default:
> -		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> -				   dir);
> -		break;
> -	}
> +	/*
> +	 * cache_push() always invalidates in addition to cleaning
> +	 * write-back caches.
> +	 */
> +	cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_push(paddr, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
>  #include <linux/bug.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	/* writeback plus invalidate, could be a nop on WT caches */
> +	flush_dcache_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}}
> +	invalidate_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
>  	} while (left);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_wback);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> -		    cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_wback);
> -		else
> -			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_wback);
>  }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		if (cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_inv);
>  }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +	       cpu_needs_post_dma_flush();
> +}
> +
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> -		const struct iommu_ops *iommu, bool coherent)
> +		const struct iommu_ops *iommu, bool coherent)
>  {
> -	dev->dma_coherent = coherent;
> +	dev->dma_coherent = coherent;
>  }
>  #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
>  #include <linux/types.h>
>  #include <linux/mm.h>
>  #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/cache.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	/*
> +	 * We just need to write back the caches here, but Nios2 flush
> +	 * instruction will do both writeback and invalidate.
> +	 */
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		/*
> -		 * We just need to flush the caches here , but Nios2 flush
> -		 * instruction will do both writeback and invalidate.
> -		 */
> -	case DMA_BIDIRECTIONAL: /* flush and invalidate */
> -		flush_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> +	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  	mmap_write_unlock(&init_mm);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long cl;
>  	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		/* Write back the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBWR, cl);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		/* Invalidate the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBIR, cl);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		/* Flush the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBFR, cl);
> -		break;
> -	default:
> -		break;
> -	}
> +	/* Write back the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBWR, cl);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Invalidate the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Flush the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>  	free_pages((unsigned long)__va(dma_handle), order);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	clean_kernel_dcache_range(virt, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		purge_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	purge_kernel_dcache_range(virt, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> +	flush_kernel_dcache_range(virt, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
>  #endif
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_NONE:
> -		BUG();
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	}
> +	__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
>  static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *vaddr = phys_to_virt(paddr);
> +
> +	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	void *flush_addr = page_address(page);
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	__flush_purge_region(page_address(page), size);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE: /* invalidate only */
> -		__flush_invalidate_region(addr, size);
> -		break;
> -	case DMA_TO_DEVICE: /* writeback only */
> -		__flush_wback_region(addr, size);
> -		break;
> -	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
> -		__flush_purge_region(addr, size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	__flush_wback_region(addr, size);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_invalidate_region(addr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
>  #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE &&
> -	    sparc_cpu_model == sparc_leon &&
> +	/* IIep is write-through, not flushing on cpu to device transfer. */
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	/*
> +	 * On LEON systems without cache snooping, the entire D-CACHE must be
> +	 * flushed to make DMA to cacheable memory coherent.
> +	 */
> +	if (sparc_cpu_model == sparc_leon &&
>  	    !sparc_leon3_snooping_enabled())
>  		leon_flush_dcache_all();
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	arch_dma_cache_inv(paddr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  #ifdef CONFIG_PROC_FS
>
>  static int sparc_io_proc_show(struct seq_file *m, void *v)
> diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
> index ff3bf015eca4..d4ff96585545 100644
> --- a/arch/xtensa/kernel/pci-dma.c
> +++ b/arch/xtensa/kernel/pci-dma.c
> @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
>  	}
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		do_cache_op(paddr, size, __flush_dcache_range);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		do_cache_op(paddr, size, __invalidate_dcache_range);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> -		break;
> -	default:
> -		break;
> -	}
> +	do_cache_op(paddr, size, __flush_dcache_range);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __invalidate_dcache_range);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	__invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
> new file mode 100644
> index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument, inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          |   map          ==  for_device     |   unmap     ==  for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   |   writeback        writeback      |   none          none
> + * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> + * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> + *
> + * [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		/*
> +		 * This may be an empty function on write-through caches,
> +		 * and it might invalidate the cache if an architecture has
> +		 * a write-back cache but no way to write it back without
> +		 * invalidating
> +		 */
> +		arch_dma_cache_wback(paddr, size);
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +		/*
> +		 * FIXME: this should be handled the same across all
> +		 * architectures, see
> +		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> +		 */
> +		if (!arch_sync_dma_clean_before_fromdevice()) {
> +			arch_dma_cache_inv(paddr, size);
> +			break;
> +		}
> +		fallthrough;
> +
> +	case DMA_BIDIRECTIONAL:
> +		/* Skip the invalidate here if it's done later */
> +		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +		    arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_wback(paddr, size);
> +		else
> +			arch_dma_cache_wback_inv(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{
> +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> +	unsigned long pfn = PFN_UP(paddr);
> +	unsigned long off = paddr & (PAGE_SIZE - 1);
> +	size_t left = size;
> +
> +	if (off)
> +		left -= PAGE_SIZE - off;
> +
> +	while (left >= PAGE_SIZE) {
> +		struct page *page = pfn_to_page(pfn++);
> +		set_bit(PG_dcache_clean, &page->flags);
> +		left -= PAGE_SIZE;
> +	}
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> +		if (arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_inv(paddr, size);
> +
> +		if (size > PAGE_SIZE)
> +			arch_dma_mark_dcache_clean(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +#endif
> --
> 2.39.2
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-30 14:06 ` Lad, Prabhakar
From: Lad, Prabhakar @ 2023-03-30 14:06 UTC
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 1:20 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> Now that all of these have consistent behavior, replace them with
> a single shared implementation of arch_sync_dma_for_device() and
> arch_sync_dma_for_cpu() and three parameters to pick how they should
> operate:
>
> - If the CPU has speculative prefetching, then the cache
>   has to be invalidated after a transfer from the device.
>   On the rarer CPUs without prefetching, this can be skipped,
>   with all cache management happening before the transfer.
>   This flag can be runtime detected, but is usually fixed
>   per architecture.
>
> - Some architectures currently clean the caches before DMA
>   from a device, while others invalidate it. There has not
>   been a conclusion regarding whether we should change all
>   architectures to use clean instead, so this adds an
>   architecture specific flag that we can change later on.
>
> - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
>   track of pages that are marked clean in the page cache, to
>   avoid flushing them again. The implementation for this is
>   generic enough to work on all architectures that use the
>   PG_dcache_clean page flag, but a Kconfig symbol is used
>   to only enable it on Arm to preserve the existing behavior.
>
> For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
> over 'flush', to avoid any ambiguity of what the helper functions are
> supposed to do.
>
> Moving the global functions into a header file is usually a bad idea
> as it prevents the header from being included more than once, but it
> helps keep the behavior as close as possible to the previous state,
> including the possibility of inlining most of it into these functions
> where that was done before. This also helps keep the global namespace
> clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
> device drivers that might use them incorrectly.
>
> It would be possible to do this one architecture at a time, but
> as the change is the same everywhere, the combined patch helps
> explain it better once.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arc/mm/dma.c                 |  66 +++++-------------
>  arch/arm/Kconfig                  |   3 +
>  arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
>  arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
>  arch/arm64/mm/dma-mapping.c       |  28 +++++---
>  arch/csky/mm/dma-mapping.c        |  44 ++++++------
>  arch/hexagon/kernel/dma.c         |  44 ++++++------
>  arch/m68k/kernel/dma.c            |  43 +++++++-----
>  arch/microblaze/kernel/dma.c      |  48 +++++++-------
>  arch/mips/mm/dma-noncoherent.c    |  60 +++++------------
>  arch/nios2/mm/dma-mapping.c       |  57 +++++++---------
>  arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
>  arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
>  arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
>  arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
>  arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
>  arch/sparc/kernel/ioport.c        |  38 ++++++++---
>  arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
>  include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
>  19 files changed, 527 insertions(+), 391 deletions(-)
>  create mode 100644 include/linux/dma-sync.h
>
I tested this on RZ/Five (with my v6 [0] + additional changes) so for RISC-V,

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

[0] https://patchwork.kernel.org/project/linux-renesas-soc/cover/20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com/

Cheers,
Prabhakar

> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	dma_cache_wback_inv(page_to_phys(page), size);
>  }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device     |   unmap     ==  for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback      |   none          none
> - * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> - *
> - *     [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_wback(paddr, size);
> -		break;
> -
> -	case DMA_FROM_DEVICE:
> -		dma_cache_inv(paddr, size);
> -		break;
> -
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_wback(paddr, size);
> -		break;
> +	dma_cache_wback(paddr, size);
> +}
>
> -	default:
> -		break;
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> +	dma_cache_wback_inv(paddr, size);
> +}
>
> -	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_inv(paddr, size);
> -		break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	default:
> -		break;
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Plug in direct dma map ops.
>   */
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 125d58c54ab1..0de84e861027 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
>  	bool
>  	default y
>
> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y
> +
>  config ARCH_HAS_ILOG2_U32
>  	bool
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 12b5c6ae93fc..0817274aed15 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -13,27 +13,36 @@
>
>  #include "dma.h"
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir == DMA_FROM_DEVICE) {
> -		dmac_inv_range(__va(paddr), __va(paddr + size));
> -		outer_inv_range(paddr, paddr + size);
> -	} else {
> -		dmac_clean_range(__va(paddr), __va(paddr + size));
> -		outer_clean_range(paddr, paddr + size);
> -	}
> +	dmac_clean_range(__va(paddr), __va(paddr + size));
> +	outer_clean_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE) {
> -		outer_inv_range(paddr, paddr + size);
> -		dmac_inv_range(__va(paddr), __va(paddr));
> -	}
> +	dmac_inv_range(__va(paddr), __va(paddr + size));
> +	outer_inv_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dmac_flush_range(__va(paddr), __va(paddr + size));
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index b703cb83d27e..aa6ee820a0ab 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  	}
>  }
>
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_clean_range);
> +	outer_clean_range(paddr, paddr + size);
> +}
> +
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_inv_range);
> +	outer_inv_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_flush_range);
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
>  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  {
>  	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  	return false;
>  }
>
> -/*
> - * Make an area consistent for devices.
> - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_clean_range);
> -		outer_clean_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -		outer_inv_range(paddr, paddr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> -			dma_cache_maint(paddr, size, dmac_clean_range);
> -			outer_clean_range(paddr, paddr + size);
> -		} else {
> -			dma_cache_maint(paddr, size, dmac_flush_range);
> -			outer_flush_range(paddr, paddr + size);
> -		}
> -		break;
> -	default:
> -		break;
> -	}
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> -		outer_inv_range(paddr, paddr + size);
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -	}
> -}
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
>  #include <asm/cacheflush.h>
>  #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_poc(paddr, paddr + size);
> +}
>
> -	dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dcache_inval_poc(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_inval_poc(paddr, paddr + size);
> +}
>
> -	if (dir == DMA_TO_DEVICE)
> -		return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
>
> -	dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_wb_range);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		return;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_inv_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_inv_range);
>  }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_op(paddr, size, dma_wbinv_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
>  #include <linux/memblock.h>
>  #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	void *addr = phys_to_virt(paddr);
> -
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		hexagon_clean_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		hexagon_inv_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	hexagon_clean_dcache_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_inv_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
>   * DMA coherent space. Use that for the pool.
>   */
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
>  #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_TO_DEVICE:
> -		cache_push(handle, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		cache_clear(handle, size);
> -		break;
> -	default:
> -		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> -				   dir);
> -		break;
> -	}
> +	/*
> +	 * cache_push() always invalidates in addition to cleaning
> +	 * write-back caches.
> +	 */
> +	cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_push(paddr, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
>  #include <linux/bug.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	/* writeback plus invalidate, could be a nop on WT caches */
> +	flush_dcache_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}}
> +	invalidate_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
>  	} while (left);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_wback);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> -		    cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_wback);
> -		else
> -			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_wback);
>  }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		if (cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_inv);
>  }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +	       cpu_needs_post_dma_flush();
> +}
> +
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> -		const struct iommu_ops *iommu, bool coherent)
> +		const struct iommu_ops *iommu, bool coherent)
>  {
> -	dev->dma_coherent = coherent;
> +	dev->dma_coherent = coherent;
>  }
>  #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
>  #include <linux/types.h>
>  #include <linux/mm.h>
>  #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/cache.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	/*
> +	 * We just need to write back the caches here, but Nios2 flush
> +	 * instruction will do both writeback and invalidate.
> +	 */
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		/*
> -		 * We just need to flush the caches here , but Nios2 flush
> -		 * instruction will do both writeback and invalidate.
> -		 */
> -	case DMA_BIDIRECTIONAL: /* flush and invalidate */
> -		flush_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> +	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  	mmap_write_unlock(&init_mm);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long cl;
>  	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		/* Write back the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBWR, cl);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		/* Invalidate the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBIR, cl);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		/* Flush the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBFR, cl);
> -		break;
> -	default:
> -		break;
> -	}
> +	/* Write back the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBWR, cl);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Invalidate the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Flush the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>  	free_pages((unsigned long)__va(dma_handle), order);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	clean_kernel_dcache_range(virt, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		purge_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	purge_kernel_dcache_range(virt, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> +	flush_kernel_dcache_range(virt, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
>  #endif
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_NONE:
> -		BUG();
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	}
> +	__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
>  static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *vaddr = phys_to_virt(paddr);
> +
> +	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	void *flush_addr = page_address(page);
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	__flush_purge_region(page_address(page), size);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:		/* invalidate only */
> -		__flush_invalidate_region(addr, size);
> -		break;
> -	case DMA_TO_DEVICE:		/* writeback only */
> -		__flush_wback_region(addr, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
> -		__flush_purge_region(addr, size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	__flush_wback_region(addr, size);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_invalidate_region(addr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
>  #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE &&
> -	    sparc_cpu_model == sparc_leon &&
> +	/* IIep is write-through, not flushing on cpu to device transfer. */
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	/*
> +	 * On LEON systems without cache snooping, the entire D-CACHE must be
> +	 * flushed to make DMA to cacheable memory coherent.
> + */ > + if (sparc_cpu_model == sparc_leon && > !sparc_leon3_snooping_enabled()) > leon_flush_dcache_all(); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + arch_dma_cache_inv(paddr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > #ifdef CONFIG_PROC_FS > > static int sparc_io_proc_show(struct seq_file *m, void *v) > diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c > index ff3bf015eca4..d4ff96585545 100644 > --- a/arch/xtensa/kernel/pci-dma.c > +++ b/arch/xtensa/kernel/pci-dma.c > @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, > } > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - do_cache_op(paddr, size, __flush_dcache_range); > - break; > - case DMA_FROM_DEVICE: > - do_cache_op(paddr, size, __invalidate_dcache_range); > - break; > - case DMA_BIDIRECTIONAL: > - do_cache_op(paddr, size, __flush_invalidate_dcache_range); > - break; > - default: > - break; > - } > + do_cache_op(paddr, size, __flush_dcache_range); > } > > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + do_cache_op(paddr, size, __invalidate_dcache_range); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + do_cache_op(paddr, size, __flush_invalidate_dcache_range); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, 
size_t size) > { > __invalidate_dcache_range((unsigned long)page_address(page), size); > diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h > new file mode 100644 > index 000000000000..18e33d5e8eaf > --- /dev/null > +++ b/include/linux/dma-sync.h > @@ -0,0 +1,107 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Cache operations depending on function and direction argument, inspired by > + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk > + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] > + * dma-mapping: provide a generic dma-noncoherent implementation)" > + * > + * | map == for_device | unmap == for_cpu > + * |---------------------------------------------------------------- > + * TO_DEV | writeback writeback | none none > + * FROM_DEV | invalidate invalidate | invalidate* invalidate* > + * BIDIR | writeback writeback | invalidate invalidate > + * > + * [*] needed for CPU speculative prefetches > + * > + * NOTE: we don't check the validity of direction argument as it is done in > + * upper layer functions (in include/linux/dma-mapping.h) > + * > + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide > + * the respective high-level operations without having to expose the > + * cache management ops to drivers. 
> + */ > + > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > + enum dma_data_direction dir) > +{ > + switch (dir) { > + case DMA_TO_DEVICE: > + /* > + * This may be an empty function on write-through caches, > + * and it might invalidate the cache if an architecture has > + * a write-back cache but no way to write it back without > + * invalidating > + */ > + arch_dma_cache_wback(paddr, size); > + break; > + > + case DMA_FROM_DEVICE: > + /* > + * FIXME: this should be handled the same across all > + * architectures, see > + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ > + */ > + if (!arch_sync_dma_clean_before_fromdevice()) { > + arch_dma_cache_inv(paddr, size); > + break; > + } > + fallthrough; > + > + case DMA_BIDIRECTIONAL: > + /* Skip the invalidate here if it's done later */ > + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > + arch_sync_dma_cpu_needs_post_dma_flush()) > + arch_dma_cache_wback(paddr, size); > + else > + arch_dma_cache_wback_inv(paddr, size); > + break; > + > + default: > + break; > + } > +} > + > +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU > +/* > + * Mark the D-cache clean for these pages to avoid extra flushing. 
> */ > +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size) > +{ > +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN > + unsigned long pfn = PFN_UP(paddr); > + unsigned long off = paddr & (PAGE_SIZE - 1); > + size_t left = size; > + > + if (off) > + left -= PAGE_SIZE - off; > + > + while (left >= PAGE_SIZE) { > + struct page *page = pfn_to_page(pfn++); > + set_bit(PG_dcache_clean, &page->flags); > + left -= PAGE_SIZE; > + } > +#endif > +} > + > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > + enum dma_data_direction dir) > +{ > + switch (dir) { > + case DMA_TO_DEVICE: > + break; > + > + case DMA_FROM_DEVICE: > + case DMA_BIDIRECTIONAL: > + /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ > + if (arch_sync_dma_cpu_needs_post_dma_flush()) > + arch_dma_cache_inv(paddr, size); > + > + if (size > PAGE_SIZE) > + arch_dma_mark_dcache_clean(paddr, size); > + break; > + > + default: > + break; > + } > +} > +#endif > -- > 2.39.2 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
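The page walk in the quoted arch_dma_mark_dcache_clean() only sets PG_dcache_clean on pages that lie entirely inside the buffer: PFN_UP() skips a partial head page, and the loop stops before a partial tail page. The following is a user-space sketch of that arithmetic only, assuming 4 KiB pages. The SKETCH_* macros and count_clean_pages() are hypothetical names; the function counts pages instead of touching struct page, and SKETCH_PFN_UP() uses the same rounding as the kernel's PFN_UP().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the real PAGE_SIZE depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
/* Same rounding as PFN_UP(): first page frame at or after address x. */
#define SKETCH_PFN_UP(x)  (((x) + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)

/*
 * Mirrors the loop in arch_dma_mark_dcache_clean(): returns how many pages
 * would be marked clean and stores the first such pfn in *first_pfn.
 */
static size_t count_clean_pages(uint64_t paddr, uint64_t size,
				uint64_t *first_pfn)
{
	uint64_t off = paddr & (SKETCH_PAGE_SIZE - 1);
	uint64_t left = size;
	size_t pages = 0;

	/* arch_sync_dma_for_cpu() only calls the helper for size > PAGE_SIZE,
	 * which also guarantees the subtraction below cannot underflow. */
	assert(size > SKETCH_PAGE_SIZE);

	*first_pfn = SKETCH_PFN_UP(paddr);
	if (off)
		left -= SKETCH_PAGE_SIZE - off;	/* drop the partial head page */

	while (left >= SKETCH_PAGE_SIZE) {
		pages++;			/* set_bit(PG_dcache_clean, ...) */
		left -= SKETCH_PAGE_SIZE;
	}
	return pages;			/* any partial tail page is left alone */
}
```

For example, a 12 KiB buffer starting at 0x1800 covers pfns 1 through 4 only partially at the ends, so the sketch reports two pages (pfns 2 and 3). In the patch this tracking is compiled in only when CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN is set, which the series enables for 32-bit Arm alone.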
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation @ 2023-03-30 14:06 ` Lad, Prabhakar 0 siblings, 0 replies; 456+ messages in thread From: Lad, Prabhakar @ 2023-03-30 14:06 UTC (permalink / raw) To: Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong <neil.armstr On Mon, Mar 27, 2023 at 1:20 PM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > Now that all of these have consistent behavior, replace them with > a single shared implementation of arch_sync_dma_for_device() and > arch_sync_dma_for_cpu() and three parameters to pick how they should > operate: > > - If the CPU has speculative prefetching, then the cache > has to be invalidated after a transfer from the device. > On the rarer CPUs without prefetching, this can be skipped, > with all cache management happening before the transfer. > This flag can be runtime detected, but is usually fixed > per architecture. > > - Some architectures currently clean the caches before DMA > from a device, while others invalidate it. There has not > been a conclusion regarding whether we should change all > architectures to use clean instead, so this adds an > architecture specific flag that we can change later on. > > - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps > track of pages that are marked clean in the page cache, to > avoid flushing them again. The implementation for this is > generic enough to work on all architectures that use the > PG_dcache_clean page flag, but a Kconfig symbol is used > to only enable it on Arm to preserve the existing behavior.
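The first two parameters described above reduce to a small decision table. The following is a user-space sketch, not the kernel code: it models the dispatch in the quoted include/linux/dma-sync.h as pure functions that return which cache helper would run. The names enum dma_dir, enum cache_op, struct dma_policy, model_for_device() and model_for_cpu() are hypothetical. The has_sync_for_cpu field stands in for IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for enum dma_data_direction (DMA_NONE omitted). */
enum dma_dir { DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_BIDIRECTIONAL };

/* Which helper would be called: none, wback, inv or wback_inv. */
enum cache_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* Hypothetical bundle of the per-architecture knobs from the patch. */
struct dma_policy {
	bool clean_before_fromdevice;	/* arch_sync_dma_clean_before_fromdevice() */
	bool needs_post_dma_flush;	/* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;		/* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
};

/* Models arch_sync_dma_for_device(): maintenance done before the DMA. */
static enum cache_op model_for_device(const struct dma_policy *p,
				      enum dma_dir dir)
{
	/* The kernel header leaves this check to the dma-mapping.h callers. */
	assert((unsigned)dir <= DIR_BIDIRECTIONAL);

	switch (dir) {
	case DIR_TO_DEVICE:
		return OP_WBACK;
	case DIR_FROM_DEVICE:
		if (!p->clean_before_fromdevice)
			return OP_INV;
		/* fall through: handled like DMA_BIDIRECTIONAL */
	case DIR_BIDIRECTIONAL:
		/* Skip the invalidate if the for_cpu hook does it afterwards. */
		if (p->has_sync_for_cpu && p->needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	}
	return OP_NONE;
}

/* Models arch_sync_dma_for_cpu(): maintenance done after the DMA. */
static enum cache_op model_for_cpu(const struct dma_policy *p,
				   enum dma_dir dir)
{
	if (!p->has_sync_for_cpu || dir == DIR_TO_DEVICE)
		return OP_NONE;
	return p->needs_post_dma_flush ? OP_INV : OP_NONE;
}
```

The arc hooks in the quoted patch set clean_before_fromdevice to false and the post-DMA flush to true. Assuming the for_cpu hook is enabled, a DMA_BIDIRECTIONAL buffer is then only written back before the transfer and invalidated afterwards. The sh hooks return false for both, so the same buffer gets a single writeback plus invalidate up front and nothing after the transfer.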
> > For the function naming, I picked 'wback' over 'clean', and 'wback_inv' > over 'flush', to avoid any ambiguity of what the helper functions are > supposed to do. > > Moving the global functions into a header file is usually a bad idea > as it prevents the header from being included more than once, but it > helps keep the behavior as close as possible to the previous state, > including the possibility of inlining most of it into these functions > where that was done before. This also helps keep the global namespace > clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from > device drivers that might use them incorrectly. > > It would be possible to do this one architecture at a time, but > as the change is the same everywhere, the combined patch helps > explain it better once. > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/arc/mm/dma.c | 66 +++++------------- > arch/arm/Kconfig | 3 + > arch/arm/mm/dma-mapping-nommu.c | 39 ++++++----- > arch/arm/mm/dma-mapping.c | 64 +++++++----------- > arch/arm64/mm/dma-mapping.c | 28 +++++--- > arch/csky/mm/dma-mapping.c | 44 ++++++------ > arch/hexagon/kernel/dma.c | 44 ++++++------ > arch/m68k/kernel/dma.c | 43 +++++++----- > arch/microblaze/kernel/dma.c | 48 +++++++------- > arch/mips/mm/dma-noncoherent.c | 60 +++++++---------- > arch/nios2/mm/dma-mapping.c | 57 +++++++--------- > arch/openrisc/kernel/dma.c | 63 +++++++++++------- > arch/parisc/kernel/pci-dma.c | 46 ++++++------- > arch/powerpc/mm/dma-noncoherent.c | 34 ++++++---- > arch/riscv/mm/dma-noncoherent.c | 51 +++++++------- > arch/sh/kernel/dma-coherent.c | 43 +++++++----- > arch/sparc/kernel/ioport.c | 38 ++++++++--- > arch/xtensa/kernel/pci-dma.c | 40 ++++++----- > include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++ > 19 files changed, 527 insertions(+), 391 deletions(-) > create mode 100644 include/linux/dma-sync.h > I tested this on RZ/Five (with my v6 [0] + additional changes) so for RISC-V, Reviewed-by: Lad Prabhakar 
<prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> [0] https://patchwork.kernel.org/project/linux-renesas-soc/cover/20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ Cheers, Prabhakar > diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c > index ddb96786f765..61cd01646222 100644 > --- a/arch/arc/mm/dma.c > +++ b/arch/arc/mm/dma.c > @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size) > dma_cache_wback_inv(page_to_phys(page), size); > } > > -/* > - * Cache operations depending on function and direction argument, inspired by > - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk > - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] > - * dma-mapping: provide a generic dma-noncoherent implementation)" > - * > - * | map == for_device | unmap == for_cpu > - * |---------------------------------------------------------------- > - * TO_DEV | writeback writeback | none none > - * FROM_DEV | invalidate invalidate | invalidate* invalidate* > - * BIDIR | writeback writeback | invalidate invalidate > - * > - * [*] needed for CPU speculative prefetches > - * > - * NOTE: we don't check the validity of direction argument as it is done in > - * upper layer functions (in include/linux/dma-mapping.h) > - */ > - > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_cache_wback(paddr, size); > - break; > - > - case DMA_FROM_DEVICE: > - dma_cache_inv(paddr, size); > - break; > - > - case DMA_BIDIRECTIONAL: > - dma_cache_wback(paddr, size); > - break; > + dma_cache_wback(paddr, size); > +} > > - default: > - break; > - } > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + dma_cache_inv(paddr, size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t 
paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > + dma_cache_wback_inv(paddr, size); > +} > > - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - dma_cache_inv(paddr, size); > - break; > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > - default: > - break; > - } > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > /* > * Plug in direct dma map ops. > */ > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 125d58c54ab1..0de84e861027 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT > bool > default y > > +config ARCH_DMA_MARK_DCACHE_CLEAN > + def_bool y > + > config ARCH_HAS_ILOG2_U32 > bool > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c > index 12b5c6ae93fc..0817274aed15 100644 > --- a/arch/arm/mm/dma-mapping-nommu.c > +++ b/arch/arm/mm/dma-mapping-nommu.c > @@ -13,27 +13,36 @@ > > #include "dma.h" > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir == DMA_FROM_DEVICE) { > - dmac_inv_range(__va(paddr), __va(paddr + size)); > - outer_inv_range(paddr, paddr + size); > - } else { > - dmac_clean_range(__va(paddr), __va(paddr + size)); > - outer_clean_range(paddr, paddr + size); > - } > + dmac_clean_range(__va(paddr), __va(paddr + size)); > + outer_clean_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE) { > - 
outer_inv_range(paddr, paddr + size); > - dmac_inv_range(__va(paddr), __va(paddr)); > - } > + dmac_inv_range(__va(paddr), __va(paddr + size)); > + outer_inv_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + dmac_flush_range(__va(paddr), __va(paddr + size)); > + outer_flush_range(paddr, paddr + size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > const struct iommu_ops *iommu, bool coherent) > { > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index b703cb83d27e..aa6ee820a0ab 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size) > } > } > > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_clean_range); > + outer_clean_range(paddr, paddr + size); > +} > + > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_inv_range); > + outer_inv_range(paddr, paddr + size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_flush_range); > + outer_flush_range(paddr, paddr + size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > { > if (IS_ENABLED(CONFIG_CPU_V6) || > @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > return false; > } > > -/* > - * Make an area consistent for devices. > - * Note: Drivers should NOT use this function directly. 
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*) > - */ > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > -{ > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_cache_maint(paddr, size, dmac_clean_range); > - outer_clean_range(paddr, paddr + size); > - break; > - case DMA_FROM_DEVICE: > - dma_cache_maint(paddr, size, dmac_inv_range); > - outer_inv_range(paddr, paddr + size); > - break; > - case DMA_BIDIRECTIONAL: > - if (arch_sync_dma_cpu_needs_post_dma_flush()) { > - dma_cache_maint(paddr, size, dmac_clean_range); > - outer_clean_range(paddr, paddr + size); > - } else { > - dma_cache_maint(paddr, size, dmac_flush_range); > - outer_flush_range(paddr, paddr + size); > - } > - break; > - default: > - break; > - } > -} > - > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > -{ > - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) { > - outer_inv_range(paddr, paddr + size); > - dma_cache_maint(paddr, size, dmac_inv_range); > - } > -} > +#include <linux/dma-sync.h> > > #ifdef CONFIG_ARM_DMA_USE_IOMMU > > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c > index 5240f6acad64..bae741aa65e9 100644 > --- a/arch/arm64/mm/dma-mapping.c > +++ b/arch/arm64/mm/dma-mapping.c > @@ -13,25 +13,33 @@ > #include <asm/cacheflush.h> > #include <asm/xen/xen-ops.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - unsigned long start = (unsigned long)phys_to_virt(paddr); > + dcache_clean_poc(paddr, paddr + size); > +} > > - dcache_clean_poc(start, start + size); > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + dcache_inval_poc(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void 
arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > { > - unsigned long start = (unsigned long)phys_to_virt(paddr); > + dcache_clean_inval_poc(paddr, paddr + size); > +} > > - if (dir == DMA_TO_DEVICE) > - return; > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > > - dcache_inval_poc(start, start + size); > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) > { > unsigned long start = (unsigned long)page_address(page); > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c > index c90f912e2822..9402e101b363 100644 > --- a/arch/csky/mm/dma-mapping.c > +++ b/arch/csky/mm/dma-mapping.c > @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size) > cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - cache_op(paddr, size, dma_wb_range); > - break; > - default: > - BUG(); > - } > + cache_op(paddr, size, dma_wb_range); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - return; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - cache_op(paddr, size, dma_inv_range); > - break; > - default: > - BUG(); > - } > + cache_op(paddr, size, dma_inv_range); > } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + cache_op(paddr, size, dma_wbinv_range); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + 
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c > index 882680e81a30..e6538128a75b 100644 > --- a/arch/hexagon/kernel/dma.c > +++ b/arch/hexagon/kernel/dma.c > @@ -9,29 +9,33 @@ > #include <linux/memblock.h> > #include <asm/page.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - void *addr = phys_to_virt(paddr); > - > - switch (dir) { > - case DMA_TO_DEVICE: > - hexagon_clean_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - case DMA_FROM_DEVICE: > - hexagon_inv_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - case DMA_BIDIRECTIONAL: > - flush_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - default: > - BUG(); > - } > + hexagon_clean_dcache_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) > +{ > + hexagon_inv_dcache_range(paddr, paddr + size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size) > +{ > + hexagon_flush_dcache_range(paddr, paddr + size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > /* > * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create > * DMA coherent space. Use that for the pool. 
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c > index 2e192a5df949..aa9b434e6df8 100644 > --- a/arch/m68k/kernel/dma.c > +++ b/arch/m68k/kernel/dma.c > @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, > > #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */ > > -void arch_sync_dma_for_device(phys_addr_t handle, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_BIDIRECTIONAL: > - case DMA_TO_DEVICE: > - cache_push(handle, size); > - break; > - case DMA_FROM_DEVICE: > - cache_clear(handle, size); > - break; > - default: > - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n", > - dir); > - break; > - } > + /* > + * cache_push() always invalidates in addition to cleaning > + * write-back caches. > + */ > + cache_push(paddr, size); > +} > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + cache_clear(paddr, size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + cache_push(paddr, size); > } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c > index b4c4e45fd45e..01110d4aa5b0 100644 > --- a/arch/microblaze/kernel/dma.c > +++ b/arch/microblaze/kernel/dma.c > @@ -14,32 +14,30 @@ > #include <linux/bug.h> > #include <asm/cacheflush.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_TO_DEVICE: > - case DMA_BIDIRECTIONAL: > - flush_dcache_range(paddr, paddr + size); > - break; > - case DMA_FROM_DEVICE: > - 
invalidate_dcache_range(paddr, paddr + size); > - break; > - default: > - BUG(); > - } > + /* writeback plus invalidate, could be a nop on WT caches */ > + flush_dcache_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_TO_DEVICE: > - break; > - case DMA_BIDIRECTIONAL: > - case DMA_FROM_DEVICE: > - invalidate_dcache_range(paddr, paddr + size); > - break; > - default: > - BUG(); > - }} > + invalidate_dcache_range(paddr, paddr + size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + flush_dcache_range(paddr, paddr + size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c > index b9d68bcc5d53..902d4b7c1f85 100644 > --- a/arch/mips/mm/dma-noncoherent.c > +++ b/arch/mips/mm/dma-noncoherent.c > @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, > } while (left); > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_sync_phys(paddr, size, _dma_cache_wback); > - break; > - case DMA_FROM_DEVICE: > - dma_sync_phys(paddr, size, _dma_cache_inv); > - break; > - case DMA_BIDIRECTIONAL: > - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > - cpu_needs_post_dma_flush()) > - dma_sync_phys(paddr, size, _dma_cache_wback); > - else > - dma_sync_phys(paddr, size, _dma_cache_wback_inv); > - break; > - default: > - break; > - } > + dma_sync_phys(paddr, size, 
_dma_cache_wback); > } > > -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - if (cpu_needs_post_dma_flush()) > - dma_sync_phys(paddr, size, _dma_cache_inv); > - break; > - default: > - break; > - } > + dma_sync_phys(paddr, size, _dma_cache_inv); > } > -#endif > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + dma_sync_phys(paddr, size, _dma_cache_wback_inv); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > + cpu_needs_post_dma_flush(); > +} > + > +#include <linux/dma-sync.h> > > #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > - const struct iommu_ops *iommu, bool coherent) > + const struct iommu_ops *iommu, bool coherent) > { > - dev->dma_coherent = coherent; > + dev->dma_coherent = coherent; > } > #endif > diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c > index fd887d5f3f9a..29978970955e 100644 > --- a/arch/nios2/mm/dma-mapping.c > +++ b/arch/nios2/mm/dma-mapping.c > @@ -13,53 +13,46 @@ > #include <linux/types.h> > #include <linux/mm.h> > #include <linux/string.h> > +#include <linux/dma-map-ops.h> > #include <linux/dma-mapping.h> > #include <linux/io.h> > #include <linux/cache.h> > #include <asm/cacheflush.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > + /* > + * We just need to write back the caches here, but Nios2 flush > + * instruction will do 
both writeback and invalidate. > + */ > void *vaddr = phys_to_virt(paddr); > + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); > +} > > - switch (dir) { > - case DMA_FROM_DEVICE: > - invalidate_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - case DMA_TO_DEVICE: > - /* > - * We just need to flush the caches here , but Nios2 flush > - * instruction will do both writeback and invalidate. > - */ > - case DMA_BIDIRECTIONAL: /* flush and invalidate */ > - flush_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - default: > - BUG(); > - } > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + unsigned long vaddr = (unsigned long)phys_to_virt(paddr); > + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size)); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > { > void *vaddr = phys_to_virt(paddr); > + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > - switch (dir) { > - case DMA_BIDIRECTIONAL: > - case DMA_FROM_DEVICE: > - invalidate_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - case DMA_TO_DEVICE: > - break; > - default: > - BUG(); > - } > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) > { > unsigned long start = (unsigned long)page_address(page); > diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c > index 91a00d09ffad..aba2258e62eb 100644 > --- a/arch/openrisc/kernel/dma.c > +++ b/arch/openrisc/kernel/dma.c > @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size) > 
mmap_write_unlock(&init_mm); > } > > -void arch_sync_dma_for_device(phys_addr_t addr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > unsigned long cl; > struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > > - switch (dir) { > - case DMA_TO_DEVICE: > - /* Write back the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBWR, cl); > - break; > - case DMA_FROM_DEVICE: > - /* Invalidate the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBIR, cl); > - break; > - case DMA_BIDIRECTIONAL: > - /* Flush the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBFR, cl); > - break; > - default: > - break; > - } > + /* Write back the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBWR, cl); > } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + unsigned long cl; > + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > + > + /* Invalidate the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBIR, cl); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + unsigned long cl; > + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > + > + /* Flush the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBFR, cl); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include 
<linux/dma-sync.h> > diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c > index 6d3d3cffb316..a7955aab8ce2 100644 > --- a/arch/parisc/kernel/pci-dma.c > +++ b/arch/parisc/kernel/pci-dma.c > @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, > free_pages((unsigned long)__va(dma_handle), order); > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > unsigned long virt = (unsigned long)phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - clean_kernel_dcache_range(virt, size); > - break; > - case DMA_FROM_DEVICE: > - clean_kernel_dcache_range(virt, size); > - break; > - case DMA_BIDIRECTIONAL: > - flush_kernel_dcache_range(virt, size); > - break; > - } > + clean_kernel_dcache_range(virt, size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > unsigned long virt = (unsigned long)phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - purge_kernel_dcache_range(virt, size); > - break; > - } > + purge_kernel_dcache_range(virt, size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + unsigned long virt = (unsigned long)phys_to_virt(paddr); > + > + flush_kernel_dcache_range(virt, size); > } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c > index 00e59a4faa2b..268510c71156 100644 > --- a/arch/powerpc/mm/dma-noncoherent.c > +++ 
b/arch/powerpc/mm/dma-noncoherent.c > @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) > #endif > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > __dma_phys_op(start, end, DMA_CACHE_CLEAN); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_NONE: > - BUG(); > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - __dma_phys_op(start, end, DMA_CACHE_INVAL); > - break; > - } > + __dma_phys_op(start, end, DMA_CACHE_INVAL); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + __dma_phys_op(start, end, DMA_CACHE_FLUSH); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) > { > unsigned long kaddr = (unsigned long)page_address(page); > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > index 69c80b2155a1..b9a9f57e02be 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -12,43 +12,40 @@ > > static bool noncoherent_supported; > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > void *vaddr = phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > - case DMA_FROM_DEVICE: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - 
break; > - case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > - default: > - break; > - } > + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > void *vaddr = phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > - break; > - default: > - break; > - } > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + void *vaddr = phys_to_virt(paddr); > + > + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, size_t size) > { > void *flush_addr = page_address(page); > diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c > index 6a44c0e7ba40..41f031ae7609 100644 > --- a/arch/sh/kernel/dma-coherent.c > +++ b/arch/sh/kernel/dma-coherent.c > @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size) > __flush_purge_region(page_address(page), size); > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > > - switch (dir) { > - case DMA_FROM_DEVICE: /* invalidate only */ > - __flush_invalidate_region(addr, size); > - break; > - case DMA_TO_DEVICE: /* writeback only */ > - __flush_wback_region(addr, size); > - 
break; > - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ > - __flush_purge_region(addr, size); > - break; > - default: > - BUG(); > - } > + __flush_wback_region(addr, size); > } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > + > + __flush_invalidate_region(addr, size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > + > + __flush_purge_region(addr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c > index 4f3d26066ec2..6926ead2f208 100644 > --- a/arch/sparc/kernel/ioport.c > +++ b/arch/sparc/kernel/ioport.c > @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport); > > #endif /* CONFIG_SBUS */ > > -/* > - * IIep is write-through, not flushing on cpu to device transfer. > - * > - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to > - * make DMA to cacheable memory coherent. > - */ > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE && > - sparc_cpu_model == sparc_leon && > + /* IIep is write-through, not flushing on cpu to device transfer. */ > +} > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + /* > + * On LEON systems without cache snooping, the entire D-CACHE must be > + * flushed to make DMA to cacheable memory coherent. 
> +	 */
> +	if (sparc_cpu_model == sparc_leon &&
>  	    !sparc_leon3_snooping_enabled())
>  		leon_flush_dcache_all();
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	arch_dma_cache_inv(paddr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  #ifdef CONFIG_PROC_FS
>
>  static int sparc_io_proc_show(struct seq_file *m, void *v)
> diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
> index ff3bf015eca4..d4ff96585545 100644
> --- a/arch/xtensa/kernel/pci-dma.c
> +++ b/arch/xtensa/kernel/pci-dma.c
> @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
>  		}
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		do_cache_op(paddr, size, __flush_dcache_range);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		do_cache_op(paddr, size, __invalidate_dcache_range);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> -		break;
> -	default:
> -		break;
> -	}
> +	do_cache_op(paddr, size, __flush_dcache_range);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __invalidate_dcache_range);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	__invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
> new file mode 100644
> index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument, inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          |   map          ==  for_device     |   unmap     ==  for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   |   writeback        writeback      |   none          none
> + * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> + * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> + *
> + *     [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		/*
> +		 * This may be an empty function on write-through caches,
> +		 * and it might invalidate the cache if an architecture has
> +		 * a write-back cache but no way to write it back without
> +		 * invalidating
> +		 */
> +		arch_dma_cache_wback(paddr, size);
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +		/*
> +		 * FIXME: this should be handled the same across all
> +		 * architectures, see
> +		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> +		 */
> +		if (!arch_sync_dma_clean_before_fromdevice()) {
> +			arch_dma_cache_inv(paddr, size);
> +			break;
> +		}
> +		fallthrough;
> +
> +	case DMA_BIDIRECTIONAL:
> +		/* Skip the invalidate here if it's done later */
> +		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +		    arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_wback(paddr, size);
> +		else
> +			arch_dma_cache_wback_inv(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{
> +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> +	unsigned long pfn = PFN_UP(paddr);
> +	unsigned long off = paddr & (PAGE_SIZE - 1);
> +	size_t left = size;
> +
> +	if (off)
> +		left -= PAGE_SIZE - off;
> +
> +	while (left >= PAGE_SIZE) {
> +		struct page *page = pfn_to_page(pfn++);
> +		set_bit(PG_dcache_clean, &page->flags);
> +		left -= PAGE_SIZE;
> +	}
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> +		if (arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_inv(paddr, size);
> +
> +		if (size > PAGE_SIZE)
> +			arch_dma_mark_dcache_clean(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +#endif
> --
> 2.39.2
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
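The direction handling in the new dma-sync.h can be exercised outside the kernel. Below is a minimal user-space sketch, not kernel code: it assumes the per-architecture knobs can be modelled as plain booleans, and `struct arch_model`, `enum cache_op`, `for_device_op()`, `for_cpu_op()` and `print_table()` are hypothetical names invented for this illustration. It only reports which `arch_dma_cache_*()` hook each direction would reach; no cache maintenance is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Same numeric values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical stand-in for one architecture's configuration. */
struct arch_model {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;         /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
};

/* Which arch_dma_cache_*() hook a sync call ends up calling. */
enum cache_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

static const char *op_name(enum cache_op op)
{
	static const char *const names[] = { "none", "wback", "inv", "wback_inv" };
	return names[op];
}

/* Follows the switch in arch_sync_dma_for_device() from dma-sync.h. */
static enum cache_op for_device_op(const struct arch_model *a,
				   enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		return OP_WBACK;
	case DMA_FROM_DEVICE:
		if (!a->clean_before_fromdevice)
			return OP_INV;
		/* fall through */
	case DMA_BIDIRECTIONAL:
		/* Skip the invalidate here if for_cpu will do it later. */
		if (a->has_sync_for_cpu && a->cpu_needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	default:
		return OP_NONE;
	}
}

/*
 * Follows arch_sync_dma_for_cpu(); in dma-sync.h that function is only
 * built when CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is set.
 */
static enum cache_op for_cpu_op(const struct arch_model *a,
				enum dma_data_direction dir)
{
	if (!a->has_sync_for_cpu)
		return OP_NONE;
	if ((dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) &&
	    a->cpu_needs_post_dma_flush)
		return OP_INV;
	return OP_NONE;
}

/* Prints one row per direction, like the table in the dma-sync.h comment. */
static void print_table(const char *label, const struct arch_model *a)
{
	static const enum dma_data_direction dirs[] = {
		DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL,
	};
	/* Indexed by enum value. */
	static const char *const dir_names[] = { "BIDIR", "TO_DEV", "FROM_DEV" };

	printf("%s\n", label);
	for (size_t i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++) {
		enum dma_data_direction dir = dirs[i];

		assert(dir <= DMA_FROM_DEVICE); /* stays inside dir_names[] */
		printf("  %-8s for_device=%-9s for_cpu=%s\n", dir_names[dir],
		       op_name(for_device_op(a, dir)),
		       op_name(for_cpu_op(a, dir)));
	}
}
```

For example, `print_table("arc-like", &(struct arch_model){ false, true, true });` should list FROM_DEV as `inv` both before and after the transfer, and BIDIR as `wback` before and `inv` after. Setting `clean_before_fromdevice` to true, as the arm64 and riscv hunks do, turns the FROM_DEV `for_device` entry into `wback`, assuming `has_sync_for_cpu` is also true.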
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-30 14:06 ` Lad, Prabhakar
0 siblings, 0 replies; 456+ messages in thread
From: Lad, Prabhakar @ 2023-03-30 14:06 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King,
Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren,
Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 1:20 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> Now that all of these have consistent behavior, replace them with
> a single shared implementation of arch_sync_dma_for_device() and
> arch_sync_dma_for_cpu() and three parameters to pick how they should
> operate:
>
> - If the CPU has speculative prefetching, then the cache
>   has to be invalidated after a transfer from the device.
>   On the rarer CPUs without prefetching, this can be skipped,
>   with all cache management happening before the transfer.
>   This flag can be runtime detected, but is usually fixed
>   per architecture.
>
> - Some architectures currently clean the caches before DMA
>   from a device, while others invalidate it. There has not
>   been a conclusion regarding whether we should change all
>   architectures to use clean instead, so this adds an
>   architecture specific flag that we can change later on.
>
> - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
>   track of pages that are marked clean in the page cache, to
>   avoid flushing them again. The implementation for this is
>   generic enough to work on all architectures that use the
>   PG_dcache_clean page flag, but a Kconfig symbol is used
>   to only enable it on Arm to preserve the existing behavior.
>
> For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
> over 'flush', to avoid any ambiguity of what the helper functions are
> supposed to do.
>
> Moving the global functions into a header file is usually a bad idea
> as it prevents the header from being included more than once, but it
> helps keep the behavior as close as possible to the previous state,
> including the possibility of inlining most of it into these functions
> where that was done before. This also helps keep the global namespace
> clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
> device drivers that might use them incorrectly.
>
> It would be possible to do this one architecture at a time, but
> as the change is the same everywhere, the combined patch helps
> explain it better once.
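The PG_dcache_clean bookkeeping mentioned in the third bullet of the commit message only ever marks pages that the buffer covers completely. As a rough illustration, this user-space sketch repeats the arithmetic of arch_dma_mark_dcache_clean() from the new include/linux/dma-sync.h under the assumption of 4 KiB pages; mark_clean_range() and the MODEL_* macros are hypothetical names, and no struct page is touched.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages for this sketch; the kernel's PAGE_SIZE varies. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE ((uint64_t)1 << MODEL_PAGE_SHIFT)
/* Round up to the next page frame number, like the kernel's PFN_UP(). */
#define MODEL_PFN_UP(x) (((x) + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT)

/*
 * Repeats the arithmetic of arch_dma_mark_dcache_clean(): returns the
 * first pfn that would get PG_dcache_clean and stores the number of
 * consecutive pages that would be marked in *npages.
 */
static uint64_t mark_clean_range(uint64_t paddr, uint64_t size,
				 uint64_t *npages)
{
	uint64_t pfn = MODEL_PFN_UP(paddr);
	uint64_t off = paddr & (MODEL_PAGE_SIZE - 1);
	uint64_t left = size;

	/*
	 * arch_sync_dma_for_cpu() only calls the helper for
	 * size > PAGE_SIZE, which keeps the subtraction below from wrapping.
	 */
	assert(size > MODEL_PAGE_SIZE);

	if (off)
		left -= MODEL_PAGE_SIZE - off;	/* partial head page is skipped */

	/* The kernel's while loop marks one page per full PAGE_SIZE left. */
	*npages = left / MODEL_PAGE_SIZE;	/* partial tail page is skipped */
	return pfn;
}
```

With paddr = 0x1800 and size = 0x3000, for instance, the partial head page (pfn 1) and the partial tail page (pfn 4) are skipped, so only pfns 2 and 3 would be marked.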
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arc/mm/dma.c                 |  66 +++++-------------
>  arch/arm/Kconfig                  |   3 +
>  arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
>  arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
>  arch/arm64/mm/dma-mapping.c       |  28 +++++---
>  arch/csky/mm/dma-mapping.c        |  44 ++++++------
>  arch/hexagon/kernel/dma.c         |  44 ++++++------
>  arch/m68k/kernel/dma.c            |  43 +++++++-----
>  arch/microblaze/kernel/dma.c      |  48 +++++++-------
>  arch/mips/mm/dma-noncoherent.c    |  60 +++++----------
>  arch/nios2/mm/dma-mapping.c       |  57 +++++--------
>  arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
>  arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
>  arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
>  arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
>  arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
>  arch/sparc/kernel/ioport.c        |  38 ++++++++---
>  arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
>  include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
>  19 files changed, 527 insertions(+), 391 deletions(-)
>  create mode 100644 include/linux/dma-sync.h
>
I tested this on RZ/Five (with my v6 [0] + additional changes) so for RISC-V,

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

[0] https://patchwork.kernel.org/project/linux-renesas-soc/cover/20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com/

Cheers,
Prabhakar

> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	dma_cache_wback_inv(page_to_phys(page), size);
>  }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device     |   unmap     ==  for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback      |   none          none
> - * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> - *
> - *     [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_wback(paddr, size);
> -		break;
> -
> -	case DMA_FROM_DEVICE:
> -		dma_cache_inv(paddr, size);
> -		break;
> -
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_wback(paddr, size);
> -		break;
> +	dma_cache_wback(paddr, size);
> +}
>
> -	default:
> -		break;
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> +	dma_cache_wback_inv(paddr, size);
> +}
>
> -	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_inv(paddr, size);
> -		break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	default:
> -		break;
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Plug in direct dma map ops.
> */ > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 125d58c54ab1..0de84e861027 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT > bool > default y > > +config ARCH_DMA_MARK_DCACHE_CLEAN > + def_bool y > + > config ARCH_HAS_ILOG2_U32 > bool > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c > index 12b5c6ae93fc..0817274aed15 100644 > --- a/arch/arm/mm/dma-mapping-nommu.c > +++ b/arch/arm/mm/dma-mapping-nommu.c > @@ -13,27 +13,36 @@ > > #include "dma.h" > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir == DMA_FROM_DEVICE) { > - dmac_inv_range(__va(paddr), __va(paddr + size)); > - outer_inv_range(paddr, paddr + size); > - } else { > - dmac_clean_range(__va(paddr), __va(paddr + size)); > - outer_clean_range(paddr, paddr + size); > - } > + dmac_clean_range(__va(paddr), __va(paddr + size)); > + outer_clean_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE) { > - outer_inv_range(paddr, paddr + size); > - dmac_inv_range(__va(paddr), __va(paddr)); > - } > + dmac_inv_range(__va(paddr), __va(paddr + size)); > + outer_inv_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + dmac_flush_range(__va(paddr), __va(paddr + size)); > + outer_flush_range(paddr, paddr + size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > const struct 
iommu_ops *iommu, bool coherent) > { > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index b703cb83d27e..aa6ee820a0ab 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size) > } > } > > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_clean_range); > + outer_clean_range(paddr, paddr + size); > +} > + > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_inv_range); > + outer_inv_range(paddr, paddr + size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_flush_range); > + outer_flush_range(paddr, paddr + size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > { > if (IS_ENABLED(CONFIG_CPU_V6) || > @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > return false; > } > > -/* > - * Make an area consistent for devices. > - * Note: Drivers should NOT use this function directly. 
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*) > - */ > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > -{ > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_cache_maint(paddr, size, dmac_clean_range); > - outer_clean_range(paddr, paddr + size); > - break; > - case DMA_FROM_DEVICE: > - dma_cache_maint(paddr, size, dmac_inv_range); > - outer_inv_range(paddr, paddr + size); > - break; > - case DMA_BIDIRECTIONAL: > - if (arch_sync_dma_cpu_needs_post_dma_flush()) { > - dma_cache_maint(paddr, size, dmac_clean_range); > - outer_clean_range(paddr, paddr + size); > - } else { > - dma_cache_maint(paddr, size, dmac_flush_range); > - outer_flush_range(paddr, paddr + size); > - } > - break; > - default: > - break; > - } > -} > - > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > -{ > - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) { > - outer_inv_range(paddr, paddr + size); > - dma_cache_maint(paddr, size, dmac_inv_range); > - } > -} > +#include <linux/dma-sync.h> > > #ifdef CONFIG_ARM_DMA_USE_IOMMU > > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c > index 5240f6acad64..bae741aa65e9 100644 > --- a/arch/arm64/mm/dma-mapping.c > +++ b/arch/arm64/mm/dma-mapping.c > @@ -13,25 +13,33 @@ > #include <asm/cacheflush.h> > #include <asm/xen/xen-ops.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - unsigned long start = (unsigned long)phys_to_virt(paddr); > + dcache_clean_poc(paddr, paddr + size); > +} > > - dcache_clean_poc(start, start + size); > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + dcache_inval_poc(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void 
arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > { > - unsigned long start = (unsigned long)phys_to_virt(paddr); > + dcache_clean_inval_poc(paddr, paddr + size); > +} > > - if (dir == DMA_TO_DEVICE) > - return; > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > > - dcache_inval_poc(start, start + size); > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) > { > unsigned long start = (unsigned long)page_address(page); > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c > index c90f912e2822..9402e101b363 100644 > --- a/arch/csky/mm/dma-mapping.c > +++ b/arch/csky/mm/dma-mapping.c > @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size) > cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - cache_op(paddr, size, dma_wb_range); > - break; > - default: > - BUG(); > - } > + cache_op(paddr, size, dma_wb_range); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - return; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - cache_op(paddr, size, dma_inv_range); > - break; > - default: > - BUG(); > - } > + cache_op(paddr, size, dma_inv_range); > } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + cache_op(paddr, size, dma_wbinv_range); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + 
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c > index 882680e81a30..e6538128a75b 100644 > --- a/arch/hexagon/kernel/dma.c > +++ b/arch/hexagon/kernel/dma.c > @@ -9,29 +9,33 @@ > #include <linux/memblock.h> > #include <asm/page.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - void *addr = phys_to_virt(paddr); > - > - switch (dir) { > - case DMA_TO_DEVICE: > - hexagon_clean_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - case DMA_FROM_DEVICE: > - hexagon_inv_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - case DMA_BIDIRECTIONAL: > - flush_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - default: > - BUG(); > - } > + hexagon_clean_dcache_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) > +{ > + hexagon_inv_dcache_range(paddr, paddr + size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size) > +{ > + hexagon_flush_dcache_range(paddr, paddr + size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > /* > * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create > * DMA coherent space. Use that for the pool. 
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
>  #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_TO_DEVICE:
> -		cache_push(handle, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		cache_clear(handle, size);
> -		break;
> -	default:
> -		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> -				   dir);
> -		break;
> -	}
> +	/*
> +	 * cache_push() always invalidates in addition to cleaning
> +	 * write-back caches.
> +	 */
> +	cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_push(paddr, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
>  #include <linux/bug.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	/* writeback plus invalidate, could be a nop on WT caches */
> +	flush_dcache_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}}
> +	invalidate_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
>  	} while (left);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_wback);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> -		    cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_wback);
> -		else
> -			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_wback);
>  }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		if (cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_inv);
>  }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +	       cpu_needs_post_dma_flush();
> +}
> +
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> -		const struct iommu_ops *iommu, bool coherent)
> +			const struct iommu_ops *iommu, bool coherent)
>  {
> -	dev->dma_coherent = coherent;
> +	dev->dma_coherent = coherent;
>  }
>  #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
>  #include <linux/types.h>
>  #include <linux/mm.h>
>  #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/cache.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	/*
> +	 * We just need to write back the caches here, but Nios2 flush
> +	 * instruction will do both writeback and invalidate.
> +	 */
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		/*
> -		 * We just need to flush the caches here , but Nios2 flush
> -		 * instruction will do both writeback and invalidate.
> -		 */
> -	case DMA_BIDIRECTIONAL: /* flush and invalidate */
> -		flush_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> +	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  	mmap_write_unlock(&init_mm);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long cl;
>  	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		/* Write back the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBWR, cl);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		/* Invalidate the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBIR, cl);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		/* Flush the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBFR, cl);
> -		break;
> -	default:
> -		break;
> -	}
> +	/* Write back the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBWR, cl);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Invalidate the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Flush the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>  	free_pages((unsigned long)__va(dma_handle), order);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	clean_kernel_dcache_range(virt, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		purge_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	purge_kernel_dcache_range(virt, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> +	flush_kernel_dcache_range(virt, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
>  #endif
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_NONE:
> -		BUG();
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	}
> +	__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
>  static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *vaddr = phys_to_virt(paddr);
> +
> +	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	void *flush_addr = page_address(page);
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	__flush_purge_region(page_address(page), size);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE: /* invalidate only */
> -		__flush_invalidate_region(addr, size);
> -		break;
> -	case DMA_TO_DEVICE: /* writeback only */
> -		__flush_wback_region(addr, size);
> -		break;
> -	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
> -		__flush_purge_region(addr, size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	__flush_wback_region(addr, size);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_invalidate_region(addr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
>  #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE &&
> -	    sparc_cpu_model == sparc_leon &&
> +	/* IIep is write-through, not flushing on cpu to device transfer. */
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	/*
> +	 * On LEON systems without cache snooping, the entire D-CACHE must be
> +	 * flushed to make DMA to cacheable memory coherent.
> +	 */
> +	if (sparc_cpu_model == sparc_leon &&
>  	    !sparc_leon3_snooping_enabled())
>  		leon_flush_dcache_all();
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	arch_dma_cache_inv(paddr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  #ifdef CONFIG_PROC_FS
>
>  static int sparc_io_proc_show(struct seq_file *m, void *v)
> diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
> index ff3bf015eca4..d4ff96585545 100644
> --- a/arch/xtensa/kernel/pci-dma.c
> +++ b/arch/xtensa/kernel/pci-dma.c
> @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
>  	}
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		do_cache_op(paddr, size, __flush_dcache_range);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		do_cache_op(paddr, size, __invalidate_dcache_range);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> -		break;
> -	default:
> -		break;
> -	}
> +	do_cache_op(paddr, size, __flush_dcache_range);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __invalidate_dcache_range);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	__invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
> new file mode 100644
> index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument, inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          |   map          ==  for_device     |   unmap     ==  for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   |   writeback        writeback      |   none          none
> + * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> + * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> + *
> + *     [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		/*
> +		 * This may be an empty function on write-through caches,
> +		 * and it might invalidate the cache if an architecture has
> +		 * a write-back cache but no way to write it back without
> +		 * invalidating
> +		 */
> +		arch_dma_cache_wback(paddr, size);
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +		/*
> +		 * FIXME: this should be handled the same across all
> +		 * architectures, see
> +		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> +		 */
> +		if (!arch_sync_dma_clean_before_fromdevice()) {
> +			arch_dma_cache_inv(paddr, size);
> +			break;
> +		}
> +		fallthrough;
> +
> +	case DMA_BIDIRECTIONAL:
> +		/* Skip the invalidate here if it's done later */
> +		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +		    arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_wback(paddr, size);
> +		else
> +			arch_dma_cache_wback_inv(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{
> +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> +	unsigned long pfn = PFN_UP(paddr);
> +	unsigned long off = paddr & (PAGE_SIZE - 1);
> +	size_t left = size;
> +
> +	if (off)
> +		left -= PAGE_SIZE - off;
> +
> +	while (left >= PAGE_SIZE) {
> +		struct page *page = pfn_to_page(pfn++);
> +		set_bit(PG_dcache_clean, &page->flags);
> +		left -= PAGE_SIZE;
> +	}
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> +		if (arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_inv(paddr, size);
> +
> +		if (size > PAGE_SIZE)
> +			arch_dma_mark_dcache_clean(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +#endif
> --
> 2.39.2
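To make the decision table in the `include/linux/dma-sync.h` header quoted above easier to follow, here is a small user-space C model of it. It is a sketch, not kernel code: it only reports which of the three per-architecture helpers (`arch_dma_cache_wback()`, `arch_dma_cache_inv()`, `arch_dma_cache_wback_inv()`) the generic `arch_sync_dma_for_device()` and `arch_sync_dma_for_cpu()` would call for a given direction and flag combination. The names `enum cache_op`, `enum dma_dir`, `struct arch_flags`, `model_for_device()` and `model_for_cpu()` are hypothetical and exist only in this sketch; `has_sync_for_cpu` stands in for `IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU)`, and the `PG_dcache_clean` page marking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three helpers each architecture provides. */
enum cache_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* Hypothetical stand-in for the three valid dma_data_direction values. */
enum dma_dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };

/* The two per-arch hooks plus the Kconfig switch, modelled as plain flags. */
struct arch_flags {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;         /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU (assumed) */
};

/* Which helper the generic arch_sync_dma_for_device() would invoke. */
static enum cache_op model_for_device(enum dma_dir dir, struct arch_flags f)
{
	switch (dir) {
	case TO_DEVICE:
		return OP_WBACK;
	case FROM_DEVICE:
		if (!f.clean_before_fromdevice)
			return OP_INV;
		/* fall through: handled like BIDIRECTIONAL */
	case BIDIRECTIONAL:
		/* skip the invalidate if the for_cpu side will do it later */
		if (f.has_sync_for_cpu && f.cpu_needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	}
	return OP_NONE;
}

/* Which helper the generic arch_sync_dma_for_cpu() would invoke. */
static enum cache_op model_for_cpu(enum dma_dir dir, struct arch_flags f)
{
	/* without the Kconfig symbol the for_cpu function is not built at all */
	if (!f.has_sync_for_cpu || dir == TO_DEVICE)
		return OP_NONE;
	return f.cpu_needs_post_dma_flush ? OP_INV : OP_NONE;
}
```

With both hooks returning true, as in the arm64, csky, parisc, powerpc and riscv hunks, and assuming the for_cpu side is built, a `DMA_FROM_DEVICE` buffer is written back before the transfer and invalidated after it. With both hooks returning false, as in the m68k, openrisc, sh and xtensa hunks, the single invalidate happens before the transfer and nothing is done afterwards.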
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-30 14:06 ` Lad, Prabhakar
From: Lad, Prabhakar @ 2023-03-30 14:06 UTC
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

On Mon, Mar 27, 2023 at 1:20 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> Now that all of these have consistent behavior, replace them with
> a single shared implementation of arch_sync_dma_for_device() and
> arch_sync_dma_for_cpu() and three parameters to pick how they should
> operate:
>
>  - If the CPU has speculative prefetching, then the cache
>    has to be invalidated after a transfer from the device.
>    On the rarer CPUs without prefetching, this can be skipped,
>    with all cache management happening before the transfer.
>    This flag can be runtime detected, but is usually fixed
>    per architecture.
>
>  - Some architectures currently clean the caches before DMA
>    from a device, while others invalidate it. There has not
>    been a conclusion regarding whether we should change all
>    architectures to use clean instead, so this adds an
>    architecture specific flag that we can change later on.
>
>  - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
>    track pages that are marked clean in the page cache, to
>    avoid flushing them again. The implementation for this is
>    generic enough to work on all architectures that use the
>    PG_dcache_clean page flag, but a Kconfig symbol is used
>    to only enable it on Arm to preserve the existing behavior.
>
> For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
> over 'flush', to avoid any ambiguity of what the helper functions are
> supposed to do.
>
> Moving the global functions into a header file is usually a bad idea
> as it prevents the header from being included more than once, but it
> helps keep the behavior as close as possible to the previous state,
> including the possibility of inlining most of it into these functions
> where that was done before. This also helps keep the global namespace
> clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
> device drivers that might use them incorrectly.
>
> It would be possible to do this one architecture at a time, but
> as the change is the same everywhere, the combined patch helps
> explain it better once.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arc/mm/dma.c                 |  66 +++++-------------
>  arch/arm/Kconfig                  |   3 +
>  arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
>  arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
>  arch/arm64/mm/dma-mapping.c       |  28 +++++---
>  arch/csky/mm/dma-mapping.c        |  44 ++++++------
>  arch/hexagon/kernel/dma.c         |  44 ++++++------
>  arch/m68k/kernel/dma.c            |  43 +++++++-----
>  arch/microblaze/kernel/dma.c      |  48 +++++++-------
>  arch/mips/mm/dma-noncoherent.c    |  60 +++++----------
>  arch/nios2/mm/dma-mapping.c       |  57 +++++---------
>  arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
>  arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
>  arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
>  arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
>  arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
>  arch/sparc/kernel/ioport.c        |  38 ++++++++---
>  arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
>  include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
>  19 files changed, 527 insertions(+), 391 deletions(-)
>  create mode 100644 include/linux/dma-sync.h
>

I tested this on RZ/Five (with my v6 [0] + additional changes) so for RISC-V,

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

[0] https://patchwork.kernel.org/project/linux-renesas-soc/cover/20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com/

Cheers,
Prabhakar

> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	dma_cache_wback_inv(page_to_phys(page), size);
>  }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device     |   unmap     ==  for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback      |   none          none
> - * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> - *
> - *     [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_wback(paddr, size);
> -		break;
> -
> -	case DMA_FROM_DEVICE:
> -		dma_cache_inv(paddr, size);
> -		break;
> -
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_wback(paddr, size);
> -		break;
> +	dma_cache_wback(paddr, size);
> +}
>
> -	default:
> -		break;
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> +	dma_cache_wback_inv(paddr, size);
> +}
>
> -	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_inv(paddr, size);
> -		break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	default:
> -		break;
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Plug in direct dma map ops.
>   */
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 125d58c54ab1..0de84e861027 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
>  	bool
>  	default y
>
> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y
> +
>  config ARCH_HAS_ILOG2_U32
>  	bool
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 12b5c6ae93fc..0817274aed15 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -13,27 +13,36 @@
>
>  #include "dma.h"
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir == DMA_FROM_DEVICE) {
> -		dmac_inv_range(__va(paddr), __va(paddr + size));
> -		outer_inv_range(paddr, paddr + size);
> -	} else {
> -		dmac_clean_range(__va(paddr), __va(paddr + size));
> -		outer_clean_range(paddr, paddr + size);
> -	}
> +	dmac_clean_range(__va(paddr), __va(paddr + size));
> +	outer_clean_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE) {
> -		outer_inv_range(paddr, paddr + size);
> -		dmac_inv_range(__va(paddr), __va(paddr));
> -	}
> +	dmac_inv_range(__va(paddr), __va(paddr + size));
> +	outer_inv_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dmac_flush_range(__va(paddr), __va(paddr + size));
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index b703cb83d27e..aa6ee820a0ab 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  	}
>  }
>
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_clean_range);
> +	outer_clean_range(paddr, paddr + size);
> +}
> +
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_inv_range);
> +	outer_inv_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_flush_range);
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
>  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  {
>  	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  	return false;
>  }
>
> -/*
> - * Make an area consistent for devices.
> - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_clean_range);
> -		outer_clean_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -		outer_inv_range(paddr, paddr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> -			dma_cache_maint(paddr, size, dmac_clean_range);
> -			outer_clean_range(paddr, paddr + size);
> -		} else {
> -			dma_cache_maint(paddr, size, dmac_flush_range);
> -			outer_flush_range(paddr, paddr + size);
> -		}
> -		break;
> -	default:
> -		break;
> -	}
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> -		outer_inv_range(paddr, paddr + size);
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -	}
> -}
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
>  #include <asm/cacheflush.h>
>  #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_poc(paddr, paddr + size);
> +}
>
> -	dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dcache_inval_poc(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_inval_poc(paddr, paddr + size);
> +}
>
> -	if (dir == DMA_TO_DEVICE)
> -		return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
>
> -	dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_wb_range);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		return;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_inv_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_inv_range);
>  }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_op(paddr, size, dma_wbinv_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
>  #include <linux/memblock.h>
>  #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	void *addr = phys_to_virt(paddr);
> -
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		hexagon_clean_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		hexagon_inv_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	hexagon_clean_dcache_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_inv_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
>   * DMA coherent space. Use that for the pool.
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
>  #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_TO_DEVICE:
> -		cache_push(handle, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		cache_clear(handle, size);
> -		break;
> -	default:
> -		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> -				   dir);
> -		break;
> -	}
> +	/*
> +	 * cache_push() always invalidates in addition to cleaning
> +	 * write-back caches.
> +	 */
> +	cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_push(paddr, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
>  #include <linux/bug.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	/* writeback plus invalidate, could be a nop on WT caches */
> +	flush_dcache_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}}
> +	invalidate_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
>  	} while (left);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_wback);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> -		    cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_wback);
> -		else
> -			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_wback);
>  }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		if (cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_inv);
>  }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +		cpu_needs_post_dma_flush();
> +}
> +
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> -		const struct iommu_ops *iommu, bool coherent)
> +		const struct iommu_ops *iommu, bool coherent)
>  {
> -	dev->dma_coherent = coherent;
> +	dev->dma_coherent = coherent;
>  }
>  #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
>  #include <linux/types.h>
>  #include <linux/mm.h>
>  #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/cache.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	/*
> +	 * We just need to write back the caches here, but Nios2 flush
> +	 * instruction will do both writeback and invalidate.
> +	 */
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		/*
> -		 * We just need to flush the caches here , but Nios2 flush
> -		 * instruction will do both writeback and invalidate.
> -		 */
> -	case DMA_BIDIRECTIONAL: /* flush and invalidate */
> -		flush_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> +	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  	mmap_write_unlock(&init_mm);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long cl;
>  	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		/* Write back the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBWR, cl);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		/* Invalidate the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBIR, cl);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		/* Flush the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBFR, cl);
> -		break;
> -	default:
> -		break;
> -	}
> +	/* Write back the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBWR, cl);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Invalidate the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Flush the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>  	free_pages((unsigned long)__va(dma_handle), order);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	clean_kernel_dcache_range(virt, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		purge_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	purge_kernel_dcache_range(virt, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> +	flush_kernel_dcache_range(virt, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
>  #endif
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_NONE:
> -		BUG();
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	}
> +	__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
>  static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *vaddr = phys_to_virt(paddr);
> +
> +	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	void *flush_addr = page_address(page);
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	__flush_purge_region(page_address(page), size);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE: /* invalidate only */
> -		__flush_invalidate_region(addr, size);
> -		break;
> -	case DMA_TO_DEVICE: /* writeback only */
> -		__flush_wback_region(addr, size);
> -		break;
> -	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
> -		__flush_purge_region(addr, size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	__flush_wback_region(addr, size);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_invalidate_region(addr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
>  #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE &&
> -	    sparc_cpu_model == sparc_leon &&
> +	/* IIep is write-through, not flushing on cpu to device transfer. */
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	/*
> +	 * On LEON systems without cache snooping, the entire D-CACHE must be
> +	 * flushed to make DMA to cacheable memory coherent.
> +	 */
> +	if (sparc_cpu_model == sparc_leon &&
>  	    !sparc_leon3_snooping_enabled())
>  		leon_flush_dcache_all();
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	arch_dma_cache_inv(paddr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  #ifdef CONFIG_PROC_FS
>
>  static int sparc_io_proc_show(struct seq_file *m, void *v)
> diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
> index ff3bf015eca4..d4ff96585545 100644
> --- a/arch/xtensa/kernel/pci-dma.c
> +++ b/arch/xtensa/kernel/pci-dma.c
> @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
>  	}
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		do_cache_op(paddr, size, __flush_dcache_range);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		do_cache_op(paddr, size, __invalidate_dcache_range);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> -		break;
> -	default:
> -		break;
> -	}
> +	do_cache_op(paddr, size, __flush_dcache_range);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __invalidate_dcache_range);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	__invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
> new file mode 100644
> index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument, inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          |   map          ==  for_device      |   unmap      == for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   |   writeback        writeback       |   none          none
> + * FROM_DEV |   invalidate       invalidate      |   invalidate*   invalidate*
> + * BIDIR    |   writeback        writeback       |   invalidate    invalidate
> + *
> + *     [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		/*
> +		 * This may be an empty function on write-through caches,
> +		 * and it might invalidate the cache if an architecture has
> +		 * a write-back cache but no way to write it back without
> +		 * invalidating
> +		 */
> +		arch_dma_cache_wback(paddr, size);
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +		/*
> +		 * FIXME: this should be handled the same across all
> +		 * architectures, see
> +		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> +		 */
> +		if (!arch_sync_dma_clean_before_fromdevice()) {
> +			arch_dma_cache_inv(paddr, size);
> +			break;
> +		}
> +		fallthrough;
> +
> +	case DMA_BIDIRECTIONAL:
> +		/* Skip the invalidate here if it's done later */
> +		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +		    arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_wback(paddr, size);
> +		else
> +			arch_dma_cache_wback_inv(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{
> +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> +	unsigned long pfn = PFN_UP(paddr);
> +	unsigned long off = paddr & (PAGE_SIZE - 1);
> +	size_t left = size;
> +
> +	if (off)
> +		left -= PAGE_SIZE - off;
> +
> +	while (left >= PAGE_SIZE) {
> +		struct page *page = pfn_to_page(pfn++);
> +		set_bit(PG_dcache_clean, &page->flags);
> +		left -= PAGE_SIZE;
> +	}
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> +		if (arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_inv(paddr, size);
> +
> +		if (size > PAGE_SIZE)
> +			arch_dma_mark_dcache_clean(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +#endif
> -- 
> 2.39.2
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
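The dispatch in the new include/linux/dma-sync.h reduces to a small decision table. It is driven by two per-architecture callbacks and one Kconfig symbol. What follows is a user-space sketch of that logic, written only for illustration. The enums, `struct model_arch`, `MODEL_PAGE_SIZE` (assumed to be 4096) and all `model_*` functions are hypothetical names, not kernel APIs. The sketch reports which of the three cache helpers the generic `arch_sync_dma_for_device()` and `arch_sync_dma_for_cpu()` would call. It also counts how many pages `arch_dma_mark_dcache_clean()` would flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
enum model_dir { MODEL_TO_DEVICE, MODEL_FROM_DEVICE, MODEL_BIDIRECTIONAL };
enum model_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

struct model_arch {
	bool has_sync_for_cpu;        /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
	bool clean_before_fromdevice; /* arch_sync_dma_clean_before_fromdevice() */
	bool needs_post_dma_flush;    /* arch_sync_dma_cpu_needs_post_dma_flush() */
};

/* Same decisions as arch_sync_dma_for_device() in dma-sync.h. */
static enum model_op model_for_device(struct model_arch a, enum model_dir dir)
{
	switch (dir) {
	case MODEL_TO_DEVICE:
		return OP_WBACK;
	case MODEL_FROM_DEVICE:
		if (!a.clean_before_fromdevice)
			return OP_INV;
		/* fall through: treated like a bidirectional mapping */
	case MODEL_BIDIRECTIONAL:
		/* Skip the invalidate here if the for_cpu path does it later. */
		if (a.has_sync_for_cpu && a.needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	}
	return OP_NONE;
}

/* Same decisions as arch_sync_dma_for_cpu(), which is only built
 * when CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is set. */
static enum model_op model_for_cpu(struct model_arch a, enum model_dir dir)
{
	if (!a.has_sync_for_cpu || dir == MODEL_TO_DEVICE)
		return OP_NONE;
	return a.needs_post_dma_flush ? OP_INV : OP_NONE;
}

#define MODEL_PAGE_SIZE 4096UL

/*
 * Counts the pages arch_dma_mark_dcache_clean() would flag. Only pages
 * that lie entirely inside [paddr, paddr + size) are counted.
 * *first_pfn receives PFN_UP(paddr).
 */
static size_t model_pages_marked(unsigned long paddr, size_t size,
				 unsigned long *first_pfn)
{
	unsigned long off = paddr & (MODEL_PAGE_SIZE - 1);
	size_t left = size;
	size_t n = 0;

	/* dma-sync.h only calls the helper when size > PAGE_SIZE. That
	 * check also keeps the subtraction below from wrapping around. */
	assert(size > MODEL_PAGE_SIZE);

	*first_pfn = (paddr + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	if (off)
		left -= MODEL_PAGE_SIZE - off; /* drop the partial head page */

	while (left >= MODEL_PAGE_SIZE) {
		n++;
		left -= MODEL_PAGE_SIZE;
	}
	return n;
}
```

Take the arc hunk as an example. It returns `false` for clean-before-FROM_DEVICE and `true` for the post-DMA flush. Assuming CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is also enabled, the model gives:

- invalidate before a FROM_DEVICE transfer;
- writeback before a BIDIRECTIONAL transfer;
- invalidate again in the for_cpu path.

This agrees with the table in the header comment. The page helper counts only whole pages. For example, a 12 KiB buffer starting at 0x1800 flags two pages, starting at pfn 2.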
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-03-30 14:06 ` Lad, Prabhakar
0 siblings, 0 replies; 456+ messages in thread
From: Lad, Prabhakar @ 2023-03-30 14:06 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich

On Mon, Mar 27, 2023 at 1:20 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> Now that all of these have consistent behavior, replace them with
> a single shared implementation of arch_sync_dma_for_device() and
> arch_sync_dma_for_cpu() and three parameters to pick how they should
> operate:
>
> - If the CPU has speculative prefetching, then the cache
>   has to be invalidated after a transfer from the device.
>   On the rarer CPUs without prefetching, this can be skipped,
>   with all cache management happening before the transfer.
>   This flag can be runtime detected, but is usually fixed
>   per architecture.
>
> - Some architectures currently clean the caches before DMA
>   from a device, while others invalidate it. There has not
>   been a conclusion regarding whether we should change all
>   architectures to use clean instead, so this adds an
>   architecture specific flag that we can change later on.
>
> - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
>   track pages that are marked clean in the page cache, to
>   avoid flushing them again. The implementation for this is
>   generic enough to work on all architectures that use the
>   PG_dcache_clean page flag, but a Kconfig symbol is used
>   to only enable it on Arm to preserve the existing behavior.
>
> For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
> over 'flush', to avoid any ambiguity of what the helper functions are
> supposed to do.
>
> Moving the global functions into a header file is usually a bad idea
> as it prevents the header from being included more than once, but it
> helps keep the behavior as close as possible to the previous state,
> including the possibility of inlining most of it into these functions
> where that was done before. This also helps keep the global namespace
> clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
> device drivers that might use them incorrectly.
>
> It would be possible to do this one architecture at a time, but
> as the change is the same everywhere, the combined patch helps
> explain it better once.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arc/mm/dma.c                 |  66 +++++-------------
>  arch/arm/Kconfig                  |   3 +
>  arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
>  arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
>  arch/arm64/mm/dma-mapping.c       |  28 +++++++---
>  arch/csky/mm/dma-mapping.c        |  44 ++++++------
>  arch/hexagon/kernel/dma.c         |  44 ++++++------
>  arch/m68k/kernel/dma.c            |  43 +++++++-----
>  arch/microblaze/kernel/dma.c      |  48 +++++++-------
>  arch/mips/mm/dma-noncoherent.c    |  60 +++++++----------
>  arch/nios2/mm/dma-mapping.c       |  57 +++++++---------
>  arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
>  arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
>  arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
>  arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
>  arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
>  arch/sparc/kernel/ioport.c        |  38 ++++++++---
>  arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
>  include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
>  19 files changed, 527 insertions(+), 391 deletions(-)
>  create mode 100644 include/linux/dma-sync.h
>
I tested this on RZ/Five (with my v6 [0] + additional changes) so for RISC-V,

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

[0] https://patchwork.kernel.org/project/linux-renesas-soc/cover/20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com/

Cheers,
Prabhakar

> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	dma_cache_wback_inv(page_to_phys(page), size);
>  }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device      |   unmap      == for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback       |   none          none
> - * FROM_DEV |   invalidate       invalidate      |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback       |   invalidate    invalidate
> - *
> - *     [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_wback(paddr, size);
> -		break;
> -
> -	case DMA_FROM_DEVICE:
> -		dma_cache_inv(paddr, size);
> -		break;
> -
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_wback(paddr, size);
> -		break;
> +	dma_cache_wback(paddr, size);
> +}
>
> -	default:
> -		break;
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> +	dma_cache_wback_inv(paddr, size);
> +}
>
> -	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_inv(paddr, size);
> -		break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	default:
> -		break;
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Plug in direct dma map ops.
>   */
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 125d58c54ab1..0de84e861027 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
>  	bool
>  	default y
>
> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y
> +
>  config ARCH_HAS_ILOG2_U32
>  	bool
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 12b5c6ae93fc..0817274aed15 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -13,27 +13,36 @@
>
>  #include "dma.h"
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir == DMA_FROM_DEVICE) {
> -		dmac_inv_range(__va(paddr), __va(paddr + size));
> -		outer_inv_range(paddr, paddr + size);
> -	} else {
> -		dmac_clean_range(__va(paddr), __va(paddr + size));
> -		outer_clean_range(paddr, paddr + size);
> -	}
> +	dmac_clean_range(__va(paddr), __va(paddr + size));
> +	outer_clean_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE) {
> -		outer_inv_range(paddr, paddr + size);
> -		dmac_inv_range(__va(paddr), __va(paddr));
> -	}
> +	dmac_inv_range(__va(paddr), __va(paddr + size));
> +	outer_inv_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dmac_flush_range(__va(paddr), __va(paddr + size));
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index b703cb83d27e..aa6ee820a0ab 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  	}
>  }
>
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_clean_range);
> +	outer_clean_range(paddr, paddr + size);
> +}
> +
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_inv_range);
> +	outer_inv_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_flush_range);
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
>  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  {
>  	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  	return false;
>  }
>
> -/*
> - * Make an area consistent for devices.
> - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_clean_range);
> -		outer_clean_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -		outer_inv_range(paddr, paddr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> -			dma_cache_maint(paddr, size, dmac_clean_range);
> -			outer_clean_range(paddr, paddr + size);
> -		} else {
> -			dma_cache_maint(paddr, size, dmac_flush_range);
> -			outer_flush_range(paddr, paddr + size);
> -		}
> -		break;
> -	default:
> -		break;
> -	}
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> -		outer_inv_range(paddr, paddr + size);
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -	}
> -}
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
>  #include <asm/cacheflush.h>
>  #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_poc(paddr, paddr + size);
> +}
>
> -	dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dcache_inval_poc(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_inval_poc(paddr, paddr + size);
> +}
>
> -	if (dir == DMA_TO_DEVICE)
> -		return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
>
> -	dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_wb_range);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		return;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_inv_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_inv_range);
>  }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_op(paddr, size, dma_wbinv_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
>  #include <linux/memblock.h>
>  #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	void *addr = phys_to_virt(paddr);
> -
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		hexagon_clean_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		hexagon_inv_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range((unsigned long) addr,
> -			(unsigned long) addr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	hexagon_clean_dcache_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_inv_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
>   * DMA coherent space. Use that for the pool.
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
>  #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_TO_DEVICE:
> -		cache_push(handle, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		cache_clear(handle, size);
> -		break;
> -	default:
> -		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> -				   dir);
> -		break;
> -	}
> +	/*
> +	 * cache_push() always invalidates in addition to cleaning
> +	 * write-back caches.
> +	 */
> +	cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_push(paddr, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
>  #include <linux/bug.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	/* writeback plus invalidate, could be a nop on WT caches */
> +	flush_dcache_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}}
> +	invalidate_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
>  	} while (left);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_wback);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> -		    cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_wback);
> -		else
> -			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_wback);
>  }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		if (cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_inv);
>  }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +		cpu_needs_post_dma_flush();
> +}
> +
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> -		const struct iommu_ops *iommu, bool coherent)
> +			const struct iommu_ops *iommu, bool coherent)
>  {
> -       dev->dma_coherent = coherent;
> +	dev->dma_coherent = coherent;
>  }
>  #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
>  #include <linux/types.h>
>  #include <linux/mm.h>
>  #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/cache.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	/*
> +	 * We just need to write back the caches here, but Nios2 flush
> +	 * instruction will do both writeback and invalidate.
> +	 */
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		/*
> -		 * We just need to flush the caches here , but Nios2 flush
> -		 * instruction will do both writeback and invalidate.
> -		 */
> -	case DMA_BIDIRECTIONAL: /* flush and invalidate */
> -		flush_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> +	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  	mmap_write_unlock(&init_mm);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long cl;
>  	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		/* Write back the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBWR, cl);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		/* Invalidate the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBIR, cl);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		/* Flush the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBFR, cl);
> -		break;
> -	default:
> -		break;
> -	}
> +	/* Write back the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBWR, cl);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Invalidate the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Flush the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>  	free_pages((unsigned long)__va(dma_handle), order);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	clean_kernel_dcache_range(virt, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		purge_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	purge_kernel_dcache_range(virt, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> +	flush_kernel_dcache_range(virt, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
>  #endif
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_NONE:
> -		BUG();
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	}
> +	__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
>  static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *vaddr = phys_to_virt(paddr);
> +
> +	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	void *flush_addr = page_address(page);
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	__flush_purge_region(page_address(page), size);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE: /* invalidate only */
> -		__flush_invalidate_region(addr, size);
> -		break;
> -	case DMA_TO_DEVICE: /* writeback only */
> -		__flush_wback_region(addr, size);
> -		break;
> -	case DMA_BIDIRECTIONAL: /* writeback and invalidate */
> -		__flush_purge_region(addr, size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	__flush_wback_region(addr, size);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_invalidate_region(addr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
>  #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE &&
> -	    sparc_cpu_model == sparc_leon &&
> +	/* IIep is write-through, not flushing on cpu to device transfer. */
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	/*
> +	 * On LEON systems without cache snooping, the entire D-CACHE must be
> +	 * flushed to make DMA to cacheable memory coherent.
> +	 */
> +	if (sparc_cpu_model == sparc_leon &&
>  	    !sparc_leon3_snooping_enabled())
>  		leon_flush_dcache_all();
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	arch_dma_cache_inv(paddr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  #ifdef CONFIG_PROC_FS
>
>  static int sparc_io_proc_show(struct seq_file *m, void *v)
> diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
> index ff3bf015eca4..d4ff96585545 100644
> --- a/arch/xtensa/kernel/pci-dma.c
> +++ b/arch/xtensa/kernel/pci-dma.c
> @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
>  	}
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		do_cache_op(paddr, size, __flush_dcache_range);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		do_cache_op(paddr, size, __invalidate_dcache_range);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> -		break;
> -	default:
> -		break;
> -	}
> +	do_cache_op(paddr, size, __flush_dcache_range);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __invalidate_dcache_range);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	__invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
> new file mode 100644
> index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument, inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          | map          == for_device     | unmap     == for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   | writeback        writeback     | none          none
> + * FROM_DEV | invalidate       invalidate    | invalidate*   invalidate*
> + * BIDIR    | writeback        writeback     | invalidate    invalidate
> + *
> + * [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		/*
> +		 * This may be an empty function on write-through caches,
> +		 * and it might invalidate the cache if an architecture has
> +		 * a write-back cache but no way to write it back without
> +		 * invalidating
> +		 */
> +		arch_dma_cache_wback(paddr, size);
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +		/*
> +		 * FIXME: this should be handled the same across all
> +		 * architectures, see
> +		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> +		 */
> +		if (!arch_sync_dma_clean_before_fromdevice()) {
> +			arch_dma_cache_inv(paddr, size);
> +			break;
> +		}
> +		fallthrough;
> +
> +	case DMA_BIDIRECTIONAL:
> +		/* Skip the invalidate here if it's done later */
> +		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +		    arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_wback(paddr, size);
> +		else
> +			arch_dma_cache_wback_inv(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{
> +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> +	unsigned long pfn = PFN_UP(paddr);
> +	unsigned long off = paddr & (PAGE_SIZE - 1);
> +	size_t left = size;
> +
> +	if (off)
> +		left -= PAGE_SIZE - off;
> +
> +	while (left >= PAGE_SIZE) {
> +		struct page *page = pfn_to_page(pfn++);
> +		set_bit(PG_dcache_clean, &page->flags);
> +		left -= PAGE_SIZE;
> +	}
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> +		if (arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_inv(paddr, size);
> +
> +		if (size > PAGE_SIZE)
> +			arch_dma_mark_dcache_clean(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +#endif
> --
> 2.39.2
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 456+ messages in thread
* RE: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
2023-03-27 12:13 ` Arnd Bergmann
` (4 preceding siblings ...)
(?)
@ 2023-04-13 12:13 ` Biju Das
-1 siblings, 0 replies; 456+ messages in thread
From: Biju Das @ 2023-04-13 12:13 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Prabhakar Mahadev Lad, Conor Dooley,
linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
linux-oxnas@groups.io, linux-csky@vger.kernel.org,
linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org,
linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org,
sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org

Hi all,

FYI, this patch breaks the RZ/G2L SMARC EVK board, and Arnd will send a V2 to fix the issue.
[10:53] <biju> [    3.384408] Unable to handle kernel paging request at virtual address 000000004afb0080
[10:53] <biju> [    3.392755] Mem abort info:
[10:53] <biju> [    3.395883]   ESR = 0x0000000096000144
[10:53] <biju> [    3.399957]   EC = 0x25: DABT (current EL), IL = 32 bits
[10:53] <biju> [    3.405674]   SET = 0, FnV = 0
[10:53] <biju> [    3.408978]   EA = 0, S1PTW = 0
[10:53] <biju> [    3.412442]   FSC = 0x04: level 0 translation fault
[10:53] <biju> [    3.417825] Data abort info:
[10:53] <biju> [    3.420959]   ISV = 0, ISS = 0x00000144
[10:53] <biju> [    3.425115]   CM = 1, WnR = 1
[10:53] <biju> [    3.428521] [000000004afb0080] user address but active_mm is swapper
[10:53] <biju> [    3.435135] Internal error: Oops: 0000000096000144 [#1] PREEMPT SMP
[10:53] <biju> [    3.441501] Modules linked in:
[10:53] <biju> [    3.444644] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230412-g2936e9299572 #712
[10:53] <biju> [    3.453537] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT)
[10:53] <biju> [    3.460130] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[10:53] <biju> [    3.467184] pc : dcache_clean_poc+0x20/0x38
[10:53] <biju> [    3.471488] lr : arch_sync_dma_for_device+0x1c/0x2c
[10:53] <biju> [    3.476463] sp : ffff80000a70b970
[10:53] <biju> [    3.479834] x29: ffff80000a70b970 x28: 0000000000000000 x27: ffff00000aef7c10
[10:53] <biju> [    3.487118] x26: ffff00000afb0080 x25: ffff00000b710000 x24: ffff00000b710a40
[10:53] <biju> [    3.494397] x23: 0000000000002000 x22: 0000000000000000 x21: 0000000000000002
[10:53] <biju> [    3.501670] x20: ffff00000aef7c10 x19: 000000004afb0080 x18: 0000000000000000
[10:53] <biju> [    3.508943] x17: 0000000000000100 x16: fffffc0001efc008 x15: 0000000000000000
[10:53] <biju> [    3.516216] x14: 0000000000000100 x13: 0000000000000068 x12: ffff00007fc0aa50
[10:54] <biju> [    3.523488] x11: ffff00007fc0a9c0 x10: 0000000000000000 x9 : ffff00000aef7f08
[10:54] <biju> [    3.530761] x8 : 0000000000000000 x7 : fffffc00002bec00 x6 : 0000000000000000
[10:54] <biju> [    3.538028] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 000000000000003f
[10:54] <biju> [    3.545297] x2 : 0000000000000040 x1 : 000000004afb2080 x0 : 000000004afb0080
[10:54] <biju> [    3.552569] Call trace:
[10:54] <biju> [    3.555074]  dcache_clean_poc+0x20/0x38
[10:54] <biju> [    3.559014]  dma_map_page_attrs+0x1b4/0x248
[10:54] <biju> [    3.563289]  ravb_rx_ring_format_gbeth+0xd8/0x198
[10:54] <biju> [    3.568095]  ravb_ring_format+0x5c/0x108
[10:54] <biju> [    3.572108]  ravb_dmac_init_gbeth+0x30/0xe4
[10:54] <biju> [    3.576382]  ravb_dmac_init+0x80/0x104
[10:54] <biju> [    3.580222]  ravb_open+0x84/0x78c
[10:54] <biju> [    3.583626]  __dev_open+0xec/0x1d8
[10:54] <biju> [    3.587138]  __dev_change_flags+0x190/0x208
[10:54] <biju> [    3.591406]  dev_change_flags+0x24/0x6c
[10:54] <biju> [    3.595324]  ip_auto_config+0x248/0x10ac
[10:54] <biju> [    3.599345]  do_one_initcall+0x6c/0x1b0
[10:54] <biju> [    3.603268]  kernel_init_freeable+0x1c0/0x294

Cheers,
Biju

> -----Original Message-----
> From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On
> Behalf Of Arnd Bergmann
> Sent: Monday, March 27, 2023 1:13 PM
> To: linux-kernel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>; Vineet Gupta <vgupta@kernel.org>;
> Russell King <linux@armlinux.org.uk>; Neil Armstrong <neil.armstrong@linaro.org>;
> Linus Walleij <linus.walleij@linaro.org>; Catalin Marinas <catalin.marinas@arm.com>;
> Will Deacon <will@kernel.org>; Guo Ren <guoren@kernel.org>; Brian Cain <bcain@quicinc.com>;
> Geert Uytterhoeven <geert@linux-m68k.org>; Michal Simek <monstr@monstr.eu>;
> Thomas Bogendoerfer <tsbogend@alpha.franken.de>; Dinh Nguyen <dinguyen@kernel.org>;
> Stafford Horne <shorne@gmail.com>; Helge Deller <deller@gmx.de>;
> Michael Ellerman <mpe@ellerman.id.au>; Christophe Leroy <christophe.leroy@csgroup.eu>;
> Paul Walmsley <paul.walmsley@sifive.com>; Palmer Dabbelt <palmer@dabbelt.com>;
> Rich Felker <dalias@libc.org>; John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
> David S. Miller <davem@davemloft.net>; Max Filippov <jcmvbkbc@gmail.com>;
> Christoph Hellwig <hch@lst.de>; Robin Murphy <robin.murphy@arm.com>;
> Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@bp.renesas.com>;
> Conor Dooley <conor.dooley@microchip.com>; linux-snps-arc@lists.infradead.org;
> linux-arm-kernel@lists.infradead.org; linux-oxnas@groups.io;
> linux-csky@vger.kernel.org; linux-hexagon@vger.kernel.org;
> linux-m68k@lists.linux-m68k.org; linux-mips@vger.kernel.org;
> linux-openrisc@vger.kernel.org; linux-parisc@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org; linux-riscv@lists.infradead.org;
> linux-sh@vger.kernel.org; sparclinux@vger.kernel.org;
> linux-xtensa@linux-xtensa.org
> Subject: [PATCH 21/21] dma-mapping: replace custom code with generic
> implementation
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> Now that all of these have consistent behavior, replace them with a single
> shared implementation of arch_sync_dma_for_device() and
> arch_sync_dma_for_cpu(), with three parameters to pick how they should
> operate:
>
> - If the CPU has speculative prefetching, then the cache
>   has to be invalidated after a transfer from the device.
>   On the rarer CPUs without prefetching, this can be skipped,
>   with all cache management happening before the transfer.
>   This flag can be runtime detected, but is usually fixed
>   per architecture.
>
> - Some architectures currently clean the caches before DMA
>   from a device, while others invalidate it. There has not
>   been a conclusion regarding whether we should change all
>   architectures to use clean instead, so this adds an
>   architecture specific flag that we can change later on.
>
> - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
>   track of pages that are marked clean in the page cache, to
>   avoid flushing them again. The implementation for this is
>   generic enough to work on all architectures that use the
>   PG_dcache_clean page flag, but a Kconfig symbol is used
>   to only enable it on Arm to preserve the existing behavior.
>
> For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
> over 'flush', to avoid any ambiguity of what the helper functions are
> supposed to do.
>
> Moving the global functions into a header file is usually a bad idea as it
> prevents the header from being included more than once, but it helps keep
> the behavior as close as possible to the previous state, including the
> possibility of inlining most of it into these functions where that was done
> before. This also helps keep the global namespace clean, by hiding the new
> arch_dma_cache{_wback,_inv,_wback_inv} from device drivers that might use
> them incorrectly.
>
> It would be possible to do this one architecture at a time, but as the
> change is the same everywhere, a single combined patch explains it better.
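The first two parameters described above amount to a small decision table. The sketch below is a userspace model, not the contents of include/linux/dma-sync.h, whose arch_sync_dma_for_device() half is not quoted in this thread. The for_cpu side follows the arch_sync_dma_for_cpu() quoted earlier; the for_device side is an assumption reconstructed from this commit message and from the DMA_BIDIRECTIONAL logic that the arm and mips hunks remove. All model_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum model_dir { MODEL_TO_DEVICE, MODEL_FROM_DEVICE, MODEL_BIDIRECTIONAL };
enum model_op { MODEL_NONE, MODEL_WBACK, MODEL_INV, MODEL_WBACK_INV };

/* The two per-architecture flags described in the commit message. */
struct model_arch {
	bool clean_before_fromdevice;	/* arch_sync_dma_clean_before_fromdevice() */
	bool needs_post_dma_flush;	/* arch_sync_dma_cpu_needs_post_dma_flush() */
};

/* Cache operation before the transfer (assumed for_device behaviour). */
static enum model_op model_for_device(const struct model_arch *a,
				      enum model_dir dir)
{
	switch (dir) {
	case MODEL_TO_DEVICE:
		return MODEL_WBACK;
	case MODEL_FROM_DEVICE:
		return a->clean_before_fromdevice ? MODEL_WBACK : MODEL_INV;
	case MODEL_BIDIRECTIONAL:
		/* If the CPU side invalidates afterwards, a writeback suffices now. */
		return a->needs_post_dma_flush ? MODEL_WBACK : MODEL_WBACK_INV;
	}
	assert(0 && "invalid DMA direction");
	return MODEL_NONE;
}

/* Cache operation after the transfer, as in the quoted arch_sync_dma_for_cpu(). */
static enum model_op model_for_cpu(const struct model_arch *a,
				   enum model_dir dir)
{
	if (dir == MODEL_TO_DEVICE)
		return MODEL_NONE;
	return a->needs_post_dma_flush ? MODEL_INV : MODEL_NONE;
}
```

With the arm64 flags from the hunk below (true, true), a DMA_FROM_DEVICE buffer is written back before the transfer and invalidated afterwards; with the m68k flags (false, false), it is only invalidated beforehand, which matches option 1 in the cover letter.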
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arc/mm/dma.c                 |  66 +++++-------------
>  arch/arm/Kconfig                  |   3 +
>  arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
>  arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
>  arch/arm64/mm/dma-mapping.c       |  28 +++++---
>  arch/csky/mm/dma-mapping.c        |  44 ++++++------
>  arch/hexagon/kernel/dma.c         |  44 ++++++------
>  arch/m68k/kernel/dma.c            |  43 +++++++-----
>  arch/microblaze/kernel/dma.c      |  48 +++++++-------
>  arch/mips/mm/dma-noncoherent.c    |  60 +++++----------
>  arch/nios2/mm/dma-mapping.c       |  57 +++++++---------
>  arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
>  arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
>  arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
>  arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
>  arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
>  arch/sparc/kernel/ioport.c        |  38 ++++++++---
>  arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
>  include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
>  19 files changed, 527 insertions(+), 391 deletions(-)
>  create mode 100644 include/linux/dma-sync.h
>
> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	dma_cache_wback_inv(page_to_phys(page), size);
>  }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device     |   unmap     ==  for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback      |   none          none
> - * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> - *
> - *     [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_wback(paddr, size);
> -		break;
> -
> -	case DMA_FROM_DEVICE:
> -		dma_cache_inv(paddr, size);
> -		break;
> -
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_wback(paddr, size);
> -		break;
> +	dma_cache_wback(paddr, size);
> +}
>
> -	default:
> -		break;
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> +	dma_cache_wback_inv(paddr, size);
> +}
>
> -	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_inv(paddr, size);
> -		break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	default:
> -		break;
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Plug in direct dma map ops.
>   */
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 125d58c54ab1..0de84e861027 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
>  	bool
>  	default y
>
> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y
> +
>  config ARCH_HAS_ILOG2_U32
>  	bool
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 12b5c6ae93fc..0817274aed15 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -13,27 +13,36 @@
>
>  #include "dma.h"
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir == DMA_FROM_DEVICE) {
> -		dmac_inv_range(__va(paddr), __va(paddr + size));
> -		outer_inv_range(paddr, paddr + size);
> -	} else {
> -		dmac_clean_range(__va(paddr), __va(paddr + size));
> -		outer_clean_range(paddr, paddr + size);
> -	}
> +	dmac_clean_range(__va(paddr), __va(paddr + size));
> +	outer_clean_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE) {
> -		outer_inv_range(paddr, paddr + size);
> -		dmac_inv_range(__va(paddr), __va(paddr));
> -	}
> +	dmac_inv_range(__va(paddr), __va(paddr + size));
> +	outer_inv_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dmac_flush_range(__va(paddr), __va(paddr + size));
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index b703cb83d27e..aa6ee820a0ab 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  	}
>  }
>
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_clean_range);
> +	outer_clean_range(paddr, paddr + size);
> +}
> +
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_inv_range);
> +	outer_inv_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_flush_range);
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
>  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  {
>  	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  	return false;
>  }
>
> -/*
> - * Make an area consistent for devices.
> - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_clean_range);
> -		outer_clean_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -		outer_inv_range(paddr, paddr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> -			dma_cache_maint(paddr, size, dmac_clean_range);
> -			outer_clean_range(paddr, paddr + size);
> -		} else {
> -			dma_cache_maint(paddr, size, dmac_flush_range);
> -			outer_flush_range(paddr, paddr + size);
> -		}
> -		break;
> -	default:
> -		break;
> -	}
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> -		outer_inv_range(paddr, paddr + size);
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -	}
> -}
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
>  #include <asm/cacheflush.h>
>  #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_poc(paddr, paddr + size);
> +}
>
> -	dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dcache_inval_poc(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_inval_poc(paddr, paddr + size);
> +}
>
> -	if (dir == DMA_TO_DEVICE)
> -		return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
>
> -	dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_wb_range);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		return;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_inv_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_inv_range);
>  }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_op(paddr, size, dma_wbinv_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
>  #include <linux/memblock.h>
>  #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	void *addr = phys_to_virt(paddr);
> -
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		hexagon_clean_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		hexagon_inv_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	hexagon_clean_dcache_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_inv_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
>   * DMA coherent space.  Use that for the pool.
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
>  #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_TO_DEVICE:
> -		cache_push(handle, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		cache_clear(handle, size);
> -		break;
> -	default:
> -		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> -				   dir);
> -		break;
> -	}
> +	/*
> +	 * cache_push() always invalidates in addition to cleaning
> +	 * write-back caches.
> +	 */
> +	cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_push(paddr, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
>  #include <linux/bug.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	/* writeback plus invalidate, could be a nop on WT caches */
> +	flush_dcache_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}}
> +	invalidate_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
>  	} while (left);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_wback);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> -		    cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_wback);
> -		else
> -			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_wback);
>  }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		if (cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_inv);
>  }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +	       cpu_needs_post_dma_flush();
> +}
> +
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> -		const struct iommu_ops *iommu, bool coherent)
> +			const struct iommu_ops *iommu, bool coherent)
>  {
> -	dev->dma_coherent = coherent;
> +	dev->dma_coherent = coherent;
>  }
>  #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
>  #include <linux/types.h>
>  #include <linux/mm.h>
>  #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/cache.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	/*
> +	 * We just need to write back the caches here, but Nios2 flush
> +	 * instruction will do both writeback and invalidate.
> +	 */
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		/*
> -		 * We just need to flush the caches here , but Nios2 flush
> -		 * instruction will do both writeback and invalidate.
> -		 */
> -	case DMA_BIDIRECTIONAL: /* flush and invalidate */
> -		flush_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> +	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  	mmap_write_unlock(&init_mm);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long cl;
>  	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		/* Write back the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBWR, cl);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		/* Invalidate the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBIR, cl);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		/* Flush the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBFR, cl);
> -		break;
> -	default:
> -		break;
> -	}
> +	/* Write back the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBWR, cl);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Invalidate the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Flush the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>  	free_pages((unsigned long)__va(dma_handle), order);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	clean_kernel_dcache_range(virt, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		purge_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	purge_kernel_dcache_range(virt, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> +	flush_kernel_dcache_range(virt, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
>  #endif
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_NONE:
> -		BUG();
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	}
> +	__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
>  static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *vaddr = phys_to_virt(paddr);
> +
> +	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	void *flush_addr = page_address(page);
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	__flush_purge_region(page_address(page), size);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:		/* invalidate only */
> -		__flush_invalidate_region(addr, size);
> -		break;
> -	case DMA_TO_DEVICE:		/* writeback only */
> -		__flush_wback_region(addr, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
> -		__flush_purge_region(addr, size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	__flush_wback_region(addr, size);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_invalidate_region(addr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
>  #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE &&
> -	    sparc_cpu_model == sparc_leon &&
> +	/* IIep is write-through, not flushing on cpu to device transfer. */
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	/*
> +	 * On LEON systems without cache snooping, the entire D-CACHE must be
> +	 * flushed to make DMA to cacheable memory coherent.
> + */ > + if (sparc_cpu_model == sparc_leon && > !sparc_leon3_snooping_enabled()) > leon_flush_dcache_all(); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + arch_dma_cache_inv(paddr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > #ifdef CONFIG_PROC_FS > > static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git > a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index > ff3bf015eca4..d4ff96585545 100644 > --- a/arch/xtensa/kernel/pci-dma.c > +++ b/arch/xtensa/kernel/pci-dma.c > @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, > } > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - do_cache_op(paddr, size, __flush_dcache_range); > - break; > - case DMA_FROM_DEVICE: > - do_cache_op(paddr, size, __invalidate_dcache_range); > - break; > - case DMA_BIDIRECTIONAL: > - do_cache_op(paddr, size, __flush_invalidate_dcache_range); > - break; > - default: > - break; > - } > + do_cache_op(paddr, size, __flush_dcache_range); > } > > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + do_cache_op(paddr, size, __invalidate_dcache_range); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + do_cache_op(paddr, size, __flush_invalidate_dcache_range); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, size_t size) 
{ > __invalidate_dcache_range((unsigned long)page_address(page), size); > diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h new file mode 100644 index 000000000000..18e33d5e8eaf > --- /dev/null > +++ b/include/linux/dma-sync.h > @@ -0,0 +1,107 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Cache operations depending on function and direction argument, inspired by > + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk > + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] > + * dma-mapping: provide a generic dma-noncoherent implementation)" > + * > + * | map == for_device | unmap == for_cpu > + * |---------------------------------------------------------------- > + * TO_DEV | writeback writeback | none none > + * FROM_DEV | invalidate invalidate | invalidate* invalidate* > + * BIDIR | writeback writeback | invalidate invalidate > + * > + * [*] needed for CPU speculative prefetches > + * > + * NOTE: we don't check the validity of direction argument as it is done in > + * upper layer functions (in include/linux/dma-mapping.h) > + * > + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide > + * the respective high-level operations without having to expose the > + * cache management ops to drivers.
> + */ > + > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > + enum dma_data_direction dir) > +{ > + switch (dir) { > + case DMA_TO_DEVICE: > + /* > + * This may be an empty function on write-through caches, > + * and it might invalidate the cache if an architecture has > + * a write-back cache but no way to write it back without > + * invalidating > + */ > + arch_dma_cache_wback(paddr, size); > + break; > + > + case DMA_FROM_DEVICE: > + /* > + * FIXME: this should be handled the same across all > + * architectures, see > + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ > + */ > + if (!arch_sync_dma_clean_before_fromdevice()) { > + arch_dma_cache_inv(paddr, size); > + break; > + } > + fallthrough; > + > + case DMA_BIDIRECTIONAL: > + /* Skip the invalidate here if it's done later */ > + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > + arch_sync_dma_cpu_needs_post_dma_flush()) > + arch_dma_cache_wback(paddr, size); > + else > + arch_dma_cache_wback_inv(paddr, size); > + break; > + > + default: > + break; > + } > +} > + > +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU > +/* > + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */ > +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size) > +{ > +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN > + unsigned long pfn = PFN_UP(paddr); > + unsigned long off = paddr & (PAGE_SIZE - 1); > + size_t left = size; > + > + if (off) > + left -= PAGE_SIZE - off; > + > + while (left >= PAGE_SIZE) { > + struct page *page = pfn_to_page(pfn++); > + set_bit(PG_dcache_clean, &page->flags); > + left -= PAGE_SIZE; > + } > +#endif > +} > + > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > + enum dma_data_direction dir) > +{ > + switch (dir) { > + case DMA_TO_DEVICE: > + break; > + > + case DMA_FROM_DEVICE: > + case DMA_BIDIRECTIONAL: > + /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ > + if (arch_sync_dma_cpu_needs_post_dma_flush()) > + arch_dma_cache_inv(paddr, size); > + > + if (size > PAGE_SIZE) > + arch_dma_mark_dcache_clean(paddr, size); > + break; > + > + default: > + break; > + } > +} > +#endif > -- > 2.39.2 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
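The decision logic in the new include/linux/dma-sync.h can be modelled outside the kernel to see which cache operation each architecture ends up issuing. What follows is a minimal user-space sketch, not kernel code: the cache operations are replaced by return values, and struct arch_model, model_for_device(), model_for_cpu(), op_name() and print_table() are hypothetical names standing in for the per-architecture hooks arch_sync_dma_clean_before_fromdevice(), arch_sync_dma_cpu_needs_post_dma_flush() and the CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU option. The PG_dcache_clean marking is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Same values as enum dma_data_direction in include/linux/dma-direction.h */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical: which arch_dma_cache_* helper the model would call. */
enum cache_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* Hypothetical stand-ins for the hooks each arch defines before the include. */
struct arch_model {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_dma_for_cpu;     /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
};

static const char *op_name(enum cache_op op)
{
	static const char *const names[] = { "none", "wback", "inv", "wback_inv" };

	assert((unsigned)op < sizeof(names) / sizeof(names[0]));
	return names[op];
}

/* Models arch_sync_dma_for_device() from the quoted dma-sync.h. */
static enum cache_op model_for_device(const struct arch_model *a,
				      enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		return OP_WBACK;
	case DMA_FROM_DEVICE:
		if (!a->clean_before_fromdevice)
			return OP_INV;
		/* fall through: handled like DMA_BIDIRECTIONAL */
	case DMA_BIDIRECTIONAL:
		/* skip the invalidate if arch_sync_dma_for_cpu() does it later */
		if (a->has_sync_dma_for_cpu && a->cpu_needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	default:
		return OP_NONE;
	}
}

/* Models arch_sync_dma_for_cpu(), which only exists with
 * CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU. */
static enum cache_op model_for_cpu(const struct arch_model *a,
				   enum dma_data_direction dir)
{
	if (!a->has_sync_dma_for_cpu)
		return OP_NONE;

	switch (dir) {
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		return a->cpu_needs_post_dma_flush ? OP_INV : OP_NONE;
	default:
		return OP_NONE;
	}
}

/* Prints the map/unmap operations for the three valid directions. */
static void print_table(const char *arch, const struct arch_model *a)
{
	static const enum dma_data_direction dirs[] = {
		DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL,
	};
	static const char *const dir_names[] = { "TO_DEV", "FROM_DEV", "BIDIR" };

	printf("%s:\n", arch);
	for (unsigned i = 0; i < 3; i++)
		printf("  %-8s for_device=%-9s for_cpu=%s\n", dir_names[i],
		       op_name(model_for_device(a, dirs[i])),
		       op_name(model_for_cpu(a, dirs[i])));
}
```

For example, with the arm64 flags from the quoted hunk (both hooks return true) and assuming CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is enabled, a DMA_FROM_DEVICE mapping is modelled as a writeback in arch_sync_dma_for_device() followed by an invalidate in arch_sync_dma_for_cpu(). With the sh flags (both hooks return false) it becomes a single invalidate before the transfer and nothing afterwards, which is the "no speculative prefetching" model from the cover letter.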
* RE: [PATCH 21/21] dma-mapping: replace custom code with generic implementation From: Biju Das @ 2023-04-13 12:13 UTC To: Arnd Bergmann, linux-kernel@vger.kernel.org Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Prabhakar Mahadev Lad, Conor Dooley, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org Hi all, FYI, this patch breaks on the RZ/G2L SMARC EVK board, and Arnd will send a V2 to fix this issue.
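The register dump in the log that follows can be cross-checked with a little address arithmetic. x0 and x19 hold 0x4afb0080, which is also the faulting address; x1 holds 0x4afb2080, that is x0 plus the 0x2000 in x23; and x26 holds 0xffff00000afb0080. Assuming the usual arm64 48-bit linear map with PAGE_OFFSET 0xffff000000000000 and DRAM starting at 0x40000000 (neither is stated in the report), x26 is the linear-map alias of x0. One possible reading is therefore that dcache_clean_poc() received a physical address, which would fit the arm64 hunk in the quoted patch, where it is now called on paddr directly instead of on phys_to_virt(paddr) as in the removed code. The sketch only checks this arithmetic; linear_map_virt(), oops_looks_like_phys_addr() and the ASSUMED_* constants are hypothetical and deliberately simpler than the kernel's phys_to_virt().

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not taken from the report: 48-bit VA arm64 linear-map base
 * and the physical start of DRAM on the board. */
#define ASSUMED_PAGE_OFFSET 0xffff000000000000ULL
#define ASSUMED_PHYS_OFFSET 0x0000000040000000ULL

/* Register values copied from the oops. */
static const uint64_t oops_x0 = 0x4afb0080ULL;          /* also the faulting address */
static const uint64_t oops_x1 = 0x4afb2080ULL;
static const uint64_t oops_x23 = 0x2000ULL;
static const uint64_t oops_x26 = 0xffff00000afb0080ULL;

/* Hypothetical, simplified physical-to-linear-map translation. */
static uint64_t linear_map_virt(uint64_t paddr)
{
	assert(paddr >= ASSUMED_PHYS_OFFSET); /* avoid unsigned underflow */
	return paddr - ASSUMED_PHYS_OFFSET + ASSUMED_PAGE_OFFSET;
}

/* True if x1 - x0 matches x23 and x26 is the linear-map alias of x0,
 * i.e. the dump is consistent with a physical start address. */
static int oops_looks_like_phys_addr(void)
{
	return oops_x1 - oops_x0 == oops_x23 &&
	       linear_map_virt(oops_x0) == oops_x26;
}
```

Under different assumptions for PAGE_OFFSET or the DRAM base the second check would not hold, so this is a consistency check on the dump rather than a proof of the cause.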
[10:53] <biju> [ 3.384408] Unable to handle kernel paging request at virtual address 000000004afb0080 [10:53] <biju> [ 3.392755] Mem abort info: [10:53] <biju> [ 3.395883] ESR = 0x0000000096000144 [10:53] <biju> [ 3.399957] EC = 0x25: DABT (current EL), IL = 32 bits [10:53] <biju> [ 3.405674] SET = 0, FnV = 0 [10:53] <biju> [ 3.408978] EA = 0, S1PTW = 0 [10:53] <biju> [ 3.412442] FSC = 0x04: level 0 translation fault [10:53] <biju> [ 3.417825] Data abort info: [10:53] <biju> [ 3.420959] ISV = 0, ISS = 0x00000144 [10:53] <biju> [ 3.425115] CM = 1, WnR = 1 [10:53] <biju> [ 3.428521] [000000004afb0080] user address but active_mm is swapper [10:53] <biju> [ 3.435135] Internal error: Oops: 0000000096000144 [#1] PREEMPT SMP [10:53] <biju> [ 3.441501] Modules linked in: [10:53] <biju> [ 3.444644] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230412-g2936e9299572 #712 [10:53] <biju> [ 3.453537] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT) [10:53] <biju> [ 3.460130] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [10:53] <biju> [ 3.467184] pc : dcache_clean_poc+0x20/0x38 [10:53] <biju> [ 3.471488] lr : arch_sync_dma_for_device+0x1c/0x2c [10:53] <biju> [ 3.476463] sp : ffff80000a70b970 [10:53] <biju> [ 3.479834] x29: ffff80000a70b970 x28: 0000000000000000 x27: ffff00000aef7c10 [10:53] <biju> [ 3.487118] x26: ffff00000afb0080 x25: ffff00000b710000 x24: ffff00000b710a40 [10:53] <biju> [ 3.494397] x23: 0000000000002000 x22: 0000000000000000 x21: 0000000000000002 [10:53] <biju> [ 3.501670] x20: ffff00000aef7c10 x19: 000000004afb0080 x18: 0000000000000000 [10:53] <biju> [ 3.508943] x17: 0000000000000100 x16: fffffc0001efc008 x15: 0000000000000000 [10:53] <biju> [ 3.516216] x14: 0000000000000100 x13: 0000000000000068 x12: ffff00007fc0aa50 [10:54] <biju> [ 3.523488] x11: ffff00007fc0a9c0 x10: 0000000000000000 x9 : ffff00000aef7f08 [10:54] <biju> [ 3.530761] x8 : 0000000000000000 x7 : fffffc00002bec00 x6 : 0000000000000000 [10:54] <biju> [ 
3.538028] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 000000000000003f [10:54] <biju> [ 3.545297] x2 : 0000000000000040 x1 : 000000004afb2080 x0 : 000000004afb0080 [10:54] <biju> [ 3.552569] Call trace: [10:54] <biju> [ 3.555074] dcache_clean_poc+0x20/0x38 [10:54] <biju> [ 3.559014] dma_map_page_attrs+0x1b4/0x248 [10:54] <biju> [ 3.563289] ravb_rx_ring_format_gbeth+0xd8/0x198 [10:54] <biju> [ 3.568095] ravb_ring_format+0x5c/0x108 [10:54] <biju> [ 3.572108] ravb_dmac_init_gbeth+0x30/0xe4 [10:54] <biju> [ 3.576382] ravb_dmac_init+0x80/0x104 [10:54] <biju> [ 3.580222] ravb_open+0x84/0x78c [10:54] <biju> [ 3.583626] __dev_open+0xec/0x1d8 [10:54] <biju> [ 3.587138] __dev_change_flags+0x190/0x208 [10:54] <biju> [ 3.591406] dev_change_flags+0x24/0x6c [10:54] <biju> [ 3.595324] ip_auto_config+0x248/0x10ac [10:54] <biju> [ 3.599345] do_one_initcall+0x6c/0x1b0 [10:54] <biju> [ 3.603268] kernel_init_freeable+0x1c0/0x294 Cheers, Biju > -----Original Message----- > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > Behalf Of Arnd Bergmann > Sent: Monday, March 27, 2023 1:13 PM > To: linux-kernel@vger.kernel.org > Cc: Arnd Bergmann <arnd@arndb.de>; Vineet Gupta <vgupta@kernel.org>; Russell > King <linux@armlinux.org.uk>; Neil Armstrong <neil.armstrong@linaro.org>; > Linus Walleij <linus.walleij@linaro.org>; Catalin Marinas > <catalin.marinas@arm.com>; Will Deacon <will@kernel.org>; Guo Ren > <guoren@kernel.org>; Brian Cain <bcain@quicinc.com>; Geert Uytterhoeven > <geert@linux-m68k.org>; Michal Simek <monstr@monstr.eu>; Thomas Bogendoerfer > <tsbogend@alpha.franken.de>; Dinh Nguyen <dinguyen@kernel.org>; Stafford > Horne <shorne@gmail.com>; Helge Deller <deller@gmx.de>; Michael Ellerman > <mpe@ellerman.id.au>; Christophe Leroy <christophe.leroy@csgroup.eu>; Paul > Walmsley <paul.walmsley@sifive.com>; Palmer Dabbelt <palmer@dabbelt.com>; > Rich Felker <dalias@libc.org>; John Paul Adrian Glaubitz > <glaubitz@physik.fu-berlin.de>; David S. 
Miller <davem@davemloft.net>; Max Filippov <jcmvbkbc@gmail.com>; Christoph Hellwig <hch@lst.de>; Robin Murphy <robin.murphy@arm.com>; Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@bp.renesas.com>; Conor Dooley <conor.dooley@microchip.com>; linux-snps-arc@lists.infradead.org; linux-arm-kernel@lists.infradead.org; linux-oxnas@groups.io; linux-csky@vger.kernel.org; linux-hexagon@vger.kernel.org; linux-m68k@lists.linux-m68k.org; linux-mips@vger.kernel.org; linux-openrisc@vger.kernel.org; linux-parisc@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-riscv@lists.infradead.org; linux-sh@vger.kernel.org; sparclinux@vger.kernel.org; linux-xtensa@linux-xtensa.org > Subject: [PATCH 21/21] dma-mapping: replace custom code with generic implementation > > From: Arnd Bergmann <arnd@arndb.de> > > Now that all of these have consistent behavior, replace them with a single > shared implementation of arch_sync_dma_for_device() and > arch_sync_dma_for_cpu() and three parameters to pick how they should > operate: > > - If the CPU has speculative prefetching, then the cache > has to be invalidated after a transfer from the device. > On the rarer CPUs without prefetching, this can be skipped, > with all cache management happening before the transfer. > This flag can be runtime detected, but is usually fixed > per architecture. > > - Some architectures currently clean the caches before DMA > from a device, while others invalidate it. There has not > been a conclusion regarding whether we should change all > architectures to use clean instead, so this adds an > architecture specific flag that we can change later on. > > - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps > track of pages that are marked clean in the page cache, to > avoid flushing them again.
The implementation for this is > generic enough to work on all architectures that use the > PG_dcache_clean page flag, but a Kconfig symbol is used > to only enable it on Arm to preserve the existing behavior. > > For the function naming, I picked 'wback' over 'clean', and 'wback_inv' > over 'flush', to avoid any ambiguity of what the helper functions are > supposed to do. > > Moving the global functions into a header file is usually a bad idea as it > prevents the header from being included more than once, but it helps keep > the behavior as close as possible to the previous state, including the > possibility of inlining most of it into these functions where that was done > before. This also helps keep the global namespace clean, by hiding the new > arch_dma_cache{_wback,_inv,_wback_inv} from device drivers that might use > them incorrectly. > > It would be possible to do this one architecture at a time, but as the > change is the same everywhere, the combined patch helps explain it better > once. 
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/arc/mm/dma.c | 66 +++++------------- > arch/arm/Kconfig | 3 + > arch/arm/mm/dma-mapping-nommu.c | 39 ++++++----- > arch/arm/mm/dma-mapping.c | 64 +++++++----------- > arch/arm64/mm/dma-mapping.c | 28 +++++--- > arch/csky/mm/dma-mapping.c | 44 ++++++------ > arch/hexagon/kernel/dma.c | 44 ++++++------ > arch/m68k/kernel/dma.c | 43 +++++++----- > arch/microblaze/kernel/dma.c | 48 +++++++------- > arch/mips/mm/dma-noncoherent.c | 60 +++++++---------- > arch/nios2/mm/dma-mapping.c | 57 +++++++--------- > arch/openrisc/kernel/dma.c | 63 +++++++++++------- > arch/parisc/kernel/pci-dma.c | 46 ++++++------- > arch/powerpc/mm/dma-noncoherent.c | 34 ++++++---- > arch/riscv/mm/dma-noncoherent.c | 51 +++++++------- > arch/sh/kernel/dma-coherent.c | 43 +++++++----- > arch/sparc/kernel/ioport.c | 38 ++++++++--- > arch/xtensa/kernel/pci-dma.c | 40 ++++++----- > include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++ > 19 files changed, 527 insertions(+), 391 deletions(-) create mode 100644 > include/linux/dma-sync.h > > diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c index > ddb96786f765..61cd01646222 100644 > --- a/arch/arc/mm/dma.c > +++ b/arch/arc/mm/dma.c > @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t > size) > dma_cache_wback_inv(page_to_phys(page), size); } > > -/* > - * Cache operations depending on function and direction argument, inspired > by > - * > https://lore.kerne/ > l.org%2Flkml%2F20180518175004.GF17671%40n2100.armlinux.org.uk&data=05%7C01%7 > Cbiju.das.jz%40bp.renesas.com%7C3db9a66f29fa416d938108db2ebe1b0c%7C53d82571d > a1947e49cb4625a166a4a2a%7C0%7C0%7C638155166250292766%7CUnknown%7CTWFpbGZsb3d > 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7 > C%7C%7C&sdata=vVMW38elUoLyGW9%2BPQhsBDW8N61ubjgJBsbL6ct6uOU%3D&reserved=0 > - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] > - * dma-mapping: provide a generic 
dma-noncoherent implementation)" > - * > - * | map == for_device | unmap == for_cpu > - * |-------------------------------------------------------------- > -- > - * TO_DEV | writeback writeback | none none > - * FROM_DEV | invalidate invalidate | invalidate* > invalidate* > - * BIDIR | writeback writeback | invalidate > invalidate > - * > - * [*] needed for CPU speculative prefetches > - * > - * NOTE: we don't check the validity of direction argument as it is done in > - * upper layer functions (in include/linux/dma-mapping.h) > - */ > - > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_cache_wback(paddr, size); > - break; > - > - case DMA_FROM_DEVICE: > - dma_cache_inv(paddr, size); > - break; > - > - case DMA_BIDIRECTIONAL: > - dma_cache_wback(paddr, size); > - break; > + dma_cache_wback(paddr, size); > +} > > - default: > - break; > - } > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + dma_cache_inv(paddr, size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > + dma_cache_wback_inv(paddr, size); > +} > > - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - dma_cache_inv(paddr, size); > - break; > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > - default: > - break; > - } > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > /* > * Plug in direct dma map ops. 
> */ > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index > 125d58c54ab1..0de84e861027 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT > bool > default y > > +config ARCH_DMA_MARK_DCACHE_CLEAN > + def_bool y > + > config ARCH_HAS_ILOG2_U32 > bool > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping- > nommu.c index 12b5c6ae93fc..0817274aed15 100644 > --- a/arch/arm/mm/dma-mapping-nommu.c > +++ b/arch/arm/mm/dma-mapping-nommu.c > @@ -13,27 +13,36 @@ > > #include "dma.h" > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir == DMA_FROM_DEVICE) { > - dmac_inv_range(__va(paddr), __va(paddr + size)); > - outer_inv_range(paddr, paddr + size); > - } else { > - dmac_clean_range(__va(paddr), __va(paddr + size)); > - outer_clean_range(paddr, paddr + size); > - } > + dmac_clean_range(__va(paddr), __va(paddr + size)); > + outer_clean_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE) { > - outer_inv_range(paddr, paddr + size); > - dmac_inv_range(__va(paddr), __va(paddr)); > - } > + dmac_inv_range(__va(paddr), __va(paddr + size)); > + outer_inv_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + dmac_flush_range(__va(paddr), __va(paddr + size)); > + outer_flush_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > const struct iommu_ops 
*iommu, bool coherent) { diff -- > git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index > b703cb83d27e..aa6ee820a0ab 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t > size) > } > } > > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_clean_range); > + outer_clean_range(paddr, paddr + size); } > + > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + dma_cache_maint(paddr, size, dmac_inv_range); > + outer_inv_range(paddr, paddr + size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + dma_cache_maint(paddr, size, dmac_flush_range); > + outer_flush_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > { > if (IS_ENABLED(CONFIG_CPU_V6) || > @@ -699,45 +723,7 @@ static bool > arch_sync_dma_cpu_needs_post_dma_flush(void) > return false; > } > > -/* > - * Make an area consistent for devices. > - * Note: Drivers should NOT use this function directly. 
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*) > - */ > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > -{ > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_cache_maint(paddr, size, dmac_clean_range); > - outer_clean_range(paddr, paddr + size); > - break; > - case DMA_FROM_DEVICE: > - dma_cache_maint(paddr, size, dmac_inv_range); > - outer_inv_range(paddr, paddr + size); > - break; > - case DMA_BIDIRECTIONAL: > - if (arch_sync_dma_cpu_needs_post_dma_flush()) { > - dma_cache_maint(paddr, size, dmac_clean_range); > - outer_clean_range(paddr, paddr + size); > - } else { > - dma_cache_maint(paddr, size, dmac_flush_range); > - outer_flush_range(paddr, paddr + size); > - } > - break; > - default: > - break; > - } > -} > - > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > -{ > - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) > { > - outer_inv_range(paddr, paddr + size); > - dma_cache_maint(paddr, size, dmac_inv_range); > - } > -} > +#include <linux/dma-sync.h> > > #ifdef CONFIG_ARM_DMA_USE_IOMMU > > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index > 5240f6acad64..bae741aa65e9 100644 > --- a/arch/arm64/mm/dma-mapping.c > +++ b/arch/arm64/mm/dma-mapping.c > @@ -13,25 +13,33 @@ > #include <asm/cacheflush.h> > #include <asm/xen/xen-ops.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - unsigned long start = (unsigned long)phys_to_virt(paddr); > + dcache_clean_poc(paddr, paddr + size); } > > - dcache_clean_poc(start, start + size); > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + dcache_inval_poc(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void 
arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) > { > - unsigned long start = (unsigned long)phys_to_virt(paddr); > + dcache_clean_inval_poc(paddr, paddr + size); } > > - if (dir == DMA_TO_DEVICE) > - return; > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > > - dcache_inval_poc(start, start + size); > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > unsigned long start = (unsigned long)page_address(page); diff --git > a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c index > c90f912e2822..9402e101b363 100644 > --- a/arch/csky/mm/dma-mapping.c > +++ b/arch/csky/mm/dma-mapping.c > @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t > size) > cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - cache_op(paddr, size, dma_wb_range); > - break; > - default: > - BUG(); > - } > + cache_op(paddr, size, dma_wb_range); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - return; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - cache_op(paddr, size, dma_inv_range); > - break; > - default: > - BUG(); > - } > + cache_op(paddr, size, dma_inv_range); > } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + cache_op(paddr, size, dma_wbinv_range); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > 
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index > 882680e81a30..e6538128a75b 100644 > --- a/arch/hexagon/kernel/dma.c > +++ b/arch/hexagon/kernel/dma.c > @@ -9,29 +9,33 @@ > #include <linux/memblock.h> > #include <asm/page.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - void *addr = phys_to_virt(paddr); > - > - switch (dir) { > - case DMA_TO_DEVICE: > - hexagon_clean_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - case DMA_FROM_DEVICE: > - hexagon_inv_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - case DMA_BIDIRECTIONAL: > - flush_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - default: > - BUG(); > - } > + hexagon_clean_dcache_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) { > + hexagon_inv_dcache_range(paddr, paddr + size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t > +size) { > + hexagon_flush_dcache_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > /* > * Our max_low_pfn should have been backed off by 16MB in mm/init.c to > create > * DMA coherent space. Use that for the pool. 
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index > 2e192a5df949..aa9b434e6df8 100644 > --- a/arch/m68k/kernel/dma.c > +++ b/arch/m68k/kernel/dma.c > @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void > *vaddr, > > #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */ > > -void arch_sync_dma_for_device(phys_addr_t handle, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_BIDIRECTIONAL: > - case DMA_TO_DEVICE: > - cache_push(handle, size); > - break; > - case DMA_FROM_DEVICE: > - cache_clear(handle, size); > - break; > - default: > - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir > %u\n", > - dir); > - break; > - } > + /* > + * cache_push() always invalidates in addition to cleaning > + * write-back caches. > + */ > + cache_push(paddr, size); > +} > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + cache_clear(paddr, size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + cache_push(paddr, size); > } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c > index b4c4e45fd45e..01110d4aa5b0 100644 > --- a/arch/microblaze/kernel/dma.c > +++ b/arch/microblaze/kernel/dma.c > @@ -14,32 +14,30 @@ > #include <linux/bug.h> > #include <asm/cacheflush.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_TO_DEVICE: > - case DMA_BIDIRECTIONAL: > - flush_dcache_range(paddr, paddr + size); > - break; > - case DMA_FROM_DEVICE: > - 
invalidate_dcache_range(paddr, paddr + size); > - break; > - default: > - BUG(); > - } > + /* writeback plus invalidate, could be a nop on WT caches */ > + flush_dcache_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_TO_DEVICE: > - break; > - case DMA_BIDIRECTIONAL: > - case DMA_FROM_DEVICE: > - invalidate_dcache_range(paddr, paddr + size); > - break; > - default: > - BUG(); > - }} > + invalidate_dcache_range(paddr, paddr + size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + flush_dcache_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c > index b9d68bcc5d53..902d4b7c1f85 100644 > --- a/arch/mips/mm/dma-noncoherent.c > +++ b/arch/mips/mm/dma-noncoherent.c > @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, > size_t size, > } while (left); > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_sync_phys(paddr, size, _dma_cache_wback); > - break; > - case DMA_FROM_DEVICE: > - dma_sync_phys(paddr, size, _dma_cache_inv); > - break; > - case DMA_BIDIRECTIONAL: > - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > - cpu_needs_post_dma_flush()) > - dma_sync_phys(paddr, size, _dma_cache_wback); > - else > - dma_sync_phys(paddr, size, _dma_cache_wback_inv); > - break; > - default: > - break; > - } > + dma_sync_phys(paddr, size, _dma_cache_wback); > 
} > > -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU -void > arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - if (cpu_needs_post_dma_flush()) > - dma_sync_phys(paddr, size, _dma_cache_inv); > - break; > - default: > - break; > - } > + dma_sync_phys(paddr, size, _dma_cache_inv); > } > -#endif > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + dma_sync_phys(paddr, size, _dma_cache_wback_inv); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > + cpu_needs_post_dma_flush(); } > + > +#include <linux/dma-sync.h> > > #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > - const struct iommu_ops *iommu, bool coherent) > + const struct iommu_ops *iommu, bool coherent) > { > - dev->dma_coherent = coherent; > + dev->dma_coherent = coherent; > } > #endif > diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index > fd887d5f3f9a..29978970955e 100644 > --- a/arch/nios2/mm/dma-mapping.c > +++ b/arch/nios2/mm/dma-mapping.c > @@ -13,53 +13,46 @@ > #include <linux/types.h> > #include <linux/mm.h> > #include <linux/string.h> > +#include <linux/dma-map-ops.h> > #include <linux/dma-mapping.h> > #include <linux/io.h> > #include <linux/cache.h> > #include <asm/cacheflush.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > + /* > + * We just need to write back the caches here, but Nios2 flush > + * instruction will do both writeback and 
invalidate. > + */ > void *vaddr = phys_to_virt(paddr); > + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + > +size)); } > > - switch (dir) { > - case DMA_FROM_DEVICE: > - invalidate_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - case DMA_TO_DEVICE: > - /* > - * We just need to flush the caches here , but Nios2 flush > - * instruction will do both writeback and invalidate. > - */ > - case DMA_BIDIRECTIONAL: /* flush and invalidate */ > - flush_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - default: > - BUG(); > - } > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + unsigned long vaddr = (unsigned long)phys_to_virt(paddr); > + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size)); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) > { > void *vaddr = phys_to_virt(paddr); > + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + > +size)); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > - switch (dir) { > - case DMA_BIDIRECTIONAL: > - case DMA_FROM_DEVICE: > - invalidate_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - case DMA_TO_DEVICE: > - break; > - default: > - BUG(); > - } > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > unsigned long start = (unsigned long)page_address(page); diff --git > a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index > 91a00d09ffad..aba2258e62eb 100644 > --- a/arch/openrisc/kernel/dma.c > +++ b/arch/openrisc/kernel/dma.c > @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t > size) > 
mmap_write_unlock(&init_mm); > } > > -void arch_sync_dma_for_device(phys_addr_t addr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > unsigned long cl; > struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > > - switch (dir) { > - case DMA_TO_DEVICE: > - /* Write back the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBWR, cl); > - break; > - case DMA_FROM_DEVICE: > - /* Invalidate the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBIR, cl); > - break; > - case DMA_BIDIRECTIONAL: > - /* Flush the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBFR, cl); > - break; > - default: > - break; > - } > + /* Write back the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBWR, cl); > } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + unsigned long cl; > + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > + > + /* Invalidate the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBIR, cl); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + unsigned long cl; > + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > + > + /* Flush the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBFR, cl); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include 
<linux/dma-sync.h> > diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c > index 6d3d3cffb316..a7955aab8ce2 100644 > --- a/arch/parisc/kernel/pci-dma.c > +++ b/arch/parisc/kernel/pci-dma.c > @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, > void *vaddr, > free_pages((unsigned long)__va(dma_handle), order); } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > unsigned long virt = (unsigned long)phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - clean_kernel_dcache_range(virt, size); > - break; > - case DMA_FROM_DEVICE: > - clean_kernel_dcache_range(virt, size); > - break; > - case DMA_BIDIRECTIONAL: > - flush_kernel_dcache_range(virt, size); > - break; > - } > + clean_kernel_dcache_range(virt, size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > unsigned long virt = (unsigned long)phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - purge_kernel_dcache_range(virt, size); > - break; > - } > + purge_kernel_dcache_range(virt, size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + unsigned long virt = (unsigned long)phys_to_virt(paddr); > + > + flush_kernel_dcache_range(virt, size); > } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma- > noncoherent.c > index 00e59a4faa2b..268510c71156 100644 > --- a/arch/powerpc/mm/dma-noncoherent.c > +++ 
b/arch/powerpc/mm/dma-noncoherent.c > @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t > size, enum dma_cache_op op) #endif } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > __dma_phys_op(start, end, DMA_CACHE_CLEAN); } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_NONE: > - BUG(); > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - __dma_phys_op(start, end, DMA_CACHE_INVAL); > - break; > - } > + __dma_phys_op(start, end, DMA_CACHE_INVAL); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + __dma_phys_op(start, end, DMA_CACHE_FLUSH); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > unsigned long kaddr = (unsigned long)page_address(page); diff --git > a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index > 69c80b2155a1..b9a9f57e02be 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -12,43 +12,40 @@ > > static bool noncoherent_supported; > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > void *vaddr = phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > - case DMA_FROM_DEVICE: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > 
- case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > - default: > - break; > - } > + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > void *vaddr = phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > - break; > - default: > - break; > - } > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + void *vaddr = phys_to_virt(paddr); > + > + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > void *flush_addr = page_address(page); diff --git > a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index > 6a44c0e7ba40..41f031ae7609 100644 > --- a/arch/sh/kernel/dma-coherent.c > +++ b/arch/sh/kernel/dma-coherent.c > @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t > size) > __flush_purge_region(page_address(page), size); } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > > - switch (dir) { > - case DMA_FROM_DEVICE: /* invalidate only */ > - __flush_invalidate_region(addr, size); > - break; > - case DMA_TO_DEVICE: /* writeback only */ > - __flush_wback_region(addr, size); > - break; > - case 
DMA_BIDIRECTIONAL: /* writeback and invalidate */ > - __flush_purge_region(addr, size); > - break; > - default: > - BUG(); > - } > + __flush_wback_region(addr, size); > } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > + > + __flush_invalidate_region(addr, size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > + > + __flush_purge_region(addr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index > 4f3d26066ec2..6926ead2f208 100644 > --- a/arch/sparc/kernel/ioport.c > +++ b/arch/sparc/kernel/ioport.c > @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport); > > #endif /* CONFIG_SBUS */ > > -/* > - * IIep is write-through, not flushing on cpu to device transfer. > - * > - * On LEON systems without cache snooping, the entire D-CACHE must be > flushed to > - * make DMA to cacheable memory coherent. > - */ > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE && > - sparc_cpu_model == sparc_leon && > + /* IIep is write-through, not flushing on cpu to device transfer. */ } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + /* > + * On LEON systems without cache snooping, the entire D-CACHE must be > + * flushed to make DMA to cacheable memory coherent. 
> + */ > + if (sparc_cpu_model == sparc_leon && > !sparc_leon3_snooping_enabled()) > leon_flush_dcache_all(); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + arch_dma_cache_inv(paddr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > #ifdef CONFIG_PROC_FS > > static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git > a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index > ff3bf015eca4..d4ff96585545 100644 > --- a/arch/xtensa/kernel/pci-dma.c > +++ b/arch/xtensa/kernel/pci-dma.c > @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, > } > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - do_cache_op(paddr, size, __flush_dcache_range); > - break; > - case DMA_FROM_DEVICE: > - do_cache_op(paddr, size, __invalidate_dcache_range); > - break; > - case DMA_BIDIRECTIONAL: > - do_cache_op(paddr, size, __flush_invalidate_dcache_range); > - break; > - default: > - break; > - } > + do_cache_op(paddr, size, __flush_dcache_range); > } > > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + do_cache_op(paddr, size, __invalidate_dcache_range); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + do_cache_op(paddr, size, __flush_invalidate_dcache_range); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, size_t size) 
{ > __invalidate_dcache_range((unsigned long)page_address(page), size); > diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h > new file mode 100644 > index 000000000000..18e33d5e8eaf > --- /dev/null > +++ b/include/linux/dma-sync.h > @@ -0,0 +1,107 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Cache operations depending on function and direction argument, inspired by > + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk > + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] > + * dma-mapping: provide a generic dma-noncoherent implementation)" > + * > + * | map == for_device | unmap == for_cpu > + * |---------------------------------------------------------------- > + * TO_DEV | writeback writeback | none none > + * FROM_DEV | invalidate invalidate | invalidate* invalidate* > + * BIDIR | writeback writeback | invalidate invalidate > + * > + * [*] needed for CPU speculative prefetches > + * > + * NOTE: we don't check the validity of direction argument as it is done in > + * upper layer functions (in include/linux/dma-mapping.h) > + * > + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide > + * the respective high-level operations without having to expose the > + * cache management ops to drivers.
> + */ > + > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > + enum dma_data_direction dir) > +{ > + switch (dir) { > + case DMA_TO_DEVICE: > + /* > + * This may be an empty function on write-through caches, > + * and it might invalidate the cache if an architecture has > + * a write-back cache but no way to write it back without > + * invalidating > + */ > + arch_dma_cache_wback(paddr, size); > + break; > + > + case DMA_FROM_DEVICE: > + /* > + * FIXME: this should be handled the same across all > + * architectures, see > + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ > + */ > + if (!arch_sync_dma_clean_before_fromdevice()) { > + arch_dma_cache_inv(paddr, size); > + break; > + } > + fallthrough; > + > + case DMA_BIDIRECTIONAL: > + /* Skip the invalidate here if it's done later */ > + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > + arch_sync_dma_cpu_needs_post_dma_flush()) > + arch_dma_cache_wback(paddr, size); > + else > + arch_dma_cache_wback_inv(paddr, size); > + break; > + > + default: > + break; > + } > +} > + > +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU > +/* > + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */ > +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size) > +{ > +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN > + unsigned long pfn = PFN_UP(paddr); > + unsigned long off = paddr & (PAGE_SIZE - 1); > + size_t left = size; > + > + if (off) > + left -= PAGE_SIZE - off; > + > + while (left >= PAGE_SIZE) { > + struct page *page = pfn_to_page(pfn++); > + set_bit(PG_dcache_clean, &page->flags); > + left -= PAGE_SIZE; > + } > +#endif > +} > + > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > + enum dma_data_direction dir) > +{ > + switch (dir) { > + case DMA_TO_DEVICE: > + break; > + > + case DMA_FROM_DEVICE: > + case DMA_BIDIRECTIONAL: > + /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ > + if (arch_sync_dma_cpu_needs_post_dma_flush()) > + arch_dma_cache_inv(paddr, size); > + > + if (size > PAGE_SIZE) > + arch_dma_mark_dcache_clean(paddr, size); > + break; > + > + default: > + break; > + } > +} > +#endif > -- > 2.39.2 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
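The direction handling in the new include/linux/dma-sync.h can be exercised outside the kernel to see which cache operation each architecture's pair of flags selects. What follows is a minimal user-space sketch, not kernel code: the `sim_*` names, the stub operations that only record their own names, and the assumption that CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is enabled are all hypothetical, and the PG_dcache_clean tracking is left out. The three presets copy the flag values from the arc, riscv and m68k hunks of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's enum dma_data_direction. */
enum sim_dir { SIM_BIDIRECTIONAL, SIM_TO_DEVICE, SIM_FROM_DEVICE };

/* The two per-architecture answers that dma-sync.h consults. */
struct sim_arch {
	bool clean_before_fromdevice; /* arch_sync_dma_clean_before_fromdevice() */
	bool needs_post_dma_flush;    /* arch_sync_dma_cpu_needs_post_dma_flush() */
};

/* Flag values copied from the arc, riscv and m68k hunks of this patch. */
const struct sim_arch sim_arc   = { .clean_before_fromdevice = false, .needs_post_dma_flush = true };
const struct sim_arch sim_riscv = { .clean_before_fromdevice = true,  .needs_post_dma_flush = true };
const struct sim_arch sim_m68k  = { .clean_before_fromdevice = false, .needs_post_dma_flush = false };

/* Comma-separated log of the stub cache operations, e.g. "wback,inv". */
static char sim_log[64];

static void sim_op(const char *name)
{
	/* Leave room for the separator and the terminating NUL. */
	assert(strlen(sim_log) + strlen(name) + 2 <= sizeof(sim_log));
	if (sim_log[0] != '\0')
		strcat(sim_log, ",");
	strcat(sim_log, name);
}

/* Same decisions as arch_sync_dma_for_device(), FOR_CPU support assumed. */
static void sim_for_device(const struct sim_arch *a, enum sim_dir dir)
{
	switch (dir) {
	case SIM_TO_DEVICE:
		sim_op("wback");
		break;
	case SIM_FROM_DEVICE:
		if (!a->clean_before_fromdevice) {
			sim_op("inv");
			break;
		}
		/* Otherwise clean it exactly like a bidirectional buffer. */
		/* fall through */
	case SIM_BIDIRECTIONAL:
		/* Skip the invalidate here if the for_cpu step does it later. */
		sim_op(a->needs_post_dma_flush ? "wback" : "wback_inv");
		break;
	}
}

/* Same decisions as arch_sync_dma_for_cpu(), without PG_dcache_clean. */
static void sim_for_cpu(const struct sim_arch *a, enum sim_dir dir)
{
	if (dir != SIM_TO_DEVICE && a->needs_post_dma_flush)
		sim_op("inv");
}

/* Runs one map (for_device) plus unmap (for_cpu) cycle; returns the log. */
const char *sim_cycle(const struct sim_arch *a, enum sim_dir dir)
{
	sim_log[0] = '\0';
	sim_for_device(a, dir);
	sim_for_cpu(a, dir);
	return sim_log;
}
```

With these presets, a FROM_DEVICE cycle logs "inv,inv" for arc (invalidate before and after the transfer) but "wback,inv" for riscv, because its arch_sync_dma_clean_before_fromdevice() returns true; a BIDIRECTIONAL cycle on m68k logs a single "wback_inv", since no post-DMA invalidate is requested.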
* RE: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-04-13 12:13 ` Biju Das
From: Biju Das @ 2023-04-13 12:13 UTC
To: Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Rich Felker, linux-sh@vger.kernel.org, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, Max Filippov, Conor Dooley, Guo Ren, linux-csky@vger.kernel.org, sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc@lists.infradead.org, linux-xtensa@linux-xtensa.org, Arnd Bergmann, Brian Cain, Prabhakar Mahadev Lad

Hi all,

FYI, this patch breaks on the RZ/G2L SMARC EVK board, and Arnd will send a v2 to fix this issue.

[10:53] <biju> [ 3.384408] Unable to handle kernel paging request at virtual address 000000004afb0080
[10:53] <biju> [ 3.392755] Mem abort info:
[10:53] <biju> [ 3.395883] ESR = 0x0000000096000144
[10:53] <biju> [ 3.399957] EC = 0x25: DABT (current EL), IL = 32 bits
[10:53] <biju> [ 3.405674] SET = 0, FnV = 0
[10:53] <biju> [ 3.408978] EA = 0, S1PTW = 0
[10:53] <biju> [ 3.412442] FSC = 0x04: level 0 translation fault
[10:53] <biju> [ 3.417825] Data abort info:
[10:53] <biju> [ 3.420959] ISV = 0, ISS = 0x00000144
[10:53] <biju> [ 3.425115] CM = 1, WnR = 1
[10:53] <biju> [ 3.428521] [000000004afb0080] user address but active_mm is swapper
[10:53] <biju> [ 3.435135] Internal error: Oops: 0000000096000144 [#1] PREEMPT SMP
[10:53] <biju> [ 3.441501] Modules linked in:
[10:53] <biju> [ 3.444644] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230412-g2936e9299572 #712
[10:53] <biju> [ 3.453537] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT)
[10:53] <biju> [ 3.460130] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[10:53] <biju> [ 3.467184] pc : dcache_clean_poc+0x20/0x38
[10:53] <biju> [ 3.471488] lr : arch_sync_dma_for_device+0x1c/0x2c
[10:53] <biju> [ 3.476463] sp : ffff80000a70b970
[10:53] <biju> [ 3.479834] x29: ffff80000a70b970 x28: 0000000000000000 x27: ffff00000aef7c10
[10:53] <biju> [ 3.487118] x26: ffff00000afb0080 x25: ffff00000b710000 x24: ffff00000b710a40
[10:53] <biju> [ 3.494397] x23: 0000000000002000 x22: 0000000000000000 x21: 0000000000000002
[10:53] <biju> [ 3.501670] x20: ffff00000aef7c10 x19: 000000004afb0080 x18: 0000000000000000
[10:53] <biju> [ 3.508943] x17: 0000000000000100 x16: fffffc0001efc008 x15: 0000000000000000
[10:53] <biju> [ 3.516216] x14: 0000000000000100 x13: 0000000000000068 x12: ffff00007fc0aa50
[10:54] <biju> [ 3.523488] x11: ffff00007fc0a9c0 x10: 0000000000000000 x9 : ffff00000aef7f08
[10:54] <biju> [ 3.530761] x8 : 0000000000000000 x7 : fffffc00002bec00 x6 : 0000000000000000
[10:54] <biju> [ 3.538028] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 000000000000003f
[10:54] <biju> [ 3.545297] x2 : 0000000000000040 x1 : 000000004afb2080 x0 : 000000004afb0080
[10:54] <biju> [ 3.552569] Call trace:
[10:54] <biju> [ 3.555074] dcache_clean_poc+0x20/0x38
[10:54] <biju> [ 3.559014] dma_map_page_attrs+0x1b4/0x248
[10:54] <biju> [ 3.563289] ravb_rx_ring_format_gbeth+0xd8/0x198
[10:54] <biju> [ 3.568095] ravb_ring_format+0x5c/0x108
[10:54] <biju> [ 3.572108] ravb_dmac_init_gbeth+0x30/0xe4
[10:54] <biju> [ 3.576382] ravb_dmac_init+0x80/0x104
[10:54] <biju> [ 3.580222] ravb_open+0x84/0x78c
[10:54] <biju> [ 3.583626] __dev_open+0xec/0x1d8
[10:54] <biju> [ 3.587138] __dev_change_flags+0x190/0x208
[10:54] <biju> [ 3.591406] dev_change_flags+0x24/0x6c
[10:54] <biju> [ 3.595324] ip_auto_config+0x248/0x10ac
[10:54] <biju> [ 3.599345] do_one_initcall+0x6c/0x1b0
[10:54] <biju> [ 3.603268] kernel_init_freeable+0x1c0/0x294

Cheers,
Biju

> -----Original Message-----
> From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On Behalf Of Arnd Bergmann
> Sent: Monday, March 27, 2023 1:13 PM
> To: linux-kernel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>; Vineet Gupta
<vgupta@kernel.org>; Russell King <linux@armlinux.org.uk>; Neil Armstrong <neil.armstrong@linaro.org>; Linus Walleij <linus.walleij@linaro.org>; Catalin Marinas <catalin.marinas@arm.com>; Will Deacon <will@kernel.org>; Guo Ren <guoren@kernel.org>; Brian Cain <bcain@quicinc.com>; Geert Uytterhoeven <geert@linux-m68k.org>; Michal Simek <monstr@monstr.eu>; Thomas Bogendoerfer <tsbogend@alpha.franken.de>; Dinh Nguyen <dinguyen@kernel.org>; Stafford Horne <shorne@gmail.com>; Helge Deller <deller@gmx.de>; Michael Ellerman <mpe@ellerman.id.au>; Christophe Leroy <christophe.leroy@csgroup.eu>; Paul Walmsley <paul.walmsley@sifive.com>; Palmer Dabbelt <palmer@dabbelt.com>; Rich Felker <dalias@libc.org>; John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>; David S. Miller <davem@davemloft.net>; Max Filippov <jcmvbkbc@gmail.com>; Christoph Hellwig <hch@lst.de>; Robin Murphy <robin.murphy@arm.com>; Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@bp.renesas.com>; Conor Dooley <conor.dooley@microchip.com>; linux-snps-arc@lists.infradead.org; linux-arm-kernel@lists.infradead.org; linux-oxnas@groups.io; linux-csky@vger.kernel.org; linux-hexagon@vger.kernel.org; linux-m68k@lists.linux-m68k.org; linux-mips@vger.kernel.org; linux-openrisc@vger.kernel.org; linux-parisc@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-riscv@lists.infradead.org; linux-sh@vger.kernel.org; sparclinux@vger.kernel.org; linux-xtensa@linux-xtensa.org > Subject: [PATCH 21/21] dma-mapping: replace custom code with generic implementation > > From: Arnd Bergmann <arnd@arndb.de> > > Now that all of these have consistent behavior, replace them with a single > shared implementation of arch_sync_dma_for_device() and > arch_sync_dma_for_cpu() and three parameters to pick how they should > operate: > > - If the CPU has speculative prefetching, then the cache > has to be invalidated after a transfer from the device.
> On the rarer CPUs without prefetching, this can be skipped, > with all cache management happening before the transfer. > This flag can be runtime detected, but is usually fixed > per architecture. > > - Some architectures currently clean the caches before DMA > from a device, while others invalidate them. There has not > been a conclusion regarding whether we should change all > architectures to use clean instead, so this adds an > architecture-specific flag that we can change later on. > > - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps > track of pages that are marked clean in the page cache, to > avoid flushing them again. The implementation for this is > generic enough to work on all architectures that use the > PG_dcache_clean page flag, but a Kconfig symbol is used > to only enable it on Arm to preserve the existing behavior. > > For the function naming, I picked 'wback' over 'clean', and 'wback_inv' > over 'flush', to avoid any ambiguity about what the helper functions are > supposed to do. > > Moving the global functions into a header file is usually a bad idea as it > prevents the header from being included more than once, but it helps keep > the behavior as close as possible to the previous state, including the > possibility of inlining most of it into these functions where that was done > before. This also helps keep the global namespace clean, by hiding the new > arch_dma_cache{_wback,_inv,_wback_inv} from device drivers that might use > them incorrectly. > > It would be possible to do this one architecture at a time, but as the > change is the same everywhere, the combined patch helps explain it better > once.
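The PG_dcache_clean tracking described above only ever marks pages that lie entirely inside the synced buffer. The following is a minimal user-space sketch of the same arithmetic as arch_dma_mark_dcache_clean() in dma-sync.h, assuming 4 KiB pages and returning page frame numbers instead of setting page flags; the `sim_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel uses the architecture's PAGE_SIZE. */
#define SIM_PAGE_SIZE 4096UL
/* Rounds up like the kernel's PFN_UP(): first frame starting at or after x. */
#define SIM_PFN_UP(x) (((x) + SIM_PAGE_SIZE - 1) / SIM_PAGE_SIZE)

/*
 * Same arithmetic as arch_dma_mark_dcache_clean(): writes the frame
 * numbers of the pages lying entirely inside [paddr, paddr + size) to
 * pfns[] (at most max entries) and returns how many were written.
 */
size_t sim_clean_pfns(uint64_t paddr, size_t size, uint64_t *pfns, size_t max)
{
	uint64_t pfn = SIM_PFN_UP(paddr);
	uint64_t off = paddr & (SIM_PAGE_SIZE - 1);
	size_t left = size;
	size_t n = 0;

	assert(max == 0 || pfns != NULL);

	/* arch_sync_dma_for_cpu() only calls the helper when size > PAGE_SIZE. */
	if (size <= SIM_PAGE_SIZE)
		return 0;

	/* Drop the partial head page, if any. */
	if (off)
		left -= SIM_PAGE_SIZE - off;

	/* Record whole pages only; a partial tail page is never marked. */
	while (left >= SIM_PAGE_SIZE && n < max) {
		pfns[n++] = pfn++;
		left -= SIM_PAGE_SIZE;
	}
	return n;
}
```

For example, a 0x3000-byte buffer at physical address 0x1100 fully covers frames 2 and 3, so only those two would be marked; the partial pages at either end are skipped. The size > PAGE_SIZE check at the call site is also what keeps the `left -= PAGE_SIZE - off` step from wrapping around.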
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/arc/mm/dma.c | 66 +++++------------- > arch/arm/Kconfig | 3 + > arch/arm/mm/dma-mapping-nommu.c | 39 ++++++----- > arch/arm/mm/dma-mapping.c | 64 +++++++----------- > arch/arm64/mm/dma-mapping.c | 28 +++++--- > arch/csky/mm/dma-mapping.c | 44 ++++++------ > arch/hexagon/kernel/dma.c | 44 ++++++------ > arch/m68k/kernel/dma.c | 43 +++++++----- > arch/microblaze/kernel/dma.c | 48 +++++++------- > arch/mips/mm/dma-noncoherent.c | 60 +++++++---------- > arch/nios2/mm/dma-mapping.c | 57 +++++++--------- > arch/openrisc/kernel/dma.c | 63 +++++++++++------- > arch/parisc/kernel/pci-dma.c | 46 ++++++------- > arch/powerpc/mm/dma-noncoherent.c | 34 ++++++---- > arch/riscv/mm/dma-noncoherent.c | 51 +++++++------- > arch/sh/kernel/dma-coherent.c | 43 +++++++----- > arch/sparc/kernel/ioport.c | 38 ++++++++--- > arch/xtensa/kernel/pci-dma.c | 40 ++++++----- > include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++ > 19 files changed, 527 insertions(+), 391 deletions(-) > create mode 100644 include/linux/dma-sync.h > > diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c > index ddb96786f765..61cd01646222 100644 > --- a/arch/arc/mm/dma.c > +++ b/arch/arc/mm/dma.c > @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size) > dma_cache_wback_inv(page_to_phys(page), size); > } > > -/* > - * Cache operations depending on function and direction argument, inspired by > - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk > - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] > - * dma-mapping: provide a generic
dma-noncoherent implementation)" > - * > - * | map == for_device | unmap == for_cpu > - * |-------------------------------------------------------------- > -- > - * TO_DEV | writeback writeback | none none > - * FROM_DEV | invalidate invalidate | invalidate* > invalidate* > - * BIDIR | writeback writeback | invalidate > invalidate > - * > - * [*] needed for CPU speculative prefetches > - * > - * NOTE: we don't check the validity of direction argument as it is done in > - * upper layer functions (in include/linux/dma-mapping.h) > - */ > - > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_cache_wback(paddr, size); > - break; > - > - case DMA_FROM_DEVICE: > - dma_cache_inv(paddr, size); > - break; > - > - case DMA_BIDIRECTIONAL: > - dma_cache_wback(paddr, size); > - break; > + dma_cache_wback(paddr, size); > +} > > - default: > - break; > - } > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + dma_cache_inv(paddr, size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > + dma_cache_wback_inv(paddr, size); > +} > > - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - dma_cache_inv(paddr, size); > - break; > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > - default: > - break; > - } > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > /* > * Plug in direct dma map ops. 
>   */
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 125d58c54ab1..0de84e861027 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
>  	bool
>  	default y
>
> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y
> +
>  config ARCH_HAS_ILOG2_U32
>  	bool
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 12b5c6ae93fc..0817274aed15 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -13,27 +13,36 @@
>
>  #include "dma.h"
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir == DMA_FROM_DEVICE) {
> -		dmac_inv_range(__va(paddr), __va(paddr + size));
> -		outer_inv_range(paddr, paddr + size);
> -	} else {
> -		dmac_clean_range(__va(paddr), __va(paddr + size));
> -		outer_clean_range(paddr, paddr + size);
> -	}
> +	dmac_clean_range(__va(paddr), __va(paddr + size));
> +	outer_clean_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE) {
> -		outer_inv_range(paddr, paddr + size);
> -		dmac_inv_range(__va(paddr), __va(paddr));
> -	}
> +	dmac_inv_range(__va(paddr), __va(paddr + size));
> +	outer_inv_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dmac_flush_range(__va(paddr), __va(paddr + size));
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index b703cb83d27e..aa6ee820a0ab 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  	}
>  }
>
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_clean_range);
> +	outer_clean_range(paddr, paddr + size);
> +}
> +
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_inv_range);
> +	outer_inv_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_flush_range);
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
>  static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  {
>  	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
>  	return false;
>  }
>
> -/*
> - * Make an area consistent for devices.
> - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_clean_range);
> -		outer_clean_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -		outer_inv_range(paddr, paddr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> -			dma_cache_maint(paddr, size, dmac_clean_range);
> -			outer_clean_range(paddr, paddr + size);
> -		} else {
> -			dma_cache_maint(paddr, size, dmac_flush_range);
> -			outer_flush_range(paddr, paddr + size);
> -		}
> -		break;
> -	default:
> -		break;
> -	}
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> -		outer_inv_range(paddr, paddr + size);
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -	}
> -}
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
>  #include <asm/cacheflush.h>
>  #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_poc(paddr, paddr + size);
> +}
>
> -	dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dcache_inval_poc(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_inval_poc(paddr, paddr + size);
> +}
>
> -	if (dir == DMA_TO_DEVICE)
> -		return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
>
> -	dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_wb_range);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		return;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_inv_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_inv_range);
>  }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_op(paddr, size, dma_wbinv_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
>  #include <linux/memblock.h>
>  #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	void *addr = phys_to_virt(paddr);
> -
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		hexagon_clean_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		hexagon_inv_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	hexagon_clean_dcache_range(paddr, paddr + size);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_inv_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
>   * DMA coherent space. Use that for the pool.
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
>  #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_TO_DEVICE:
> -		cache_push(handle, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		cache_clear(handle, size);
> -		break;
> -	default:
> -		pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> -				dir);
> -		break;
> -	}
> +	/*
> +	 * cache_push() always invalidates in addition to cleaning
> +	 * write-back caches.
> +	 */
> +	cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_push(paddr, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
>  #include <linux/bug.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	/* writeback plus invalidate, could be a nop on WT caches */
> +	flush_dcache_range(paddr, paddr + size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range(paddr, paddr + size);
> -		break;
> -	default:
> -		BUG();
> -	}}
> +	invalidate_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
>  	} while (left);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_wback);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> -		    cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_wback);
> -		else
> -			dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_wback);
>  }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		if (cpu_needs_post_dma_flush())
> -			dma_sync_phys(paddr, size, _dma_cache_inv);
> -		break;
> -	default:
> -		break;
> -	}
> +	dma_sync_phys(paddr, size, _dma_cache_inv);
>  }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +	       cpu_needs_post_dma_flush();
> +}
> +
> +#include <linux/dma-sync.h>
>
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> -		const struct iommu_ops *iommu, bool coherent)
> +		const struct iommu_ops *iommu, bool coherent)
>  {
> -	dev->dma_coherent = coherent;
> +	dev->dma_coherent = coherent;
>  }
>  #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
>  #include <linux/types.h>
>  #include <linux/mm.h>
>  #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/cache.h>
>  #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	/*
> +	 * We just need to write back the caches here, but Nios2 flush
> +	 * instruction will do both writeback and invalidate.
> +	 */
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		/*
> -		 * We just need to flush the caches here , but Nios2 flush
> -		 * instruction will do both writeback and invalidate.
> -		 */
> -	case DMA_BIDIRECTIONAL: /* flush and invalidate */
> -		flush_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> +	invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
> +	flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	switch (dir) {
> -	case DMA_BIDIRECTIONAL:
> -	case DMA_FROM_DEVICE:
> -		invalidate_dcache_range((unsigned long)vaddr,
> -			(unsigned long)(vaddr + size));
> -		break;
> -	case DMA_TO_DEVICE:
> -		break;
> -	default:
> -		BUG();
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  	mmap_write_unlock(&init_mm);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long cl;
>  	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		/* Write back the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBWR, cl);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		/* Invalidate the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBIR, cl);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		/* Flush the dcache for the requested range */
> -		for (cl = addr; cl < addr + size;
> -		     cl += cpuinfo->dcache_block_size)
> -			mtspr(SPR_DCBFR, cl);
> -		break;
> -	default:
> -		break;
> -	}
> +	/* Write back the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBWR, cl);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Invalidate the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long cl;
> +	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> +	/* Flush the dcache for the requested range */
> +	for (cl = paddr; cl < paddr + size;
> +	     cl += cpuinfo->dcache_block_size)
> +		mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>  	free_pages((unsigned long)__va(dma_handle), order);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		clean_kernel_dcache_range(virt, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	clean_kernel_dcache_range(virt, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		purge_kernel_dcache_range(virt, size);
> -		break;
> -	}
> +	purge_kernel_dcache_range(virt, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> +	flush_kernel_dcache_range(virt, size);
>  }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
>  #endif
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	__dma_phys_op(start, end, DMA_CACHE_CLEAN);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (direction) {
> -	case DMA_NONE:
> -		BUG();
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	}
> +	__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
>  static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
>  {
>  	void *vaddr = phys_to_virt(paddr);
>
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> -		break;
> -	default:
> -		break;
> -	}
> +	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *vaddr = phys_to_virt(paddr);
> +
> +	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	void *flush_addr = page_address(page);
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	__flush_purge_region(page_address(page), size);
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
>  	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> -	switch (dir) {
> -	case DMA_FROM_DEVICE:		/* invalidate only */
> -		__flush_invalidate_region(addr, size);
> -		break;
> -	case DMA_TO_DEVICE:		/* writeback only */
> -		__flush_wback_region(addr, size);
> -		break;
> -	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
> -		__flush_purge_region(addr, size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	__flush_wback_region(addr, size);
>  }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_invalidate_region(addr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> +	__flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
>  #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	if (dir != DMA_TO_DEVICE &&
> -	    sparc_cpu_model == sparc_leon &&
> +	/* IIep is write-through, not flushing on cpu to device transfer. */
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	/*
> +	 * On LEON systems without cache snooping, the entire D-CACHE must be
> +	 * flushed to make DMA to cacheable memory coherent.
> +	 */
> +	if (sparc_cpu_model == sparc_leon &&
>  	    !sparc_leon3_snooping_enabled())
>  		leon_flush_dcache_all();
>  }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	arch_dma_cache_inv(paddr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
>  #ifdef CONFIG_PROC_FS
>
>  static int sparc_io_proc_show(struct seq_file *m, void *v)
> diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
> index ff3bf015eca4..d4ff96585545 100644
> --- a/arch/xtensa/kernel/pci-dma.c
> +++ b/arch/xtensa/kernel/pci-dma.c
> @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
>  	}
>  }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		do_cache_op(paddr, size, __flush_dcache_range);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		do_cache_op(paddr, size, __invalidate_dcache_range);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> -		break;
> -	default:
> -		break;
> -	}
> +	do_cache_op(paddr, size, __flush_dcache_range);
>  }
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __invalidate_dcache_range);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	__invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
> new file mode 100644
> index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument, inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          |   map          ==  for_device     |   unmap     ==  for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   |   writeback        writeback      |   none          none
> + * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> + * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> + *
> + *     [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + /*
> + * This may be an empty function on write-through caches,
> + * and it might invalidate the cache if an architecture has
> + * a write-back cache but no way to write it back without
> + * invalidating
> + */
> + arch_dma_cache_wback(paddr, size);
> + break;
> +
> + case DMA_FROM_DEVICE:
> + /*
> + * FIXME: this should be handled the same across all
> + * architectures, see
> + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> + */
> + if (!arch_sync_dma_clean_before_fromdevice()) {
> + arch_dma_cache_inv(paddr, size);
> + break;
> + }
> + fallthrough;
> +
> + case DMA_BIDIRECTIONAL:
> + /* Skip the invalidate here if it's done later */
> + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> + arch_sync_dma_cpu_needs_post_dma_flush())
> + arch_dma_cache_wback(paddr, size);
> + else
> + arch_dma_cache_wback_inv(paddr, size);
> + break;
> +
> + default:
> + break;
> + }
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{
> +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> + unsigned long pfn = PFN_UP(paddr);
> + unsigned long off = paddr & (PAGE_SIZE - 1);
> + size_t left = size;
> +
> + if (off)
> + left -= PAGE_SIZE - off;
> +
> + while (left >= PAGE_SIZE) {
> + struct page *page = pfn_to_page(pfn++);
> + set_bit(PG_dcache_clean, &page->flags);
> + left -= PAGE_SIZE;
> + }
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + break;
> +
> + case DMA_FROM_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> + if (arch_sync_dma_cpu_needs_post_dma_flush())
> + arch_dma_cache_inv(paddr, size);
> +
> + if (size > PAGE_SIZE)
> + arch_dma_mark_dcache_clean(paddr, size);
> + break;
> +
> + default:
> + break;
> + }
> +}
> +#endif
> --
> 2.39.2
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* RE: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-04-13 12:13 ` Biju Das
0 siblings, 0 replies; 456+ messages in thread
From: Biju Das @ 2023-04-13 12:13 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Prabhakar Mahadev Lad, Conor Dooley,
linux-snps-arc@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
linux-xtensa@linux-xtensa.org

Hi all,

FYI, this patch breaks the RZ/G2L SMARC EVK board, and Arnd will send a V2 to fix this issue.
[10:53] <biju> [ 3.384408] Unable to handle kernel paging request at virtual address 000000004afb0080
[10:53] <biju> [ 3.392755] Mem abort info:
[10:53] <biju> [ 3.395883] ESR = 0x0000000096000144
[10:53] <biju> [ 3.399957] EC = 0x25: DABT (current EL), IL = 32 bits
[10:53] <biju> [ 3.405674] SET = 0, FnV = 0
[10:53] <biju> [ 3.408978] EA = 0, S1PTW = 0
[10:53] <biju> [ 3.412442] FSC = 0x04: level 0 translation fault
[10:53] <biju> [ 3.417825] Data abort info:
[10:53] <biju> [ 3.420959] ISV = 0, ISS = 0x00000144
[10:53] <biju> [ 3.425115] CM = 1, WnR = 1
[10:53] <biju> [ 3.428521] [000000004afb0080] user address but active_mm is swapper
[10:53] <biju> [ 3.435135] Internal error: Oops: 0000000096000144 [#1] PREEMPT SMP
[10:53] <biju> [ 3.441501] Modules linked in:
[10:53] <biju> [ 3.444644] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230412-g2936e9299572 #712
[10:53] <biju> [ 3.453537] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT)
[10:53] <biju> [ 3.460130] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[10:53] <biju> [ 3.467184] pc : dcache_clean_poc+0x20/0x38
[10:53] <biju> [ 3.471488] lr : arch_sync_dma_for_device+0x1c/0x2c
[10:53] <biju> [ 3.476463] sp : ffff80000a70b970
[10:53] <biju> [ 3.479834] x29: ffff80000a70b970 x28: 0000000000000000 x27: ffff00000aef7c10
[10:53] <biju> [ 3.487118] x26: ffff00000afb0080 x25: ffff00000b710000 x24: ffff00000b710a40
[10:53] <biju> [ 3.494397] x23: 0000000000002000 x22: 0000000000000000 x21: 0000000000000002
[10:53] <biju> [ 3.501670] x20: ffff00000aef7c10 x19: 000000004afb0080 x18: 0000000000000000
[10:53] <biju> [ 3.508943] x17: 0000000000000100 x16: fffffc0001efc008 x15: 0000000000000000
[10:53] <biju> [ 3.516216] x14: 0000000000000100 x13: 0000000000000068 x12: ffff00007fc0aa50
[10:54] <biju> [ 3.523488] x11: ffff00007fc0a9c0 x10: 0000000000000000 x9 : ffff00000aef7f08
[10:54] <biju> [ 3.530761] x8 : 0000000000000000 x7 : fffffc00002bec00 x6 : 0000000000000000
[10:54] <biju> [ 3.538028] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 000000000000003f
[10:54] <biju> [ 3.545297] x2 : 0000000000000040 x1 : 000000004afb2080 x0 : 000000004afb0080
[10:54] <biju> [ 3.552569] Call trace:
[10:54] <biju> [ 3.555074] dcache_clean_poc+0x20/0x38
[10:54] <biju> [ 3.559014] dma_map_page_attrs+0x1b4/0x248
[10:54] <biju> [ 3.563289] ravb_rx_ring_format_gbeth+0xd8/0x198
[10:54] <biju> [ 3.568095] ravb_ring_format+0x5c/0x108
[10:54] <biju> [ 3.572108] ravb_dmac_init_gbeth+0x30/0xe4
[10:54] <biju> [ 3.576382] ravb_dmac_init+0x80/0x104
[10:54] <biju> [ 3.580222] ravb_open+0x84/0x78c
[10:54] <biju> [ 3.583626] __dev_open+0xec/0x1d8
[10:54] <biju> [ 3.587138] __dev_change_flags+0x190/0x208
[10:54] <biju> [ 3.591406] dev_change_flags+0x24/0x6c
[10:54] <biju> [ 3.595324] ip_auto_config+0x248/0x10ac
[10:54] <biju> [ 3.599345] do_one_initcall+0x6c/0x1b0
[10:54] <biju> [ 3.603268] kernel_init_freeable+0x1c0/0x294

Cheers,
Biju

> -----Original Message-----
> From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On Behalf Of Arnd Bergmann
> Sent: Monday, March 27, 2023 1:13 PM
> To: linux-kernel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>; Vineet Gupta <vgupta@kernel.org>;
> Russell King <linux@armlinux.org.uk>; Neil Armstrong <neil.armstrong@linaro.org>;
> Linus Walleij <linus.walleij@linaro.org>; Catalin Marinas <catalin.marinas@arm.com>;
> Will Deacon <will@kernel.org>; Guo Ren <guoren@kernel.org>; Brian Cain <bcain@quicinc.com>;
> Geert Uytterhoeven <geert@linux-m68k.org>; Michal Simek <monstr@monstr.eu>;
> Thomas Bogendoerfer <tsbogend@alpha.franken.de>; Dinh Nguyen <dinguyen@kernel.org>;
> Stafford Horne <shorne@gmail.com>; Helge Deller <deller@gmx.de>;
> Michael Ellerman <mpe@ellerman.id.au>; Christophe Leroy <christophe.leroy@csgroup.eu>;
> Paul Walmsley <paul.walmsley@sifive.com>; Palmer Dabbelt <palmer@dabbelt.com>;
> Rich Felker <dalias@libc.org>; John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
> David S. Miller <davem@davemloft.net>; Max Filippov <jcmvbkbc@gmail.com>;
> Christoph Hellwig <hch@lst.de>; Robin Murphy <robin.murphy@arm.com>;
> Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@bp.renesas.com>;
> Conor Dooley <conor.dooley@microchip.com>; linux-snps-arc@lists.infradead.org;
> linux-arm-kernel@lists.infradead.org; linux-oxnas@groups.io;
> linux-csky@vger.kernel.org; linux-hexagon@vger.kernel.org;
> linux-m68k@lists.linux-m68k.org; linux-mips@vger.kernel.org;
> linux-openrisc@vger.kernel.org; linux-parisc@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org; linux-riscv@lists.infradead.org;
> linux-sh@vger.kernel.org; sparclinux@vger.kernel.org;
> linux-xtensa@linux-xtensa.org
> Subject: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> Now that all of these have consistent behavior, replace them with a single
> shared implementation of arch_sync_dma_for_device() and
> arch_sync_dma_for_cpu() and three parameters to pick how they should
> operate:
>
> - If the CPU has speculative prefetching, then the cache
> has to be invalidated after a transfer from the device.
> On the rarer CPUs without prefetching, this can be skipped,
> with all cache management happening before the transfer.
> This flag can be runtime detected, but is usually fixed
> per architecture.
>
> - Some architectures currently clean the caches before DMA
> from a device, while others invalidate it. There has not
> been a conclusion regarding whether we should change all
> architectures to use clean instead, so this adds an
> architecture specific flag that we can change later on.
>
> - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
> track pages that are marked clean in the page cache, to
> avoid flushing them again. The implementation for this is
> generic enough to work on all architectures that use the
> PG_dcache_clean page flag, but a Kconfig symbol is used
> to only enable it on Arm to preserve the existing behavior.
>
> For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
> over 'flush', to avoid any ambiguity of what the helper functions are
> supposed to do.
>
> Moving the global functions into a header file is usually a bad idea as it
> prevents the header from being included more than once, but it helps keep
> the behavior as close as possible to the previous state, including the
> possibility of inlining most of it into these functions where that was done
> before. This also helps keep the global namespace clean, by hiding the new
> arch_dma_cache{_wback,_inv,_wback_inv} from device drivers that might use
> them incorrectly.
>
> It would be possible to do this one architecture at a time, but as the
> change is the same everywhere, the combined patch helps explain it better
> once.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> arch/arc/mm/dma.c | 66 +++++-------------
> arch/arm/Kconfig | 3 +
> arch/arm/mm/dma-mapping-nommu.c | 39 ++++++-----
> arch/arm/mm/dma-mapping.c | 64 +++++++-----------
> arch/arm64/mm/dma-mapping.c | 28 +++++---
> arch/csky/mm/dma-mapping.c | 44 ++++++------
> arch/hexagon/kernel/dma.c | 44 ++++++------
> arch/m68k/kernel/dma.c | 43 +++++++-----
> arch/microblaze/kernel/dma.c | 48 +++++++-------
> arch/mips/mm/dma-noncoherent.c | 60 +++++++----------
> arch/nios2/mm/dma-mapping.c | 57 +++++++---------
> arch/openrisc/kernel/dma.c | 63 +++++++++++-------
> arch/parisc/kernel/pci-dma.c | 46 ++++++-------
> arch/powerpc/mm/dma-noncoherent.c | 34 ++++++----
> arch/riscv/mm/dma-noncoherent.c | 51 +++++++-------
> arch/sh/kernel/dma-coherent.c | 43 +++++++-----
> arch/sparc/kernel/ioport.c | 38 ++++++++---
> arch/xtensa/kernel/pci-dma.c | 40 ++++++-----
> include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++
> 19 files changed, 527 insertions(+), 391 deletions(-)
> create mode 100644 include/linux/dma-sync.h
>
> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
> dma_cache_wback_inv(page_to_phys(page), size); }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device     |   unmap     ==  for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback      |   none          none
> - * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> - *
> - * [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - dma_cache_wback(paddr, size);
> - break;
> -
> - case DMA_FROM_DEVICE:
> - dma_cache_inv(paddr, size);
> - break;
> -
> - case DMA_BIDIRECTIONAL:
> - dma_cache_wback(paddr, size);
> - break;
> + dma_cache_wback(paddr, size);
> +}
>
> - default:
> - break;
> - }
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + dma_cache_inv(paddr, size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - break;
> + dma_cache_wback_inv(paddr, size);
> +}
>
> - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - dma_cache_inv(paddr, size);
> - break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
>
> - default:
> - break;
> - }
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> }
>
> +#include <linux/dma-sync.h>
> +
> /*
> * Plug in direct dma map ops.
> */
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 125d58c54ab1..0de84e861027 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
> bool
> default y
>
> +config ARCH_DMA_MARK_DCACHE_CLEAN
> + def_bool y
> +
> config ARCH_HAS_ILOG2_U32
> bool
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 12b5c6ae93fc..0817274aed15 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -13,27 +13,36 @@
>
> #include "dma.h"
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - if (dir == DMA_FROM_DEVICE) {
> - dmac_inv_range(__va(paddr), __va(paddr + size));
> - outer_inv_range(paddr, paddr + size);
> - } else {
> - dmac_clean_range(__va(paddr), __va(paddr + size));
> - outer_clean_range(paddr, paddr + size);
> - }
> + dmac_clean_range(__va(paddr), __va(paddr + size));
> + outer_clean_range(paddr, paddr + size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - if (dir != DMA_TO_DEVICE) {
> - outer_inv_range(paddr, paddr + size);
> - dmac_inv_range(__va(paddr), __va(paddr));
> - }
> + dmac_inv_range(__va(paddr), __va(paddr + size));
> + outer_inv_range(paddr, paddr + size);
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + dmac_flush_range(__va(paddr), __va(paddr + size));
> + outer_flush_range(paddr, paddr + size); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> const struct iommu_ops *iommu, bool coherent) {
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index b703cb83d27e..aa6ee820a0ab 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
> }
> }
>
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> + dma_cache_maint(paddr, size, dmac_clean_range);
> + outer_clean_range(paddr, paddr + size); }
> +
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + dma_cache_maint(paddr, size, dmac_inv_range);
> + outer_inv_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + dma_cache_maint(paddr, size, dmac_flush_range);
> + outer_flush_range(paddr, paddr + size); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> {
> if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> return false;
> }
>
> -/*
> - * Make an area consistent for devices.
> - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> -{
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - dma_cache_maint(paddr, size, dmac_clean_range);
> - outer_clean_range(paddr, paddr + size);
> - break;
> - case DMA_FROM_DEVICE:
> - dma_cache_maint(paddr, size, dmac_inv_range);
> - outer_inv_range(paddr, paddr + size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> - dma_cache_maint(paddr, size, dmac_clean_range);
> - outer_clean_range(paddr, paddr + size);
> - } else {
> - dma_cache_maint(paddr, size, dmac_flush_range);
> - outer_flush_range(paddr, paddr + size);
> - }
> - break;
> - default:
> - break;
> - }
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> -{
> - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> - outer_inv_range(paddr, paddr + size);
> - dma_cache_maint(paddr, size, dmac_inv_range);
> - }
> -}
> +#include <linux/dma-sync.h>
>
> #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
> #include <asm/cacheflush.h>
> #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - unsigned long start = (unsigned long)phys_to_virt(paddr);
> + dcache_clean_poc(paddr, paddr + size); }
>
> - dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + dcache_inval_poc(paddr, paddr + size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> {
> - unsigned long start = (unsigned long)phys_to_virt(paddr);
> + dcache_clean_inval_poc(paddr, paddr + size); }
>
> - if (dir == DMA_TO_DEVICE)
> - return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
>
> - dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> }
>
> +#include <linux/dma-sync.h>
> +
> void arch_dma_prep_coherent(struct page *page, size_t size) {
> unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
> cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - cache_op(paddr, size, dma_wb_range);
> - break;
> - default:
> - BUG();
> - }
> + cache_op(paddr, size, dma_wb_range);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - return;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - cache_op(paddr, size, dma_inv_range);
> - break;
> - default:
> - BUG();
> - }
> + cache_op(paddr, size, dma_inv_range);
> }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + cache_op(paddr, size, dma_wbinv_range); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
> #include <linux/memblock.h>
> #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - void *addr = phys_to_virt(paddr);
> -
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - hexagon_clean_dcache_range((unsigned long) addr,
> - (unsigned long) addr + size);
> - break;
> - case DMA_FROM_DEVICE:
> - hexagon_inv_dcache_range((unsigned long) addr,
> - (unsigned long) addr + size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - flush_dcache_range((unsigned long) addr,
> - (unsigned long) addr + size);
> - break;
> - default:
> - BUG();
> - }
> + hexagon_clean_dcache_range(paddr, paddr + size);
> }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) {
> + hexagon_inv_dcache_range(paddr, paddr + size); }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size) {
> + hexagon_flush_dcache_range(paddr, paddr + size); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> /*
> * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
> * DMA coherent space. Use that for the pool.
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
> #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_BIDIRECTIONAL:
> - case DMA_TO_DEVICE:
> - cache_push(handle, size);
> - break;
> - case DMA_FROM_DEVICE:
> - cache_clear(handle, size);
> - break;
> - default:
> - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> - dir);
> - break;
> - }
> + /*
> + * cache_push() always invalidates in addition to cleaning
> + * write-back caches.
> + */
> + cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + cache_push(paddr, size);
> }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
> #include <linux/bug.h>
> #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (direction) {
> - case DMA_TO_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - flush_dcache_range(paddr, paddr + size);
> - break;
> - case DMA_FROM_DEVICE:
> - invalidate_dcache_range(paddr, paddr + size);
> - break;
> - default:
> - BUG();
> - }
> + /* writeback plus invalidate, could be a nop on WT caches */
> + flush_dcache_range(paddr, paddr + size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (direction) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_BIDIRECTIONAL:
> - case DMA_FROM_DEVICE:
> - invalidate_dcache_range(paddr, paddr + size);
> - break;
> - default:
> - BUG();
> - }}
> + invalidate_dcache_range(paddr, paddr + size); }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + flush_dcache_range(paddr, paddr + size); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
> } while (left);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - dma_sync_phys(paddr, size, _dma_cache_wback);
> - break;
> - case DMA_FROM_DEVICE:
> - dma_sync_phys(paddr, size, _dma_cache_inv);
> - break;
> - case DMA_BIDIRECTIONAL:
> - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> - cpu_needs_post_dma_flush())
> - dma_sync_phys(paddr, size, _dma_cache_wback);
> - else
> - dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> - break;
> - default:
> - break;
> - }
> + dma_sync_phys(paddr, size, _dma_cache_wback);
> }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - if (cpu_needs_post_dma_flush())
> - dma_sync_phys(paddr, size, _dma_cache_inv);
> - break;
> - default:
> - break;
> - }
> + dma_sync_phys(paddr, size, _dma_cache_inv);
> }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + dma_sync_phys(paddr, size, _dma_cache_wback_inv); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> + cpu_needs_post_dma_flush(); }
> +
> +#include <linux/dma-sync.h>
>
> #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> - const struct iommu_ops *iommu, bool coherent)
> + const struct iommu_ops *iommu, bool coherent)
> {
> - dev->dma_coherent = coherent;
> + dev->dma_coherent = coherent;
> }
> #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
> #include <linux/types.h>
> #include <linux/mm.h>
> #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
> #include <linux/dma-mapping.h>
> #include <linux/io.h>
> #include <linux/cache.h>
> #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> + /*
> + * We just need to write back the caches here, but Nios2 flush
> + * instruction will do both writeback and invalidate.
> + */
> void *vaddr = phys_to_virt(paddr);
> + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); }
>
> - switch (dir) {
> - case DMA_FROM_DEVICE:
> - invalidate_dcache_range((unsigned long)vaddr,
> - (unsigned long)(vaddr + size));
> - break;
> - case DMA_TO_DEVICE:
> - /*
> - * We just need to flush the caches here , but Nios2 flush
> - * instruction will do both writeback and invalidate.
> - */
> - case DMA_BIDIRECTIONAL: /* flush and invalidate */
> - flush_dcache_range((unsigned long)vaddr,
> - (unsigned long)(vaddr + size));
> - break;
> - default:
> - BUG();
> - }
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> {
> void *vaddr = phys_to_virt(paddr);
> + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size)); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
>
> - switch (dir) {
> - case DMA_BIDIRECTIONAL:
> - case DMA_FROM_DEVICE:
> - invalidate_dcache_range((unsigned long)vaddr,
> - (unsigned long)(vaddr + size));
> - break;
> - case DMA_TO_DEVICE:
> - break;
> - default:
> - BUG();
> - }
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> }
>
> +#include <linux/dma-sync.h>
> +
> void arch_dma_prep_coherent(struct page *page, size_t size) {
> unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
> mmap_write_unlock(&init_mm);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> unsigned long cl;
> struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - /* Write back the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBWR, cl);
> - break;
> - case DMA_FROM_DEVICE:
> - /* Invalidate the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBIR, cl);
> - break;
> - case DMA_BIDIRECTIONAL:
> - /* Flush the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBFR, cl);
> - break;
> - default:
> - break;
> - }
> + /* Write back the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBWR, cl);
> }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + unsigned long cl;
> + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> + /* Invalidate the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + unsigned long cl;
> + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> + /* Flush the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
> free_pages((unsigned long)__va(dma_handle), order); }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - clean_kernel_dcache_range(virt, size);
> - break;
> - case DMA_FROM_DEVICE:
> - clean_kernel_dcache_range(virt, size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - flush_kernel_dcache_range(virt, size);
> - break;
> - }
> + clean_kernel_dcache_range(virt, size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - purge_kernel_dcache_range(virt, size);
> - break;
> - }
> + purge_kernel_dcache_range(virt, size); }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> + flush_kernel_dcache_range(virt, size);
> }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
> #endif
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> __dma_phys_op(start, end, DMA_CACHE_CLEAN); }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (direction) {
> - case DMA_NONE:
> - BUG();
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - __dma_phys_op(start, end, DMA_CACHE_INVAL);
> - break;
> - }
> + __dma_phys_op(start, end, DMA_CACHE_INVAL);
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) {
> + __dma_phys_op(start, end, DMA_CACHE_FLUSH); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> void arch_dma_prep_coherent(struct page *page, size_t size) {
> unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
> static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> void *vaddr = phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> - break;
> - case DMA_FROM_DEVICE:
> - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> - break;
>
- case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > - default: > - break; > - } > + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > void *vaddr = phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > - break; > - default: > - break; > - } > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + void *vaddr = phys_to_virt(paddr); > + > + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > void *flush_addr = page_address(page); diff --git > a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index > 6a44c0e7ba40..41f031ae7609 100644 > --- a/arch/sh/kernel/dma-coherent.c > +++ b/arch/sh/kernel/dma-coherent.c > @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t > size) > __flush_purge_region(page_address(page), size); } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > > - switch (dir) { > - case DMA_FROM_DEVICE: /* invalidate only */ > - __flush_invalidate_region(addr, size); > - break; > - case DMA_TO_DEVICE: /* writeback only */ > - __flush_wback_region(addr, size); > - break; > - case 
DMA_BIDIRECTIONAL: /* writeback and invalidate */ > - __flush_purge_region(addr, size); > - break; > - default: > - BUG(); > - } > + __flush_wback_region(addr, size); > } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > + > + __flush_invalidate_region(addr, size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > + > + __flush_purge_region(addr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index > 4f3d26066ec2..6926ead2f208 100644 > --- a/arch/sparc/kernel/ioport.c > +++ b/arch/sparc/kernel/ioport.c > @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport); > > #endif /* CONFIG_SBUS */ > > -/* > - * IIep is write-through, not flushing on cpu to device transfer. > - * > - * On LEON systems without cache snooping, the entire D-CACHE must be > flushed to > - * make DMA to cacheable memory coherent. > - */ > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE && > - sparc_cpu_model == sparc_leon && > + /* IIep is write-through, not flushing on cpu to device transfer. */ } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + /* > + * On LEON systems without cache snooping, the entire D-CACHE must be > + * flushed to make DMA to cacheable memory coherent. 
> + */ > + if (sparc_cpu_model == sparc_leon && > !sparc_leon3_snooping_enabled()) > leon_flush_dcache_all(); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + arch_dma_cache_inv(paddr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > #ifdef CONFIG_PROC_FS > > static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git > a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index > ff3bf015eca4..d4ff96585545 100644 > --- a/arch/xtensa/kernel/pci-dma.c > +++ b/arch/xtensa/kernel/pci-dma.c > @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, > } > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - do_cache_op(paddr, size, __flush_dcache_range); > - break; > - case DMA_FROM_DEVICE: > - do_cache_op(paddr, size, __invalidate_dcache_range); > - break; > - case DMA_BIDIRECTIONAL: > - do_cache_op(paddr, size, __flush_invalidate_dcache_range); > - break; > - default: > - break; > - } > + do_cache_op(paddr, size, __flush_dcache_range); > } > > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + do_cache_op(paddr, size, __invalidate_dcache_range); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + do_cache_op(paddr, size, __flush_invalidate_dcache_range); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, size_t size) 
{ > __invalidate_dcache_range((unsigned long)page_address(page), size); > diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h > new file mode 100644 > index 000000000000..18e33d5e8eaf > --- /dev/null > +++ b/include/linux/dma-sync.h > @@ -0,0 +1,107 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Cache operations depending on function and direction argument, inspired by > + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk > + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] > + * dma-mapping: provide a generic dma-noncoherent implementation)" > + * > + * | map == for_device | unmap == for_cpu > + * |---------------------------------------------------------------- > + * TO_DEV | writeback writeback | none none > + * FROM_DEV | invalidate invalidate | invalidate* invalidate* > + * BIDIR | writeback writeback | invalidate invalidate > + * > + * [*] needed for CPU speculative prefetches > + * > + * NOTE: we don't check the validity of direction argument as it is done in > + * upper layer functions (in include/linux/dma-mapping.h) > + * > + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide > + * the respective high-level operations without having to expose the > + * cache management ops to drivers.
> + */ > + > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > + enum dma_data_direction dir) > +{ > + switch (dir) { > + case DMA_TO_DEVICE: > + /* > + * This may be an empty function on write-through caches, > + * and it might invalidate the cache if an architecture has > + * a write-back cache but no way to write it back without > + * invalidating > + */ > + arch_dma_cache_wback(paddr, size); > + break; > + > + case DMA_FROM_DEVICE: > + /* > + * FIXME: this should be handled the same across all > + * architectures, see > + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ > + */ > + if (!arch_sync_dma_clean_before_fromdevice()) { > + arch_dma_cache_inv(paddr, size); > + break; > + } > + fallthrough; > + > + case DMA_BIDIRECTIONAL: > + /* Skip the invalidate here if it's done later */ > + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > + arch_sync_dma_cpu_needs_post_dma_flush()) > + arch_dma_cache_wback(paddr, size); > + else > + arch_dma_cache_wback_inv(paddr, size); > + break; > + > + default: > + break; > + } > +} > + > +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU > +/* > + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */ > +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size) > +{ > +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN > + unsigned long pfn = PFN_UP(paddr); > + unsigned long off = paddr & (PAGE_SIZE - 1); > + size_t left = size; > + > + if (off) > + left -= PAGE_SIZE - off; > + > + while (left >= PAGE_SIZE) { > + struct page *page = pfn_to_page(pfn++); > + set_bit(PG_dcache_clean, &page->flags); > + left -= PAGE_SIZE; > + } > +#endif > +} > + > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > + enum dma_data_direction dir) > +{ > + switch (dir) { > + case DMA_TO_DEVICE: > + break; > + > + case DMA_FROM_DEVICE: > + case DMA_BIDIRECTIONAL: > + /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ > + if (arch_sync_dma_cpu_needs_post_dma_flush()) > + arch_dma_cache_inv(paddr, size); > + > + if (size > PAGE_SIZE) > + arch_dma_mark_dcache_clean(paddr, size); > + break; > + > + default: > + break; > + } > +} > +#endif > -- > 2.39.2 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply [flat|nested] 456+ messages in thread
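The direction handling in the new include/linux/dma-sync.h quoted above is easiest to check against its header-comment table by tracing a few configurations. What follows is a minimal userspace C model, not kernel code: the `model_*` enums and functions, `struct arch_flags`, and the three presets are hypothetical names. Two of the flags mirror `arch_sync_dma_clean_before_fromdevice()` and `arch_sync_dma_cpu_needs_post_dma_flush()` as set in the quoted arm64, arc and m68k hunks; whether each architecture selects `CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU` is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum dma_data_direction. */
enum model_dir { MODEL_BIDIRECTIONAL, MODEL_TO_DEVICE, MODEL_FROM_DEVICE };

/* Which arch_dma_cache_*() helper the model would call. */
enum model_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* Hypothetical bundle of the three per-architecture knobs. */
struct arch_flags {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;         /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU (assumed) */
};

/* Presets based on the quoted hunks; has_sync_for_cpu is an assumption. */
static const struct arch_flags ARM64_LIKE = { true, true, true };
static const struct arch_flags ARC_LIKE = { false, true, true };
static const struct arch_flags M68K_LIKE = { false, false, false };

/* Model of the switch in arch_sync_dma_for_device(). */
static enum model_op model_for_device(struct arch_flags a, enum model_dir d)
{
	switch (d) {
	case MODEL_TO_DEVICE:
		return OP_WBACK;
	case MODEL_FROM_DEVICE:
		if (!a.clean_before_fromdevice)
			return OP_INV;
		/* fall through */
	case MODEL_BIDIRECTIONAL:
		/* Skip the invalidate here if arch_sync_dma_for_cpu() does it. */
		if (a.has_sync_for_cpu && a.cpu_needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	}
	return OP_NONE;
}

/* Model of arch_sync_dma_for_cpu(), which only exists with the Kconfig symbol. */
static enum model_op model_for_cpu(struct arch_flags a, enum model_dir d)
{
	if (!a.has_sync_for_cpu || d == MODEL_TO_DEVICE)
		return OP_NONE;
	return a.cpu_needs_post_dma_flush ? OP_INV : OP_NONE;
}
```

With `ARC_LIKE`, the model reproduces the table in the header comment (invalidate before and after a FROM_DEVICE transfer; writeback before and invalidate after a BIDIRECTIONAL one). `ARM64_LIKE` instead cleans before a FROM_DEVICE transfer, which is the difference the FIXME leaves open, and `M68K_LIKE` does a single writeback-and-invalidate before a BIDIRECTIONAL transfer because there is no post-DMA hook.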
* RE: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-04-13 12:13 ` Biju Das
0 siblings, 0 replies; 456+ messages in thread
From: Biju Das @ 2023-04-13 12:13 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Prabhakar Mahadev Lad, Conor Dooley,
linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
linux-oxnas@groups.io, linux-csky@vger.kernel.org,
linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org,
linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org,
sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org

Hi all,

FYI, this patch causes a kernel oops on the RZ/G2L SMARC EVK board (log below); Arnd will send a v2 to fix the issue.
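The register dump in the oops that follows is consistent with the arm64 hunk of this patch handing a physical address to dcache_clean_poc(), which performs cache maintenance by kernel virtual address (the removed arm64 lines converted with phys_to_virt() first). x0 and x1 hold 0x4afb0080 and 0x4afb2080, i.e. x19 and x19 + x23 (0x2000), and the faulting address is reported as a user address. The sketch below is a userspace model, not kernel code; the linear-map constants are assumptions inferred from x26 (0xffff00000afb0080), and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed arm64 linear-map parameters for this board, inferred from the
 * register dump (48-bit VA, DRAM at 0x40000000); not read from the kernel.
 */
#define MODEL_PAGE_OFFSET 0xffff000000000000ULL
#define MODEL_PHYS_OFFSET 0x40000000ULL

/* Simplified model of arm64 phys_to_virt() for linear-map addresses. */
static uint64_t model_phys_to_virt(uint64_t paddr)
{
	return (paddr - MODEL_PHYS_OFFSET) | MODEL_PAGE_OFFSET;
}

/* The [start, end) range handed to dcache_clean_poc(). */
struct poc_range {
	uint64_t start;
	uint64_t end;
};

/* v1 arm64 arch_dma_cache_wback(): the physical address is passed through. */
static struct poc_range v1_clean_args(uint64_t paddr, uint64_t size)
{
	struct poc_range r = { paddr, paddr + size };
	return r;
}

/* Pre-patch arm64 code: translate to a kernel virtual address first. */
static struct poc_range old_clean_args(uint64_t paddr, uint64_t size)
{
	uint64_t start = model_phys_to_virt(paddr);
	struct poc_range r = { start, start + size };
	return r;
}
```

In this model the v1 range matches x0/x1 in the dump and the translated start matches x26, which suggests the new arm64 wrappers are missing the phys_to_virt() step. That reading is an inference from the registers, not a confirmed root cause.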
[10:53] <biju> [ 3.384408] Unable to handle kernel paging request at virtual address 000000004afb0080
[10:53] <biju> [ 3.392755] Mem abort info:
[10:53] <biju> [ 3.395883] ESR = 0x0000000096000144
[10:53] <biju> [ 3.399957] EC = 0x25: DABT (current EL), IL = 32 bits
[10:53] <biju> [ 3.405674] SET = 0, FnV = 0
[10:53] <biju> [ 3.408978] EA = 0, S1PTW = 0
[10:53] <biju> [ 3.412442] FSC = 0x04: level 0 translation fault
[10:53] <biju> [ 3.417825] Data abort info:
[10:53] <biju> [ 3.420959] ISV = 0, ISS = 0x00000144
[10:53] <biju> [ 3.425115] CM = 1, WnR = 1
[10:53] <biju> [ 3.428521] [000000004afb0080] user address but active_mm is swapper
[10:53] <biju> [ 3.435135] Internal error: Oops: 0000000096000144 [#1] PREEMPT SMP
[10:53] <biju> [ 3.441501] Modules linked in:
[10:53] <biju> [ 3.444644] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230412-g2936e9299572 #712
[10:53] <biju> [ 3.453537] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT)
[10:53] <biju> [ 3.460130] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[10:53] <biju> [ 3.467184] pc : dcache_clean_poc+0x20/0x38
[10:53] <biju> [ 3.471488] lr : arch_sync_dma_for_device+0x1c/0x2c
[10:53] <biju> [ 3.476463] sp : ffff80000a70b970
[10:53] <biju> [ 3.479834] x29: ffff80000a70b970 x28: 0000000000000000 x27: ffff00000aef7c10
[10:53] <biju> [ 3.487118] x26: ffff00000afb0080 x25: ffff00000b710000 x24: ffff00000b710a40
[10:53] <biju> [ 3.494397] x23: 0000000000002000 x22: 0000000000000000 x21: 0000000000000002
[10:53] <biju> [ 3.501670] x20: ffff00000aef7c10 x19: 000000004afb0080 x18: 0000000000000000
[10:53] <biju> [ 3.508943] x17: 0000000000000100 x16: fffffc0001efc008 x15: 0000000000000000
[10:53] <biju> [ 3.516216] x14: 0000000000000100 x13: 0000000000000068 x12: ffff00007fc0aa50
[10:54] <biju> [ 3.523488] x11: ffff00007fc0a9c0 x10: 0000000000000000 x9 : ffff00000aef7f08
[10:54] <biju> [ 3.530761] x8 : 0000000000000000 x7 : fffffc00002bec00 x6 : 0000000000000000
[10:54] <biju> [ 3.538028] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 000000000000003f
[10:54] <biju> [ 3.545297] x2 : 0000000000000040 x1 : 000000004afb2080 x0 : 000000004afb0080
[10:54] <biju> [ 3.552569] Call trace:
[10:54] <biju> [ 3.555074] dcache_clean_poc+0x20/0x38
[10:54] <biju> [ 3.559014] dma_map_page_attrs+0x1b4/0x248
[10:54] <biju> [ 3.563289] ravb_rx_ring_format_gbeth+0xd8/0x198
[10:54] <biju> [ 3.568095] ravb_ring_format+0x5c/0x108
[10:54] <biju> [ 3.572108] ravb_dmac_init_gbeth+0x30/0xe4
[10:54] <biju> [ 3.576382] ravb_dmac_init+0x80/0x104
[10:54] <biju> [ 3.580222] ravb_open+0x84/0x78c
[10:54] <biju> [ 3.583626] __dev_open+0xec/0x1d8
[10:54] <biju> [ 3.587138] __dev_change_flags+0x190/0x208
[10:54] <biju> [ 3.591406] dev_change_flags+0x24/0x6c
[10:54] <biju> [ 3.595324] ip_auto_config+0x248/0x10ac
[10:54] <biju> [ 3.599345] do_one_initcall+0x6c/0x1b0
[10:54] <biju> [ 3.603268] kernel_init_freeable+0x1c0/0x294

Cheers,
Biju

> -----Original Message----- > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > Behalf Of Arnd Bergmann > Sent: Monday, March 27, 2023 1:13 PM > To: linux-kernel@vger.kernel.org > Cc: Arnd Bergmann <arnd@arndb.de>; Vineet Gupta <vgupta@kernel.org>; Russell > King <linux@armlinux.org.uk>; Neil Armstrong <neil.armstrong@linaro.org>; > Linus Walleij <linus.walleij@linaro.org>; Catalin Marinas > <catalin.marinas@arm.com>; Will Deacon <will@kernel.org>; Guo Ren > <guoren@kernel.org>; Brian Cain <bcain@quicinc.com>; Geert Uytterhoeven > <geert@linux-m68k.org>; Michal Simek <monstr@monstr.eu>; Thomas Bogendoerfer > <tsbogend@alpha.franken.de>; Dinh Nguyen <dinguyen@kernel.org>; Stafford > Horne <shorne@gmail.com>; Helge Deller <deller@gmx.de>; Michael Ellerman > <mpe@ellerman.id.au>; Christophe Leroy <christophe.leroy@csgroup.eu>; Paul > Walmsley <paul.walmsley@sifive.com>; Palmer Dabbelt <palmer@dabbelt.com>; > Rich Felker <dalias@libc.org>; John Paul Adrian Glaubitz > <glaubitz@physik.fu-berlin.de>; David S.
Miller <davem@davemloft.net>; Max > Filippov <jcmvbkbc@gmail.com>; Christoph Hellwig <hch@lst.de>; Robin Murphy > <robin.murphy@arm.com>; Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@bp.renesas.com>; Conor Dooley <conor.dooley@microchip.com>; > linux-snps-arc@lists.infradead.org; linux-arm-kernel@lists.infradead.org; > linux-oxnas@groups.io; linux-csky@vger.kernel.org; linux-hexagon@vger.kernel.org; > linux-m68k@lists.linux-m68k.org; linux-mips@vger.kernel.org; > linux-openrisc@vger.kernel.org; linux-parisc@vger.kernel.org; > linuxppc-dev@lists.ozlabs.org; linux-riscv@lists.infradead.org; > linux-sh@vger.kernel.org; sparclinux@vger.kernel.org; > linux-xtensa@linux-xtensa.org > Subject: [PATCH 21/21] dma-mapping: replace custom code with generic implementation > > From: Arnd Bergmann <arnd@arndb.de> > > Now that all of these have consistent behavior, replace them with a single > shared implementation of arch_sync_dma_for_device() and > arch_sync_dma_for_cpu() and three parameters to pick how they should > operate: > > - If the CPU has speculative prefetching, then the cache > has to be invalidated after a transfer from the device. > On the rarer CPUs without prefetching, this can be skipped, > with all cache management happening before the transfer. > This flag can be runtime detected, but is usually fixed > per architecture. > > - Some architectures currently clean the caches before DMA > from a device, while others invalidate them. There has not > been a conclusion regarding whether we should change all > architectures to use clean instead, so this adds an > architecture-specific flag that we can change later on. > > - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps > track of pages that are marked clean in the page cache, to > avoid flushing them again.
The implementation for this is > generic enough to work on all architectures that use the > PG_dcache_clean page flag, but a Kconfig symbol is used > to only enable it on Arm to preserve the existing behavior. > > For the function naming, I picked 'wback' over 'clean', and 'wback_inv' > over 'flush', to avoid any ambiguity about what the helper functions are > supposed to do. > > Moving the global functions into a header file is usually a bad idea as it > prevents the header from being included more than once, but it helps keep > the behavior as close as possible to the previous state, including the > possibility of inlining most of it into these functions where that was done > before. This also helps keep the global namespace clean, by hiding the new > arch_dma_cache{_wback,_inv,_wback_inv} from device drivers that might use > them incorrectly. > > It would be possible to do this one architecture at a time, but as the > change is the same everywhere, the combined patch helps explain it better > in one place.
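The third bullet of the commit message refers to arch_dma_mark_dcache_clean() in the new dma-sync.h. A userspace model of its page arithmetic, assuming 4 KiB pages (the `MODEL_*` macros and `model_*` functions are hypothetical names), shows that only pages fully covered by the buffer get PG_dcache_clean; partial pages at either end are skipped.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages for the model; the kernel uses PAGE_SIZE/PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (UINT64_C(1) << MODEL_PAGE_SHIFT)
#define MODEL_PFN_UP(x)  (((x) + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT)

/* First page frame the loop would mark: the first page starting at or after paddr. */
static uint64_t model_first_clean_pfn(uint64_t paddr)
{
	return MODEL_PFN_UP(paddr);
}

/*
 * Number of pages the loop would flag: only pages fully inside
 * [paddr, paddr + size). Like the caller in dma-sync.h, assume
 * size > MODEL_PAGE_SIZE, so "left" cannot wrap around.
 */
static uint64_t model_clean_page_count(uint64_t paddr, uint64_t size)
{
	uint64_t off = paddr & (MODEL_PAGE_SIZE - 1);
	uint64_t left = size;
	uint64_t marked = 0;

	if (off)
		left -= MODEL_PAGE_SIZE - off;  /* skip the partial head page */

	while (left >= MODEL_PAGE_SIZE) {       /* a partial tail page is skipped too */
		marked++;                       /* set_bit(PG_dcache_clean, ...) in the kernel */
		left -= MODEL_PAGE_SIZE;
	}
	return marked;
}
```

For example, a 12 KiB buffer starting 2 KiB into a page covers only two whole pages, so the model counts two, starting at the next page frame.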
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/arc/mm/dma.c | 66 +++++------------- > arch/arm/Kconfig | 3 + > arch/arm/mm/dma-mapping-nommu.c | 39 ++++++----- > arch/arm/mm/dma-mapping.c | 64 +++++++----------- > arch/arm64/mm/dma-mapping.c | 28 +++++--- > arch/csky/mm/dma-mapping.c | 44 ++++++------ > arch/hexagon/kernel/dma.c | 44 ++++++------ > arch/m68k/kernel/dma.c | 43 +++++++----- > arch/microblaze/kernel/dma.c | 48 +++++++------- > arch/mips/mm/dma-noncoherent.c | 60 +++++++---------- > arch/nios2/mm/dma-mapping.c | 57 +++++++--------- > arch/openrisc/kernel/dma.c | 63 +++++++++++------- > arch/parisc/kernel/pci-dma.c | 46 ++++++------- > arch/powerpc/mm/dma-noncoherent.c | 34 ++++++---- > arch/riscv/mm/dma-noncoherent.c | 51 +++++++------- > arch/sh/kernel/dma-coherent.c | 43 +++++++----- > arch/sparc/kernel/ioport.c | 38 ++++++++--- > arch/xtensa/kernel/pci-dma.c | 40 ++++++----- > include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++ > 19 files changed, 527 insertions(+), 391 deletions(-) > create mode 100644 include/linux/dma-sync.h > > diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c > index ddb96786f765..61cd01646222 100644 > --- a/arch/arc/mm/dma.c > +++ b/arch/arc/mm/dma.c > @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size) > dma_cache_wback_inv(page_to_phys(page), size); } > > -/* > - * Cache operations depending on function and direction argument, inspired by > - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk > - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] > - * dma-mapping: provide a generic
dma-noncoherent implementation)" > - * > - * | map == for_device | unmap == for_cpu > - * |-------------------------------------------------------------- > -- > - * TO_DEV | writeback writeback | none none > - * FROM_DEV | invalidate invalidate | invalidate* > invalidate* > - * BIDIR | writeback writeback | invalidate > invalidate > - * > - * [*] needed for CPU speculative prefetches > - * > - * NOTE: we don't check the validity of direction argument as it is done in > - * upper layer functions (in include/linux/dma-mapping.h) > - */ > - > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_cache_wback(paddr, size); > - break; > - > - case DMA_FROM_DEVICE: > - dma_cache_inv(paddr, size); > - break; > - > - case DMA_BIDIRECTIONAL: > - dma_cache_wback(paddr, size); > - break; > + dma_cache_wback(paddr, size); > +} > > - default: > - break; > - } > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + dma_cache_inv(paddr, size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > + dma_cache_wback_inv(paddr, size); > +} > > - /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */ > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - dma_cache_inv(paddr, size); > - break; > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > - default: > - break; > - } > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > /* > * Plug in direct dma map ops. 
> */ > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index > 125d58c54ab1..0de84e861027 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT > bool > default y > > +config ARCH_DMA_MARK_DCACHE_CLEAN > + def_bool y > + > config ARCH_HAS_ILOG2_U32 > bool > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping- > nommu.c index 12b5c6ae93fc..0817274aed15 100644 > --- a/arch/arm/mm/dma-mapping-nommu.c > +++ b/arch/arm/mm/dma-mapping-nommu.c > @@ -13,27 +13,36 @@ > > #include "dma.h" > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir == DMA_FROM_DEVICE) { > - dmac_inv_range(__va(paddr), __va(paddr + size)); > - outer_inv_range(paddr, paddr + size); > - } else { > - dmac_clean_range(__va(paddr), __va(paddr + size)); > - outer_clean_range(paddr, paddr + size); > - } > + dmac_clean_range(__va(paddr), __va(paddr + size)); > + outer_clean_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE) { > - outer_inv_range(paddr, paddr + size); > - dmac_inv_range(__va(paddr), __va(paddr)); > - } > + dmac_inv_range(__va(paddr), __va(paddr + size)); > + outer_inv_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + dmac_flush_range(__va(paddr), __va(paddr + size)); > + outer_flush_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > const struct iommu_ops 
*iommu, bool coherent) { > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index b703cb83d27e..aa6ee820a0ab 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size) > } > } > > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_clean_range); > + outer_clean_range(paddr, paddr + size); > +} > + > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_inv_range); > + outer_inv_range(paddr, paddr + size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_flush_range); > + outer_flush_range(paddr, paddr + size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > { > if (IS_ENABLED(CONFIG_CPU_V6) || > @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > return false; > } > > -/* > - * Make an area consistent for devices. > - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> -{
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - dma_cache_maint(paddr, size, dmac_clean_range);
> - outer_clean_range(paddr, paddr + size);
> - break;
> - case DMA_FROM_DEVICE:
> - dma_cache_maint(paddr, size, dmac_inv_range);
> - outer_inv_range(paddr, paddr + size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> - dma_cache_maint(paddr, size, dmac_clean_range);
> - outer_clean_range(paddr, paddr + size);
> - } else {
> - dma_cache_maint(paddr, size, dmac_flush_range);
> - outer_flush_range(paddr, paddr + size);
> - }
> - break;
> - default:
> - break;
> - }
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> -{
> - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> - outer_inv_range(paddr, paddr + size);
> - dma_cache_maint(paddr, size, dmac_inv_range);
> - }
> -}
> +#include <linux/dma-sync.h>
>
> #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
> #include <asm/cacheflush.h>
> #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - unsigned long start = (unsigned long)phys_to_virt(paddr);
> + dcache_clean_poc(paddr, paddr + size);
> +}
>
> - dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> + dcache_inval_poc(paddr, paddr + size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> {
> - unsigned long start = (unsigned long)phys_to_virt(paddr);
> + dcache_clean_inval_poc(paddr, paddr + size);
> +}
>
> - if (dir == DMA_TO_DEVICE)
> - return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
>
> - dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> }
>
> +#include <linux/dma-sync.h>
> +
> void arch_dma_prep_coherent(struct page *page, size_t size)
> {
> unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
> cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - cache_op(paddr, size, dma_wb_range);
> - break;
> - default:
> - BUG();
> - }
> + cache_op(paddr, size, dma_wb_range);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - return;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - cache_op(paddr, size, dma_inv_range);
> - break;
> - default:
> - BUG();
> - }
> + cache_op(paddr, size, dma_inv_range);
> }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + cache_op(paddr, size, dma_wbinv_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
> #include <linux/memblock.h>
> #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - void *addr = phys_to_virt(paddr);
> -
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - hexagon_clean_dcache_range((unsigned long) addr,
> - (unsigned long) addr + size);
> - break;
> - case DMA_FROM_DEVICE:
> - hexagon_inv_dcache_range((unsigned long) addr,
> - (unsigned long) addr + size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - flush_dcache_range((unsigned long) addr,
> - (unsigned long) addr + size);
> - break;
> - default:
> - BUG();
> - }
> + hexagon_clean_dcache_range(paddr, paddr + size);
> }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
> +{
> + hexagon_inv_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
> +{
> + hexagon_flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> /*
> * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
> * DMA coherent space. Use that for the pool.
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
> index 2e192a5df949..aa9b434e6df8 100644
> --- a/arch/m68k/kernel/dma.c
> +++ b/arch/m68k/kernel/dma.c
> @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>
> #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
>
> -void arch_sync_dma_for_device(phys_addr_t handle, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_BIDIRECTIONAL:
> - case DMA_TO_DEVICE:
> - cache_push(handle, size);
> - break;
> - case DMA_FROM_DEVICE:
> - cache_clear(handle, size);
> - break;
> - default:
> - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir %u\n",
> - dir);
> - break;
> - }
> + /*
> + * cache_push() always invalidates in addition to cleaning
> + * write-back caches.
> + */
> + cache_push(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> + cache_clear(paddr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + cache_push(paddr, size);
> }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index b4c4e45fd45e..01110d4aa5b0 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -14,32 +14,30 @@
> #include <linux/bug.h>
> #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (direction) {
> - case DMA_TO_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - flush_dcache_range(paddr, paddr + size);
> - break;
> - case DMA_FROM_DEVICE:
> - invalidate_dcache_range(paddr, paddr + size);
> - break;
> - default:
> - BUG();
> - }
> + /* writeback plus invalidate, could be a nop on WT caches */
> + flush_dcache_range(paddr, paddr + size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (direction) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_BIDIRECTIONAL:
> - case DMA_FROM_DEVICE:
> - invalidate_dcache_range(paddr, paddr + size);
> - break;
> - default:
> - BUG();
> - }}
> + invalidate_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
> index b9d68bcc5d53..902d4b7c1f85 100644
> --- a/arch/mips/mm/dma-noncoherent.c
> +++ b/arch/mips/mm/dma-noncoherent.c
> @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
> } while (left);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - dma_sync_phys(paddr, size, _dma_cache_wback);
> - break;
> - case DMA_FROM_DEVICE:
> - dma_sync_phys(paddr, size, _dma_cache_inv);
> - break;
> - case DMA_BIDIRECTIONAL:
> - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> - cpu_needs_post_dma_flush())
> - dma_sync_phys(paddr, size, _dma_cache_wback);
> - else
> - dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> - break;
> - default:
> - break;
> - }
> + dma_sync_phys(paddr, size, _dma_cache_wback);
> }
>
> -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - if (cpu_needs_post_dma_flush())
> - dma_sync_phys(paddr, size, _dma_cache_inv);
> - break;
> - default:
> - break;
> - }
> + dma_sync_phys(paddr, size, _dma_cache_inv);
> }
> -#endif
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + dma_sync_phys(paddr, size, _dma_cache_wback_inv);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> + cpu_needs_post_dma_flush();
> +}
> +
> +#include <linux/dma-sync.h>
>
> #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> - const struct iommu_ops *iommu, bool coherent)
> + const struct iommu_ops *iommu, bool coherent)
> {
> - dev->dma_coherent = coherent;
> + dev->dma_coherent = coherent;
> }
> #endif
> diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c
> index fd887d5f3f9a..29978970955e 100644
> --- a/arch/nios2/mm/dma-mapping.c
> +++ b/arch/nios2/mm/dma-mapping.c
> @@ -13,53 +13,46 @@
> #include <linux/types.h>
> #include <linux/mm.h>
> #include <linux/string.h>
> +#include <linux/dma-map-ops.h>
> #include <linux/dma-mapping.h>
> #include <linux/io.h>
> #include <linux/cache.h>
> #include <asm/cacheflush.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> + /*
> + * We just need to write back the caches here, but Nios2 flush
> + * instruction will do both writeback and invalidate.
> + */
> void *vaddr = phys_to_virt(paddr);
> + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
>
> - switch (dir) {
> - case DMA_FROM_DEVICE:
> - invalidate_dcache_range((unsigned long)vaddr,
> - (unsigned long)(vaddr + size));
> - break;
> - case DMA_TO_DEVICE:
> - /*
> - * We just need to flush the caches here , but Nios2 flush
> - * instruction will do both writeback and invalidate.
> - */
> - case DMA_BIDIRECTIONAL: /* flush and invalidate */
> - flush_dcache_range((unsigned long)vaddr,
> - (unsigned long)(vaddr + size));
> - break;
> - default:
> - BUG();
> - }
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> + unsigned long vaddr = (unsigned long)phys_to_virt(paddr);
> + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size));
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> {
> void *vaddr = phys_to_virt(paddr);
> + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + size));
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
>
> - switch (dir) {
> - case DMA_BIDIRECTIONAL:
> - case DMA_FROM_DEVICE:
> - invalidate_dcache_range((unsigned long)vaddr,
> - (unsigned long)(vaddr + size));
> - break;
> - case DMA_TO_DEVICE:
> - break;
> - default:
> - BUG();
> - }
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> }
>
> +#include <linux/dma-sync.h>
> +
> void arch_dma_prep_coherent(struct page *page, size_t size)
> {
> unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index 91a00d09ffad..aba2258e62eb 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
> mmap_write_unlock(&init_mm);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> unsigned long cl;
> struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - /* Write back the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBWR, cl);
> - break;
> - case DMA_FROM_DEVICE:
> - /* Invalidate the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBIR, cl);
> - break;
> - case DMA_BIDIRECTIONAL:
> - /* Flush the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBFR, cl);
> - break;
> - default:
> - break;
> - }
> + /* Write back the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBWR, cl);
> }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> + unsigned long cl;
> + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> + /* Invalidate the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + unsigned long cl;
> + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> + /* Flush the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
> free_pages((unsigned long)__va(dma_handle), order);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - clean_kernel_dcache_range(virt, size);
> - break;
> - case DMA_FROM_DEVICE:
> - clean_kernel_dcache_range(virt, size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - flush_kernel_dcache_range(virt, size);
> - break;
> - }
> + clean_kernel_dcache_range(virt, size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - purge_kernel_dcache_range(virt, size);
> - break;
> - }
> + purge_kernel_dcache_range(virt, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> + flush_kernel_dcache_range(virt, size);
> }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op)
> #endif
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> __dma_phys_op(start, end, DMA_CACHE_CLEAN);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (direction) {
> - case DMA_NONE:
> - BUG();
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - __dma_phys_op(start, end, DMA_CACHE_INVAL);
> - break;
> - }
> + __dma_phys_op(start, end, DMA_CACHE_INVAL);
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + __dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> void arch_dma_prep_coherent(struct page *page, size_t size)
> {
> unsigned long kaddr = (unsigned long)page_address(page);
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
> static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> void *vaddr = phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> - break;
> - case DMA_FROM_DEVICE:
> - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> - break;
> - default:
> - break;
> - }
> + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> void *vaddr = phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> - break;
> - default:
> - break;
> - }
> + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + void *vaddr = phys_to_virt(paddr);
> +
> + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
> void arch_dma_prep_coherent(struct page *page, size_t size)
> {
> void *flush_addr = page_address(page);
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
> __flush_purge_region(page_address(page), size);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> - switch (dir) {
> - case DMA_FROM_DEVICE: /* invalidate only */
> - __flush_invalidate_region(addr, size);
> - break;
> - case DMA_TO_DEVICE: /* writeback only */
> - __flush_wback_region(addr, size);
> - break;
> - case DMA_BIDIRECTIONAL: /* writeback and invalidate */
> - __flush_purge_region(addr, size);
> - break;
> - default:
> - BUG();
> - }
> + __flush_wback_region(addr, size);
> }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> + __flush_invalidate_region(addr, size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> + __flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
> #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - if (dir != DMA_TO_DEVICE &&
> - sparc_cpu_model == sparc_leon &&
> + /* IIep is write-through, not flushing on cpu to device transfer. */
> +}
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> + /*
> + * On LEON systems without cache snooping, the entire D-CACHE must be
> + * flushed to make DMA to cacheable memory coherent.
> + */
> + if (sparc_cpu_model == sparc_leon &&
> !sparc_leon3_snooping_enabled())
> leon_flush_dcache_all();
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + arch_dma_cache_inv(paddr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> #ifdef CONFIG_PROC_FS
>
> static int sparc_io_proc_show(struct seq_file *m, void *v)
> diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
> index ff3bf015eca4..d4ff96585545 100644
> --- a/arch/xtensa/kernel/pci-dma.c
> +++ b/arch/xtensa/kernel/pci-dma.c
> @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
> }
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - do_cache_op(paddr, size, __flush_dcache_range);
> - break;
> - case DMA_FROM_DEVICE:
> - do_cache_op(paddr, size, __invalidate_dcache_range);
> - break;
> - case DMA_BIDIRECTIONAL:
> - do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> - break;
> - default:
> - break;
> - }
> + do_cache_op(paddr, size, __flush_dcache_range);
> }
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> + do_cache_op(paddr, size, __invalidate_dcache_range);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> + do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
> void arch_dma_prep_coherent(struct page *page, size_t size)
> {
> __invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
> new file mode 100644
> index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument, inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          |   map          ==  for_device     |   unmap     ==  for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   |   writeback        writeback      |   none          none
> + * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> + * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> + *
> + *     [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + /*
> + * This may be an empty function on write-through caches,
> + * and it might invalidate the cache if an architecture has
> + * a write-back cache but no way to write it back without
> + * invalidating
> + */
> + arch_dma_cache_wback(paddr, size);
> + break;
> +
> + case DMA_FROM_DEVICE:
> + /*
> + * FIXME: this should be handled the same across all
> + * architectures, see
> + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> + */
> + if (!arch_sync_dma_clean_before_fromdevice()) {
> + arch_dma_cache_inv(paddr, size);
> + break;
> + }
> + fallthrough;
> +
> + case DMA_BIDIRECTIONAL:
> + /* Skip the invalidate here if it's done later */
> + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> + arch_sync_dma_cpu_needs_post_dma_flush())
> + arch_dma_cache_wback(paddr, size);
> + else
> + arch_dma_cache_wback_inv(paddr, size);
> + break;
> +
> + default:
> + break;
> + }
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{
> +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> + unsigned long pfn = PFN_UP(paddr);
> + unsigned long off = paddr & (PAGE_SIZE - 1);
> + size_t left = size;
> +
> + if (off)
> + left -= PAGE_SIZE - off;
> +
> + while (left >= PAGE_SIZE) {
> + struct page *page = pfn_to_page(pfn++);
> + set_bit(PG_dcache_clean, &page->flags);
> + left -= PAGE_SIZE;
> + }
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + break;
> +
> + case DMA_FROM_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> + if (arch_sync_dma_cpu_needs_post_dma_flush())
> + arch_dma_cache_inv(paddr, size);
> +
> + if (size > PAGE_SIZE)
> + arch_dma_mark_dcache_clean(paddr, size);
> + break;
> +
> + default:
> + break;
> + }
> +}
> +#endif
> --
> 2.39.2
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 456+ messages in thread
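The direction handling in the dma-sync.h quoted above can be exercised outside the kernel. What follows is a minimal user-space sketch, not kernel code. The arch_model struct, the model_* functions and the string log are hypothetical stand-ins for arch_sync_dma_clean_before_fromdevice(), arch_sync_dma_cpu_needs_post_dma_flush() and CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU, and the cache operations only record their names. The page-marking step is left out. The sketch shows that when the clean-before-fromdevice flag is set, a DMA_FROM_DEVICE mapping performs a writeback instead of the "invalidate" listed in the header's table, which is the case the FIXME refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same numeric values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical per-architecture knobs used by this model only. */
struct arch_model {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;         /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
};

/* Comma-separated record of the cache operations the model "performed". */
static char op_log[64];

static void log_op(const char *op)
{
	if (op_log[0] != '\0')
		strncat(op_log, ",", sizeof(op_log) - strlen(op_log) - 1);
	strncat(op_log, op, sizeof(op_log) - strlen(op_log) - 1);
}

/* Same decisions as the quoted arch_sync_dma_for_device(). */
static void model_sync_for_device(const struct arch_model *a,
				  enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		log_op("wback");
		break;
	case DMA_FROM_DEVICE:
		if (!a->clean_before_fromdevice) {
			log_op("inv");
			break;
		}
		/* fall through */
	case DMA_BIDIRECTIONAL:
		/* skip the invalidate here if the for_cpu side does it */
		if (a->has_sync_for_cpu && a->cpu_needs_post_dma_flush)
			log_op("wback");
		else
			log_op("wback_inv");
		break;
	default:
		break;
	}
}

/* Same decisions as the quoted arch_sync_dma_for_cpu(), which is only
 * built when CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU is set. */
static void model_sync_for_cpu(const struct arch_model *a,
			       enum dma_data_direction dir)
{
	if (!a->has_sync_for_cpu)
		return;
	if ((dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) &&
	    a->cpu_needs_post_dma_flush)
		log_op("inv");
}

/* One map (for_device) followed by one unmap (for_cpu). */
static const char *model_round_trip(const struct arch_model *a,
				    enum dma_data_direction dir)
{
	op_log[0] = '\0';
	model_sync_for_device(a, dir);
	model_sync_for_cpu(a, dir);
	return op_log;
}
```

With all three flags set, which matches the values the quoted arm64 hunk returns, model_round_trip() gives "wback,inv" for DMA_FROM_DEVICE. With both helpers returning false, as in the quoted hexagon, m68k, openrisc, sh and xtensa hunks, it gives just "inv". The has_sync_for_cpu value in that second case is an assumption of the model.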
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-04-13 12:13 ` Biju Das
0 siblings, 0 replies; 456+ messages in thread
From: Biju Das @ 2023-04-13 12:13 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Prabhakar Mahadev Lad, Conor Dooley,
linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
linux-oxnas@groups.io, linux-csky@vger.kernel.org,
linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org,
linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org,
sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org

Hi all,

FYI, this patch breaks the RZ/G2L SMARC EVK board, and Arnd will send a V2 to fix this issue.
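One plausible reading of the oops that follows is sketched here. The faulting address equals x0 (0x000000004afb0080). x1 is x0 plus the 0x2000 held in x23. x26 holds 0xffff00000afb0080, which looks like the same buffer seen through the kernel linear map. That would fit the v1 arm64 hunk quoted earlier in the thread, which passes paddr directly to dcache_clean_poc(), while the code it replaces first converted paddr with phys_to_virt(). Because arm64's arch_sync_dma_clean_before_fromdevice() returns true, an RX buffer that is presumably mapped DMA_FROM_DEVICE would also reach arch_dma_cache_wback(), and that matches pc being in dcache_clean_poc. The user-space sketch below models only the address arithmetic. MODEL_PAGE_OFFSET and MODEL_PHYS_OFFSET are assumptions inferred from the register dump, not values read from the board, and model_phys_to_virt() is a simplified, hypothetical stand-in for the kernel's translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: 48-bit VA linear map base and DRAM start, inferred from x26/x19. */
#define MODEL_PAGE_OFFSET 0xffff000000000000ULL
#define MODEL_PHYS_OFFSET 0x40000000ULL

/* Simplified linear-map translation: VA = PAGE_OFFSET + (PA - PHYS_OFFSET). */
static uint64_t model_phys_to_virt(uint64_t paddr)
{
	return MODEL_PAGE_OFFSET + (paddr - MODEL_PHYS_OFFSET);
}

/* In this model, kernel linear-map addresses have the top 16 bits set. A raw
 * physical address does not, so it falls in the user half of the address
 * space ("user address but active_mm is swapper"). */
static bool model_is_kernel_va(uint64_t addr)
{
	return (addr >> 48) == 0xffff;
}

/* [start, end) range that would be handed to dcache_clean_poc(). */
struct model_range {
	uint64_t start;
	uint64_t end;
};

/* As in the quoted v1 arm64 hunk: the physical address is used as is. */
static struct model_range model_wback_v1(uint64_t paddr, uint64_t size)
{
	struct model_range r = { paddr, paddr + size };
	return r;
}

/* As in the code the hunk removes: translate to a virtual address first. */
static struct model_range model_wback_translated(uint64_t paddr, uint64_t size)
{
	uint64_t start = model_phys_to_virt(paddr);
	struct model_range r = { start, start + size };
	return r;
}
```

With paddr = 0x4afb0080 and size = 0x2000, model_wback_v1() reproduces x0/x1 from the dump, and model_wback_translated() yields 0xffff00000afb0080, the value in x26.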
[10:53] <biju> [ 3.384408] Unable to handle kernel paging request at virtual address 000000004afb0080
[10:53] <biju> [ 3.392755] Mem abort info:
[10:53] <biju> [ 3.395883] ESR = 0x0000000096000144
[10:53] <biju> [ 3.399957] EC = 0x25: DABT (current EL), IL = 32 bits
[10:53] <biju> [ 3.405674] SET = 0, FnV = 0
[10:53] <biju> [ 3.408978] EA = 0, S1PTW = 0
[10:53] <biju> [ 3.412442] FSC = 0x04: level 0 translation fault
[10:53] <biju> [ 3.417825] Data abort info:
[10:53] <biju> [ 3.420959] ISV = 0, ISS = 0x00000144
[10:53] <biju> [ 3.425115] CM = 1, WnR = 1
[10:53] <biju> [ 3.428521] [000000004afb0080] user address but active_mm is swapper
[10:53] <biju> [ 3.435135] Internal error: Oops: 0000000096000144 [#1] PREEMPT SMP
[10:53] <biju> [ 3.441501] Modules linked in:
[10:53] <biju> [ 3.444644] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230412-g2936e9299572 #712
[10:53] <biju> [ 3.453537] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT)
[10:53] <biju> [ 3.460130] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[10:53] <biju> [ 3.467184] pc : dcache_clean_poc+0x20/0x38
[10:53] <biju> [ 3.471488] lr : arch_sync_dma_for_device+0x1c/0x2c
[10:53] <biju> [ 3.476463] sp : ffff80000a70b970
[10:53] <biju> [ 3.479834] x29: ffff80000a70b970 x28: 0000000000000000 x27: ffff00000aef7c10
[10:53] <biju> [ 3.487118] x26: ffff00000afb0080 x25: ffff00000b710000 x24: ffff00000b710a40
[10:53] <biju> [ 3.494397] x23: 0000000000002000 x22: 0000000000000000 x21: 0000000000000002
[10:53] <biju> [ 3.501670] x20: ffff00000aef7c10 x19: 000000004afb0080 x18: 0000000000000000
[10:53] <biju> [ 3.508943] x17: 0000000000000100 x16: fffffc0001efc008 x15: 0000000000000000
[10:53] <biju> [ 3.516216] x14: 0000000000000100 x13: 0000000000000068 x12: ffff00007fc0aa50
[10:54] <biju> [ 3.523488] x11: ffff00007fc0a9c0 x10: 0000000000000000 x9 : ffff00000aef7f08
[10:54] <biju> [ 3.530761] x8 : 0000000000000000 x7 : fffffc00002bec00 x6 : 0000000000000000
[10:54] <biju> [ 3.538028] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 000000000000003f
[10:54] <biju> [ 3.545297] x2 : 0000000000000040 x1 : 000000004afb2080 x0 : 000000004afb0080
[10:54] <biju> [ 3.552569] Call trace:
[10:54] <biju> [ 3.555074] dcache_clean_poc+0x20/0x38
[10:54] <biju> [ 3.559014] dma_map_page_attrs+0x1b4/0x248
[10:54] <biju> [ 3.563289] ravb_rx_ring_format_gbeth+0xd8/0x198
[10:54] <biju> [ 3.568095] ravb_ring_format+0x5c/0x108
[10:54] <biju> [ 3.572108] ravb_dmac_init_gbeth+0x30/0xe4
[10:54] <biju> [ 3.576382] ravb_dmac_init+0x80/0x104
[10:54] <biju> [ 3.580222] ravb_open+0x84/0x78c
[10:54] <biju> [ 3.583626] __dev_open+0xec/0x1d8
[10:54] <biju> [ 3.587138] __dev_change_flags+0x190/0x208
[10:54] <biju> [ 3.591406] dev_change_flags+0x24/0x6c
[10:54] <biju> [ 3.595324] ip_auto_config+0x248/0x10ac
[10:54] <biju> [ 3.599345] do_one_initcall+0x6c/0x1b0
[10:54] <biju> [ 3.603268] kernel_init_freeable+0x1c0/0x294

Cheers,
Biju

> -----Original Message-----
> From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On Behalf Of Arnd Bergmann
> Sent: Monday, March 27, 2023 1:13 PM
> To: linux-kernel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>; Vineet Gupta <vgupta@kernel.org>;
> Russell King <linux@armlinux.org.uk>; Neil Armstrong <neil.armstrong@linaro.org>;
> Linus Walleij <linus.walleij@linaro.org>; Catalin Marinas <catalin.marinas@arm.com>;
> Will Deacon <will@kernel.org>; Guo Ren <guoren@kernel.org>; Brian Cain <bcain@quicinc.com>;
> Geert Uytterhoeven <geert@linux-m68k.org>; Michal Simek <monstr@monstr.eu>;
> Thomas Bogendoerfer <tsbogend@alpha.franken.de>; Dinh Nguyen <dinguyen@kernel.org>;
> Stafford Horne <shorne@gmail.com>; Helge Deller <deller@gmx.de>;
> Michael Ellerman <mpe@ellerman.id.au>; Christophe Leroy <christophe.leroy@csgroup.eu>;
> Paul Walmsley <paul.walmsley@sifive.com>; Palmer Dabbelt <palmer@dabbelt.com>;
> Rich Felker <dalias@libc.org>; John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
> David S. Miller <davem@davemloft.net>; Max Filippov <jcmvbkbc@gmail.com>;
> Christoph Hellwig <hch@lst.de>; Robin Murphy <robin.murphy@arm.com>;
> Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@bp.renesas.com>;
> Conor Dooley <conor.dooley@microchip.com>; linux-snps-arc@lists.infradead.org;
> linux-arm-kernel@lists.infradead.org; linux-oxnas@groups.io;
> linux-csky@vger.kernel.org; linux-hexagon@vger.kernel.org;
> linux-m68k@lists.linux-m68k.org; linux-mips@vger.kernel.org;
> linux-openrisc@vger.kernel.org; linux-parisc@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org; linux-riscv@lists.infradead.org;
> linux-sh@vger.kernel.org; sparclinux@vger.kernel.org;
> linux-xtensa@linux-xtensa.org
> Subject: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> Now that all of these have consistent behavior, replace them with a single
> shared implementation of arch_sync_dma_for_device() and
> arch_sync_dma_for_cpu() and three parameters to pick how they should
> operate:
>
> - If the CPU has speculative prefetching, then the cache
>   has to be invalidated after a transfer from the device.
>   On the rarer CPUs without prefetching, this can be skipped,
>   with all cache management happening before the transfer.
>   This flag can be runtime detected, but is usually fixed
>   per architecture.
>
> - Some architectures currently clean the caches before DMA
>   from a device, while others invalidate it. There has not
>   been a conclusion regarding whether we should change all
>   architectures to use clean instead, so this adds an
>   architecture specific flag that we can change later on.
>
> - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
>   track of pages that are marked clean in the page cache, to
>   avoid flushing them again. The implementation for this is
The implementation for this is > generic enough to work on all architectures that use the > PG_dcache_clean page flag, but a Kconfig symbol is used > to only enable it on Arm to preserve the existing behavior. > > For the function naming, I picked 'wback' over 'clean', and 'wback_inv' > over 'flush', to avoid any ambiguity of what the helper functions are > supposed to do. > > Moving the global functions into a header file is usually a bad idea as it > prevents the header from being included more than once, but it helps keep > the behavior as close as possible to the previous state, including the > possibility of inlining most of it into these functions where that was done > before. This also helps keep the global namespace clean, by hiding the new > arch_dma_cache{_wback,_inv,_wback_inv} from device drivers that might use > them incorrectly. > > It would be possible to do this one architecture at a time, but as the > change is the same everywhere, the combined patch helps explain it better > once. 
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arc/mm/dma.c | 66 +++++-------------
>  arch/arm/Kconfig | 3 +
>  arch/arm/mm/dma-mapping-nommu.c | 39 ++++++-----
>  arch/arm/mm/dma-mapping.c | 64 +++++++-----------
>  arch/arm64/mm/dma-mapping.c | 28 +++++---
>  arch/csky/mm/dma-mapping.c | 44 ++++++------
>  arch/hexagon/kernel/dma.c | 44 ++++++------
>  arch/m68k/kernel/dma.c | 43 +++++++-----
>  arch/microblaze/kernel/dma.c | 48 +++++++-------
>  arch/mips/mm/dma-noncoherent.c | 60 +++++++----------
>  arch/nios2/mm/dma-mapping.c | 57 +++++++---------
>  arch/openrisc/kernel/dma.c | 63 +++++++++++-------
>  arch/parisc/kernel/pci-dma.c | 46 ++++++-------
>  arch/powerpc/mm/dma-noncoherent.c | 34 ++++++----
>  arch/riscv/mm/dma-noncoherent.c | 51 +++++++-------
>  arch/sh/kernel/dma-coherent.c | 43 +++++++-----
>  arch/sparc/kernel/ioport.c | 38 ++++++++---
>  arch/xtensa/kernel/pci-dma.c | 40 ++++++-----
>  include/linux/dma-sync.h | 107 ++++++++++++++++++++++++++++++
>  19 files changed, 527 insertions(+), 391 deletions(-)
>  create mode 100644 include/linux/dma-sync.h
>
> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  	dma_cache_wback_inv(page_to_phys(page), size);
>  }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device     |   unmap     ==  for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback      |   none          none
> - * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> - *
> - *     [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_wback(paddr, size);
> -		break;
> -
> -	case DMA_FROM_DEVICE:
> -		dma_cache_inv(paddr, size);
> -		break;
> -
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_wback(paddr, size);
> -		break;
> +	dma_cache_wback(paddr, size);
> +}
>
> -	default:
> -		break;
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> +	dma_cache_wback_inv(paddr, size);
> +}
>
> -	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_inv(paddr, size);
> -		break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	default:
> -		break;
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }
>
> +#include <linux/dma-sync.h>
> +
>  /*
>   * Plug in direct dma map ops.
> */ > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index > 125d58c54ab1..0de84e861027 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT > bool > default y > > +config ARCH_DMA_MARK_DCACHE_CLEAN > + def_bool y > + > config ARCH_HAS_ILOG2_U32 > bool > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping- > nommu.c index 12b5c6ae93fc..0817274aed15 100644 > --- a/arch/arm/mm/dma-mapping-nommu.c > +++ b/arch/arm/mm/dma-mapping-nommu.c > @@ -13,27 +13,36 @@ > > #include "dma.h" > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir == DMA_FROM_DEVICE) { > - dmac_inv_range(__va(paddr), __va(paddr + size)); > - outer_inv_range(paddr, paddr + size); > - } else { > - dmac_clean_range(__va(paddr), __va(paddr + size)); > - outer_clean_range(paddr, paddr + size); > - } > + dmac_clean_range(__va(paddr), __va(paddr + size)); > + outer_clean_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE) { > - outer_inv_range(paddr, paddr + size); > - dmac_inv_range(__va(paddr), __va(paddr)); > - } > + dmac_inv_range(__va(paddr), __va(paddr + size)); > + outer_inv_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + dmac_flush_range(__va(paddr), __va(paddr + size)); > + outer_flush_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > const struct iommu_ops 
*iommu, bool coherent) { diff -- > git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index > b703cb83d27e..aa6ee820a0ab 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t > size) > } > } > > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > +{ > + dma_cache_maint(paddr, size, dmac_clean_range); > + outer_clean_range(paddr, paddr + size); } > + > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + dma_cache_maint(paddr, size, dmac_inv_range); > + outer_inv_range(paddr, paddr + size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + dma_cache_maint(paddr, size, dmac_flush_range); > + outer_flush_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > static bool arch_sync_dma_cpu_needs_post_dma_flush(void) > { > if (IS_ENABLED(CONFIG_CPU_V6) || > @@ -699,45 +723,7 @@ static bool > arch_sync_dma_cpu_needs_post_dma_flush(void) > return false; > } > > -/* > - * Make an area consistent for devices. > - * Note: Drivers should NOT use this function directly. 
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*) > - */ > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > -{ > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_cache_maint(paddr, size, dmac_clean_range); > - outer_clean_range(paddr, paddr + size); > - break; > - case DMA_FROM_DEVICE: > - dma_cache_maint(paddr, size, dmac_inv_range); > - outer_inv_range(paddr, paddr + size); > - break; > - case DMA_BIDIRECTIONAL: > - if (arch_sync_dma_cpu_needs_post_dma_flush()) { > - dma_cache_maint(paddr, size, dmac_clean_range); > - outer_clean_range(paddr, paddr + size); > - } else { > - dma_cache_maint(paddr, size, dmac_flush_range); > - outer_flush_range(paddr, paddr + size); > - } > - break; > - default: > - break; > - } > -} > - > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > -{ > - if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) > { > - outer_inv_range(paddr, paddr + size); > - dma_cache_maint(paddr, size, dmac_inv_range); > - } > -} > +#include <linux/dma-sync.h> > > #ifdef CONFIG_ARM_DMA_USE_IOMMU > > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index > 5240f6acad64..bae741aa65e9 100644 > --- a/arch/arm64/mm/dma-mapping.c > +++ b/arch/arm64/mm/dma-mapping.c > @@ -13,25 +13,33 @@ > #include <asm/cacheflush.h> > #include <asm/xen/xen-ops.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - unsigned long start = (unsigned long)phys_to_virt(paddr); > + dcache_clean_poc(paddr, paddr + size); } > > - dcache_clean_poc(start, start + size); > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + dcache_inval_poc(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void 
arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) > { > - unsigned long start = (unsigned long)phys_to_virt(paddr); > + dcache_clean_inval_poc(paddr, paddr + size); } > > - if (dir == DMA_TO_DEVICE) > - return; > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > > - dcache_inval_poc(start, start + size); > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > unsigned long start = (unsigned long)page_address(page); diff --git > a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c index > c90f912e2822..9402e101b363 100644 > --- a/arch/csky/mm/dma-mapping.c > +++ b/arch/csky/mm/dma-mapping.c > @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t > size) > cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - cache_op(paddr, size, dma_wb_range); > - break; > - default: > - BUG(); > - } > + cache_op(paddr, size, dma_wb_range); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - return; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - cache_op(paddr, size, dma_inv_range); > - break; > - default: > - BUG(); > - } > + cache_op(paddr, size, dma_inv_range); > } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + cache_op(paddr, size, dma_wbinv_range); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > 
+static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index > 882680e81a30..e6538128a75b 100644 > --- a/arch/hexagon/kernel/dma.c > +++ b/arch/hexagon/kernel/dma.c > @@ -9,29 +9,33 @@ > #include <linux/memblock.h> > #include <asm/page.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - void *addr = phys_to_virt(paddr); > - > - switch (dir) { > - case DMA_TO_DEVICE: > - hexagon_clean_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - case DMA_FROM_DEVICE: > - hexagon_inv_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - case DMA_BIDIRECTIONAL: > - flush_dcache_range((unsigned long) addr, > - (unsigned long) addr + size); > - break; > - default: > - BUG(); > - } > + hexagon_clean_dcache_range(paddr, paddr + size); > } > > +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size) { > + hexagon_inv_dcache_range(paddr, paddr + size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t > +size) { > + hexagon_flush_dcache_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > /* > * Our max_low_pfn should have been backed off by 16MB in mm/init.c to > create > * DMA coherent space. Use that for the pool. 
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index > 2e192a5df949..aa9b434e6df8 100644 > --- a/arch/m68k/kernel/dma.c > +++ b/arch/m68k/kernel/dma.c > @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void > *vaddr, > > #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */ > > -void arch_sync_dma_for_device(phys_addr_t handle, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_BIDIRECTIONAL: > - case DMA_TO_DEVICE: > - cache_push(handle, size); > - break; > - case DMA_FROM_DEVICE: > - cache_clear(handle, size); > - break; > - default: > - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir > %u\n", > - dir); > - break; > - } > + /* > + * cache_push() always invalidates in addition to cleaning > + * write-back caches. > + */ > + cache_push(paddr, size); > +} > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + cache_clear(paddr, size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + cache_push(paddr, size); > } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c > index b4c4e45fd45e..01110d4aa5b0 100644 > --- a/arch/microblaze/kernel/dma.c > +++ b/arch/microblaze/kernel/dma.c > @@ -14,32 +14,30 @@ > #include <linux/bug.h> > #include <asm/cacheflush.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_TO_DEVICE: > - case DMA_BIDIRECTIONAL: > - flush_dcache_range(paddr, paddr + size); > - break; > - case DMA_FROM_DEVICE: > - 
invalidate_dcache_range(paddr, paddr + size); > - break; > - default: > - BUG(); > - } > + /* writeback plus invalidate, could be a nop on WT caches */ > + flush_dcache_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_TO_DEVICE: > - break; > - case DMA_BIDIRECTIONAL: > - case DMA_FROM_DEVICE: > - invalidate_dcache_range(paddr, paddr + size); > - break; > - default: > - BUG(); > - }} > + invalidate_dcache_range(paddr, paddr + size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + flush_dcache_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c > index b9d68bcc5d53..902d4b7c1f85 100644 > --- a/arch/mips/mm/dma-noncoherent.c > +++ b/arch/mips/mm/dma-noncoherent.c > @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, > size_t size, > } while (left); > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_sync_phys(paddr, size, _dma_cache_wback); > - break; > - case DMA_FROM_DEVICE: > - dma_sync_phys(paddr, size, _dma_cache_inv); > - break; > - case DMA_BIDIRECTIONAL: > - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > - cpu_needs_post_dma_flush()) > - dma_sync_phys(paddr, size, _dma_cache_wback); > - else > - dma_sync_phys(paddr, size, _dma_cache_wback_inv); > - break; > - default: > - break; > - } > + dma_sync_phys(paddr, size, _dma_cache_wback); > 
} > > -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU -void > arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - if (cpu_needs_post_dma_flush()) > - dma_sync_phys(paddr, size, _dma_cache_inv); > - break; > - default: > - break; > - } > + dma_sync_phys(paddr, size, _dma_cache_inv); > } > -#endif > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + dma_sync_phys(paddr, size, _dma_cache_wback_inv); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > + cpu_needs_post_dma_flush(); } > + > +#include <linux/dma-sync.h> > > #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > - const struct iommu_ops *iommu, bool coherent) > + const struct iommu_ops *iommu, bool coherent) > { > - dev->dma_coherent = coherent; > + dev->dma_coherent = coherent; > } > #endif > diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index > fd887d5f3f9a..29978970955e 100644 > --- a/arch/nios2/mm/dma-mapping.c > +++ b/arch/nios2/mm/dma-mapping.c > @@ -13,53 +13,46 @@ > #include <linux/types.h> > #include <linux/mm.h> > #include <linux/string.h> > +#include <linux/dma-map-ops.h> > #include <linux/dma-mapping.h> > #include <linux/io.h> > #include <linux/cache.h> > #include <asm/cacheflush.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > + /* > + * We just need to write back the caches here, but Nios2 flush > + * instruction will do both writeback and 
invalidate. > + */ > void *vaddr = phys_to_virt(paddr); > + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + > +size)); } > > - switch (dir) { > - case DMA_FROM_DEVICE: > - invalidate_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - case DMA_TO_DEVICE: > - /* > - * We just need to flush the caches here , but Nios2 flush > - * instruction will do both writeback and invalidate. > - */ > - case DMA_BIDIRECTIONAL: /* flush and invalidate */ > - flush_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - default: > - BUG(); > - } > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + unsigned long vaddr = (unsigned long)phys_to_virt(paddr); > + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size)); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) > { > void *vaddr = phys_to_virt(paddr); > + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + > +size)); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > - switch (dir) { > - case DMA_BIDIRECTIONAL: > - case DMA_FROM_DEVICE: > - invalidate_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - case DMA_TO_DEVICE: > - break; > - default: > - BUG(); > - } > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > unsigned long start = (unsigned long)page_address(page); diff --git > a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index > 91a00d09ffad..aba2258e62eb 100644 > --- a/arch/openrisc/kernel/dma.c > +++ b/arch/openrisc/kernel/dma.c > @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t > size) > 
mmap_write_unlock(&init_mm); > } > > -void arch_sync_dma_for_device(phys_addr_t addr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > unsigned long cl; > struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > > - switch (dir) { > - case DMA_TO_DEVICE: > - /* Write back the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBWR, cl); > - break; > - case DMA_FROM_DEVICE: > - /* Invalidate the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBIR, cl); > - break; > - case DMA_BIDIRECTIONAL: > - /* Flush the dcache for the requested range */ > - for (cl = addr; cl < addr + size; > - cl += cpuinfo->dcache_block_size) > - mtspr(SPR_DCBFR, cl); > - break; > - default: > - break; > - } > + /* Write back the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBWR, cl); > } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + unsigned long cl; > + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > + > + /* Invalidate the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBIR, cl); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + unsigned long cl; > + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; > + > + /* Flush the dcache for the requested range */ > + for (cl = paddr; cl < paddr + size; > + cl += cpuinfo->dcache_block_size) > + mtspr(SPR_DCBFR, cl); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include 
<linux/dma-sync.h> > diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c > index 6d3d3cffb316..a7955aab8ce2 100644 > --- a/arch/parisc/kernel/pci-dma.c > +++ b/arch/parisc/kernel/pci-dma.c > @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size, > void *vaddr, > free_pages((unsigned long)__va(dma_handle), order); } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > unsigned long virt = (unsigned long)phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - clean_kernel_dcache_range(virt, size); > - break; > - case DMA_FROM_DEVICE: > - clean_kernel_dcache_range(virt, size); > - break; > - case DMA_BIDIRECTIONAL: > - flush_kernel_dcache_range(virt, size); > - break; > - } > + clean_kernel_dcache_range(virt, size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > unsigned long virt = (unsigned long)phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - purge_kernel_dcache_range(virt, size); > - break; > - } > + purge_kernel_dcache_range(virt, size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + unsigned long virt = (unsigned long)phys_to_virt(paddr); > + > + flush_kernel_dcache_range(virt, size); > } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma- > noncoherent.c > index 00e59a4faa2b..268510c71156 100644 > --- a/arch/powerpc/mm/dma-noncoherent.c > +++ 
b/arch/powerpc/mm/dma-noncoherent.c > @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t > size, enum dma_cache_op op) #endif } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > __dma_phys_op(start, end, DMA_CACHE_CLEAN); } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_NONE: > - BUG(); > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - __dma_phys_op(start, end, DMA_CACHE_INVAL); > - break; > - } > + __dma_phys_op(start, end, DMA_CACHE_INVAL); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + __dma_phys_op(start, end, DMA_CACHE_FLUSH); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > unsigned long kaddr = (unsigned long)page_address(page); diff --git > a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index > 69c80b2155a1..b9a9f57e02be 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -12,43 +12,40 @@ > > static bool noncoherent_supported; > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > void *vaddr = phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > - case DMA_FROM_DEVICE: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > 
- case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > - break; > - default: > - break; > - } > + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > void *vaddr = phys_to_virt(paddr); > > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > - break; > - default: > - break; > - } > + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + void *vaddr = phys_to_virt(paddr); > + > + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > void *flush_addr = page_address(page); diff --git > a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index > 6a44c0e7ba40..41f031ae7609 100644 > --- a/arch/sh/kernel/dma-coherent.c > +++ b/arch/sh/kernel/dma-coherent.c > @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t > size) > __flush_purge_region(page_address(page), size); } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > > - switch (dir) { > - case DMA_FROM_DEVICE: /* invalidate only */ > - __flush_invalidate_region(addr, size); > - break; > - case DMA_TO_DEVICE: /* writeback only */ > - __flush_wback_region(addr, size); > - break; > - case 
DMA_BIDIRECTIONAL: /* writeback and invalidate */ > - __flush_purge_region(addr, size); > - break; > - default: > - BUG(); > - } > + __flush_wback_region(addr, size); > } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > + > + __flush_invalidate_region(addr, size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr)); > + > + __flush_purge_region(addr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index > 4f3d26066ec2..6926ead2f208 100644 > --- a/arch/sparc/kernel/ioport.c > +++ b/arch/sparc/kernel/ioport.c > @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport); > > #endif /* CONFIG_SBUS */ > > -/* > - * IIep is write-through, not flushing on cpu to device transfer. > - * > - * On LEON systems without cache snooping, the entire D-CACHE must be > flushed to > - * make DMA to cacheable memory coherent. > - */ > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - if (dir != DMA_TO_DEVICE && > - sparc_cpu_model == sparc_leon && > + /* IIep is write-through, not flushing on cpu to device transfer. */ } > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + /* > + * On LEON systems without cache snooping, the entire D-CACHE must be > + * flushed to make DMA to cacheable memory coherent. 
> + */ > + if (sparc_cpu_model == sparc_leon && > !sparc_leon3_snooping_enabled()) > leon_flush_dcache_all(); > } > > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + arch_dma_cache_inv(paddr, size); > +} > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return true; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > #ifdef CONFIG_PROC_FS > > static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git > a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index > ff3bf015eca4..d4ff96585545 100644 > --- a/arch/xtensa/kernel/pci-dma.c > +++ b/arch/xtensa/kernel/pci-dma.c > @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size, > } > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - do_cache_op(paddr, size, __flush_dcache_range); > - break; > - case DMA_FROM_DEVICE: > - do_cache_op(paddr, size, __invalidate_dcache_range); > - break; > - case DMA_BIDIRECTIONAL: > - do_cache_op(paddr, size, __flush_invalidate_dcache_range); > - break; > - default: > - break; > - } > + do_cache_op(paddr, size, __flush_dcache_range); > } > > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + do_cache_op(paddr, size, __invalidate_dcache_range); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + do_cache_op(paddr, size, __flush_invalidate_dcache_range); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > + > + > void arch_dma_prep_coherent(struct page *page, size_t size) 
> {
> 	__invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h
> new file mode 100644
> index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument, inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          |   map          ==  for_device     |   unmap     ==  for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   |   writeback        writeback      |   none          none
> + * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> + * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> + *
> + *     [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		/*
> +		 * This may be an empty function on write-through caches,
> +		 * and it might invalidate the cache if an architecture has
> +		 * a write-back cache but no way to write it back without
> +		 * invalidating
> +		 */
> +		arch_dma_cache_wback(paddr, size);
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +		/*
> +		 * FIXME: this should be handled the same across all
> +		 * architectures, see
> +		 * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> +		 */
> +		if (!arch_sync_dma_clean_before_fromdevice()) {
> +			arch_dma_cache_inv(paddr, size);
> +			break;
> +		}
> +		fallthrough;
> +
> +	case DMA_BIDIRECTIONAL:
> +		/* Skip the invalidate here if it's done later */
> +		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> +		    arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_wback(paddr, size);
> +		else
> +			arch_dma_cache_wback_inv(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{
> +#ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> +	unsigned long pfn = PFN_UP(paddr);
> +	unsigned long off = paddr & (PAGE_SIZE - 1);
> +	size_t left = size;
> +
> +	if (off)
> +		left -= PAGE_SIZE - off;
> +
> +	while (left >= PAGE_SIZE) {
> +		struct page *page = pfn_to_page(pfn++);
> +		set_bit(PG_dcache_clean, &page->flags);
> +		left -= PAGE_SIZE;
> +	}
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> +		enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		break;
> +
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> +		if (arch_sync_dma_cpu_needs_post_dma_flush())
> +			arch_dma_cache_inv(paddr, size);
> +
> +		if (size > PAGE_SIZE)
> +			arch_dma_mark_dcache_clean(paddr, size);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +}
> +#endif
> --
> 2.39.2
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
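To compare the per-architecture flags at a glance, the selection logic of the shared arch_sync_dma_for_device() and arch_sync_dma_for_cpu() quoted above can be modelled in user space. This is a minimal sketch under stated assumptions, not kernel code: enum model_dir, enum model_op, struct arch_flags and the model_* functions are hypothetical names; the three booleans stand in for arch_sync_dma_clean_before_fromdevice(), arch_sync_dma_cpu_needs_post_dma_flush() and CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU; and the PG_dcache_clean marking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
enum model_dir { MODEL_TO_DEVICE, MODEL_FROM_DEVICE, MODEL_BIDIRECTIONAL };
enum model_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

struct arch_flags {
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
	bool has_sync_for_cpu;         /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
};

/* Operation picked by arch_sync_dma_for_device() before the transfer. */
static enum model_op model_for_device(const struct arch_flags *a,
				      enum model_dir dir)
{
	/* dma-sync.h leaves direction checking to the upper layers */
	assert(dir <= MODEL_BIDIRECTIONAL);

	switch (dir) {
	case MODEL_TO_DEVICE:
		return OP_WBACK;
	case MODEL_FROM_DEVICE:
		if (!a->clean_before_fromdevice)
			return OP_INV;
		/* fall through */
	case MODEL_BIDIRECTIONAL:
		/* skip the invalidate here if arch_sync_dma_for_cpu() does it */
		if (a->has_sync_for_cpu && a->cpu_needs_post_dma_flush)
			return OP_WBACK;
		return OP_WBACK_INV;
	}
	return OP_NONE;
}

/*
 * Operation picked by arch_sync_dma_for_cpu() after the transfer; without
 * CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU that function is not built at all.
 */
static enum model_op model_for_cpu(const struct arch_flags *a,
				   enum model_dir dir)
{
	if (!a->has_sync_for_cpu || dir == MODEL_TO_DEVICE)
		return OP_NONE;
	return a->cpu_needs_post_dma_flush ? OP_INV : OP_NONE;
}
```

For example, with both per-architecture flags true (the values in the arm64 hunk of this patch) and CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU assumed enabled, a DMA_FROM_DEVICE buffer is only written back before the transfer and is invalidated afterwards by arch_sync_dma_for_cpu(); with both flags false (as in the xtensa hunk) it is invalidated once, before the transfer, and nothing is done afterwards.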
* RE: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
@ 2023-04-13 12:13 ` Biju Das
0 siblings, 0 replies; 456+ messages in thread
From: Biju Das @ 2023-04-13 12:13 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John

Hi all,

FYI, this patch breaks the RZ/G2L SMARC EVK board (see the oops below), and Arnd will send a v2 to fix this issue.

[10:53] <biju> [ 3.384408] Unable to handle kernel paging request at virtual address 000000004afb0080
[10:53] <biju> [ 3.392755] Mem abort info:
[10:53] <biju> [ 3.395883] ESR = 0x0000000096000144
[10:53] <biju> [ 3.399957] EC = 0x25: DABT (current EL), IL = 32 bits
[10:53] <biju> [ 3.405674] SET = 0, FnV = 0
[10:53] <biju> [ 3.408978] EA = 0, S1PTW = 0
[10:53] <biju> [ 3.412442] FSC = 0x04: level 0 translation fault
[10:53] <biju> [ 3.417825] Data abort info:
[10:53] <biju> [ 3.420959] ISV = 0, ISS = 0x00000144
[10:53] <biju> [ 3.425115] CM = 1, WnR = 1
[10:53] <biju> [ 3.428521] [000000004afb0080] user address but active_mm is swapper
[10:53] <biju> [ 3.435135] Internal error: Oops: 0000000096000144 [#1] PREEMPT SMP
[10:53] <biju> [ 3.441501] Modules linked in:
[10:53] <biju> [ 3.444644] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230412-g2936e9299572 #712
[10:53] <biju> [ 3.453537] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT)
[10:53] <biju> [ 3.460130] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[10:53] <biju> [ 3.467184] pc : dcache_clean_poc+0x20/0x38
[10:53] <biju> [ 3.471488] lr : arch_sync_dma_for_device+0x1c/0x2c
[10:53] <biju> [ 3.476463] sp : ffff80000a70b970
[10:53] <biju> [ 3.479834] x29: ffff80000a70b970 x28: 0000000000000000 x27: ffff00000aef7c10
[10:53] <biju> [ 3.487118] x26: ffff00000afb0080 x25: ffff00000b710000 x24: ffff00000b710a40
[10:53] <biju> [ 3.494397] x23: 0000000000002000 x22: 0000000000000000 x21: 0000000000000002
[10:53] <biju> [ 3.501670] x20: ffff00000aef7c10 x19: 000000004afb0080 x18: 0000000000000000
[10:53] <biju> [ 3.508943] x17: 0000000000000100 x16: fffffc0001efc008 x15: 0000000000000000
[10:53] <biju> [ 3.516216] x14: 0000000000000100 x13: 0000000000000068 x12: ffff00007fc0aa50
[10:54] <biju> [ 3.523488] x11: ffff00007fc0a9c0 x10: 0000000000000000 x9 : ffff00000aef7f08
[10:54] <biju> [ 3.530761] x8 : 0000000000000000 x7 : fffffc00002bec00 x6 : 0000000000000000
[10:54] <biju> [ 3.538028] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 000000000000003f
[10:54] <biju> [ 3.545297] x2 : 0000000000000040 x1 : 000000004afb2080 x0 : 000000004afb0080
[10:54] <biju> [ 3.552569] Call trace:
[10:54] <biju> [ 3.555074] dcache_clean_poc+0x20/0x38
[10:54] <biju> [ 3.559014] dma_map_page_attrs+0x1b4/0x248
[10:54] <biju> [ 3.563289] ravb_rx_ring_format_gbeth+0xd8/0x198
[10:54] <biju> [ 3.568095] ravb_ring_format+0x5c/0x108
[10:54] <biju> [ 3.572108] ravb_dmac_init_gbeth+0x30/0xe4
[10:54] <biju> [ 3.576382] ravb_dmac_init+0x80/0x104
[10:54] <biju> [ 3.580222] ravb_open+0x84/0x78c
[10:54] <biju> [ 3.583626] __dev_open+0xec/0x1d8
[10:54] <biju> [ 3.587138] __dev_change_flags+0x190/0x208
[10:54] <biju> [ 3.591406] dev_change_flags+0x24/0x6c
[10:54] <biju> [ 3.595324] ip_auto_config+0x248/0x10ac
[10:54] <biju> [ 3.599345] do_one_initcall+0x6c/0x1b0
[10:54] <biju> [ 3.603268] kernel_init_freeable+0x1c0/0x294

Cheers,
Biju

> -----Original Message-----
> From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On
> Behalf Of Arnd Bergmann
> Sent: Monday, March 27, 2023 1:13 PM
> To: linux-kernel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>; Vineet Gupta <vgupta@kernel.org>;
> Russell King <linux@armlinux.org.uk>; Neil Armstrong <neil.armstrong@linaro.org>;
> Linus Walleij <linus.walleij@linaro.org>; Catalin Marinas <catalin.marinas@arm.com>;
> Will Deacon <will@kernel.org>; Guo Ren <guoren@kernel.org>; Brian Cain <bcain@quicinc.com>;
> Geert Uytterhoeven <geert@linux-m68k.org>; Michal Simek <monstr@monstr.eu>;
> Thomas Bogendoerfer <tsbogend@alpha.franken.de>; Dinh Nguyen <dinguyen@kernel.org>;
> Stafford Horne <shorne@gmail.com>; Helge Deller <deller@gmx.de>;
> Michael Ellerman <mpe@ellerman.id.au>; Christophe Leroy <christophe.leroy@csgroup.eu>;
> Paul Walmsley <paul.walmsley@sifive.com>; Palmer Dabbelt <palmer@dabbelt.com>;
> Rich Felker <dalias@libc.org>; John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
> David S. Miller <davem@davemloft.net>; Max Filippov <jcmvbkbc@gmail.com>;
> Christoph Hellwig <hch@lst.de>; Robin Murphy <robin.murphy@arm.com>;
> Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@bp.renesas.com>;
> Conor Dooley <conor.dooley@microchip.com>; linux-snps-arc@lists.infradead.org;
> linux-arm-kernel@lists.infradead.org; linux-oxnas@groups.io;
> linux-csky@vger.kernel.org; linux-hexagon@vger.kernel.org;
> linux-m68k@lists.linux-m68k.org; linux-mips@vger.kernel.org;
> linux-openrisc@vger.kernel.org; linux-parisc@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org; linux-riscv@lists.infradead.org;
> linux-sh@vger.kernel.org; sparclinux@vger.kernel.org; linux-xtensa@linux-xtensa.org
> Subject: [PATCH 21/21] dma-mapping: replace custom code with generic
> implementation
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> Now that all of these have consistent behavior, replace them with a single
> shared implementation of arch_sync_dma_for_device() and
> arch_sync_dma_for_cpu() and three parameters to pick how they should
> operate:
>
> - If the CPU has speculative prefetching, then the cache
>   has to be invalidated after a transfer from the device.
>   On the rarer CPUs without prefetching, this can be skipped,
>   with all cache management happening before the transfer.
>   This flag can be runtime detected, but is usually fixed
>   per architecture.
>
> - Some architectures currently clean the caches before DMA
>   from a device, while others invalidate them. There has not
>   been a conclusion regarding whether we should change all
>   architectures to use clean instead, so this adds an
>   architecture specific flag that we can change later on.
>
> - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
>   track of pages that are marked clean in the page cache, to
>   avoid flushing them again. The implementation for this is
>   generic enough to work on all architectures that use the
>   PG_dcache_clean page flag, but a Kconfig symbol is used
>   to only enable it on Arm to preserve the existing behavior.
>
> For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
> over 'flush', to avoid any ambiguity of what the helper functions are
> supposed to do.
>
> Moving the global functions into a header file is usually a bad idea as it
> prevents the header from being included more than once, but it helps keep
> the behavior as close as possible to the previous state, including the
> possibility of inlining most of it into these functions where that was done
> before. This also helps keep the global namespace clean, by hiding the new
> arch_dma_cache{_wback,_inv,_wback_inv} from device drivers that might use
> them incorrectly.
>
> It would be possible to do this one architecture at a time, but as the
> change is the same everywhere, the combined patch helps explain it better
> once.
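The PG_dcache_clean point above can be made concrete. As quoted earlier in this thread, arch_dma_mark_dcache_clean() in dma-sync.h starts at PFN_UP(paddr), subtracts the partial first page from the length, and then marks only pages that the buffer covers completely. What follows is a minimal user-space sketch of that arithmetic, assuming 4 KiB pages; EX_PAGE_SHIFT, EX_PAGE_SIZE, EX_PFN_UP and pages_marked_clean() are hypothetical names, and the sketch only counts pages where the kernel would call set_bit(PG_dcache_clean, &page->flags).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SHIFT, PAGE_SIZE, PFN_UP(). */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE ((uint64_t)1 << EX_PAGE_SHIFT)
#define EX_PFN_UP(x) (((x) + EX_PAGE_SIZE - 1) >> EX_PAGE_SHIFT)

/*
 * Same arithmetic as arch_dma_mark_dcache_clean(): returns how many pages
 * would be marked clean. If first_pfn is not NULL, it receives the first
 * marked PFN; the marked pages are first_pfn .. first_pfn + count - 1.
 */
static uint64_t pages_marked_clean(uint64_t paddr, uint64_t size,
				   uint64_t *first_pfn)
{
	uint64_t off = paddr & (EX_PAGE_SIZE - 1);
	uint64_t left = size;
	uint64_t marked = 0;

	/* arch_sync_dma_for_cpu() only calls it for size > PAGE_SIZE, so
	 * subtracting the partial first page below cannot wrap around */
	assert(size > EX_PAGE_SIZE);

	if (first_pfn)
		*first_pfn = EX_PFN_UP(paddr);
	if (off)			/* partial first page: skipped */
		left -= EX_PAGE_SIZE - off;

	while (left >= EX_PAGE_SIZE) {	/* partial last page: skipped */
		marked++;		/* kernel: set_bit(PG_dcache_clean, ...) */
		left -= EX_PAGE_SIZE;
	}
	return marked;
}
```

So for a 0x3000-byte buffer at 0x10000800, which touches four pages, only the two fully covered ones (PFNs 0x10001 and 0x10002) would be marked.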
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arc/mm/dma.c                 |  66 +++++-------------
>  arch/arm/Kconfig                  |   3 +
>  arch/arm/mm/dma-mapping-nommu.c   |  39 ++++++-----
>  arch/arm/mm/dma-mapping.c         |  64 +++++++-----------
>  arch/arm64/mm/dma-mapping.c       |  28 +++++---
>  arch/csky/mm/dma-mapping.c        |  44 ++++++------
>  arch/hexagon/kernel/dma.c         |  44 ++++++------
>  arch/m68k/kernel/dma.c            |  43 +++++++-----
>  arch/microblaze/kernel/dma.c      |  48 +++++++-------
>  arch/mips/mm/dma-noncoherent.c    |  60 +++++++----------
>  arch/nios2/mm/dma-mapping.c       |  57 +++++++---------
>  arch/openrisc/kernel/dma.c        |  63 +++++++++++-------
>  arch/parisc/kernel/pci-dma.c      |  46 ++++++-------
>  arch/powerpc/mm/dma-noncoherent.c |  34 ++++++----
>  arch/riscv/mm/dma-noncoherent.c   |  51 +++++++-------
>  arch/sh/kernel/dma-coherent.c     |  43 +++++++-----
>  arch/sparc/kernel/ioport.c        |  38 ++++++++---
>  arch/xtensa/kernel/pci-dma.c      |  40 ++++++-----
>  include/linux/dma-sync.h          | 107 ++++++++++++++++++++++++++++++
>  19 files changed, 527 insertions(+), 391 deletions(-)
>  create mode 100644 include/linux/dma-sync.h
>
> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index ddb96786f765..61cd01646222 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
> 	dma_cache_wback_inv(page_to_phys(page), size);
> }
>
> -/*
> - * Cache operations depending on function and direction argument, inspired by
> - * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> - * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> - * dma-mapping: provide a generic dma-noncoherent implementation)"
> - *
> - *          |   map          ==  for_device     |   unmap     ==  for_cpu
> - *          |----------------------------------------------------------------
> - * TO_DEV   |   writeback        writeback      |   none          none
> - * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
> - * BIDIR    |   writeback        writeback      |   invalidate    invalidate
> - *
> - *     [*] needed for CPU speculative prefetches
> - *
> - * NOTE: we don't check the validity of direction argument as it is done in
> - * upper layer functions (in include/linux/dma-mapping.h)
> - */
> -
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_wback(paddr, size);
> -		break;
> -
> -	case DMA_FROM_DEVICE:
> -		dma_cache_inv(paddr, size);
> -		break;
> -
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_wback(paddr, size);
> -		break;
> +	dma_cache_wback(paddr, size);
> +}
>
> -	default:
> -		break;
> -	}
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		break;
> +	dma_cache_wback_inv(paddr, size);
> +}
>
> -	/* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		dma_cache_inv(paddr, size);
> -		break;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> -	default:
> -		break;
> -	}
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> }
>
> +#include <linux/dma-sync.h>
> +
> /*
>  * Plug in direct dma map ops.
>  */
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 125d58c54ab1..0de84e861027 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -212,6 +212,9 @@ config LOCKDEP_SUPPORT
> 	bool
> 	default y
>
> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y
> +
> config ARCH_HAS_ILOG2_U32
> 	bool
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 12b5c6ae93fc..0817274aed15 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -13,27 +13,36 @@
>
> #include "dma.h"
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> -	if (dir == DMA_FROM_DEVICE) {
> -		dmac_inv_range(__va(paddr), __va(paddr + size));
> -		outer_inv_range(paddr, paddr + size);
> -	} else {
> -		dmac_clean_range(__va(paddr), __va(paddr + size));
> -		outer_clean_range(paddr, paddr + size);
> -	}
> +	dmac_clean_range(__va(paddr), __va(paddr + size));
> +	outer_clean_range(paddr, paddr + size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> -	if (dir != DMA_TO_DEVICE) {
> -		outer_inv_range(paddr, paddr + size);
> -		dmac_inv_range(__va(paddr), __va(paddr));
> -	}
> +	dmac_inv_range(__va(paddr), __va(paddr + size));
> +	outer_inv_range(paddr, paddr + size);
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dmac_flush_range(__va(paddr), __va(paddr + size));
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> 			const struct iommu_ops *iommu, bool coherent)
> {
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index b703cb83d27e..aa6ee820a0ab 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -687,6 +687,30 @@ void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
> 	}
> }
>
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_clean_range);
> +	outer_clean_range(paddr, paddr + size);
> +}
> +
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_inv_range);
> +	outer_inv_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_maint(paddr, size, dmac_flush_range);
> +	outer_flush_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> {
> 	if (IS_ENABLED(CONFIG_CPU_V6) ||
> @@ -699,45 +723,7 @@ static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> 	return false;
> }
>
> -/*
> - * Make an area consistent for devices.
> - * Note: Drivers should NOT use this function directly.
> - * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_clean_range);
> -		outer_clean_range(paddr, paddr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -		outer_inv_range(paddr, paddr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
> -			dma_cache_maint(paddr, size, dmac_clean_range);
> -			outer_clean_range(paddr, paddr + size);
> -		} else {
> -			dma_cache_maint(paddr, size, dmac_flush_range);
> -			outer_flush_range(paddr, paddr + size);
> -		}
> -		break;
> -	default:
> -		break;
> -	}
> -}
> -
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> -{
> -	if (dir != DMA_TO_DEVICE && arch_sync_dma_cpu_needs_post_dma_flush()) {
> -		outer_inv_range(paddr, paddr + size);
> -		dma_cache_maint(paddr, size, dmac_inv_range);
> -	}
> -}
> +#include <linux/dma-sync.h>
>
> #ifdef CONFIG_ARM_DMA_USE_IOMMU
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 5240f6acad64..bae741aa65e9 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -13,25 +13,33 @@
> #include <asm/cacheflush.h>
> #include <asm/xen/xen-ops.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_poc(paddr, paddr + size);
> +}
>
> -	dcache_clean_poc(start, start + size);
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dcache_inval_poc(paddr, paddr + size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> {
> -	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	dcache_clean_inval_poc(paddr, paddr + size);
> +}
>
> -	if (dir == DMA_TO_DEVICE)
> -		return;
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
>
> -	dcache_inval_poc(start, start + size);
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> }
>
> +#include <linux/dma-sync.h>
> +
> void arch_dma_prep_coherent(struct page *page, size_t size)
> {
> 	unsigned long start = (unsigned long)page_address(page);
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index c90f912e2822..9402e101b363 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -55,31 +55,29 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
> 	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_wb_range);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		return;
> -	case DMA_FROM_DEVICE:
> -	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_inv_range);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	cache_op(paddr, size, dma_inv_range);
> }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
> +{
> +	cache_op(paddr, size, dma_wbinv_range);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
> index 882680e81a30..e6538128a75b 100644
> --- a/arch/hexagon/kernel/dma.c
> +++ b/arch/hexagon/kernel/dma.c
> @@ -9,29 +9,33 @@
> #include <linux/memblock.h>
> #include <asm/page.h>
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> -		enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> -	void *addr = phys_to_virt(paddr);
> -
> -	switch (dir) {
> -	case DMA_TO_DEVICE:
> -		hexagon_clean_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	case DMA_FROM_DEVICE:
> -		hexagon_inv_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	case DMA_BIDIRECTIONAL:
> -		flush_dcache_range((unsigned long) addr,
> -		(unsigned long) addr + size);
> -		break;
> -	default:
> -		BUG();
> -	}
> +	hexagon_clean_dcache_range(paddr, paddr + size);
> }
>
> +static inline void arch_dma_cache_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_inv_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t start, size_t size)
> +{
> +	hexagon_flush_dcache_range(paddr, paddr + size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> /*
>  * Our max_low_pfn should have been backed off by 16MB in mm/init.c to create
>  * DMA coherent space. Use that for the pool.
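One possible reading of the oops reported at the top of this message, offered as an assumption rather than a confirmed diagnosis: the faulting address 000000004afb0080 is exactly x0, the start argument of dcache_clean_poc(), and x1 = 000000004afb2080 is that value plus x23 = 0x2000. The arm64 hunk above now calls dcache_clean_poc(paddr, paddr + size), whereas the removed code first converted the address with phys_to_virt(); dcache_inval_poc() and dcache_clean_inval_poc() are called the same way. The hexagon hunk just above likewise passes paddr where the removed code used phys_to_virt(paddr), and its arch_dma_cache_inv() and arch_dma_cache_wback_inv() name the parameter start while using paddr. The sketch below only checks the address arithmetic under two constants inferred from the register dump, not taken from the board configuration: a 48-bit VA layout with the linear map at 0xffff000000000000, and DRAM starting at 0x40000000. ex_phys_to_virt() and ex_is_kernel_va() are hypothetical helpers; the first is assumed to have the same shape as arm64's __phys_to_virt().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed constants, inferred from the register dump, not from the board. */
#define EX_PAGE_OFFSET 0xffff000000000000ULL /* 48-bit VA: linear map base */
#define EX_PHYS_OFFSET 0x0000000040000000ULL /* start of DRAM */

/* Hypothetical stand-in: (phys - PHYS_OFFSET) | PAGE_OFFSET. */
static uint64_t ex_phys_to_virt(uint64_t paddr)
{
	/* the linear map only covers RAM at or above PHYS_OFFSET */
	assert(paddr >= EX_PHYS_OFFSET);
	return (paddr - EX_PHYS_OFFSET) | EX_PAGE_OFFSET;
}

/*
 * Assumption: with 48-bit VAs, kernel addresses have bits 63..48 all set;
 * a value with those bits clear is looked up as a user (TTBR0) address,
 * which matches the "user address but active_mm is swapper" line.
 */
static bool ex_is_kernel_va(uint64_t va)
{
	return (va & EX_PAGE_OFFSET) == EX_PAGE_OFFSET;
}
```

Under these assumptions ex_phys_to_virt(0x4afb0080) yields 0xffff00000afb0080, the value that also appears in x26, which is consistent with (but does not prove) the raw physical address being passed where a linear-map virtual address was expected.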
> diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index > 2e192a5df949..aa9b434e6df8 100644 > --- a/arch/m68k/kernel/dma.c > +++ b/arch/m68k/kernel/dma.c > @@ -58,20 +58,33 @@ void arch_dma_free(struct device *dev, size_t size, void > *vaddr, > > #endif /* CONFIG_MMU && !CONFIG_COLDFIRE */ > > -void arch_sync_dma_for_device(phys_addr_t handle, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_BIDIRECTIONAL: > - case DMA_TO_DEVICE: > - cache_push(handle, size); > - break; > - case DMA_FROM_DEVICE: > - cache_clear(handle, size); > - break; > - default: > - pr_err_ratelimited("dma_sync_single_for_device: unsupported dir > %u\n", > - dir); > - break; > - } > + /* > + * cache_push() always invalidates in addition to cleaning > + * write-back caches. > + */ > + cache_push(paddr, size); > +} > + > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + cache_clear(paddr, size); > +} > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + cache_push(paddr, size); > } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return false; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c > index b4c4e45fd45e..01110d4aa5b0 100644 > --- a/arch/microblaze/kernel/dma.c > +++ b/arch/microblaze/kernel/dma.c > @@ -14,32 +14,30 @@ > #include <linux/bug.h> > #include <asm/cacheflush.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_TO_DEVICE: > - case DMA_BIDIRECTIONAL: > - flush_dcache_range(paddr, paddr + size); > - break; > - case DMA_FROM_DEVICE: > - 
invalidate_dcache_range(paddr, paddr + size); > - break; > - default: > - BUG(); > - } > + /* writeback plus invalidate, could be a nop on WT caches */ > + flush_dcache_range(paddr, paddr + size); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (direction) { > - case DMA_TO_DEVICE: > - break; > - case DMA_BIDIRECTIONAL: > - case DMA_FROM_DEVICE: > - invalidate_dcache_range(paddr, paddr + size); > - break; > - default: > - BUG(); > - }} > + invalidate_dcache_range(paddr, paddr + size); } > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + flush_dcache_range(paddr, paddr + size); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > +} > + > +#include <linux/dma-sync.h> > diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c > index b9d68bcc5d53..902d4b7c1f85 100644 > --- a/arch/mips/mm/dma-noncoherent.c > +++ b/arch/mips/mm/dma-noncoherent.c > @@ -85,50 +85,38 @@ static inline void dma_sync_phys(phys_addr_t paddr, > size_t size, > } while (left); > } > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - dma_sync_phys(paddr, size, _dma_cache_wback); > - break; > - case DMA_FROM_DEVICE: > - dma_sync_phys(paddr, size, _dma_cache_inv); > - break; > - case DMA_BIDIRECTIONAL: > - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > - cpu_needs_post_dma_flush()) > - dma_sync_phys(paddr, size, _dma_cache_wback); > - else > - dma_sync_phys(paddr, size, _dma_cache_wback_inv); > - break; > - default: > - break; > - } > + dma_sync_phys(paddr, size, _dma_cache_wback); > 
} > > -#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU -void > arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) > { > - switch (dir) { > - case DMA_TO_DEVICE: > - break; > - case DMA_FROM_DEVICE: > - case DMA_BIDIRECTIONAL: > - if (cpu_needs_post_dma_flush()) > - dma_sync_phys(paddr, size, _dma_cache_inv); > - break; > - default: > - break; > - } > + dma_sync_phys(paddr, size, _dma_cache_inv); > } > -#endif > + > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) { > + dma_sync_phys(paddr, size, _dma_cache_wback_inv); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > + > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > + cpu_needs_post_dma_flush(); } > + > +#include <linux/dma-sync.h> > > #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > - const struct iommu_ops *iommu, bool coherent) > + const struct iommu_ops *iommu, bool coherent) > { > - dev->dma_coherent = coherent; > + dev->dma_coherent = coherent; > } > #endif > diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index > fd887d5f3f9a..29978970955e 100644 > --- a/arch/nios2/mm/dma-mapping.c > +++ b/arch/nios2/mm/dma-mapping.c > @@ -13,53 +13,46 @@ > #include <linux/types.h> > #include <linux/mm.h> > #include <linux/string.h> > +#include <linux/dma-map-ops.h> > #include <linux/dma-mapping.h> > #include <linux/io.h> > #include <linux/cache.h> > #include <asm/cacheflush.h> > > -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > { > + /* > + * We just need to write back the caches here, but Nios2 flush > + * instruction will do both writeback and 
invalidate. > + */ > void *vaddr = phys_to_virt(paddr); > + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + > +size)); } > > - switch (dir) { > - case DMA_FROM_DEVICE: > - invalidate_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - case DMA_TO_DEVICE: > - /* > - * We just need to flush the caches here , but Nios2 flush > - * instruction will do both writeback and invalidate. > - */ > - case DMA_BIDIRECTIONAL: /* flush and invalidate */ > - flush_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - default: > - BUG(); > - } > +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { > + unsigned long vaddr = (unsigned long)phys_to_virt(paddr); > + invalidate_dcache_range(vaddr, (unsigned long)(vaddr + size)); > } > > -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > - enum dma_data_direction dir) > +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t > +size) > { > void *vaddr = phys_to_virt(paddr); > + flush_dcache_range((unsigned long)vaddr, (unsigned long)(vaddr + > +size)); } > + > +static inline bool arch_sync_dma_clean_before_fromdevice(void) > +{ > + return false; > +} > > - switch (dir) { > - case DMA_BIDIRECTIONAL: > - case DMA_FROM_DEVICE: > - invalidate_dcache_range((unsigned long)vaddr, > - (unsigned long)(vaddr + size)); > - break; > - case DMA_TO_DEVICE: > - break; > - default: > - BUG(); > - } > +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void) > +{ > + return true; > } > > +#include <linux/dma-sync.h> > + > void arch_dma_prep_coherent(struct page *page, size_t size) { > unsigned long start = (unsigned long)page_address(page); diff --git > a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index > 91a00d09ffad..aba2258e62eb 100644 > --- a/arch/openrisc/kernel/dma.c > +++ b/arch/openrisc/kernel/dma.c > @@ -95,32 +95,47 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t > size) > 
> mmap_write_unlock(&init_mm);
> }
>
> -void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> unsigned long cl;
> struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - /* Write back the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBWR, cl);
> - break;
> - case DMA_FROM_DEVICE:
> - /* Invalidate the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBIR, cl);
> - break;
> - case DMA_BIDIRECTIONAL:
> - /* Flush the dcache for the requested range */
> - for (cl = addr; cl < addr + size;
> - cl += cpuinfo->dcache_block_size)
> - mtspr(SPR_DCBFR, cl);
> - break;
> - default:
> - break;
> - }
> + /* Write back the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBWR, cl);
> }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + unsigned long cl;
> + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> + /* Invalidate the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBIR, cl);
> +}
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t
> +size) {
> + unsigned long cl;
> + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> +
> + /* Flush the dcache for the requested range */
> + for (cl = paddr; cl < paddr + size;
> + cl += cpuinfo->dcache_block_size)
> + mtspr(SPR_DCBFR, cl);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index 6d3d3cffb316..a7955aab8ce2 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -443,35 +443,35 @@ void arch_dma_free(struct device *dev, size_t size,
> void *vaddr,
> free_pages((unsigned long)__va(dma_handle), order); }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - clean_kernel_dcache_range(virt, size);
> - break;
> - case DMA_FROM_DEVICE:
> - clean_kernel_dcache_range(virt, size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - flush_kernel_dcache_range(virt, size);
> - break;
> - }
> + clean_kernel_dcache_range(virt, size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> unsigned long virt = (unsigned long)phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - purge_kernel_dcache_range(virt, size);
> - break;
> - }
> + purge_kernel_dcache_range(virt, size); }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t
> +size) {
> + unsigned long virt = (unsigned long)phys_to_virt(paddr);
> +
> + flush_kernel_dcache_range(virt, size);
> }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-
> noncoherent.c
> index 00e59a4faa2b..268510c71156 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -101,27 +101,33 @@ static void __dma_phys_op(phys_addr_t paddr, size_t
> size, enum dma_cache_op op) #endif }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> __dma_phys_op(start, end, DMA_CACHE_CLEAN); }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> - switch (direction) {
> - case DMA_NONE:
> - BUG();
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - __dma_phys_op(start, end, DMA_CACHE_INVAL);
> - break;
> - }
> + __dma_phys_op(start, end, DMA_CACHE_INVAL);
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t
> +size) {
> + __dma_phys_op(start, end, DMA_CACHE_FLUSH); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> void arch_dma_prep_coherent(struct page *page, size_t size) {
> unsigned long kaddr = (unsigned long)page_address(page); diff --git
> a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index
> 69c80b2155a1..b9a9f57e02be 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -12,43 +12,40 @@
>
> static bool noncoherent_supported;
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> void *vaddr = phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> - break;
> - case DMA_FROM_DEVICE:
> - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> - break;
> - case DMA_BIDIRECTIONAL:
> - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> - break;
> - default:
> - break;
> - }
> + ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> }
>
> -void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> {
> void *vaddr = phys_to_virt(paddr);
>
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - break;
> - case DMA_FROM_DEVICE:
> - case DMA_BIDIRECTIONAL:
> - ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> - break;
> - default:
> - break;
> - }
> + ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t
> +size) {
> + void *vaddr = phys_to_virt(paddr);
> +
> + ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
> void arch_dma_prep_coherent(struct page *page, size_t size) {
> void *flush_addr = page_address(page); diff --git
> a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index
> 6a44c0e7ba40..41f031ae7609 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -12,22 +12,35 @@ void arch_dma_prep_coherent(struct page *page, size_t
> size)
> __flush_purge_region(page_address(page), size); }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
>
> - switch (dir) {
> - case DMA_FROM_DEVICE: /* invalidate only */
> - __flush_invalidate_region(addr, size);
> - break;
> - case DMA_TO_DEVICE: /* writeback only */
> - __flush_wback_region(addr, size);
> - break;
> - case DMA_BIDIRECTIONAL: /* writeback and invalidate */
> - __flush_purge_region(addr, size);
> - break;
> - default:
> - BUG();
> - }
> + __flush_wback_region(addr, size);
> }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> + __flush_invalidate_region(addr, size); }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t
> +size) {
> + void *addr = sh_cacheop_vaddr(phys_to_virt(paddr));
> +
> + __flush_purge_region(addr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index
> 4f3d26066ec2..6926ead2f208 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -300,21 +300,39 @@ arch_initcall(sparc_register_ioport);
>
> #endif /* CONFIG_SBUS */
>
> -/*
> - * IIep is write-through, not flushing on cpu to device transfer.
> - *
> - * On LEON systems without cache snooping, the entire D-CACHE must be
> flushed to
> - * make DMA to cacheable memory coherent.
> - */
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - if (dir != DMA_TO_DEVICE &&
> - sparc_cpu_model == sparc_leon &&
> + /* IIep is write-through, not flushing on cpu to device transfer. */ }
> +
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + /*
> + * On LEON systems without cache snooping, the entire D-CACHE must be
> + * flushed to make DMA to cacheable memory coherent.
> + */
> + if (sparc_cpu_model == sparc_leon &&
> !sparc_leon3_snooping_enabled())
> leon_flush_dcache_all();
> }
>
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t
> +size) {
> + arch_dma_cache_inv(paddr, size);
> +}
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return true;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> #ifdef CONFIG_PROC_FS
>
> static int sparc_io_proc_show(struct seq_file *m, void *v) diff --git
> a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index
> ff3bf015eca4..d4ff96585545 100644
> --- a/arch/xtensa/kernel/pci-dma.c
> +++ b/arch/xtensa/kernel/pci-dma.c
> @@ -43,24 +43,34 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
> }
> }
>
> -void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> - enum dma_data_direction dir)
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> {
> - switch (dir) {
> - case DMA_TO_DEVICE:
> - do_cache_op(paddr, size, __flush_dcache_range);
> - break;
> - case DMA_FROM_DEVICE:
> - do_cache_op(paddr, size, __invalidate_dcache_range);
> - break;
> - case DMA_BIDIRECTIONAL:
> - do_cache_op(paddr, size, __flush_invalidate_dcache_range);
> - break;
> - default:
> - break;
> - }
> + do_cache_op(paddr, size, __flush_dcache_range);
> }
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) {
> + do_cache_op(paddr, size, __invalidate_dcache_range); }
> +
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t
> +size) {
> + do_cache_op(paddr, size, __flush_invalidate_dcache_range); }
> +
> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
> +
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return false;
> +}
> +
> +#include <linux/dma-sync.h>
> +
> +
> void arch_dma_prep_coherent(struct page *page, size_t size)
> {
> __invalidate_dcache_range((unsigned long)page_address(page), size);
> diff --git a/include/linux/dma-sync.h b/include/linux/dma-sync.h new file
> mode 100644 index 000000000000..18e33d5e8eaf
> --- /dev/null
> +++ b/include/linux/dma-sync.h
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cache operations depending on function and direction argument,
> +inspired by
> + * https://lore.kernel.org/lkml/20180518175004.GF17671@n2100.armlinux.org.uk
> + * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
> + * dma-mapping: provide a generic dma-noncoherent implementation)"
> + *
> + *          | map == for_device     | unmap == for_cpu
> + *          |----------------------------------------------------------------
> + * TO_DEV   | writeback  writeback  | none        none
> + * FROM_DEV | invalidate invalidate | invalidate* invalidate*
> + * BIDIR    | writeback  writeback  | invalidate  invalidate
> + *
> + * [*] needed for CPU speculative prefetches
> + *
> + * NOTE: we don't check the validity of direction argument as it is
> +done in
> + * upper layer functions (in include/linux/dma-mapping.h)
> + *
> + * This file can be included by arch/.../kernel/dma-noncoherent.c to
> +provide
> + * the respective high-level operations without having to expose the
> + * cache management ops to drivers.
> + */
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + /*
> + * This may be an empty function on write-through caches,
> + * and it might invalidate the cache if an architecture has
> + * a write-back cache but no way to write it back without
> + * invalidating
> + */
> + arch_dma_cache_wback(paddr, size);
> + break;
> +
> + case DMA_FROM_DEVICE:
> + /*
> + * FIXME: this should be handled the same across all
> + * architectures, see
> + * https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
> + */
> + if (!arch_sync_dma_clean_before_fromdevice()) {
> + arch_dma_cache_inv(paddr, size);
> + break;
> + }
> + fallthrough;
> +
> + case DMA_BIDIRECTIONAL:
> + /* Skip the invalidate here if it's done later */
> + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> + arch_sync_dma_cpu_needs_post_dma_flush())
> + arch_dma_cache_wback(paddr, size);
> + else
> + arch_dma_cache_wback_inv(paddr, size);
> + break;
> +
> + default:
> + break;
> + }
> +}
> +
> +#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
> +/*
> + * Mark the D-cache clean for these pages to avoid extra flushing.
> + */
> +static void arch_dma_mark_dcache_clean(phys_addr_t paddr, size_t size)
> +{ #ifdef CONFIG_ARCH_DMA_MARK_DCACHE_CLEAN
> + unsigned long pfn = PFN_UP(paddr);
> + unsigned long off = paddr & (PAGE_SIZE - 1);
> + size_t left = size;
> +
> + if (off)
> + left -= PAGE_SIZE - off;
> +
> + while (left >= PAGE_SIZE) {
> + struct page *page = pfn_to_page(pfn++);
> + set_bit(PG_dcache_clean, &page->flags);
> + left -= PAGE_SIZE;
> + }
> +#endif
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + break;
> +
> + case DMA_FROM_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + /* FROM_DEVICE invalidate needed if speculative CPU prefetch
> only */
> + if (arch_sync_dma_cpu_needs_post_dma_flush())
> + arch_dma_cache_inv(paddr, size);
> +
> + if (size > PAGE_SIZE)
> + arch_dma_mark_dcache_clean(paddr, size);
> + break;
> +
> + default:
> + break;
> + }
> +}
> +#endif
> --
> 2.39.2
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 456+ messages in thread
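The direction handling in the generic dma-sync.h quoted above reduces to two per-architecture policy hooks plus the CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU option. The following is a user-space C model, not kernel code, that traces which of the three cache hooks would run for one map/unmap cycle. The struct, the trace buffer and all model_* names are hypothetical stand-ins, and the PG_dcache_clean optimization is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for enum dma_data_direction. */
enum dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };

/* Hypothetical per-architecture policy: one field per hook/option. */
struct arch_policy {
	bool has_sync_for_cpu;         /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU */
	bool clean_before_fromdevice;  /* arch_sync_dma_clean_before_fromdevice() */
	bool cpu_needs_post_dma_flush; /* arch_sync_dma_cpu_needs_post_dma_flush() */
};

static char trace[64];

/* Append the name of a cache operation (or separator) to the trace. */
static void op(const char *name)
{
	assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
	strcat(trace, name);
	strcat(trace, " ");
}

/* Mirrors the switch in the quoted arch_sync_dma_for_device(). */
static void model_for_device(const struct arch_policy *p, enum dir d)
{
	switch (d) {
	case TO_DEVICE:
		op("wback");
		break;
	case FROM_DEVICE:
		if (!p->clean_before_fromdevice) {
			op("inv");
			break;
		}
		/* fall through */
	case BIDIRECTIONAL:
		/* Skip the invalidate here if for_cpu will do it later. */
		if (p->has_sync_for_cpu && p->cpu_needs_post_dma_flush)
			op("wback");
		else
			op("wback_inv");
		break;
	}
}

/* Mirrors the quoted arch_sync_dma_for_cpu(), minus PG_dcache_clean. */
static void model_for_cpu(const struct arch_policy *p, enum dir d)
{
	if (!p->has_sync_for_cpu)
		return; /* the real function is not built at all */
	if (d != TO_DEVICE && p->cpu_needs_post_dma_flush)
		op("inv");
}

/* Trace one map (for_device) followed by one unmap (for_cpu). */
static const char *model_cycle(const struct arch_policy *p, enum dir d)
{
	trace[0] = '\0';
	model_for_device(p, d);
	op("|");
	model_for_cpu(p, d);
	return trace;
}
```

With the flag values returned by the quoted riscv hooks (both true, and assuming the for_cpu hook is built), a FROM_DEVICE cycle traces as "wback | inv ": the invalidate is deferred to unmap time, which is what discards lines a speculating CPU may have prefetched while the device was writing. With the quoted openrisc values (both false, no for_cpu hook assumed), the same cycle is just "inv | ".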
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation 2023-04-13 12:13 ` Biju Das ` (3 preceding siblings ...) (?) @ 2023-04-13 12:51 ` Arnd Bergmann -1 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-04-13 12:51 UTC (permalink / raw) To: Biju Das, Arnd Bergmann, linux-kernel@vger.kernel.org Cc: Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org On Thu, Apr 13, 2023, at 14:13, Biju Das wrote: > Hi all, > > FYI, this patch breaks on RZ/G2L SMARC EVK board and Arnd will send V2 > for fixing this issue. > > [10:53] <biju> [ 3.384408] Unable to handle kernel paging request at > virtual address 000000004afb0080 Right, sorry about this, I accidentally removed the 'phys_to_virt()' conversion on arm64. Arnd ^ permalink raw reply [flat|nested] 456+ messages in thread
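On arm64 the dcache_*_poc() helpers operate on a virtual address range, so handing them paddr unconverted makes the cache maintenance target a low address such as the 0x4afb0080 in Biju's oops, which is consistent with a raw physical address that the kernel cannot translate. The user-space model below illustrates the linear-map translation that phys_to_virt() provides. The formula follows the form of arm64's linear-map translation, but both constants are assumed example values (not the RZ/G2L configuration), and is_kernel_va() is a deliberate simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed example values; the real ones depend on kernel config and board. */
#define MODEL_PAGE_OFFSET 0xffff000000000000ULL /* start of the linear map */
#define MODEL_PHYS_OFFSET 0x40000000ULL         /* start of RAM */

/* Linear-map translation: offset into RAM, placed in the linear map. */
static uint64_t model_phys_to_virt(uint64_t paddr)
{
	assert(paddr >= MODEL_PHYS_OFFSET);
	return (paddr - MODEL_PHYS_OFFSET) | MODEL_PAGE_OFFSET;
}

/* Inverse translation, valid for linear-map addresses only. */
static uint64_t model_virt_to_phys(uint64_t vaddr)
{
	return (vaddr & ~MODEL_PAGE_OFFSET) + MODEL_PHYS_OFFSET;
}

/* Simplification: kernel addresses live in the upper half (bit 63 set). */
static bool is_kernel_va(uint64_t addr)
{
	return (addr >> 63) != 0;
}
```

Under these assumptions, 0x4afb0080 fails is_kernel_va(), while model_phys_to_virt(0x4afb0080) yields 0xffff00000afb0080, which passes and converts back to the original physical address.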
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation 2023-04-13 12:51 ` Arnd Bergmann ` (3 preceding siblings ...) (?) @ 2023-06-27 16:52 ` Geert Uytterhoeven -1 siblings, 0 replies; 456+ messages in thread From: Geert Uytterhoeven @ 2023-06-27 16:52 UTC (permalink / raw) To: Arnd Bergmann Cc: Biju Das, Arnd Bergmann, linux-kernel@vger.kernel.org, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, Emil Renner Berthing On Thu, Apr 13, 2023 at 2:52 PM Arnd Bergmann <arnd@arndb.de> wrote: > On Thu, Apr 13, 2023, at 14:13, Biju Das wrote: > > FYI, this patch breaks on RZ/G2L SMARC EVK board and Arnd will send V2 > > for fixing this issue. > > > > [10:53] <biju> [ 3.384408] Unable to handle kernel paging request at > > virtual address 000000004afb0080 > > Right, sorry about this, I accidentally removed the 'phys_to_virt()' > conversion on arm64. Meh, I missed that, so I ended up bisecting this same failure... This patch is now commit 801f1883c4bb70cc ("dma-mapping: replace custom code with generic implementation") in esmil/jh7100-dmapool, and broke booting on R-Car Gen3. 
The following gmail-whitespace-damaged patch fixes that:

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 97b7cea5eb23aedd..77e0b68b43e5849a 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -15,17 +15,23 @@
 static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	dcache_clean_poc(paddr, paddr + size);
+	unsigned long start = (unsigned long)phys_to_virt(paddr);
+
+	dcache_clean_poc(start, start + size);
 }

 static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
-	dcache_inval_poc(paddr, paddr + size);
+	unsigned long start = (unsigned long)phys_to_virt(paddr);
+
+	dcache_inval_poc(start, start + size);
 }

 static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
 {
-	dcache_clean_inval_poc(paddr, paddr + size);
+	unsigned long start = (unsigned long)phys_to_virt(paddr);
+
+	dcache_clean_inval_poc(start, start + size);
 }

 static inline bool arch_sync_dma_clean_before_fromdevice(void)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply related [flat|nested] 456+ messages in thread
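Geert's fix repeats the same phys_to_virt() conversion in each of the three arm64 helpers. A possible variation, sketched here purely as a user-space model, converts once in a shared helper and passes the VA-based operation as a function pointer, similar in spirit to how the quoted MIPS code hands _dma_cache_inv to dma_sync_phys(). The helper dma_cache_range(), the mock operation, model_arch_dma_cache_wback() and the address constants are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed example linear-map constants, not real board values. */
#define MODEL_PAGE_OFFSET 0xffff000000000000ULL
#define MODEL_PHYS_OFFSET 0x40000000ULL

static uint64_t model_phys_to_virt(uint64_t paddr)
{
	return (paddr - MODEL_PHYS_OFFSET) | MODEL_PAGE_OFFSET;
}

/* Range seen by the most recent mock cache operation. */
static uint64_t last_start, last_end;

/* Mock of a VA-based op such as dcache_clean_poc(start, end). */
static void mock_dcache_clean_poc(uint64_t start, uint64_t end)
{
	/* The real instruction would fault on a non-kernel address. */
	assert((start >> 63) != 0);
	last_start = start;
	last_end = end;
}

/* Hypothetical shared helper: translate once, then run the op. */
static void dma_cache_range(uint64_t paddr, size_t size,
			    void (*op)(uint64_t start, uint64_t end))
{
	uint64_t start = model_phys_to_virt(paddr);

	op(start, start + size);
}

/* Each of the three arch hooks then becomes a one-liner, e.g.: */
static void model_arch_dma_cache_wback(uint64_t paddr, size_t size)
{
	dma_cache_range(paddr, size, mock_dcache_clean_poc);
}
```

The explicit per-helper conversion in the patch avoids the indirection and keeps each call site easy to grep; the shared helper mainly guards against forgetting the conversion in one of the three places, which is exactly the mistake this thread tracked down.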
* Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation @ 2023-06-27 16:52 ` Geert Uytterhoeven 0 siblings, 0 replies; 456+ messages in thread From: Geert Uytterhoeven @ 2023-06-27 16:52 UTC (permalink / raw) To: Arnd Bergmann Cc: Biju Das, Arnd Bergmann, linux-kernel@vger.kernel.org, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Catalin Marinas, Will Deacon, guoren, Brian Cain, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, Emil Renner Berthing On Thu, Apr 13, 2023 at 2:52 PM Arnd Bergmann <arnd@arndb.de> wrote: > On Thu, Apr 13, 2023, at 14:13, Biju Das wrote: > > FYI, this patch breaks on RZ/G2L SMARC EVK board and Arnd will send V2 > > for fixing this issue. > > > > [10:53] <biju> [ 3.384408] Unable to handle kernel paging request at > > virtual address 000000004afb0080 > > Right, sorry about this, I accidentally removed the 'phys_to_virt()' > conversion on arm64. Meh, I missed that, so I ended up bisecting this same failure... This patch is now commit 801f1883c4bb70cc ("dma-mapping: replace custom code with generic implementation") in esmil/jh7100-dmapool, and broke booting on R-Car Gen3. 
The following gmail-whitespace-damaged patch fixes that: diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 97b7cea5eb23aedd..77e0b68b43e5849a 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -15,17 +15,23 @@ static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) { - dcache_clean_poc(paddr, paddr + size); + unsigned long start = (unsigned long)phys_to_virt(paddr); + + dcache_clean_poc(start, start + size); } static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size) { - dcache_inval_poc(paddr, paddr + size); + unsigned long start = (unsigned long)phys_to_virt(paddr); + + dcache_inval_poc(start, start + size); } static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size) { - dcache_clean_inval_poc(paddr, paddr + size); + unsigned long start = (unsigned long)phys_to_virt(paddr); + + dcache_clean_inval_poc(start, start + size); } static inline bool arch_sync_dma_clean_before_fromdevice(void) Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 456+ messages in thread
* Re: [PATCH 00/21] dma-mapping: unify support for cache flushes
2023-03-27 12:12 ` Arnd Bergmann
@ 2023-03-31 16:53 ` Catalin Marinas
0 siblings, 0 replies; 456+ messages in thread
From: Catalin Marinas @ 2023-03-31 16:53 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Will Deacon, Guo Ren, Brian Cain, Geert Uytterhoeven,
Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne,
Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley,
Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S. Miller,
Max Filippov, Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky, linux-hexagon,
linux-m68k, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev,
linux-riscv, linux-sh, sparclinux, linux-xtensa

On Mon, Mar 27, 2023 at 02:12:56PM +0200, Arnd Bergmann wrote:
> Another difference that I do not address here is what cache invalidation
> does for partial cache lines. On arm32, arm64 and powerpc, a partial
> cache line always gets written back before invalidation in order to
> ensure that data before or after the buffer is not discarded. On all
> other architectures, the assumption is cache lines are never shared
> between DMA buffer and data that is accessed by the CPU.

I don't think sharing the DMA buffer with other data is safe even with
this clean+invalidate on the unaligned cache lines. Mapping the DMA buffer
as FROM_DEVICE or BIDIRECTIONAL can cause the shared cache line to be
evicted and overwrite the device-written data. This sharing only works if
the CPU guarantees not to dirty the corresponding cache line.

I'm fine with removing this partial cache line hack from arm64 as it's
not safe anyway. We'll see if any driver stops working. If there's some
benign sharing (I wouldn't trust it), the cache cleaning prior to
mapping and invalidate on unmap would not lose any data.

--
Catalin
* Re: [PATCH 00/21] dma-mapping: unify support for cache flushes
2023-03-31 16:53 ` Catalin Marinas
@ 2023-03-31 20:27 ` Arnd Bergmann
0 siblings, 0 replies; 456+ messages in thread
From: Arnd Bergmann @ 2023-03-31 20:27 UTC (permalink / raw)
To: Catalin Marinas, Arnd Bergmann
Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij,
Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek,
Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller,
Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt,
Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io,
linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips,
linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv,
linux-sh, sparclinux, linux-xtensa

On Fri, Mar 31, 2023, at 18:53, Catalin Marinas wrote:
> On Mon, Mar 27, 2023 at 02:12:56PM +0200, Arnd Bergmann wrote:
>> Another difference that I do not address here is what cache invalidation
>> does for partial cache lines. On arm32, arm64 and powerpc, a partial
>> cache line always gets written back before invalidation in order to
>> ensure that data before or after the buffer is not discarded. On all
>> other architectures, the assumption is cache lines are never shared
>> between DMA buffer and data that is accessed by the CPU.
>
> I don't think sharing the DMA buffer with other data is safe even with
> this clean+invalidate on the unaligned cache. Mapping the DMA buffer as
> FROM_DEVICE or BIDIRECTIONAL can cause the shared cache line to be
> evicted and overwrite the device-written data. This sharing only works if
> the CPU guarantees not to dirty the corresponding cache line.
>
> I'm fine with removing this partial cache line hack from arm64 as it's
> not safe anyway. We'll see if any driver stops working. If there's some
> benign sharing (I wouldn't trust it), the cache cleaning prior to
> mapping and invalidate on unmap would not lose any data.

OK, I'll add a patch to remove that bit from dcache_inval_poc then.
Do you know if any of the other callers of this function rely
on the writeback behavior, or is it safe to remove it for all
of them?

Note that before c50f11c6196f ("arm64: mm: Don't invalidate
FROM_DEVICE buffers at start of DMA transfer"), it made some
sense to write back partial cache lines before a DMA_FROM_DEVICE,
in order to allow sharing read-only data in them the same way
as on arm32 and powerpc. Doing the writeback in the sync_for_cpu
step is of course always pointless.

Arnd
* Re: [PATCH 00/21] dma-mapping: unify support for cache flushes @ 2023-03-31 20:27 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-31 20:27 UTC (permalink / raw) To: Catalin Marinas, Arnd Bergmann Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Fri, Mar 31, 2023, at 18:53, Catalin Marinas wrote: > On Mon, Mar 27, 2023 at 02:12:56PM +0200, Arnd Bergmann wrote: >> Another difference that I do not address here is what cache invalidation >> does for partical cache lines. On arm32, arm64 and powerpc, a partial >> cache line always gets written back before invalidation in order to >> ensure that data before or after the buffer is not discarded. On all >> other architectures, the assumption is cache lines are never shared >> between DMA buffer and data that is accessed by the CPU. > > I don't think sharing the DMA buffer with other data is safe even with > this clean+invalidate on the unaligned cache. Mapping the DMA buffer as > FROM_DEVICE or BIDIRECTIONAL can cause the shared cache line to be > evicted and override the device written data. This sharing only works if > the CPU guarantees not to dirty the corresponding cache line. > > I'm fine with removing this partial cache line hack from arm64 as it's > not safe anyway. We'll see if any driver stops working. 
If there's some > benign sharing (I wouldn't trust it), the cache cleaning prior to > mapping and invalidate on unmap would not lose any data. Ok, I'll add a patch to remove that bit from dcache_inval_poc then. Do you know if any of the the other callers of this function rely on on the writeback behavior, or is it safe to remove it for all of them? Note that before c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer"), it made some sense to write back partial cache lines before a DMA_FROM_DEVICE, in order to allow sharing read-only data in them the same way as on arm32 and powerpc. Doing the writeback in the sync_for_cpu bit is of course always pointless. Arnd _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 00/21] dma-mapping: unify support for cache flushes @ 2023-03-31 20:27 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-31 20:27 UTC (permalink / raw) To: Catalin Marinas, Arnd Bergmann Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Christoph Hellwig, Robin Murphy, Lad, Prabhakar, Conor.Dooley, linux-snps-arc, linux-arm-kernel, linux-oxnas@groups.io, linux-csky@vger.kernel.org, linux-hexagon, linux-m68k, linux-mips, linux-openrisc@vger.kernel.org, linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux, linux-xtensa On Fri, Mar 31, 2023, at 18:53, Catalin Marinas wrote: > On Mon, Mar 27, 2023 at 02:12:56PM +0200, Arnd Bergmann wrote: >> Another difference that I do not address here is what cache invalidation >> does for partical cache lines. On arm32, arm64 and powerpc, a partial >> cache line always gets written back before invalidation in order to >> ensure that data before or after the buffer is not discarded. On all >> other architectures, the assumption is cache lines are never shared >> between DMA buffer and data that is accessed by the CPU. > > I don't think sharing the DMA buffer with other data is safe even with > this clean+invalidate on the unaligned cache. Mapping the DMA buffer as > FROM_DEVICE or BIDIRECTIONAL can cause the shared cache line to be > evicted and override the device written data. This sharing only works if > the CPU guarantees not to dirty the corresponding cache line. > > I'm fine with removing this partial cache line hack from arm64 as it's > not safe anyway. We'll see if any driver stops working. 
If there's some > benign sharing (I wouldn't trust it), the cache cleaning prior to > mapping and invalidate on unmap would not lose any data. Ok, I'll add a patch to remove that bit from dcache_inval_poc then. Do you know if any of the the other callers of this function rely on on the writeback behavior, or is it safe to remove it for all of them? Note that before c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer"), it made some sense to write back partial cache lines before a DMA_FROM_DEVICE, in order to allow sharing read-only data in them the same way as on arm32 and powerpc. Doing the writeback in the sync_for_cpu bit is of course always pointless. Arnd _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 00/21] dma-mapping: unify support for cache flushes @ 2023-03-31 20:27 ` Arnd Bergmann 0 siblings, 0 replies; 456+ messages in thread From: Arnd Bergmann @ 2023-03-31 20:27 UTC (permalink / raw) To: Catalin Marinas, Arnd Bergmann Cc: linux-kernel, Vineet Gupta, Russell King, Neil Armstrong, Linus Walleij, Will Deacon, guoren, Brian Cain, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker, John Paul Adrian Glaubitz, David S . Miller, Max Filippov, Chris On Fri, Mar 31, 2023, at 18:53, Catalin Marinas wrote: > On Mon, Mar 27, 2023 at 02:12:56PM +0200, Arnd Bergmann wrote: >> Another difference that I do not address here is what cache invalidation >> does for partical cache lines. On arm32, arm64 and powerpc, a partial >> cache line always gets written back before invalidation in order to >> ensure that data before or after the buffer is not discarded. On all >> other architectures, the assumption is cache lines are never shared >> between DMA buffer and data that is accessed by the CPU. > > I don't think sharing the DMA buffer with other data is safe even with > this clean+invalidate on the unaligned cache. Mapping the DMA buffer as > FROM_DEVICE or BIDIRECTIONAL can cause the shared cache line to be > evicted and override the device written data. This sharing only works if > the CPU guarantees not to dirty the corresponding cache line. > > I'm fine with removing this partial cache line hack from arm64 as it's > not safe anyway. We'll see if any driver stops working. If there's some > benign sharing (I wouldn't trust it), the cache cleaning prior to > mapping and invalidate on unmap would not lose any data. Ok, I'll add a patch to remove that bit from dcache_inval_poc then. Do you know if any of the the other callers of this function rely on on the writeback behavior, or is it safe to remove it for all of them? 
Note that before c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer"), it made some sense to write back partial cache lines before a DMA_FROM_DEVICE, in order to allow sharing read-only data in them the same way as on arm32 and powerpc. Doing the writeback in the sync_for_cpu bit is of course always pointless. Arnd ^ permalink raw reply [flat|nested] 456+ messages in thread
* Re: [PATCH 00/21] dma-mapping: unify support for cache flushes
2023-03-27 12:12 ` Arnd Bergmann
` (3 preceding siblings ...)
(?)
@ 2023-05-25 7:46 ` Lad, Prabhakar
-1 siblings, 0 replies; 456+ messages in thread
From: Lad, Prabhakar @ 2023-05-25 7:46 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, Arnd Bergmann, Vineet Gupta, Russell King, Neil Armstrong,
Linus Walleij, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain,
Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
Dinh Nguyen, Stafford Horne, Helge Deller, Michael Ellerman,
Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Max Filippov,
Christoph Hellwig, Robin Murphy, Lad Prabhakar, Conor Dooley,
linux-snps-arc, linux-arm-kernel, linux-oxnas, linux-csky,
linux-hexagon, linux-m68k, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-xtensa

Hi Arnd,

On Mon, Mar 27, 2023 at 1:14 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> After a long discussion about adding SoC specific semantics for when
> to flush caches in drivers/soc/ drivers that we determined to be
> fundamentally flawed[1], I volunteered to try to move that logic into
> architecture-independent code and make all existing architectures do
> the same thing.
>
> As we had determined earlier, the behavior is wildly different across
> architectures, but most of the differences come down to either bugs
> (when required flushes are missing) or extra flushes that are harmless
> but might hurt performance.
>
> I finally found the time to come up with an implementation of this, which
> starts by replacing every outlier with one of the three common options:
>
> 1. architectures without speculative prefetching (hexagon, m68k,
> openrisc, sh, sparc, and certain armv4 and xtensa implementations)
> only flush their caches before a DMA, by cleaning write-back caches
> (if any) before a DMA to the device, and by invalidating the caches
> before a DMA from a device
>
> 2. arc, microblaze, mips, nios2, sh and later xtensa now follow the
> normal 32-bit arm model and invalidate their writeback caches
> again after a DMA from the device, to remove stale cache lines
> that got prefetched during the DMA. arc, csky and mips used to
> invalidate buffers also before the bidirectional DMA, but this
> is now skipped whenever we know it gets invalidated again
> after the DMA.
>
> 3. parisc, powerpc and riscv already flushed buffers before
> a DMA_FROM_DEVICE, and these get moved to the arm64 behavior
> that does the writeback before and invalidate after both
> DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the
> problem of accidentally leaking stale data if the DMA does
> not actually happen[2].
>
> The last patch in the series replaces the architecture specific code
> with a shared version that implements all three based on architecture
> specific parameters that are almost always determined at compile time.
>
> The difference between cases 1. and 2. is hardware specific, while between
> 2. and 3. we need to decide which semantics we want, but I explicitly
> avoid this question in my series and leave it to be decided later.
>
> Another difference that I do not address here is what cache invalidation
> does for partial cache lines. On arm32, arm64 and powerpc, a partial
> cache line always gets written back before invalidation in order to
> ensure that data before or after the buffer is not discarded. On all
> other architectures, the assumption is cache lines are never shared
> between DMA buffer and data that is accessed by the CPU. If we end up
> always writing back dirty cache lines before a DMA (option 3 above),
> then this point becomes moot, otherwise we should probably address this
> in a follow-up series to document one behavior or the other and implement
> it consistently.
>
> Please review!
>
> Arnd
>
> [1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/
> [2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
>
> Arnd Bergmann (21):
> openrisc: dma-mapping: flush bidirectional mappings
> xtensa: dma-mapping: use normal cache invalidation rules
> sparc32: flush caches in dma_sync_*for_device
> microblaze: dma-mapping: skip extra DMA flushes
> powerpc: dma-mapping: split out cache operation logic
> powerpc: dma-mapping: minimize for_cpu flushing
> powerpc: dma-mapping: always clean cache in _for_device() op
> riscv: dma-mapping: only invalidate after DMA, not flush
> riscv: dma-mapping: skip invalidation before bidirectional DMA
> csky: dma-mapping: skip invalidating before DMA from device
> mips: dma-mapping: skip invalidating before bidirectional DMA
> mips: dma-mapping: split out cache operation logic
> arc: dma-mapping: skip invalidating before bidirectional DMA
> parisc: dma-mapping: use regular flush/invalidate ops
> ARM: dma-mapping: always invalidate WT caches before DMA
> ARM: dma-mapping: bring back dmac_{clean,inv}_range
> ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
> ARM: drop SMP support for ARM11MPCore
> ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
> ARM: dma-mapping: split out arch_dma_mark_clean() helper
> dma-mapping: replace custom code with generic implementation
>
Do you plan to send v2 for this series?
Cheers,
Prabhakar

> arch/arc/mm/dma.c | 66 ++------
> arch/arm/Kconfig | 4 +
> arch/arm/include/asm/cacheflush.h | 21 +++
> arch/arm/include/asm/glue-cache.h | 4 +
> arch/arm/mach-oxnas/Kconfig | 4 -
> arch/arm/mach-oxnas/Makefile | 1 -
> arch/arm/mach-oxnas/headsmp.S | 23 ---
> arch/arm/mach-oxnas/platsmp.c | 96 -----------
> arch/arm/mach-versatile/platsmp-realview.c | 4 -
> arch/arm/mm/Kconfig | 19 ---
> arch/arm/mm/cache-fa.S | 4 +-
> arch/arm/mm/cache-nop.S | 6 +
> arch/arm/mm/cache-v4.S | 13 +-
> arch/arm/mm/cache-v4wb.S | 4 +-
> arch/arm/mm/cache-v4wt.S | 22 ++-
> arch/arm/mm/cache-v6.S | 35 +---
> arch/arm/mm/cache-v7.S | 6 +-
> arch/arm/mm/cache-v7m.S | 4 +-
> arch/arm/mm/dma-mapping-nommu.c | 36 ++--
> arch/arm/mm/dma-mapping.c | 181 ++++++++++-----------
> arch/arm/mm/proc-arm1020.S | 4 +-
> arch/arm/mm/proc-arm1020e.S | 4 +-
> arch/arm/mm/proc-arm1022.S | 4 +-
> arch/arm/mm/proc-arm1026.S | 4 +-
> arch/arm/mm/proc-arm920.S | 4 +-
> arch/arm/mm/proc-arm922.S | 4 +-
> arch/arm/mm/proc-arm925.S | 4 +-
> arch/arm/mm/proc-arm926.S | 4 +-
> arch/arm/mm/proc-arm940.S | 4 +-
> arch/arm/mm/proc-arm946.S | 4 +-
> arch/arm/mm/proc-feroceon.S | 8 +-
> arch/arm/mm/proc-macros.S | 2 +
> arch/arm/mm/proc-mohawk.S | 4 +-
> arch/arm/mm/proc-xsc3.S | 4 +-
> arch/arm/mm/proc-xscale.S | 6 +-
> arch/arm64/mm/dma-mapping.c | 28 ++--
> arch/csky/mm/dma-mapping.c | 46 +++---
> arch/hexagon/kernel/dma.c | 44 ++---
> arch/m68k/kernel/dma.c | 43 +++--
> arch/microblaze/kernel/dma.c | 38 ++---
> arch/mips/mm/dma-noncoherent.c | 75 +++------
> arch/nios2/mm/dma-mapping.c | 57 +++----
> arch/openrisc/kernel/dma.c | 62 ++++---
> arch/parisc/include/asm/cacheflush.h | 6 +-
> arch/parisc/kernel/pci-dma.c | 33 +++-
> arch/powerpc/mm/dma-noncoherent.c | 76 +++++----
> arch/riscv/mm/dma-noncoherent.c | 51 +++---
> arch/sh/kernel/dma-coherent.c | 43 +++--
> arch/sparc/Kconfig | 2 +-
> arch/sparc/kernel/ioport.c | 38 +++--
> arch/xtensa/Kconfig | 1 -
> arch/xtensa/include/asm/cacheflush.h | 6 +-
> arch/xtensa/kernel/pci-dma.c | 47 +++---
> include/linux/dma-sync.h | 107 ++++++++++++
> 54 files changed, 721 insertions(+), 699 deletions(-)
> delete mode 100644 arch/arm/mach-oxnas/headsmp.S
> delete mode 100644 arch/arm/mach-oxnas/platsmp.c
> create mode 100644 include/linux/dma-sync.h
>
> --
> 2.39.2
>
> Cc: Vineet Gupta <vgupta@kernel.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: Brian Cain <bcain@quicinc.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Michal Simek <monstr@monstr.eu>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Dinh Nguyen <dinguyen@kernel.org>
> Cc: Stafford Horne <shorne@gmail.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Rich Felker <dalias@libc.org>
> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> Cc: Conor Dooley <conor.dooley@microchip.com>
> Cc: linux-snps-arc@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-oxnas@groups.io
> Cc: linux-csky@vger.kernel.org
> Cc: linux-hexagon@vger.kernel.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-mips@vger.kernel.org
> Cc: linux-openrisc@vger.kernel.org
> Cc: linux-parisc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-riscv@lists.infradead.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-xtensa@linux-xtensa.org
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 456+ messages in thread
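The three options in the quoted cover letter can be read as a table of cache operations per DMA direction and per sync step: before the transfer (the for_device step) and after it (the for_cpu step). The C sketch below encodes one reading of that description. It is not the generic code in include/linux/dma-sync.h from the series; every enum and function name here is hypothetical, and treating a bidirectional buffer under option 1 as clean plus invalidate is an assumption based on "cleaning ... before a DMA to the device, and ... invalidating ... before a DMA from a device".

```c
/* Hypothetical stand-ins for the kernel's DMA directions. */
enum dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };

/* Before the transfer (for_device) or after it (for_cpu). */
enum phase { FOR_DEVICE, FOR_CPU };

/* Cache operations as bit flags; OP_CLEAN | OP_INVAL means flush. */
enum op { OP_NONE = 0, OP_CLEAN = 1, OP_INVAL = 2 };

/* The three options described in the cover letter (names are made up). */
enum policy {
	OPT1_NO_PREFETCH,	/* maintenance only before the DMA */
	OPT2_ARM32,		/* also invalidate after a DMA from the device */
	OPT3_ARM64,		/* always clean before, invalidate after */
};

/* Return the cache operation(s) implied by one reading of each option. */
unsigned int sync_ops(enum policy pol, enum dir d, enum phase ph)
{
	if (ph == FOR_DEVICE) {
		switch (d) {
		case TO_DEVICE:
			/* every option writes back dirty lines first */
			return OP_CLEAN;
		case FROM_DEVICE:
			/* option 3 cleans so stale data cannot leak if no DMA happens */
			return pol == OPT3_ARM64 ? OP_CLEAN : OP_INVAL;
		case BIDIRECTIONAL:
			/* option 1 never invalidates later, so it flushes up front */
			return pol == OPT1_NO_PREFETCH ? (OP_CLEAN | OP_INVAL) : OP_CLEAN;
		}
		return OP_NONE;
	}

	/* for_cpu: only CPUs that may prefetch need to drop lines again */
	if (pol == OPT1_NO_PREFETCH || d == TO_DEVICE)
		return OP_NONE;
	return OP_INVAL;
}
```

Under this reading, options 2 and 3 disagree only on the for_device step of DMA_FROM_DEVICE (invalidate versus clean), which matches the cover letter's point that choosing between them is a question of semantics rather than hardware, while options 1 and 2 differ in the for_cpu step that speculative prefetching makes necessary.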
* Re: [PATCH 00/21] dma-mapping: unify support for cache flushes @ 2023-05-25 7:46 ` Lad, Prabhakar 0 siblings, 0 replies; 456+ messages in thread From: Lad, Prabhakar @ 2023-05-25 7:46 UTC (permalink / raw) To: Arnd Bergmann Cc: Rich Felker, linux-sh, Catalin Marinas, Linus Walleij, John Paul Adrian Glaubitz, linux-mips, Max Filippov, Conor Dooley, Guo Ren, linux-csky, sparclinux, linux-riscv, Will Deacon, Christoph Hellwig, Helge Deller, Russell King, Geert Uytterhoeven, Vineet Gupta, linux-snps-arc, linux-xtensa, Arnd Bergmann, Brian Cain, Lad Prabhakar, linux-m68k, Paul Walmsley, Stafford Horne, linux-arm-kernel, Neil Armstrong <neil.armstr Hi Arnd, On Mon, Mar 27, 2023 at 1:14 PM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > After a long discussion about adding SoC specific semantics for when > to flush caches in drivers/soc/ drivers that we determined to be > fundamentally flawed[1], I volunteered to try to move that logic into > architecture-independent code and make all existing architectures do > the same thing. > > As we had determined earlier, the behavior is wildly different across > architectures, but most of the differences come down to either bugs > (when required flushes are missing) or extra flushes that are harmless > but might hurt performance. > > I finally found the time to come up with an implementation of this, which > starts by replacing every outlier with one of the three common options: > > 1. architectures without speculative prefetching (hegagon, m68k, > openrisc, sh, sparc, and certain armv4 and xtensa implementations) > only flush their caches before a DMA, by cleaning write-back caches > (if any) before a DMA to the device, and by invalidating the caches > before a DMA from a device > > 2. 
arc, microblaze, mips, nios2, sh and later xtensa now follow the > normal 32-bit arm model and invalidate their writeback caches > again after a DMA from the device, to remove stale cache lines > that got prefetched during the DMA. arc, csky and mips used to > invalidate buffers also before the bidirectional DMA, but this > is now skipped whenever we know it gets invalidated again > after the DMA. > > 3. parisc, powerpc and riscv already flushed buffers before > a DMA_FROM_DEVICE, and these get moved to the arm64 behavior > that does the writeback before and invalidate after both > DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the > problem of accidentally leaking stale data if the DMA does > not actually happen[2]. > > The last patch in the series replaces the architecture specific code > with a shared version that implements all three based on architecture > specific parameters that are almost always determined at compile time. > > The difference between cases 1. and 2. is hardware specific, while between > 2. and 3. we need to decide which semantics we want, but I explicitly > avoid this question in my series and leave it to be decided later. > > Another difference that I do not address here is what cache invalidation > does for partical cache lines. On arm32, arm64 and powerpc, a partial > cache line always gets written back before invalidation in order to > ensure that data before or after the buffer is not discarded. On all > other architectures, the assumption is cache lines are never shared > between DMA buffer and data that is accessed by the CPU. If we end up > always writing back dirty cache lines before a DMA (option 3 above), > then this point becomes moot, otherwise we should probably address this > in a follow-up series to document one behavior or the other and implement > it consistently. > > Please review! 
>
> Arnd
>
> [1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/
> [2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
>
> Arnd Bergmann (21):
>   openrisc: dma-mapping: flush bidirectional mappings
>   xtensa: dma-mapping: use normal cache invalidation rules
>   sparc32: flush caches in dma_sync_*for_device
>   microblaze: dma-mapping: skip extra DMA flushes
>   powerpc: dma-mapping: split out cache operation logic
>   powerpc: dma-mapping: minimize for_cpu flushing
>   powerpc: dma-mapping: always clean cache in _for_device() op
>   riscv: dma-mapping: only invalidate after DMA, not flush
>   riscv: dma-mapping: skip invalidation before bidirectional DMA
>   csky: dma-mapping: skip invalidating before DMA from device
>   mips: dma-mapping: skip invalidating before bidirectional DMA
>   mips: dma-mapping: split out cache operation logic
>   arc: dma-mapping: skip invalidating before bidirectional DMA
>   parisc: dma-mapping: use regular flush/invalidate ops
>   ARM: dma-mapping: always invalidate WT caches before DMA
>   ARM: dma-mapping: bring back dmac_{clean,inv}_range
>   ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
>   ARM: drop SMP support for ARM11MPCore
>   ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
>   ARM: dma-mapping: split out arch_dma_mark_clean() helper
>   dma-mapping: replace custom code with generic implementation
>
Do you plan to send v2 for this series?

Cheers,
Prabhakar

>  arch/arc/mm/dma.c                          |  66 ++------
>  arch/arm/Kconfig                           |   4 +
>  arch/arm/include/asm/cacheflush.h          |  21 +++
>  arch/arm/include/asm/glue-cache.h          |   4 +
>  arch/arm/mach-oxnas/Kconfig                |   4 -
>  arch/arm/mach-oxnas/Makefile               |   1 -
>  arch/arm/mach-oxnas/headsmp.S              |  23 ---
>  arch/arm/mach-oxnas/platsmp.c              |  96 -----------
>  arch/arm/mach-versatile/platsmp-realview.c |   4 -
>  arch/arm/mm/Kconfig                        |  19 ---
>  arch/arm/mm/cache-fa.S                     |   4 +-
>  arch/arm/mm/cache-nop.S                    |   6 +
>  arch/arm/mm/cache-v4.S                     |  13 +-
>  arch/arm/mm/cache-v4wb.S                   |   4 +-
>  arch/arm/mm/cache-v4wt.S                   |  22 ++-
>  arch/arm/mm/cache-v6.S                     |  35 +---
>  arch/arm/mm/cache-v7.S                     |   6 +-
>  arch/arm/mm/cache-v7m.S                    |   4 +-
>  arch/arm/mm/dma-mapping-nommu.c            |  36 ++--
>  arch/arm/mm/dma-mapping.c                  | 181 ++++++++++-----------
>  arch/arm/mm/proc-arm1020.S                 |   4 +-
>  arch/arm/mm/proc-arm1020e.S                |   4 +-
>  arch/arm/mm/proc-arm1022.S                 |   4 +-
>  arch/arm/mm/proc-arm1026.S                 |   4 +-
>  arch/arm/mm/proc-arm920.S                  |   4 +-
>  arch/arm/mm/proc-arm922.S                  |   4 +-
>  arch/arm/mm/proc-arm925.S                  |   4 +-
>  arch/arm/mm/proc-arm926.S                  |   4 +-
>  arch/arm/mm/proc-arm940.S                  |   4 +-
>  arch/arm/mm/proc-arm946.S                  |   4 +-
>  arch/arm/mm/proc-feroceon.S                |   8 +-
>  arch/arm/mm/proc-macros.S                  |   2 +
>  arch/arm/mm/proc-mohawk.S                  |   4 +-
>  arch/arm/mm/proc-xsc3.S                    |   4 +-
>  arch/arm/mm/proc-xscale.S                  |   6 +-
>  arch/arm64/mm/dma-mapping.c                |  28 ++--
>  arch/csky/mm/dma-mapping.c                 |  46 +++---
>  arch/hexagon/kernel/dma.c                  |  44 ++---
>  arch/m68k/kernel/dma.c                     |  43 +++--
>  arch/microblaze/kernel/dma.c               |  38 ++---
>  arch/mips/mm/dma-noncoherent.c             |  75 +++------
>  arch/nios2/mm/dma-mapping.c                |  57 +++----
>  arch/openrisc/kernel/dma.c                 |  62 ++++---
>  arch/parisc/include/asm/cacheflush.h       |   6 +-
>  arch/parisc/kernel/pci-dma.c               |  33 +++-
>  arch/powerpc/mm/dma-noncoherent.c          |  76 +++++----
>  arch/riscv/mm/dma-noncoherent.c            |  51 +++---
>  arch/sh/kernel/dma-coherent.c              |  43 +++--
>  arch/sparc/Kconfig                         |   2 +-
>  arch/sparc/kernel/ioport.c                 |  38 +++--
>  arch/xtensa/Kconfig                        |   1 -
>  arch/xtensa/include/asm/cacheflush.h       |   6 +-
>  arch/xtensa/kernel/pci-dma.c               |  47 +++---
>  include/linux/dma-sync.h                   | 107 ++++++++++++
>  54 files changed, 721 insertions(+), 699 deletions(-)
>  delete mode 100644 arch/arm/mach-oxnas/headsmp.S
>  delete mode 100644 arch/arm/mach-oxnas/platsmp.c
>  create mode 100644 include/linux/dma-sync.h
>
> --
> 2.39.2
>
> Cc: Vineet Gupta <vgupta@kernel.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: Brian Cain <bcain@quicinc.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Michal Simek <monstr@monstr.eu>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Dinh Nguyen <dinguyen@kernel.org>
> Cc: Stafford Horne <shorne@gmail.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Rich Felker <dalias@libc.org>
> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> Cc: Conor Dooley <conor.dooley@microchip.com>
> Cc: linux-snps-arc@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-oxnas@groups.io
> Cc: linux-csky@vger.kernel.org
> Cc: linux-hexagon@vger.kernel.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-mips@vger.kernel.org
> Cc: linux-openrisc@vger.kernel.org
> Cc: linux-parisc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-riscv@lists.infradead.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-xtensa@linux-xtensa.org
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
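The cache maintenance rules described in the cover letter above can be condensed into a small decision table. The following is a hedged C sketch of those rules, not the code from the series or from include/linux/dma-sync.h: the names sync_policy, op_for_device(), op_for_cpu() and has_partial_line() are hypothetical, and treating DMA_BIDIRECTIONAL under option 1 as a combined write-back and invalidate is an assumption, since the letter only spells out the to-device and from-device cases for that option. has_partial_line() illustrates the partial cache line question: a buffer whose start or end is not line-aligned shares a line with other data, which is the case where arm32, arm64 and powerpc write the line back before invalidating it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Direction names borrowed from the kernel; the numeric values are arbitrary here. */
enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Hypothetical labels for the three behaviours in the cover letter. */
enum sync_policy {
	POLICY_NO_PREFETCH, /* 1: cache maintenance only before the DMA */
	POLICY_ARM32,       /* 2: also invalidate again after a DMA from the device */
	POLICY_ARM64,       /* 3: write back before, invalidate after */
};

enum cache_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* Cache operation before the device accesses the buffer (the "for device" step). */
static enum cache_op op_for_device(enum sync_policy p, enum dma_dir dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		/* Push dirty lines to memory so the device reads current data. */
		return OP_WBACK;
	case DMA_FROM_DEVICE:
		/* Option 3 writes back instead of discarding, so stale memory
		 * contents cannot appear if the DMA never actually happens. */
		return p == POLICY_ARM64 ? OP_WBACK : OP_INV;
	case DMA_BIDIRECTIONAL:
		/* The invalidate is only needed here when it is not repeated
		 * after the DMA (assumed full flush for option 1). */
		return p == POLICY_NO_PREFETCH ? OP_WBACK_INV : OP_WBACK;
	}
	return OP_NONE;
}

/* Cache operation after the DMA, before the CPU reads the buffer (the "for cpu" step). */
static enum cache_op op_for_cpu(enum sync_policy p, enum dma_dir dir)
{
	/* Without speculative prefetching nothing can have refilled the lines,
	 * and a DMA to the device leaves the CPU's view of the buffer valid. */
	if (p == POLICY_NO_PREFETCH || dir == DMA_TO_DEVICE)
		return OP_NONE;
	/* Drop any lines that were prefetched while the device was writing. */
	return OP_INV;
}

/* True if [addr, addr + size) starts or ends in the middle of a cache line,
 * i.e. the buffer shares a line with unrelated data. line must be a power of two. */
static bool has_partial_line(uintptr_t addr, size_t size, size_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);
	return ((addr | (addr + size)) & (line - 1)) != 0;
}
```

Under these assumptions, options 2 and 3 differ only in the DMA_FROM_DEVICE case before the transfer (invalidate versus write back), which matches the letter's point that choosing between them is a question of semantics rather than of hardware.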
Thread overview: 456+ messages
2023-03-27 12:12 [PATCH 00/21] dma-mapping: unify support for cache flushes Arnd Bergmann
2023-03-27 12:12 ` [PATCH 01/21] openrisc: dma-mapping: flush bidirectional mappings Arnd Bergmann
2023-03-27 12:12 ` [PATCH 02/21] xtensa: dma-mapping: use normal cache invalidation rules Arnd Bergmann
2023-03-27 15:42 ` Max Filippov
2023-03-27 12:12 ` [PATCH 03/21] sparc32: flush caches in dma_sync_*for_device Arnd Bergmann
2023-03-27 12:13 ` [PATCH 04/21] microblaze: dma-mapping: skip extra DMA flushes Arnd Bergmann
2023-03-27 12:13 ` [PATCH 05/21] powerpc: dma-mapping: split out cache operation logic Arnd Bergmann
2023-03-27 12:13 ` [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing Arnd Bergmann
2023-03-27 12:56 ` Christophe Leroy
2023-03-27 13:02 ` Arnd Bergmann
2023-03-27 12:13 ` [PATCH 07/21] powerpc: dma-mapping: always clean cache in _for_device() op Arnd Bergmann
2023-03-27 12:13 ` [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush Arnd Bergmann
2023-03-29 20:48 ` Conor Dooley
2023-03-30 7:10 ` Arnd Bergmann
2023-03-29 21:51 ` Jessica Clarke
2023-03-30 12:59 ` Lad, Prabhakar
2023-04-19 14:22 ` Palmer Dabbelt
2023-03-27 12:13 ` [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA Arnd Bergmann
2023-03-29 20:16 ` Conor Dooley
2023-03-30 13:26 ` Lad, Prabhakar
2023-04-19 14:22 ` Palmer Dabbelt
2023-05-05 5:47 ` Guo Ren
2023-05-05 13:18 ` Arnd Bergmann
2023-05-06 7:25 ` Guo Ren
2023-05-06 7:53 ` Arnd Bergmann
2023-03-27 12:13 ` [PATCH 10/21] csky: dma-mapping: skip invalidating before DMA from device Arnd Bergmann
2023-03-27 13:37 ` Guo Ren
2023-03-27 12:13 ` [PATCH 11/21] mips: dma-mapping: skip invalidating before bidirectional DMA Arnd Bergmann
2023-03-27 12:13 ` [PATCH 12/21] mips: dma-mapping: split out cache operation logic Arnd Bergmann
2023-03-27 12:13 ` [PATCH 13/21] arc: dma-mapping: skip invalidating before bidirectional DMA Arnd Bergmann
2023-04-02 6:52 ` Vineet Gupta
2023-04-04 8:27 ` Shahab Vahedi
2023-04-06 9:01 ` Shahab Vahedi
2023-03-27 12:13 ` [PATCH 14/21] parisc: dma-mapping: use regular flush/invalidate ops Arnd Bergmann
2023-03-27 12:13 ` [PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA Arnd Bergmann
2023-03-31 9:01 ` Linus Walleij
2023-03-31 9:07 ` Russell King (Oracle)
2023-03-31 9:35 ` Russell King (Oracle)
2023-03-31 10:38 ` Arnd Bergmann
2023-03-31 11:01 ` David Laight
2023-03-31 11:08 ` Russell King (Oracle)
2023-03-31 12:32 ` Arnd Bergmann
2023-03-27 12:13 ` [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range Arnd Bergmann
2023-03-27 13:10 ` Russell King (Oracle)
2023-03-27 12:13 ` [PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally Arnd Bergmann
2023-03-31 9:10 ` Linus Walleij
2023-03-31 12:48 ` Arnd Bergmann
2023-03-27 12:13 ` [PATCH 18/21] ARM: drop SMP support for ARM11MPCore Arnd Bergmann
2023-03-30 7:48 ` Neil Armstrong
2023-03-30 10:03 ` Arnd Bergmann
2023-03-30 16:40 ` Neil Armstrong
2023-03-30 8:12 ` Linus Walleij
2023-03-30 11:28 ` Joel Stanley
2023-03-31 12:54 ` Arnd Bergmann
2023-04-05 1:49 ` Joel Stanley
2023-03-30 11:51 ` Ard Biesheuvel
2023-03-31 17:09 ` Catalin Marinas
2023-03-27 12:13 ` [PATCH 19/21] ARM: dma-mapping: use generic form of arch_sync_dma_* helpers Arnd Bergmann
2023-03-27 12:13 ` [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper Arnd Bergmann
2023-03-27 12:48 ` Robin Murphy
2023-03-31 14:00 ` Arnd Bergmann
2023-03-31 15:12 ` Robin Murphy
2023-03-31 17:20 ` Arnd Bergmann
2023-03-27 15:01 ` Russell King (Oracle)
2023-03-31 14:06 ` Arnd Bergmann
2023-03-31 15:54 ` Russell King (Oracle)
2023-03-27 18:42 ` kernel test robot
2023-03-27 19:03 ` kernel test robot
2023-03-28 13:17 ` kernel test robot
2023-07-03 7:54 ` Geert Uytterhoeven
2023-07-06 14:11 ` Christoph Hellwig
2023-03-27 12:13 ` [PATCH 21/21] dma-mapping: replace custom code with generic implementation Arnd Bergmann
2023-03-27 22:25 ` Christoph Hellwig
2023-03-31 13:04 ` Arnd Bergmann
2023-03-30 14:06 ` Lad, Prabhakar
2023-04-13 12:13 ` Biju Das
2023-04-13 12:51 ` Arnd Bergmann
2023-06-27 16:52 ` Geert Uytterhoeven
2023-03-31 16:53 ` [PATCH 00/21] dma-mapping: unify support for cache flushes Catalin Marinas
2023-03-31 20:27 ` Arnd Bergmann
2023-05-25 7:46 ` Lad, Prabhakar