From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>,
Arnd Bergmann <arnd@arndb.de>,
Yassine Oudjana <y.oudjana@protonmail.com>,
Marc Zyngier <maz@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Ard Biesheuvel <ardb@kernel.org>,
Android Kernel Team <kernel-team@android.com>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Mark Rutland <mark.rutland@arm.com>,
Vincent Whitchurch <vincent.whitchurch@axis.com>,
linux-arm-msm <linux-arm-msm@vger.kernel.org>,
Bjorn Andersson <bjorn.andersson@linaro.org>
Subject: Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
Date: Fri, 9 Jul 2021 18:10:53 +0100
Message-ID: <20210709171051.GB29765@arm.com>
In-Reply-To: <20210709084842.GA24432@willie-the-truck>
On Fri, Jul 09, 2021 at 09:48:42AM +0100, Will Deacon wrote:
> On Thu, Jul 08, 2021 at 02:59:28PM -0600, Jeffrey Hugo wrote:
> > On Wed, Jul 7, 2021 at 8:41 AM Jeffrey Hugo <jeffrey.l.hugo@gmail.com> wrote:
> > > L0 I 64 byte cacheline
> > > L1 I 64
> > > L1 D 64
> > > L2 unified 128 (shared between the CPUs of a duplex)
> > >
> > > I believe L2 is within the POC, but I'm trying to dig up the old
> > > documentation to confirm.
> >
> > Was able to track down a friendly hardware designer. The POC lies
> > between L2 and L3. Hope this helps.
>
> Damn, yes, it's bad news but thanks for chasing it up. I'll revert the patch
> at -rc1 and add a comment about MSM8996.
It's a shame but we can't do much for this platform.
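
To make the problem concrete: with the PoC beyond a 128-byte L2 line, two
64-byte-aligned kmalloc() objects can sit in the same line, so cache
maintenance for a DMA buffer can touch its neighbour. Below is a minimal
userspace sketch of that overlap check. The helper names (line_of,
buffers_share_line) and the addresses are hypothetical, chosen only for
illustration; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the granule-sized cache line containing addr. */
static uintptr_t line_of(uintptr_t addr, size_t granule)
{
	/* The granule must be a non-zero power of two. */
	assert(granule && (granule & (granule - 1)) == 0);
	return addr / granule;
}

/* True if buffers [a, a + a_len) and [b, b + b_len) touch a common line. */
static bool buffers_share_line(uintptr_t a, size_t a_len,
			       uintptr_t b, size_t b_len, size_t granule)
{
	uintptr_t a_first, a_last, b_first, b_last;

	assert(a_len > 0 && b_len > 0);

	a_first = line_of(a, granule);
	a_last = line_of(a + a_len - 1, granule);
	b_first = line_of(b, granule);
	b_last = line_of(b + b_len - 1, granule);

	/* Two closed intervals of line indices overlap. */
	return a_first <= b_last && b_first <= a_last;
}
```

With a 64-byte granule, 64-byte buffers at 0x1000 and 0x1040 land in
separate lines; with a 128-byte granule they share one, which is the
MSM8996 situation described above.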
Longer term, we should look at making kmalloc() cache selection more
dynamic. Probably still starting with a 128 byte minimum size but, after
initialising all the devices during boot, if we can't find any
non-coherent ones, just relax the kmalloc() allocations. We still have the
issue with platform devices with DT assumed to be non-coherent and any
late call (after boot) to arch_setup_dma_ops().

Some bodge below to get an idea, not a final patch (not even the
beginning of one). It initialises the kmalloc caches to size 8 but
limits the allocation size to a kmalloc_dyn_min_size, initially set to
128 on arm64. In a device_initcall_sync(), if we didn't find any
non-coherent device, we lower this to KMALLOC_MIN_SIZE (8 with slub).
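
As a rough userspace model of the behaviour described above (not the patch
itself), the sketch below clamps request sizes to a dynamic minimum and
relaxes that minimum once no non-coherent device has been seen. Every
model_* name is hypothetical, and the model rounds to powers of two only,
ignoring the 96 and 192 byte caches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for KMALLOC_MIN_SIZE (8 with slub). */
#define MODEL_KMALLOC_MIN_SIZE 8

/* Conservative default, mirroring the 128 used on arm64 in the text. */
static size_t model_dyn_min_size = 128;
static bool model_non_coherent_seen;

/* Models the bookkeeping done when a device's DMA ops are set up. */
static void model_setup_dma_ops(bool coherent)
{
	if (!coherent)
		model_non_coherent_seen = true;
}

/* Models the late initcall: relax only if every device was coherent. */
static void model_adjust_min_size(void)
{
	if (!model_non_coherent_seen)
		model_dyn_min_size = MODEL_KMALLOC_MIN_SIZE;
}

/* Size of the power-of-two cache that would serve a request of 'size'. */
static size_t model_bucket_size(size_t size)
{
	size_t bucket = MODEL_KMALLOC_MIN_SIZE;

	assert(size > 0);

	/* Small requests are bumped up to the current dynamic minimum. */
	if (size < model_dyn_min_size)
		size = model_dyn_min_size;

	while (bucket < size)
		bucket <<= 1;

	return bucket;
}
```

Before the adjustment a 16-byte request is served from the 128-byte
bucket; once all devices turn out to be coherent and the adjustment has
run, the same request drops to the 16-byte bucket. A non-coherent device
registered after the adjustment is not handled, which is the late
arch_setup_dma_ops() issue mentioned above.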
----------------8<----------------------------
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a074459f8f2f..bed65db3c42e 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -40,15 +40,6 @@
#define CLIDR_LOC(clidr) (((clidr) >> CLIDR_LOC_SHIFT) & 0x7)
#define CLIDR_LOUIS(clidr) (((clidr) >> CLIDR_LOUIS_SHIFT) & 0x7)
-/*
- * Memory returned by kmalloc() may be used for DMA, so we must make
- * sure that all such allocations are cache aligned. Otherwise,
- * unrelated code may cause parts of the buffer to be read into the
- * cache before the transfer is done, causing old data to be seen by
- * the CPU.
- */
-#define ARCH_DMA_MINALIGN (128)
-
#ifdef CONFIG_KASAN_SW_TAGS
#define ARCH_SLAB_MINALIGN (1ULL << KASAN_SHADOW_SCALE_SHIFT)
#elif defined(CONFIG_KASAN_HW_TAGS)
@@ -59,6 +50,9 @@
#include <linux/bitops.h>
+extern int kmalloc_dyn_min_size;
+#define __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+
#define ICACHEF_ALIASING 0
#define ICACHEF_VPIPT 1
extern unsigned long __icache_flags;
@@ -88,7 +82,7 @@ static inline int cache_line_size_of_cpu(void)
{
u32 cwg = cache_type_cwg();
- return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
+ return cwg ? 4 << cwg : __alignof__(unsigned long long);
}
int cache_line_size(void);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index efed2830d141..a25813377187 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2808,8 +2808,8 @@ void __init setup_cpu_features(void)
*/
cwg = cache_type_cwg();
if (!cwg)
- pr_warn("No Cache Writeback Granule information, assuming %d\n",
- ARCH_DMA_MINALIGN);
+ pr_warn("No Cache Writeback Granule information, assuming %ld\n",
+ __alignof__(unsigned long long));
}
static void __maybe_unused cpu_enable_cnp(struct arm64_cpu_capabilities const *cap)
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4bf1dd3eb041..9a30d1beb3ea 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -13,6 +13,18 @@
#include <asm/cacheflush.h>
+/*
+ * Memory returned by kmalloc() may be used for DMA, so we must make
+ * sure that all such allocations are cache aligned. Otherwise,
+ * unrelated code may cause parts of the buffer to be read into the
+ * cache before the transfer is done, causing old data to be seen by
+ * the CPU.
+ */
+int kmalloc_dyn_min_size = 128;
+EXPORT_SYMBOL(kmalloc_dyn_min_size);
+
+static bool non_coherent_devices;
+
void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
enum dma_data_direction dir)
{
@@ -42,11 +54,14 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
{
int cls = cache_line_size_of_cpu();
- WARN_TAINT(!coherent && cls > ARCH_DMA_MINALIGN,
+ WARN_TAINT(!coherent && cls > kmalloc_dyn_min_size,
TAINT_CPU_OUT_OF_SPEC,
- "%s %s: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
+ "%s %s: kmalloc() minimum size smaller than CTR_EL0.CWG (%d < %d)",
dev_driver_string(dev), dev_name(dev),
- ARCH_DMA_MINALIGN, cls);
+ kmalloc_dyn_min_size, cls);
+
+ if (!coherent)
+ non_coherent_devices = true;
dev->dma_coherent = coherent;
if (iommu)
@@ -57,3 +72,12 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
dev->dma_ops = &xen_swiotlb_dma_ops;
#endif
}
+
+static int __init adjust_kmalloc_dyn_min_size(void)
+{
+ if (!non_coherent_devices)
+ kmalloc_dyn_min_size = KMALLOC_MIN_SIZE;
+
+ return 0;
+}
+device_initcall_sync(adjust_kmalloc_dyn_min_size);
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0c97d788762c..e40c7899cb07 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -349,15 +349,21 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
*/
static __always_inline unsigned int kmalloc_index(size_t size)
{
+ int min_size = KMALLOC_MIN_SIZE;
+
if (!size)
return 0;
- if (size <= KMALLOC_MIN_SIZE)
- return KMALLOC_SHIFT_LOW;
+#ifdef __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+ min_size = kmalloc_dyn_min_size;
+#endif
+
+ if (size <= min_size)
+ return ilog2(min_size);
- if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
+ if (min_size <= 32 && size > 64 && size <= 96)
return 1;
- if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
+ if (min_size <= 64 && size > 128 && size <= 192)
return 2;
if (size <= 8) return 3;
if (size <= 16) return 4;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7cab77655f11..2666237c84c4 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -725,6 +725,10 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
if (!size)
return ZERO_SIZE_PTR;
+#ifdef __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+ if (size < kmalloc_dyn_min_size)
+ size = kmalloc_dyn_min_size;
+#endif
index = size_index[size_index_elem(size)];
} else {
if (WARN_ON_ONCE(size > KMALLOC_MAX_CACHE_SIZE))
--
Catalin