linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [Question] cacheline sharing problem on DMA for custom outer-cache
@ 2016-01-15  2:36 Masahiro Yamada
  2016-01-15  8:33 ` Arnd Bergmann
  0 siblings, 1 reply; 5+ messages in thread
From: Masahiro Yamada @ 2016-01-15  2:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi.


Documentation/DMA-API-HOWTO.txt states as follows.

--------------->8---------------------
Even if those classes of
memory could physically work with DMA, you'd need to ensure the I/O
buffers were cacheline-aligned.  Without that, you'd see cacheline
sharing problems (data corruption) on CPUs with DMA-incoherent caches.
(The CPU could write to one word, DMA would write to a different one
in the same cache line, and one of them could be overwritten.)
---------------8<-----------------------


The UniPhier SoC series (ARM SoC) uses
its own custom outer-cache, not the one from ARM Ltd.
(arch/arm/mm/cache-uniphier.c)

Its line-size is 128 byte.


As far as I tested on my board, the memory allocated by kmalloc()
is 64byte-aligned.
(the kernel image is configured with multi_v7_defconfig)


For example,
  a = kmalloc(1, GFP_KERNEL);
  b = kmalloc(1, GFP_KERNEL);
  printk("a=%p, b=%p\n", a, b);

shows console log as follows

  a=ee5db9c0, b=ee5dba00





When only L1-cache is enabled, it is OK.


If L2 is also enabled,
kmalloc() & dma_map_single() could be a cacheline sharing problem.


Is there any good solution?


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Question] cacheline sharing problem on DMA for custom outer-cache
  2016-01-15  2:36 [Question] cacheline sharing problem on DMA for custom outer-cache Masahiro Yamada
@ 2016-01-15  8:33 ` Arnd Bergmann
  2016-01-18 12:05   ` Masahiro Yamada
  0 siblings, 1 reply; 5+ messages in thread
From: Arnd Bergmann @ 2016-01-15  8:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 15 January 2016 11:36:30 Masahiro Yamada wrote:
> 
> When only L1-cache is enabled, it is OK.
> 
> 
> If L2 is also enabled,
> kmalloc() & dma_map_single() could be a cacheline sharing problem.
> 
> 
> Is there any good solution?

kmalloc uses ARCH_KMALLOC_MINALIGN alignment, so we need to tweak that
in one form or another.


The relevant definitions I see are

#define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
#define ARCH_DMA_MINALIGN       L1_CACHE_BYTES
#define L1_CACHE_SHIFT          CONFIG_ARM_L1_CACHE_SHIFT
#define L1_CACHE_BYTES          (1 << L1_CACHE_SHIFT)

I think you should check all other uses of L1_CACHE_SHIFT and L1_CACHE_BYTES.
If this is the only one that needs to be adjusted, we can change the
definition of ARCH_DMA_MINALIGN, otherwise we may have to add a platform
specific option to CONFIG_ARM_L1_CACHE_SHIFT.

I see a couple of suspicious uses of the L1 cache line size:

drivers/net/ethernet/broadcom/cnic.c:   data->rx.cache_line_alignment_log_size = L1_CACHE_SHIFT;
drivers/net/ethernet/qlogic/qede/qede.h:#define QEDE_RX_ALIGN_SHIFT             max(6, min(8, L1_CACHE_SHIFT))
lib/dma-debug.c:#define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
drivers/net/ethernet/sfc/tx.c:#define EFX_PIOBUF_SIZE_DEF ALIGN(256, L1_CACHE_BYTES)
drivers/net/wireless/ath/ath6kl/init.c:         skb_reserve(skb, reserved - L1_CACHE_BYTES);
include/linux/iio/iio.h:#define IIO_ALIGN L1_CACHE_BYTES
include/linux/mlx5/driver.h:    MLX5_DB_PER_PAGE = PAGE_SIZE / L1_CACHE_BYTES,

Those need closer inspection, and I'm sure there are a couple more. Maybe
they should use ARCH_DMA_MINALIGN instead of L1_CACHE_BYTES. There are also
lots of instances that assume L1_CACHE_BYTES is the L1 line size, not L2,
but they are typically only for performance optimization through prefetching,
so having it set too big will only make it slower rather than incorrect.

	Arnd

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Question] cacheline sharing problem on DMA for custom outer-cache
  2016-01-15  8:33 ` Arnd Bergmann
@ 2016-01-18 12:05   ` Masahiro Yamada
  2016-01-18 12:32     ` Russell King - ARM Linux
  0 siblings, 1 reply; 5+ messages in thread
From: Masahiro Yamada @ 2016-01-18 12:05 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arnd,


2016-01-15 17:33 GMT+09:00 Arnd Bergmann <arnd@arndb.de>:
> On Friday 15 January 2016 11:36:30 Masahiro Yamada wrote:
>>
>> When only L1-cache is enabled, it is OK.
>>
>>
>> If L2 is also enabled,
>> kmalloc() & dma_map_single() could be a cacheline sharing problem.
>>
>>
>> Is there any good solution?
>
> kmalloc uses ARCH_KMALLOC_MINALIGN alignment, so we need to tweak that
> in one form or another.
>
>
> The relevant definitions I see are
>
> #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
> #define ARCH_DMA_MINALIGN       L1_CACHE_BYTES
> #define L1_CACHE_SHIFT          CONFIG_ARM_L1_CACHE_SHIFT
> #define L1_CACHE_BYTES          (1 << L1_CACHE_SHIFT)

Thanks for this clue.

By increasing CONFIG_ARM_L1_CACHE_SHIFT by 1, now I can solve the issue locally,
but it would be better if there existed a solution that can be upstreamed.


> I think you should check all other uses of L1_CACHE_SHIFT and L1_CACHE_BYTES.
> If this is the only one that needs to be adjusted, we can change the
> definition of ARCH_DMA_MINALIGN, otherwise we may have to add a platform
> specific option to CONFIG_ARM_L1_CACHE_SHIFT.

L1_CACHE_BYTES is not a configuration.  It is a hardware property.

Actually, Tegra is the only hardware that has L1 cache with 64byte line-size.

The other SoCs in multi_v7_defconfig run software configured for
64byte line-size
on CPUs with 32byte line-size.  Weird.


And, deciding the DMA aligment only with L1 line-size does not seem nice.
I admit the outer-cache on my SoC is odd, though.



> I see a couple of suspicious uses of the L1 cache line size:
>
> drivers/net/ethernet/broadcom/cnic.c:   data->rx.cache_line_alignment_log_size = L1_CACHE_SHIFT;
> drivers/net/ethernet/qlogic/qede/qede.h:#define QEDE_RX_ALIGN_SHIFT             max(6, min(8, L1_CACHE_SHIFT))
> lib/dma-debug.c:#define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
> drivers/net/ethernet/sfc/tx.c:#define EFX_PIOBUF_SIZE_DEF ALIGN(256, L1_CACHE_BYTES)
> drivers/net/wireless/ath/ath6kl/init.c:         skb_reserve(skb, reserved - L1_CACHE_BYTES);
> include/linux/iio/iio.h:#define IIO_ALIGN L1_CACHE_BYTES
> include/linux/mlx5/driver.h:    MLX5_DB_PER_PAGE = PAGE_SIZE / L1_CACHE_BYTES,


Hmm, this is too advanced for me to check drivers I am unfamiliar with...


> Those need closer inspection, and I'm sure there are a couple more. Maybe
> they should use ARCH_DMA_MINALIGN instead of L1_CACHE_BYTES. There are also
> lots of instances that assume L1_CACHE_BYTES is the L1 line size, not L2,
> but they are typically only for performance optimization through prefetching,
> so having it set too big will only make it slower rather than incorrect.

My SoC is a member of multi_v7_defconfig.
I wonder if it is accepted to make other SoCs slower.


If we could parse "line-size" DT-property in the early stage
and change the DMA alignment run-time, it would avoid degrading
performance on other SoCs.



-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Question] cacheline sharing problem on DMA for custom outer-cache
  2016-01-18 12:05   ` Masahiro Yamada
@ 2016-01-18 12:32     ` Russell King - ARM Linux
  2016-01-19  8:25       ` Masahiro Yamada
  0 siblings, 1 reply; 5+ messages in thread
From: Russell King - ARM Linux @ 2016-01-18 12:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 18, 2016 at 09:05:15PM +0900, Masahiro Yamada wrote:
> L1_CACHE_BYTES is not a configuration.  It is a hardware property.
> 
> Actually, Tegra is the only hardware that has L1 cache with 64byte line-size.

We have a load of paltforms which select ARM_L1_CACHE_SHIFT_6, plus
Cortex-A8 and Cortex-A15 both have 64-byte L1 line size.

It's all in the git history, please research.

$ git log -p -- arch/arm/Kconfig arch/arm/mm/Kconfig

and search for "ARM_L1_CACHE_SHIFT_6".

> The other SoCs in multi_v7_defconfig run software configured for
> 64byte line-size on CPUs with 32byte line-size.  Weird.

Nothing weird here.  A kernel configured for a 64 byte cache line size
can be run on a system with a 32 byte cache line size, but not vice
versa.

> And, deciding the DMA aligment only with L1 line-size does not seem nice.
> I admit the outer-cache on my SoC is odd, though.

Currently, the L1 defines the biggest restriction on DMA cache line
alignment that we have.  However, you should be wary of changing
only the DMA cache alignment - I think you'll find a lot of things
assume that aligning to L1_CACHE_BYTES will be DMA safe.

> If we could parse "line-size" DT-property in the early stage
> and change the DMA alignment run-time, it would avoid degrading
> performance on other SoCs.

You can't realign data structures in the kernel image at runtime.

L1_CACHE_BYTES is used as a constant in several places for this
purpose.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Question] cacheline sharing problem on DMA for custom outer-cache
  2016-01-18 12:32     ` Russell King - ARM Linux
@ 2016-01-19  8:25       ` Masahiro Yamada
  0 siblings, 0 replies; 5+ messages in thread
From: Masahiro Yamada @ 2016-01-19  8:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,


2016-01-18 21:32 GMT+09:00 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> On Mon, Jan 18, 2016 at 09:05:15PM +0900, Masahiro Yamada wrote:
>> L1_CACHE_BYTES is not a configuration.  It is a hardware property.
>>
>> Actually, Tegra is the only hardware that has L1 cache with 64byte line-size.
>
> We have a load of paltforms which select ARM_L1_CACHE_SHIFT_6, plus
> Cortex-A8 and Cortex-A15 both have 64-byte L1 line size.
>
> It's all in the git history, please research.
>
> $ git log -p -- arch/arm/Kconfig arch/arm/mm/Kconfig
>
> and search for "ARM_L1_CACHE_SHIFT_6".

Ah, I missed that.
Thanks for clarifying this.



>> The other SoCs in multi_v7_defconfig run software configured for
>> 64byte line-size on CPUs with 32byte line-size.  Weird.
>
> Nothing weird here.  A kernel configured for a 64 byte cache line size
> can be run on a system with a 32 byte cache line size, but not vice
> versa.
>
>> And, deciding the DMA aligment only with L1 line-size does not seem nice.
>> I admit the outer-cache on my SoC is odd, though.
>
> Currently, the L1 defines the biggest restriction on DMA cache line
> alignment that we have.  However, you should be wary of changing
> only the DMA cache alignment - I think you'll find a lot of things
> assume that aligning to L1_CACHE_BYTES will be DMA safe.
>
>> If we could parse "line-size" DT-property in the early stage
>> and change the DMA alignment run-time, it would avoid degrading
>> performance on other SoCs.
>
> You can't realign data structures in the kernel image at runtime.
>
> L1_CACHE_BYTES is used as a constant in several places for this
> purpose.

OK, I understand that it is difficult to change the definition of this macro.



If I submitted something like follows, would it be accepted?


 config ARM_L1_CACHE_SHIFT
         int
+        default 7 if CACHE_UNIPHIER
         default 6 if ARM_L1_CACHE_SHIFT_6
         default 5




-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-01-19  8:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-15  2:36 [Question] cacheline sharing problem on DMA for custom outer-cache Masahiro Yamada
2016-01-15  8:33 ` Arnd Bergmann
2016-01-18 12:05   ` Masahiro Yamada
2016-01-18 12:32     ` Russell King - ARM Linux
2016-01-19  8:25       ` Masahiro Yamada

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).