linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
@ 2021-09-16 14:52 Christophe Leroy
  2021-11-02 10:11 ` Michael Ellerman
  2024-08-21 22:39 ` Christian Lamparter
  0 siblings, 2 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-09-16 14:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: Stan Johnson, Finn Thain, linuxppc-dev, linux-kernel

dcbz instruction shouldn't be used on non-cached memory. Using
it on non-cached memory can result in alignment exception and
implies a heavy handling.

Instead of silentely emulating the instruction and resulting in high
performance degradation, warn whenever an alignment exception is
taken in kernel mode due to dcbz, so that the user is made aware that
dcbz instruction has been used unexpectedly by the kernel.

Reported-by: Stan Johnson <userm57@yahoo.com>
Cc: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Warn only when emulating kernel mode
---
 arch/powerpc/kernel/align.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index bbb4181621dd..bf96b954a4eb 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (op.type != CACHEOP + DCBZ)
 			return -EINVAL;
 		PPC_WARN_ALIGNMENT(dcbz, regs);
+		WARN_ON_ONCE(!user_mode(regs));
 		r = emulate_dcbz(op.ea, regs);
 	} else {
 		if (type == LARX || type == STCX)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2021-09-16 14:52 [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode Christophe Leroy
@ 2021-11-02 10:11 ` Michael Ellerman
  2024-08-21 22:39 ` Christian Lamparter
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2021-11-02 10:11 UTC (permalink / raw)
  To: Michael Ellerman, Paul Mackerras, Christophe Leroy,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, Stan Johnson, Finn Thain, linux-kernel

On Thu, 16 Sep 2021 16:52:09 +0200, Christophe Leroy wrote:
> dcbz instruction shouldn't be used on non-cached memory. Using
> it on non-cached memory can result in alignment exception and
> implies a heavy handling.
> 
> Instead of silentely emulating the instruction and resulting in high
> performance degradation, warn whenever an alignment exception is
> taken in kernel mode due to dcbz, so that the user is made aware that
> dcbz instruction has been used unexpectedly by the kernel.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: warn on emulation of dcbz instruction in kernel mode
      https://git.kernel.org/powerpc/c/cbe654c779616807e1e6823c3bdbfe07a10562b8

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2021-09-16 14:52 [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode Christophe Leroy
  2021-11-02 10:11 ` Michael Ellerman
@ 2024-08-21 22:39 ` Christian Lamparter
  2024-08-22  5:25   ` LEROY Christophe
  1 sibling, 1 reply; 17+ messages in thread
From: Christian Lamparter @ 2024-08-21 22:39 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, Stan Johnson, Finn Thain

Sorry to write a reply to this old mail. But after years, I finally
decided to tackle an "old" problem that has come up...
And unfortunately it is related to this patch. But let me explain.

On 9/16/21 4:52 PM, Christophe Leroy wrote:
> dcbz instruction shouldn't be used on non-cached memory. Using
> it on non-cached memory can result in alignment exception and
> implies a heavy handling.
>
> Instead of silentely emulating the instruction and resulting in high
> performance degradation, warn whenever an alignment exception is
> taken in kernel mode due to dcbz, so that the user is made aware that
> dcbz instruction has been used unexpectedly by the kernel.



The devices that are affected are APM821xx: MyBook Live, Meraki MX60, Netgear WNDR4700, ...
All these have this "splat" during boot:

|[ 8.777686] PPC 4xx OCP EMAC driver, version 3.54 [ 8.782961] ------------[ cut here ]------------ [ 8.787565] WARNING: CPU: 0 PID: 1 at fix_alignment+0x144/0x16c [ 8.793474] Modules linked in: [ 8.796527] CPU: 0 PID: 1 Comm: swapper Not tainted 6.6.47 #0 [ 8.802253] Hardware name: Meraki MX60/MX60W Security Appliance APM821XX 0x12c41c83 PowerPC 44x Platform [ 8.811688] NIP: c0003158 LR: c0003070 CTR: c001471c [ 8.816725] REGS: c10319d0 TRAP: 0700 Not tainted (6.6.47) [ 8.822453] MSR: 00029000 <CE,EE,ME> CR: 24000442 XER: 20000000 [ 8.828630] [ 8.828630] GPR00: c000855c c1031ac0 c1051c00 00000000 3e039bf6 00000018 01f01cdf 00000004 [ 8.828630] GPR08: 0000000f 00000000 00000510 c1031b30 c0013a94 00000000 c0002518 00000000 [ 8.828630] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 8.828630] GPR24: 00000000 dfb953ac c00790c8 c104e120 c10f5610 00001000 
c0d30000 c1031b40 [ 8.863432] NIP [c0003158] fix_alignment+0x144/0x16c [ 8.868384] LR [c0003070] fix_alignment+0x5c/0x16c [ 8.873162] Call Trace: [ 8.875598] [c1031ac0] [c0186efc] get_page_from_freelist+0x208/0xdac (unreliable) [ 8.883063] [c1031b10] [c000855c] alignment_exception+0xf0/0x164 [ 8.889058] [c1031b30] [c0000a30] Alignment+0x130/0x180 [ 8.894269] --- interrupt: 600 at memset+0x60/0xc0 [ 8.899063] NIP: c0013a94 LR: c007a1ac CTR: 0000007f [ 8.904092] REGS: c1031b40 TRAP: 0600 Not tainted (6.6.47) [ 8.909812] MSR: 00029000 <CE,EE,ME> CR: 44000442 XER: 20000000 [ 8.915989] DEAR: e1168020 ESR: 00000000 [ 8.915989] GPR00: 00000007 c1031c30 c1051c00 e1168000 00000000 00001000 e116801c 00000004 [ 8.915989] GPR08: 00001000 0000007f 00000a3a 00000000 24000848 00000000 c0002518 00000000 [ 8.915989] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 
8.915989] GPR24: 00000000 dfb953ac c00790c8 c104e120 c10f5610 00001000 c0d30000 e1168000 [ 8.953306] NIP [c0013a94] memset+0x60/0xc0 [ 8.957478] LR [c007a1ac] dma_direct_alloc+0x1e4/0x290 [ 8.962620] --- interrupt: 600 [ 8.965661] [c1031c30] [c007a130] dma_direct_alloc+0x168/0x290 (unreliable) [ 8.972608] [c1031c70] [c00790c8] dma_alloc_attrs+0xa8/0xf8 [ 8.978163] [c1031cb0] [c04d419c] mal_probe+0x26c/0x664 [ 8.983390] [c1031ce0] [c04326d8] platform_probe+0x60/0xb0 [ 8.988867] [c1031d00] [c042f6f0] really_probe+0xd8/0x3b4 [ 8.994251] [c1031d30] [c042fbdc] driver_probe_device+0x48/0x104 [ 9.000238] [c1031d60] [c042fe7c] __driver_attach+0xb4/0x1a4 [ 9.005880] [c1031d80] [c042cdc8] bus_for_each_dev+0x7c/0xc8 [ 9.011548] [c1031db0] [c042e708] bus_add_driver+0x170/0x25c [ 9.017199] [c1031de0] [c0430cec] driver_register+0x8c/0x164 [ 9.022840] [c1031e00] [c0902de0] emac_init+0x17c/0x1ec [ 
9.028067] [c1031e40] [c0002680] do_one_initcall+0x50/0x2ac [ 9.033709] [c1031eb0] [c08e60f8] kernel_init_freeable+0x1b0/0x230 [ 9.039870] [c1031ee0] [c0002538] kernel_init+0x20/0x118 [ 9.045166] [c1031f00] [c000d1f8] ret_from_kernel_user_thread+0x10/0x18 [ 9.051758] --- interrupt: 0 at 0x0 [ 9.055241] Code: 39400000 7d500ba6 4c00012c 2c090000 41a2fefc 83e1004c 3860fff2 38210050 4e800020 38a00000 3920fff2 4bffffd4 <0fe00000> 4bffff8c 80010054 7c0803a6 [ 9.069929] ---[ end trace 0000000000000000 ]---|

what happens is that the EMAC (ethernet driver) uses a dma_alloc_coherent here:
<https://elixir.bootlin.com/linux/v6.6.47/source/drivers/net/ethernet/ibm/emac/mal.c#L636>

| bd_size <https://elixir.bootlin.com/linux/v6.6.47/C/ident/bd_size>=sizeof(structmal_descriptor <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal_descriptor>)*
| (NUM_TX_BUFF <https://elixir.bootlin.com/linux/v6.6.47/C/ident/NUM_TX_BUFF>*mal <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal>->num_tx_chans <https://elixir.bootlin.com/linux/v6.6.47/C/ident/num_tx_chans>+
| NUM_RX_BUFF <https://elixir.bootlin.com/linux/v6.6.47/C/ident/NUM_RX_BUFF>*mal <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal>->num_rx_chans <https://elixir.bootlin.com/linux/v6.6.47/C/ident/num_rx_chans>);
| mal <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal>->bd_virt <https://elixir.bootlin.com/linux/v6.6.47/C/ident/bd_virt>=dma_alloc_coherent <https://elixir.bootlin.com/linux/v6.6.47/C/ident/dma_alloc_coherent>(&ofdev <https://elixir.bootlin.com/linux/v6.6.47/C/ident/ofdev>->dev,bd_size <https://elixir.bootlin.com/linux/v6.6.47/C/ident/bd_size>,&mal <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal>->bd_dma <https://elixir.bootlin.com/linux/v6.6.47/C/ident/bd_dma>,
| GFP_KERNEL <https://elixir.bootlin.com/linux/v6.6.47/C/ident/GFP_KERNEL>);
|

(bd_size is 2x1024 = 2048)
and this results in a call to dma_direct_allocation(), which has one
innocent looking memset():
<https://elixir.bootlin.com/linux/v6.6.47/source/kernel/dma/direct.c#L301>
which zeros out the area before returning it back...

And of course, this triggers the WARNING above. As for why memset uses
that dcbz... this is explained in:
<https://elixir.bootlin.com/linux/v6.6.47/source/arch/powerpc/lib/copy_32.S#L86>

| * Use dcbz on the complete cache lines in the destination
| * to set them to zero. This requires that the destination
| * area is cacheable. -- paulus
| *
| * During early init, cache might not be active yet, so dcbz cannot be used.
| * We therefore skip the optimised bloc that uses dcbz. This jump is
| * replaced by a nop once cache is active. This is done in machine_init()

could this WARN_ON_ONCE maybe be downgraded to a single pr_info_once?
I don't see how to tackle this "neatly" in a better way.

BugLink: <https://github.com/openwrt/openwrt/pull/14037#issuecomment-2302485414>

Best Regards,
Christian

>
> Reported-by: Stan Johnson <userm57@yahoo.com>
> Cc: Finn Thain <fthain@linux-m68k.org>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> v2: Warn only when emulating kernel mode
> ---
>   arch/powerpc/kernel/align.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
> index bbb4181621dd..bf96b954a4eb 100644
> --- a/arch/powerpc/kernel/align.c
> +++ b/arch/powerpc/kernel/align.c
> @@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs)
>   		if (op.type != CACHEOP + DCBZ)
>   			return -EINVAL;
>   		PPC_WARN_ALIGNMENT(dcbz, regs);
> +		WARN_ON_ONCE(!user_mode(regs));
>   		r = emulate_dcbz(op.ea, regs);
>   	} else {
>   		if (type == LARX || type == STCX)



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-21 22:39 ` Christian Lamparter
@ 2024-08-22  5:25   ` LEROY Christophe
  2024-08-22  5:32     ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: LEROY Christophe @ 2024-08-22  5:25 UTC (permalink / raw)
  To: Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Stan Johnson, Finn Thain, Christoph Hellwig

+ CC Christoph Hellwig

Hi,

Le 22/08/2024 à 00:39, Christian Lamparter a écrit :
> [Vous ne recevez pas souvent de courriers de
> christian.lamparter@isd.uni-stuttgart.de. Découvrez pourquoi ceci est
> important à https://aka.ms/LearnAboutSenderIdentification ]
>
> Sorry to write a reply to this old mail. But after years, I finally
> decided to tackle an "old" problem that has come up...
> And unfortunately it is related to this patch. But let me explain.
>
> On 9/16/21 4:52 PM, Christophe Leroy wrote:
>> dcbz instruction shouldn't be used on non-cached memory. Using
>> it on non-cached memory can result in alignment exception and
>> implies a heavy handling.
>>
>> Instead of silentely emulating the instruction and resulting in high
>> performance degradation, warn whenever an alignment exception is
>> taken in kernel mode due to dcbz, so that the user is made aware that
>> dcbz instruction has been used unexpectedly by the kernel.
>
>
>
> The devices that are affected are APM821xx: MyBook Live, Meraki MX60,
> Netgear WNDR4700, ...
> All these have this "splat" during boot:
>

...

>
> what happens is that the EMAC (ethernet driver) uses a
> dma_alloc_coherent here:


...

> and this results in a call to dma_direct_allocation(), which has one
> innocent looking memset():


memset() can't be used on non-cached memory, memset_io() has to be used
instead.

...

> And of course, this triggers the WARNING above. As for why memset uses
> that dcbz... this is explained in:
...

>
> could this WARN_ON_ONCE maybe be downgraded to a single pr_info_once?
> I don't see how to tackle this "neatly" in a better way.

Can you try with the change below ?

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4480a3cd92e0..e2bee5fcc172 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -283,13 +283,15 @@ void *dma_direct_alloc(struct device *dev, size_t
size,
                                __builtin_return_address(0));
                if (!ret)
                        goto out_free_pages;
+
+               memset_io(ret, 0, size);
        } else {
                ret = page_address(page);
                if (dma_set_decrypted(dev, ret, size))
                        goto out_leak_pages;
-       }

-       memset(ret, 0, size);
+               memset(ret, 0, size);
+       }

        if (set_uncached) {
                arch_dma_prep_coherent(page, size);



>
> BugLink:
> <https://github.com/openwrt/openwrt/pull/14037#issuecomment-2302485414>
>

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-22  5:25   ` LEROY Christophe
@ 2024-08-22  5:32     ` Christoph Hellwig
  2024-08-22  6:39       ` LEROY Christophe
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-22  5:32 UTC (permalink / raw)
  To: LEROY Christophe
  Cc: Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain,
	Christoph Hellwig

On Thu, Aug 22, 2024 at 05:25:10AM +0000, LEROY Christophe wrote:
> > and this results in a call to dma_direct_allocation(), which has one
> > innocent looking memset():
> 
> 
> memset() can't be used on non-cached memory, memset_io() has to be used
> instead.

No, we use memset on uncached memory all the time.  Note that uncached
memory != __iomem memory, for which you DO have to use memset_io.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-22  5:32     ` Christoph Hellwig
@ 2024-08-22  6:39       ` LEROY Christophe
  2024-08-22  7:14         ` Christoph Hellwig
  2024-08-23 13:06         ` Segher Boessenkool
  0 siblings, 2 replies; 17+ messages in thread
From: LEROY Christophe @ 2024-08-22  6:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain



Le 22/08/2024 à 07:32, Christoph Hellwig a écrit :
> On Thu, Aug 22, 2024 at 05:25:10AM +0000, LEROY Christophe wrote:
>>> and this results in a call to dma_direct_allocation(), which has one
>>> innocent looking memset():
>>
>>
>> memset() can't be used on non-cached memory, memset_io() has to be used
>> instead.
> 
> No, we use memset on uncached memory all the time.  Note that uncached
> memory != __iomem memory, for which you DO have to use memset_io.
> 

Then we have a subject here.

powerpc has a magic instruction 'dcbz' which clears a full cacheline in 
one go. It is far more efficient than a loop to store zeros, and since 
2015 memset(0) has been implemented with that instruction (commit 
5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero"))

But that instruction generates an alignment exception when used on 
non-cached memory (whether it is RAM or not doesn't matter). It is then 
emulated by the kernel but it of course leads to a serious performance 
degradation, hence the warning added by commit cbe654c77961 ("powerpc: 
warn on emulation of dcbz instruction in kernel mode"). Until now it 
helped identify and fix use of memset() on IO memory.

But if memset() is expected to be used with non-cached RAM, then I don't 
know what to do. Any suggestion ?

Christophe

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-22  6:39       ` LEROY Christophe
@ 2024-08-22  7:14         ` Christoph Hellwig
  2024-08-22  9:53           ` Benjamin Herrenschmidt
  2024-08-22 18:19           ` Christian Lamparter
  2024-08-23 13:06         ` Segher Boessenkool
  1 sibling, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-22  7:14 UTC (permalink / raw)
  To: LEROY Christophe
  Cc: Christoph Hellwig, Christian Lamparter, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain

On Thu, Aug 22, 2024 at 06:39:33AM +0000, LEROY Christophe wrote:
> powerpc has a magic instruction 'dcbz' which clears a full cacheline in 
> one go. It is far more efficient than a loop to store zeros, and since 
> 2015 memset(0) has been implemented with that instruction (commit 
> 5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero"))
> 
> But that instruction generates an alignment exception when used on 
> non-cached memory (whether it is RAM or not doesn't matter). It is then 
> emulated by the kernel but it of course leads to a serious performance 
> degradation, hence the warning added by commit cbe654c77961 ("powerpc: 
> warn on emulation of dcbz instruction in kernel mode"). Until now it 
> helped identify and fix use of memset() on IO memory.
> 
> But if memset() is expected to be used with non-cached RAM, then I don't 
> know what to do. Any suggestion ?

I'd suggest two things:

 1) remove the warning.  The use case is perfectly valid and everything
    using uncached memory is already slow, so people will just have to
    deal with it.  Maybe offer a trace point instead if people care about
    it.
 2) figure out a way to avoid this case in the dma-coherent allocator,
    which is probably the only case where it happens frequently
    (a few drivers also zero or re-zero coherent memory, but most of the
     time that is cargo cult programming and not actually needed)

For 2 I can think of two options:

 a) provide a arch hook for zeroing the dma memory that defaults to
    memset, but which powerpc can override
 a) figure out a way to clear the memory before marking it uncached
    if we can

a) it obviously easier to verify, but b) is probably going to give
   way better performance.

Below is an untested implementation of b) for dma-direct, I just need to
find out if there is any architecture that requires the memory to be
zeroed after it іt has been remapped.  The iommu drivers might also
need similar treatment.

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4480a3cd92e087..66e94b32ab0081 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -275,6 +275,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 		if (force_dma_unencrypted(dev))
 			prot = pgprot_decrypted(prot);
 
+		if (!PageHighMem(page))
+			memset(page_address(page), 0, size);
+
 		/* remove any dirty cache lines on the kernel alias */
 		arch_dma_prep_coherent(page, size);
 
@@ -283,14 +286,15 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 				__builtin_return_address(0));
 		if (!ret)
 			goto out_free_pages;
+		if (PageHighMem(page))
+			memset(ret, 0, size);
 	} else {
 		ret = page_address(page);
 		if (dma_set_decrypted(dev, ret, size))
 			goto out_leak_pages;
+		memset(ret, 0, size);
 	}
 
-	memset(ret, 0, size);
-
 	if (set_uncached) {
 		arch_dma_prep_coherent(page, size);
 		ret = arch_dma_set_uncached(ret, size);


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-22  7:14         ` Christoph Hellwig
@ 2024-08-22  9:53           ` Benjamin Herrenschmidt
  2024-08-22 18:19           ` Christian Lamparter
  1 sibling, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2024-08-22  9:53 UTC (permalink / raw)
  To: Christoph Hellwig, LEROY Christophe
  Cc: Christian Lamparter, Paul Mackerras, Michael Ellerman,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Stan Johnson, Finn Thain

On Thu, 2024-08-22 at 09:14 +0200, Christoph Hellwig wrote:
> 
> I'd suggest two things:
> 
>  1) remove the warning.  The use case is perfectly valid and everything
>     using uncached memory is already slow, so people will just have to
>     deal with it.  Maybe offer a trace point instead if people care about
>     it.

Well... there's slow and extremely slow .. :-) dcbz() on uncached
memory will take an exception for every "cache line" of zeros which
will then need to be emulated. We are talking about hundreds to
thousands of cycle per cache line. 

(Do we have an optimisation to detect memset in the emulation code and
force it to return to a non-dcbz'ing version ? If not, that's doable
and would at least limit it to one exception per memset() instead of
one per cache-line). 

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-22  7:14         ` Christoph Hellwig
  2024-08-22  9:53           ` Benjamin Herrenschmidt
@ 2024-08-22 18:19           ` Christian Lamparter
  2024-08-23  8:07             ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Christian Lamparter @ 2024-08-22 18:19 UTC (permalink / raw)
  To: Christoph Hellwig, LEROY Christophe
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Stan Johnson, Finn Thain

Hi!

Thank you all for your insightful insights :-) .
... and derp, used the wrong mail.

On 8/22/24 9:14 AM, Christoph Hellwig wrote:
> On Thu, Aug 22, 2024 at 06:39:33AM +0000, LEROY Christophe wrote:
>> powerpc has a magic instruction 'dcbz' which clears a full cacheline in
>> one go. It is far more efficient than a loop to store zeros, and since
>> 2015 memset(0) has been implemented with that instruction (commit
>> 5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero"))
>>
>> But that instruction generates an alignment exception when used on
>> non-cached memory (whether it is RAM or not doesn't matter). It is then
>> emulated by the kernel but it of course leads to a serious performance
>> degradation, hence the warning added by commit cbe654c77961 ("powerpc:
>> warn on emulation of dcbz instruction in kernel mode"). Until now it
>> helped identify and fix use of memset() on IO memory.
>>
>> But if memset() is expected to be used with non-cached RAM, then I don't
>> know what to do. Any suggestion ?
> 
> I'd suggest two things:
> 
>   1) remove the warning.  The use case is perfectly valid and everything
>      using uncached memory is already slow, so people will just have to
>      deal with it.  Maybe offer a trace point instead if people care about
>      it.
>   2) figure out a way to avoid this case in the dma-coherent allocator,
>      which is probably the only case where it happens frequently
>      (a few drivers also zero or re-zero coherent memory, but most of the
>       time that is cargo cult programming and not actually needed)

I tested your patch below and got the next warning. This time from
dma_alloc_from_pool about a similar memset in:
https://elixir.bootlin.com/linux/v6.6.47/source/kernel/dma/pool.c#L261

it triggers for the sata_dwc_ex460 driver when it access the harddrive.
For anyone interested, please look below for the full warning splat.

(I switched to the MyBook Live, it's easier to disassemble+reassemble
than the MX60... but with similar results).

> For 2 I can think of two options:
> 
>   a) provide a arch hook for zeroing the dma memory that defaults to
>      memset, but which powerpc can override
>   a) figure out a way to clear the memory before marking it uncached
>      if we can
> 
> a) it obviously easier to verify, but b) is probably going to give
>     way better performance.
> 
> Below is an untested implementation of b) for dma-direct, I just need to
> find out if there is any architecture that requires the memory to be
> zeroed after it іt has been remapped.  The iommu drivers might also
> need similar treatment.
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4480a3cd92e087..66e94b32ab0081 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -275,6 +275,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   		if (force_dma_unencrypted(dev))
>   			prot = pgprot_decrypted(prot);
>   
> +		if (!PageHighMem(page))
> +			memset(page_address(page), 0, size);
> +
>   		/* remove any dirty cache lines on the kernel alias */
>   		arch_dma_prep_coherent(page, size);
>   
> @@ -283,14 +286,15 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   				__builtin_return_address(0));
>   		if (!ret)
>   			goto out_free_pages;
> +		if (PageHighMem(page))
> +			memset(ret, 0, size);
>   	} else {
>   		ret = page_address(page);
>   		if (dma_set_decrypted(dev, ret, size))
>   			goto out_leak_pages;
> +		memset(ret, 0, size);
>   	}
>   
> -	memset(ret, 0, size);
> -
>   	if (set_uncached) {
>   		arch_dma_prep_coherent(page, size);
>   		ret = arch_dma_set_uncached(ret, size);

I added a counter in fix_alignment to count the number of traps
issued by the kernel code. From what I measured in a quick test,
With this patch, the count during boot went down from ~21000
(unmodified) to 1200 (patched)... but from what I can tell, it
still triggers for the dma_direct_alloc in emac.

if there are more patches, please. I can test them during the next
few days while I'm on holiday.

Cheers,
Christian

---

[    2.596136] ------------[ cut here ]------------
[    2.600759] WARNING: CPU: 0 PID: 24 at fix_alignment+0x17c/0x1b4
[    2.606760] Modules linked in:
[    2.609813] CPU: 0 PID: 24 Comm: kworker/u2:2 Not tainted 6.6.47 #0
[    2.616057] Hardware name: MyBook Live APM821XX 0x12c41c83 PowerPC 44x Platform
[    2.623337] Workqueue: events_unbound async_run_entry_fn
[    2.628646] NIP:  c0003190 LR: c0003070 CTR: c0014744
[    2.633674] REGS: c11a94b0 TRAP: 0700   Not tainted  (6.6.47)
[    2.639402] MSR:  00021000 <CE,ME>  CR: 22008808  XER: 20000000
[    2.645312]
[    2.645312] GPR00: c0008640 c11a95a0 c10de300 00000000 00000001 00000018 01f01cdf 00000004
[    2.645312] GPR08: c0f30000 00000000 0147ae14 c11a9610 c0013abc 00000000 c004c1e4 00000002
[    2.645312] GPR16: 00000000 00000001 00000000 00000000 d1000000 c0910000 c1182010 c0f50000
[    2.645312] GPR24: c11a9774 c00804a8 c0f30000 c0910000 c0910000 00001000 c108e3a0 c11a9620
[    2.680114] NIP [c0003190] fix_alignment+0x17c/0x1b4
[    2.685065] LR [c0003070] fix_alignment+0x5c/0x1b4
[    2.689844] Call Trace:
[    2.692279] [c11a95a0] [c11a9734] 0xc11a9734 (unreliable)
[    2.697670] [c11a95f0] [c0008640] alignment_exception+0xf0/0x164
[    2.703666] [c11a9610] [c0000a30] Alignment+0x130/0x180
[    2.708877] --- interrupt: 600 at memset+0x60/0xc0
[    2.713672] NIP:  c0013abc LR: c0081df8 CTR: 0000007f
[    2.718700] REGS: c11a9620 TRAP: 0600   Not tainted  (6.6.47)
[    2.724420] MSR:  00021000 <CE,ME>  CR: 42008808  XER: 20000000
[    2.730330] DEAR: d1000020 ESR: 00000000
[    2.730330] GPR00: 00000007 c11a9710 c10de300 d1000000 00000000 00001000 d100001c 00000004
[    2.730330] GPR08: 00001000 0000007f 00000001 53c20800 28008808 00000000 c004c1e4 00000002
[    2.730330] GPR16: 00000000 00000001 00000000 00000000 d1000000 c0910000 c1182010 c0f50000
[    2.730330] GPR24: c11a9774 c00804a8 c0f30000 c0910000 c0910000 00001000 c108e3a0 000010a0
[    2.767645] NIP [c0013abc] memset+0x60/0xc0
[    2.771819] LR [c0081df8] dma_alloc_from_pool+0x118/0x204
[    2.777201] --- interrupt: 600
[    2.780242] [c11a9710] [c0081eb0] dma_alloc_from_pool+0x1d0/0x204 (unreliable)
[    2.787449] [c11a9760] [c00807e0] dma_direct_alloc+0x90/0x2a4
[    2.793178] [c11a97a0] [c007f850] dma_alloc_attrs+0xa8/0xf8
[    2.798758] [c11a97e0] [c019d180] dma_pool_alloc+0x11c/0x2ac
[    2.804409] [c11a9810] [c0434cec] dwc_desc_get+0x2c/0x98
[    2.809722] [c11a9840] [c04355cc] dwc_prep_slave_sg+0x180/0x524
[    2.815633] [c11a98b0] [c04d7938] sata_dwc_qc_issue+0x1d8/0x268
[    2.821559] [c11a9920] [c04bbfb0] ata_qc_issue+0x174/0x2b0
[    2.827046] [c11a9940] [c04c674c] __ata_scsi_queuecmd+0x200/0x4fc
[    2.833129] [c11a9960] [c04c6a84] ata_scsi_queuecmd+0x3c/0x88
[    2.838865] [c11a9980] [c04a7bb0] scsi_queue_rq+0x6fc/0xb40
[    2.844429] [c11a99c0] [c03801e8] __blk_mq_issue_directly+0x40/0xe0
[    2.850694] [c11a99f0] [c038462c] blk_mq_try_issue_directly+0xa8/0x10c
[    2.857208] [c11a9a10] [c03854e4] blk_mq_submit_bio+0x5f4/0x67c
[    2.863118] [c11a9a70] [c0374ef4] submit_bio_noacct_nocheck+0x210/0x2d0
[    2.869728] [c11a9aa0] [c0206850] block_read_full_folio+0x1f4/0x3a4
[    2.875983] [c11a9b10] [c01353d8] filemap_read_folio+0x40/0x240
[    2.881910] [c11a9b60] [c01382d0] do_read_cache_folio+0xb4/0x228
[    2.887906] [c11a9b90] [c039144c] read_part_sector+0x50/0x10c
[    2.893643] [c11a9bb0] [c0392358] read_lba+0xb8/0x198
[    2.898680] [c11a9bf0] [c0392854] efi_partition+0xd8/0xc30
[    2.904150] [c11a9cc0] [c0390be4] bdev_disk_changed+0x2ac/0x710
[    2.910051] [c11a9d30] [c036be5c] blkdev_get_whole+0xc4/0xd8
[    2.915693] [c11a9d50] [c036cb7c] blkdev_get_by_dev.part.0+0x288/0x380
[    2.922199] [c11a9d90] [c038d49c] disk_scan_partitions+0x90/0x150
[    2.928272] [c11a9dc0] [c038d9ac] device_add_disk+0x450/0x484
[    2.934001] [c11a9df0] [c04b7718] sd_probe+0x360/0x4fc
[    2.939132] [c11a9e20] [c0479db0] really_probe+0x2a0/0x370
[    2.944628] [c11a9e50] [c0479f10] __driver_probe_device+0x90/0x200
[    2.950796] [c11a9e70] [c047a0c8] driver_probe_device+0x48/0x104
[    2.956793] [c11a9ea0] [c047a268] __device_attach_driver+0xe4/0x130
[    2.963048] [c11a9ec0] [c04772e0] bus_for_each_drv+0x88/0xe4
[    2.968699] [c11a9ef0] [c047939c] __device_attach_async_helper+0xa0/0xe8
[    2.975386] [c11a9f20] [c005096c] async_run_entry_fn+0x40/0x11c
[    2.981296] [c11a9f40] [c0044b70] process_one_work+0x1bc/0x35c
[    2.987111] [c11a9f70] [c0045088] worker_thread+0x378/0x4e8
[    2.992667] [c11a9fc0] [c004c2c8] kthread+0xe4/0xe8
[    2.997548] [c11a9ff0] [c000d210] start_kernel_thread+0x10/0x14
[    3.003449] Code: 39400000 7d500ba6 4c00012c 2c090000 41a2fec4 83e1004c 3860fff2 38210050 4e800020 38a00000 3920fff2 4bffffd4 <0fe00000> 4bffff8c 80010054 7c0803a6
[    3.018138] ---[ end trace 0000000000000000 ]---




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-22 18:19           ` Christian Lamparter
@ 2024-08-23  8:07             ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-23  8:07 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Christoph Hellwig, LEROY Christophe, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain,
	Vinod Koul, dmaengine

dma_alloc_from_pool is the allocator used when the caller can't
sleep, and as that is reusing memory it really has to call memset
or a memset-like function on the already uncached memory
unfortunately.  The dma engine operation that is doing this allocation
is documented as not being able to sleep, which is a bit unfortunate
as the storage driver above it could sleep just fine.

Adding the dmaengine maintainer and list if there is a way to pass a gfp
flag or some other indicator that the implementations can sleep,
which would avoid the need to use the not very scalable dma pool,
and thus also the need to memset on uncached memory.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-22  6:39       ` LEROY Christophe
  2024-08-22  7:14         ` Christoph Hellwig
@ 2024-08-23 13:06         ` Segher Boessenkool
  2024-08-23 13:54           ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2024-08-23 13:06 UTC (permalink / raw)
  To: LEROY Christophe
  Cc: Christoph Hellwig, Christian Lamparter, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain

Hi!

On Thu, Aug 22, 2024 at 06:39:33AM +0000, LEROY Christophe wrote:
> Le 22/08/2024 à 07:32, Christoph Hellwig a écrit :
> > On Thu, Aug 22, 2024 at 05:25:10AM +0000, LEROY Christophe wrote:
> >>> and this results in a call to dma_direct_allocation(), which has one
> >>> innocent looking memset():
> >>
> >>
> >> memset() can't be used on non-cached memory, memset_io() has to be used
> >> instead.
> > 
> > No, we use memset on uncached memory all the time.  Note that uncached
> > memory != __iomem memory, for which you DO have to use memset_io.
> > 
> 
> Then we have a subject here.
> 
> powerpc has a magic instruction 'dcbz' which clears a full cacheline in 
> one go. It is far more efficient than a loop to store zeros, and since 
> 2015 memset(0) has been implemented with that instruction (commit 
> 5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero"))
> 
> But that instruction generates an alignment exception when used on 
> non-cached memory (whether it is RAM or not doesn't matter).

What does "uncached memory" even mean here?  Literally it would be
I=1 memory (uncachEABLE memory), but more likely you want M=0 memory
here ("non-memory memory", "not well-behaved memory", MMIO often).

M=0 memory shouldn't ever have memset done on it, that is insane.  And
I=1 memory should not have the same optimised routines used, since
those only make things slower still.

> It is then 
> emulated by the kernel but it of course leads to a serious performance 
> degradation, hence the warning added by commit cbe654c77961 ("powerpc: 
> warn on emulation of dcbz instruction in kernel mode"). Until now it 
> helped identify and fix use of memset() on IO memory.
> 
> But if memset() is expected to be used with non-cached RAM, then I don't 
> know what to do. Any suggestion ?

If memset() is expected to be used with M=0, you cannot do any serious
optimisations to it at all.  If memset() is expected to be used with I=1
it should use a separate code path for it, probably the caller should
make the distinction.


Segher


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-23 13:06         ` Segher Boessenkool
@ 2024-08-23 13:54           ` Christoph Hellwig
  2024-08-23 19:19             ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-23 13:54 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: LEROY Christophe, Christoph Hellwig, Christian Lamparter,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Stan Johnson, Finn Thain

On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote:
> What does "uncached memory" even mean here?  Literally it would be
> I=1 memory (uncachEABLE memory), but more likely you want M=0 memory
> here ("non-memory memory", "not well-behaved memory", MMIO often).

Regular kernel memory vmapped with pgprot_noncached().

> If memset() is expected to be used with M=0, you cannot do any serious
> optimisations to it at all.  If memset() is expected to be used with I=1
> it should use a separate code path for it, probably the caller should
> make the distinction.

DMA coherent memory which uses uncached memory for platforms that
do not provide hardware dma coherence can end up just about anywhere
in the kernel.  We could use special routines for a few places in
the DMA subsystem, but there might be plenty of others.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-23 13:54           ` Christoph Hellwig
@ 2024-08-23 19:19             ` Segher Boessenkool
  2024-08-23 23:43               ` Christian Lamparter
  2024-08-24  9:01               ` LEROY Christophe
  0 siblings, 2 replies; 17+ messages in thread
From: Segher Boessenkool @ 2024-08-23 19:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: LEROY Christophe, Christian Lamparter, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain

Hi!

On Fri, Aug 23, 2024 at 03:54:59PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote:
> > What does "uncached memory" even mean here?  Literally it would be
> > I=1 memory (uncachEABLE memory), but more likely you want M=0 memory
> > here ("non-memory memory", "not well-behaved memory", MMIO often).
> 
> Regular kernel memory vmapped with pgprot_noncached().

So, I=1 (and G=1).  Caching inhibited and guarded.  But M=1 (memory
coherence required) as with any other real memory :-)

> > If memset() is expected to be used with M=0, you cannot do any serious
> > optimisations to it at all.  If memset() is expected to be used with I=1
> > it should use a separate code path for it, probably the caller should
> > make the distinction.
> 
> DMA coherent memory which uses uncached memory for platforms that
> do not provide hardware dma coherence can end up just about anywhere
> in the kernel.  We could use special routines for a few places in
> the DMA subsystem, but there might be plenty of others.

Yeah.  It will just be plenty slow, as we see here, that's what the
warning is for; but it works just fine :-)

The memset() code itself could chech for the storage attributes, but
that is probably more expensive than just assuming the happy case.
Maybe someone could try it out though!


Segher


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-23 19:19             ` Segher Boessenkool
@ 2024-08-23 23:43               ` Christian Lamparter
  2024-08-24  9:01               ` LEROY Christophe
  1 sibling, 0 replies; 17+ messages in thread
From: Christian Lamparter @ 2024-08-23 23:43 UTC (permalink / raw)
  To: Segher Boessenkool, Christoph Hellwig
  Cc: LEROY Christophe, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain

On 8/23/24 9:19 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Aug 23, 2024 at 03:54:59PM +0200, Christoph Hellwig wrote:
>> On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote:
>>> What does "uncached memory" even mean here?  Literally it would be
>>> I=1 memory (uncachEABLE memory), but more likely you want M=0 memory
>>> here ("non-memory memory", "not well-behaved memory", MMIO often).
>>
>> Regular kernel memory vmapped with pgprot_noncached().
> 
> So, I=1 (and G=1).  Caching inhibited and guarded.  But M=1 (memory
> coherence required) as with any other real memory :-)
> 
>>> If memset() is expected to be used with M=0, you cannot do any serious
>>> optimisations to it at all.  If memset() is expected to be used with I=1
>>> it should use a separate code path for it, probably the caller should
>>> make the distinction.
>>
>> DMA coherent memory which uses uncached memory for platforms that
>> do not provide hardware dma coherence can end up just about anywhere
>> in the kernel.  We could use special routines for a few places in
>> the DMA subsystem, but there might be plenty of others.
> 
> Yeah.  It will just be plenty slow, as we see here, that's what the
> warning is for; but it works just fine :-)
> 
> The memset() code itself could chech for the storage attributes, but
> that is probably more expensive than just assuming the happy case.
> Maybe someone could try it out though!

Hmm, Ok! For what's worth I can at least test memset with dcbz+trap and
what it was in 2015, without dcbz in the code path. How about that?

I figured out of all the offenders (ethernet, crypto and sata).
The sata/hard drive would be the most sensitive device to measure any
performance difference. the MyBook Live already had an harddrive
(Seagate ST380815AS (very old)) installed... so I went with that.

I test with OpenWrt, since it has a fully working PowerPC images for
the device, I can use initramfs (so HDD/SDD is idle) and provides a
very bare minimum the hdparm -t "benchmark".
(hdparm -t ... just reads for three seconds and tells you how much it read).

the unmodified 6.6.47 kernel scored:

| Timing buffered disk reads: 220 MB in  3.02 seconds =  72.93 MB/sec
| Timing buffered disk reads: 222 MB in  3.02 seconds =  73.50 MB/sec
| Timing buffered disk reads: 216 MB in  3.00 seconds =  71.94 MB/sec

from what I can tell, each hdparm -t /dev/sda causes ~77000 fix_alignment traps.
(/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size says it's 32 and
type is obviously "Data". If I'm not mistaken this means ~2400KiB of emulated
dcbz by the trap.)

For the test, I added the "old" memset from
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/lib/copy_32.S?id=df087e450d7ddc0b15bd8824206d964720b4f5e4#n120>
and replaced 6.6.47's memset in dma_pool_alloc() with it
<https://elixir.bootlin.com/linux/v6.6.47/source/mm/dmapool.c#L435>

now no WARNINGS are triggered and hdparm -t /dev/sda produces:

| Timing buffered disk reads: 220 MB in  3.00 seconds =  73.32 MB/sec
| Timing buffered disk reads: 218 MB in  3.02 seconds =  72.28 MB/sec
| Timing buffered disk reads: 224 MB in  3.03 seconds =  74.02 MB/sec

virtually no benefit?! Well, the HDD could be too slow. Let's try an old SSD:
Samsung 840 Evo 120 GB. This one manages to read 1276 MB in 3.06 seconds = ~416 MB/sec
in the same hdparm -t test on a reasonably modern PC when connected via a
usb3<->sata adapter.

unmodified 6.6.47 kernel:

| Timing buffered disk reads: 356 MB in  3.00 seconds = 118.61 MB/sec
| Timing buffered disk reads: 358 MB in  3.01 seconds = 119.12 MB/sec
| Timing buffered disk reads: 358 MB in  3.01 seconds = 119.03 MB/sec

modified 6.6.47 kernel:

| Timing buffered disk reads: 380 MB in  3.01 seconds = 126.30 MB/sec
| Timing buffered disk reads: 374 MB in  3.00 seconds = 124.61 MB/sec
| Timing buffered disk reads: 382 MB in  3.02 seconds = 126.62 MB/sec

Ok! There's something there. ~4%.

Cheers,
Christian


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-23 19:19             ` Segher Boessenkool
  2024-08-23 23:43               ` Christian Lamparter
@ 2024-08-24  9:01               ` LEROY Christophe
  2024-08-24 17:17                 ` Segher Boessenkool
  1 sibling, 1 reply; 17+ messages in thread
From: LEROY Christophe @ 2024-08-24  9:01 UTC (permalink / raw)
  To: Segher Boessenkool, Christoph Hellwig
  Cc: Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain



Le 23/08/2024 à 21:19, Segher Boessenkool a écrit :
> ***ATTENTION, Sopra Steria Group cannot confirm the identity of this email sender (SPF record failure). This might be a fake email from an attacker, if you have any doubts report and delete the email.***
> 
> ***ATTENTION, Sopra Steria Group ne peut pas confirmer l’identité de l’émetteur de ce message (SPF record failure). Il pourrait s’agir d’un faux message, à détruire si vous avez un doute ***
> 
> Hi!
> 
> On Fri, Aug 23, 2024 at 03:54:59PM +0200, Christoph Hellwig wrote:
>> On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote:
>>> What does "uncached memory" even mean here?  Literally it would be
>>> I=1 memory (uncachEABLE memory), but more likely you want M=0 memory
>>> here ("non-memory memory", "not well-behaved memory", MMIO often).
>>
>> Regular kernel memory vmapped with pgprot_noncached().
> 
> So, I=1 (and G=1).  Caching inhibited and guarded.  But M=1 (memory
> coherence required) as with any other real memory :-)
> 
>>> If memset() is expected to be used with M=0, you cannot do any serious
>>> optimisations to it at all.  If memset() is expected to be used with I=1
>>> it should use a separate code path for it, probably the caller should
>>> make the distinction.
>>
>> DMA coherent memory which uses uncached memory for platforms that
>> do not provide hardware dma coherence can end up just about anywhere
>> in the kernel.  We could use special routines for a few places in
>> the DMA subsystem, but there might be plenty of others.
> 
> Yeah.  It will just be plenty slow, as we see here, that's what the
> warning is for; but it works just fine :-)
> 
> The memset() code itself could chech for the storage attributes, but
> that is probably more expensive than just assuming the happy case.
> Maybe someone could try it out though!

But is it only memset() the problem ?

dcbz instruction is also used in:
- memcpy()
- csum_partial_copy_generic()
- clear_page()
- copy_page()
- clear_user()
- copy_to_user()
- copy_from_user()

Are these functions also used on DMA coherent memory ?

Christophe

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-24  9:01               ` LEROY Christophe
@ 2024-08-24 17:17                 ` Segher Boessenkool
  2024-08-27  7:29                   ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2024-08-24 17:17 UTC (permalink / raw)
  To: LEROY Christophe
  Cc: Christoph Hellwig, Christian Lamparter, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain

On Sat, Aug 24, 2024 at 09:01:33AM +0000, LEROY Christophe wrote:
> Le 23/08/2024 à 21:19, Segher Boessenkool a écrit :
> > The memset() code itself could chech for the storage attributes, but
> > that is probably more expensive than just assuming the happy case.
> > Maybe someone could try it out though!
> 
> But is it only memset() the problem ?
> 
> dcbz instruction is also used in:
> - memcpy()
> - csum_partial_copy_generic()
> - clear_page()
> - copy_page()
> - clear_user()
> - copy_to_user()
> - copy_from_user()

That is just a handful of functions.  Not sure about the _user things,
and the _page things for that matter, but the rest is certainly
measurable in real-life conditions.  So if we can avoid the problems
completely, and cheaply, we probably should.

I'm not so sure about the cheaply though :-/

> Are these functions also used on DMA coherent memory ?

Most won't show up high on most profiles, heh.  Which you already
can see from the problem not being attacked yet: if it was so obviously
a problem, some people would have wanted to do something about it :-)


Segher


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
  2024-08-24 17:17                 ` Segher Boessenkool
@ 2024-08-27  7:29                   ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-27  7:29 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: LEROY Christophe, Christoph Hellwig, Christian Lamparter,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Stan Johnson, Finn Thain

On Sat, Aug 24, 2024 at 12:17:57PM -0500, Segher Boessenkool wrote:
> > Are these functions also used on DMA coherent memory ?
> 
> Most won't show up high on most profiles, heh.  Which you already
> can see from the problem not being attacked yet: if it was so obviously
> a problem, some people would have wanted to do something about it :-)

Most drivers try to avoid coherent allocations in the fast path if
they can.  Another good option for Christians problem would be
to switch the the dmaengine driver to use dma_alloc_pages - it doesn't
actually need uncached memory as far as I can, dma_alloc_coherent is
just the only API we had to allocate guaranteed DMAable memory for most
of Linux's existence.



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-08-27  7:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-09-16 14:52 [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode Christophe Leroy
2021-11-02 10:11 ` Michael Ellerman
2024-08-21 22:39 ` Christian Lamparter
2024-08-22  5:25   ` LEROY Christophe
2024-08-22  5:32     ` Christoph Hellwig
2024-08-22  6:39       ` LEROY Christophe
2024-08-22  7:14         ` Christoph Hellwig
2024-08-22  9:53           ` Benjamin Herrenschmidt
2024-08-22 18:19           ` Christian Lamparter
2024-08-23  8:07             ` Christoph Hellwig
2024-08-23 13:06         ` Segher Boessenkool
2024-08-23 13:54           ` Christoph Hellwig
2024-08-23 19:19             ` Segher Boessenkool
2024-08-23 23:43               ` Christian Lamparter
2024-08-24  9:01               ` LEROY Christophe
2024-08-24 17:17                 ` Segher Boessenkool
2024-08-27  7:29                   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).