* [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode @ 2021-09-16 14:52 Christophe Leroy 2021-11-02 10:11 ` Michael Ellerman 2024-08-21 22:39 ` Christian Lamparter 0 siblings, 2 replies; 17+ messages in thread From: Christophe Leroy @ 2021-09-16 14:52 UTC (permalink / raw) To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman Cc: Stan Johnson, Finn Thain, linuxppc-dev, linux-kernel dcbz instruction shouldn't be used on non-cached memory. Using it on non-cached memory can result in alignment exception and implies a heavy handling. Instead of silentely emulating the instruction and resulting in high performance degradation, warn whenever an alignment exception is taken in kernel mode due to dcbz, so that the user is made aware that dcbz instruction has been used unexpectedly by the kernel. Reported-by: Stan Johnson <userm57@yahoo.com> Cc: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> --- v2: Warn only when emulating kernel mode --- arch/powerpc/kernel/align.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index bbb4181621dd..bf96b954a4eb 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs) if (op.type != CACHEOP + DCBZ) return -EINVAL; PPC_WARN_ALIGNMENT(dcbz, regs); + WARN_ON_ONCE(!user_mode(regs)); r = emulate_dcbz(op.ea, regs); } else { if (type == LARX || type == STCX) -- 2.31.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2021-09-16 14:52 [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode Christophe Leroy @ 2021-11-02 10:11 ` Michael Ellerman 2024-08-21 22:39 ` Christian Lamparter 1 sibling, 0 replies; 17+ messages in thread From: Michael Ellerman @ 2021-11-02 10:11 UTC (permalink / raw) To: Michael Ellerman, Paul Mackerras, Christophe Leroy, Benjamin Herrenschmidt Cc: linuxppc-dev, Stan Johnson, Finn Thain, linux-kernel On Thu, 16 Sep 2021 16:52:09 +0200, Christophe Leroy wrote: > dcbz instruction shouldn't be used on non-cached memory. Using > it on non-cached memory can result in alignment exception and > implies a heavy handling. > > Instead of silentely emulating the instruction and resulting in high > performance degradation, warn whenever an alignment exception is > taken in kernel mode due to dcbz, so that the user is made aware that > dcbz instruction has been used unexpectedly by the kernel. > > [...] Applied to powerpc/next. [1/1] powerpc: warn on emulation of dcbz instruction in kernel mode https://git.kernel.org/powerpc/c/cbe654c779616807e1e6823c3bdbfe07a10562b8 cheers ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2021-09-16 14:52 [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode Christophe Leroy 2021-11-02 10:11 ` Michael Ellerman @ 2024-08-21 22:39 ` Christian Lamparter 2024-08-22 5:25 ` LEROY Christophe 1 sibling, 1 reply; 17+ messages in thread From: Christian Lamparter @ 2024-08-21 22:39 UTC (permalink / raw) To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman Cc: linux-kernel, linuxppc-dev, Stan Johnson, Finn Thain Sorry to write a reply to this old mail. But after years, I finally decided to tackle an "old" problem that has come up... And unfortunately it is related to this patch. But let me explain. On 9/16/21 4:52 PM, Christophe Leroy wrote: > dcbz instruction shouldn't be used on non-cached memory. Using > it on non-cached memory can result in alignment exception and > implies a heavy handling. > > Instead of silentely emulating the instruction and resulting in high > performance degradation, warn whenever an alignment exception is > taken in kernel mode due to dcbz, so that the user is made aware that > dcbz instruction has been used unexpectedly by the kernel. The devices that are affected are APM821xx: MyBook Live, Meraki MX60, Netgear WNDR4700, ... All these have this "splat" during boot: |[ 8.777686] PPC 4xx OCP EMAC driver, version 3.54 [ 8.782961] ------------[ cut here ]------------ [ 8.787565] WARNING: CPU: 0 PID: 1 at fix_alignment+0x144/0x16c [ 8.793474] Modules linked in: [ 8.796527] CPU: 0 PID: 1 Comm: swapper Not tainted 6.6.47 #0 [ 8.802253] Hardware name: Meraki MX60/MX60W Security Appliance APM821XX 0x12c41c83 PowerPC 44x Platform [ 8.811688] NIP: c0003158 LR: c0003070 CTR: c001471c [ 8.816725] REGS: c10319d0 TRAP: 0700 Not tainted (6.6.47) [ 8.822453] MSR: 00029000 <CE,EE,ME> CR: 24000442 XER: 20000000 [ 8.828630] [ 8.828630] GPR00: c000855c c1031ac0 c1051c00 00000000 3e039bf6 00000018 01f01cdf 00000004 [ 8.828630] GPR08: 0000000f 00000000 00000510 c1031b30 c0013a94 00000000 c0002518 00000000 [ 8.828630] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 8.828630] GPR24: 00000000 dfb953ac c00790c8 c104e120 c10f5610 00001000 c0d30000 c1031b40 [ 8.863432] NIP [c0003158] fix_alignment+0x144/0x16c [ 8.868384] LR [c0003070] fix_alignment+0x5c/0x16c [ 8.873162] Call Trace: [ 8.875598] [c1031ac0] [c0186efc] get_page_from_freelist+0x208/0xdac (unreliable) [ 8.883063] [c1031b10] [c000855c] alignment_exception+0xf0/0x164 [ 8.889058] [c1031b30] [c0000a30] Alignment+0x130/0x180 [ 8.894269] --- interrupt: 600 at memset+0x60/0xc0 [ 8.899063] NIP: c0013a94 LR: c007a1ac CTR: 0000007f [ 8.904092] REGS: c1031b40 TRAP: 0600 Not tainted (6.6.47) [ 8.909812] MSR: 00029000 <CE,EE,ME> CR: 44000442 XER: 20000000 [ 8.915989] DEAR: e1168020 ESR: 00000000 [ 8.915989] GPR00: 00000007 c1031c30 c1051c00 e1168000 00000000 00001000 e116801c 00000004 [ 8.915989] GPR08: 00001000 0000007f 00000a3a 00000000 24000848 00000000 c0002518 00000000 [ 8.915989] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 8.915989] GPR24: 00000000 dfb953ac c00790c8 c104e120 c10f5610 00001000 c0d30000 e1168000 [ 8.953306] NIP [c0013a94] memset+0x60/0xc0 [ 8.957478] LR [c007a1ac] dma_direct_alloc+0x1e4/0x290 [ 8.962620] --- interrupt: 600 [ 8.965661] [c1031c30] [c007a130] dma_direct_alloc+0x168/0x290 (unreliable) [ 8.972608] [c1031c70] [c00790c8] dma_alloc_attrs+0xa8/0xf8 [ 8.978163] [c1031cb0] [c04d419c] mal_probe+0x26c/0x664 [ 8.983390] [c1031ce0] [c04326d8] platform_probe+0x60/0xb0 [ 8.988867] [c1031d00] [c042f6f0] really_probe+0xd8/0x3b4 [ 8.994251] [c1031d30] [c042fbdc] driver_probe_device+0x48/0x104 [ 9.000238] [c1031d60] [c042fe7c] __driver_attach+0xb4/0x1a4 [ 9.005880] [c1031d80] [c042cdc8] bus_for_each_dev+0x7c/0xc8 [ 9.011548] [c1031db0] [c042e708] bus_add_driver+0x170/0x25c [ 9.017199] [c1031de0] [c0430cec] driver_register+0x8c/0x164 [ 9.022840] [c1031e00] [c0902de0] emac_init+0x17c/0x1ec [ 9.028067] [c1031e40] [c0002680] do_one_initcall+0x50/0x2ac [ 9.033709] [c1031eb0] [c08e60f8] kernel_init_freeable+0x1b0/0x230 [ 9.039870] [c1031ee0] [c0002538] kernel_init+0x20/0x118 [ 9.045166] [c1031f00] [c000d1f8] ret_from_kernel_user_thread+0x10/0x18 [ 9.051758] --- interrupt: 0 at 0x0 [ 9.055241] Code: 39400000 7d500ba6 4c00012c 2c090000 41a2fefc 83e1004c 3860fff2 38210050 4e800020 38a00000 3920fff2 4bffffd4 <0fe00000> 4bffff8c 80010054 7c0803a6 [ 9.069929] ---[ end trace 0000000000000000 ]---| what happens is that the EMAC (ethernet driver) uses a dma_alloc_coherent here: <https://elixir.bootlin.com/linux/v6.6.47/source/drivers/net/ethernet/ibm/emac/mal.c#L636> | bd_size <https://elixir.bootlin.com/linux/v6.6.47/C/ident/bd_size>=sizeof(structmal_descriptor <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal_descriptor>)* | (NUM_TX_BUFF <https://elixir.bootlin.com/linux/v6.6.47/C/ident/NUM_TX_BUFF>*mal <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal>->num_tx_chans <https://elixir.bootlin.com/linux/v6.6.47/C/ident/num_tx_chans>+ | NUM_RX_BUFF <https://elixir.bootlin.com/linux/v6.6.47/C/ident/NUM_RX_BUFF>*mal <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal>->num_rx_chans <https://elixir.bootlin.com/linux/v6.6.47/C/ident/num_rx_chans>); | mal <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal>->bd_virt <https://elixir.bootlin.com/linux/v6.6.47/C/ident/bd_virt>=dma_alloc_coherent <https://elixir.bootlin.com/linux/v6.6.47/C/ident/dma_alloc_coherent>(&ofdev <https://elixir.bootlin.com/linux/v6.6.47/C/ident/ofdev>->dev,bd_size <https://elixir.bootlin.com/linux/v6.6.47/C/ident/bd_size>,&mal <https://elixir.bootlin.com/linux/v6.6.47/C/ident/mal>->bd_dma <https://elixir.bootlin.com/linux/v6.6.47/C/ident/bd_dma>, | GFP_KERNEL <https://elixir.bootlin.com/linux/v6.6.47/C/ident/GFP_KERNEL>); | (bd_size is 2x1024 = 2048) and this results in a call to dma_direct_allocation(), which has one innocent looking memset(): <https://elixir.bootlin.com/linux/v6.6.47/source/kernel/dma/direct.c#L301> which zeros out the area before returning it back... And of course, this triggers the WARNING above. As for why memset uses that dcbz... this is explained in: <https://elixir.bootlin.com/linux/v6.6.47/source/arch/powerpc/lib/copy_32.S#L86> | * Use dcbz on the complete cache lines in the destination | * to set them to zero. This requires that the destination | * area is cacheable. -- paulus | * | * During early init, cache might not be active yet, so dcbz cannot be used. | * We therefore skip the optimised bloc that uses dcbz. This jump is | * replaced by a nop once cache is active. This is done in machine_init() could this WARN_ON_ONCE maybe be downgraded to a single pr_info_once? I don't see how to tackle this "neatly" in a better way. BugLink: <https://github.com/openwrt/openwrt/pull/14037#issuecomment-2302485414> Best Regards, Christian > > Reported-by: Stan Johnson <userm57@yahoo.com> > Cc: Finn Thain <fthain@linux-m68k.org> > Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> > --- > v2: Warn only when emulating kernel mode > --- > arch/powerpc/kernel/align.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c > index bbb4181621dd..bf96b954a4eb 100644 > --- a/arch/powerpc/kernel/align.c > +++ b/arch/powerpc/kernel/align.c > @@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs) > if (op.type != CACHEOP + DCBZ) > return -EINVAL; > PPC_WARN_ALIGNMENT(dcbz, regs); > + WARN_ON_ONCE(!user_mode(regs)); > r = emulate_dcbz(op.ea, regs); > } else { > if (type == LARX || type == STCX) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-21 22:39 ` Christian Lamparter @ 2024-08-22 5:25 ` LEROY Christophe 2024-08-22 5:32 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: LEROY Christophe @ 2024-08-22 5:25 UTC (permalink / raw) To: Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain, Christoph Hellwig + CC Christoph Hellwig Hi, Le 22/08/2024 à 00:39, Christian Lamparter a écrit : > [Vous ne recevez pas souvent de courriers de > christian.lamparter@isd.uni-stuttgart.de. Découvrez pourquoi ceci est > important à https://aka.ms/LearnAboutSenderIdentification ] > > Sorry to write a reply to this old mail. But after years, I finally > decided to tackle an "old" problem that has come up... > And unfortunately it is related to this patch. But let me explain. > > On 9/16/21 4:52 PM, Christophe Leroy wrote: >> dcbz instruction shouldn't be used on non-cached memory. Using >> it on non-cached memory can result in alignment exception and >> implies a heavy handling. >> >> Instead of silentely emulating the instruction and resulting in high >> performance degradation, warn whenever an alignment exception is >> taken in kernel mode due to dcbz, so that the user is made aware that >> dcbz instruction has been used unexpectedly by the kernel. > > > > The devices that are affected are APM821xx: MyBook Live, Meraki MX60, > Netgear WNDR4700, ... > All these have this "splat" during boot: > ... > > what happens is that the EMAC (ethernet driver) uses a > dma_alloc_coherent here: ... > and this results in a call to dma_direct_allocation(), which has one > innocent looking memset(): memset() can't be used on non-cached memory, memset_io() has to be used instead. ... > And of course, this triggers the WARNING above. As for why memset uses > that dcbz... this is explained in: ... > > could this WARN_ON_ONCE maybe be downgraded to a single pr_info_once? > I don't see how to tackle this "neatly" in a better way. Can you try with the change below ? diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 4480a3cd92e0..e2bee5fcc172 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -283,13 +283,15 @@ void *dma_direct_alloc(struct device *dev, size_t size, __builtin_return_address(0)); if (!ret) goto out_free_pages; + + memset_io(ret, 0, size); } else { ret = page_address(page); if (dma_set_decrypted(dev, ret, size)) goto out_leak_pages; - } - memset(ret, 0, size); + memset(ret, 0, size); + } if (set_uncached) { arch_dma_prep_coherent(page, size); > > BugLink: > <https://github.com/openwrt/openwrt/pull/14037#issuecomment-2302485414> > ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-22 5:25 ` LEROY Christophe @ 2024-08-22 5:32 ` Christoph Hellwig 2024-08-22 6:39 ` LEROY Christophe 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2024-08-22 5:32 UTC (permalink / raw) To: LEROY Christophe Cc: Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain, Christoph Hellwig On Thu, Aug 22, 2024 at 05:25:10AM +0000, LEROY Christophe wrote: > > and this results in a call to dma_direct_allocation(), which has one > > innocent looking memset(): > > > memset() can't be used on non-cached memory, memset_io() has to be used > instead. No, we use memset on uncached memory all the time. Note that uncached memory != __iomem memory, for which you DO have to use memset_io. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-22 5:32 ` Christoph Hellwig @ 2024-08-22 6:39 ` LEROY Christophe 2024-08-22 7:14 ` Christoph Hellwig 2024-08-23 13:06 ` Segher Boessenkool 0 siblings, 2 replies; 17+ messages in thread From: LEROY Christophe @ 2024-08-22 6:39 UTC (permalink / raw) To: Christoph Hellwig Cc: Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain Le 22/08/2024 à 07:32, Christoph Hellwig a écrit : > On Thu, Aug 22, 2024 at 05:25:10AM +0000, LEROY Christophe wrote: >>> and this results in a call to dma_direct_allocation(), which has one >>> innocent looking memset(): >> >> >> memset() can't be used on non-cached memory, memset_io() has to be used >> instead. > > No, we use memset on uncached memory all the time. Note that uncached > memory != __iomem memory, for which you DO have to use memset_io. > Then we have a subject here. powerpc has a magic instruction 'dcbz' which clears a full cacheline in one go. It is far more efficient than a loop to store zeros, and since 2015 memset(0) has been implemented with that instruction (commit 5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero")) But that instruction generates an alignment exception when used on non-cached memory (whether it is RAM or not doesn't matter). It is then emulated by the kernel but it of course leads to a serious performance degradation, hence the warning added by commit cbe654c77961 ("powerpc: warn on emulation of dcbz instruction in kernel mode"). Until now it helped identify and fix use of memset() on IO memory. But if memset() is expected to be used with non-cached RAM, then I don't know what to do. Any suggestion ? Christophe ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-22 6:39 ` LEROY Christophe @ 2024-08-22 7:14 ` Christoph Hellwig 2024-08-22 9:53 ` Benjamin Herrenschmidt 2024-08-22 18:19 ` Christian Lamparter 2024-08-23 13:06 ` Segher Boessenkool 1 sibling, 2 replies; 17+ messages in thread From: Christoph Hellwig @ 2024-08-22 7:14 UTC (permalink / raw) To: LEROY Christophe Cc: Christoph Hellwig, Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain On Thu, Aug 22, 2024 at 06:39:33AM +0000, LEROY Christophe wrote: > powerpc has a magic instruction 'dcbz' which clears a full cacheline in > one go. It is far more efficient than a loop to store zeros, and since > 2015 memset(0) has been implemented with that instruction (commit > 5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero")) > > But that instruction generates an alignment exception when used on > non-cached memory (whether it is RAM or not doesn't matter). It is then > emulated by the kernel but it of course leads to a serious performance > degradation, hence the warning added by commit cbe654c77961 ("powerpc: > warn on emulation of dcbz instruction in kernel mode"). Until now it > helped identify and fix use of memset() on IO memory. > > But if memset() is expected to be used with non-cached RAM, then I don't > know what to do. Any suggestion ? I'd suggest two things: 1) remove the warning. The use case is perfectly valid and everything using uncached memory is already slow, so people will just have to deal with it. Maybe offer a trace point instead if people care about it. 2) figure out a way to avoid this case in the dma-coherent allocator, which is probably the only case where it happens frequently (a few drivers also zero or re-zero coherent memory, but most of the time that is cargo cult programming and not actually needed) For 2 I can think of two options: a) provide a arch hook for zeroing the dma memory that defaults to memset, but which powerpc can override a) figure out a way to clear the memory before marking it uncached if we can a) it obviously easier to verify, but b) is probably going to give way better performance. Below is an untested implementation of b) for dma-direct, I just need to find out if there is any architecture that requires the memory to be zeroed after it іt has been remapped. The iommu drivers might also need similar treatment. diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 4480a3cd92e087..66e94b32ab0081 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -275,6 +275,9 @@ void *dma_direct_alloc(struct device *dev, size_t size, if (force_dma_unencrypted(dev)) prot = pgprot_decrypted(prot); + if (!PageHighMem(page)) + memset(page_address(page), 0, size); + /* remove any dirty cache lines on the kernel alias */ arch_dma_prep_coherent(page, size); @@ -283,14 +286,15 @@ void *dma_direct_alloc(struct device *dev, size_t size, __builtin_return_address(0)); if (!ret) goto out_free_pages; + if (PageHighMem(page)) + memset(ret, 0, size); } else { ret = page_address(page); if (dma_set_decrypted(dev, ret, size)) goto out_leak_pages; + memset(ret, 0, size); } - memset(ret, 0, size); - if (set_uncached) { arch_dma_prep_coherent(page, size); ret = arch_dma_set_uncached(ret, size); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-22 7:14 ` Christoph Hellwig @ 2024-08-22 9:53 ` Benjamin Herrenschmidt 2024-08-22 18:19 ` Christian Lamparter 1 sibling, 0 replies; 17+ messages in thread From: Benjamin Herrenschmidt @ 2024-08-22 9:53 UTC (permalink / raw) To: Christoph Hellwig, LEROY Christophe Cc: Christian Lamparter, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain On Thu, 2024-08-22 at 09:14 +0200, Christoph Hellwig wrote: > > I'd suggest two things: > > 1) remove the warning. The use case is perfectly valid and everything > using uncached memory is already slow, so people will just have to > deal with it. Maybe offer a trace point instead if people care about > it. Well... there's slow and extremely slow .. :-) dcbz() on uncached memory will take an exception for every "cache line" of zeros which will then need to be emulated. We are talking about hundreds to thousands of cycle per cache line. (Do we have an optimisation to detect memset in the emulation code and force it to return to a non-dcbz'ing version ? If not, that's doable and would at least limit it to one exception per memset() instead of one per cache-line). Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-22 7:14 ` Christoph Hellwig 2024-08-22 9:53 ` Benjamin Herrenschmidt @ 2024-08-22 18:19 ` Christian Lamparter 2024-08-23 8:07 ` Christoph Hellwig 1 sibling, 1 reply; 17+ messages in thread From: Christian Lamparter @ 2024-08-22 18:19 UTC (permalink / raw) To: Christoph Hellwig, LEROY Christophe Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain Hi! Thank you all for your insightful insights :-) . ... and derp, used the wrong mail. On 8/22/24 9:14 AM, Christoph Hellwig wrote: > On Thu, Aug 22, 2024 at 06:39:33AM +0000, LEROY Christophe wrote: >> powerpc has a magic instruction 'dcbz' which clears a full cacheline in >> one go. It is far more efficient than a loop to store zeros, and since >> 2015 memset(0) has been implemented with that instruction (commit >> 5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero")) >> >> But that instruction generates an alignment exception when used on >> non-cached memory (whether it is RAM or not doesn't matter). It is then >> emulated by the kernel but it of course leads to a serious performance >> degradation, hence the warning added by commit cbe654c77961 ("powerpc: >> warn on emulation of dcbz instruction in kernel mode"). Until now it >> helped identify and fix use of memset() on IO memory. >> >> But if memset() is expected to be used with non-cached RAM, then I don't >> know what to do. Any suggestion ? > > I'd suggest two things: > > 1) remove the warning. The use case is perfectly valid and everything > using uncached memory is already slow, so people will just have to > deal with it. Maybe offer a trace point instead if people care about > it. > 2) figure out a way to avoid this case in the dma-coherent allocator, > which is probably the only case where it happens frequently > (a few drivers also zero or re-zero coherent memory, but most of the > time that is cargo cult programming and not actually needed) I tested your patch below and got the next warning. This time from dma_alloc_from_pool about a similar memset in: https://elixir.bootlin.com/linux/v6.6.47/source/kernel/dma/pool.c#L261 it triggers for the sata_dwc_ex460 driver when it access the harddrive. For anyone interested, please look below for the full warning splat. (I switched to the MyBook Live, it's easier to disassemble+reassemble than the MX60... but with similar results). > For 2 I can think of two options: > > a) provide a arch hook for zeroing the dma memory that defaults to > memset, but which powerpc can override > a) figure out a way to clear the memory before marking it uncached > if we can > > a) it obviously easier to verify, but b) is probably going to give > way better performance. > > Below is an untested implementation of b) for dma-direct, I just need to > find out if there is any architecture that requires the memory to be > zeroed after it іt has been remapped. The iommu drivers might also > need similar treatment. > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > index 4480a3cd92e087..66e94b32ab0081 100644 > --- a/kernel/dma/direct.c > +++ b/kernel/dma/direct.c > @@ -275,6 +275,9 @@ void *dma_direct_alloc(struct device *dev, size_t size, > if (force_dma_unencrypted(dev)) > prot = pgprot_decrypted(prot); > > + if (!PageHighMem(page)) > + memset(page_address(page), 0, size); > + > /* remove any dirty cache lines on the kernel alias */ > arch_dma_prep_coherent(page, size); > > @@ -283,14 +286,15 @@ void *dma_direct_alloc(struct device *dev, size_t size, > __builtin_return_address(0)); > if (!ret) > goto out_free_pages; > + if (PageHighMem(page)) > + memset(ret, 0, size); > } else { > ret = page_address(page); > if (dma_set_decrypted(dev, ret, size)) > goto out_leak_pages; > + memset(ret, 0, size); > } > > - memset(ret, 0, size); > - > if (set_uncached) { > arch_dma_prep_coherent(page, size); > ret = arch_dma_set_uncached(ret, size); I added a counter in fix_alignment to count the number of traps issued by the kernel code. From what I measured in a quick test, With this patch, the count during boot went down from ~21000 (unmodified) to 1200 (patched)... but from what I can tell, it still triggers for the dma_direct_alloc in emac. if there are more patches, please. I can test them during the next few days while I'm on holiday. Cheers, Christian --- [ 2.596136] ------------[ cut here ]------------ [ 2.600759] WARNING: CPU: 0 PID: 24 at fix_alignment+0x17c/0x1b4 [ 2.606760] Modules linked in: [ 2.609813] CPU: 0 PID: 24 Comm: kworker/u2:2 Not tainted 6.6.47 #0 [ 2.616057] Hardware name: MyBook Live APM821XX 0x12c41c83 PowerPC 44x Platform [ 2.623337] Workqueue: events_unbound async_run_entry_fn [ 2.628646] NIP: c0003190 LR: c0003070 CTR: c0014744 [ 2.633674] REGS: c11a94b0 TRAP: 0700 Not tainted (6.6.47) [ 2.639402] MSR: 00021000 <CE,ME> CR: 22008808 XER: 20000000 [ 2.645312] [ 2.645312] GPR00: c0008640 c11a95a0 c10de300 00000000 00000001 00000018 01f01cdf 00000004 [ 2.645312] GPR08: c0f30000 00000000 0147ae14 c11a9610 c0013abc 00000000 c004c1e4 00000002 [ 2.645312] GPR16: 00000000 00000001 00000000 00000000 d1000000 c0910000 c1182010 c0f50000 [ 2.645312] GPR24: c11a9774 c00804a8 c0f30000 c0910000 c0910000 00001000 c108e3a0 c11a9620 [ 2.680114] NIP [c0003190] fix_alignment+0x17c/0x1b4 [ 2.685065] LR [c0003070] fix_alignment+0x5c/0x1b4 [ 2.689844] Call Trace: [ 2.692279] [c11a95a0] [c11a9734] 0xc11a9734 (unreliable) [ 2.697670] [c11a95f0] [c0008640] alignment_exception+0xf0/0x164 [ 2.703666] [c11a9610] [c0000a30] Alignment+0x130/0x180 [ 2.708877] --- interrupt: 600 at memset+0x60/0xc0 [ 2.713672] NIP: c0013abc LR: c0081df8 CTR: 0000007f [ 2.718700] REGS: c11a9620 TRAP: 0600 Not tainted (6.6.47) [ 2.724420] MSR: 00021000 <CE,ME> CR: 42008808 XER: 20000000 [ 2.730330] DEAR: d1000020 ESR: 00000000 [ 2.730330] GPR00: 00000007 c11a9710 c10de300 d1000000 00000000 00001000 d100001c 00000004 [ 2.730330] GPR08: 00001000 0000007f 00000001 53c20800 28008808 00000000 c004c1e4 00000002 [ 2.730330] GPR16: 00000000 00000001 00000000 00000000 d1000000 c0910000 c1182010 c0f50000 [ 2.730330] GPR24: c11a9774 c00804a8 c0f30000 c0910000 c0910000 00001000 c108e3a0 000010a0 [ 2.767645] NIP [c0013abc] memset+0x60/0xc0 [ 2.771819] LR [c0081df8] dma_alloc_from_pool+0x118/0x204 [ 2.777201] --- interrupt: 600 [ 2.780242] [c11a9710] [c0081eb0] dma_alloc_from_pool+0x1d0/0x204 (unreliable) [ 2.787449] [c11a9760] [c00807e0] dma_direct_alloc+0x90/0x2a4 [ 2.793178] [c11a97a0] [c007f850] dma_alloc_attrs+0xa8/0xf8 [ 2.798758] [c11a97e0] [c019d180] dma_pool_alloc+0x11c/0x2ac [ 2.804409] [c11a9810] [c0434cec] dwc_desc_get+0x2c/0x98 [ 2.809722] [c11a9840] [c04355cc] dwc_prep_slave_sg+0x180/0x524 [ 2.815633] [c11a98b0] [c04d7938] sata_dwc_qc_issue+0x1d8/0x268 [ 2.821559] [c11a9920] [c04bbfb0] ata_qc_issue+0x174/0x2b0 [ 2.827046] [c11a9940] [c04c674c] __ata_scsi_queuecmd+0x200/0x4fc [ 2.833129] [c11a9960] [c04c6a84] ata_scsi_queuecmd+0x3c/0x88 [ 2.838865] [c11a9980] [c04a7bb0] scsi_queue_rq+0x6fc/0xb40 [ 2.844429] [c11a99c0] [c03801e8] __blk_mq_issue_directly+0x40/0xe0 [ 2.850694] [c11a99f0] [c038462c] blk_mq_try_issue_directly+0xa8/0x10c [ 2.857208] [c11a9a10] [c03854e4] blk_mq_submit_bio+0x5f4/0x67c [ 2.863118] [c11a9a70] [c0374ef4] submit_bio_noacct_nocheck+0x210/0x2d0 [ 2.869728] [c11a9aa0] [c0206850] block_read_full_folio+0x1f4/0x3a4 [ 2.875983] [c11a9b10] [c01353d8] filemap_read_folio+0x40/0x240 [ 2.881910] [c11a9b60] [c01382d0] do_read_cache_folio+0xb4/0x228 [ 2.887906] [c11a9b90] [c039144c] read_part_sector+0x50/0x10c [ 2.893643] [c11a9bb0] [c0392358] read_lba+0xb8/0x198 [ 2.898680] [c11a9bf0] [c0392854] efi_partition+0xd8/0xc30 [ 2.904150] [c11a9cc0] [c0390be4] bdev_disk_changed+0x2ac/0x710 [ 2.910051] [c11a9d30] [c036be5c] blkdev_get_whole+0xc4/0xd8 [ 2.915693] [c11a9d50] [c036cb7c] blkdev_get_by_dev.part.0+0x288/0x380 [ 2.922199] [c11a9d90] [c038d49c] disk_scan_partitions+0x90/0x150 [ 2.928272] [c11a9dc0] [c038d9ac] device_add_disk+0x450/0x484 [ 2.934001] [c11a9df0] [c04b7718] sd_probe+0x360/0x4fc [ 2.939132] [c11a9e20] [c0479db0] really_probe+0x2a0/0x370 [ 2.944628] [c11a9e50] [c0479f10] __driver_probe_device+0x90/0x200 [ 2.950796] [c11a9e70] [c047a0c8] driver_probe_device+0x48/0x104 [ 2.956793] [c11a9ea0] [c047a268] __device_attach_driver+0xe4/0x130 [ 2.963048] [c11a9ec0] [c04772e0] bus_for_each_drv+0x88/0xe4 [ 2.968699] [c11a9ef0] [c047939c] __device_attach_async_helper+0xa0/0xe8 [ 2.975386] [c11a9f20] [c005096c] async_run_entry_fn+0x40/0x11c [ 2.981296] [c11a9f40] [c0044b70] process_one_work+0x1bc/0x35c [ 2.987111] [c11a9f70] [c0045088] worker_thread+0x378/0x4e8 [ 2.992667] [c11a9fc0] [c004c2c8] kthread+0xe4/0xe8 [ 2.997548] [c11a9ff0] [c000d210] start_kernel_thread+0x10/0x14 [ 3.003449] Code: 39400000 7d500ba6 4c00012c 2c090000 41a2fec4 83e1004c 3860fff2 38210050 4e800020 38a00000 3920fff2 4bffffd4 <0fe00000> 4bffff8c 80010054 7c0803a6 [ 3.018138] ---[ end trace 0000000000000000 ]--- ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-22 18:19 ` Christian Lamparter @ 2024-08-23 8:07 ` Christoph Hellwig 0 siblings, 0 replies; 17+ messages in thread From: Christoph Hellwig @ 2024-08-23 8:07 UTC (permalink / raw) To: Christian Lamparter Cc: Christoph Hellwig, LEROY Christophe, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain, Vinod Koul, dmaengine dma_alloc_from_pool is the allocator used when the caller can't sleep, and as that is reusing memory it really has to call memset or a memset-like function on the already uncached memory unfortunately. The dma engine operation that is doing this allocation is documented as not being able to sleep, which is a bit unfortunate as the storage driver above it could sleep just fine. Adding the dmaengine maintainer and list if there is a way to pass a gfp flag or some other indicator that the implementations can sleep, which would avoid the need to use the not very scalable dma pool, and thus also the need to memset on uncached memory. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-22 6:39 ` LEROY Christophe 2024-08-22 7:14 ` Christoph Hellwig @ 2024-08-23 13:06 ` Segher Boessenkool 2024-08-23 13:54 ` Christoph Hellwig 1 sibling, 1 reply; 17+ messages in thread From: Segher Boessenkool @ 2024-08-23 13:06 UTC (permalink / raw) To: LEROY Christophe Cc: Christoph Hellwig, Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain Hi! On Thu, Aug 22, 2024 at 06:39:33AM +0000, LEROY Christophe wrote: > Le 22/08/2024 à 07:32, Christoph Hellwig a écrit : > > On Thu, Aug 22, 2024 at 05:25:10AM +0000, LEROY Christophe wrote: > >>> and this results in a call to dma_direct_allocation(), which has one > >>> innocent looking memset(): > >> > >> > >> memset() can't be used on non-cached memory, memset_io() has to be used > >> instead. > > > > No, we use memset on uncached memory all the time. Note that uncached > > memory != __iomem memory, for which you DO have to use memset_io. > > > > Then we have a subject here. > > powerpc has a magic instruction 'dcbz' which clears a full cacheline in > one go. It is far more efficient than a loop to store zeros, and since > 2015 memset(0) has been implemented with that instruction (commit > 5b2a32e80634 ("powerpc/32: memset(0): use cacheable_memzero")) > > But that instruction generates an alignment exception when used on > non-cached memory (whether it is RAM or not doesn't matter). What does "uncached memory" even mean here? Literally it would be I=1 memory (uncachEABLE memory), but more likely you want M=0 memory here ("non-memory memory", "not well-behaved memory", MMIO often). M=0 memory shouldn't ever have memset done on it, that is insane. And I=1 memory should not have the same optimised routines used, since those only make things slower still. > It is then > emulated by the kernel but it of course leads to a serious performance > degradation, hence the warning added by commit cbe654c77961 ("powerpc: > warn on emulation of dcbz instruction in kernel mode"). Until now it > helped identify and fix use of memset() on IO memory. > > But if memset() is expected to be used with non-cached RAM, then I don't > know what to do. Any suggestion ? If memset() is expected to be used with M=0, you cannot do any serious optimisations to it at all. If memset() is expected to be used with I=1 it should use a separate code path for it, probably the caller should make the distinction. Segher ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-23 13:06 ` Segher Boessenkool @ 2024-08-23 13:54 ` Christoph Hellwig 2024-08-23 19:19 ` Segher Boessenkool 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2024-08-23 13:54 UTC (permalink / raw) To: Segher Boessenkool Cc: LEROY Christophe, Christoph Hellwig, Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote: > What does "uncached memory" even mean here? Literally it would be > I=1 memory (uncachEABLE memory), but more likely you want M=0 memory > here ("non-memory memory", "not well-behaved memory", MMIO often). Regular kernel memory vmapped with pgprot_noncached(). > If memset() is expected to be used with M=0, you cannot do any serious > optimisations to it at all. If memset() is expected to be used with I=1 > it should use a separate code path for it, probably the caller should > make the distinction. DMA coherent memory which uses uncached memory for platforms that do not provide hardware dma coherence can end up just about anywhere in the kernel. We could use special routines for a few places in the DMA subsystem, but there might be plenty of others. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-23 13:54 ` Christoph Hellwig @ 2024-08-23 19:19 ` Segher Boessenkool 2024-08-23 23:43 ` Christian Lamparter 2024-08-24 9:01 ` LEROY Christophe 0 siblings, 2 replies; 17+ messages in thread From: Segher Boessenkool @ 2024-08-23 19:19 UTC (permalink / raw) To: Christoph Hellwig Cc: LEROY Christophe, Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain Hi! On Fri, Aug 23, 2024 at 03:54:59PM +0200, Christoph Hellwig wrote: > On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote: > > What does "uncached memory" even mean here? Literally it would be > > I=1 memory (uncachEABLE memory), but more likely you want M=0 memory > > here ("non-memory memory", "not well-behaved memory", MMIO often). > > Regular kernel memory vmapped with pgprot_noncached(). So, I=1 (and G=1). Caching inhibited and guarded. But M=1 (memory coherence required) as with any other real memory :-) > > If memset() is expected to be used with M=0, you cannot do any serious > > optimisations to it at all. If memset() is expected to be used with I=1 > > it should use a separate code path for it, probably the caller should > > make the distinction. > > DMA coherent memory which uses uncached memory for platforms that > do not provide hardware dma coherence can end up just about anywhere > in the kernel. We could use special routines for a few places in > the DMA subsystem, but there might be plenty of others. Yeah. It will just be plenty slow, as we see here, that's what the warning is for; but it works just fine :-) The memset() code itself could chech for the storage attributes, but that is probably more expensive than just assuming the happy case. Maybe someone could try it out though! Segher ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-23 19:19 ` Segher Boessenkool @ 2024-08-23 23:43 ` Christian Lamparter 2024-08-24 9:01 ` LEROY Christophe 1 sibling, 0 replies; 17+ messages in thread From: Christian Lamparter @ 2024-08-23 23:43 UTC (permalink / raw) To: Segher Boessenkool, Christoph Hellwig Cc: LEROY Christophe, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain On 8/23/24 9:19 PM, Segher Boessenkool wrote: > Hi! > > On Fri, Aug 23, 2024 at 03:54:59PM +0200, Christoph Hellwig wrote: >> On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote: >>> What does "uncached memory" even mean here? Literally it would be >>> I=1 memory (uncachEABLE memory), but more likely you want M=0 memory >>> here ("non-memory memory", "not well-behaved memory", MMIO often). >> >> Regular kernel memory vmapped with pgprot_noncached(). > > So, I=1 (and G=1). Caching inhibited and guarded. But M=1 (memory > coherence required) as with any other real memory :-) > >>> If memset() is expected to be used with M=0, you cannot do any serious >>> optimisations to it at all. If memset() is expected to be used with I=1 >>> it should use a separate code path for it, probably the caller should >>> make the distinction. >> >> DMA coherent memory which uses uncached memory for platforms that >> do not provide hardware dma coherence can end up just about anywhere >> in the kernel. We could use special routines for a few places in >> the DMA subsystem, but there might be plenty of others. > > Yeah. It will just be plenty slow, as we see here, that's what the > warning is for; but it works just fine :-) > > The memset() code itself could chech for the storage attributes, but > that is probably more expensive than just assuming the happy case. > Maybe someone could try it out though! Hmm, Ok! For what's worth I can at least test memset with dcbz+trap and what it was in 2015, without dcbz in the code path. How about that? I figured out of all the offenders (ethernet, crypto and sata). The sata/hard drive would be the most sensitive device to measure any performance difference. the MyBook Live already had an harddrive (Seagate ST380815AS (very old)) installed... so I went with that. I test with OpenWrt, since it has a fully working PowerPC images for the device, I can use initramfs (so HDD/SDD is idle) and provides a very bare minimum the hdparm -t "benchmark". (hdparm -t ... just reads for three seconds and tells you how much it read). the unmodified 6.6.47 kernel scored: | Timing buffered disk reads: 220 MB in 3.02 seconds = 72.93 MB/sec | Timing buffered disk reads: 222 MB in 3.02 seconds = 73.50 MB/sec | Timing buffered disk reads: 216 MB in 3.00 seconds = 71.94 MB/sec from what I can tell, each hdparm -t /dev/sda causes ~77000 fix_alignment traps. (/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size says it's 32 and type is obviously "Data". If I'm not mistaken this means ~2400KiB of emulated dcbz by the trap.) For the test, I added the "old" memset from <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/lib/copy_32.S?id=df087e450d7ddc0b15bd8824206d964720b4f5e4#n120> and replaced 6.6.47's memset in dma_pool_alloc() with it <https://elixir.bootlin.com/linux/v6.6.47/source/mm/dmapool.c#L435> now no WARNINGS are triggered and hdparm -t /dev/sda produces: | Timing buffered disk reads: 220 MB in 3.00 seconds = 73.32 MB/sec | Timing buffered disk reads: 218 MB in 3.02 seconds = 72.28 MB/sec | Timing buffered disk reads: 224 MB in 3.03 seconds = 74.02 MB/sec virtually no benefit?! Well, the HDD could be too slow. Let's try an old SSD: Samsung 840 Evo 120 GB. This one manages to read 1276 MB in 3.06 seconds = ~416 MB/sec in the same hdparm -t test on a reasonably modern PC when connected via a usb3<->sata adapter. unmodified 6.6.47 kernel: | Timing buffered disk reads: 356 MB in 3.00 seconds = 118.61 MB/sec | Timing buffered disk reads: 358 MB in 3.01 seconds = 119.12 MB/sec | Timing buffered disk reads: 358 MB in 3.01 seconds = 119.03 MB/sec modified 6.6.47 kernel: | Timing buffered disk reads: 380 MB in 3.01 seconds = 126.30 MB/sec | Timing buffered disk reads: 374 MB in 3.00 seconds = 124.61 MB/sec | Timing buffered disk reads: 382 MB in 3.02 seconds = 126.62 MB/sec Ok! There's something there. ~4%. Cheers, Christian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-23 19:19 ` Segher Boessenkool 2024-08-23 23:43 ` Christian Lamparter @ 2024-08-24 9:01 ` LEROY Christophe 2024-08-24 17:17 ` Segher Boessenkool 1 sibling, 1 reply; 17+ messages in thread From: LEROY Christophe @ 2024-08-24 9:01 UTC (permalink / raw) To: Segher Boessenkool, Christoph Hellwig Cc: Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain Le 23/08/2024 à 21:19, Segher Boessenkool a écrit : > ***ATTENTION, Sopra Steria Group cannot confirm the identity of this email sender (SPF record failure). This might be a fake email from an attacker, if you have any doubts report and delete the email.*** > > ***ATTENTION, Sopra Steria Group ne peut pas confirmer l’identité de l’émetteur de ce message (SPF record failure). Il pourrait s’agir d’un faux message, à détruire si vous avez un doute *** > > Hi! > > On Fri, Aug 23, 2024 at 03:54:59PM +0200, Christoph Hellwig wrote: >> On Fri, Aug 23, 2024 at 08:06:00AM -0500, Segher Boessenkool wrote: >>> What does "uncached memory" even mean here? Literally it would be >>> I=1 memory (uncachEABLE memory), but more likely you want M=0 memory >>> here ("non-memory memory", "not well-behaved memory", MMIO often). >> >> Regular kernel memory vmapped with pgprot_noncached(). > > So, I=1 (and G=1). Caching inhibited and guarded. But M=1 (memory > coherence required) as with any other real memory :-) > >>> If memset() is expected to be used with M=0, you cannot do any serious >>> optimisations to it at all. If memset() is expected to be used with I=1 >>> it should use a separate code path for it, probably the caller should >>> make the distinction. >> >> DMA coherent memory which uses uncached memory for platforms that >> do not provide hardware dma coherence can end up just about anywhere >> in the kernel. We could use special routines for a few places in >> the DMA subsystem, but there might be plenty of others. > > Yeah. It will just be plenty slow, as we see here, that's what the > warning is for; but it works just fine :-) > > The memset() code itself could chech for the storage attributes, but > that is probably more expensive than just assuming the happy case. > Maybe someone could try it out though! But is it only memset() the problem ? dcbz instruction is also used in: - memcpy() - csum_partial_copy_generic() - clear_page() - copy_page() - clear_user() - copy_to_user() - copy_from_user() Are these functions also used on DMA coherent memory ? Christophe ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-24 9:01 ` LEROY Christophe @ 2024-08-24 17:17 ` Segher Boessenkool 2024-08-27 7:29 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: Segher Boessenkool @ 2024-08-24 17:17 UTC (permalink / raw) To: LEROY Christophe Cc: Christoph Hellwig, Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain On Sat, Aug 24, 2024 at 09:01:33AM +0000, LEROY Christophe wrote: > Le 23/08/2024 à 21:19, Segher Boessenkool a écrit : > > The memset() code itself could chech for the storage attributes, but > > that is probably more expensive than just assuming the happy case. > > Maybe someone could try it out though! > > But is it only memset() the problem ? > > dcbz instruction is also used in: > - memcpy() > - csum_partial_copy_generic() > - clear_page() > - copy_page() > - clear_user() > - copy_to_user() > - copy_from_user() That is just a handful of functions. Not sure about the _user things, and the _page things for that matter, but the rest is certainly measurable in real-life conditions. So if we can avoid the problems completely, and cheaply, we probably should. I'm not so sure about the cheaply though :-/ > Are these functions also used on DMA coherent memory ? Most won't show up high on most profiles, heh. Which you already can see from the problem not being attacked yet: if it was so obviously a problem, some people would have wanted to do something about it :-) Segher ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode 2024-08-24 17:17 ` Segher Boessenkool @ 2024-08-27 7:29 ` Christoph Hellwig 0 siblings, 0 replies; 17+ messages in thread From: Christoph Hellwig @ 2024-08-27 7:29 UTC (permalink / raw) To: Segher Boessenkool Cc: LEROY Christophe, Christoph Hellwig, Christian Lamparter, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Stan Johnson, Finn Thain On Sat, Aug 24, 2024 at 12:17:57PM -0500, Segher Boessenkool wrote: > > Are these functions also used on DMA coherent memory ? > > Most won't show up high on most profiles, heh. Which you already > can see from the problem not being attacked yet: if it was so obviously > a problem, some people would have wanted to do something about it :-) Most drivers try to avoid coherent allocations in the fast path if they can. Another good option for Christians problem would be to switch the the dmaengine driver to use dma_alloc_pages - it doesn't actually need uncached memory as far as I can, dma_alloc_coherent is just the only API we had to allocate guaranteed DMAable memory for most of Linux's existence. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-08-27 7:29 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-09-16 14:52 [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode Christophe Leroy 2021-11-02 10:11 ` Michael Ellerman 2024-08-21 22:39 ` Christian Lamparter 2024-08-22 5:25 ` LEROY Christophe 2024-08-22 5:32 ` Christoph Hellwig 2024-08-22 6:39 ` LEROY Christophe 2024-08-22 7:14 ` Christoph Hellwig 2024-08-22 9:53 ` Benjamin Herrenschmidt 2024-08-22 18:19 ` Christian Lamparter 2024-08-23 8:07 ` Christoph Hellwig 2024-08-23 13:06 ` Segher Boessenkool 2024-08-23 13:54 ` Christoph Hellwig 2024-08-23 19:19 ` Segher Boessenkool 2024-08-23 23:43 ` Christian Lamparter 2024-08-24 9:01 ` LEROY Christophe 2024-08-24 17:17 ` Segher Boessenkool 2024-08-27 7:29 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).