LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 4.9 3/5] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
From: Sasha Levin @ 2020-12-30 13:04 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, Qinglang Miao
In-Reply-To: <20201230130447.3637694-1-sashal@kernel.org>

From: Qinglang Miao <miaoqinglang@huawei.com>

[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]

I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/sysdev/mpic_msgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
index 47fb336741d43..e26552708a281 100644
--- a/arch/powerpc/sysdev/mpic_msgr.c
+++ b/arch/powerpc/sysdev/mpic_msgr.c
@@ -196,7 +196,7 @@ static int mpic_msgr_probe(struct platform_device *dev)
 
 	/* IO map the message register block. */
 	of_address_to_resource(np, 0, &rsrc);
-	msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc));
+	msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
 	if (!msgr_block_addr) {
 		dev_err(&dev->dev, "Failed to iomap MPIC message registers");
 		return -EFAULT;
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 4.14 4/8] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
From: Sasha Levin @ 2020-12-30 13:04 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, Qinglang Miao
In-Reply-To: <20201230130436.3637579-1-sashal@kernel.org>

From: Qinglang Miao <miaoqinglang@huawei.com>

[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]

I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/sysdev/mpic_msgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
index 280e964e1aa88..497e86cfb12e0 100644
--- a/arch/powerpc/sysdev/mpic_msgr.c
+++ b/arch/powerpc/sysdev/mpic_msgr.c
@@ -196,7 +196,7 @@ static int mpic_msgr_probe(struct platform_device *dev)
 
 	/* IO map the message register block. */
 	of_address_to_resource(np, 0, &rsrc);
-	msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc));
+	msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
 	if (!msgr_block_addr) {
 		dev_err(&dev->dev, "Failed to iomap MPIC message registers");
 		return -EFAULT;
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 05/10] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
From: Sasha Levin @ 2020-12-30 13:04 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, Qinglang Miao
In-Reply-To: <20201230130422.3637448-1-sashal@kernel.org>

From: Qinglang Miao <miaoqinglang@huawei.com>

[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]

I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/sysdev/mpic_msgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
index 280e964e1aa88..497e86cfb12e0 100644
--- a/arch/powerpc/sysdev/mpic_msgr.c
+++ b/arch/powerpc/sysdev/mpic_msgr.c
@@ -196,7 +196,7 @@ static int mpic_msgr_probe(struct platform_device *dev)
 
 	/* IO map the message register block. */
 	of_address_to_resource(np, 0, &rsrc);
-	msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc));
+	msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
 	if (!msgr_block_addr) {
 		dev_err(&dev->dev, "Failed to iomap MPIC message registers");
 		return -EFAULT;
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 06/17] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
From: Sasha Levin @ 2020-12-30 13:03 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, Qinglang Miao
In-Reply-To: <20201230130357.3637261-1-sashal@kernel.org>

From: Qinglang Miao <miaoqinglang@huawei.com>

[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]

I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/sysdev/mpic_msgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
index f6b253e2be409..36ec0bdd8b63c 100644
--- a/arch/powerpc/sysdev/mpic_msgr.c
+++ b/arch/powerpc/sysdev/mpic_msgr.c
@@ -191,7 +191,7 @@ static int mpic_msgr_probe(struct platform_device *dev)
 
 	/* IO map the message register block. */
 	of_address_to_resource(np, 0, &rsrc);
-	msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc));
+	msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
 	if (!msgr_block_addr) {
 		dev_err(&dev->dev, "Failed to iomap MPIC message registers");
 		return -EFAULT;
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.10 14/31] powerpc/64: irq replay remove decrementer overflow check
From: Sasha Levin @ 2020-12-30 13:02 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, Nicholas Piggin
In-Reply-To: <20201230130314.3636961-1-sashal@kernel.org>

From: Nicholas Piggin <npiggin@gmail.com>

[ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ]

This is way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times, while MSR[EE] was
disabled.

With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.

So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long.  It's not obvious
this is a good tradeoff. This is already a watchdog magnitude event and
that situation is not improved a significantly with this check. For
large decrementers, it's useless.

Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/irq.c             | 53 ++-------------------------
 arch/powerpc/kernel/time.c            |  9 ++---
 arch/powerpc/platforms/powernv/opal.c |  2 +-
 3 files changed, 8 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 7d0f7682d01df..6b1eca53e36cc 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -102,14 +102,6 @@ static inline notrace unsigned long get_irq_happened(void)
 	return happened;
 }
 
-static inline notrace int decrementer_check_overflow(void)
-{
-	u64 now = get_tb();
-	u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
- 
-	return now >= *next_tb;
-}
-
 #ifdef CONFIG_PPC_BOOK3E
 
 /* This is called whenever we are re-enabling interrupts
@@ -142,35 +134,6 @@ notrace unsigned int __check_irq_replay(void)
 	trace_hardirqs_on();
 	trace_hardirqs_off();
 
-	/*
-	 * We are always hard disabled here, but PACA_IRQ_HARD_DIS may
-	 * not be set, which means interrupts have only just been hard
-	 * disabled as part of the local_irq_restore or interrupt return
-	 * code. In that case, skip the decrementr check becaus it's
-	 * expensive to read the TB.
-	 *
-	 * HARD_DIS then gets cleared here, but it's reconciled later.
-	 * Either local_irq_disable will replay the interrupt and that
-	 * will reconcile state like other hard interrupts. Or interrupt
-	 * retur will replay the interrupt and in that case it sets
-	 * PACA_IRQ_HARD_DIS by hand (see comments in entry_64.S).
-	 */
-	if (happened & PACA_IRQ_HARD_DIS) {
-		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-
-		/*
-		 * We may have missed a decrementer interrupt if hard disabled.
-		 * Check the decrementer register in case we had a rollover
-		 * while hard disabled.
-		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow()) {
-				local_paca->irq_happened |= PACA_IRQ_DEC;
-				happened |= PACA_IRQ_DEC;
-			}
-		}
-	}
-
 	if (happened & PACA_IRQ_DEC) {
 		local_paca->irq_happened &= ~PACA_IRQ_DEC;
 		return 0x900;
@@ -186,6 +149,9 @@ notrace unsigned int __check_irq_replay(void)
 		return 0x280;
 	}
 
+	if (happened & PACA_IRQ_HARD_DIS)
+		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+
 	/* There should be nothing left ! */
 	BUG_ON(local_paca->irq_happened != 0);
 
@@ -229,18 +195,6 @@ void replay_soft_interrupts(void)
 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 		WARN_ON_ONCE(mfmsr() & MSR_EE);
 
-	if (happened & PACA_IRQ_HARD_DIS) {
-		/*
-		 * We may have missed a decrementer interrupt if hard disabled.
-		 * Check the decrementer register in case we had a rollover
-		 * while hard disabled.
-		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow())
-				happened |= PACA_IRQ_DEC;
-		}
-	}
-
 	/*
 	 * Force the delivery of pending soft-disabled interrupts on PS3.
 	 * Any HV call will have this side effect.
@@ -345,6 +299,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 			WARN_ON_ONCE(!(mfmsr() & MSR_EE));
 		__hard_irq_disable();
+		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 	} else {
 		/*
 		 * We should already be hard disabled here. We had bugs
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 74efe46f55327..7d372ff3504b2 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -552,14 +552,11 @@ void timer_interrupt(struct pt_regs *regs)
 	struct pt_regs *old_regs;
 	u64 now;
 
-	/* Some implementations of hotplug will get timer interrupts while
-	 * offline, just ignore these and we also need to set
-	 * decrementers_next_tb as MAX to make sure __check_irq_replay
-	 * don't replay timer interrupt when return, otherwise we'll trap
-	 * here infinitely :(
+	/*
+	 * Some implementations of hotplug will get timer interrupts while
+	 * offline, just ignore these.
 	 */
 	if (unlikely(!cpu_online(smp_processor_id()))) {
-		*next_tb = ~(u64)0;
 		set_dec(decrementer_max);
 		return;
 	}
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index d95954ad4c0af..c61c3b62c8c62 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -731,7 +731,7 @@ int opal_hmi_exception_early2(struct pt_regs *regs)
 	return 1;
 }
 
-/* HMI exception handler called in virtual mode during check_irq_replay. */
+/* HMI exception handler called in virtual mode when irqs are next enabled. */
 int opal_handle_hmi_exception(struct pt_regs *regs)
 {
 	/*
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.10 06/31] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
From: Sasha Levin @ 2020-12-30 13:02 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, Qinglang Miao
In-Reply-To: <20201230130314.3636961-1-sashal@kernel.org>

From: Qinglang Miao <miaoqinglang@huawei.com>

[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]

I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/sysdev/mpic_msgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
index f6b253e2be409..36ec0bdd8b63c 100644
--- a/arch/powerpc/sysdev/mpic_msgr.c
+++ b/arch/powerpc/sysdev/mpic_msgr.c
@@ -191,7 +191,7 @@ static int mpic_msgr_probe(struct platform_device *dev)
 
 	/* IO map the message register block. */
 	of_address_to_resource(np, 0, &rsrc);
-	msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc));
+	msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
 	if (!msgr_block_addr) {
 		dev_err(&dev->dev, "Failed to iomap MPIC message registers");
 		return -EFAULT;
-- 
2.27.0


^ permalink raw reply related

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Nicholas Piggin @ 2020-12-30 11:57 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: paulmck, Arnd Bergmann, Jann Horn, Peter Zijlstra, x86,
	linux-kernel, stable, Will Deacon, Mathieu Desnoyers,
	Andy Lutomirski, Catalin Marinas, Paul Mackerras, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201230105847.GQ1551@shell.armlinux.org.uk>

Excerpts from Russell King - ARM Linux admin's message of December 30, 2020 8:58 pm:
> On Wed, Dec 30, 2020 at 10:00:28AM +0000, Russell King - ARM Linux admin wrote:
>> On Wed, Dec 30, 2020 at 12:33:02PM +1000, Nicholas Piggin wrote:
>> > Excerpts from Russell King - ARM Linux admin's message of December 29, 2020 8:44 pm:
>> > > On Tue, Dec 29, 2020 at 01:09:12PM +1000, Nicholas Piggin wrote:
>> > >> I think it should certainly be documented in terms of what guarantees
>> > >> it provides to application, _not_ the kinds of instructions it may or
>> > >> may not induce the core to execute. And if existing API can't be
>> > >> re-documented sanely, then deprecatd and new ones added that DTRT.
>> > >> Possibly under a new system call, if arch's like ARM want a range
>> > >> flush and we don't want to expand the multiplexing behaviour of
>> > >> membarrier even more (sigh).
>> > > 
>> > > The 32-bit ARM sys_cacheflush() is there only to support self-modifying
>> > > code, and takes whatever actions are necessary to support that.
>> > > Exactly what actions it takes are cache implementation specific, and
>> > > should be of no concern to the caller, but the underlying thing is...
>> > > it's to support self-modifying code.
>> > 
>> >    Caveat
>> >        cacheflush()  should  not  be used in programs intended to be portable.
>> >        On Linux, this call first appeared on the MIPS architecture, but  nowa‐
>> >        days, Linux provides a cacheflush() system call on some other architec‐
>> >        tures, but with different arguments.
>> > 
>> > What a disaster. Another badly designed interface, although it didn't 
>> > originate in Linux it sounds like we weren't to be outdone so
>> > we messed it up even worse.
>> > 
>> > flushing caches is neither necessary nor sufficient for code modification
>> > on many processors. Maybe some old MIPS specific private thing was fine,
>> > but certainly before it grew to other architectures, somebody should 
>> > have thought for more than 2 minutes about it. Sigh.
>> 
>> WARNING: You are bordering on being objectionable and offensive with
>> that comment.
>> 
>> The ARM interface was designed by me back in the very early days of
>> Linux, probably while you were still in dypers, based on what was
>> known at the time.  Back in the early 2000s, ideas such as relaxed
>> memory ordering were not known.  All there was was one level of
>> harvard cache.

I wasn't talking about memory ordering at all, and I assumed it
came earlier and was brought to Linux for portability reasons -

CONFORMING TO
       Historically, this system call was available on all MIPS UNIX  variants
       including RISC/os, IRIX, Ultrix, NetBSD, OpenBSD, and FreeBSD (and also
       on some non-UNIX MIPS operating systems), so that the existence of this
       call in MIPS operating systems is a de-facto standard.

I don't think the call was totally unreasonable for incoherent virtual 
caches or incoherent i/d caches etc. Although early unix system call interface
demonstrates that people understood how to define good interfaces that dealt
with intent at an abstract level rather than implementation -- munmap 
doesn't specify anywhere that a TLB flush instruction must be executed, 
for example. So "cacheflush" was obviously never a well designed interface 
but rather the typical hardware-centric hack to get their stuff working
(which was fine for its purpose I'm sure).

> 
> Sorry, I got that slightly wrong. Its origins on ARM were from 12 Dec
> 1998:
> 
> http://www.armlinux.org.uk/developer/patches/viewpatch.php?id=88/1
> 
> by Philip Blundell, and first appeared in the ARM
> pre-patch-2.1.131-19981214-1.gz. It was subsequently documented in the
> kernel sources by me in July 2001 in ARM patch-2.4.6-rmk2.gz. It has
> a slightly different signature: the third argument on ARM is a flags
> argument, whereas the MIPS code, it is some undocumented "cache"
> parameter.
> 
> Whether the ARM version pre or post dates the MIPS code, I couldn't say.
> Whether it was ultimately taken from the MIPS implementation, again, I
> couldn't say.

I can, it was in MIPS in late 1.3 kernels at least. I would guess it
came from IRIX.

> However, please stop insulting your fellow developers ability to think.

Sorry, I was being melodramatic. Everyone makes mistakes or decisions
which with hindsight could have been better or were under some 
constraint that isn't apparent. I shouldn't have suggested it indicated 
thoughtlessness.

Thanks,
Nick

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Russell King - ARM Linux admin @ 2020-12-30 10:58 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: paulmck, Arnd Bergmann, Jann Horn, Peter Zijlstra,
	Catalin Marinas, x86, linux-kernel, stable, Will Deacon,
	Mathieu Desnoyers, Andy Lutomirski, Paul Mackerras, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201230100028.GP1551@shell.armlinux.org.uk>

On Wed, Dec 30, 2020 at 10:00:28AM +0000, Russell King - ARM Linux admin wrote:
> On Wed, Dec 30, 2020 at 12:33:02PM +1000, Nicholas Piggin wrote:
> > Excerpts from Russell King - ARM Linux admin's message of December 29, 2020 8:44 pm:
> > > On Tue, Dec 29, 2020 at 01:09:12PM +1000, Nicholas Piggin wrote:
> > >> I think it should certainly be documented in terms of what guarantees
> > >> it provides to application, _not_ the kinds of instructions it may or
> > >> may not induce the core to execute. And if existing API can't be
> > >> re-documented sanely, then deprecatd and new ones added that DTRT.
> > >> Possibly under a new system call, if arch's like ARM want a range
> > >> flush and we don't want to expand the multiplexing behaviour of
> > >> membarrier even more (sigh).
> > > 
> > > The 32-bit ARM sys_cacheflush() is there only to support self-modifying
> > > code, and takes whatever actions are necessary to support that.
> > > Exactly what actions it takes are cache implementation specific, and
> > > should be of no concern to the caller, but the underlying thing is...
> > > it's to support self-modifying code.
> > 
> >    Caveat
> >        cacheflush()  should  not  be used in programs intended to be portable.
> >        On Linux, this call first appeared on the MIPS architecture, but  nowa‐
> >        days, Linux provides a cacheflush() system call on some other architec‐
> >        tures, but with different arguments.
> > 
> > What a disaster. Another badly designed interface, although it didn't 
> > originate in Linux it sounds like we weren't to be outdone so
> > we messed it up even worse.
> > 
> > flushing caches is neither necessary nor sufficient for code modification
> > on many processors. Maybe some old MIPS specific private thing was fine,
> > but certainly before it grew to other architectures, somebody should 
> > have thought for more than 2 minutes about it. Sigh.
> 
> WARNING: You are bordering on being objectionable and offensive with
> that comment.
> 
> The ARM interface was designed by me back in the very early days of
> Linux, probably while you were still in dypers, based on what was
> known at the time.  Back in the early 2000s, ideas such as relaxed
> memory ordering were not known.  All there was was one level of
> harvard cache.

Sorry, I got that slightly wrong. Its origins on ARM were from 12 Dec
1998:

http://www.armlinux.org.uk/developer/patches/viewpatch.php?id=88/1

by Philip Blundell, and first appeared in the ARM
pre-patch-2.1.131-19981214-1.gz. It was subsequently documented in the
kernel sources by me in July 2001 in ARM patch-2.4.6-rmk2.gz. It has
a slightly different signature: the third argument on ARM is a flags
argument, whereas the MIPS code, it is some undocumented "cache"
parameter.

Whether the ARM version pre or post dates the MIPS code, I couldn't say.
Whether it was ultimately taken from the MIPS implementation, again, I
couldn't say.

However, please stop insulting your fellow developers ability to think.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Russell King - ARM Linux admin @ 2020-12-30 10:00 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Catalin Marinas, paulmck, Arnd Bergmann, Jann Horn,
	Peter Zijlstra, x86, linux-kernel, stable, Mathieu Desnoyers,
	Andy Lutomirski, Paul Mackerras, Will Deacon, linux-arm-kernel
In-Reply-To: <1609290821.wrfh89v23a.astroid@bobo.none>

On Wed, Dec 30, 2020 at 12:33:02PM +1000, Nicholas Piggin wrote:
> Excerpts from Russell King - ARM Linux admin's message of December 29, 2020 8:44 pm:
> > On Tue, Dec 29, 2020 at 01:09:12PM +1000, Nicholas Piggin wrote:
> >> I think it should certainly be documented in terms of what guarantees
> >> it provides to application, _not_ the kinds of instructions it may or
> >> may not induce the core to execute. And if existing API can't be
> >> re-documented sanely, then deprecatd and new ones added that DTRT.
> >> Possibly under a new system call, if arch's like ARM want a range
> >> flush and we don't want to expand the multiplexing behaviour of
> >> membarrier even more (sigh).
> > 
> > The 32-bit ARM sys_cacheflush() is there only to support self-modifying
> > code, and takes whatever actions are necessary to support that.
> > Exactly what actions it takes are cache implementation specific, and
> > should be of no concern to the caller, but the underlying thing is...
> > it's to support self-modifying code.
> 
>    Caveat
>        cacheflush()  should  not  be used in programs intended to be portable.
>        On Linux, this call first appeared on the MIPS architecture, but  nowa‐
>        days, Linux provides a cacheflush() system call on some other architec‐
>        tures, but with different arguments.
> 
> What a disaster. Another badly designed interface, although it didn't 
> originate in Linux it sounds like we weren't to be outdone so
> we messed it up even worse.
> 
> flushing caches is neither necessary nor sufficient for code modification
> on many processors. Maybe some old MIPS specific private thing was fine,
> but certainly before it grew to other architectures, somebody should 
> have thought for more than 2 minutes about it. Sigh.

WARNING: You are bordering on being objectionable and offensive with
that comment.

The ARM interface was designed by me back in the very early days of
Linux, probably while you were still in dypers, based on what was
known at the time.  Back in the early 2000s, ideas such as relaxed
memory ordering were not known.  All there was was one level of
harvard cache.

So, juts shut up with your insults.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* [PATCH] ibmvnic: fix: NULL pointer dereference.
From: YANG LI @ 2020-12-30  7:23 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, YANG LI, netdev, ljp, drt, kuba, sukadev,
	linuxppc-dev, paulus

The error is due to dereference a null pointer in function
reset_one_sub_crq_queue():

if (!scrq) {
    netdev_dbg(adapter->netdev,
               "Invalid scrq reset. irq (%d) or msgs(%p).\n",
		scrq->irq, scrq->msgs);
		return -EINVAL;
}

If the expression is true, scrq must be a null pointer and cannot
dereference.

Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com>
Reported-by: Abaci <abaci@linux.alibaba.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index f302504..d7472be 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2981,9 +2981,7 @@ static int reset_one_sub_crq_queue(struct ibmvnic_adapter *adapter,
 	int rc;
 
 	if (!scrq) {
-		netdev_dbg(adapter->netdev,
-			   "Invalid scrq reset. irq (%d) or msgs (%p).\n",
-			   scrq->irq, scrq->msgs);
+		netdev_dbg(adapter->netdev, "Invalid scrq reset.\n");
 		return -EINVAL;
 	}
 
-- 
1.8.3.1


^ permalink raw reply related

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Nicholas Piggin @ 2020-12-30  2:33 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: paulmck, Arnd Bergmann, Jann Horn, Peter Zijlstra, x86,
	linux-kernel, stable, Will Deacon, Mathieu Desnoyers,
	Andy Lutomirski, Catalin Marinas, Paul Mackerras, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201229104456.GK1551@shell.armlinux.org.uk>

Excerpts from Russell King - ARM Linux admin's message of December 29, 2020 8:44 pm:
> On Tue, Dec 29, 2020 at 01:09:12PM +1000, Nicholas Piggin wrote:
>> I think it should certainly be documented in terms of what guarantees
>> it provides to application, _not_ the kinds of instructions it may or
>> may not induce the core to execute. And if existing API can't be
>> re-documented sanely, then deprecatd and new ones added that DTRT.
>> Possibly under a new system call, if arch's like ARM want a range
>> flush and we don't want to expand the multiplexing behaviour of
>> membarrier even more (sigh).
> 
> The 32-bit ARM sys_cacheflush() is there only to support self-modifying
> code, and takes whatever actions are necessary to support that.
> Exactly what actions it takes are cache implementation specific, and
> should be of no concern to the caller, but the underlying thing is...
> it's to support self-modifying code.

   Caveat
       cacheflush()  should  not  be used in programs intended to be portable.
       On Linux, this call first appeared on the MIPS architecture, but  nowa‐
       days, Linux provides a cacheflush() system call on some other architec‐
       tures, but with different arguments.

What a disaster. Another badly designed interface, although it didn't 
originate in Linux it sounds like we weren't to be outdone so
we messed it up even worse.

flushing caches is neither necessary nor sufficient for code modification
on many processors. Maybe some old MIPS specific private thing was fine,
but certainly before it grew to other architectures, somebody should 
have thought for more than 2 minutes about it. Sigh.

> 
> Sadly, because it's existed for 20+ years, and it has historically been
> sufficient for other purposes too, it has seen quite a bit of abuse
> despite its design purpose not changing - it's been used by graphics
> drivers for example. They quickly learnt the error of their ways with
> ARMv6+, since it does not do sufficient for their purposes given the
> cache architectures found there.
> 
> Let's not go around redesigning this after twenty odd years, requiring
> a hell of a lot of pain to users. This interface is called by code
> generated by GCC, so to change it you're looking at patching GCC as
> well as the kernel, and you basically will make new programs
> incompatible with older kernels - very bad news for users.

For something to be redesigned it had to have been designed in the first 
place, so there is no danger of that don't worry... But no I never 
suggested making incompatible changes to any existing system call, I 
said "re-documented". And yes I said deprecated but in Linux that really 
means kept indefinitely.

If ARM, MIPS, 68k etc programs and toolchains keep using what they are 
using it'll keep working no problem.

The point is we're growing new interfaces, and making the same mistakes. 
It's not portable (ARCH_HAS_MEMBARRIER_SYNC_CORE), it's also specified 
in terms of low level processor operations rather than higher level 
intent, and also is not sufficient for self-modifying code (without 
additional cache flush on some processors).

The application wants a call that says something like "memory modified 
before the call will be visible as instructions (including illegal 
instructions) by all threads in the program after the system call 
returns, and no threads will be subject to any effects of executing the 
previous contents of that memory.

So I think the basics are simple (although should confirm with some JIT 
and debugger etc developers, and not just Android mind you). There are 
some complications in details, address ranges, virtual/physical, thread 
local vs process vs different process or system-wide, memory ordering 
and propagation of i and d sides, etc. But that can be worked through, 
erring on the side of sanity rather than pointless micro-optmisations.

Thanks,
Nick

^ permalink raw reply

* [PATCH] powerpc/mm: add sanity check to avoid null pointer dereference
From: Defang Bo @ 2020-12-28  3:10 UTC (permalink / raw)
  To: mpe, benh, paulus, christophe.leroy, akpm, geert, rppt,
	linuxppc-dev, linux-kernel
  Cc: Defang Bo

Similar to commit<0dc294f717d4>("powerpc/mm: bail out early when flushing TLB page"),
there should be a check for 'mm' to prevent Null pointer dereference
in case of 'mm' argument was legitimately passed.

Signed-off-by: Defang Bo <bodefang@126.com>
---
 arch/powerpc/mm/nohash/tlb.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 5872f69..1d89335 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -192,6 +192,9 @@ void local_flush_tlb_mm(struct mm_struct *mm)
 {
 	unsigned int pid;
 
+	if (WARN_ON(!mm))
+		return;
+
 	preempt_disable();
 	pid = mm->context.id;
 	if (pid != MMU_NO_CONTEXT)
@@ -205,8 +208,11 @@ void __local_flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
 {
 	unsigned int pid;
 
+	if (WARN_ON(!mm))
+		return;
+
 	preempt_disable();
-	pid = mm ? mm->context.id : 0;
+	pid = mm->context.id;
 	if (pid != MMU_NO_CONTEXT)
 		_tlbil_va(vmaddr, pid, tsize, ind);
 	preempt_enable();
@@ -268,6 +274,9 @@ void flush_tlb_mm(struct mm_struct *mm)
 {
 	unsigned int pid;
 
+	if (WARN_ON(!mm))
+		return;
+
 	preempt_disable();
 	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
-- 
2.7.4


^ permalink raw reply related

* [PATCH] powerpc/mm: avoid null pointer dereference in flush_tlb_mm
From: Defang Bo @ 2020-12-25  7:11 UTC (permalink / raw)
  To: mpe, benh, paulus
  Cc: linux-kernel, penberg, Defang Bo, akpm, linuxppc-dev, rppt

Similar to commit<0dc294f7>, there should be a check for 'mm' to prevent Null pointer dereference in case of 'mm' argument was legitimately passed.

Signed-off-by: Defang Bo <bodefang@126.com>
---
 arch/powerpc/mm/nohash/tlb.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 5872f69..09796c9 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -268,6 +268,9 @@ void flush_tlb_mm(struct mm_struct *mm)
 {
 	unsigned int pid;
 
+	if (unlikely(!mm))
+		return;
+
 	preempt_disable();
 	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
-- 
2.7.4


^ permalink raw reply related

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Russell King - ARM Linux admin @ 2020-12-29 10:44 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: paulmck, Arnd Bergmann, Jann Horn, Peter Zijlstra, x86,
	linux-kernel, stable, Will Deacon, Mathieu Desnoyers,
	Andy Lutomirski, Catalin Marinas, Paul Mackerras, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <1609210162.4d8dqilke6.astroid@bobo.none>

On Tue, Dec 29, 2020 at 01:09:12PM +1000, Nicholas Piggin wrote:
> I think it should certainly be documented in terms of what guarantees
> it provides to application, _not_ the kinds of instructions it may or
> may not induce the core to execute. And if existing API can't be
> re-documented sanely, then deprecatd and new ones added that DTRT.
> Possibly under a new system call, if arch's like ARM want a range
> flush and we don't want to expand the multiplexing behaviour of
> membarrier even more (sigh).

The 32-bit ARM sys_cacheflush() is there only to support self-modifying
code, and takes whatever actions are necessary to support that.
Exactly what actions it takes are cache implementation specific, and
should be of no concern to the caller, but the underlying thing is...
it's to support self-modifying code.

Sadly, because it's existed for 20+ years, and it has historically been
sufficient for other purposes too, it has seen quite a bit of abuse
despite its design purpose not changing - it's been used by graphics
drivers for example. They quickly learnt the error of their ways with
ARMv6+, since it does not do sufficient for their purposes given the
cache architectures found there.

Let's not go around redesigning this after twenty odd years, requiring
a hell of a lot of pain to users. This interface is called by code
generated by GCC, so to change it you're looking at patching GCC as
well as the kernel, and you basically will make new programs
incompatible with older kernels - very bad news for users.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* unsubscribe
From: Shawn Landden @ 2020-12-29  8:54 UTC (permalink / raw)
  To: linuxppc-dev



-- 
Shawn Landden


^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Nicholas Piggin @ 2020-12-29  3:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Arnd Bergmann, X86 ML, LKML, stable, Will Deacon,
	Mathieu Desnoyers, Catalin Marinas, Paul Mackerras, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <CALCETrX4v1KEf6ikVtFg6juh3Z_esJ-+6PLT1A21JJeTVh2k8g@mail.gmail.com>

Excerpts from Andy Lutomirski's message of December 29, 2020 10:36 am:
> On Mon, Dec 28, 2020 at 4:11 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Excerpts from Andy Lutomirski's message of December 28, 2020 4:28 am:
>> > The old sync_core_before_usermode() comments said that a non-icache-syncing
>> > return-to-usermode instruction is x86-specific and that all other
>> > architectures automatically notice cross-modified code on return to
>> > userspace.  Based on my general understanding of how CPUs work and based on
>> > my atttempt to read the ARM manual, this is not true at all.  In fact, x86
>> > seems to be a bit of an anomaly in the other direction: x86's IRET is
>> > unusually heavyweight for a return-to-usermode instruction.
>>
>> "sync_core_before_usermode" as I've said says nothing to arch, or to the
>> scheduler, or to membarrier.
> 
> Agreed.  My patch tries to fix this.  I agree that the name is bad and
> could be improved further.  We should define what
> membarrier(...SYNC_CORE) actually does and have arch hooks to make it
> happen.
> 
>> > So let's drop any pretense that we can have a generic way implementation
>> > behind membarrier's SYNC_CORE flush and require all architectures that opt
>> > in to supply their own.  This means x86, arm64, and powerpc for now.  Let's
>> > also rename the function from sync_core_before_usermode() to
>> > membarrier_sync_core_before_usermode() because the precise flushing details
>> > may very well be specific to membarrier, and even the concept of
>> > "sync_core" in the kernel is mostly an x86-ism.
>>
>> The concept of "sync_core" (x86: serializing instruction, powerpc: context
>> synchronizing instruction, etc) is not an x86-ism at all. x86 just wanted
>> to add a serializing instruction to generic code so it grew this nasty API,
>> but the concept applies broadly.
> 
> I mean that the mapping from the name "sync_core" to its semantics is
> x86 only.  The string "sync_core" appears in the kernel only in
> arch/x86, membarrier code, membarrier docs, and a single SGI driver
> that is x86-only.  Sure, the idea of serializing things is fairly
> generic, but exactly what operations serialize what, when things need
> serialization, etc is quite architecture specific.

Okay, well yes it's x86 only in name, I was more talking about the 
concept.

> Heck, on 486 you serialize the instruction stream with JMP.

x86-specific aside, I did think the semantics of a "serializing 
instruction" was reasonably well architected in x86. Sure it could do 
other things as well, but if you executed a serializing instruction,
then you had a decent set of guarantees (e.g., what you might want
for code modification).

> 
>> > +static inline void membarrier_sync_core_before_usermode(void)
>> > +{
>> > +     /*
>> > +      * XXX: I know basically nothing about powerpc cache management.
>> > +      * Is this correct?
>> > +      */
>> > +     isync();
>>
>> This is not about memory ordering or cache management, it's about
>> pipeline management. Powerpc's return to user mode serializes the
>> CPU (aka the hardware thread, _not_ the core; another wrongness of
>> the name, but AFAIKS the HW thread is what is required for
>> membarrier). So this is wrong, powerpc needs nothing here.
> 
> Fair enough.  I'm happy to defer to you on the powerpc details.  In
> any case, this just illustrates that we need feedback from a person
> who knows more about ARM64 than I do.
> 

Thanks,
Nick

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Nicholas Piggin @ 2020-12-29  3:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: paulmck, Arnd Bergmann, Jann Horn, Peter Zijlstra, x86,
	Russell King, ARM Linux, stable, linux-kernel, Will Deacon,
	Mathieu Desnoyers, Catalin Marinas, Paul Mackerras, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <CALCETrX6MOqmN5_jhyO1jJB7M3_T+hbomjxPYZLJmLVNmXAVzA@mail.gmail.com>

Excerpts from Andy Lutomirski's message of December 29, 2020 10:56 am:
> On Mon, Dec 28, 2020 at 4:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Excerpts from Andy Lutomirski's message of December 29, 2020 7:06 am:
>> > On Mon, Dec 28, 2020 at 12:32 PM Mathieu Desnoyers
>> > <mathieu.desnoyers@efficios.com> wrote:
>> >>
>> >> ----- On Dec 28, 2020, at 2:44 PM, Andy Lutomirski luto@kernel.org wrote:
>> >>
>> >> > On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
>> >> > <linux@armlinux.org.uk> wrote:
>> >> >>
>> >> >> On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
>> >> >> > After chatting with rmk about this (but without claiming that any of
>> >> >> > this is his opinion), based on the manpage, I think membarrier()
>> >> >> > currently doesn't really claim to be synchronizing caches? It just
>> >> >> > serializes cores. So arguably if userspace wants to use membarrier()
>> >> >> > to synchronize code changes, userspace should first do the code
>> >> >> > change, then flush icache as appropriate for the architecture, and
>> >> >> > then do the membarrier() to ensure that the old code is unused?
>> >>
>> >> ^ exactly, yes.
>> >>
>> >> >> >
>> >> >> > For 32-bit arm, rmk pointed out that that would be the cacheflush()
>> >> >> > syscall. That might cause you to end up with two IPIs instead of one
>> >> >> > in total, but we probably don't care _that_ much about extra IPIs on
>> >> >> > 32-bit arm?
>> >>
>> >> This was the original thinking, yes. The cacheflush IPI will flush specific
>> >> regions of code, and the membarrier IPI issues context synchronizing
>> >> instructions.
>>
>> APIs should be written in terms of the service they provide to
>> userspace, and in highest level terms as possible, rather than directing
>> hardware to do some low level operation. Unfortunately we're stuck with
>> this for now. We could deprecate it and replace it though.
>>
>> If userspace wants to modify code and ensure that after the system call
>> returns then no other thread will be executing the previous code, then
>> there should be an API for that. It could actually combine the two IPIs
>> into one for architectures that require both too.
> 
> I agree.  The membarrier API for SYNC_CORE is pretty nasty.  I would
> much prefer a real API for JITs to use.
> 
>>
>> >>
>> >> Architectures with coherent i/d caches don't need the cacheflush step.
>> >
>> > There are different levels of coherency -- VIVT architectures may have
>> > differing requirements compared to PIPT, etc.
>> >
>> > In any case, I feel like the approach taken by the documentation is
>> > fundamentally confusing.  Architectures don't all speak the same
>> > language  How about something like:
>> >
>> > The SYNC_CORE operation causes all threads in the caller's address
>> > space (including the caller) to execute an architecture-defined
>> > barrier operation.  membarrier() will ensure that this barrier is
>> > executed at a time such that all data writes done by the calling
>> > thread before membarrier() are made visible by the barrier.
>> > Additional architecture-dependent cache management operations may be
>> > required to use this for JIT code.
>>
>> As said this isn't what SYNC_CORE does, and it's not what powerpc
>> context synchronizing instructions do either, it will very much
>> re-order visibility of stores around such an instruction.
> 
> Perhaps the docs should be entirely arch-specific.  It may well be
> impossible to state what it does in an arch-neutral way.

I think what I wrote above -- after the call returns, none of the
threads in the process will be executing instructions overwritten
previously by the caller (provided their i-caches are made coherent
with the caller's modifications).

>> A thread completes store instructions into a store queue, which is
>> as far as a context synchronizing event goes. Visibility comes at
>> some indeterminite time later.
> 
> As currently implemented, it has the same visibility semantics as
> regular membarrier, too.

Ah I actually missed that SYNC_CORE is in _addition_ to the memory
barriers, and that's documented API too, not just implementation
sorry.

> So if I do:
> 
> a = 1;
> membarrier(SYNC_CORE);
> b = 1;
> 
> and another thread does:
> 
> while (READ_ONCE(b) != 1)
>   ;
> barrier();
> assert(a == 1);

Right that's true, due to the MEMBARRIER_CMD_PRIVATE_EXPEDITED. Neither
that barrier or the added SYNC_CORE behaviour imply visibility though.

> 
> then the assertion will pass.  Similarly, one can do this, I hope:
> 
> memcpy(codeptr, [some new instructions], len);
> arch_dependent_cache_flush(codeptr, len);
> membarrier(SYNC_CORE);
> ready = 1;
> 
> and another thread does:
> 
> while (READ_ONCE(ready) != 1)
>   ;
> barrier();
> (*codeptr)();
> 
> arch_dependent_cache_flush is a nop on x86.  On arm and arm64, it
> appears to be a syscall, although maybe arm64 can do it from
> userspace.  I still don't know what it is on powerpc.
> 
> Even using the term "cache" here is misleading.  x86 chips have all
> kinds of barely-documented instruction caches, and they have varying
> degrees of coherency.  The architecture actually promises that, if you
> do a certain incantation, then you get the desired result.
> membarrier() allows a user to do this incantation.  But trying to
> replicate the incantation verbatim on an architecture like ARM is
> insufficient, and trying to flush the things that are documented as
> being caches on x86 is expensive and a complete waste of time on x86.
> When you write some JIT code, you do *not* want to flush it all the
> way to main memory, especially on CPUs don't have the ability to write
> back invalidating.  (That's most CPUs.)
> 
> Even on x86, I suspect that the various decoded insn caches are rather
> more coherent than documented, and I have questions in to Intel about
> this.  No answers yet.
> 
> So perhaps the right approach is to say that membarrier() helps you
> perform the architecture-specific sequence of steps needed to safely
> modify code.  On x86, you use it like this.  On arm64, you do this
> other thing.  On powerpc, you do something else.

I think it should certainly be documented in terms of what guarantees
it provides to application, _not_ the kinds of instructions it may or
may not induce the core to execute. And if existing API can't be
re-documented sanely, then deprecatd and new ones added that DTRT.
Possibly under a new system call, if arch's like ARM want a range
flush and we don't want to expand the multiplexing behaviour of
membarrier even more (sigh).

> 
>>
>> I would be surprised if x86's serializing instructions were different
>> than powerpc. rdtsc ordering or flushing stores to cache would be
>> surprising.
>>
> 
> At the very least, x86 has several levels of what ARM might call
> "context synchronization" AFAICT.  STAC, CLAC, and POPF do a form of
> context synchronization in that the changes they cause to the MMU take
> effect immediately, but they are not documented as synchronizing the
> instruction stream.  "Serializing" instructions do all kinds of
> things, not all of which may be architecturally visible at all.
> MFENCE and LFENCE do various complicated things, and LFENCE has magic
> retroactive capabilities on old CPUs that were not documented when
> those CPUs were released.  SFENCE does a different form of
> synchronization entirely.  LOCK does something else.  (The
> relationship between LOCK and MFENCE is confusing at best.)  RDTSC
> doesn't serialize anything at all, but RDTSCP does provide a form of
> serialization that's kind of ilke LFENCE.  It's a mess.  Even the
> manuals are inconsistent about what "serialize" means.  JMP has its
> own magic on x86, but only on very very old CPUs.
> 
> The specific instruction that flushes everything into the coherency
> domain is SFENCE, and SFENCE is not, for normal purposes, needed for
> self- or cross-modifying code.
> 

Good reason to avoid such language in the system call interface!

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH 4/5] ibmvfc: complete commands outside the host/queue lock
From: kernel test robot @ 2020-12-29  2:29 UTC (permalink / raw)
  To: Tyrel Datwyler, james.bottomley
  Cc: Tyrel Datwyler, kbuild-all, martin.petersen, linux-scsi,
	linux-kernel, Brian King, brking, linuxppc-dev
In-Reply-To: <20201218231916.279833-5-tyreld@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 5008 bytes --]

Hi Tyrel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on next-20201218]
[also build test WARNING on v5.11-rc1]
[cannot apply to powerpc/next v5.10 v5.10-rc7 v5.10-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tyrel-Datwyler/ibmvfc-MQ-preparatory-locking-work/20201219-072334
base:    0d52778b8710eb11cb616761a02aee0a7fd60425
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/2609a2691baaf1f06d8306a56575ae2eb076f625
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tyrel-Datwyler/ibmvfc-MQ-preparatory-locking-work/20201219-072334
        git checkout 2609a2691baaf1f06d8306a56575ae2eb076f625
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/scsi/ibmvscsi/ibmvfc.c:1418:6: warning: no previous prototype for 'ibmvfc_locked_done' [-Wmissing-prototypes]
    1418 | void ibmvfc_locked_done(struct ibmvfc_event *evt)
         |      ^~~~~~~~~~~~~~~~~~
   In file included from arch/powerpc/include/asm/paca.h:15,
                    from arch/powerpc/include/asm/current.h:13,
                    from include/linux/thread_info.h:21,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/powerpc/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:6,
                    from include/linux/umh.h:4,
                    from include/linux/kmod.h:9,
                    from include/linux/module.h:16,
                    from drivers/scsi/ibmvscsi/ibmvfc.c:10:
   In function 'strncpy',
       inlined from 'ibmvfc_gather_partition_info' at drivers/scsi/ibmvscsi/ibmvfc.c:1267:3,
       inlined from 'ibmvfc_npiv_login' at drivers/scsi/ibmvscsi/ibmvfc.c:4538:2:
   include/linux/string.h:291:30: warning: '__builtin_strncpy' specified bound 97 equals destination size [-Wstringop-truncation]
     291 | #define __underlying_strncpy __builtin_strncpy
         |                              ^
   include/linux/string.h:301:9: note: in expansion of macro '__underlying_strncpy'
     301 |  return __underlying_strncpy(p, q, size);
         |         ^~~~~~~~~~~~~~~~~~~~
   In function 'strncpy',
       inlined from 'ibmvfc_set_login_info' at drivers/scsi/ibmvscsi/ibmvfc.c:1307:2,
       inlined from 'ibmvfc_npiv_login' at drivers/scsi/ibmvscsi/ibmvfc.c:4539:2:
   include/linux/string.h:291:30: warning: '__builtin_strncpy' specified bound 256 equals destination size [-Wstringop-truncation]
     291 | #define __underlying_strncpy __builtin_strncpy
         |                              ^
   include/linux/string.h:301:9: note: in expansion of macro '__underlying_strncpy'
     301 |  return __underlying_strncpy(p, q, size);
         |         ^~~~~~~~~~~~~~~~~~~~
   In function 'strncpy',
       inlined from 'ibmvfc_set_login_info' at drivers/scsi/ibmvscsi/ibmvfc.c:1312:2,
       inlined from 'ibmvfc_npiv_login' at drivers/scsi/ibmvscsi/ibmvfc.c:4539:2:
   include/linux/string.h:291:30: warning: '__builtin_strncpy' specified bound 256 equals destination size [-Wstringop-truncation]
     291 | #define __underlying_strncpy __builtin_strncpy
         |                              ^
   include/linux/string.h:301:9: note: in expansion of macro '__underlying_strncpy'
     301 |  return __underlying_strncpy(p, q, size);
         |         ^~~~~~~~~~~~~~~~~~~~


vim +/ibmvfc_locked_done +1418 drivers/scsi/ibmvscsi/ibmvfc.c

  1409	
  1410	/**
  1411	 * ibmvfc_locked_done - Calls evt completion with host_lock held
  1412	 * @evt:	ibmvfc evt to complete
  1413	 *
  1414	 * All non-scsi command completion callbacks have the expectation that the
  1415	 * host_lock is held. This callback is used by ibmvfc_init_event to wrap a
  1416	 * MAD evt with the host_lock.
  1417	 **/
> 1418	void ibmvfc_locked_done(struct ibmvfc_event *evt)
  1419	{
  1420		unsigned long flags;
  1421	
  1422		spin_lock_irqsave(evt->vhost->host->host_lock, flags);
  1423		evt->_done(evt);
  1424		spin_unlock_irqrestore(evt->vhost->host->host_lock, flags);
  1425	}
  1426	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 72494 bytes --]

^ permalink raw reply

* [PATCH] tools/perf: Fix powerpc gap between kernel end and module start
From: Athira Rajeev @ 2020-12-29  2:14 UTC (permalink / raw)
  To: mpe, acme, jolsa; +Cc: kjain, maddy, linuxppc-dev

Running "perf mem report" in TUI mode fails with ENOMEM message
in powerpc:

failed to process sample

Running with debug and verbose options points that issue is while
allocating memory for sample histograms.

The error path is:
symbol__inc_addr_samples -> __symbol__inc_addr_samples
-> annotated_source__histogram

symbol__inc_addr_samples calls annotated_source__alloc_histograms
to allocate memory for sample histograms using calloc. Here calloc fails
since the size of symbol is huge. The size of a symbol is calculated as
difference between its start and end address.

Example histogram allocation that fails is:
sym->name is _end, sym->start is 0xc0000000027a0000, sym->end is
0xc008000003890000, symbol__size(sym) is 0x80000010f0000

In above case, difference between sym->start (0xc0000000027a0000)
and sym->end (0xc008000003890000) is huge.

This is same problem as in s390 and arm64 which are fixed in commits:
'commit b9c0a64901d5 ("perf annotate: Fix s390 gap between kernel end
and module start")'
'commit 78886f3ed37e ("perf symbols: Fix arm64 gap between kernel start
and module end")'

When this symbol was read first, its start and end address was set to
address which matches with data from /proc/kallsyms.

After symbol__new:
symbol__new: _end 0xc0000000027a0000-0xc0000000027a0000

From /proc/kallsyms:
...
c000000002799370 b backtrace_flag
c000000002799378 B radix_tree_node_cachep
c000000002799380 B __bss_stop
c0000000027a0000 B _end
c008000003890000 t icmp_checkentry      [ip_tables]
c008000003890038 t ipt_alloc_initial_table      [ip_tables]
c008000003890468 T ipt_do_table [ip_tables]
c008000003890de8 T ipt_unregister_table_pre_exit        [ip_tables]
...

Perf calls function symbols__fixup_end() which sets the end of symbol
to 0xc008000003890000, which is the next address and this is the start
address of first module (icmp_checkentry in above) which will make the
huge symbol size of 0x80000010f0000.

After symbols__fixup_end:
symbols__fixup_end: sym->name: _end, sym->start: 0xc0000000027a0000,
sym->end: 0xc008000003890000

On powerpc, kernel text segment is located at 0xc000000000000000
whereas the modules are located at very high memory addresses,
0xc00800000xxxxxxx. Since the gap between end of kernel text segment
and beginning of first module's address is high, histogram allocation
using calloc fails.

Fix this by detecting the kernel's last symbol and limiting
the range of last kernel symbol to pagesize.

Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/powerpc/util/Build     |  1 +
 tools/perf/arch/powerpc/util/machine.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/machine.c

diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index e86e210bf514..b7945e5a543b 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,4 +1,5 @@
 perf-y += header.o
+perf-y += machine.o
 perf-y += kvm-stat.o
 perf-y += perf_regs.o
 perf-y += mem-events.o
diff --git a/tools/perf/arch/powerpc/util/machine.c b/tools/perf/arch/powerpc/util/machine.c
new file mode 100644
index 000000000000..c30e5cc88c16
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/machine.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <string.h>
+#include <internal/lib.h> // page_size
+#include "debug.h"
+#include "symbol.h"
+
+/* On powerpc kernel text segment start at memory addresses, 0xc000000000000000
+ * whereas the modules are located at very high memory addresses,
+ * for example 0xc00800000xxxxxxx. The gap between end of kernel text segment
+ * and beginning of first module's text segment is very high.
+ * Therefore do not fill this gap and do not assign it to the kernel dso map.
+ */
+
+void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
+{
+	if (strchr(p->name, '[') == NULL && strchr(c->name, '['))
+		/* Limit the range of last kernel symbol */
+		p->end += page_size;
+	else
+		p->end = c->start;
+	pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
+}
-- 
1.8.3.1


^ permalink raw reply related

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-29  0:56 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: paulmck, Arnd Bergmann, Jann Horn, Peter Zijlstra, x86,
	Russell King, ARM Linux, stable, linux-kernel, Will Deacon,
	Mathieu Desnoyers, Andy Lutomirski, Catalin Marinas,
	Paul Mackerras, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1609200902.me5niwm2t6.astroid@bobo.none>

On Mon, Dec 28, 2020 at 4:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Excerpts from Andy Lutomirski's message of December 29, 2020 7:06 am:
> > On Mon, Dec 28, 2020 at 12:32 PM Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >>
> >> ----- On Dec 28, 2020, at 2:44 PM, Andy Lutomirski luto@kernel.org wrote:
> >>
> >> > On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
> >> > <linux@armlinux.org.uk> wrote:
> >> >>
> >> >> On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
> >> >> > After chatting with rmk about this (but without claiming that any of
> >> >> > this is his opinion), based on the manpage, I think membarrier()
> >> >> > currently doesn't really claim to be synchronizing caches? It just
> >> >> > serializes cores. So arguably if userspace wants to use membarrier()
> >> >> > to synchronize code changes, userspace should first do the code
> >> >> > change, then flush icache as appropriate for the architecture, and
> >> >> > then do the membarrier() to ensure that the old code is unused?
> >>
> >> ^ exactly, yes.
> >>
> >> >> >
> >> >> > For 32-bit arm, rmk pointed out that that would be the cacheflush()
> >> >> > syscall. That might cause you to end up with two IPIs instead of one
> >> >> > in total, but we probably don't care _that_ much about extra IPIs on
> >> >> > 32-bit arm?
> >>
> >> This was the original thinking, yes. The cacheflush IPI will flush specific
> >> regions of code, and the membarrier IPI issues context synchronizing
> >> instructions.
>
> APIs should be written in terms of the service they provide to
> userspace, and in highest level terms as possible, rather than directing
> hardware to do some low level operation. Unfortunately we're stuck with
> this for now. We could deprecate it and replace it though.
>
> If userspace wants to modify code and ensure that after the system call
> returns then no other thread will be executing the previous code, then
> there should be an API for that. It could actually combine the two IPIs
> into one for architectures that require both too.

I agree.  The membarrier API for SYNC_CORE is pretty nasty.  I would
much prefer a real API for JITs to use.

>
> >>
> >> Architectures with coherent i/d caches don't need the cacheflush step.
> >
> > There are different levels of coherency -- VIVT architectures may have
> > differing requirements compared to PIPT, etc.
> >
> > In any case, I feel like the approach taken by the documentation is
> > fundamentally confusing.  Architectures don't all speak the same
> > language  How about something like:
> >
> > The SYNC_CORE operation causes all threads in the caller's address
> > space (including the caller) to execute an architecture-defined
> > barrier operation.  membarrier() will ensure that this barrier is
> > executed at a time such that all data writes done by the calling
> > thread before membarrier() are made visible by the barrier.
> > Additional architecture-dependent cache management operations may be
> > required to use this for JIT code.
>
> As said this isn't what SYNC_CORE does, and it's not what powerpc
> context synchronizing instructions do either, it will very much
> re-order visibility of stores around such an instruction.

Perhaps the docs should be entirely arch-specific.  It may well be
impossible to state what it does in an arch-neutral way.

>
> A thread completes store instructions into a store queue, which is
> as far as a context synchronizing event goes. Visibility comes at
> some indeterminite time later.

As currently implemented, it has the same visibility semantics as
regular membarrier, too.  So if I do:

a = 1;
membarrier(SYNC_CORE);
b = 1;

and another thread does:

while (READ_ONCE(b) != 1)
  ;
barrier();
assert(a == 1);

then the assertion will pass.  Similarly, one can do this, I hope:

memcpy(codeptr, [some new instructions], len);
arch_dependent_cache_flush(codeptr, len);
membarrier(SYNC_CORE);
ready = 1;

and another thread does:

while (READ_ONCE(ready) != 1)
  ;
barrier();
(*codeptr)();

arch_dependent_cache_flush is a nop on x86.  On arm and arm64, it
appears to be a syscall, although maybe arm64 can do it from
userspace.  I still don't know what it is on powerpc.

Even using the term "cache" here is misleading.  x86 chips have all
kinds of barely-documented instruction caches, and they have varying
degrees of coherency.  The architecture actually promises that, if you
do a certain incantation, then you get the desired result.
membarrier() allows a user to do this incantation.  But trying to
replicate the incantation verbatim on an architecture like ARM is
insufficient, and trying to flush the things that are documented as
being caches on x86 is expensive and a complete waste of time on x86.
When you write some JIT code, you do *not* want to flush it all the
way to main memory, especially on CPUs don't have the ability to write
back invalidating.  (That's most CPUs.)

Even on x86, I suspect that the various decoded insn caches are rather
more coherent than documented, and I have questions in to Intel about
this.  No answers yet.

So perhaps the right approach is to say that membarrier() helps you
perform the architecture-specific sequence of steps needed to safely
modify code.  On x86, you use it like this.  On arm64, you do this
other thing.  On powerpc, you do something else.

>
> I would be surprised if x86's serializing instructions were different
> than powerpc. rdtsc ordering or flushing stores to cache would be
> surprising.
>

At the very least, x86 has several levels of what ARM might call
"context synchronization" AFAICT.  STAC, CLAC, and POPF do a form of
context synchronization in that the changes they cause to the MMU take
effect immediately, but they are not documented as synchronizing the
instruction stream.  "Serializing" instructions do all kinds of
things, not all of which may be architecturally visible at all.
MFENCE and LFENCE do various complicated things, and LFENCE has magic
retroactive capabilities on old CPUs that were not documented when
those CPUs were released.  SFENCE does a different form of
synchronization entirely.  LOCK does something else.  (The
relationship between LOCK and MFENCE is confusing at best.)  RDTSC
doesn't serialize anything at all, but RDTSCP does provide a form of
serialization that's kind of ilke LFENCE.  It's a mess.  Even the
manuals are inconsistent about what "serialize" means.  JMP has its
own magic on x86, but only on very very old CPUs.

The specific instruction that flushes everything into the coherency
domain is SFENCE, and SFENCE is not, for normal purposes, needed for
self- or cross-modifying code.

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Nicholas Piggin @ 2020-12-29  0:36 UTC (permalink / raw)
  To: Andy Lutomirski, Mathieu Desnoyers
  Cc: paulmck, Arnd Bergmann, Jann Horn, Peter Zijlstra, x86,
	Russell King, ARM Linux, stable, linux-kernel, Will Deacon,
	Paul Mackerras, Catalin Marinas, linuxppc-dev, linux-arm-kernel
In-Reply-To: <CALCETrXx3Xe+4Y6WM-mp0cTUU=r3bW6PV2b25yA8bm1Gvak6wQ@mail.gmail.com>

Excerpts from Andy Lutomirski's message of December 29, 2020 7:06 am:
> On Mon, Dec 28, 2020 at 12:32 PM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> ----- On Dec 28, 2020, at 2:44 PM, Andy Lutomirski luto@kernel.org wrote:
>>
>> > On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
>> > <linux@armlinux.org.uk> wrote:
>> >>
>> >> On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
>> >> > After chatting with rmk about this (but without claiming that any of
>> >> > this is his opinion), based on the manpage, I think membarrier()
>> >> > currently doesn't really claim to be synchronizing caches? It just
>> >> > serializes cores. So arguably if userspace wants to use membarrier()
>> >> > to synchronize code changes, userspace should first do the code
>> >> > change, then flush icache as appropriate for the architecture, and
>> >> > then do the membarrier() to ensure that the old code is unused?
>>
>> ^ exactly, yes.
>>
>> >> >
>> >> > For 32-bit arm, rmk pointed out that that would be the cacheflush()
>> >> > syscall. That might cause you to end up with two IPIs instead of one
>> >> > in total, but we probably don't care _that_ much about extra IPIs on
>> >> > 32-bit arm?
>>
>> This was the original thinking, yes. The cacheflush IPI will flush specific
>> regions of code, and the membarrier IPI issues context synchronizing
>> instructions.

APIs should be written in terms of the service they provide to 
userspace, and in highest level terms as possible, rather than directing 
hardware to do some low level operation. Unfortunately we're stuck with
this for now. We could deprecate it and replace it though.

If userspace wants to modify code and ensure that after the system call
returns then no other thread will be executing the previous code, then
there should be an API for that. It could actually combine the two IPIs
into one for architectures that require both too. 

>>
>> Architectures with coherent i/d caches don't need the cacheflush step.
> 
> There are different levels of coherency -- VIVT architectures may have
> differing requirements compared to PIPT, etc.
> 
> In any case, I feel like the approach taken by the documentation is
> fundamentally confusing.  Architectures don't all speak the same
> language  How about something like:
> 
> The SYNC_CORE operation causes all threads in the caller's address
> space (including the caller) to execute an architecture-defined
> barrier operation.  membarrier() will ensure that this barrier is
> executed at a time such that all data writes done by the calling
> thread before membarrier() are made visible by the barrier.
> Additional architecture-dependent cache management operations may be
> required to use this for JIT code.

As said this isn't what SYNC_CORE does, and it's not what powerpc
context synchronizing instructions do either, it will very much
re-order visibility of stores around such an instruction.

A thread completes store instructions into a store queue, which is
as far as a context synchronizing event goes. Visibility comes at
some indeterminite time later.

Also note that the regular membarrier syscall which does order
memory also does not make writes visible, it just ensures an
ordering.

> 
> x86: SYNC_CORE executes a barrier that will cause subsequent
> instruction fetches to observe prior writes.  Currently this will be a

I would be surprised if x86's serializing instructions were different 
than powerpc. rdtsc ordering or flushing stores to cache would be
surprising.

Thanks,
Nick

> "serializing" instruction, but, if future improved CPU documentation
> becomes available and relaxes this requirement, the barrier may
> change.  The kernel guarantees that writing new or modified
> instructions to normal memory (and issuing SFENCE if the writes were
> non-temporal) then doing a membarrier SYNC_CORE operation is
> sufficient to cause all threads in the caller's address space to
> execute the new or modified instructions.  This is true regardless of
> whether or not those instructions are written at the same virtual
> address from which they are subsequently executed.  No additional
> cache management is required on x86.
> 
> arm: Something about the cache management syscalls.
> 
> arm64: Ditto
> 
> powerpc: I have no idea.
> 

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-29  0:36 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Arnd Bergmann, X86 ML, LKML, stable, Will Deacon,
	Mathieu Desnoyers, Andy Lutomirski, Catalin Marinas,
	Paul Mackerras, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1609199804.yrsu9vagzk.astroid@bobo.none>

On Mon, Dec 28, 2020 at 4:11 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Excerpts from Andy Lutomirski's message of December 28, 2020 4:28 am:
> > The old sync_core_before_usermode() comments said that a non-icache-syncing
> > return-to-usermode instruction is x86-specific and that all other
> > architectures automatically notice cross-modified code on return to
> > userspace.  Based on my general understanding of how CPUs work and based on
> > my atttempt to read the ARM manual, this is not true at all.  In fact, x86
> > seems to be a bit of an anomaly in the other direction: x86's IRET is
> > unusually heavyweight for a return-to-usermode instruction.
>
> "sync_core_before_usermode" as I've said says nothing to arch, or to the
> scheduler, or to membarrier.

Agreed.  My patch tries to fix this.  I agree that the name is bad and
could be improved further.  We should define what
membarrier(...SYNC_CORE) actually does and have arch hooks to make it
happen.

> > So let's drop any pretense that we can have a generic way implementation
> > behind membarrier's SYNC_CORE flush and require all architectures that opt
> > in to supply their own.  This means x86, arm64, and powerpc for now.  Let's
> > also rename the function from sync_core_before_usermode() to
> > membarrier_sync_core_before_usermode() because the precise flushing details
> > may very well be specific to membarrier, and even the concept of
> > "sync_core" in the kernel is mostly an x86-ism.
>
> The concept of "sync_core" (x86: serializing instruction, powerpc: context
> synchronizing instruction, etc) is not an x86-ism at all. x86 just wanted
> to add a serializing instruction to generic code so it grew this nasty API,
> but the concept applies broadly.

I mean that the mapping from the name "sync_core" to its semantics is
x86 only.  The string "sync_core" appears in the kernel only in
arch/x86, membarrier code, membarrier docs, and a single SGI driver
that is x86-only.  Sure, the idea of serializing things is fairly
generic, but exactly what operations serialize what, when things need
serialization, etc is quite architecture specific.

Heck, on 486 you serialize the instruction stream with JMP.

> > +static inline void membarrier_sync_core_before_usermode(void)
> > +{
> > +     /*
> > +      * XXX: I know basically nothing about powerpc cache management.
> > +      * Is this correct?
> > +      */
> > +     isync();
>
> This is not about memory ordering or cache management, it's about
> pipeline management. Powerpc's return to user mode serializes the
> CPU (aka the hardware thread, _not_ the core; another wrongness of
> the name, but AFAIKS the HW thread is what is required for
> membarrier). So this is wrong, powerpc needs nothing here.

Fair enough.  I'm happy to defer to you on the powerpc details.  In
any case, this just illustrates that we need feedback from a person
who knows more about ARM64 than I do.

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-29  0:30 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Catalin Marinas, Arnd Bergmann, paulmck, Peter Zijlstra, x86,
	linux-kernel, Nicholas Piggin, Russell King, ARM Linux,
	Will Deacon, Paul Mackerras, stable, Andy Lutomirski,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <1670059472.3671.1609189779376.JavaMail.zimbra@efficios.com>

On Mon, Dec 28, 2020 at 1:09 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Dec 27, 2020, at 4:36 PM, Andy Lutomirski luto@kernel.org wrote:
>
> [...]
>
> >> You seem to have noticed odd cases on arm64 where this guarantee does not
> >> match reality. Where exactly can we find this in the code, and which part
> >> of the architecture manual can you point us to which supports your concern ?
> >>
> >> Based on the notes I have, use of `eret` on aarch64 guarantees a context
> >> synchronizing
> >> instruction when returning to user-space.
> >
> > Based on my reading of the manual, ERET on ARM doesn't synchronize
> > anything at all.  I can't find any evidence that it synchronizes data
> > or instructions, and I've seen reports that the CPU will happily
> > speculate right past it.
>
> Reading [1] there appears to be 3 kind of context synchronization events:
>
> - Taking an exception,
> - Returning from an exception,
> - ISB.

My reading of [1] is that all three of these are "context
synchronization event[s]", but that only ISB flushes the pipeline,
etc.  The little description of context synchronization seems to
suggest that it only implies that certain register changes become
effective.

>
> This other source [2] adds (search for Context synchronization operation):
>
> - Exit from Debug state
> - Executing a DCPS instruction
> - Executing a DRPS instruction
>
> "ERET" falls into the second kind of events, and AFAIU should be context
> synchronizing. That was confirmed to me by Will Deacon when membarrier
> sync-core was implemented for aarch64. If the architecture reference manuals
> are wrong, is there an errata ?
>
> As for the algorithm to use on ARMv8 to update instructions, see [2]
> B2.3.4  Implication of caches for the application programmer
> "Synchronization and coherency issues between data and instruction accesses"

This specifically discusses ISB.

Let's wait for an actual ARM64 expert to chime in, though.

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Nicholas Piggin @ 2020-12-29  0:11 UTC (permalink / raw)
  To: Andy Lutomirski, Mathieu Desnoyers, x86
  Cc: Arnd Bergmann, LKML, stable, Will Deacon, Paul Mackerras,
	Catalin Marinas, linuxppc-dev, linux-arm-kernel
In-Reply-To: <bf59ecb5487171a852bcc8cdd553ec797aedc485.1609093476.git.luto@kernel.org>

Excerpts from Andy Lutomirski's message of December 28, 2020 4:28 am:
> The old sync_core_before_usermode() comments said that a non-icache-syncing
> return-to-usermode instruction is x86-specific and that all other
> architectures automatically notice cross-modified code on return to
> userspace.  Based on my general understanding of how CPUs work and based on
> my atttempt to read the ARM manual, this is not true at all.  In fact, x86
> seems to be a bit of an anomaly in the other direction: x86's IRET is
> unusually heavyweight for a return-to-usermode instruction.

"sync_core_before_usermode" as I've said says nothing to arch, or to the 
scheduler, or to membarrier. It's badly named to start with so if 
renaming it it should be something else. exit_lazy_tlb() at least says
something quite precise to scheudler and arch code that implements
the membarrier.

But I don't mind the idea of just making it x86 specific if as you say the
arch code can detect lazy mm switches more precisely than generic and 
you want to do that.

> So let's drop any pretense that we can have a generic way implementation
> behind membarrier's SYNC_CORE flush and require all architectures that opt
> in to supply their own.  This means x86, arm64, and powerpc for now.  Let's
> also rename the function from sync_core_before_usermode() to
> membarrier_sync_core_before_usermode() because the precise flushing details
> may very well be specific to membarrier, and even the concept of
> "sync_core" in the kernel is mostly an x86-ism.

The concept of "sync_core" (x86: serializing instruction, powerpc: context
synchronizing instruction, etc) is not an x86-ism at all. x86 just wanted
to add a serializing instruction to generic code so it grew this nasty API,
but the concept applies broadly.

> 
> I admit that I'm rather surprised that the code worked at all on arm64,
> and I'm suspicious that it has never been very well tested.  My apologies
> for not reviewing this more carefully in the first place.


> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: x86@kernel.org
> Cc: stable@vger.kernel.org
> Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> 
> Hi arm64 and powerpc people-
> 
> This is part of a series here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fixes
> 
> Before I send out the whole series, I'm hoping that some arm64 and powerpc
> people can help me verify that I did this patch right.  Once I get
> some feedback on this patch, I'll send out the whole pile.  And once
> *that's* done, I'll start giving the mm lazy stuff some serious thought.
> 
> The x86 part is already fixed in Linus' tree.
> 
> Thanks,
> Andy
> 
>  arch/arm64/include/asm/sync_core.h   | 21 +++++++++++++++++++++
>  arch/powerpc/include/asm/sync_core.h | 20 ++++++++++++++++++++
>  arch/x86/Kconfig                     |  1 -
>  arch/x86/include/asm/sync_core.h     |  7 +++----
>  include/linux/sched/mm.h             |  1 -
>  include/linux/sync_core.h            | 21 ---------------------
>  init/Kconfig                         |  3 ---
>  kernel/sched/membarrier.c            | 15 +++++++++++----
>  8 files changed, 55 insertions(+), 34 deletions(-)
>  create mode 100644 arch/arm64/include/asm/sync_core.h
>  create mode 100644 arch/powerpc/include/asm/sync_core.h
>  delete mode 100644 include/linux/sync_core.h
> 
> diff --git a/arch/arm64/include/asm/sync_core.h b/arch/arm64/include/asm/sync_core.h
> new file mode 100644
> index 000000000000..5be4531caabd
> --- /dev/null
> +++ b/arch/arm64/include/asm/sync_core.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_ARM64_SYNC_CORE_H
> +#define _ASM_ARM64_SYNC_CORE_H
> +
> +#include <asm/barrier.h>
> +
> +/*
> + * Ensure that the CPU notices any instruction changes before the next time
> + * it returns to usermode.
> + */
> +static inline void membarrier_sync_core_before_usermode(void)
> +{
> +	/*
> +	 * XXX: is this enough or do we need a DMB first to make sure that
> +	 * writes from other CPUs become visible to this CPU?  We have an
> +	 * smp_mb() already, but that's not quite the same thing.
> +	 */
> +	isb();
> +}
> +
> +#endif /* _ASM_ARM64_SYNC_CORE_H */
> diff --git a/arch/powerpc/include/asm/sync_core.h b/arch/powerpc/include/asm/sync_core.h
> new file mode 100644
> index 000000000000..71dfbe7794e5
> --- /dev/null
> +++ b/arch/powerpc/include/asm/sync_core.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_SYNC_CORE_H
> +#define _ASM_POWERPC_SYNC_CORE_H
> +
> +#include <asm/barrier.h>
> +
> +/*
> + * Ensure that the CPU notices any instruction changes before the next time
> + * it returns to usermode.
> + */
> +static inline void membarrier_sync_core_before_usermode(void)
> +{
> +	/*
> +	 * XXX: I know basically nothing about powerpc cache management.
> +	 * Is this correct?
> +	 */
> +	isync();

This is not about memory ordering or cache management, it's about 
pipeline management. Powerpc's return to user mode serializes the
CPU (aka the hardware thread, _not_ the core; another wrongness of
the name, but AFAIKS the HW thread is what is required for
membarrier). So this is wrong, powerpc needs nothing here.

Thanks,
Nick


^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Mathieu Desnoyers @ 2020-12-28 21:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linuxppc-dev, Catalin Marinas, Arnd Bergmann, paulmck, Jann Horn,
	Peter Zijlstra, x86, Russell King, ARM Linux, Nicholas Piggin,
	linux-kernel, Paul Mackerras, stable, Will Deacon,
	linux-arm-kernel
In-Reply-To: <CALCETrXx3Xe+4Y6WM-mp0cTUU=r3bW6PV2b25yA8bm1Gvak6wQ@mail.gmail.com>

----- On Dec 28, 2020, at 4:06 PM, Andy Lutomirski luto@kernel.org wrote:

> On Mon, Dec 28, 2020 at 12:32 PM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> ----- On Dec 28, 2020, at 2:44 PM, Andy Lutomirski luto@kernel.org wrote:
>>
>> > On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
>> > <linux@armlinux.org.uk> wrote:
>> >>
>> >> On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
>> >> > After chatting with rmk about this (but without claiming that any of
>> >> > this is his opinion), based on the manpage, I think membarrier()
>> >> > currently doesn't really claim to be synchronizing caches? It just
>> >> > serializes cores. So arguably if userspace wants to use membarrier()
>> >> > to synchronize code changes, userspace should first do the code
>> >> > change, then flush icache as appropriate for the architecture, and
>> >> > then do the membarrier() to ensure that the old code is unused?
>>
>> ^ exactly, yes.
>>
>> >> >
>> >> > For 32-bit arm, rmk pointed out that that would be the cacheflush()
>> >> > syscall. That might cause you to end up with two IPIs instead of one
>> >> > in total, but we probably don't care _that_ much about extra IPIs on
>> >> > 32-bit arm?
>>
>> This was the original thinking, yes. The cacheflush IPI will flush specific
>> regions of code, and the membarrier IPI issues context synchronizing
>> instructions.
>>
>> Architectures with coherent i/d caches don't need the cacheflush step.
> 
> There are different levels of coherency -- VIVT architectures may have
> differing requirements compared to PIPT, etc.
> 
> In any case, I feel like the approach taken by the documentation is
> fundamentally confusing.  Architectures don't all speak the same
> language

Agreed.

> How about something like:

I dislike the wording "barrier" and the association between "write" and
"instruction fetch" done in the descriptions below. It leads to think that
this behaves like a memory barrier, when in fact my understanding of
a context synchronizing instruction is that it simply flushes internal
CPU state, which would cause coherency issues if the CPU observes both the
old and then the new code without having this state flushed.

[ Sorry if I take more time to reply and if my replies are a bit more
  concise than usual. I'm currently on parental leave, so I have
  non-maskable interrupts to attend to. ;-) ]

Thanks,

Mathieu

> 
> The SYNC_CORE operation causes all threads in the caller's address
> space (including the caller) to execute an architecture-defined
> barrier operation.  membarrier() will ensure that this barrier is
> executed at a time such that all data writes done by the calling
> thread before membarrier() are made visible by the barrier.
> Additional architecture-dependent cache management operations may be
> required to use this for JIT code.
> 
> x86: SYNC_CORE executes a barrier that will cause subsequent
> instruction fetches to observe prior writes.  Currently this will be a
> "serializing" instruction, but, if future improved CPU documentation
> becomes available and relaxes this requirement, the barrier may
> change.  The kernel guarantees that writing new or modified
> instructions to normal memory (and issuing SFENCE if the writes were
> non-temporal) then doing a membarrier SYNC_CORE operation is
> sufficient to cause all threads in the caller's address space to
> execute the new or modified instructions.  This is true regardless of
> whether or not those instructions are written at the same virtual
> address from which they are subsequently executed.  No additional
> cache management is required on x86.
> 
> arm: Something about the cache management syscalls.
> 
> arm64: Ditto
> 
> powerpc: I have no idea.

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox