LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] linux: configure CONFIG_I2C_OPAL as in-built.
From: Joel Stanley @ 2020-09-29  6:14 UTC (permalink / raw)
  To: Mimi Zohar, linuxppc-dev; +Cc: openpower-firmware, Nayna Jain, klaus
In-Reply-To: <8dc1ad002dcdc02122725dcc3ba27e1fd8c78b78.camel@linux.ibm.com>

On Fri, 25 Sep 2020 at 18:19, Mimi Zohar <zohar@linux.ibm.com> wrote:
>
> Hi Nayna,
>
> On Wed, 2020-09-23 at 14:25 -0400, Nayna Jain wrote:
> > Currently, skiroot_defconfig CONFIG_I2C_OPAL is built as a loadable
> > module rather than builtin, even if CONFIG_I2C=y is defined. This
> > results in a delay in the TPM initialization, causing IMA to go into
> > TPM bypass mode. As a result, the IMA measurements are added to the
> > measurement list, but do not extend the TPM. Because of this, it is
> > impossible to verify or attest to the system's integrity, either from
> > skiroot or the target Host OS.
>
> The patch description is good, but perhaps we could provide a bit more
> context before.
>
> The concept of trusted boot requires the measurement to be added to the
> measurement list and extend the TPM, prior to allowing access to the
> file. By allowing access to a file before its measurement is included
> in the measurement list and extended into the TPM PCR, a malicious file
> could potentially prevent its own measurement from being added. As the
> PCRs are tamper proof, measuring and extending the TPM prior to giving
> access to the file, guarantees that all file measurements are included
> in the measurement list, including the malicious file.
>
> IMA needs to be enabled before any files are accessed in order to
> verify a file's integrity and extend the TPM with the file
> measurement.  Queueing file measurements breaks the measure and extend,
> before usage, trusted boot paradigm.
>
> The ima-evm-utils package includes a test for walking the IMA
> measurement list, calculating the expected TPM PCRs, and comparing the
> calculated PCR values with the physical TPM.  Testing is important to
> ensure the TPM is initialized prior to IMA.  Failure to validate the
> IMA measurement list may indicate IMA went into TPM bypass mode, like
> in this case.

Thanks for the explanation Mimi. It's lucky that the TPM drivers can
be loaded early enough!

Should we add something like this to security/integrity/ima/Kconfig?

select I2C_OPAL if PPC_POWERNV

It's generally frowned upon to select user visible symbols, but IMA
does this for the TCG options already.

Cheers,

Joel

>
> thanks,
>
> Mimi
>
> >
> > Reported-by: Mimi Zohar <zohar@linux.ibm.com>
> > Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
> > ---
> >  openpower/configs/linux/skiroot_defconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/openpower/configs/linux/skiroot_defconfig b/openpower/configs/linux/skiroot_defconfig
> > index 44309e12..a555adb2 100644
> > --- a/openpower/configs/linux/skiroot_defconfig
> > +++ b/openpower/configs/linux/skiroot_defconfig
> > @@ -216,7 +216,7 @@ CONFIG_I2C=y
> >  CONFIG_I2C_CHARDEV=y
> >  # CONFIG_I2C_HELPER_AUTO is not set
> >  CONFIG_I2C_ALGOBIT=y
> > -CONFIG_I2C_OPAL=m
> > +CONFIG_I2C_OPAL=y
> >  CONFIG_PPS=y
> >  CONFIG_SENSORS_IBMPOWERNV=m
> >  CONFIG_DRM=m
>
>

^ permalink raw reply

* [PATCH v2 6/7] powerpc: Tidy up a bit after removal of PowerPC 601.
From: Christophe Leroy @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <27951fa6c9a8f80724d1bc81a6117ac32343a55d.1601359702.git.christophe.leroy@csgroup.eu>

The removal of the 601 left some standalone blocks from
former if/else. Drop the { } and re-indent.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/btext.c    | 11 +++------
 arch/powerpc/mm/book3s32/mmu.c | 45 +++++++++++++++-------------------
 2 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c
index b609fb39dba8..c22a8e0dbc93 100644
--- a/arch/powerpc/kernel/btext.c
+++ b/arch/powerpc/kernel/btext.c
@@ -95,13 +95,10 @@ void __init btext_prepare_BAT(void)
 		boot_text_mapped = 0;
 		return;
 	}
-	{
-		/* 603, 604, G3, G4, ... */
-		lowbits = addr & ~0xFF000000UL;
-		addr &= 0xFF000000UL;
-		disp_BAT[0] = vaddr | (BL_16M<<2) | 2;
-		disp_BAT[1] = addr | (_PAGE_NO_CACHE | _PAGE_GUARDED | BPP_RW);	
-	}
+	lowbits = addr & ~0xFF000000UL;
+	addr &= 0xFF000000UL;
+	disp_BAT[0] = vaddr | (BL_16M<<2) | 2;
+	disp_BAT[1] = addr | (_PAGE_NO_CACHE | _PAGE_GUARDED | BPP_RW);
 	logicalDisplayBase = (void *) (vaddr + lowbits);
 }
 #endif
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 771d607f1a3d..741e4fc990c7 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -73,16 +73,13 @@ unsigned long p_block_mapped(phys_addr_t pa)
 static int find_free_bat(void)
 {
 	int b;
+	int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
 
-	{
-		int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
+	for (b = 0; b < n; b++) {
+		struct ppc_bat *bat = BATS[b];
 
-		for (b = 0; b < n; b++) {
-			struct ppc_bat *bat = BATS[b];
-
-			if (!(bat[1].batu & 3))
-				return b;
-		}
+		if (!(bat[1].batu & 3))
+			return b;
 	}
 	return -1;
 }
@@ -280,24 +277,22 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys,
 		flags &= ~_PAGE_COHERENT;
 
 	bl = (size >> 17) - 1;
-	{
-		/* Do DBAT first */
-		wimgxpp = flags & (_PAGE_WRITETHRU | _PAGE_NO_CACHE
-				   | _PAGE_COHERENT | _PAGE_GUARDED);
-		wimgxpp |= (flags & _PAGE_RW)? BPP_RW: BPP_RX;
-		bat[1].batu = virt | (bl << 2) | 2; /* Vs=1, Vp=0 */
-		bat[1].batl = BAT_PHYS_ADDR(phys) | wimgxpp;
-		if (flags & _PAGE_USER)
-			bat[1].batu |= 1; 	/* Vp = 1 */
-		if (flags & _PAGE_GUARDED) {
-			/* G bit must be zero in IBATs */
-			flags &= ~_PAGE_EXEC;
-		}
-		if (flags & _PAGE_EXEC)
-			bat[0] = bat[1];
-		else
-			bat[0].batu = bat[0].batl = 0;
+	/* Do DBAT first */
+	wimgxpp = flags & (_PAGE_WRITETHRU | _PAGE_NO_CACHE
+			   | _PAGE_COHERENT | _PAGE_GUARDED);
+	wimgxpp |= (flags & _PAGE_RW)? BPP_RW: BPP_RX;
+	bat[1].batu = virt | (bl << 2) | 2; /* Vs=1, Vp=0 */
+	bat[1].batl = BAT_PHYS_ADDR(phys) | wimgxpp;
+	if (flags & _PAGE_USER)
+		bat[1].batu |= 1; 	/* Vp = 1 */
+	if (flags & _PAGE_GUARDED) {
+		/* G bit must be zero in IBATs */
+		flags &= ~_PAGE_EXEC;
 	}
+	if (flags & _PAGE_EXEC)
+		bat[0] = bat[1];
+	else
+		bat[0].batu = bat[0].batl = 0;
 
 	bat_addrs[index].start = virt;
 	bat_addrs[index].limit = virt + ((bl + 1) << 17) - 1;
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 7/7] powerpc: Remove __USE_RTC()
From: Christophe Leroy @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <27951fa6c9a8f80724d1bc81a6117ac32343a55d.1601359702.git.christophe.leroy@csgroup.eu>

Now that PowerPC 601 is gone, __USE_RTC() is never true.

Remove it.

That also leads to removing get_rtc() and get_rtcl()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Also remove get_rtc() and get_rtcl()
---
 arch/powerpc/include/asm/time.h | 28 +-----------------
 arch/powerpc/kernel/time.c      | 52 +++++----------------------------
 2 files changed, 9 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index ce065589192a..caf68a4bc19e 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -38,9 +38,6 @@ struct div_result {
 	u64 result_low;
 };
 
-/* Accessor functions for the timebase (RTC on 601) registers. */
-#define __USE_RTC()	(0)
-
 #ifdef CONFIG_PPC64
 
 /* For compatibility, get_tbl() is defined as get_tb() on ppc64 */
@@ -59,25 +56,6 @@ static inline unsigned int get_tbu(void)
 }
 #endif /* !CONFIG_PPC64 */
 
-static inline unsigned int get_rtcl(void)
-{
-	unsigned int rtcl;
-
-	asm volatile("mfrtcl %0" : "=r" (rtcl));
-	return rtcl;
-}
-
-static inline u64 get_rtc(void)
-{
-	unsigned int hi, lo, hi2;
-
-	do {
-		asm volatile("mfrtcu %0; mfrtcl %1; mfrtcu %2"
-			     : "=r" (hi), "=r" (lo), "=r" (hi2));
-	} while (hi2 != hi);
-	return (u64)hi * 1000000000 + lo;
-}
-
 static inline u64 get_vtb(void)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -109,7 +87,7 @@ static inline u64 get_tb(void)
 
 static inline u64 get_tb_or_rtc(void)
 {
-	return __USE_RTC() ? get_rtc() : get_tb();
+	return get_tb();
 }
 
 static inline void set_tb(unsigned int upper, unsigned int lower)
@@ -153,10 +131,6 @@ static inline void set_dec(u64 val)
 
 static inline unsigned long tb_ticks_since(unsigned long tstamp)
 {
-	if (__USE_RTC()) {
-		int delta = get_rtcl() - (unsigned int) tstamp;
-		return delta < 0 ? delta + 1000000000 : delta;
-	}
 	return get_tbl() - tstamp;
 }
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index f85539ebb513..13c820c15d37 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -75,15 +75,6 @@
 #include <linux/clockchips.h>
 #include <linux/timekeeper_internal.h>
 
-static u64 rtc_read(struct clocksource *);
-static struct clocksource clocksource_rtc = {
-	.name         = "rtc",
-	.rating       = 400,
-	.flags        = CLOCK_SOURCE_IS_CONTINUOUS,
-	.mask         = CLOCKSOURCE_MASK(64),
-	.read         = rtc_read,
-};
-
 static u64 timebase_read(struct clocksource *);
 static struct clocksource clocksource_timebase = {
 	.name         = "timebase",
@@ -447,19 +438,9 @@ void vtime_flush(struct task_struct *tsk)
 void __delay(unsigned long loops)
 {
 	unsigned long start;
-	int diff;
 
 	spin_begin();
-	if (__USE_RTC()) {
-		start = get_rtcl();
-		do {
-			/* the RTCL register wraps at 1000000000 */
-			diff = get_rtcl() - start;
-			if (diff < 0)
-				diff += 1000000000;
-			spin_cpu_relax();
-		} while (diff < loops);
-	} else if (tb_invalid) {
+	if (tb_invalid) {
 		/*
 		 * TB is in error state and isn't ticking anymore.
 		 * HMI handler was unable to recover from TB error.
@@ -696,8 +677,6 @@ EXPORT_SYMBOL_GPL(tb_to_ns);
  */
 notrace unsigned long long sched_clock(void)
 {
-	if (__USE_RTC())
-		return get_rtc();
 	return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
 }
 
@@ -847,11 +826,6 @@ void read_persistent_clock64(struct timespec64 *ts)
 }
 
 /* clocksource code */
-static notrace u64 rtc_read(struct clocksource *cs)
-{
-	return (u64)get_rtc();
-}
-
 static notrace u64 timebase_read(struct clocksource *cs)
 {
 	return (u64)get_tb();
@@ -948,12 +922,7 @@ void update_vsyscall_tz(void)
 
 static void __init clocksource_init(void)
 {
-	struct clocksource *clock;
-
-	if (__USE_RTC())
-		clock = &clocksource_rtc;
-	else
-		clock = &clocksource_timebase;
+	struct clocksource *clock = &clocksource_timebase;
 
 	if (clocksource_register_hz(clock, tb_ticks_per_sec)) {
 		printk(KERN_ERR "clocksource: %s is already registered\n",
@@ -1071,17 +1040,12 @@ void __init time_init(void)
 	u64 scale;
 	unsigned shift;
 
-	if (__USE_RTC()) {
-		/* 601 processor: dec counts down by 128 every 128ns */
-		ppc_tb_freq = 1000000000;
-	} else {
-		/* Normal PowerPC with timebase register */
-		ppc_md.calibrate_decr();
-		printk(KERN_DEBUG "time_init: decrementer frequency = %lu.%.6lu MHz\n",
-		       ppc_tb_freq / 1000000, ppc_tb_freq % 1000000);
-		printk(KERN_DEBUG "time_init: processor frequency   = %lu.%.6lu MHz\n",
-		       ppc_proc_freq / 1000000, ppc_proc_freq % 1000000);
-	}
+	/* Normal PowerPC with timebase register */
+	ppc_md.calibrate_decr();
+	printk(KERN_DEBUG "time_init: decrementer frequency = %lu.%.6lu MHz\n",
+	       ppc_tb_freq / 1000000, ppc_tb_freq % 1000000);
+	printk(KERN_DEBUG "time_init: processor frequency   = %lu.%.6lu MHz\n",
+	       ppc_proc_freq / 1000000, ppc_proc_freq % 1000000);
 
 	tb_ticks_per_jiffy = ppc_tb_freq / HZ;
 	tb_ticks_per_sec = ppc_tb_freq;
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 2/7] powerpc: Remove CONFIG_PPC601_SYNC_FIX
From: Christophe Leroy @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <27951fa6c9a8f80724d1bc81a6117ac32343a55d.1601359702.git.christophe.leroy@csgroup.eu>

This config option isn't in any defconfig.

The very first versions of Powerpc 601 have a bug which
requires additional sync before and/or after some instructions.

This was more than 25 years ago and time has come to retire
those buggy versions of the 601 from the kernel.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/ppc_asm.h |  6 ------
 arch/powerpc/platforms/Kconfig     | 15 ---------------
 2 files changed, 21 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index b4cc6608131c..0b9dc814b81c 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -382,15 +382,9 @@ GLUE(.,name):
 #endif
 
 /* various errata or part fixups */
-#ifdef CONFIG_PPC601_SYNC_FIX
-#define SYNC		sync; isync
-#define SYNC_601	sync
-#define ISYNC_601	isync
-#else
 #define	SYNC
 #define SYNC_601
 #define ISYNC_601
-#endif
 
 #if defined(CONFIG_PPC_CELL) || defined(CONFIG_PPC_FSL_BOOK3E)
 #define MFTB(dest)			\
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index b439b027a42f..7a5e8f4541e3 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -199,21 +199,6 @@ source "drivers/cpuidle/Kconfig"
 
 endmenu
 
-config PPC601_SYNC_FIX
-	bool "Workarounds for PPC601 bugs"
-	depends on PPC_BOOK3S_601 && PPC_PMAC
-	default y
-	help
-	  Some versions of the PPC601 (the first PowerPC chip) have bugs which
-	  mean that extra synchronization instructions are required near
-	  certain instructions, typically those that make major changes to the
-	  CPU state.  These extra instructions reduce performance slightly.
-	  If you say N here, these extra instructions will not be included,
-	  resulting in a kernel which will run faster but may not run at all
-	  on some systems with the PPC601 chip.
-
-	  If in doubt, say Y here.
-
 config TAU
 	bool "On-chip CPU temperature sensor support"
 	depends on PPC_BOOK3S_32
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 5/7] powerpc: Remove support for PowerPC 601
From: Christophe Leroy @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <27951fa6c9a8f80724d1bc81a6117ac32343a55d.1601359702.git.christophe.leroy@csgroup.eu>

PowerPC 601 has been retired.

Remove all associated specific code.

CPU_FTRS_PPC601 has CPU_FTR_COHERENT_ICACHE and CPU_FTR_COMMON.

CPU_FTR_COMMON is already present via other CPU_FTRS.
None of the remaining CPU selects CPU_FTR_COHERENT_ICACHE.

So CPU_FTRS_PPC601 can be removed from the possible features,
hence can be removed completely.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/boot/util.S                | 15 +------
 arch/powerpc/include/asm/cputable.h     | 12 ++---
 arch/powerpc/include/asm/ppc_asm.h      |  3 +-
 arch/powerpc/include/asm/ptrace.h       |  4 --
 arch/powerpc/include/asm/time.h         |  2 +-
 arch/powerpc/include/asm/timex.h        |  3 --
 arch/powerpc/kernel/btext.c             |  8 +---
 arch/powerpc/kernel/entry_32.S          | 18 --------
 arch/powerpc/kernel/head_32.S           | 44 ++----------------
 arch/powerpc/kernel/setup_32.c          |  2 +-
 arch/powerpc/kernel/traps.c             |  4 --
 arch/powerpc/kernel/vdso32/datapage.S   |  2 -
 arch/powerpc/kernel/vdso32/vdso32.lds.S |  2 -
 arch/powerpc/mm/book3s32/mmu.c          | 39 +++-------------
 arch/powerpc/mm/ptdump/bats.c           | 59 -------------------------
 arch/powerpc/platforms/powermac/setup.c |  2 +-
 arch/powerpc/platforms/powermac/smp.c   |  4 --
 17 files changed, 17 insertions(+), 206 deletions(-)

diff --git a/arch/powerpc/boot/util.S b/arch/powerpc/boot/util.S
index f11f0589a669..d03cdb7606dc 100644
--- a/arch/powerpc/boot/util.S
+++ b/arch/powerpc/boot/util.S
@@ -18,7 +18,7 @@
 
 	.text
 
-/* udelay (on non-601 processors) needs to know the period of the
+/* udelay needs to know the period of the
  * timebase in nanoseconds.  This used to be hardcoded to be 60ns
  * (period of 66MHz/4).  Now a variable is used that is initialized to
  * 60 for backward compatibility, but it can be overridden as necessary
@@ -37,19 +37,6 @@ timebase_period_ns:
  */
 	.globl	udelay
 udelay:
-	mfspr	r4,SPRN_PVR
-	srwi	r4,r4,16
-	cmpwi	0,r4,1		/* 601 ? */
-	bne	.Ludelay_not_601
-00:	li	r0,86	/* Instructions / microsecond? */
-	mtctr	r0
-10:	addi	r0,r0,0 /* NOP */
-	bdnz	10b
-	subic.	r3,r3,1
-	bne	00b
-	blr
-
-.Ludelay_not_601:
 	mulli	r4,r3,1000	/* nanoseconds */
 	/*  Change r4 to be the number of ticks using:
 	 *	(nanoseconds + (timebase_period_ns - 1 )) / timebase_period_ns
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 8ca5885bd5b9..0d10ac3328ca 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -300,8 +300,6 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_MAYBE_CAN_NAP	0
 #endif
 
-#define CPU_FTRS_PPC601	(CPU_FTR_COMMON | \
-	CPU_FTR_COHERENT_ICACHE)
 #define CPU_FTRS_603	(CPU_FTR_COMMON | CPU_FTR_MAYBE_CAN_DOZE | \
 	    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE | CPU_FTR_NOEXECUTE)
 #define CPU_FTRS_604	(CPU_FTR_COMMON | CPU_FTR_PPC_LE)
@@ -517,10 +515,8 @@ static inline void cpu_feature_keys_init(void) { }
 #else
 enum {
 	CPU_FTRS_POSSIBLE =
-#ifdef CONFIG_PPC_BOOK3S_601
-	    CPU_FTRS_PPC601 |
-#elif defined(CONFIG_PPC_BOOK3S_32)
-	    CPU_FTRS_PPC601 | CPU_FTRS_603 | CPU_FTRS_604 | CPU_FTRS_740_NOTAU |
+#ifdef CONFIG_PPC_BOOK3S_32
+	    CPU_FTRS_603 | CPU_FTRS_604 | CPU_FTRS_740_NOTAU |
 	    CPU_FTRS_740 | CPU_FTRS_750 | CPU_FTRS_750FX1 |
 	    CPU_FTRS_750FX2 | CPU_FTRS_750FX | CPU_FTRS_750GX |
 	    CPU_FTRS_7400_NOTAU | CPU_FTRS_7400 | CPU_FTRS_7450_20 |
@@ -595,9 +591,7 @@ enum {
 #else
 enum {
 	CPU_FTRS_ALWAYS =
-#ifdef CONFIG_PPC_BOOK3S_601
-	    CPU_FTRS_PPC601 &
-#elif defined(CONFIG_PPC_BOOK3S_32)
+#ifdef CONFIG_PPC_BOOK3S_32
 	    CPU_FTRS_603 & CPU_FTRS_604 & CPU_FTRS_740_NOTAU &
 	    CPU_FTRS_740 & CPU_FTRS_750 & CPU_FTRS_750FX1 &
 	    CPU_FTRS_750FX2 & CPU_FTRS_750FX & CPU_FTRS_750GX &
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 67a421b81a50..511786f0e40d 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -401,8 +401,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)
 #define MFTBU(dest)			mfspr dest, SPRN_TBRU
 #endif
 
-/* tlbsync is not implemented on 601 */
-#if !defined(CONFIG_SMP) || defined(CONFIG_PPC_BOOK3S_601)
+#ifndef CONFIG_SMP
 #define TLBSYNC
 #else
 #define TLBSYNC		tlbsync; sync
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 155a197c0aa1..e2c778c176a3 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -243,11 +243,7 @@ static inline void set_trap_norestart(struct pt_regs *regs)
 }
 
 #define arch_has_single_step()	(1)
-#ifndef CONFIG_PPC_BOOK3S_601
 #define arch_has_block_step()	(true)
-#else
-#define arch_has_block_step()	(false)
-#endif
 #define ARCH_HAS_USER_SINGLE_STEP_REPORT
 
 /*
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index cb326720a8a1..ce065589192a 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -39,7 +39,7 @@ struct div_result {
 };
 
 /* Accessor functions for the timebase (RTC on 601) registers. */
-#define __USE_RTC()	(IS_ENABLED(CONFIG_PPC_BOOK3S_601))
+#define __USE_RTC()	(0)
 
 #ifdef CONFIG_PPC64
 
diff --git a/arch/powerpc/include/asm/timex.h b/arch/powerpc/include/asm/timex.h
index 6047402b0a4d..95988870a57b 100644
--- a/arch/powerpc/include/asm/timex.h
+++ b/arch/powerpc/include/asm/timex.h
@@ -17,9 +17,6 @@ typedef unsigned long cycles_t;
 
 static inline cycles_t get_cycles(void)
 {
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
-		return 0;
-
 	return mftb();
 }
 
diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c
index 02300edc6989..b609fb39dba8 100644
--- a/arch/powerpc/kernel/btext.c
+++ b/arch/powerpc/kernel/btext.c
@@ -95,18 +95,12 @@ void __init btext_prepare_BAT(void)
 		boot_text_mapped = 0;
 		return;
 	}
-	if (PVR_VER(mfspr(SPRN_PVR)) != 1) {
+	{
 		/* 603, 604, G3, G4, ... */
 		lowbits = addr & ~0xFF000000UL;
 		addr &= 0xFF000000UL;
 		disp_BAT[0] = vaddr | (BL_16M<<2) | 2;
 		disp_BAT[1] = addr | (_PAGE_NO_CACHE | _PAGE_GUARDED | BPP_RW);	
-	} else {
-		/* 601 */
-		lowbits = addr & ~0xFF800000UL;
-		addr &= 0xFF800000UL;
-		disp_BAT[0] = vaddr | (_PAGE_NO_CACHE | PP_RWXX) | 4;
-		disp_BAT[1] = addr | BL_8M | 0x40;
 	}
 	logicalDisplayBase = (void *) (vaddr + lowbits);
 }
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index f25ea188ecd3..8cdc8bcde703 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -811,19 +811,11 @@ _ASM_NOKPROBE_SYMBOL(fast_exception_return)
 1:	lis	r3,exc_exit_restart_end@ha
 	addi	r3,r3,exc_exit_restart_end@l
 	cmplw	r12,r3
-#ifdef CONFIG_PPC_BOOK3S_601
-	bge	2b
-#else
 	bge	3f
-#endif
 	lis	r4,exc_exit_restart@ha
 	addi	r4,r4,exc_exit_restart@l
 	cmplw	r12,r4
-#ifdef CONFIG_PPC_BOOK3S_601
-	blt	2b
-#else
 	blt	3f
-#endif
 	lis	r3,fee_restarts@ha
 	tophys(r3,r3)
 	lwz	r5,fee_restarts@l(r3)
@@ -840,7 +832,6 @@ fee_restarts:
 
 /* aargh, a nonrecoverable interrupt, panic */
 /* aargh, we don't know which trap this is */
-/* but the 601 doesn't implement the RI bit, so assume it's OK */
 3:
 	li	r10,-1
 	stw	r10,_TRAP(r11)
@@ -1302,19 +1293,11 @@ nonrecoverable:
 	lis	r10,exc_exit_restart_end@ha
 	addi	r10,r10,exc_exit_restart_end@l
 	cmplw	r12,r10
-#ifdef CONFIG_PPC_BOOK3S_601
-	bgelr
-#else
 	bge	3f
-#endif
 	lis	r11,exc_exit_restart@ha
 	addi	r11,r11,exc_exit_restart@l
 	cmplw	r12,r11
-#ifdef CONFIG_PPC_BOOK3S_601
-	bltlr
-#else
 	blt	3f
-#endif
 	lis	r10,ee_restarts@ha
 	lwz	r12,ee_restarts@l(r10)
 	addi	r12,r12,1
@@ -1322,7 +1305,6 @@ nonrecoverable:
 	mr	r12,r11		/* restart at exc_exit_restart */
 	blr
 3:	/* OK, we can't recover, kill this process */
-	/* but the 601 doesn't implement the RI bit, so assume it's OK */
 	lwz	r3,_TRAP(r1)
 	andi.	r0,r3,1
 	beq	5f
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 48cde60334a2..b14524d4534c 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -34,16 +34,6 @@
 
 #include "head_32.h"
 
-/* 601 only have IBAT */
-#ifdef CONFIG_PPC_BOOK3S_601
-#define LOAD_BAT(n, reg, RA, RB)	\
-	li	RA,0;			\
-	mtspr	SPRN_IBAT##n##U,RA;	\
-	lwz	RA,(n*16)+0(reg);	\
-	lwz	RB,(n*16)+4(reg);	\
-	mtspr	SPRN_IBAT##n##U,RA;	\
-	mtspr	SPRN_IBAT##n##L,RB
-#else
 #define LOAD_BAT(n, reg, RA, RB)	\
 	/* see the comment for clear_bats() -- Cort */ \
 	li	RA,0;			\
@@ -57,7 +47,6 @@
 	lwz	RB,(n*16)+12(reg);	\
 	mtspr	SPRN_DBAT##n##U,RA;	\
 	mtspr	SPRN_DBAT##n##L,RB
-#endif
 
 	__HEAD
 	.stabs	"arch/powerpc/kernel/",N_SO,0,0,0f
@@ -432,7 +421,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 SystemCall:
 	SYSCALL_ENTRY	0xc00
 
-/* Single step - not used on 601 */
 	EXCEPTION(0xd00, SingleStep, single_step_exception, EXC_XFER_STD)
 	EXCEPTION(0xe00, Trap_0e, unknown_exception, EXC_XFER_STD)
 
@@ -974,8 +962,7 @@ load_up_mmu:
 	lwz	r6,_SDR1@l(r6)
 	mtspr	SPRN_SDR1,r6
 
-/* Load the BAT registers with the values set up by MMU_init.
-   MMU_init takes care of whether we're on a 601 or not. */
+/* Load the BAT registers with the values set up by MMU_init. */
 	lis	r3,BATS@ha
 	addi	r3,r3,BATS@l
 	tophys(r3,r3)
@@ -1152,7 +1139,6 @@ EXPORT_SYMBOL(switch_mmu_context)
 clear_bats:
 	li	r10,0
 
-#ifndef CONFIG_PPC_BOOK3S_601
 	mtspr	SPRN_DBAT0U,r10
 	mtspr	SPRN_DBAT0L,r10
 	mtspr	SPRN_DBAT1U,r10
@@ -1161,7 +1147,6 @@ clear_bats:
 	mtspr	SPRN_DBAT2L,r10
 	mtspr	SPRN_DBAT3U,r10
 	mtspr	SPRN_DBAT3L,r10
-#endif
 	mtspr	SPRN_IBAT0U,r10
 	mtspr	SPRN_IBAT0L,r10
 	mtspr	SPRN_IBAT1U,r10
@@ -1252,26 +1237,9 @@ mmu_off:
 	sync
 	RFI
 
-/*
- * On 601, we use 3 BATs to map up to 24M of RAM at _PAGE_OFFSET
- * (we keep one for debugging) and on others, we use one 256M BAT.
- */
+/* We use one BAT to map up to 256M of RAM at _PAGE_OFFSET */
 initial_bats:
 	lis	r11,PAGE_OFFSET@h
-#ifdef CONFIG_PPC_BOOK3S_601
-	ori	r11,r11,4		/* set up BAT registers for 601 */
-	li	r8,0x7f			/* valid, block length = 8MB */
-	mtspr	SPRN_IBAT0U,r11		/* N.B. 601 has valid bit in */
-	mtspr	SPRN_IBAT0L,r8		/* lower BAT register */
-	addis	r11,r11,0x800000@h
-	addis	r8,r8,0x800000@h
-	mtspr	SPRN_IBAT1U,r11
-	mtspr	SPRN_IBAT1L,r8
-	addis	r11,r11,0x800000@h
-	addis	r8,r8,0x800000@h
-	mtspr	SPRN_IBAT2U,r11
-	mtspr	SPRN_IBAT2L,r8
-#else
 	tophys(r8,r11)
 #ifdef CONFIG_SMP
 	ori	r8,r8,0x12		/* R/W access, M=1 */
@@ -1280,11 +1248,10 @@ initial_bats:
 #endif /* CONFIG_SMP */
 	ori	r11,r11,BL_256M<<2|0x2	/* set up BAT registers for 604 */
 
-	mtspr	SPRN_DBAT0L,r8		/* N.B. 6xx (not 601) have valid */
+	mtspr	SPRN_DBAT0L,r8		/* N.B. 6xx have valid */
 	mtspr	SPRN_DBAT0U,r11		/* bit in upper BAT register */
 	mtspr	SPRN_IBAT0L,r8
 	mtspr	SPRN_IBAT0U,r11
-#endif
 	isync
 	blr
 
@@ -1302,13 +1269,8 @@ setup_disp_bat:
 	beqlr
 	lwz	r11,0(r8)
 	lwz	r8,4(r8)
-#ifndef CONFIG_PPC_BOOK3S_601
 	mtspr	SPRN_DBAT3L,r8
 	mtspr	SPRN_DBAT3U,r11
-#else
-	mtspr	SPRN_IBAT3L,r8
-	mtspr	SPRN_IBAT3U,r11
-#endif
 	blr
 #endif /* CONFIG_BOOTX_TEXT */
 
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 1823706ae076..057d6b8e9bb0 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -223,6 +223,6 @@ __init void initialize_cache_info(void)
 	dcache_bsize = cur_cpu_spec->dcache_bsize;
 	icache_bsize = cur_cpu_spec->icache_bsize;
 	ucache_bsize = 0;
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601) || IS_ENABLED(CONFIG_E200))
+	if (IS_ENABLED(CONFIG_E200))
 		ucache_bsize = icache_bsize = dcache_bsize;
 }
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index d1ebe152f210..c5f39f13e96e 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -529,9 +529,6 @@ void system_reset_exception(struct pt_regs *regs)
  * Check if the NIP corresponds to the address of a sync
  * instruction for which there is an entry in the exception
  * table.
- * Note that the 601 only takes a machine check on TEA
- * (transfer error ack) signal assertion, and does not
- * set any of the top 16 bits of SRR1.
  *  -- paulus.
  */
 static inline int check_io_access(struct pt_regs *regs)
@@ -796,7 +793,6 @@ int machine_check_generic(struct pt_regs *regs)
 	case 0x80000:
 		pr_cont("Machine check signal\n");
 		break;
-	case 0:		/* for 601 */
 	case 0x40000:
 	case 0x140000:	/* 7450 MSS error and TEA */
 		pr_cont("Transfer error ack signal\n");
diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vdso32/datapage.S
index 217bb630f8f9..1d23e2771dba 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -47,7 +47,6 @@ V_FUNCTION_END(__kernel_get_syscall_map)
  *
  * returns the timebase frequency in HZ
  */
-#ifndef CONFIG_PPC_BOOK3S_601
 V_FUNCTION_BEGIN(__kernel_get_tbfreq)
   .cfi_startproc
 	mflr	r12
@@ -60,4 +59,3 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq)
 	blr
   .cfi_endproc
 V_FUNCTION_END(__kernel_get_tbfreq)
-#endif
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 5206c2eb2a1d..7eadac74c7f9 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -144,13 +144,11 @@ VERSION
 		__kernel_datapage_offset;
 
 		__kernel_get_syscall_map;
-#ifndef CONFIG_PPC_BOOK3S_601
 		__kernel_gettimeofday;
 		__kernel_clock_gettime;
 		__kernel_clock_getres;
 		__kernel_time;
 		__kernel_get_tbfreq;
-#endif
 		__kernel_sync_dicache;
 		__kernel_sync_dicache_p5;
 		__kernel_sigtramp32;
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index d426eaf76bb0..771d607f1a3d 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -74,14 +74,7 @@ static int find_free_bat(void)
 {
 	int b;
 
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601)) {
-		for (b = 0; b < 4; b++) {
-			struct ppc_bat *bat = BATS[b];
-
-			if (!(bat[0].batl & 0x40))
-				return b;
-		}
-	} else {
+	{
 		int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
 
 		for (b = 0; b < n; b++) {
@@ -97,7 +90,7 @@ static int find_free_bat(void)
 /*
  * This function calculates the size of the larger block usable to map the
  * beginning of an area based on the start address and size of that area:
- * - max block size is 8M on 601 and 256 on other 6xx.
+ * - max block size is 256 on 6xx.
  * - base address must be aligned to the block size. So the maximum block size
  *   is identified by the lowest bit set to 1 in the base address (for instance
  *   if base is 0x16000000, max size is 0x02000000).
@@ -106,7 +99,7 @@ static int find_free_bat(void)
  */
 static unsigned int block_size(unsigned long base, unsigned long top)
 {
-	unsigned int max_size = IS_ENABLED(CONFIG_PPC_BOOK3S_601) ? SZ_8M : SZ_256M;
+	unsigned int max_size = SZ_256M;
 	unsigned int base_shift = (ffs(base) - 1) & 31;
 	unsigned int block_shift = (fls(top - base) - 1) & 31;
 
@@ -117,7 +110,6 @@ static unsigned int block_size(unsigned long base, unsigned long top)
  * Set up one of the IBAT (block address translation) register pairs.
  * The parameters are not checked; in particular size must be a power
  * of 2 between 128k and 256M.
- * Only for 603+ ...
  */
 static void setibat(int index, unsigned long virt, phys_addr_t phys,
 		    unsigned int size, pgprot_t prot)
@@ -214,9 +206,6 @@ void mmu_mark_initmem_nx(void)
 	unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
 	unsigned long size;
 
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
-		return;
-
 	for (i = 0; i < nb - 1 && base < top && top - base > (128 << 10);) {
 		size = block_size(base, top);
 		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
@@ -253,9 +242,6 @@ void mmu_mark_rodata_ro(void)
 	int nb = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
 	int i;
 
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
-		return;
-
 	for (i = 0; i < nb; i++) {
 		struct ppc_bat *bat = BATS[i];
 
@@ -294,8 +280,7 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys,
 		flags &= ~_PAGE_COHERENT;
 
 	bl = (size >> 17) - 1;
-	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_601)) {
-		/* 603, 604, etc. */
+	{
 		/* Do DBAT first */
 		wimgxpp = flags & (_PAGE_WRITETHRU | _PAGE_NO_CACHE
 				   | _PAGE_COHERENT | _PAGE_GUARDED);
@@ -312,16 +297,6 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys,
 			bat[0] = bat[1];
 		else
 			bat[0].batu = bat[0].batl = 0;
-	} else {
-		/* 601 cpu */
-		if (bl > BL_8M)
-			bl = BL_8M;
-		wimgxpp = flags & (_PAGE_WRITETHRU | _PAGE_NO_CACHE
-				   | _PAGE_COHERENT);
-		wimgxpp |= (flags & _PAGE_RW)?
-			((flags & _PAGE_USER)? PP_RWRW: PP_RWXX): PP_RXRX;
-		bat->batu = virt | wimgxpp | 4;	/* Ks=0, Ku=1 */
-		bat->batl = phys | bl | 0x40;	/* V=1 */
 	}
 
 	bat_addrs[index].start = virt;
@@ -474,11 +449,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	 */
 	BUG_ON(first_memblock_base != 0);
 
-	/* 601 can only access 16MB at the moment */
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
-		memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01000000));
-	else /* Anything else has 256M mapped */
-		memblock_set_current_limit(min_t(u64, first_memblock_size, 0x10000000));
+	memblock_set_current_limit(min_t(u64, first_memblock_size, SZ_256M));
 }
 
 void __init print_system_hash_info(void)
diff --git a/arch/powerpc/mm/ptdump/bats.c b/arch/powerpc/mm/ptdump/bats.c
index e29b338d499f..c4c628b03cf8 100644
--- a/arch/powerpc/mm/ptdump/bats.c
+++ b/arch/powerpc/mm/ptdump/bats.c
@@ -12,62 +12,6 @@
 
 #include "ptdump.h"
 
-static char *pp_601(int k, int pp)
-{
-	if (pp == 0)
-		return k ? "   " : "rwx";
-	if (pp == 1)
-		return k ? "r x" : "rwx";
-	if (pp == 2)
-		return "rwx";
-	return "r x";
-}
-
-static void bat_show_601(struct seq_file *m, int idx, u32 lower, u32 upper)
-{
-	u32 blpi = upper & 0xfffe0000;
-	u32 k = (upper >> 2) & 3;
-	u32 pp = upper & 3;
-	phys_addr_t pbn = PHYS_BAT_ADDR(lower);
-	u32 bsm = lower & 0x3ff;
-	u32 size = (bsm + 1) << 17;
-
-	seq_printf(m, "%d: ", idx);
-	if (!(lower & 0x40)) {
-		seq_puts(m, "        -\n");
-		return;
-	}
-
-	seq_printf(m, "0x%08x-0x%08x ", blpi, blpi + size - 1);
-#ifdef CONFIG_PHYS_64BIT
-	seq_printf(m, "0x%016llx ", pbn);
-#else
-	seq_printf(m, "0x%08x ", pbn);
-#endif
-	pt_dump_size(m, size);
-
-	seq_printf(m, "Kernel %s User %s", pp_601(k & 2, pp), pp_601(k & 1, pp));
-
-	seq_puts(m, lower & _PAGE_WRITETHRU ? "w " : "  ");
-	seq_puts(m, lower & _PAGE_NO_CACHE ? "i " : "  ");
-	seq_puts(m, lower & _PAGE_COHERENT ? "m " : "  ");
-	seq_puts(m, "\n");
-}
-
-#define BAT_SHOW_601(_m, _n, _l, _u) bat_show_601(_m, _n, mfspr(_l), mfspr(_u))
-
-static int bats_show_601(struct seq_file *m, void *v)
-{
-	seq_puts(m, "---[ Block Address Translation ]---\n");
-
-	BAT_SHOW_601(m, 0, SPRN_IBAT0L, SPRN_IBAT0U);
-	BAT_SHOW_601(m, 1, SPRN_IBAT1L, SPRN_IBAT1U);
-	BAT_SHOW_601(m, 2, SPRN_IBAT2L, SPRN_IBAT2U);
-	BAT_SHOW_601(m, 3, SPRN_IBAT3L, SPRN_IBAT3U);
-
-	return 0;
-}
-
 static void bat_show_603(struct seq_file *m, int idx, u32 lower, u32 upper, bool is_d)
 {
 	u32 bepi = upper & 0xfffe0000;
@@ -146,9 +90,6 @@ static int bats_show_603(struct seq_file *m, void *v)
 
 static int bats_open(struct inode *inode, struct file *file)
 {
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
-		return single_open(file, bats_show_601, NULL);
-
 	return single_open(file, bats_show_603, NULL);
 }
 
diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c
index f002b0fa69b8..2e2cc0c75d87 100644
--- a/arch/powerpc/platforms/powermac/setup.c
+++ b/arch/powerpc/platforms/powermac/setup.c
@@ -284,7 +284,7 @@ static void __init pmac_setup_arch(void)
 				/* 604, G3, G4 etc. */
 				loops_per_jiffy = *fp / HZ;
 			else
-				/* 601, 603, etc. */
+				/* 603, etc. */
 				loops_per_jiffy = *fp / (2 * HZ);
 			of_node_put(cpu);
 			break;
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index a6fedcfb714f..74ebe664b016 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -270,10 +270,6 @@ static void __init smp_psurge_probe(void)
 	int i, ncpus;
 	struct device_node *dn;
 
-	/* We don't do SMP on the PPC601 -- paulus */
-	if (PVR_VER(mfspr(SPRN_PVR)) == 1)
-		return;
-
 	/*
 	 * The powersurge cpu board can be used in the generation
 	 * of powermacs that have a socket for an upgradeable cpu card,
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 4/7] powerpc: Remove PowerPC 601
From: Christophe Leroy @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <27951fa6c9a8f80724d1bc81a6117ac32343a55d.1601359702.git.christophe.leroy@csgroup.eu>

Powerpc 601 is 25 years old.

It is not selected by any defconfig.

It requires a lot of special handling as it deviates from the
standard 6xx.

Retire it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/cputable.c         | 15 ---------------
 arch/powerpc/platforms/Kconfig.cputype | 11 ++---------
 2 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 2aa89c6b2896..1f7c3492f2ec 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -608,21 +608,6 @@ static struct cpu_spec __initdata cpu_specs[] = {
 #endif	/* CONFIG_PPC_BOOK3S_64 */
 
 #ifdef CONFIG_PPC32
-#ifdef CONFIG_PPC_BOOK3S_601
-	{	/* 601 */
-		.pvr_mask		= 0xffff0000,
-		.pvr_value		= 0x00010000,
-		.cpu_name		= "601",
-		.cpu_features		= CPU_FTRS_PPC601,
-		.cpu_user_features	= COMMON_USER | PPC_FEATURE_601_INSTR |
-			PPC_FEATURE_UNIFIED_CACHE | PPC_FEATURE_NO_TB,
-		.mmu_features		= MMU_FTR_HPTE_TABLE,
-		.icache_bsize		= 32,
-		.dcache_bsize		= 32,
-		.machine_check		= machine_check_generic,
-		.platform		= "ppc601",
-	},
-#endif /* CONFIG_PPC_BOOK3S_601 */
 #ifdef CONFIG_PPC_BOOK3S_6xx
 	{	/* 603 */
 		.pvr_mask		= 0xffff0000,
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index e74ec220b5d6..c194c4ae8bc7 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -20,7 +20,7 @@ choice
 	depends on PPC32
 	help
 	  There are five families of 32 bit PowerPC chips supported.
-	  The most common ones are the desktop and server CPUs (601, 603,
+	  The most common ones are the desktop and server CPUs (603,
 	  604, 740, 750, 74xx) CPUs from Freescale and IBM, with their
 	  embedded 512x/52xx/82xx/83xx/86xx counterparts.
 	  The other embedded parts, namely 4xx, 8xx, e200 (55xx) and e500
@@ -30,7 +30,7 @@ choice
 	  If unsure, select 52xx/6xx/7xx/74xx/82xx/83xx/86xx.
 
 config PPC_BOOK3S_6xx
-	bool "512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx except 601"
+	bool "512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx"
 	select PPC_BOOK3S_32
 	select PPC_FPU
 	select PPC_HAVE_PMU_SUPPORT
@@ -38,13 +38,6 @@ config PPC_BOOK3S_6xx
 	select PPC_HAVE_KUAP
 	select HAVE_ARCH_VMAP_STACK if !ADB_PMU
 
-config PPC_BOOK3S_601
-	bool "PowerPC 601"
-	select PPC_BOOK3S_32
-	select PPC_FPU
-	select PPC_HAVE_KUAP
-	select HAVE_ARCH_VMAP_STACK
-
 config PPC_85xx
 	bool "Freescale 85xx"
 	select E500
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 3/7] powerpc: Drop SYNC_601() ISYNC_601() and SYNC()
From: Christophe Leroy @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <27951fa6c9a8f80724d1bc81a6117ac32343a55d.1601359702.git.christophe.leroy@csgroup.eu>

Those macros are now empty at all time. Drop them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/ppc_asm.h  |  4 ----
 arch/powerpc/kernel/entry_32.S      | 17 +----------------
 arch/powerpc/kernel/fpu.S           |  1 -
 arch/powerpc/kernel/head_32.S       |  9 ---------
 arch/powerpc/kernel/head_32.h       |  1 -
 arch/powerpc/kernel/l2cr_6xx.S      |  3 +--
 arch/powerpc/mm/book3s32/hash_low.S | 12 ------------
 7 files changed, 2 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 0b9dc814b81c..67a421b81a50 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -382,10 +382,6 @@ GLUE(.,name):
 #endif
 
 /* various errata or part fixups */
-#define	SYNC
-#define SYNC_601
-#define ISYNC_601
-
 #if defined(CONFIG_PPC_CELL) || defined(CONFIG_PPC_FSL_BOOK3E)
 #define MFTB(dest)			\
 90:	mfspr dest, SPRN_TBRL;		\
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index f4d0af8e1136..f25ea188ecd3 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -234,7 +234,6 @@ transfer_to_handler_cont:
 	mtspr	SPRN_SRR0,r11
 	mtspr	SPRN_SRR1,r10
 	mtlr	r9
-	SYNC
 	RFI				/* jump to handler, enable MMU */
 
 #if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
@@ -264,7 +263,6 @@ _ASM_NOKPROBE_SYMBOL(transfer_to_handler_cont)
 	LOAD_REG_IMMEDIATE(r0, MSR_KERNEL)
 	mtspr	SPRN_SRR0,r12
 	mtspr	SPRN_SRR1,r0
-	SYNC
 	RFI
 
 reenable_mmu:
@@ -323,7 +321,6 @@ stack_ovf:
 #endif
 	mtspr	SPRN_SRR0,r9
 	mtspr	SPRN_SRR1,r10
-	SYNC
 	RFI
 _ASM_NOKPROBE_SYMBOL(stack_ovf)
 #endif
@@ -411,7 +408,6 @@ ret_from_syscall:
 	/* disable interrupts so current_thread_info()->flags can't change */
 	LOAD_REG_IMMEDIATE(r10,MSR_KERNEL)	/* doesn't include MSR_EE */
 	/* Note: We don't bother telling lockdep about it */
-	SYNC
 	mtmsr	r10
 	lwz	r9,TI_FLAGS(r2)
 	li	r8,-MAX_ERRNO
@@ -474,7 +470,6 @@ syscall_exit_finish:
 #endif
 	mtspr	SPRN_SRR0,r7
 	mtspr	SPRN_SRR1,r8
-	SYNC
 	RFI
 _ASM_NOKPROBE_SYMBOL(syscall_exit_finish)
 #ifdef CONFIG_44x
@@ -567,7 +562,6 @@ syscall_exit_work:
 	 * lockdep as we are supposed to have IRQs on at this point
 	 */
 	ori	r10,r10,MSR_EE
-	SYNC
 	mtmsr	r10
 
 	/* Save NVGPRS if they're not saved already */
@@ -606,7 +600,6 @@ ret_from_kernel_syscall:
 #endif
 	mtspr	SPRN_SRR0, r9
 	mtspr	SPRN_SRR1, r10
-	SYNC
 	RFI
 _ASM_NOKPROBE_SYMBOL(ret_from_kernel_syscall)
 
@@ -810,7 +803,6 @@ fast_exception_return:
 	REST_GPR(9, r11)
 	REST_GPR(12, r11)
 	lwz	r11,GPR11(r11)
-	SYNC
 	RFI
 _ASM_NOKPROBE_SYMBOL(fast_exception_return)
 
@@ -872,7 +864,6 @@ ret_from_except:
 	 * from the interrupt. */
 	/* Note: We don't bother telling lockdep about it */
 	LOAD_REG_IMMEDIATE(r10,MSR_KERNEL)
-	SYNC			/* Some chip revs have problems here... */
 	mtmsr	r10		/* disable interrupts */
 
 	lwz	r3,_MSR(r1)	/* Returning to user mode? */
@@ -1035,7 +1026,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 	 * exc_exit_restart below.  -- paulus
 	 */
 	LOAD_REG_IMMEDIATE(r10,MSR_KERNEL & ~MSR_RI)
-	SYNC
 	mtmsr	r10		/* clear the RI bit */
 	.globl exc_exit_restart
 exc_exit_restart:
@@ -1046,7 +1036,6 @@ exc_exit_restart:
 	lwz	r1,GPR1(r1)
 	.globl exc_exit_restart_end
 exc_exit_restart_end:
-	SYNC
 	RFI
 _ASM_NOKPROBE_SYMBOL(exc_exit_restart)
 _ASM_NOKPROBE_SYMBOL(exc_exit_restart_end)
@@ -1274,7 +1263,6 @@ do_resched:			/* r10 contains MSR_KERNEL here */
 	mfmsr	r10
 #endif
 	ori	r10,r10,MSR_EE
-	SYNC
 	mtmsr	r10		/* hard-enable interrupts */
 	bl	schedule
 recheck:
@@ -1283,7 +1271,6 @@ recheck:
 	 * TI_FLAGS aren't advertised.
 	 */
 	LOAD_REG_IMMEDIATE(r10,MSR_KERNEL)
-	SYNC
 	mtmsr	r10		/* disable interrupts */
 	lwz	r9,TI_FLAGS(r2)
 	andi.	r0,r9,_TIF_NEED_RESCHED
@@ -1292,7 +1279,6 @@ recheck:
 	beq	restore_user
 do_user_signal:			/* r10 contains MSR_KERNEL here */
 	ori	r10,r10,MSR_EE
-	SYNC
 	mtmsr	r10		/* hard-enable interrupts */
 	/* save r13-r31 in the exception frame, if not already done */
 	lwz	r3,_TRAP(r1)
@@ -1382,8 +1368,7 @@ _GLOBAL(enter_rtas)
 	mfmsr	r9
 	stw	r9,8(r1)
 	LOAD_REG_IMMEDIATE(r0,MSR_KERNEL)
-	SYNC			/* disable interrupts so SRR0/1 */
-	mtmsr	r0		/* don't get trashed */
+	mtmsr	r0	/* disable interrupts so SRR0/1 don't get trashed */
 	li	r9,MSR_KERNEL & ~(MSR_IR|MSR_DR)
 	mtlr	r6
 	stw	r7, THREAD + RTAS_SP(r2)
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 825893d4cb59..3ff9a8fafa46 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -87,7 +87,6 @@ BEGIN_FTR_SECTION
 	oris	r5,r5,MSR_VSX@h
 END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 #endif
-	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
 	/* enable use of FP after return */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 2bd0aa3a4cc7..48cde60334a2 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -219,7 +219,6 @@ turn_on_mmu:
 	lis	r0,start_here@h
 	ori	r0,r0,start_here@l
 	mtspr	SPRN_SRR0,r0
-	SYNC
 	RFI				/* enables MMU */
 
 /*
@@ -784,14 +783,12 @@ fast_hash_page_return:
 	mtcr	r11
 	lwz	r11, THR11(r10)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
-	SYNC
 	RFI
 
 1:	/* ISI */
 	mtcr	r11
 	mfspr	r11, SPRN_SPRG_SCRATCH1
 	mfspr	r10, SPRN_SPRG_SCRATCH0
-	SYNC
 	RFI
 
 stack_overflow:
@@ -882,7 +879,6 @@ __secondary_start_pmac_0:
 	   set to map the 0xf0000000 - 0xffffffff region */
 	mfmsr	r0
 	rlwinm	r0,r0,0,28,26		/* clear DR (0x10) */
-	SYNC
 	mtmsr	r0
 	isync
 
@@ -930,7 +926,6 @@ __secondary_start:
 	ori	r3,r3,start_secondary@l
 	mtspr	SPRN_SRR0,r3
 	mtspr	SPRN_SRR1,r4
-	SYNC
 	RFI
 #endif /* CONFIG_SMP */
 
@@ -1074,7 +1069,6 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
 	.align	4
 	mtspr	SPRN_SRR0,r4
 	mtspr	SPRN_SRR1,r3
-	SYNC
 	RFI
 /* Load up the kernel context */
 2:	bl	load_up_mmu
@@ -1099,7 +1093,6 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
 	ori	r3,r3,start_kernel@l
 	mtspr	SPRN_SRR0,r3
 	mtspr	SPRN_SRR1,r4
-	SYNC
 	RFI
 
 /*
@@ -1217,7 +1210,6 @@ _ENTRY(update_bats)
 	.align	4
 	mtspr	SPRN_SRR0, r4
 	mtspr	SPRN_SRR1, r3
-	SYNC
 	RFI
 1:	bl	clear_bats
 	lis	r3, BATS@ha
@@ -1237,7 +1229,6 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
 	mtmsr	r3
 	mtspr	SPRN_SRR0, r7
 	mtspr	SPRN_SRR1, r6
-	SYNC
 	RFI
 
 flush_tlbs:
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index cc36998c5541..7c767765071d 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -222,7 +222,6 @@
 #endif
 	mtspr	SPRN_SRR1,r10
 	mtspr	SPRN_SRR0,r11
-	SYNC
 	RFI				/* jump to handler, enable MMU */
 99:	b	ret_from_kernel_syscall
 .endm
diff --git a/arch/powerpc/kernel/l2cr_6xx.S b/arch/powerpc/kernel/l2cr_6xx.S
index 5f07aa5e9851..225511d73bef 100644
--- a/arch/powerpc/kernel/l2cr_6xx.S
+++ b/arch/powerpc/kernel/l2cr_6xx.S
@@ -256,7 +256,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_SPEC7450)
 	sync
 
 	/* Restore MSR (restores EE and DR bits to original state) */
-	SYNC
 	mtmsr	r7
 	isync
 
@@ -377,7 +376,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_L3CR)
 1:	bdnz	1b
 
 	/* Restore MSR (restores EE and DR bits to original state) */
-4:	SYNC
+4:
 	mtmsr	r7
 	isync
 	blr
diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index 1690d369688b..3143de6ae769 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -199,11 +199,9 @@ _GLOBAL(add_hash_page)
 	 * covered by a BAT).  -- paulus
 	 */
 	mfmsr	r9
-	SYNC
 	rlwinm	r0,r9,0,17,15		/* clear bit 16 (MSR_EE) */
 	rlwinm	r0,r0,0,28,26		/* clear MSR_DR */
 	mtmsr	r0
-	SYNC_601
 	isync
 
 #ifdef CONFIG_SMP
@@ -262,7 +260,6 @@ _GLOBAL(add_hash_page)
 
 	/* reenable interrupts and DR */
 	mtmsr	r9
-	SYNC_601
 	isync
 
 	lwz	r0,4(r1)
@@ -506,11 +503,9 @@ _GLOBAL(flush_hash_pages)
 	 * covered by a BAT).  -- paulus
 	 */
 	mfmsr	r10
-	SYNC
 	rlwinm	r0,r10,0,17,15		/* clear bit 16 (MSR_EE) */
 	rlwinm	r0,r0,0,28,26		/* clear MSR_DR */
 	mtmsr	r0
-	SYNC_601
 	isync
 
 	/* First find a PTE in the range that has _PAGE_HASHPTE set */
@@ -629,7 +624,6 @@ _GLOBAL(flush_hash_pages)
 #endif
 
 19:	mtmsr	r10
-	SYNC_601
 	isync
 	blr
 EXPORT_SYMBOL(flush_hash_pages)
@@ -643,11 +637,9 @@ _GLOBAL(_tlbie)
 	lwz	r8,TASK_CPU(r2)
 	oris	r8,r8,11
 	mfmsr	r10
-	SYNC
 	rlwinm	r0,r10,0,17,15		/* clear bit 16 (MSR_EE) */
 	rlwinm	r0,r0,0,28,26		/* clear DR */
 	mtmsr	r0
-	SYNC_601
 	isync
 	lis	r9,mmu_hash_lock@h
 	ori	r9,r9,mmu_hash_lock@l
@@ -664,7 +656,6 @@ _GLOBAL(_tlbie)
 	li	r0,0
 	stw	r0,0(r9)		/* clear mmu_hash_lock */
 	mtmsr	r10
-	SYNC_601
 	isync
 #else /* CONFIG_SMP */
 	tlbie	r3
@@ -681,11 +672,9 @@ _GLOBAL(_tlbia)
 	lwz	r8,TASK_CPU(r2)
 	oris	r8,r8,10
 	mfmsr	r10
-	SYNC
 	rlwinm	r0,r10,0,17,15		/* clear bit 16 (MSR_EE) */
 	rlwinm	r0,r0,0,28,26		/* clear DR */
 	mtmsr	r0
-	SYNC_601
 	isync
 	lis	r9,mmu_hash_lock@h
 	ori	r9,r9,mmu_hash_lock@l
@@ -709,7 +698,6 @@ _GLOBAL(_tlbia)
 	li	r0,0
 	stw	r0,0(r9)		/* clear mmu_hash_lock */
 	mtmsr	r10
-	SYNC_601
 	isync
 #endif /* CONFIG_SMP */
 	blr
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 1/7] powerpc: Remove SYNC on non 6xx
From: Christophe Leroy @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

SYNC is usefull for Powerpc 601 only. On everything else,
SYNC is empty.

Remove it from code that is not made to run on 6xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_40x.S   | 1 -
 arch/powerpc/kernel/head_booke.h | 1 -
 arch/powerpc/kernel/misc_64.S    | 1 -
 3 files changed, 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 5b282d9965a5..44c9018aed1b 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -72,7 +72,6 @@ turn_on_mmu:
 	lis	r0,start_here@h
 	ori	r0,r0,start_here@l
 	mtspr	SPRN_SRR0,r0
-	SYNC
 	rfi				/* enables MMU */
 	b	.			/* prevent prefetch past rfi */
 
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 18f87bf9e32b..71c359d438b5 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -176,7 +176,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 #endif
 	mtspr	SPRN_SRR1,r10
 	mtspr	SPRN_SRR0,r11
-	SYNC
 	RFI				/* jump to handler, enable MMU */
 99:	b	ret_from_kernel_syscall
 .endm
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 7bb46ad98207..070465825c21 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -365,7 +365,6 @@ _GLOBAL(kexec_smp_wait)
 
 	li	r4,KEXEC_STATE_REAL_MODE
 	stb	r4,PACAKEXECSTATE(r13)
-	SYNC
 
 	b	kexec_wait
 
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH v2 23/25] powerpc/signal: Create 'unsafe' versions of copy_[ck][fpr/vsx]_to_user()
From: Christophe Leroy @ 2020-09-29  5:33 UTC (permalink / raw)
  To: Christopher M. Riedl, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <a14a82a6-74c2-cc0a-8c6b-46e571cadb7d@csgroup.eu>



Le 29/09/2020 à 07:22, Christophe Leroy a écrit :
> 
> 
> Le 29/09/2020 à 04:04, Christopher M. Riedl a écrit :
>> On Tue Aug 18, 2020 at 12:19 PM CDT, Christophe Leroy wrote:
>>> For the non VSX version, that's trivial. Just use unsafe_copy_to_user()
>>> instead of __copy_to_user().
>>>
>>> For the VSX version, remove the intermediate step through a buffer and
>>> use unsafe_put_user() directly. This generates a far smaller code which
>>> is acceptable to inline, see below:
>>>
>>>
>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>> ---
>>> arch/powerpc/kernel/signal.h | 53 ++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 53 insertions(+)
>>>
>>> diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
>>> index f610cfafa478..2559a681536e 100644
>>> --- a/arch/powerpc/kernel/signal.h
>>> +++ b/arch/powerpc/kernel/signal.h
>>> @@ -32,7 +32,54 @@ unsigned long copy_fpr_to_user(void __user *to,
>>> struct task_struct *task);
>>> unsigned long copy_ckfpr_to_user(void __user *to, struct task_struct
>>> *task);
>>> unsigned long copy_fpr_from_user(struct task_struct *task, void __user
>>> *from);
>>> unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user
>>> *from);
>>> +
>>> +#define unsafe_copy_fpr_to_user(to, task, label) do { \
>>> + struct task_struct *__t = task; \
>>> + u64 __user *buf = (u64 __user *)to; \
>>> + int i; \
>>> + \
>>> + for (i = 0; i < ELF_NFPREG - 1 ; i++) \
>>> + unsafe_put_user(__t->thread.TS_FPR(i), &buf[i], label); \
>>> + unsafe_put_user(__t->thread.fp_state.fpscr, &buf[i], label); \
>>> +} while (0)
>>> +
>>
>> I've been working on the PPC64 side of this "unsafe" rework using this
>> series as a basis. One question here - I don't really understand what
>> the benefit of re-implementing this logic in macros (similarly for the
>> other copy_* functions below) is?
> 
> Not sure either.
> 
> The whole purpose is to not manage the error through a local var but exclusively use labels.
> However, GCC is probably smart enough to understand it and drop the local var while inlining.
> 
> One important thing however is to make sure we won't end up with an outline function, otherwise you 
> completely loose the benefit of the label stuff. And you get a function call inside a user access, 
> which is what we want to avoid.
> 
>>
>> I am considering  a "__unsafe_copy_*" implementation in signal.c for
>> each (just the original implementation w/ using the "unsafe_" variants
>> of the uaccess stuff) which gets called by the "safe" functions w/ the
>> appropriate "user_*_access_begin/user_*_access_end". Something like
>> (pseudo-ish code):
> 
> Good idea, however ...
> 
>>
>>     /* signal.c */
>>     unsigned long __unsafe_copy_fpr_to_user(...)
>>     {
>>         ...
>>         unsafe_copy_to_user(..., bad);
>>         return 0;
>>     bad:
>>         return 1; /* -EFAULT? */
>>     }
> 
> This __unsafe_copy_fpr_to_user() has to be in signal.h and must be tagged 'static __always_inline' 
> for the reasons explained above.
> 
>>
>>     unsigned long copy_fpr_to_user(...)
>>     {
>>         unsigned long err;
>>         if (!user_write_access_begin(...))
>>             return 1; /* -EFAULT? */
>>
>>         err = __unsafe_copy_fpr_to_user(...);
>>
>>         user_write_access_end();
>>         return err;
>>     }

Also note that at the end (ie when both PPC32 and PPC64 signal code are using "unsafe" versions), 
the "safe" version won't be used anymore and will be dropped.

Christophe

^ permalink raw reply

* Re: [PATCH v2 23/25] powerpc/signal: Create 'unsafe' versions of copy_[ck][fpr/vsx]_to_user()
From: Christophe Leroy @ 2020-09-29  5:22 UTC (permalink / raw)
  To: Christopher M. Riedl, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <C5ZHGD1JVX0H.1UI1PWMZN73UX@geist>



Le 29/09/2020 à 04:04, Christopher M. Riedl a écrit :
> On Tue Aug 18, 2020 at 12:19 PM CDT, Christophe Leroy wrote:
>> For the non VSX version, that's trivial. Just use unsafe_copy_to_user()
>> instead of __copy_to_user().
>>
>> For the VSX version, remove the intermediate step through a buffer and
>> use unsafe_put_user() directly. This generates a far smaller code which
>> is acceptable to inline, see below:
>>
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> arch/powerpc/kernel/signal.h | 53 ++++++++++++++++++++++++++++++++++++
>> 1 file changed, 53 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
>> index f610cfafa478..2559a681536e 100644
>> --- a/arch/powerpc/kernel/signal.h
>> +++ b/arch/powerpc/kernel/signal.h
>> @@ -32,7 +32,54 @@ unsigned long copy_fpr_to_user(void __user *to,
>> struct task_struct *task);
>> unsigned long copy_ckfpr_to_user(void __user *to, struct task_struct
>> *task);
>> unsigned long copy_fpr_from_user(struct task_struct *task, void __user
>> *from);
>> unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user
>> *from);
>> +
>> +#define unsafe_copy_fpr_to_user(to, task, label) do { \
>> + struct task_struct *__t = task; \
>> + u64 __user *buf = (u64 __user *)to; \
>> + int i; \
>> + \
>> + for (i = 0; i < ELF_NFPREG - 1 ; i++) \
>> + unsafe_put_user(__t->thread.TS_FPR(i), &buf[i], label); \
>> + unsafe_put_user(__t->thread.fp_state.fpscr, &buf[i], label); \
>> +} while (0)
>> +
> 
> I've been working on the PPC64 side of this "unsafe" rework using this
> series as a basis. One question here - I don't really understand what
> the benefit of re-implementing this logic in macros (similarly for the
> other copy_* functions below) is?

Not sure either.

The whole purpose is to not manage the error through a local var but exclusively use labels.
However, GCC is probably smart enough to understand it and drop the local var while inlining.

One important thing however is to make sure we won't end up with an outline function, otherwise you 
completely loose the benefit of the label stuff. And you get a function call inside a user access, 
which is what we want to avoid.

> 
> I am considering  a "__unsafe_copy_*" implementation in signal.c for
> each (just the original implementation w/ using the "unsafe_" variants
> of the uaccess stuff) which gets called by the "safe" functions w/ the
> appropriate "user_*_access_begin/user_*_access_end". Something like
> (pseudo-ish code):

Good idea, however ...

> 
> 	/* signal.c */
> 	unsigned long __unsafe_copy_fpr_to_user(...)
> 	{
> 		...
> 		unsafe_copy_to_user(..., bad);
> 		return 0;
> 	bad:
> 		return 1; /* -EFAULT? */
> 	}

This __unsafe_copy_fpr_to_user() has to be in signal.h and must be tagged 'static __always_inline' 
for the reasons explained above.

> 
> 	unsigned long copy_fpr_to_user(...)
> 	{
> 		unsigned long err;
> 		if (!user_write_access_begin(...))
> 			return 1; /* -EFAULT? */
> 
> 		err = __unsafe_copy_fpr_to_user(...);
> 
> 		user_write_access_end();
> 		return err;
> 	}
> 
> 	/* signal.h */
> 	unsigned long __unsafe_copy_fpr_to_user(...);
> 	#define unsafe_copy_fpr_to_user(..., label) \
> 		unsafe_op_wrap(__unsafe_copy_fpr_to_user(...), label)
> 
> This way there is a single implementation for each copy routine "body".
> The "unsafe" wrappers then just exist as simple macros similar to what
> x86 does: "unsafe_" macro wraps "__unsafe" functions for the goto label.

Yes, good point.

Christophe

^ permalink raw reply

* Re: [PATCH v2 25/25] powerpc/signal32: Transform save_user_regs() and save_tm_user_regs() in 'unsafe' version
From: Christophe Leroy @ 2020-09-29  5:21 UTC (permalink / raw)
  To: Christopher M. Riedl, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <C5ZIJ5D6XYSA.191D18GH311GD@geist>



Le 29/09/2020 à 04:55, Christopher M. Riedl a écrit :
> On Tue Aug 18, 2020 at 12:19 PM CDT, Christophe Leroy wrote:
>> Change those two functions to be used within a user access block.
>>
>> For that, change save_general_regs() to and unsafe_save_general_regs(),
>> then replace all user accesses by unsafe_ versions.
>>
>> This series leads to a reduction from 2.55s to 1.73s of
>> the system CPU time with the following microbench app
>> on an mpc832x with KUAP (approx 32%)
>>
>> Without KUAP, the difference is in the noise.
>>
>> void sigusr1(int sig) { }
>>
>> int main(int argc, char **argv)
>> {
>> int i = 100000;
>>
>> signal(SIGUSR1, sigusr1);
>> for (;i--;)
>> raise(SIGUSR1);
>> exit(0);
>> }
>>
>> An additional 0.10s reduction is achieved by removing
>> CONFIG_PPC_FPU, as the mpc832x has no FPU.
>>
>> A bit less spectacular on an 8xx as KUAP is less heavy, prior to
>> the series (with KUAP) it ran in 8.10 ms. Once applies the removal
>> of FPU regs handling, we get 7.05s. With the full series, we get 6.9s.
>> If artificially re-activating FPU regs handling with the full series,
>> we get 7.6s.
>>
>> So for the 8xx, the removal of the FPU regs copy is what makes the
>> difference, but the rework of handle_signal also have a benefit.
>>
>> Same as above, without KUAP the difference is in the noise.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> arch/powerpc/kernel/signal_32.c | 224 ++++++++++++++++----------------
>> 1 file changed, 111 insertions(+), 113 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/signal_32.c
>> b/arch/powerpc/kernel/signal_32.c
>> index 86539a4e0514..f795fe0240a1 100644
>> --- a/arch/powerpc/kernel/signal_32.c
>> +++ b/arch/powerpc/kernel/signal_32.c
>> @@ -93,8 +93,8 @@ static inline int get_sigset_t(sigset_t *set,
>> #define to_user_ptr(p) ptr_to_compat(p)
>> #define from_user_ptr(p) compat_ptr(p)
>>   
>> -static inline int save_general_regs(struct pt_regs *regs,
>> - struct mcontext __user *frame)
>> +static __always_inline int
>> +save_general_regs_unsafe(struct pt_regs *regs, struct mcontext __user
>> *frame)
>> {
>> elf_greg_t64 *gregs = (elf_greg_t64 *)regs;
>> int val, i;
>> @@ -108,10 +108,12 @@ static inline int save_general_regs(struct pt_regs
>> *regs,
>> else
>> val = gregs[i];
>>   
>> - if (__put_user(val, &frame->mc_gregs[i]))
>> - return -EFAULT;
>> + unsafe_put_user(val, &frame->mc_gregs[i], failed);
>> }
>> return 0;
>> +
>> +failed:
>> + return 1;
>> }
>>   
>> static inline int restore_general_regs(struct pt_regs *regs,
>> @@ -148,11 +150,15 @@ static inline int get_sigset_t(sigset_t *set,
>> const sigset_t __user *uset)
>> #define to_user_ptr(p) ((unsigned long)(p))
>> #define from_user_ptr(p) ((void __user *)(p))
>>   
>> -static inline int save_general_regs(struct pt_regs *regs,
>> - struct mcontext __user *frame)
>> +static __always_inline int
>> +save_general_regs_unsafe(struct pt_regs *regs, struct mcontext __user
>> *frame)
>> {
>> WARN_ON(!FULL_REGS(regs));
>> - return __copy_to_user(&frame->mc_gregs, regs, GP_REGS_SIZE);
>> + unsafe_copy_to_user(&frame->mc_gregs, regs, GP_REGS_SIZE, failed);
>> + return 0;
>> +
>> +failed:
>> + return 1;
>> }
>>   
>> static inline int restore_general_regs(struct pt_regs *regs,
>> @@ -170,6 +176,11 @@ static inline int restore_general_regs(struct
>> pt_regs *regs,
>> }
>> #endif
>>   
>> +#define unsafe_save_general_regs(regs, frame, label) do { \
>> + if (save_general_regs_unsafe(regs, frame)) \
> 
> Minor nitpick (sorry); this naming seems a bit strange to me, in x86 it
> is "__unsafe_" as a prefix instead of "_unsafe" as a suffix. That sounds
> a bit better to me, what do you think? Unless there is some convention I
> am not aware of here apart from "unsafe_" using a goto label for errors.

You are probably right, I have never been good at naming stuff.

Christophe

^ permalink raw reply

* Re: [PATCH v2 07/14] powerpc/pseries/iommu: Add ddw_list_new_entry() helper
From: Alexey Kardashevskiy @ 2020-09-29  3:57 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-8-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> There are two functions creating direct_window_list entries in a
> similar way, so create a ddw_list_new_entry() to avoid duplicity and
> simplify those functions.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/platforms/pseries/iommu.c | 32 +++++++++++++++++---------
>   1 file changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 49008d2e217d..e4c17d27dcf3 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -869,6 +869,21 @@ static u64 find_existing_ddw(struct device_node *pdn)
>   	return dma_addr;
>   }
>   
> +static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
> +						const struct dynamic_dma_window_prop *dma64)
> +{
> +	struct direct_window *window;
> +
> +	window = kzalloc(sizeof(*window), GFP_KERNEL);
> +	if (!window)
> +		return NULL;
> +
> +	window->device = pdn;
> +	window->prop = dma64;
> +
> +	return window;
> +}
> +
>   static int find_existing_ddw_windows(void)
>   {
>   	int len;
> @@ -881,18 +896,15 @@ static int find_existing_ddw_windows(void)
>   
>   	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
>   		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
> -		if (!direct64)
> -			continue;
> -
> -		window = kzalloc(sizeof(*window), GFP_KERNEL);
> -		if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
> -			kfree(window);
> +		if (!direct64 || len < sizeof(*direct64)) {
>   			remove_ddw(pdn, true);
>   			continue;
>   		}
>   
> -		window->device = pdn;
> -		window->prop = direct64;
> +		window = ddw_list_new_entry(pdn, direct64);
> +		if (!window)
> +			break;
> +
>   		spin_lock(&direct_window_list_lock);
>   		list_add(&window->list, &direct_window_list);
>   		spin_unlock(&direct_window_list_lock);
> @@ -1261,7 +1273,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
>   		  create.liobn, dn);
>   
> -	window = kzalloc(sizeof(*window), GFP_KERNEL);
> +	window = ddw_list_new_entry(pdn, ddwprop);
>   	if (!window)
>   		goto out_clear_window;
>   
> @@ -1280,8 +1292,6 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		goto out_free_window;
>   	}
>   
> -	window->device = pdn;
> -	window->prop = ddwprop;
>   	spin_lock(&direct_window_list_lock);
>   	list_add(&window->list, &direct_window_list);
>   	spin_unlock(&direct_window_list_lock);
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 03/14] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs
From: Alexey Kardashevskiy @ 2020-09-29  3:57 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-4-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Currently both iommu_alloc_coherent() and iommu_free_coherent() align the
> desired allocation size to PAGE_SIZE, and gets system pages and IOMMU
> mappings (TCEs) for that value.
> 
> When IOMMU_PAGE_SIZE < PAGE_SIZE, this behavior may cause unnecessary
> TCEs to be created for mapping the whole system page.
> 
> Example:
> - PAGE_SIZE = 64k, IOMMU_PAGE_SIZE() = 4k
> - iommu_alloc_coherent() is called for 128 bytes
> - 1 system page (64k) is allocated
> - 16 IOMMU pages (16 x 4k) are allocated (16 TCEs used)
> 
> It would be enough to use a single TCE for this, so 15 TCEs are
> wasted in the process.
> 
> Update iommu_*_coherent() to make sure the size alignment happens only
> for IOMMU_PAGE_SIZE() before calling iommu_alloc() and iommu_free().
> 
> Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
> with IOMMU_PAGE_ALIGN(n, tbl), which is easier to read and does the
> same.


This seems alright but rather unrelated to the series, probably makes 
sense to post it separately.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/kernel/iommu.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 9704f3f76e63..7961645a6980 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device *dev,
>   	}
>   
>   	if (dev)
> -		boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> -				      1 << tbl->it_page_shift);
> +		boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, tbl);
>   	else
> -		boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
> +		boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
>   	/* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
>   
>   	n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
> @@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>   	unsigned int order;
>   	unsigned int nio_pages, io_order;
>   	struct page *page;
> +	size_t size_io = size;
>   
>   	size = PAGE_ALIGN(size);
>   	order = get_order(size);
> @@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>   	memset(ret, 0, size);
>   
>   	/* Set up tces to cover the allocated range */
> -	nio_pages = size >> tbl->it_page_shift;
> -	io_order = get_iommu_order(size, tbl);
> +	size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
> +	nio_pages = size_io >> tbl->it_page_shift;
> +	io_order = get_iommu_order(size_io, tbl);
>   	mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
>   			      mask >> tbl->it_page_shift, io_order, 0);
>   	if (mapping == DMA_MAPPING_ERROR) {
> @@ -900,10 +901,9 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
>   			 void *vaddr, dma_addr_t dma_handle)
>   {
>   	if (tbl) {
> -		unsigned int nio_pages;
> +		size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
> +		unsigned int nio_pages = size_io >> tbl->it_page_shift;
>   
> -		size = PAGE_ALIGN(size);
> -		nio_pages = size >> tbl->it_page_shift;
>   		iommu_free(tbl, dma_handle, nio_pages);
>   		size = PAGE_ALIGN(size);
>   		free_pages((unsigned long)vaddr, get_order(size));
> 

-- 
--
Alexey

^ permalink raw reply

* Re: [PATCH v2 02/14] powerpc/pseries/iommu: Makes sure IOMMU_PAGE_SIZE <= PAGE_SIZE
From: Alexey Kardashevskiy @ 2020-09-29  3:57 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-3-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Having System pagesize < IOMMU pagesize may cause a page owned by another
> process/VM to be written by a buggy driver / device.
> 
> As it's intended to use DDW for indirect mapping, it's possible to get
> a configuration where (PAGE_SIZE = 4k) < (IOMMU_PAGE_SIZE() = 64k).


Have you tried 4K VM kernel under phyp with 64K iommu pages? phyp may 
assume it is still correct as it knows that it allocates memory in 256MB 
contiguous chunks and for DMA purposes phyp may assume that there just a 
VM kernel which owns all the VM physical memory and can align size in 
iommu_range_alloc() to the IOMMU page size. Not sure.


> 
> To avoid this, make sure create_ddw() can only use pagesize <= PAGE_SIZE.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9db3927607a4..4eff3a6a441e 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1206,17 +1206,20 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   			goto out_failed;
>   		}
>   	}
> -	if (query.page_size & 4) {
> -		page_shift = 24; /* 16MB */
> -	} else if (query.page_size & 2) {
> +
> +	/* Choose the biggest pagesize available <= PAGE_SHIFT */
> +	if ((query.page_size & 4)) {


This just doubles braces.

Also I believe for the indirect case you should not even try 16MB.


> +		page_shift = 24; /* 16MB - Hugepages */
> +	} else if ((query.page_size & 2) && PAGE_SHIFT >= 16) {


PAGE_SHIFT == 16

>   		page_shift = 16; /* 64kB */
> -	} else if (query.page_size & 1) {
> +	} else if ((query.page_size & 1) && PAGE_SHIFT >= 12) {


Kind of useless check, if it is not 64K, it is 4K.

I understand you are trying to be safe here but this give a new reader 
an idea of more flexibility that there is :) BUILD_BUG_ON() would do 
here imho.



>   		page_shift = 12; /* 4kB */
>   	} else {
>   		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
>   			  query.page_size);
>   		goto out_failed;
>   	}
> +

Unrelated.

>   	/* verify the window * number of ptes will map the partition */
>   	/* check largest block * page size > max memory hotplug addr */
>   	max_addr = ddw_memory_hotplug_max();
> 

-- 
--
Alexey

^ permalink raw reply

* Re: [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
From: Alexey Kardashevskiy @ 2020-09-29  3:57 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-6-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Having a function to check if the iommu table has any allocation helps
> deciding if a tbl can be reset for using a new DMA window.
> 
> It should be enough to replace all instances of !bitmap_empty(tbl...).
> 
> iommu_table_in_use() skips reserved memory, so we don't need to worry about
> releasing it before testing. This causes iommu_table_release_pages() to
> become unnecessary, given it is only used to remove reserved memory for
> testing.
> 
> Also, only allow storing reserved memory values in tbl if they are valid
> in the table, so there is no need to check it in the new helper.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/include/asm/iommu.h |  1 +
>   arch/powerpc/kernel/iommu.c      | 61 +++++++++++++++-----------------
>   2 files changed, 30 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 5032f1593299..2913e5c8b1f8 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -154,6 +154,7 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
>    */
>   extern struct iommu_table *iommu_init_table(struct iommu_table *tbl,
>   		int nid, unsigned long res_start, unsigned long res_end);
> +bool iommu_table_in_use(struct iommu_table *tbl);
>   
>   #define IOMMU_TABLE_GROUP_MAX_TABLES	2
>   
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ffb2637dc82b..c838da3d8f32 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -655,34 +655,21 @@ static void iommu_table_reserve_pages(struct iommu_table *tbl,
>   	if (tbl->it_offset == 0)
>   		set_bit(0, tbl->it_map);
>   
> +	/* Check if res_start..res_end is a valid range in the table */
> +	if (res_start >= res_end || res_start < tbl->it_offset ||
> +	    res_end > (tbl->it_offset + tbl->it_size)) {
> +		tbl->it_reserved_start = tbl->it_offset;
> +		tbl->it_reserved_end = tbl->it_offset;


This silently ignores overlapped range of the reserved area and the 
window which does not seem right.



> +		return;
> +	}
> +
>   	tbl->it_reserved_start = res_start;
>   	tbl->it_reserved_end = res_end;
>   
> -	/* Check if res_start..res_end isn't empty and overlaps the table */
> -	if (res_start && res_end &&
> -			(tbl->it_offset + tbl->it_size < res_start ||
> -			 res_end < tbl->it_offset))
> -		return;
> -
>   	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
>   		set_bit(i - tbl->it_offset, tbl->it_map);
>   }
>   
> -static void iommu_table_release_pages(struct iommu_table *tbl)
> -{
> -	int i;
> -
> -	/*
> -	 * In case we have reserved the first bit, we should not emit
> -	 * the warning below.
> -	 */
> -	if (tbl->it_offset == 0)
> -		clear_bit(0, tbl->it_map);
> -
> -	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
> -		clear_bit(i - tbl->it_offset, tbl->it_map);
> -}
> -
>   /*
>    * Build a iommu_table structure.  This contains a bit map which
>    * is used to manage allocation of the tce space.
> @@ -743,6 +730,23 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>   	return tbl;
>   }
>   
> +bool iommu_table_in_use(struct iommu_table *tbl)
> +{
> +	unsigned long start = 0, end;
> +
> +	/* ignore reserved bit0 */
> +	if (tbl->it_offset == 0)
> +		start = 1;
> +	end = tbl->it_reserved_start - tbl->it_offset;
> +	if (find_next_bit(tbl->it_map, end, start) != end)
> +		return true;
> +
> +	start = tbl->it_reserved_end - tbl->it_offset;
> +	end = tbl->it_size;
> +	return find_next_bit(tbl->it_map, end, start) != end;
> +

Unnecessary empty line.

> +}
> +
>   static void iommu_table_free(struct kref *kref)
>   {
>   	unsigned long bitmap_sz;
> @@ -759,10 +763,8 @@ static void iommu_table_free(struct kref *kref)
>   		return;
>   	}
>   
> -	iommu_table_release_pages(tbl);
> -
>   	/* verify that table contains no entries */
> -	if (!bitmap_empty(tbl->it_map, tbl->it_size))
> +	if (iommu_table_in_use(tbl))
>   		pr_warn("%s: Unexpected TCEs\n", __func__);
>   
>   	/* calculate bitmap size in bytes */
> @@ -1068,18 +1070,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
>   	for (i = 0; i < tbl->nr_pools; i++)
>   		spin_lock(&tbl->pools[i].lock);
>   
> -	iommu_table_release_pages(tbl);
> -
> -	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
> +	if (iommu_table_in_use(tbl)) {
>   		pr_err("iommu_tce: it_map is not empty");
>   		ret = -EBUSY;
> -		/* Undo iommu_table_release_pages, i.e. restore bit#0, etc */
> -		iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
> -				tbl->it_reserved_end);
> -	} else {
> -		memset(tbl->it_map, 0xff, sz);
>   	}
>   
> +	memset(tbl->it_map, 0xff, sz);
> +
>   	for (i = 0; i < tbl->nr_pools; i++)
>   		spin_unlock(&tbl->pools[i].lock);
>   	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
> 

-- 
--
Alexey

^ permalink raw reply

* Re: [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-2-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,

These normally go right before "signed-off-by".


> 
> Some functions assume IOMMU page size can only be 4K (pageshift == 12).
> Update them to accept any page size passed, so we can use 64K pages.
> 
> In the process, some defines like TCE_SHIFT were made obsolete, and then
> removed.
> 
> IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
> a RPN of 52-bit, and considers a 12-bit pageshift, so there should be
> no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn.
> It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
> tce_buildmulti_pSeriesLP().
> 
> Most places had a tbl struct, so using tbl->it_page_shift was simple.
> tce_free_pSeriesLP() was a special case, since callers not always have a
> tbl struct, so adding a tceshift parameter seems the right thing to do.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/include/asm/tce.h         |  8 ------
>   arch/powerpc/platforms/pseries/iommu.c | 39 +++++++++++++++-----------
>   2 files changed, 23 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
> index db5fc2f2262d..0c34d2756d92 100644
> --- a/arch/powerpc/include/asm/tce.h
> +++ b/arch/powerpc/include/asm/tce.h
> @@ -19,15 +19,7 @@
>   #define TCE_VB			0
>   #define TCE_PCI			1
>   
> -/* TCE page size is 4096 bytes (1 << 12) */
> -
> -#define TCE_SHIFT	12
> -#define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
> -
>   #define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
> -
> -#define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
> -#define TCE_RPN_SHIFT		12
>   #define TCE_VALID		0x800		/* TCE valid */
>   #define TCE_ALLIO		0x400		/* TCE valid for all lpars */
>   #define TCE_PCI_WRITE		0x2		/* write from PCI allowed */
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index e4198700ed1a..9db3927607a4 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -107,6 +107,8 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
>   	u64 proto_tce;
>   	__be64 *tcep;
>   	u64 rpn;
> +	const unsigned long tceshift = tbl->it_page_shift;
> +	const unsigned long pagesize = IOMMU_PAGE_SIZE(tbl);
>   
>   	proto_tce = TCE_PCI_READ; // Read allowed
>   
> @@ -117,10 +119,10 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
>   
>   	while (npages--) {
>   		/* can't move this out since we might cross MEMBLOCK boundary */
> -		rpn = __pa(uaddr) >> TCE_SHIFT;
> -		*tcep = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
> +		rpn = __pa(uaddr) >> tceshift;
> +		*tcep = cpu_to_be64(proto_tce | rpn << tceshift);
>   
> -		uaddr += TCE_PAGE_SIZE;
> +		uaddr += pagesize;
>   		tcep++;
>   	}
>   	return 0;
> @@ -146,7 +148,7 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
>   	return be64_to_cpu(*tcep);
>   }
>   
> -static void tce_free_pSeriesLP(unsigned long liobn, long, long);
> +static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
>   static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
>   
>   static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> @@ -166,12 +168,12 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
>   		proto_tce |= TCE_PCI_WRITE;
>   
>   	while (npages--) {
> -		tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
> +		tce = proto_tce | rpn << tceshift;
>   		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
>   
>   		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
>   			ret = (int)rc;
> -			tce_free_pSeriesLP(liobn, tcenum_start,
> +			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
>   			                   (npages_start - (npages + 1)));
>   			break;
>   		}
> @@ -205,10 +207,11 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	long tcenum_start = tcenum, npages_start = npages;
>   	int ret = 0;
>   	unsigned long flags;
> +	const unsigned long tceshift = tbl->it_page_shift;
>   
>   	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
>   		return tce_build_pSeriesLP(tbl->it_index, tcenum,
> -					   tbl->it_page_shift, npages, uaddr,
> +					   tceshift, npages, uaddr,
>   		                           direction, attrs);
>   	}
>   
> @@ -225,13 +228,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   		if (!tcep) {
>   			local_irq_restore(flags);
>   			return tce_build_pSeriesLP(tbl->it_index, tcenum,
> -					tbl->it_page_shift,
> +					tceshift,
>   					npages, uaddr, direction, attrs);
>   		}
>   		__this_cpu_write(tce_page, tcep);
>   	}
>   
> -	rpn = __pa(uaddr) >> TCE_SHIFT;
> +	rpn = __pa(uaddr) >> tceshift;
>   	proto_tce = TCE_PCI_READ;
>   	if (direction != DMA_TO_DEVICE)
>   		proto_tce |= TCE_PCI_WRITE;
> @@ -245,12 +248,12 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   		limit = min_t(long, npages, 4096/TCE_ENTRY_SIZE);
>   
>   		for (l = 0; l < limit; l++) {
> -			tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
> +			tcep[l] = cpu_to_be64(proto_tce | rpn << tceshift);
>   			rpn++;
>   		}
>   
>   		rc = plpar_tce_put_indirect((u64)tbl->it_index,
> -					    (u64)tcenum << 12,
> +					    (u64)tcenum << tceshift,
>   					    (u64)__pa(tcep),
>   					    limit);
>   
> @@ -277,12 +280,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	return ret;
>   }
>   
> -static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long npages)
> +static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> +			       long npages)
>   {
>   	u64 rc;
>   
>   	while (npages--) {
> -		rc = plpar_tce_put((u64)liobn, (u64)tcenum << 12, 0);
> +		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, 0);
>   
>   		if (rc && printk_ratelimit()) {
>   			printk("tce_free_pSeriesLP: plpar_tce_put failed. rc=%lld\n", rc);
> @@ -301,9 +305,11 @@ static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long n
>   	u64 rc;
>   
>   	if (!firmware_has_feature(FW_FEATURE_STUFF_TCE))
> -		return tce_free_pSeriesLP(tbl->it_index, tcenum, npages);
> +		return tce_free_pSeriesLP(tbl->it_index, tcenum,
> +					  tbl->it_page_shift, npages);
>   
> -	rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
> +	rc = plpar_tce_stuff((u64)tbl->it_index,
> +			     (u64)tcenum << tbl->it_page_shift, 0, npages);
>   
>   	if (rc && printk_ratelimit()) {
>   		printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
> @@ -319,7 +325,8 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
>   	u64 rc;
>   	unsigned long tce_ret;
>   
> -	rc = plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12, &tce_ret);
> +	rc = plpar_tce_get((u64)tbl->it_index,
> +			   (u64)tcenum << tbl->it_page_shift, &tce_ret);
>   
>   	if (rc && printk_ratelimit()) {
>   		printk("tce_get_pSeriesLP: plpar_tce_get failed. rc=%lld\n", rc);
> 

-- 
--
Alexey

^ permalink raw reply

* Re: [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-10-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Code used to create a ddw property that was previously scattered in
> enable_ddw() is now gathered in ddw_property_create(), which deals with
> allocation and filling the property, letting it ready for
> of_property_add(), which now occurs in sequence.
> 
> This created an opportunity to reorganize the second part of enable_ddw():
> 
> Without this patch enable_ddw() does, in order:
> kzalloc() property & members, create_ddw(), fill ddwprop inside property,
> ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> of_add_property(), and list_add().
> 
> With this patch enable_ddw() does, in order:
> create_ddw(), ddw_property_create(), of_add_property(),
> ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> and list_add().
> 
> This change requires of_remove_property() in case anything fails after
> of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
> in all memory, which looks the most expensive operation, only if
> everything else succeeds.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 106 +++++++++++++++----------
>   1 file changed, 63 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index bba8584a8f9d..510ccb0521af 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -177,7 +177,7 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
>   		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
>   			ret = (int)rc;
>   			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
> -			                   (npages_start - (npages + 1)));
> +					   (npages_start - (npages + 1)));

Unrelated spaces change, the existing code does not seem incorrect (tabs 
first, then spaces).


>   			break;
>   		}
>   
> @@ -215,7 +215,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
>   		return tce_build_pSeriesLP(tbl->it_index, tcenum,
>   					   tceshift, npages, uaddr,
> -		                           direction, attrs);
> +					   direction, attrs);


And here.

>   	}
>   
>   	local_irq_save(flags);	/* to protect tcep and the page behind it */
> @@ -269,7 +269,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
>   		ret = (int)rc;
>   		tce_freemulti_pSeriesLP(tbl, tcenum_start,
> -		                        (npages_start - (npages + limit)));
> +					(npages_start - (npages + limit)));

And here.

>   		return ret;
>   	}
>   
> @@ -1121,6 +1121,35 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
>   			 ret);
>   }
>   
> +static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr,
> +					    u32 page_shift, u32 window_shift)
> +{
> +	struct dynamic_dma_window_prop *ddwprop;
> +	struct property *win64;
> +
> +	win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
> +	if (!win64)
> +		return NULL;
> +
> +	win64->name = kstrdup(propname, GFP_KERNEL);
> +	ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
> +	win64->value = ddwprop;
> +	win64->length = sizeof(*ddwprop);
> +	if (!win64->name || !win64->value) {
> +		kfree(win64);
> +		kfree(win64->name);
> +		kfree(win64->value);
> +		return NULL;
> +	}
> +
> +	ddwprop->liobn = cpu_to_be32(liobn);
> +	ddwprop->dma_base = cpu_to_be64(dma_addr);
> +	ddwprop->tce_shift = cpu_to_be32(page_shift);
> +	ddwprop->window_shift = cpu_to_be32(window_shift);
> +
> +	return win64;
> +}
> +
>   /*
>    * If the PE supports dynamic dma windows, and there is space for a table
>    * that can map all pages in a linear offset, then setup such a table,
> @@ -1138,12 +1167,11 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct ddw_query_response query;
>   	struct ddw_create_response create;
>   	int page_shift;
> -	u64 max_addr;
> +	u64 max_addr, win_addr;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	struct direct_window *window;
> -	struct property *win64;
> -	struct dynamic_dma_window_prop *ddwprop;
> +	struct property *win64 = NULL;
>   	struct failed_ddw_pdn *fpdn;
>   	bool default_win_removed = false;
>   
> @@ -1245,72 +1273,64 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		goto out_failed;
>   	}
>   	len = order_base_2(max_addr);
> -	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
> -	if (!win64) {
> -		dev_info(&dev->dev,
> -			"couldn't allocate property for 64bit dma window\n");
> -		goto out_failed;
> -	}
> -	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
> -	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
> -	win64->length = sizeof(*ddwprop);
> -	if (!win64->name || !win64->value) {
> -		dev_info(&dev->dev,
> -			"couldn't allocate property name and value\n");
> -		goto out_free_prop;
> -	}
>   
>   	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
>   	if (ret != 0)
> -		goto out_free_prop;
> -
> -	ddwprop->liobn = cpu_to_be32(create.liobn);
> -	ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
> -			create.addr_lo);
> -	ddwprop->tce_shift = cpu_to_be32(page_shift);
> -	ddwprop->window_shift = cpu_to_be32(len);
> +		goto out_failed;
>   
>   	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
> -		  create.liobn, dn);
> +		create.liobn, dn);


Unrelated. If you think the spaces/tabs thing needs to be fixed, make it 
a separate patch and do all these changes there at once.


>   
> -	window = ddw_list_new_entry(pdn, ddwprop);
> +	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> +	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> +				    page_shift, len);
> +	if (!win64) {
> +		dev_info(&dev->dev,
> +			 "couldn't allocate property, property name, or value\n");
> +		goto out_win_del;
> +	}
> +
> +	ret = of_add_property(pdn, win64);
> +	if (ret) {
> +		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> +			pdn, ret);
> +		goto out_prop_free;
> +	}
> +
> +	window = ddw_list_new_entry(pdn, win64->value);
>   	if (!window)
> -		goto out_clear_window;
> +		goto out_prop_del;
>   
>   	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
>   			win64->value, tce_setrange_multi_pSeriesLP_walk);
>   	if (ret) {
>   		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
>   			 dn, ret);
> -		goto out_free_window;
> -	}
> -
> -	ret = of_add_property(pdn, win64);
> -	if (ret) {
> -		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> -			 pdn, ret);
> -		goto out_free_window;
> +		goto out_list_del;
>   	}
>   
>   	spin_lock(&direct_window_list_lock);
>   	list_add(&window->list, &direct_window_list);
>   	spin_unlock(&direct_window_list_lock);
>   
> -	dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
> +	dev->dev.archdata.dma_offset = win_addr;
>   	goto out_unlock;
>   
> -out_free_window:
> +out_list_del:
>   	kfree(window);
>   
> -out_clear_window:
> -	remove_ddw(pdn, true);
> +out_prop_del:
> +	of_remove_property(pdn, win64);
>   
> -out_free_prop:
> +out_prop_free:


Really? :) s/out_prop_del/out_del_prop/ may be? The less unrelated 
changes the better.


>   	kfree(win64->name);
>   	kfree(win64->value);
>   	kfree(win64);
>   	win64 = NULL;
>   
> +out_win_del:
> +	remove_ddw(pdn, true);
> +
>   out_failed:
>   	if (default_win_removed)
>   		reset_dma_window(dev, pdn);
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-11-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Add a new helper _iommu_table_setparms(), and use it in
> iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
> code.
> 
> Also, setting tbl->it_ops was happening outsite iommu_table_setparms*(),
> so move it to the new helper. Since we need the iommu_table_ops to be
> declared before used, move iommu_table_lpar_multi_ops and
> iommu_table_pseries_ops to before their respective iommu_table_setparms*().
> 
> The tce_exchange_pseries() also had to be moved up, since it's used in
> iommu_table_lpar_multi_ops.xchg_no_kill.


Use forward declarations (preferred) or make a separate patch for moving 
chunks (I do not see much point).


> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 149 ++++++++++++-------------
>   1 file changed, 72 insertions(+), 77 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 510ccb0521af..abd36b257725 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -495,12 +495,62 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
>   	return rc;
>   }
>   
> +#ifdef CONFIG_IOMMU_API
> +static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
> +				long *tce, enum dma_data_direction *direction,
> +				bool realmode)
> +{
> +	long rc;
> +	unsigned long ioba = (unsigned long)index << tbl->it_page_shift;
> +	unsigned long flags, oldtce = 0;
> +	u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> +	unsigned long newtce = *tce | proto_tce;
> +
> +	spin_lock_irqsave(&tbl->large_pool.lock, flags);
> +
> +	rc = plpar_tce_get((u64)tbl->it_index, ioba, &oldtce);
> +	if (!rc)
> +		rc = plpar_tce_put((u64)tbl->it_index, ioba, newtce);
> +
> +	if (!rc) {
> +		*direction = iommu_tce_direction(oldtce);
> +		*tce = oldtce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
> +	}
> +
> +	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
> +
> +	return rc;
> +}
> +#endif
> +
>   static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
>   		unsigned long num_pfn, void *arg)
>   {
>   	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
>   }
>   
> +static inline void _iommu_table_setparms(struct iommu_table *tbl, unsigned long busno,
> +					 unsigned long liobn, unsigned long win_addr,
> +					 unsigned long window_size, unsigned long page_shift,
> +					 unsigned long base, struct iommu_table_ops *table_ops)
> +{
> +	tbl->it_busno = busno;
> +	tbl->it_index = liobn;
> +	tbl->it_offset = win_addr >> page_shift;
> +	tbl->it_size = window_size >> page_shift;
> +	tbl->it_page_shift = page_shift;
> +	tbl->it_base = base;
> +	tbl->it_blocksize = 16;
> +	tbl->it_type = TCE_PCI;
> +	tbl->it_ops = table_ops;
> +}
> +
> +struct iommu_table_ops iommu_table_pseries_ops = {
> +	.set = tce_build_pSeries,
> +	.clear = tce_free_pSeries,
> +	.get = tce_get_pseries
> +};
> +
>   static void iommu_table_setparms(struct pci_controller *phb,
>   				 struct device_node *dn,
>   				 struct iommu_table *tbl)
> @@ -509,8 +559,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
>   	const unsigned long *basep;
>   	const u32 *sizep;
>   
> -	node = phb->dn;
> +	/* Test if we are going over 2GB of DMA space */
> +	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> +		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> +		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> +	}


s/0x80000000ul/2*SZ_1G/

but more to the point - why this check? QEMU can create windows at 0 and 
as big as the VM requested. And I am pretty sure I can construct QEMU 
command line such as it won't have MMIO32 at all and a 4GB default DMA 
window.


>   
> +	node = phb->dn;
>   	basep = of_get_property(node, "linux,tce-base", NULL);
>   	sizep = of_get_property(node, "linux,tce-size", NULL);
>   	if (basep == NULL || sizep == NULL) {
> @@ -519,33 +574,25 @@ static void iommu_table_setparms(struct pci_controller *phb,
>   		return;
>   	}
>   
> -	tbl->it_base = (unsigned long)__va(*basep);
> +	_iommu_table_setparms(tbl, phb->bus->number, 0, phb->dma_window_base_cur,
> +			      phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
> +			      (unsigned long)__va(*basep), &iommu_table_pseries_ops);
>   
>   	if (!is_kdump_kernel())
>   		memset((void *)tbl->it_base, 0, *sizep);
>   
> -	tbl->it_busno = phb->bus->number;
> -	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> -
> -	/* Units of tce entries */
> -	tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
> -
> -	/* Test if we are going over 2GB of DMA space */
> -	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> -		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> -		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> -	}
> -
>   	phb->dma_window_base_cur += phb->dma_window_size;
> -
> -	/* Set the tce table size - measured in entries */
> -	tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
> -
> -	tbl->it_index = 0;
> -	tbl->it_blocksize = 16;
> -	tbl->it_type = TCE_PCI;
>   }
>   
> +struct iommu_table_ops iommu_table_lpar_multi_ops = {
> +	.set = tce_buildmulti_pSeriesLP,
> +#ifdef CONFIG_IOMMU_API
> +	.xchg_no_kill = tce_exchange_pseries,
> +#endif
> +	.clear = tce_freemulti_pSeriesLP,
> +	.get = tce_get_pSeriesLP
> +};
> +
>   /*
>    * iommu_table_setparms_lpar
>    *
> @@ -557,28 +604,17 @@ static void iommu_table_setparms_lpar(struct pci_controller *phb,
>   				      struct iommu_table_group *table_group,
>   				      const __be32 *dma_window)
>   {
> -	unsigned long offset, size;
> +	unsigned long offset, size, liobn;
>   
> -	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
> +	of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
>   
> -	tbl->it_busno = phb->bus->number;
> -	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> -	tbl->it_base   = 0;
> -	tbl->it_blocksize  = 16;
> -	tbl->it_type = TCE_PCI;
> -	tbl->it_offset = offset >> tbl->it_page_shift;
> -	tbl->it_size = size >> tbl->it_page_shift;
> +	_iommu_table_setparms(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, 0,
> +			      &iommu_table_lpar_multi_ops);
>   
>   	table_group->tce32_start = offset;
>   	table_group->tce32_size = size;
>   }
>   
> -struct iommu_table_ops iommu_table_pseries_ops = {
> -	.set = tce_build_pSeries,
> -	.clear = tce_free_pSeries,
> -	.get = tce_get_pseries
> -};
> -
>   static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>   {
>   	struct device_node *dn;
> @@ -647,7 +683,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>   	tbl = pci->table_group->tables[0];
>   
>   	iommu_table_setparms(pci->phb, dn, tbl);
> -	tbl->it_ops = &iommu_table_pseries_ops;
>   	iommu_init_table(tbl, pci->phb->node, 0, 0);
>   
>   	/* Divide the rest (1.75GB) among the children */
> @@ -658,43 +693,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>   	pr_debug("ISA/IDE, window size is 0x%llx\n", pci->phb->dma_window_size);
>   }
>   
> -#ifdef CONFIG_IOMMU_API
> -static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
> -				long *tce, enum dma_data_direction *direction,
> -				bool realmode)
> -{
> -	long rc;
> -	unsigned long ioba = (unsigned long) index << tbl->it_page_shift;
> -	unsigned long flags, oldtce = 0;
> -	u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> -	unsigned long newtce = *tce | proto_tce;
> -
> -	spin_lock_irqsave(&tbl->large_pool.lock, flags);
> -
> -	rc = plpar_tce_get((u64)tbl->it_index, ioba, &oldtce);
> -	if (!rc)
> -		rc = plpar_tce_put((u64)tbl->it_index, ioba, newtce);
> -
> -	if (!rc) {
> -		*direction = iommu_tce_direction(oldtce);
> -		*tce = oldtce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
> -	}
> -
> -	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
> -
> -	return rc;
> -}
> -#endif
> -
> -struct iommu_table_ops iommu_table_lpar_multi_ops = {
> -	.set = tce_buildmulti_pSeriesLP,
> -#ifdef CONFIG_IOMMU_API
> -	.xchg_no_kill = tce_exchange_pseries,
> -#endif
> -	.clear = tce_freemulti_pSeriesLP,
> -	.get = tce_get_pSeriesLP
> -};
> -
>   static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   {
>   	struct iommu_table *tbl;
> @@ -729,7 +727,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   		tbl = ppci->table_group->tables[0];
>   		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
>   				ppci->table_group, dma_window);
> -		tbl->it_ops = &iommu_table_lpar_multi_ops;
>   		iommu_init_table(tbl, ppci->phb->node, 0, 0);
>   		iommu_register_group(ppci->table_group,
>   				pci_domain_nr(bus), 0);
> @@ -758,7 +755,6 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
>   		PCI_DN(dn)->table_group = iommu_pseries_alloc_group(phb->node);
>   		tbl = PCI_DN(dn)->table_group->tables[0];
>   		iommu_table_setparms(phb, dn, tbl);
> -		tbl->it_ops = &iommu_table_pseries_ops;
>   		iommu_init_table(tbl, phb->node, 0, 0);
>   		set_iommu_table_base(&dev->dev, tbl);
>   		return;
> @@ -1385,7 +1381,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
>   		tbl = pci->table_group->tables[0];
>   		iommu_table_setparms_lpar(pci->phb, pdn, tbl,
>   				pci->table_group, dma_window);
> -		tbl->it_ops = &iommu_table_lpar_multi_ops;
>   		iommu_init_table(tbl, pci->phb->node, 0, 0);
>   		iommu_register_group(pci->table_group,
>   				pci_domain_nr(pci->phb->bus), 0);
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-14-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds VFs and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per a PE to limit waste of physical pages.
> 
> As of today, if the assumed direct mapping is not possible, DDW creation
> is skipped and the default DMA window "ibm,dma-window" is used instead.
> 
> The default DMA window uses 4k pages instead of 64k pages, and since
> the amount of pages (TCEs) may stay the same (on pHyp case), making
> use of DDW instead of the default DMA window for indirect mapping will
> expand in up to 16x the amount of memory that can be mapped on DMA.
> 
> Indirect mapping will only be used if direct mapping is not a
> possibility.
> 
> For indirect mapping, it's necessary to re-create the iommu_table with
> the new DMA window parameters, so iommu_alloc() can use it.
> 
> Removing the default DMA window for using DDW with indirect mapping
> is only allowed if there is no current IOMMU memory allocated in
> the iommu_table. enable_ddw() is aborted otherwise.
> 
> Even though there won't be both direct and indirect mappings at the
> same time, we can't reuse the DIRECT64_PROPNAME property name, or else
> an older kexec()ed kernel can assume direct mapping, and skip
> iommu_alloc(), causing undesirable behavior.
> So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
> was created to represent a DDW that does not allow direct mapping.
> 
> Note: ddw_memory_hotplug_max() was moved up so it can be used in
> find_existing_ddw().
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 160 ++++++++++++++++---------
>   1 file changed, 103 insertions(+), 57 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9b7c03652e72..c4de23080d1b 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -375,6 +375,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
>   /* protects initializing window twice for same device */
>   static DEFINE_MUTEX(direct_window_init_mutex);
>   #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>   
>   static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>   					unsigned long num_pfn, const void *arg)
> @@ -846,10 +847,48 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
>   	return 0;
>   }
>   
> -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> +static phys_addr_t ddw_memory_hotplug_max(void)


Please, forward declaration or a separate patch; this creates 
unnecessary noise to the actual change.


> +{
> +	phys_addr_t max_addr = memory_hotplug_max();
> +	struct device_node *memory;
> +
> +	/*
> +	 * The "ibm,pmemory" can appear anywhere in the address space.
> +	 * Assuming it is still backed by page structs, set the upper limit
> +	 * for the huge DMA window as MAX_PHYSMEM_BITS.
> +	 */
> +	if (of_find_node_by_type(NULL, "ibm,pmemory"))
> +		return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
> +			(phys_addr_t)-1 : (1ULL << MAX_PHYSMEM_BITS);
> +
> +	for_each_node_by_type(memory, "memory") {
> +		unsigned long start, size;
> +		int n_mem_addr_cells, n_mem_size_cells, len;
> +		const __be32 *memcell_buf;
> +
> +		memcell_buf = of_get_property(memory, "reg", &len);
> +		if (!memcell_buf || len <= 0)
> +			continue;
> +
> +		n_mem_addr_cells = of_n_addr_cells(memory);
> +		n_mem_size_cells = of_n_size_cells(memory);
> +
> +		start = of_read_number(memcell_buf, n_mem_addr_cells);
> +		memcell_buf += n_mem_addr_cells;
> +		size = of_read_number(memcell_buf, n_mem_size_cells);
> +		memcell_buf += n_mem_size_cells;
> +
> +		max_addr = max_t(phys_addr_t, max_addr, start + size);
> +	}
> +
> +	return max_addr;
> +}
> +
> +static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool *direct_mapping)
>   {
>   	struct direct_window *window;
>   	const struct dynamic_dma_window_prop *direct64;
> +	unsigned long window_size;
>   	bool found = false;
>   
>   	spin_lock(&direct_window_list_lock);
> @@ -858,6 +897,10 @@ static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
>   		if (window->device == pdn) {
>   			direct64 = window->prop;
>   			*dma_addr = be64_to_cpu(direct64->dma_base);
> +
> +			window_size = (1UL << be32_to_cpu(direct64->window_shift));
> +			*direct_mapping = (window_size >= ddw_memory_hotplug_max());
> +
>   			found = true;
>   			break;
>   		}
> @@ -912,6 +955,7 @@ static int find_existing_ddw_windows(void)
>   		return 0;
>   
>   	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
> +	find_existing_ddw_windows_named(DMA64_PROPNAME);
>   
>   	return 0;
>   }
> @@ -1054,43 +1098,6 @@ struct failed_ddw_pdn {
>   
>   static LIST_HEAD(failed_ddw_pdn_list);
>   
> -static phys_addr_t ddw_memory_hotplug_max(void)
> -{
> -	phys_addr_t max_addr = memory_hotplug_max();
> -	struct device_node *memory;
> -
> -	/*
> -	 * The "ibm,pmemory" can appear anywhere in the address space.
> -	 * Assuming it is still backed by page structs, set the upper limit
> -	 * for the huge DMA window as MAX_PHYSMEM_BITS.
> -	 */
> -	if (of_find_node_by_type(NULL, "ibm,pmemory"))
> -		return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
> -			(phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
> -
> -	for_each_node_by_type(memory, "memory") {
> -		unsigned long start, size;
> -		int n_mem_addr_cells, n_mem_size_cells, len;
> -		const __be32 *memcell_buf;
> -
> -		memcell_buf = of_get_property(memory, "reg", &len);
> -		if (!memcell_buf || len <= 0)
> -			continue;
> -
> -		n_mem_addr_cells = of_n_addr_cells(memory);
> -		n_mem_size_cells = of_n_size_cells(memory);
> -
> -		start = of_read_number(memcell_buf, n_mem_addr_cells);
> -		memcell_buf += n_mem_addr_cells;
> -		size = of_read_number(memcell_buf, n_mem_size_cells);
> -		memcell_buf += n_mem_size_cells;
> -
> -		max_addr = max_t(phys_addr_t, max_addr, start + size);
> -	}
> -
> -	return max_addr;
> -}
> -
>   /*
>    * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
>    * ibm,ddw-extensions, which carries the rtas token for
> @@ -1173,14 +1180,19 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	struct direct_window *window;
> +	const char *win_name;
>   	struct property *win64 = NULL;
>   	struct failed_ddw_pdn *fpdn;
> -	bool default_win_removed = false;
> +	bool default_win_removed = false, direct_mapping = false;
> +	struct pci_dn *pci = PCI_DN(pdn);
> +	struct iommu_table *tbl = pci->table_group->tables[0];
>   
>   	mutex_lock(&direct_window_init_mutex);
>   
> -	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset))
> -		goto out_unlock;
> +	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &direct_mapping)) {
> +		mutex_unlock(&direct_window_init_mutex);
> +		return direct_mapping;
> +	}
>   
>   	/*
>   	 * If we already went through this for a previous function of
> @@ -1266,15 +1278,25 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	}
>   
>   	/* verify the window * number of ptes will map the partition */
> -	/* check largest block * page size > max memory hotplug addr */
>   	max_addr = ddw_memory_hotplug_max();
>   	if (query.largest_available_block < (max_addr >> page_shift)) {
> -		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> -			  "%llu-sized pages\n", max_addr,  query.largest_available_block,
> -			  1ULL << page_shift);
> +		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu %llu-sized pages\n",
> +			max_addr, query.largest_available_block,
> +			1ULL << page_shift);
> +
> +		len = order_base_2(query.largest_available_block << page_shift);
> +		win_name = DMA64_PROPNAME;
> +	} else {
> +		direct_mapping = true;
> +		len = order_base_2(max_addr);
> +		win_name = DIRECT64_PROPNAME;
> +	}
> +
> +	/* DDW + IOMMU on single window may fail if there is any allocation */
> +	if (default_win_removed && !direct_mapping && iommu_table_in_use(tbl)) {
> +		dev_dbg(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
>   		goto out_failed;
>   	}
> -	len = order_base_2(max_addr);
>   
>   	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
>   	if (ret != 0)
> @@ -1284,8 +1306,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		create.liobn, dn);
>   
>   	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> -	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> -				    page_shift, len);
> +	win64 = ddw_property_create(win_name, create.liobn, win_addr, page_shift, len);
>   	if (!win64) {
>   		dev_info(&dev->dev,
>   			 "couldn't allocate property, property name, or value\n");
> @@ -1300,15 +1321,37 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	}
>   
>   	window = ddw_list_new_entry(pdn, win64->value);
> -	if (!window)
> +	if (!window) {
> +		dev_dbg(&dev->dev, "couldn't create new list entry\n");
>   		goto out_prop_del;
> +	}
>   
> -	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> -			win64->value, tce_setrange_multi_pSeriesLP_walk);
> -	if (ret) {
> -		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> -			 dn, ret);
> -		goto out_list_del;
> +	if (direct_mapping) {
> +		/* DDW maps the whole partition, so enable direct DMA mapping */
> +		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> +					    win64->value, tce_setrange_multi_pSeriesLP_walk);
> +		if (ret) {
> +			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> +				 dn, ret);
> +			goto out_list_del;
> +		}
> +	} else {
> +		/* New table for using DDW instead of the default DMA window */
> +		tbl = iommu_pseries_alloc_table(pci->phb->node);
> +		if (!tbl) {
> +			dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
> +			goto out_list_del;
> +		}
> +
> +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
> +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
> +		iommu_init_table(tbl, pci->phb->node, 0, 0);


It is 0,0 only if win_addr>0 which is not the QEMU case.


> +
> +		/* Free old table and replace by the newer */
> +		iommu_tce_table_put(pci->table_group->tables[0]);
> +		pci->table_group->tables[0] = tbl;
> +
> +		set_iommu_table_base(&dev->dev, tbl);
>   	}
>   
>   	spin_lock(&direct_window_list_lock);
> @@ -1345,7 +1388,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   
>   out_unlock:
>   	mutex_unlock(&direct_window_init_mutex);
> -	return win64;
> +	return win64 && direct_mapping;
>   }
>   
>   static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> @@ -1486,7 +1529,10 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   		 * we have to remove the property when releasing
>   		 * the device node.
>   		 */
> -		remove_ddw(np, false, DIRECT64_PROPNAME);
> +
> +		if (remove_ddw(np, false, DIRECT64_PROPNAME))
> +			remove_ddw(np, false, DMA64_PROPNAME);
> +
>   		if (pci && pci->table_group)
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 11/14] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-12-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Update remove_dma_window() so it can be used to remove DDW with a given
> property name.
> 

Out of context this seems useless. How about?
===
At the moment pseries stores information about created directly mapped 
DDW window in DIRECT64_PROPNAME. We are going to implement indirect DDW 
window which we need to preserve during kexec so we need another 
property for that.
===

Feel free to correct my english :)


> This enables the creation of new property names for DDW, so we can
> have different usage for it, like indirect mapping.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index abd36b257725..f6a65ecd1db5 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -818,31 +818,32 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   }
>   
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_name)
>   {
>   	struct property *win;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	int ret = 0;
>   
> +	win = of_find_property(np, win_name, NULL);
> +	if (!win)
> +		return -EINVAL;
> +
>   	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
>   					 &ddw_avail[0], DDW_APPLICABLE_SIZE);
>   	if (ret)
> -		return;
> -
> -	win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> -	if (!win)
> -		return;
> +		return 0;
>   
>   	if (win->length >= sizeof(struct dynamic_dma_window_prop))
>   		remove_dma_window(np, ddw_avail, win);
>   
>   	if (!remove_prop)
> -		return;
> +		return 0;
>   
>   	ret = of_remove_property(np, win);
>   	if (ret)
>   		pr_warn("%pOF: failed to remove direct window property: %d\n",
>   			np, ret);
> +	return 0;


You do not test the return code anywhere until 13/14 so I'd say merge 
this one into 13/14, the same comment applies to 12/14. If you do not 
move chunks in 13/14, it is going to be fairly small patch.


>   }
>   
>   static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> @@ -894,7 +895,7 @@ static int find_existing_ddw_windows(void)
>   	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
>   		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
>   		if (!direct64 || len < sizeof(*direct64)) {
> -			remove_ddw(pdn, true);
> +			remove_ddw(pdn, true, DIRECT64_PROPNAME);
>   			continue;
>   		}
>   
> @@ -1325,7 +1326,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	win64 = NULL;
>   
>   out_win_del:
> -	remove_ddw(pdn, true);
> +	remove_ddw(pdn, true, DIRECT64_PROPNAME);
>   
>   out_failed:
>   	if (default_win_removed)
> @@ -1480,7 +1481,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   		 * we have to remove the property when releasing
>   		 * the device node.
>   		 */
> -		remove_ddw(np, false);
> +		remove_ddw(np, false, DIRECT64_PROPNAME);
>   		if (pci && pci->table_group)
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window"
From: Alexey Kardashevskiy @ 2020-09-29  3:55 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev
In-Reply-To: <20200911170738.82818-15-leobras.c@gmail.com>



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> A previous change introduced the usage of DDW as a bigger indirect DMA
> mapping when the DDW available size does not map the whole partition.
> 
> As most of the code that manipulates direct mappings was reused for
> indirect mappings, it's necessary to rename all names and debug/info
> messages to reflect that it can be used for both kinds of mapping.
> 
> Also, defines DEFAULT_DMA_WIN as "ibm,dma-window" to document that
> it's the name of the default DMA window.

"ibm,dma-window" is so old so it does not need a macro (which btw would 
be DMA_WIN_PROPNAME to match the other names) :)


> Those changes are not supposed to change how the code works in any
> way, just adjust naming.

I simply have this in my .vimrc for the cases like this one:

===
This should cause no behavioural change.
===


> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 102 +++++++++++++------------
>   1 file changed, 53 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index c4de23080d1b..56638b7f07fc 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -349,7 +349,7 @@ struct dynamic_dma_window_prop {
>   	__be32	window_shift;	/* ilog2(tce_window_size) */
>   };
>   
> -struct direct_window {
> +struct dma_win {
>   	struct device_node *device;
>   	const struct dynamic_dma_window_prop *prop;
>   	struct list_head list;
> @@ -369,13 +369,14 @@ struct ddw_create_response {
>   	u32 addr_lo;
>   };
>   
> -static LIST_HEAD(direct_window_list);
> +static LIST_HEAD(dma_win_list);
>   /* prevents races between memory on/offline and window creation */
> -static DEFINE_SPINLOCK(direct_window_list_lock);
> +static DEFINE_SPINLOCK(dma_win_list_lock);
>   /* protects initializing window twice for same device */
> -static DEFINE_MUTEX(direct_window_init_mutex);
> +static DEFINE_MUTEX(dma_win_init_mutex);
>   #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
>   #define DMA64_PROPNAME "linux,dma64-ddr-window-info"
> +#define DEFAULT_DMA_WIN "ibm,dma-window"
>   
>   static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>   					unsigned long num_pfn, const void *arg)
> @@ -706,15 +707,18 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   	pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
>   		 dn);
>   
> -	/* Find nearest ibm,dma-window, walking up the device tree */
> +	/*
> +	 * Find nearest ibm,dma-window (default DMA window), walking up the
> +	 * device tree
> +	 */
>   	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
> -		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
> +		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
>   		if (dma_window != NULL)
>   			break;
>   	}
>   
>   	if (dma_window == NULL) {
> -		pr_debug("  no ibm,dma-window property !\n");
> +		pr_debug("  no %s property !\n", DEFAULT_DMA_WIN);
>   		return;
>   	}
>   
> @@ -810,11 +814,11 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
>   
>   	ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
>   	if (ret)
> -		pr_warn("%pOF: failed to remove direct window: rtas returned "
> +		pr_warn("%pOF: failed to remove dma window: rtas returned "
>   			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   	else
> -		pr_debug("%pOF: successfully removed direct window: rtas returned "
> +		pr_debug("%pOF: successfully removed dma window: rtas returned "


s/dma/DMA/ in all user visible strings.



>   			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   }
> @@ -842,7 +846,7 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
>   
>   	ret = of_remove_property(np, win);
>   	if (ret)
> -		pr_warn("%pOF: failed to remove direct window property: %d\n",
> +		pr_warn("%pOF: failed to remove dma window property: %d\n",
>   			np, ret);
>   	return 0;
>   }
> @@ -886,34 +890,34 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>   
>   static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool *direct_mapping)
>   {
> -	struct direct_window *window;
> -	const struct dynamic_dma_window_prop *direct64;
> +	struct dma_win *window;
> +	const struct dynamic_dma_window_prop *dma64;
>   	unsigned long window_size;
>   	bool found = false;
>   
> -	spin_lock(&direct_window_list_lock);
> +	spin_lock(&dma_win_list_lock);
>   	/* check if we already created a window and dupe that config if so */
> -	list_for_each_entry(window, &direct_window_list, list) {
> +	list_for_each_entry(window, &dma_win_list, list) {
>   		if (window->device == pdn) {
> -			direct64 = window->prop;
> -			*dma_addr = be64_to_cpu(direct64->dma_base);
> +			dma64 = window->prop;
> +			*dma_addr = be64_to_cpu(dma64->dma_base);
>   
> -			window_size = (1UL << be32_to_cpu(direct64->window_shift));
> +			window_size = (1UL << be32_to_cpu(dma64->window_shift));
>   			*direct_mapping = (window_size >= ddw_memory_hotplug_max());
>   
>   			found = true;
>   			break;
>   		}
>   	}
> -	spin_unlock(&direct_window_list_lock);
> +	spin_unlock(&dma_win_list_lock);
>   
>   	return found;
>   }
>   
> -static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
> -						const struct dynamic_dma_window_prop *dma64)
> +static struct dma_win *ddw_list_new_entry(struct device_node *pdn,
> +					  const struct dynamic_dma_window_prop *dma64)
>   {
> -	struct direct_window *window;
> +	struct dma_win *window;
>   
>   	window = kzalloc(sizeof(*window), GFP_KERNEL);
>   	if (!window)
> @@ -929,7 +933,7 @@ static void find_existing_ddw_windows_named(const char *name)
>   {
>   	int len;
>   	struct device_node *pdn;
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	const struct dynamic_dma_window_prop *dma64;
>   
>   	for_each_node_with_property(pdn, name) {
> @@ -943,9 +947,9 @@ static void find_existing_ddw_windows_named(const char *name)
>   		if (!window)
>   			break;
>   
> -		spin_lock(&direct_window_list_lock);
> -		list_add(&window->list, &direct_window_list);
> -		spin_unlock(&direct_window_list_lock);
> +		spin_lock(&dma_win_list_lock);
> +		list_add(&window->list, &dma_win_list);
> +		spin_unlock(&dma_win_list_lock);
>   	}
>   }
>   
> @@ -1179,7 +1183,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	u64 max_addr, win_addr;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	const char *win_name;
>   	struct property *win64 = NULL;
>   	struct failed_ddw_pdn *fpdn;
> @@ -1187,10 +1191,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct pci_dn *pci = PCI_DN(pdn);
>   	struct iommu_table *tbl = pci->table_group->tables[0];
>   
> -	mutex_lock(&direct_window_init_mutex);
> +	mutex_lock(&dma_win_init_mutex);
>   
>   	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &direct_mapping)) {
> -		mutex_unlock(&direct_window_init_mutex);
> +		mutex_unlock(&dma_win_init_mutex);
>   		return direct_mapping;
>   	}
>   
> @@ -1241,7 +1245,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		struct property *default_win;
>   		int reset_win_ext;
>   
> -		default_win = of_find_property(pdn, "ibm,dma-window", NULL);
> +		default_win = of_find_property(pdn, DEFAULT_DMA_WIN, NULL);
>   		if (!default_win)
>   			goto out_failed;
>   
> @@ -1272,8 +1276,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	} else if ((query.page_size & 1) && PAGE_SHIFT >= 12) {
>   		page_shift = 12; /* 4kB */
>   	} else {
> -		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
> -			  query.page_size);
> +		dev_dbg(&dev->dev, "no supported page size in mask %x",
> +			query.page_size);
>   		goto out_failed;
>   	}
>   
> @@ -1331,7 +1335,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
>   					    win64->value, tce_setrange_multi_pSeriesLP_walk);
>   		if (ret) {
> -			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> +			dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
>   				 dn, ret);
>   			goto out_list_del;
>   		}
> @@ -1354,9 +1358,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		set_iommu_table_base(&dev->dev, tbl);
>   	}
>   
> -	spin_lock(&direct_window_list_lock);
> -	list_add(&window->list, &direct_window_list);
> -	spin_unlock(&direct_window_list_lock);
> +	spin_lock(&dma_win_list_lock);
> +	list_add(&window->list, &dma_win_list);
> +	spin_unlock(&dma_win_list_lock);
>   
>   	dev->dev.archdata.dma_offset = win_addr;
>   	goto out_unlock;
> @@ -1387,7 +1391,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	list_add(&fpdn->list, &failed_ddw_pdn_list);
>   
>   out_unlock:
> -	mutex_unlock(&direct_window_init_mutex);
> +	mutex_unlock(&dma_win_init_mutex);
>   	return win64 && direct_mapping;
>   }
>   
> @@ -1411,7 +1415,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
>   
>   	for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
>   	     pdn = pdn->parent) {
> -		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
> +		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
>   		if (dma_window)
>   			break;
>   	}
> @@ -1461,7 +1465,7 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
>   	 */
>   	for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
>   			pdn = pdn->parent) {
> -		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
> +		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
>   		if (dma_window)
>   			break;
>   	}
> @@ -1475,29 +1479,29 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
>   static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
>   		void *data)
>   {
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	struct memory_notify *arg = data;
>   	int ret = 0;
>   
>   	switch (action) {
>   	case MEM_GOING_ONLINE:
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>   					arg->nr_pages, window->prop);
>   			/* XXX log error */
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	case MEM_CANCEL_ONLINE:
>   	case MEM_OFFLINE:
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>   					arg->nr_pages, window->prop);
>   			/* XXX log error */
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	default:
>   		break;
> @@ -1518,7 +1522,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   	struct of_reconfig_data *rd = data;
>   	struct device_node *np = rd->dn;
>   	struct pci_dn *pci = PCI_DN(np);
> -	struct direct_window *window;
> +	struct dma_win *window;
>   
>   	switch (action) {
>   	case OF_RECONFIG_DETACH_NODE:
> @@ -1537,15 +1541,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
>   
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			if (window->device == np) {
>   				list_del(&window->list);
>   				kfree(window);
>   				break;
>   			}
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	default:
>   		err = NOTIFY_DONE;
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 23/25] powerpc/signal: Create 'unsafe' versions of copy_[ck][fpr/vsx]_to_user()
From: Christopher M. Riedl @ 2020-09-29  2:04 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <29f6c4b8e7a5bbc61e6a8801b78bbf493f9f819e.1597770847.git.christophe.leroy@csgroup.eu>

On Tue Aug 18, 2020 at 12:19 PM CDT, Christophe Leroy wrote:
> For the non VSX version, that's trivial. Just use unsafe_copy_to_user()
> instead of __copy_to_user().
>
> For the VSX version, remove the intermediate step through a buffer and
> use unsafe_put_user() directly. This generates a far smaller code which
> is acceptable to inline, see below:
>
> Standard VSX version:
>
> 0000000000000000 <.copy_fpr_to_user>:
> 0: 7c 08 02 a6 mflr r0
> 4: fb e1 ff f8 std r31,-8(r1)
> 8: 39 00 00 20 li r8,32
> c: 39 24 0b 80 addi r9,r4,2944
> 10: 7d 09 03 a6 mtctr r8
> 14: f8 01 00 10 std r0,16(r1)
> 18: f8 21 fe 71 stdu r1,-400(r1)
> 1c: 39 41 00 68 addi r10,r1,104
> 20: e9 09 00 00 ld r8,0(r9)
> 24: 39 4a 00 08 addi r10,r10,8
> 28: 39 29 00 10 addi r9,r9,16
> 2c: f9 0a 00 00 std r8,0(r10)
> 30: 42 00 ff f0 bdnz 20 <.copy_fpr_to_user+0x20>
> 34: e9 24 0d 80 ld r9,3456(r4)
> 38: 3d 42 00 00 addis r10,r2,0
> 3a: R_PPC64_TOC16_HA .toc
> 3c: eb ea 00 00 ld r31,0(r10)
> 3e: R_PPC64_TOC16_LO_DS .toc
> 40: f9 21 01 70 std r9,368(r1)
> 44: e9 3f 00 00 ld r9,0(r31)
> 48: 81 29 00 20 lwz r9,32(r9)
> 4c: 2f 89 00 00 cmpwi cr7,r9,0
> 50: 40 9c 00 18 bge cr7,68 <.copy_fpr_to_user+0x68>
> 54: 4c 00 01 2c isync
> 58: 3d 20 40 00 lis r9,16384
> 5c: 79 29 07 c6 rldicr r9,r9,32,31
> 60: 7d 3d 03 a6 mtspr 29,r9
> 64: 4c 00 01 2c isync
> 68: 38 a0 01 08 li r5,264
> 6c: 38 81 00 70 addi r4,r1,112
> 70: 48 00 00 01 bl 70 <.copy_fpr_to_user+0x70>
> 70: R_PPC64_REL24 .__copy_tofrom_user
> 74: 60 00 00 00 nop
> 78: e9 3f 00 00 ld r9,0(r31)
> 7c: 81 29 00 20 lwz r9,32(r9)
> 80: 2f 89 00 00 cmpwi cr7,r9,0
> 84: 40 9c 00 18 bge cr7,9c <.copy_fpr_to_user+0x9c>
> 88: 4c 00 01 2c isync
> 8c: 39 20 ff ff li r9,-1
> 90: 79 29 00 44 rldicr r9,r9,0,1
> 94: 7d 3d 03 a6 mtspr 29,r9
> 98: 4c 00 01 2c isync
> 9c: 38 21 01 90 addi r1,r1,400
> a0: e8 01 00 10 ld r0,16(r1)
> a4: eb e1 ff f8 ld r31,-8(r1)
> a8: 7c 08 03 a6 mtlr r0
> ac: 4e 80 00 20 blr
>
> 'unsafe' simulated VSX version (The ... are only nops) using
> unsafe_copy_fpr_to_user() macro:
>
> unsigned long copy_fpr_to_user(void __user *to,
> struct task_struct *task)
> {
> unsafe_copy_fpr_to_user(to, task, failed);
> return 0;
> failed:
> return 1;
> }
>
> 0000000000000000 <.copy_fpr_to_user>:
> 0: 39 00 00 20 li r8,32
> 4: 39 44 0b 80 addi r10,r4,2944
> 8: 7d 09 03 a6 mtctr r8
> c: 7c 69 1b 78 mr r9,r3
> ...
> 20: e9 0a 00 00 ld r8,0(r10)
> 24: f9 09 00 00 std r8,0(r9)
> 28: 39 4a 00 10 addi r10,r10,16
> 2c: 39 29 00 08 addi r9,r9,8
> 30: 42 00 ff f0 bdnz 20 <.copy_fpr_to_user+0x20>
> 34: e9 24 0d 80 ld r9,3456(r4)
> 38: f9 23 01 00 std r9,256(r3)
> 3c: 38 60 00 00 li r3,0
> 40: 4e 80 00 20 blr
> ...
> 50: 38 60 00 01 li r3,1
> 54: 4e 80 00 20 blr
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> arch/powerpc/kernel/signal.h | 53 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> index f610cfafa478..2559a681536e 100644
> --- a/arch/powerpc/kernel/signal.h
> +++ b/arch/powerpc/kernel/signal.h
> @@ -32,7 +32,54 @@ unsigned long copy_fpr_to_user(void __user *to,
> struct task_struct *task);
> unsigned long copy_ckfpr_to_user(void __user *to, struct task_struct
> *task);
> unsigned long copy_fpr_from_user(struct task_struct *task, void __user
> *from);
> unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user
> *from);
> +
> +#define unsafe_copy_fpr_to_user(to, task, label) do { \
> + struct task_struct *__t = task; \
> + u64 __user *buf = (u64 __user *)to; \
> + int i; \
> + \
> + for (i = 0; i < ELF_NFPREG - 1 ; i++) \
> + unsafe_put_user(__t->thread.TS_FPR(i), &buf[i], label); \
> + unsafe_put_user(__t->thread.fp_state.fpscr, &buf[i], label); \
> +} while (0)
> +

I've been working on the PPC64 side of this "unsafe" rework using this
series as a basis. One question here - I don't really understand what
the benefit of re-implementing this logic in macros (similarly for the
other copy_* functions below) is?

I am considering  a "__unsafe_copy_*" implementation in signal.c for
each (just the original implementation w/ using the "unsafe_" variants
of the uaccess stuff) which gets called by the "safe" functions w/ the
appropriate "user_*_access_begin/user_*_access_end". Something like
(pseudo-ish code):

	/* signal.c */
	unsigned long __unsafe_copy_fpr_to_user(...)
	{
		...
		unsafe_copy_to_user(..., bad);
		return 0;
	bad:
		return 1; /* -EFAULT? */
	}

	unsigned long copy_fpr_to_user(...)
	{
		unsigned long err;
		if (!user_write_access_begin(...))
			return 1; /* -EFAULT? */

		err = __unsafe_copy_fpr_to_user(...);

		user_write_access_end();
		return err;
	}

	/* signal.h */
	unsigned long __unsafe_copy_fpr_to_user(...);
	#define unsafe_copy_fpr_to_user(..., label) \
		unsafe_op_wrap(__unsafe_copy_fpr_to_user(...), label)

This way there is a single implementation for each copy routine "body".
The "unsafe" wrappers then just exist as simple macros similar to what
x86 does: "unsafe_" macro wraps "__unsafe" functions for the goto label.

Granted this does pollute "signal.h" w/ the "__unsafe_copy_*"
prototypes...

> +#define unsafe_copy_vsx_to_user(to, task, label) do { \
> + struct task_struct *__t = task; \
> + u64 __user *buf = (u64 __user *)to; \
> + int i; \
> + \
> + for (i = 0; i < ELF_NVSRHALFREG ; i++) \
> + unsafe_put_user(__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET], \
> + &buf[i], label);\
> +} while (0)
> +
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +#define unsafe_copy_ckfpr_to_user(to, task, label) do { \
> + struct task_struct *__t = task; \
> + u64 __user *buf = (u64 __user *)to; \
> + int i; \
> + \
> + for (i = 0; i < ELF_NFPREG - 1 ; i++) \
> + unsafe_put_user(__t->thread.TS_CKFPR(i), &buf[i], label);\
> + unsafe_put_user(__t->thread.ckfp_state.fpscr, &buf[i], label); \
> +} while (0)
> +
> +#define unsafe_copy_ckvsx_to_user(to, task, label) do { \
> + struct task_struct *__t = task; \
> + u64 __user *buf = (u64 __user *)to; \
> + int i; \
> + \
> + for (i = 0; i < ELF_NVSRHALFREG ; i++) \
> + unsafe_put_user(__t->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET], \
> + &buf[i], label);\
> +} while (0)
> +#endif
> #elif defined(CONFIG_PPC_FPU_REGS)
> +
> +#define unsafe_copy_fpr_to_user(to, task, label) \
> + unsafe_copy_to_user(to, (task)->thread.fp_state.fpr, \
> + ELF_NFPREG * sizeof(double), label)
> +
> static inline unsigned long
> copy_fpr_to_user(void __user *to, struct task_struct *task)
> {
> @@ -48,6 +95,10 @@ copy_fpr_from_user(struct task_struct *task, void
> __user *from)
> }
>  
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +#define unsafe_copy_ckfpr_to_user(to, task, label) \
> + unsafe_copy_to_user(to, (task)->thread.ckfp_state.fpr, \
> + ELF_NFPREG * sizeof(double), label)
> +
> inline unsigned long copy_ckfpr_to_user(void __user *to, struct
> task_struct *task)
> {
> return __copy_to_user(to, task->thread.ckfp_state.fpr,
> @@ -62,6 +113,8 @@ copy_ckfpr_from_user(struct task_struct *task, void
> __user *from)
> }
> #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
> #else
> +#define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
> +
> static inline unsigned long
> copy_fpr_to_user(void __user *to, struct task_struct *task)
> {
> --
> 2.25.0


^ permalink raw reply

* Re: [PATCH v2 25/25] powerpc/signal32: Transform save_user_regs() and save_tm_user_regs() in 'unsafe' version
From: Christopher M. Riedl @ 2020-09-29  2:55 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <c7b37b385ccf9666066452e58f018a86573f83e8.1597770847.git.christophe.leroy@csgroup.eu>

On Tue Aug 18, 2020 at 12:19 PM CDT, Christophe Leroy wrote:
> Change those two functions to be used within a user access block.
>
> For that, change save_general_regs() to and unsafe_save_general_regs(),
> then replace all user accesses by unsafe_ versions.
>
> This series leads to a reduction from 2.55s to 1.73s of
> the system CPU time with the following microbench app
> on an mpc832x with KUAP (approx 32%)
>
> Without KUAP, the difference is in the noise.
>
> void sigusr1(int sig) { }
>
> int main(int argc, char **argv)
> {
> int i = 100000;
>
> signal(SIGUSR1, sigusr1);
> for (;i--;)
> raise(SIGUSR1);
> exit(0);
> }
>
> An additional 0.10s reduction is achieved by removing
> CONFIG_PPC_FPU, as the mpc832x has no FPU.
>
> A bit less spectacular on an 8xx as KUAP is less heavy, prior to
> the series (with KUAP) it ran in 8.10 ms. Once applies the removal
> of FPU regs handling, we get 7.05s. With the full series, we get 6.9s.
> If artificially re-activating FPU regs handling with the full series,
> we get 7.6s.
>
> So for the 8xx, the removal of the FPU regs copy is what makes the
> difference, but the rework of handle_signal also have a benefit.
>
> Same as above, without KUAP the difference is in the noise.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> arch/powerpc/kernel/signal_32.c | 224 ++++++++++++++++----------------
> 1 file changed, 111 insertions(+), 113 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_32.c
> b/arch/powerpc/kernel/signal_32.c
> index 86539a4e0514..f795fe0240a1 100644
> --- a/arch/powerpc/kernel/signal_32.c
> +++ b/arch/powerpc/kernel/signal_32.c
> @@ -93,8 +93,8 @@ static inline int get_sigset_t(sigset_t *set,
> #define to_user_ptr(p) ptr_to_compat(p)
> #define from_user_ptr(p) compat_ptr(p)
>  
> -static inline int save_general_regs(struct pt_regs *regs,
> - struct mcontext __user *frame)
> +static __always_inline int
> +save_general_regs_unsafe(struct pt_regs *regs, struct mcontext __user
> *frame)
> {
> elf_greg_t64 *gregs = (elf_greg_t64 *)regs;
> int val, i;
> @@ -108,10 +108,12 @@ static inline int save_general_regs(struct pt_regs
> *regs,
> else
> val = gregs[i];
>  
> - if (__put_user(val, &frame->mc_gregs[i]))
> - return -EFAULT;
> + unsafe_put_user(val, &frame->mc_gregs[i], failed);
> }
> return 0;
> +
> +failed:
> + return 1;
> }
>  
> static inline int restore_general_regs(struct pt_regs *regs,
> @@ -148,11 +150,15 @@ static inline int get_sigset_t(sigset_t *set,
> const sigset_t __user *uset)
> #define to_user_ptr(p) ((unsigned long)(p))
> #define from_user_ptr(p) ((void __user *)(p))
>  
> -static inline int save_general_regs(struct pt_regs *regs,
> - struct mcontext __user *frame)
> +static __always_inline int
> +save_general_regs_unsafe(struct pt_regs *regs, struct mcontext __user
> *frame)
> {
> WARN_ON(!FULL_REGS(regs));
> - return __copy_to_user(&frame->mc_gregs, regs, GP_REGS_SIZE);
> + unsafe_copy_to_user(&frame->mc_gregs, regs, GP_REGS_SIZE, failed);
> + return 0;
> +
> +failed:
> + return 1;
> }
>  
> static inline int restore_general_regs(struct pt_regs *regs,
> @@ -170,6 +176,11 @@ static inline int restore_general_regs(struct
> pt_regs *regs,
> }
> #endif
>  
> +#define unsafe_save_general_regs(regs, frame, label) do { \
> + if (save_general_regs_unsafe(regs, frame)) \

Minor nitpick (sorry); this naming seems a bit strange to me, in x86 it
is "__unsafe_" as a prefix instead of "_unsafe" as a suffix. That sounds
a bit better to me, what do you think? Unless there is some convention I
am not aware of here apart from "unsafe_" using a goto label for errors.

> + goto label; \
> +} while (0)
> +
> /*
> * When we have signals to deliver, we set up on the
> * user stack, going down from the original stack pointer:
> @@ -249,21 +260,19 @@ static void prepare_save_user_regs(int
> ctx_has_vsx_region)
> #endif
> }
>  
> -static int save_user_regs(struct pt_regs *regs, struct mcontext __user
> *frame,
> - struct mcontext __user *tm_frame, int ctx_has_vsx_region)
> +static int save_user_regs_unsafe(struct pt_regs *regs, struct mcontext
> __user *frame,
> + struct mcontext __user *tm_frame, int ctx_has_vsx_region)
> {
> unsigned long msr = regs->msr;
>  
> /* save general registers */
> - if (save_general_regs(regs, frame))
> - return 1;
> + unsafe_save_general_regs(regs, frame, failed);
>  
> #ifdef CONFIG_ALTIVEC
> /* save altivec registers */
> if (current->thread.used_vr) {
> - if (__copy_to_user(&frame->mc_vregs, &current->thread.vr_state,
> - ELF_NVRREG * sizeof(vector128)))
> - return 1;
> + unsafe_copy_to_user(&frame->mc_vregs, &current->thread.vr_state,
> + ELF_NVRREG * sizeof(vector128), failed);
> /* set MSR_VEC in the saved MSR value to indicate that
> frame->mc_vregs contains valid data */
> msr |= MSR_VEC;
> @@ -276,11 +285,10 @@ static int save_user_regs(struct pt_regs *regs,
> struct mcontext __user *frame,
> * most significant bits of that same vector. --BenH
> * Note that the current VRSAVE value is in the SPR at this point.
> */
> - if (__put_user(current->thread.vrsave, (u32 __user
> *)&frame->mc_vregs[32]))
> - return 1;
> + unsafe_put_user(current->thread.vrsave, (u32 __user
> *)&frame->mc_vregs[32],
> + failed);
> #endif /* CONFIG_ALTIVEC */
> - if (copy_fpr_to_user(&frame->mc_fregs, current))
> - return 1;
> + unsafe_copy_fpr_to_user(&frame->mc_fregs, current, failed);
>  
> /*
> * Clear the MSR VSX bit to indicate there is no valid state attached
> @@ -295,17 +303,15 @@ static int save_user_regs(struct pt_regs *regs,
> struct mcontext __user *frame,
> * contains valid data
> */
> if (current->thread.used_vsr && ctx_has_vsx_region) {
> - if (copy_vsx_to_user(&frame->mc_vsregs, current))
> - return 1;
> + unsafe_copy_vsx_to_user(&frame->mc_vsregs, current, failed);
> msr |= MSR_VSX;
> }
> #endif /* CONFIG_VSX */
> #ifdef CONFIG_SPE
> /* save spe registers */
> if (current->thread.used_spe) {
> - if (__copy_to_user(&frame->mc_vregs, current->thread.evr,
> - ELF_NEVRREG * sizeof(u32)))
> - return 1;
> + unsafe_copy_to_user(&frame->mc_vregs, current->thread.evr,
> + ELF_NEVRREG * sizeof(u32)), failed);
> /* set MSR_SPE in the saved MSR value to indicate that
> frame->mc_vregs contains valid data */
> msr |= MSR_SPE;
> @@ -313,21 +319,29 @@ static int save_user_regs(struct pt_regs *regs,
> struct mcontext __user *frame,
> /* else assert((regs->msr & MSR_SPE) == 0) */
>  
> /* We always copy to/from spefscr */
> - if (__put_user(current->thread.spefscr, (u32 __user *)&frame->mc_vregs
> + ELF_NEVRREG))
> - return 1;
> + unsafe_put_user(current->thread.spefscr,
> + (u32 __user *)&frame->mc_vregs + ELF_NEVRREG, failed);
> #endif /* CONFIG_SPE */
>  
> - if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
> - return 1;
> + unsafe_put_user(msr, &frame->mc_gregs[PT_MSR], failed);
> +
> /* We need to write 0 the MSR top 32 bits in the tm frame so that we
> * can check it on the restore to see if TM is active
> */
> - if (tm_frame && __put_user(0, &tm_frame->mc_gregs[PT_MSR]))
> - return 1;
> + if (tm_frame)
> + unsafe_put_user(0, &tm_frame->mc_gregs[PT_MSR], failed);
>  
> return 0;
> +
> +failed:
> + return 1;
> }
>  
> +#define unsafe_save_user_regs(regs, frame, tm_frame, has_vsx, label) do
> { \
> + if (save_user_regs_unsafe(regs, frame, tm_frame, has_vsx)) \
> + goto label; \
> +} while (0)
> +
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> /*
> * Save the current user registers on the user stack.
> @@ -336,7 +350,7 @@ static int save_user_regs(struct pt_regs *regs,
> struct mcontext __user *frame,
> * We also save the transactional registers to a second ucontext in the
> * frame.
> *
> - * See save_user_regs() and signal_64.c:setup_tm_sigcontexts().
> + * See save_user_regs_unsafe() and signal_64.c:setup_tm_sigcontexts().
> */
> static void prepare_save_tm_user_regs(void)
> {
> @@ -352,13 +366,12 @@ static void prepare_save_tm_user_regs(void)
> #endif
> }
>  
> -static int save_tm_user_regs(struct pt_regs *regs, struct mcontext
> __user *frame,
> - struct mcontext __user *tm_frame, unsigned long msr)
> +static int save_tm_user_regs_unsafe(struct pt_regs *regs, struct
> mcontext __user *frame,
> + struct mcontext __user *tm_frame, unsigned long msr)
> {
> /* Save both sets of general registers */
> - if (save_general_regs(&current->thread.ckpt_regs, frame)
> - || save_general_regs(regs, tm_frame))
> - return 1;
> + unsafe_save_general_regs(&current->thread.ckpt_regs, frame, failed);
> + unsafe_save_general_regs(regs, tm_frame, failed);
>  
> /* Stash the top half of the 64bit MSR into the 32bit MSR word
> * of the transactional mcontext. This way we have a backward-compatible
> @@ -366,26 +379,21 @@ static int save_tm_user_regs(struct pt_regs *regs,
> struct mcontext __user *frame
> * also look at what type of transaction (T or S) was active at the
> * time of the signal.
> */
> - if (__put_user((msr >> 32), &tm_frame->mc_gregs[PT_MSR]))
> - return 1;
> + unsafe_put_user((msr >> 32), &tm_frame->mc_gregs[PT_MSR], failed);
>  
> #ifdef CONFIG_ALTIVEC
> /* save altivec registers */
> if (current->thread.used_vr) {
> - if (__copy_to_user(&frame->mc_vregs, &current->thread.ckvr_state,
> - ELF_NVRREG * sizeof(vector128)))
> - return 1;
> - if (msr & MSR_VEC) {
> - if (__copy_to_user(&tm_frame->mc_vregs,
> - &current->thread.vr_state,
> - ELF_NVRREG * sizeof(vector128)))
> - return 1;
> - } else {
> - if (__copy_to_user(&tm_frame->mc_vregs,
> - &current->thread.ckvr_state,
> - ELF_NVRREG * sizeof(vector128)))
> - return 1;
> - }
> + unsafe_copy_to_user(&frame->mc_vregs, &current->thread.ckvr_state,
> + ELF_NVRREG * sizeof(vector128), failed);
> + if (msr & MSR_VEC)
> + unsafe_copy_to_user(&tm_frame->mc_vregs,
> + &current->thread.vr_state,
> + ELF_NVRREG * sizeof(vector128), failed);
> + else
> + unsafe_copy_to_user(&tm_frame->mc_vregs,
> + &current->thread.ckvr_state,
> + ELF_NVRREG * sizeof(vector128), failed);
>  
> /* set MSR_VEC in the saved MSR value to indicate that
> * frame->mc_vregs contains valid data
> @@ -398,29 +406,21 @@ static int save_tm_user_regs(struct pt_regs *regs,
> struct mcontext __user *frame
> * significant bits of a vector, we "cheat" and stuff VRSAVE in the
> * most significant bits of that same vector. --BenH
> */
> - if (__put_user(current->thread.ckvrsave,
> - (u32 __user *)&frame->mc_vregs[32]))
> - return 1;
> - if (msr & MSR_VEC) {
> - if (__put_user(current->thread.vrsave,
> - (u32 __user *)&tm_frame->mc_vregs[32]))
> - return 1;
> - } else {
> - if (__put_user(current->thread.ckvrsave,
> - (u32 __user *)&tm_frame->mc_vregs[32]))
> - return 1;
> - }
> + unsafe_put_user(current->thread.ckvrsave,
> + (u32 __user *)&frame->mc_vregs[32], failed);
> + if (msr & MSR_VEC)
> + unsafe_put_user(current->thread.vrsave,
> + (u32 __user *)&tm_frame->mc_vregs[32], failed);
> + else
> + unsafe_put_user(current->thread.ckvrsave,
> + (u32 __user *)&tm_frame->mc_vregs[32], failed);
> #endif /* CONFIG_ALTIVEC */
>  
> - if (copy_ckfpr_to_user(&frame->mc_fregs, current))
> - return 1;
> - if (msr & MSR_FP) {
> - if (copy_fpr_to_user(&tm_frame->mc_fregs, current))
> - return 1;
> - } else {
> - if (copy_ckfpr_to_user(&tm_frame->mc_fregs, current))
> - return 1;
> - }
> + unsafe_copy_ckfpr_to_user(&frame->mc_fregs, current, failed);
> + if (msr & MSR_FP)
> + unsafe_copy_fpr_to_user(&tm_frame->mc_fregs, current, failed);
> + else
> + unsafe_copy_ckfpr_to_user(&tm_frame->mc_fregs, current, failed);
>  
> #ifdef CONFIG_VSX
> /*
> @@ -430,53 +430,54 @@ static int save_tm_user_regs(struct pt_regs *regs,
> struct mcontext __user *frame
> * contains valid data
> */
> if (current->thread.used_vsr) {
> - if (copy_ckvsx_to_user(&frame->mc_vsregs, current))
> - return 1;
> - if (msr & MSR_VSX) {
> - if (copy_vsx_to_user(&tm_frame->mc_vsregs,
> - current))
> - return 1;
> - } else {
> - if (copy_ckvsx_to_user(&tm_frame->mc_vsregs, current))
> - return 1;
> - }
> + unsafe_copy_ckvsx_to_user(&frame->mc_vsregs, current, failed);
> + if (msr & MSR_VSX)
> + unsafe_copy_vsx_to_user(&tm_frame->mc_vsregs, current, failed);
> + else
> + unsafe_copy_ckvsx_to_user(&tm_frame->mc_vsregs, current, failed);
>  
> msr |= MSR_VSX;
> }
> #endif /* CONFIG_VSX */
> #ifdef CONFIG_SPE
> /* SPE regs are not checkpointed with TM, so this section is
> - * simply the same as in save_user_regs().
> + * simply the same as in save_user_regs_unsafe().
> */
> if (current->thread.used_spe) {
> - if (__copy_to_user(&frame->mc_vregs, current->thread.evr,
> - ELF_NEVRREG * sizeof(u32)))
> - return 1;
> + unsafe_copy_to_user(&frame->mc_vregs, current->thread.evr,
> + ELF_NEVRREG * sizeof(u32), failed);
> /* set MSR_SPE in the saved MSR value to indicate that
> * frame->mc_vregs contains valid data */
> msr |= MSR_SPE;
> }
>  
> /* We always copy to/from spefscr */
> - if (__put_user(current->thread.spefscr, (u32 __user *)&frame->mc_vregs
> + ELF_NEVRREG))
> - return 1;
> + unsafe_put_user(current->thread.spefscr,
> + (u32 __user *)&frame->mc_vregs + ELF_NEVRREG, failed);
> #endif /* CONFIG_SPE */
>  
> - if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
> - return 1;
> + unsafe_put_user(msr, &frame->mc_gregs[PT_MSR], failed);
>  
> return 0;
> +
> +failed:
> + return 1;
> }
> #else
> static void prepare_save_tm_user_regs(void) { }
>  
> -static int save_tm_user_regs(struct pt_regs *regs, struct mcontext
> __user *frame,
> - struct mcontext __user *tm_frame, unsigned long msr)
> +static int save_tm_user_regs_unsafe(struct pt_regs *regs, struct
> mcontext __user *frame,
> + struct mcontext __user *tm_frame, unsigned long msr)
> {
> return 0;
> }
> #endif
>  
> +#define unsafe_save_tm_user_regs(regs, frame, tm_frame, msr, label) do
> { \
> + if (save_tm_user_regs_unsafe(regs, frame, tm_frame, msr)) \
> + goto label; \
> +} while (0)
> +
> /*
> * Restore the current user register values from the user stack,
> * (except for MSR).
> @@ -769,6 +770,11 @@ int handle_rt_signal32(struct ksignal *ksig,
> sigset_t *oldset,
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> tm_mctx = &frame->uc_transact.uc_mcontext;
> #endif
> + if (MSR_TM_ACTIVE(msr))
> + prepare_save_tm_user_regs();
> + else
> + prepare_save_user_regs(1);
> +
> if (!user_write_access_begin(frame, sizeof(*frame)))
> goto badframe;
>  
> @@ -788,8 +794,10 @@ int handle_rt_signal32(struct ksignal *ksig,
> sigset_t *oldset,
> unsafe_put_user((unsigned long)tm_mctx,
> &frame->uc_transact.uc_regs, failed);
> #endif
> + unsafe_save_tm_user_regs(regs, mctx, tm_mctx, msr, failed);
> } else {
> unsafe_put_user(0, &frame->uc.uc_link, failed);
> + unsafe_save_user_regs(regs, mctx, tm_mctx, 1, failed);
> }
>  
> /* Save user registers on the stack */
> @@ -812,15 +820,6 @@ int handle_rt_signal32(struct ksignal *ksig,
> sigset_t *oldset,
> if (tramp == (unsigned long)mctx->mc_pad)
> flush_icache_range(tramp, tramp + 2 * sizeof(unsigned long));
>  
> - if (MSR_TM_ACTIVE(msr)) {
> - prepare_save_tm_user_regs();
> - if (save_tm_user_regs(regs, mctx, tm_mctx, msr))
> - goto badframe;
> - } else {
> - prepare_save_user_regs(1);
> - if (save_user_regs(regs, mctx, tm_mctx, 1))
> - goto badframe;
> - }
> regs->link = tramp;
>  
> #ifdef CONFIG_PPC_FPU_REGS
> @@ -875,6 +874,11 @@ int handle_signal32(struct ksignal *ksig, sigset_t
> *oldset,
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> tm_mctx = &frame->mctx_transact;
> #endif
> + if (MSR_TM_ACTIVE(msr))
> + prepare_save_tm_user_regs();
> + else
> + prepare_save_user_regs(1);
> +
> if (!user_write_access_begin(frame, sizeof(*frame)))
> goto badframe;
> sc = (struct sigcontext __user *) &frame->sctx;
> @@ -892,6 +896,11 @@ int handle_signal32(struct ksignal *ksig, sigset_t
> *oldset,
> unsafe_put_user(to_user_ptr(mctx), &sc->regs, failed);
> unsafe_put_user(ksig->sig, &sc->signal, failed);
>  
> + if (MSR_TM_ACTIVE(msr))
> + unsafe_save_tm_user_regs(regs, mctx, tm_mctx, msr, failed);
> + else
> + unsafe_save_user_regs(regs, mctx, tm_mctx, 1, failed);
> +
> if (vdso32_sigtramp && tsk->mm->context.vdso_base) {
> tramp = tsk->mm->context.vdso_base + vdso32_sigtramp;
> } else {
> @@ -905,16 +914,6 @@ int handle_signal32(struct ksignal *ksig, sigset_t
> *oldset,
> if (tramp == (unsigned long)mctx->mc_pad)
> flush_icache_range(tramp, tramp + 2 * sizeof(unsigned long));
>  
> - if (MSR_TM_ACTIVE(msr)) {
> - prepare_save_tm_user_regs();
> - if (save_tm_user_regs(regs, mctx, tm_mctx, msr))
> - goto badframe;
> - } else {
> - prepare_save_user_regs(1);
> - if (save_user_regs(regs, mctx, tm_mctx, 1))
> - goto badframe;
> - }
> -
> regs->link = tramp;
>  
> #ifdef CONFIG_PPC_FPU_REGS
> @@ -1066,10 +1065,9 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext
> __user *, old_ctx,
> mctx = (struct mcontext __user *)
> ((unsigned long) &old_ctx->uc_mcontext & ~0xfUL);
> prepare_save_user_regs(ctx_has_vsx_region);
> - if (save_user_regs(regs, mctx, NULL, ctx_has_vsx_region))
> - return -EFAULT;
> if (!user_write_access_begin(old_ctx, ctx_size))
> return -EFAULT;
> + unsafe_save_user_regs(regs, mctx, NULL, ctx_has_vsx_region, failed);
> unsafe_put_sigset_t(&old_ctx->uc_sigmask, &current->blocked, failed);
> unsafe_put_user(to_user_ptr(mctx), &old_ctx->uc_regs, failed);
> user_write_access_end();
> --
> 2.25.0


^ permalink raw reply

* Re: [PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules
From: Oliver O'Halloran @ 2020-09-29  0:55 UTC (permalink / raw)
  To: Tyrel Datwyler
  Cc: Linux Kernel Mailing List, Mamatha Inamdar, linux-pci,
	Bjorn Helgaas, linuxppc-dev
In-Reply-To: <ff6a8c97-4a6a-c82b-bd35-e09fa44f8e20@linux.ibm.com>

On Tue, Sep 29, 2020 at 6:50 AM Tyrel Datwyler <tyreld@linux.ibm.com> wrote:
>
> On 9/23/20 11:41 PM, Oliver O'Halloran wrote:
> > On Thu, Sep 24, 2020 at 3:15 PM Mamatha Inamdar
> > <mamatha4@linux.vnet.ibm.com> wrote:
> >>
> >> This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
> >> (descriptions taken from Kconfig file)
> >>
> >> Signed-off-by: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
> >> ---
> >>  drivers/pci/hotplug/rpadlpar_core.c |    1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
> >> index f979b70..bac65ed 100644
> >> --- a/drivers/pci/hotplug/rpadlpar_core.c
> >> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> >> @@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
> >>  module_init(rpadlpar_io_init);
> >>  module_exit(rpadlpar_io_exit);
> >>  MODULE_LICENSE("GPL");
> >> +MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O slots");
> >
> > RPA as a spec was superseded by PAPR in the early 2000s. Can we rename
> > this already?
>
> I seem to recall Michael and I discussed the naming briefly when I added the
> maintainer entries for the drivers and that the PAPR acronym is almost as
> meaningless to most as the original RPA. While, IBM no longer uses the term
> pseries for Power hardware marketing it is the defacto platform identifier in
> the Linux kernel tree for what we would call PAPR compliant. All in all I have
> no problem with renaming, but maybe we should consider pseries_dlpar or even
> simpler ibmdlpar.

I'm not too bothered by what we call it so long as it's consistent
with *something* else in the tree. Using pseries rather than ibm as a
prefix would probably be better since the legacy ibmphp driver is in
the same directory.

^ permalink raw reply

* RE: [PATCH 1/5] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr1-alt-addr' property
From: Ran Wang @ 2020-09-29  0:34 UTC (permalink / raw)
  To: Rob Herring
  Cc: devicetree@vger.kernel.org, Biwen Li, Shawn Guo,
	linux-kernel@vger.kernel.org, Leo Li,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAL_Jsq+uzkr7CcvwQTe5vhpMPtdqL9v4EeqH5yZjMoT=JrDtDQ@mail.gmail.com>

Hi Rob,

On Monday, September 28, 2020 9:57 PM, Rob Herring wrote:
> 
> On Wed, Sep 23, 2020 at 1:44 AM Ran Wang <ran.wang_1@nxp.com> wrote:
> >
> > Hi Rob,
> >
> > On Wednesday, September 23, 2020 10:33 AM, Rob Herring wrote:
> > >
> > > On Wed, Sep 16, 2020 at 04:18:27PM +0800, Ran Wang wrote:
> > > > From: Biwen Li <biwen.li@nxp.com>
> > > >
> > > > The 'fsl,ippdexpcr1-alt-addr' property is used to handle an errata
> > > > A-008646 on LS1021A
> > > >
> > > > Signed-off-by: Biwen Li <biwen.li@nxp.com>
> > > > Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> > > > ---
> > > >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 19
> > > > +++++++++++++++++++
> > > >  1 file changed, 19 insertions(+)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > index 5a33619..1be58a3 100644
> > > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > @@ -34,6 +34,11 @@ Chassis Version          Example Chips
> > > >  Optional properties:
> > > >   - little-endian : RCPM register block is Little Endian. Without it RCPM
> > > >     will be Big Endian (default case).
> > > > + - fsl,ippdexpcr1-alt-addr : The property is related to a hardware issue
> > > > +   on SoC LS1021A and only needed on SoC LS1021A.
> > > > +   Must include 2 entries:
> > > > +   The first entry must be a link to the SCFG device node.
> > > > +   The 2nd entry must be offset of register IPPDEXPCR1 in SCFG.
> > >
> > > You don't need a DT change for this. You can find SCFG node by its
> > > compatible string and then the offset should be known given this issue is
> only on 1 SoC.
> >
> > Did you mean that RCPM driver just to access IPPDEXPCR1 shadowed
> > register in SCFG directly without fetching it's offset info. from DT?
> 
> Yes. There's only 1 possible value of the offset because there's only one SoC, so
> the driver can hardcode the offset. And I assume there's only one SCFG node,
> so you can find it by its compatible string (of_find_compatible_node).

Got it, let me update this in next version, thank you.

Regards,
Ran

> Rob

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox